luckybuck Posted April 30, 2021 (edited)

Hi all!

First of all, we want to give Charles W. Marslett a yotta (10^24) thank-you for all the work he has done, and another one for releasing the source code of his work into the public domain. Charles, from Atari users on all 5 continents: thank you so much!!!

After a long search and a verification loop with Charles, we can now offer you:

FAST FLOATING POINT source code for the ATARI, Revision F

The first version was published in 1981 and was improved and adapted to more and more Atari computers over the years. With Charles's work it was officially possible for the first time to make reliable calculations! All this runs up to 3.5 times faster than the original Atari ROM floating point routines from $D800 to $DFFF. Another great advantage: all addresses of the floating point routines are the same as in the original Atari ROM!

With the now final version F, sorry to say, all Atari OSs need to be vaccinated... Luckily, this can be done in just one shot by replacing the specific OS ROM.

Please take into account that Charles did this in 1981, while https://en.wikipedia.org/wiki/IEEE_754 dates from 1985. This shows how far ahead of his time Charles was and still is!

For gamers this could be a nice increase in calculation speed, like https://en.wikipedia.org/wiki/Fast_inverse_square_root later in the game Quake III Arena. For serious calculations, this is a must-have under all circumstances.

We would further like to thank Robert "Bob" Puff for translating Charles's original AMAC source code to MAC/65, and drac030 for finding the very last byte to be changed. A big thank-you goes to the University of Michigan for hosting the file: faschips.arc

Have fun and warm up the EPROM burners... All the best.

Edited April 30, 2021 by luckybuck
+Allan Posted April 30, 2021

Does somebody want to make a short comparison video using an emulator and a couple of examples?
scitari Posted April 30, 2021

Thanks to the team for all the hard work on this!
ClausB Posted April 30, 2021

Why does the Wiki say "The routines make use of the CORDIC algorithm"? They use polynomial approximations, not CORDIC.
TGB1718 Posted April 30, 2021

Tried a quick test under Altirra; maybe it's me, but I don't notice any difference. I ran a small BASIC program and timed it, then went to DOS and ran the LDFAST program after compiling it. I went back to BASIC and made sure the OS was in RAM by poking a RAM location and peeking the new value. I ran the same test program and the time was exactly the same. When I have some time I will try it on a stock 130XE.
drac030 Posted April 30, 2021

It is undoubtedly faster than the original. Two warnings, though:

1) In the XL OS, after replacing the FP routines, one has to recompute and fix the checksum at $C000.

2) When this FP package is in ROM, Turbo-BASIC XL will not work correctly: its LOG(), CLOG() and power functions reference FP constants in the original FP package, and Marslett's package has them in different places.
Faicuai Posted May 1, 2021 (edited)

20 hours ago, luckybuck said: "After a long search and a verification loop with Charles, we can now offer you: FAST FLOATING POINT source code for the ATARI, Revision F"

VERY nice!!! But... This page is not working!

Can anyone (please) compare the Rev. F ROM with this one? FPP-NEWELL-FastFP-1984.bin

Thanks!

UPDATE: in the meantime, you can try the following FP performance test, which generates a relative (speed & precision) index of ~1.00 in Atari Basic + OEM FP pack. Microsoft Basic clocks about 2.55, Altirra Basic + Altirra FP clocks near 6.9 (!!), and FastBasic with the Altirra packs hits +7!: FPTEST34.BAS

Edited May 1, 2021 by Faicuai
Tempest Posted May 1, 2021

So what games might benefit from this?
drac030 Posted May 1, 2021

Yes, it is identical (save 15 filler bytes which may be different). I already complained about it a few days ago to @luckybuck, as this new rev. F is not really new; it was just going around as "rev. E" because two lines of comment were missing at the beginning. But apart from that, it is identical to the stuff which has been available, for example, on my website since about 2008. It is good, however, that a more complete source has finally emerged.
luckybuck Posted May 1, 2021 (edited)

The Wiki seems to be down; I have already asked the admin. Meanwhile please take the attached ATR: FastChip-final.atr

@Allan: yes, calculations; a video should be made by the gamers?

@ClausB: This is from the pre-IEEE 754 time, 1981. I just flew over the source code and saw constants used in tables, therefore I assumed, I must admit, that it is CORDIC. Do you perhaps have an example of polynomial approximation in floating point using fixed constants from back in that time? I will watch the V2 rocket movie and get back to you. I did not forget. But time is critical, the main focus is on the 3 kings: CX402, CX03 and CX412, we have contact... I have searched for 'cordic' in the source code. No entry. But I found 'POLYNOMIAL EVALUATION ROUTINE', so you are right and I am wrong. I deeply apologize to all.

@drac030: to 2): TB used the original FP from Atari, but I am astonished about the different addresses. Anyway, a new TB, version 2.x including SD, is on our radar. We have the source code, once the Wiki is up again.

@Faicuai: thanks, yes, the Wiki is down, please see the first sentence. Thanks for the input. Yes, a unified FP test running on all languages with benchmarks would be the best option so far. From a good friend in the US, Joerg, we have for this: http://www.datamath.org/Story/LogarithmBug.htm and http://www.datamath.org/Forensics.htm as a first approach.

@Tempest: good question; in general all those with heavy calculations, e.g. the explosion in Star Raiders. We just have to test. Trouble may arise with games whose timing depends on the old FP routines...

@drac030: Yes, that was what I was worried about: to really have the final version and the green light from Charles. This is done now. With your finding it is even more complete, so ready for the burners...

I will be back...

Edited May 1, 2021 by luckybuck (forgot something)
luckybuck Posted May 1, 2021

@ClausB: part 2: Do you know a 'POLYNOMIAL EVALUATION ROUTINE' by name using these COEFFICIENTS?

; COEFFICIENTS USED IN THE LOG POLYNOMIALS
;
LOGPLY  DB $C0,$08,$19,$08,$00,$45 ; -8.19080045
        DB $40,$16,$96,$69,$81,$40 ; 16.96698140
        DB $C0,$10,$07,$04,$06,$95 ;-10.07040695
;
LOG10E  = *+4 ;RETURN INSTRUCTION
        DB $BF,$67,$35,$81,$60,$15 ;-0.6735816015
        DB $40,$03,$16,$30,$34,$92 ; 3.16303492
        DB $C0,2,$91,$56,$81,$44 ;-2.91568144
;
        DB $3F,$86,$85,$88,$96,$38 ; 0.8685889638
LN10    DB $40,$2,$30,$25,$85,$9 ; 2.30258509
;
INVL10  DB $3F,$43,$42,$94,$48,$19
;
C10     DB $40,$10,$00,$00,$00,$00
;
; POLYNOMIAL FOR SIN/COS FUNCTIONS (11 COEFFICIENTS)
;
        ORG AFP+$7AE
PLYSIN  DB $3E,$16,$05,$44,$49,$00 ;REF BY BASIC SIN/COS ROUTINES
        DB $BE,$95,$68,$38,$45,$00
        DB $3F,$02,$68,$79,$94,$16
        DB $BF,$04,$92,$78,$90,$80
        DB $3F,$07,$03,$15,$20,$00
        DB $BF,$08,$92,$29,$12,$44
        DB $3F,$11,$08,$40,$09,$11
        DB $BF,$14,$28,$31,$56,$04
        DB $3F,$19,$99,$98,$77,$44
        DB $BF,$33,$33,$33,$31,$13
NONE    DB $3F,$99,$99,$99,$99,$99 ;ALMOST EQUAL TO 1.0 (USED FOR ROUNDOFF PROBLEM)
;
; SIN OF 45 DEG.
;
SIN45   DB $3F,$78,$53,$98,$16,$34
;
ClausB Posted May 1, 2021 (edited)

12 hours ago, luckybuck said: "Do you know a 'POLYNOMIAL EVALUATION ROUTINE' by name using these COEFFICIENTS?"

I do not. Here is the description of LOG and LOG10 in the Atari OS Manual, page 116:

Floating point logarithms (LOG & LOG10)

Function: These routines take the natural or base 10 logarithms of a floating point number.

Calling sequence:
  FR0 = floating point number.
  JSR LOG [DECD] for natural logarithm
  or
  JSR LOG10 [DED1] for base 10 logarithm
  BCS negative number or overflow.
  FR0 = floating point logarithm.
  FR1 is altered.

Algorithm: Both logarithms are first computed as base 10 logarithms using a 10 term polynomial approximation; the natural logarithm is computed by dividing the base 10 result by the constant LOG10(e). The logarithm of a number Z is computed as follows:
  F * (10 ** Y) = Z where 1 <= F < 10 (normalization).
  L = LOG10(F) by 10 term polynomial approximation.
  LOG10(Z) = Y + L.
  LOG(Z) = LOG10(Z) / LOG10(e).

Also, on page 117 is a description of the polynomial evaluation routine: https://archive.org/details/atari-phc-os-jan-1982/page/117/mode/1up

The sin/cos routines are not described there, as they are in the BASIC ROM. However, the constant in your source code labeled SIN45 is not sin(45°); rather it is π/4, which is 45° in radians.

BTW, which V2 rocket movie did you watch?

Edited May 1, 2021 by ClausB
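The six-byte constants quoted in this thread can be checked by decoding them. Below is a minimal Python sketch, assuming the usual Atari floating point layout: one sign/exponent byte (bit 7 is the sign, the low 7 bits are an excess-64 exponent in powers of 100) followed by five packed-BCD mantissa bytes with the decimal point after the first mantissa byte. The helper name decode_atari_fp is made up for this illustration and is not part of any Atari source.

```python
import math
from decimal import Decimal


def decode_atari_fp(data):
    """Decode a 6-byte Atari BCD float into a Decimal (illustrative helper)."""
    if len(data) != 6:
        raise ValueError("expected exactly 6 bytes")
    sign = -1 if data[0] & 0x80 else 1
    # Assumed layout: low 7 bits hold an excess-64 exponent, base 100.
    exponent = (data[0] & 0x7F) - 64
    # Packed BCD: each byte prints as two decimal digits when shown in hex.
    digits = "".join(f"{b:02X}" for b in data[1:])
    # Decimal point sits after the first mantissa byte (two digits).
    mantissa = Decimal(digits) / Decimal(10) ** 8
    return sign * mantissa * Decimal(100) ** exponent


# Constants copied from the source excerpt quoted earlier in the thread.
SIN45 = bytes([0x3F, 0x78, 0x53, 0x98, 0x16, 0x34])
LN10 = bytes([0x40, 0x02, 0x30, 0x25, 0x85, 0x09])

if __name__ == "__main__":
    print("SIN45 decodes to", decode_atari_fp(SIN45))
    print("pi/4      =", math.pi / 4)
    print("sin(45 deg) =", math.sin(math.radians(45)))
    print("LN10 decodes to", decode_atari_fp(LN10))
```

Run as a script, this prints the decoded SIN45 next to π/4 and sin(45°), so the point made in the post above can be checked directly.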
+CharlieChaplin Posted May 1, 2021

Hmmm, he is from Germany, so maybe he is watching some German (Wernher von Braun) V2 rocket movie? If I interpret things correctly, V2 was the same as A4 (Aggregat 4)...
luckybuck Posted May 1, 2021 (edited)

@ClausB: Thanks for your reply. Yes, I know of it; Nezgar uploaded this last year. There are still papers not yet disclosed about the LOG problem. Besides this, 2 years before(!), Carol had the FP routines running in a marvelous way. Atari did not take them. She used them in Calculator in 1979, which was published by Atari in 1981, as far as I recall. https://atariwiki.org/wiki/Wiki.jsp?page=Atari Calculator

Besides this, we have the source code chapter on the Wiki: https://atariwiki.org/wiki/Wiki.jsp?page=Articles#section-Articles-SourceCode Please scroll down to OS and there to point 5.

Further: Atari_Basic_Reference_Manual-Product_Update-C061038_Rev._A-©_1982_Atari,_Inc.pdf; please go to page 5 inside the PDF file and see the first 2 topics there. The PDF file is attached.

From the above link, at point 3, we have published the source code of the Colleen OS; please see the attached ASM file. Inside we find:

; FLOATING POINT SUBROUTINES
;
FPREC = 6 ;FLOATING PT PRECISION (# OF BYTES)
; IF CARRY USED THEN CARRY CLEAR => NO ERROR, CARR
AFP = $D800 ;ASCII->FLOATING POINT (FP)
; INBUFF+CIX -> FR0, CIX, CARRY
FASC = $D8E6 ;FP -> ASCII FR0-> LBUFF (INBUFF)
IFP = $D9AA ;INTEGER -> FP
; 0-$FFFF (LSB,MSB) IN FR0,FR0+1->FR0
FPI = $D9D2 ;FP -> INTEGER FR0 -> FR0,FR0+1, CARRY
FSUB = $DA60 ;FR0 <- FR0 - FR1 ,CARRY
FADD = $DA66 ;FR0 <- FR0 + FR1 ,CARRY
FMUL = $DADB ;FR0 <- FR0 * FR1 ,CARRY
FDIV = $DB28 ;FR0 <- FR0 / FR1 ,CARRY
FLD0R = $DD89 ;FLOATING LOAD REG0 FR0 <- (X,Y)
FLD0P = $DD8D ; " " " FR0 <- (FLPTR)
FLD1R = $DD98 ; " " REG1 FR1 <- (X,Y)
FLD1P = $DD9C ; " " " FR1 <- (FLPTR)
FSTOR = $DDA7 ;FLOATING STORE REG0 (X,Y) <- FR0
FSTOP = $DDAB ; " " " (FLPTR) <- FR0
FMOVE = $DDB6 ;FR1 <- FR0
PLYEVL = $DD40 ;FR0 <- P(Z) = SUM(I=N TO 0) (A(I)*Z**I) CAR
; INPUT: (X,Y) = A(N),A(N-1)...A(0) -> PLYARG
; ACC = # OF COEFFICIENTS = DEGREE+1
; FR0 = Z
EXP = $DDC0 ;FR0 <- E**FR0 = EXP10(FR0 * LOG10(E)) CARRY
EXP10 = $DDCC ;FR0 <- 10**FR0 CARRY
LOG = $DECD ;FR0 <- LN(FR0) = LOG10(FR0)/LOG10(E) CARRY
LOG10 = $DED1 ;FR0 <- LOG10 (FR0) CARRY
; THE FOLLOWING ARE IN BASIC CARTRIDGE:
SIN = $BDB1 ;FR0 <- SIN(FR0) DEGFLG=0 =>RADS, 6=>DEG. CA
COS = $BD73 ;FR0 <- COS(FR0) CARRY
ATAN = $BE43 ;FR0 <- ATAN(FR0) CARRY
SQR = $BEB1 ;FR0 <- SQUAREROOT(FR0) CARRY

Therefore, the routines for SIN, COS, ATAN and SQR are in BASIC, but the rest remain in the OS, not in the BASIC ROM, as you can see in the BASIC source code, which is published on the Wiki, too. That was the trick in those times, when a 'normal' BASIC took 10 K: divide it into 2 x 4 K ROMs and put 2 K into the OS. The 2 K can be used by other programs, of course.

It is not my source code; all of it is from Charles. We just have the green light to publish it. As written, I only took a short look at it and could discover many constants. What they are used for, I do not know yet; that takes a deeper investigation, for which I sadly have no time now. But later I surely will do it. I plan to do Calculator 2.0 once the preservation is all done. For this purpose I need the above... In the meantime, we have IEEE and some smart routines used for the WP 34S calculator.

Sadly, due to lack of time, I could not manage to watch https://www.youtube.com/results?search_query=A4-V2+Rocket+in+detail-Turbopump (the first part alone runs 1 h 51 min). But I will in the future, promised.

@CharlieChaplin: Yes, in Germany both V2 and A4 are well known; outside Germany it is mostly V2, so for better understanding I leave A4 out. Wernher named it 'Aggregat 4' (English: aggregate), but Goebbels renamed it V2 for propaganda purposes. Besides the V1, there was also the V3, which most people did not know of until the attempted relaunch in Iraq. There are rumors about a V4 to V6, but no hard evidence has been officially presented yet. Plans for a missile/carrier plane against New York were real.

Attachments: Atari_Basic_Reference_Manual-Product_Update-C061038_Rev._A-©_1982_Atari,_Inc.pdf, Atari_800_OS_Rev.B.asm

Edited May 1, 2021 by luckybuck (forgot something)
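The PLYEVL comment in the listing above describes P(Z) = SUM(A(I)*Z**I) with the coefficients passed highest order first: A(N), A(N-1), ..., A(0). That ordering fits Horner's rule naturally. The following is a minimal Python sketch, assuming Horner's scheme; the listing itself only states the sum, not how the ROM evaluates it, and the function below uses ordinary binary floats rather than the 6-byte BCD format.

```python
def plyevl(coeffs, z):
    """Evaluate sum(A(i) * z**i), coeffs given as A(N), A(N-1), ..., A(0).

    Illustrative only: mirrors the argument order in the PLYEVL comment,
    but works on Python floats, not on Atari BCD numbers.
    """
    if not coeffs:
        raise ValueError("need at least one coefficient")
    result = 0.0
    for a in coeffs:
        # Horner step: fold in the next lower-order coefficient.
        result = result * z + a
    return result


if __name__ == "__main__":
    # 2*z**2 + 3*z + 4 at z = 5
    print(plyevl([2.0, 3.0, 4.0], 5.0))
```

Each coefficient costs one multiply and one add, which is why this form is attractive when multiplication is slow.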
ClausB Posted May 1, 2021

Thanks for the links. All good stuff! I have seen that V2/A4 video. Very detailed and informative. Enjoy!
luckybuck Posted May 1, 2021

I did my master's in... The PhD thesis from Wernher would still count as a PhD thesis today! This man was so far ahead of his time... The documentation is surely of interest, but my focus is on field propulsion, which is way smarter, if it runs...
thorfdbg Posted May 1, 2021

The MathPack log10 algorithm is not so complicated, really. First of all, log10(a*10^x) = x+log10(a), so the decimal exponent can be removed. Then log10(x) with x normalized is approximated by log10(x) = p(((x-a)/(x+a))^2), where a = sqrt(10) and p is a suitable polynomial.

If I recall, Atari just uses the Taylor approximation as the polynomial, but that is definitely a bad choice. For Os++, I replaced it with the minimax polynomial, i.e. the polynomial that minimizes the maximal error. Atari also uses a 10th order polynomial, which is total overkill. The 8th order minimax polynomial is better, and faster.

The minimax polynomial is just this (from my sources):

;;; The following is the minimax polynomial for the log approximation:
;;; 0.8685889625 + (0.2895298827
;;; + (0.1737063251 + (0.1243413535 + (0.09348240142 + (0.09879885753 + (-0.004411453333 + 0.1806407195
;;; x) x) x) x) x) x) x
;;;
;;; It causes an approximation error that is as small as 4^-11.
;;;
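For readers who want to try the coefficients above, here is a rough Python sketch. It rests on assumptions that are not spelled out in the post: with z = (x-a)/(x+a) and a = sqrt(10), the result is taken as 1/2 + z*p(z^2), which follows from log10(x) = log10(a) + log10((1+z)/(1-z)) and matches the leading coefficient 0.8685889625, which is close to 2/ln(10). The last coefficient is read as 0.1806407195, and the name log10_sketch is invented here. It runs on ordinary binary floats, so it says nothing about BCD rounding in the real routine.

```python
import math

SQRT10 = math.sqrt(10.0)

# Coefficients as quoted in the post, lowest order first.
MINIMAX = [
    0.8685889625, 0.2895298827, 0.1737063251, 0.1243413535,
    0.09348240142, 0.09879885753, -0.004411453333, 0.1806407195,
]


def log10_sketch(x):
    """Approximate log10(x) for x > 0 (illustrative, not the ROM code)."""
    if x <= 0.0:
        raise ValueError("log10 needs a positive argument")
    # Step 1: strip the decimal exponent so that 1 <= x < 10.
    exponent = 0
    while x >= 10.0:
        x /= 10.0
        exponent += 1
    while x < 1.0:
        x *= 10.0
        exponent -= 1
    # Step 2: map the mantissa to z, which is small around sqrt(10).
    z = (x - SQRT10) / (x + SQRT10)
    t = z * z
    # Step 3: Horner evaluation of p(t), highest order coefficient first.
    p = 0.0
    for c in reversed(MINIMAX):
        p = p * t + c
    # Assumed combination: log10(sqrt(10)) = 0.5, plus the odd series in z.
    return exponent + 0.5 + z * p


if __name__ == "__main__":
    for value in (1.0, 2.0, 9.99, 1234.5, 0.002):
        print(value, log10_sketch(value), math.log10(value))
```

The script prints the sketch's result next to math.log10 for a few inputs, which is a quick way to see whether this reading of the formula and the coefficients is plausible.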
Faicuai Posted May 1, 2021 (edited)

21 hours ago, drac030 said: "Yes, it is identical (save 15 filler bytes which may be different). I already complained about it a few days ago to @luckybuck, as this new rev. F is not really new; it was just going around as "rev. E" because two lines of comment were missing at the beginning. But apart from that, it is identical to the stuff which has been available, for example, on my website since about 2008. It is good, however, that a more complete source has finally emerged."

Correct. The only difference is the first 6 bytes (I finally got a chance to binary-compare them).

Tested on the FP-Index bench (FPTEST34.BAS) posted above, and got x2.90 [Atari Basic-C + Rev. F FP]. In comparison, [Atari Basic-C + Altirra FP] reaches x5.50 (the latter being a notch slower, but MUCH more precise, hence the clearly higher relative index).

Edited May 1, 2021 by Faicuai
luckybuck Posted May 2, 2021

Thank you sooo much, Thor. I must deeply apologize for not yet having finished your Basic++ and OS++ on the Wiki. It is on my list and I will do it in the future. Thank you, your contribution is highly appreciated!