luckybuck

FAST FLOATING POINT, Revision F is found! :-)))


Hi all!

 

First of all, we want to give Charles W. Marslett a yotta (10^24) thank you for all the work he has done, and another one for releasing the source code of his work into the public domain. Charles, from Atari users on all 5 continents: Thank you so much!!!


 

After a long search and several rounds of verification with Charles, we can now offer you:

 

FAST FLOATING POINT source code for the ATARI, Revision F

 

It was first published in 1981, then improved and adapted to more and more Atari computers over the years. With Charles's work it was possible for the first time officially to make reliable calculations! All this up to 3.5 times faster than the original Atari ROM floating point routines from $D800 to $DFFF. Another great advantage: all entry addresses of the floating point routines are the same as in the original Atari ROM! With the now final version F, sorry to say, all Atari OSs need to be vaccinated...

Luckily, this can be done in just one shot by replacing the respective OS ROM.

 

Please take into account, Charles did this in 1981, while:

https://en.wikipedia.org/wiki/IEEE_754

is from 1985 on...

This shows how far ahead of time Charles was and still is!

 

For gamers this could be a nice increase in calculation speed, like:

https://en.wikipedia.org/wiki/Fast_inverse_square_root

in the game Quake III Arena later.

 

For serious calculations, this is a must-have under all circumstances.

 

We would further like to thank Robert "Bob" Puff for converting Charles's original AMAC source code to MAC/65, and drac030 for finding the very last byte to be changed.

A big thank you goes to the University of Michigan for hosting the file: faschips.arc

 

Have fun and warm up the EPROM burners...

 

All the best.

 

Edited by luckybuck
  • Like 13
  • Thanks 4


Does somebody want to make a short comparison video using an emulator and a couple of examples?


Tried a quick test under Altirra; maybe it's me, but I don't notice any difference.

I ran a small BASIC program and timed it, then went to DOS and ran the LDFAST program after compiling it.

Went back to BASIC and made sure the OS was in RAM by poking a value into a RAM location and peeking it back.

Ran the same test program and the time was exactly the same.

When I have some time, I will try it on a stock 130XE.

 


It is undoubtedly faster than the original. Two warnings, though:

 

1) in XL OS, after replacing the FP routines, one has to recompute and fix the checksum at $c000.

 

2) when this FP package is in ROM, Turbo-BASIC XL will not work correctly: its LOG(), CLOG() and power functions reference FP constants in the original FP package, and Marslett's package has them in different places.

 

 

  • Like 1

20 hours ago, luckybuck said:

After a long search and several rounds of verification with Charles, we can now offer you:

 

FAST FLOATING POINT source code for the ATARI, Revision F

 

VERY nice!!! But.... This page is not working!

 

Can anyone (please) compare Rev. F rom with this one? 

 

FPP-NEWELL-FastFP-1984.bin

 

Thanks!

 

UPDATE: in the meantime, you can try the following FP performance test, which generates a relative (speed & precision) index of ~1.00 in Atari Basic + OEM FP pack. Microsoft Basic clocks about 2.55, Altirra Basic + Altirra FP clocks near 6.9 (!!), and FastBasic with Altirra Packs hits 7+!:

 

FPTEST34.BAS

Edited by Faicuai


So what games might benefit from this?


Yes, it is identical (save 15 filler bytes which may be different). I already complained about it a few days ago to @luckybuck, as this new rev. F is not really new; it was just going around as "rev. E" because two lines of comment were missing at the beginning. But apart from that, it is identical to the stuff which has been available, for example, on my website since about 2008.

 

It is good however that a more complete source has finally emerged.

  • Like 1


The Wiki seems to be down; I have already asked the admin. Meanwhile, please use the attached ATR.

FastChip-final.atr

@Allan: yes, for calculations; perhaps the gamers could make a video?

@ClausB: This is from 1981, the time before IEEE 754. I just skimmed the source code and saw constants used in tables; therefore, I must admit, I assumed it is CORDIC. Do you perhaps have an example of polynomial approximation in floating point using fixed constants from way back then? I will watch the V2 rocket movie and get back to you. I have not forgotten. But time is critical; the main focus is on the 3 kings: CX402, CX03 and CX412, we have contact...

I have searched for 'cordic' in the source code. No entry. But I found 'POLYNOMIAL EVALUATION ROUTINE', so you are right and I am wrong. I deeply apologize to all.

@drac030: regarding 2): TB used the original FP from Atari, but I am astonished about the different addresses. Anyway, a new TB, version 2.x including SD, is on our radar. We have the source code, once the Wiki is up again.

@Faicuai: thanks, yes, the Wiki is down, please see the first sentence. Thanks for the input. Yes, a unified FP test with benchmarks, running on all languages, would be the best option so far. From a good friend in the US, Joerg, we have for this:

http://www.datamath.org/Story/LogarithmBug.htm

and

http://www.datamath.org/Forensics.htm

as a 1st approach.

@Tempest: good question; in general all those with heavy calculations, e.g. the explosion in Star Raiders; we just have to test. Trouble may arise with games that are timed against the old FP routines...

@drac030: Yes, that was what I was worried about: to really have the final version and the green light from Charles. This is done now. With your finding, it is even more complete, so ready for the burners.. ;-)

 

I will be back...

 

Edited by luckybuck
forgot something


@ClausB: part 2:

 

Do you know a 'POLYNOMIAL EVALUATION ROUTINE' by name using these COEFFICIENTS?

 

;    COEFFICIENTS USED IN THE LOG POLYNOMIALS
;
LOGPLY    DB    $C0,$08,$19,$08,$00,$45    ; -8.19080045
    DB    $40,$16,$96,$69,$81,$40; 16.96698140
    DB    $C0,$10,$07,$04,$06,$95    ;-10.07040695
;
LOG10E    =    *+4    ;RETURN INSTRUCTION
    DB    $BF,$67,$35,$81,$60,$15;-0.6735816015
    DB    $40,$03,$16,$30,$34,$92    ; 3.16303492
    DB    $C0,2,$91,$56,$81,$44    ;-2.91568144
;
    DB    $3F,$86,$85,$88,$96,$38;  0.8685889638
LN10    DB    $40,$2,$30,$25,$85,$9;  2.30258509
;
INVL10    DB    $3F,$43,$42,$94,$48,$19
;
C10    DB    $40,$10,$00,$00,$00,$00
;
;    POLYNOMIAL FOR SIN/COS FUNCTIONS (11 COEFFICIENTS)
;
    ORG    AFP+$7AE
PLYSIN    DB    $3E,$16,$05,$44,$49,$00 ;REF BY BASIC SIN/COS ROUTINES
    DB    $BE,$95,$68,$38,$45,$00
    DB    $3F,$02,$68,$79,$94,$16
    DB    $BF,$04,$92,$78,$90,$80
    DB    $3F,$07,$03,$15,$20,$00
    DB    $BF,$08,$92,$29,$12,$44
    DB    $3F,$11,$08,$40,$09,$11
    DB    $BF,$14,$28,$31,$56,$04
    DB    $3F,$19,$99,$98,$77,$44
    DB    $BF,$33,$33,$33,$31,$13
NONE    DB    $3F,$99,$99,$99,$99,$99 ;ALMOST EQUAL TO 1.0 (USED FOR ROUNDOFF PROBLEM)
;
;    SIN OF 45 DEG.
;
SIN45    DB    $3F,$78,$53,$98,$16,$34
;

 

12 hours ago, luckybuck said:

Do you know a 'POLYNOMIAL EVALUATION ROUTINE' by name using these COEFFICIENTS?

I do not.

 

Here is the description of LOG and LOG10 in the Atari OS Manual page 116:

 

Floating point logarithms (LOG & LOG10)

Function: These routines take the natural or base 10 logarithms of a floating point number.

Calling sequence:

FR0 = floating point number.
JSR LOG [DECD] for natural logarithm
or
JSR LOG10 [DED1] for base 10 logarithm
BCS negative number or overflow.
FR0 = floating point logarithm.
FR1 is altered.

Algorithm: Both logarithms are first computed as base 10 logarithms using a 10 term polynomial approximation; the natural logarithm is computed by dividing the base 10 result by the constant LOG10(e).

The logarithm of a number Z is computed as follows:

F * (10 ** Y) = Z where 1 <= F < 10 (normalization).
L = LOG10(F) by 10 term polynomial approximation.
LOG10(Z) = Y + L.
LOG(Z) = LOG10(Z) / LOG10(e).
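
The four steps quoted from the manual can be sketched in a few lines of Python. This is only an illustration of the normalization idea, not the ROM code: the function names are made up for this example, binary floats are used instead of the Atari BCD format, and `math.log10` stands in for the 10 term polynomial.

```python
import math

def log10_by_normalization(z: float) -> float:
    """LOG10(Z) = Y + LOG10(F), with Z = F * 10**Y and 1 <= F < 10."""
    if z <= 0.0:
        # The ROM routine reports this case through the carry flag instead.
        raise ValueError("logarithm needs a positive argument")
    y = 0
    f = z
    while f >= 10.0:   # pull out positive powers of ten
        f /= 10.0
        y += 1
    while f < 1.0:     # pull out negative powers of ten
        f *= 10.0
        y -= 1
    l = math.log10(f)  # stand-in for the polynomial approximation of LOG10(F)
    return y + l

def ln_by_log10(z: float) -> float:
    """LOG(Z) = LOG10(Z) / LOG10(e), as in the manual's last step."""
    return log10_by_normalization(z) / math.log10(math.e)
```

For example, `log10_by_normalization(1000.0)` splits the argument into F = 1 and Y = 3, so only the small range 1 <= F < 10 ever has to be approximated.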

 

Also, on page 117 is a description of the polynomial evaluation routine:

https://archive.org/details/atari-phc-os-jan-1982/page/117/mode/1up

 

The sin/cos routines are not described there as they are in the BASIC ROM. However the constant in your source code labeled SIN45 is not sin(45°), rather it is π/4, which is 45° in radians.
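
The SIN45 observation can be checked by decoding the six bytes by hand. Below is a minimal Python sketch, assuming the usual Atari 6-byte BCD layout (first byte: sign bit plus excess-64 exponent of 100; then five BCD bytes with the decimal point after the first one), which agrees with the value comments in the LOGPLY table. The function name is hypothetical.

```python
from decimal import Decimal

def decode_atari_fp(raw: bytes) -> Decimal:
    """Decode one 6-byte Atari BCD float (assumed layout, see lead-in)."""
    if len(raw) != 6:
        raise ValueError("an Atari float is 6 bytes long")
    if raw[0] == 0:
        return Decimal(0)                      # zero is stored with a zero first byte
    sign = -1 if raw[0] & 0x80 else 1          # bit 7 is the sign
    exponent = (raw[0] & 0x7F) - 64            # power of 100, excess 64
    digits = "".join(f"{b:02X}" for b in raw[1:])
    if not digits.isdigit():
        raise ValueError("mantissa bytes must be packed BCD")
    mantissa = Decimal(digits) / Decimal(10) ** 8   # point after the first BCD byte
    return sign * mantissa * Decimal(100) ** exponent

SIN45 = bytes([0x3F, 0x78, 0x53, 0x98, 0x16, 0x34])
value = decode_atari_fp(SIN45)   # 0.7853981634, i.e. pi/4, not sin(45 deg) = 0.7071...
```

The same function reproduces the commented values in the listing, e.g. the first LOGPLY entry decodes to -8.19080045.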

 

BTW, which V2 rocket movie did you watch?

Edited by ClausB


Hmmm, he is from Germany, so maybe he is watching some German (Wernher von Braun) V2 rocket movie? If I interpret things correctly, V2 was the same as A4 (Aggregat 4)...

 

  • Like 1


@ClausB: Thanks for your reply. Yes, I know it; Nezgar uploaded this last year. There are still undisclosed papers about the LOG problem. Besides this, two years earlier(!), Carol had the FP routines running in a marvelous way. Atari did not take them. She used them in Calculator in 1979, which was published by Atari in 1981, as far as I remember.

https://atariwiki.org/wiki/Wiki.jsp?page=Atari Calculator

Besides this, we have the source code chapter on the Wiki:

https://atariwiki.org/wiki/Wiki.jsp?page=Articles#section-Articles-SourceCode

please scroll down to OS and there point 5.

Further: Atari_Basic_Reference_Manual-Product_Update-C061038_Rev._A-©_1982_Atari,_Inc.pdf  ; please go to page 5 inside the pdf file, there, the first 2 topics. The PDF file is attached.

From the above link, at point 3, we have published the source code of the Colleen OS, please see attached as ASM file. Inside we find:

'

;       FLOATING POINT SUBROUTINES
;
FPREC   =       6           ;FLOATING PT PRECISION (# OF BYTES)
; IF CARRY USED THEN CARRY CLEAR => NO ERROR, CARR
AFP     =       $D800       ;ASCII->FLOATING POINT (FP)
;                               INBUFF+CIX -> FR0, CIX, CARRY
FASC    =       $D8E6       ;FP -> ASCII FR0-> LBUFF (INBUFF)
IFP     =       $D9AA       ;INTEGER -> FP
;                               0-$FFFF (LSB,MSB) IN FR0,FR0+1->FR0
FPI     =       $D9D2       ;FP -> INTEGER FR0 -> FR0,FR0+1, CARRY
FSUB    =       $DA60       ;FR0 <- FR0 - FR1 ,CARRY
FADD    =       $DA66       ;FR0 <- FR0 + FR1 ,CARRY
FMUL    =       $DADB       ;FR0 <- FR0 * FR1 ,CARRY
FDIV    =       $DB28       ;FR0 <- FR0 / FR1 ,CARRY
FLD0R   =       $DD89       ;FLOATING LOAD REG0   FR0  <- (X,Y)
FLD0P   =       $DD8D       ;   "      "    "     FR0  <- (FLPTR)
FLD1R   =       $DD98       ;   "      "   REG1   FR1  <- (X,Y)
FLD1P   =       $DD9C       ;   "      "    "     FR1  <- (FLPTR)
FSTOR   =       $DDA7       ;FLOATING STORE REG0 (X,Y) <- FR0
FSTOP   =       $DDAB       ;    "     "    " (FLPTR)  <- FR0
FMOVE   =       $DDB6       ;FR1 <- FR0
PLYEVL  =       $DD40       ;FR0 <- P(Z) = SUM(I=N TO 0) (A(I)*Z**I) CAR
;                           INPUT: (X,Y) = A(N),A(N-1)...A(0)  -> PLYARG
;                                  ACC   = # OF COEFFICIENTS = DEGREE+1
;                                  FR0   = Z
EXP     =       $DDC0       ;FR0 <- E**FR0 = EXP10(FR0 * LOG10(E)) CARRY
EXP10   =       $DDCC       ;FR0 <- 10**FR0 CARRY
LOG     =       $DECD       ;FR0 <- LN(FR0) = LOG10(FR0)/LOG10(E) CARRY
LOG10   =       $DED1       ;FR0 <- LOG10 (FR0) CARRY
; THE FOLLOWING ARE IN BASIC CARTRIDGE:
SIN     =       $BDB1       ;FR0 <- SIN(FR0) DEGFLG=0 =>RADS, 6=>DEG. CA
COS     =       $BD73       ;FR0 <- COS(FR0) CARRY
ATAN    =       $BE43       ;FR0 <- ATAN(FR0) CARRY
SQR     =       $BEB1       ;FR0 <- SQUAREROOT(FR0) CARRY

'

Therefore, the routines for SIN, COS, ATAN and SQR are in BASIC, but the rest remain in the OS, not in the BASIC ROM, as you can see in the BASIC source code, which is published on the Wiki, too.

That was the trick in those times, when a 'normal' BASIC took 10 K: it was divided into 2 x 4 K ROMs, and 2 K were put in the OS. Those 2 K can be used by other programs, of course.
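
The PLYEVL entry in the listing above takes the coefficients in the order A(N), A(N-1) ... A(0) plus their count (degree + 1). A minimal Python sketch of such an evaluation, assuming Horner's rule (a natural fit for coefficients stored highest order first) and using binary floats instead of the BCD format; the function name mirrors the label but is otherwise made up for this example:

```python
def plyevl(coeffs, z):
    """Evaluate P(z) = sum(A(i) * z**i) with coefficients given as A(N)..A(0)."""
    acc = 0.0
    for a in coeffs:        # highest order coefficient first
        acc = acc * z + a   # one multiply and one add per coefficient
    return acc

# 2*z**2 + 3*z + 4 at z = 5
result = plyevl([2.0, 3.0, 4.0], 5.0)
```

With this scheme an N-th degree polynomial costs N multiplications and N additions, which is why the speed of FMUL and FADD dominates LOG and EXP.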

 

It is not my source code, it is all from Charles. We just got the green light to publish it. As written, I only took a short look at it and discovered many constants. What they are used for, I do not know yet; that takes a deeper investigation, for which I sadly have no time now. But later I surely will. I plan to do Calculator 2.0 once the preservation is all done. For this purpose I need the above...

 

In the meantime, we have the IEEE and some smart routines used for the WP 34S calculator. 🙂

 

Sadly, due to lack of time, I could not manage to watch:

https://www.youtube.com/results?search_query=A4-V2+Rocket+in+detail-Turbopump

here the 1st part with 1 h and 51 min.

 

But will do in the future, promised.

 

@CharlieChaplin: Yes, in Germany, V2 and A4 are both well known; outside Germany, mostly V2, so for a better understanding I leave A4 out. Wernher named it 'Aggregat 4' (English: aggregate), but Goebbels renamed it V2 for propaganda purposes. Besides the V1, there was also the V3, which most people did not know of until the attempted revival in Iraq. There are rumors about a V4 to V6, but no hard evidence has been officially presented yet. Plans for a missile/carrier plane against New York were real.

Atari_Basic_Reference_Manual-Product_Update-C061038_Rev._A-©_1982_Atari,_Inc.pdf Atari_800_OS_Rev.B.asm

Edited by luckybuck
forgot something
  • Like 2


Thanks for the links. All good stuff!

 

I have seen that V2/A4 video. Very detailed and informative. Enjoy!


I did my master's in... Wernher's PhD thesis is still a PhD thesis today! This man was so far ahead of his time...

 

The documentation is sure of interest, but my focus is towards field propulsions, they are way smarter, if running...


The MathPack log10 algorithm is not so complicated, really. First of all, log10(a*10^x) = x + log10(a), so the decimal exponent can be split off. Then, with x normalized, log10(x) is approximated by log10(x) = 1/2 + z*p(z^2), where z = (x-a)/(x+a), a = sqrt(10), and p is a suitable polynomial. If I recall correctly, Atari just uses the Taylor approximation as the polynomial, but that is definitely a bad choice. For Os++, I replaced it with the minimax polynomial, i.e. the polynomial that minimizes the maximal error.

Atari also uses a 10th order polynomial, which is total overkill. The 8th order minimax polynomial is better, and faster.

The minimax polynomial is just this: (from my sources)

 

;;; The following is the minimax polynomial for the log approximation:
;;; 0.8685889625 + (0.2895298827
;;; + (0.1737063251 + (0.1243413535 + (0.09348240142 + (0.09879885753
;;; + (-0.004411453333 + 0.1806407195
;;; x) x) x) x) x) x) x
;;;
;;; It causes an approximation error that is as small as 4^-11.
;;;
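
For anyone who wants to try the coefficients, here is a small Python sketch. It rests on an assumption that is not spelled out in the comment above: that the polynomial is evaluated in z^2 with z = (x - sqrt(10)) / (x + sqrt(10)) and that the result is 0.5 + z * p(z^2). That reading is consistent with the leading coefficient 0.8685889625 being 2/ln(10). Names are made up for this example, and binary floats replace the BCD arithmetic.

```python
import math

# Coefficients from the comment above, constant term first.
MINIMAX = (0.8685889625, 0.2895298827, 0.1737063251, 0.1243413535,
           0.09348240142, 0.09879885753, -0.004411453333, 0.1806407195)

SQRT10 = math.sqrt(10.0)

def log10_mantissa(x: float) -> float:
    """Approximate log10(x) for a normalized argument 1 <= x < 10."""
    if not 1.0 <= x < 10.0:
        raise ValueError("argument must be normalized to 1 <= x < 10")
    z = (x - SQRT10) / (x + SQRT10)
    t = z * z
    p = 0.0
    for c in reversed(MINIMAX):   # Horner's rule, highest order first
        p = p * t + c
    return 0.5 + z * p            # assumed final step, see lead-in

# Largest deviation from the library logarithm on a few sample points
worst = max(abs(log10_mantissa(x) - math.log10(x))
            for x in (1.0, 2.0, 3.16, 5.0, 9.99))
```

Combined with the exponent split from the first step, this covers the whole positive range.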
 

 

  • Like 2
  • Thanks 1

21 hours ago, drac030 said:

Yes, it is identical (save 15 filler bytes which may be different). I already complained about it a few days ago to @luckybuck, as this new rev. F is not really new; it was just going around as "rev. E" because two lines of comment were missing at the beginning. But apart from that, it is identical to the stuff which has been available, for example, on my website since about 2008.

 

It is good however that a more complete source has finally emerged.

Correct.

 

The only difference is in the first 6 bytes (I finally got a chance to binary-compare them).
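
Such a binary comparison is easy to repeat. A minimal Python sketch, assuming both images are raw dumps of the same length mapped at $D800; the helper names and the Rev. F file name in the usage comment are hypothetical:

```python
FP_ROM_BASE = 0xD800  # start of the floating point ROM area mentioned in this thread

def differing_offsets(a: bytes, b: bytes) -> list:
    """Return the offsets at which two equally long ROM images differ."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def as_addresses(offsets, base=FP_ROM_BASE):
    """Format image offsets as Atari-style hex addresses."""
    return [f"${base + i:04X}" for i in offsets]

# Usage (second file name is a placeholder for your own Rev. F dump):
#   from pathlib import Path
#   old = Path("FPP-NEWELL-FastFP-1984.bin").read_bytes()
#   new = Path("fastfp-rev-f.bin").read_bytes()
#   print(as_addresses(differing_offsets(old, new)))
```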

 

Tested on the FP-Index bench (FPTEST34.BAS), posted above, and got x2.90 [Atari Basic-C + Rev.F FP]. In comparison, [Atari Basic-C + Altirra FP] reaches x5.50 (the latter being a notch slower, but MUCH more precise, hence the clearly higher relative index).

 

 

Edited by Faicuai
  • Thanks 1


Thank you sooo much, Thor. I must deeply apologize: I have not yet finished your Basic++ and OS++ on the Wiki. It is on my list and I will do it in the future. Thank you, your contribution is highly appreciated! 🙂
