
Fast Math ROM


ClausB


I think it was called "Newell FastROM".

 

It didn't use binary maths. It was just written better than the Atari FP routines. Rumour has it that the guy who did the FP routines for Atari was inexperienced and just adapted some generic routines that were available at the time.

 

ed - The Omnimon OS also includes them. You should be able to rip the relevant 2K and create a modified normal OS. I think that the checksums would have to be adjusted though.

Edited by Rybags

It used to be available at the Umich archive; if that doesn't work out, I know I have a copy somewhere. The problem is that my Atari stuff is mostly still in storage, so no guarantees as to when I can find it.

 

It's a drop-in replacement within the OS; what I did was take the XL OS, replace the FP routines, add John Harris' Hyper E:, and add a routine from Antic so CTRL-SHIFT-P would dump a GR 0 screen to P:, then burned it into a 27128 (losing the International charset in the process, but still having a few hundred free bytes to play around with). Again, once I dig out my disks I'll upload...


It does seem to have the fast routines, although the emu gives a bad checksum error on the ROM.

 

To see if a given ROM has the FP routines, just use this program:

10 POKE 20,0:POKE 19,0:POKE 20,0
20 FOR A=1 TO 10000:NEXT A
30 T=PEEK(20):T1=PEEK(19)
40 ? (T1*256+T)/50; "seconds"

 

Change the "50" to "60" if you're on an NTSC machine. A fast ROM should give about 18.3 seconds; a standard one, about 21.5.
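For anyone reading the listing without an Atari to hand, here is a minimal Python sketch of the arithmetic in line 40 of the program. It assumes that locations 19 and 20 hold the two low bytes of the frame counter, as the program implies; the function name is hypothetical.

```python
def jiffies_to_seconds(peek19, peek20, frames_per_second=50):
    """Convert the two counter bytes read by the BASIC program into seconds.

    peek19 is the high byte and peek20 the low byte of the frame count;
    frames_per_second is 50 on PAL and 60 on NTSC, as noted in the post.
    """
    jiffies = peek19 * 256 + peek20
    return jiffies / frames_per_second
```

For example, a PAL reading of 3 in location 19 and 147 in location 20 is 915 frames, which is the 18.3 seconds quoted for a fast ROM.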


I remember there used to be a replacement ROM from a third party for the 400/800 Floating Point ROM. Anyone know the name? How did it work? Did it just replace the BCD math routines with binary ones? How much faster was it? Anyone have a copy now?

 

Yes, for the 400/800 there was the Newell Fastchip. It was just a replacement EPROM for the math chip, as I recall. I had one for my 800. You currently have an 800, don't you? As others have indicated, this version, as well as one by Charles Marslett (MyDos), was included in several later replacement OSes.

 

Heavy math programs could be as much as 15-20% faster, and even (non-math) BASIC programs were slightly faster. There is a product blurb here:

http://www.atarimagazines.com/v1n2/newproducts.html

 

-Larry

Edited by Larry

As others have indicated, this version as well as one by Charles Marslett (MyDos) was included in several later replacement OS.

 

Installing the Charles Marslett FP routines in ROM makes, IIRC, Turbo BASIC XL 1.5 work incorrectly. To be precise, math functions like LOG, CLOG and raise to power start to fail. It seems then, that Turbo BASIC XL uses some ROM code :)


I have looked at the code for the stock BCD multiply routine and it actually uses repeated additions of the multiplicand for each digit in the multiplier. It runs much slower when the multiplier has a lot of 9s than when it has smaller digits (like an old rotary phone)!
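To make the "lots of 9s" effect concrete, here is a small Python model of multiplying by repeated addition, one multiplier digit at a time. It is only a sketch on ordinary integers, not the ROM code (which works on BCD bytes), and the function name is made up for illustration.

```python
def multiply_by_repeated_addition(multiplicand, multiplier):
    """Return (product, additions) for a digit-by-digit repeated-add multiply."""
    product = 0
    additions = 0
    shift = 1  # decimal weight of the current multiplier digit
    while multiplier > 0:
        digit = multiplier % 10
        # Add the shifted multiplicand once per unit of this digit.
        for _ in range(digit):
            product += multiplicand * shift
            additions += 1
        multiplier //= 10
        shift *= 10
    return product, additions
```

In this model the number of additions equals the sum of the multiplier's digits, so a multiplier of 999 costs 27 additions while 111 costs only 3.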

 

These old posts revealed a great idea from some C64 programmers (they've had a few). It uses a table of squares:

http://www.atariage.com/forums/index.php?s...25&start=25

 

I've worked on a 6502 BCD version and a Z80 version and found a way to cut the table size in half. I estimate that it could do a 10-digit BCD multiply about 5 times faster than stock. The trig functions and other higher maths use the stock multiplier, so they would speed up as well. But with the tables, it won't fit in the same ROM space. I'm considering writing a patch for the XL OS that uses some of the $CC00-$CFFF area. Is there any interest in my doing this?
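For readers who haven't seen the trick, one common table-of-squares identity is a*b = floor((a+b)^2/4) - floor((a-b)^2/4). The Python sketch below assumes that form and treats each BCD byte as a base-100 digit; whether the linked posts or my 6502 code use exactly this layout is not spelled out here, and the half-size table is not shown. All names are hypothetical.

```python
# Table of floor(n*n/4) for n = 0..198, enough for two operands in 0..99.
QUARTER_SQUARES = [n * n // 4 for n in range(199)]


def mul_byte(a, b):
    """Multiply two base-100 digits (0..99) with two lookups and one subtraction."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


def mul_mantissas(x, y):
    """Multiply two numbers given as lists of base-100 digits, most significant first."""
    result = [0] * (len(x) + len(y))  # least significant first while accumulating
    for i, xd in enumerate(reversed(x)):
        for j, yd in enumerate(reversed(y)):
            result[i + j] += mul_byte(xd, yd)
    # Propagate carries so every position is a valid base-100 digit again.
    carry = 0
    for k in range(len(result)):
        total = result[k] + carry
        result[k] = total % 100
        carry = total // 100
    return result[::-1]
```

The point of the design is that the inner step needs no loop at all: the cost per byte pair is fixed, regardless of how large the digits are.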

Edited by ClausB

I'm considering writing a patch for the XL OS that uses some of the $C000-$C3FF area. Is there any interest in my doing this?

 

I think that writing that as a set of routines and releasing the source would be more useful, actually. I don't see what you could possibly throw out of the $C000-$CFFF area (except the CHARSET2, perhaps) without losing much more interesting functionality of the ROM (the relocatable loader, for example, and the PBI routines).


I'd be interested.

 

It could come in useful as a patch for a RAM-based OS, and people with a 32in1 OS could easily accommodate it.

 

Just clobber the $CC00-CFFF area - the second chset was a waste of ROM (I still stand by my opinion that they would have been 1000x better off using that and the 2K self-test ROM's area to incorporate a built in DOS).


I still stand by my opinion that they would have been 1000x better off using that and the 2K self-test ROM's area to incorporate a built in DOS.

 

I think that a built-in DOS, probably (due to limited space) as primitive as DOS 2.5 or worse, would have been a disaster, and it's good that they didn't do that. Now there's at least some real choice.


A better idea is to use the $Cxxx area (as it's unused on the 400/800 as well as the XL/XE) for extension routines for the 65816 and the terium/terbium processors... or just a built-in MLM (machine language monitor, like the Apple II and CBM PET have) so that you don't need QMEG or Omni/Ultimon etc.

 

 

 

 



 

 

These routines would only benefit machines with these new processors. I would vote for inclusion of Sweet16 (http://atariwiki.strotmann.de/xwiki/bin/view/APG/Sweet16), which would fit nicely, would provide a virtual 16-bit machine, and would allow some ML programs to shrink considerably. The catch is that today no coder can be sure that Sweet16 is available in ROM, so it has to be linked into each ML program that uses it.

 

If I remember correctly, Sweet16 fits in 2 pages (512 bytes).

 

Carsten


I think some people are missing the point here.

 

ClausB is offering an optimisation of an existing routine - it has absolutely nothing to do with 16-bit processors, or any other sort of upgrade.

 

In fact, if he can keep such a patch within a 1K size, it could be used via a RAM-based copy of the OS.



That's right. It would patch the standard FPROM multiply routine. If I'm careful, it could also patch the improved FP packages.

 

I'm sure it would fit into 1K. The BCD table of squares would occupy 640 bytes and the code maybe 200-300. The half-table version would run about 10% slower but would save 320 bytes.


I've coded two versions:

1. In-line version fits into 1K and takes 2200 cycles worst-case.

2. Looped version fits into 0.5K but takes 2900 cycles worst-case.

Both run faster when there are bytes of zero in the mantissas.

They are coded and assembled on a PC and not yet tested on an Atari.


Well, it's debugged and tested on my 800XL. The worst bug turned out to be in the 6502 itself! Apparently the Z flag does not work correctly in BCD mode. I didn't know that (or I forgot it) until I traced the code and searched the web. A minor code change avoided that.
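For anyone who hasn't run into this, here is a rough Python model of ADC in decimal mode on an NMOS 6502, based on the commonly documented behaviour that Z is taken from the plain binary sum rather than from the BCD-corrected result. It only handles valid BCD operands, and the function name is hypothetical; it is not the code change I made.

```python
def adc_decimal_nmos(a, m, carry_in):
    """Model decimal-mode ADC on an NMOS 6502 for valid BCD operands.

    Returns (result, carry_out, z_flag).
    """
    # Z reflects the binary sum, which is why it misleads in BCD mode.
    z_flag = ((a + m + carry_in) & 0xFF) == 0

    # Add the low nibbles and apply the decimal correction.
    low = (a & 0x0F) + (m & 0x0F) + carry_in
    if low >= 0x0A:
        low = ((low + 0x06) & 0x0F) + 0x10
    # Add the high nibbles and apply the decimal correction.
    total = (a & 0xF0) + (m & 0xF0) + low
    if total >= 0xA0:
        total += 0x60
    return total & 0xFF, int(total >= 0x100), z_flag
```

Adding $99 and $01 in this model gives a result of $00 with carry set, yet Z stays clear, because the binary sum is $9A. Code that branches on Z right after a BCD add can therefore go wrong unless it tests the result explicitly.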

 

In testing the accuracy of trig functions, I found another dirty little secret. When a product overflows 5 bytes (10 BCD digits), my code rounds up if the discarded byte is 50 or more, as it should. Apparently, the Atari multiplier just discards without rounding, giving slightly worse average accuracy. However, the trig functions give slightly better accuracy using the Atari multiplier than when using mine. Could they have tweaked the trig functions to compensate for their multiplier?
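The difference between the two behaviours can be sketched in Python as follows. This is only an illustration of the rule described above, with the mantissa held as a list of base-100 digits (one per BCD byte); the function name and the overflow flag are assumptions made for the example.

```python
def keep_five_bytes(product_bytes, round_up=True):
    """Cut a product down to 5 base-100 digits, most significant first.

    product_bytes must have at least 6 entries. With round_up=False the
    sixth byte is simply discarded; with round_up=True the kept mantissa
    is incremented when that byte is 50 or more.
    Returns (kept_digits, overflow), where overflow means the increment
    carried out of the top digit and the exponent would need adjusting.
    """
    kept = list(product_bytes[:5])
    overflow = False
    if round_up and product_bytes[5] >= 50:
        i = 4
        while i >= 0:
            kept[i] += 1
            if kept[i] < 100:
                break
            kept[i] = 0  # this digit wrapped, carry into the next one
            i -= 1
        overflow = i < 0
    return kept, overflow
```

With the digits 12 34 56 78 90 and a discarded byte of 50, rounding gives 12 34 56 78 91, while plain truncation leaves 12 34 56 78 90.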

 

Anyway, here are measured average multiplication times in milliseconds for numbers with so many significant digits:

 

Digits      2      4      6      8     10
Atari     3.1    4.6    6.1    7.7    9.2
Mine     0.98    1.3    1.7    2.2    2.6

 

So it's about 3.5 times as fast.
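The per-column ratios behind that figure can be recomputed from the measured times with a few lines of Python. The dictionary names are just labels for the two rows of the table.

```python
# Measured average multiplication times in milliseconds, keyed by digit count.
atari_ms = {2: 3.1, 4: 4.6, 6: 6.1, 8: 7.7, 10: 9.2}
mine_ms = {2: 0.98, 4: 1.3, 6: 1.7, 8: 2.2, 10: 2.6}

# Speedup for each digit count, and the mean over all five columns.
speedup = {digits: atari_ms[digits] / mine_ms[digits] for digits in atari_ms}
mean_speedup = sum(speedup.values()) / len(speedup)
```

Every column comes out between roughly 3.2 and 3.6, so the gain is fairly even across operand lengths.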

 

Here are executable .OBJ files in a zipped .ATR: ROS.OBJ copies the 800XL OS ROM to RAM and leaves the OS RAM switched on. FAFMUL.OBJ loads the fast multiplier into pages $CE and $CF and patches the OS. After loading both, use BASIC normally and give it a try. I'll post the code and benchmark programs soon.

FAFMUL.zip

Edited by ClausB

You could try benchmarking it against Turbobasic to see if it is faster than that.

 

I did a quick test on all three, doing a simple multiply of two decimal numbers, repeated 500 times. The OS did it in about 3.7 seconds. Your routines are slightly faster at about 3.15 seconds. Turbobasic did it in 1.5 seconds, but Turbobasic probably has other optimizations unrelated to multiplication that contribute to the faster result. FOR-NEXT loops, line number processing, variable access, etc. are probably handled differently than in Atari BASIC.

Edited by peteym5


Yes, my benchmark program subtracted out the execution time of the BASIC code, which, as you saw, can be much slower than the multiplies. The more multiplying a program does, the better the improvement should be. I wrote an astronomy program long ago which uses lots of trig and multiplication, so I'll try that out.
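The overhead subtraction amounts to timing the same loop with and without the multiply and dividing the difference by the loop count. A minimal Python sketch of that calculation follows; the function name and the numbers in the usage note are hypothetical, not measurements from my benchmark.

```python
def per_multiply_ms(loop_with_multiply_s, loop_without_multiply_s, iterations):
    """Estimate the time of one multiply, in milliseconds.

    Subtracts the time of an otherwise identical loop that does no multiply,
    so that interpreter overhead (FOR-NEXT, variable lookup) cancels out.
    """
    multiply_only_s = loop_with_multiply_s - loop_without_multiply_s
    return multiply_only_s * 1000.0 / iterations
```

For instance, if a 1000-pass loop took 5.0 seconds with the multiply and 2.0 seconds without it, the multiply itself would account for 3.0 ms per pass.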

 

Does Turbobasic use the OS floating point package or its own?


Just gave it a quick go.

This program runs a good deal faster (about 10.4 seconds vs 18.7) with the modified routine vs the standard one.

For comparison, I tried the Omniview OS, which supposedly has the Newell FastROM embedded in it. The time there was 12.64 seconds.

 

10 POKE 20,0:POKE 19,0:POKE 20,0
20 FOR A=1 TO 1000
30 B=A*2.3456789:C=B*9.876
40 NEXT A
50 T=PEEK(20)+PEEK(19)*256
60 ? T/50;" Seconds"

 

Here's an OS image that can be used in the emulator, based on the "REV02" XL/XE one (note that the emulator will give the checksum error message).

 

 

OS2FMUL.zip

Edited by Rybags
