Ahl's Benchmark?

carlsson · December 19, 2013

For those who care, I reran Ahl's benchmark on my Laser 2001 using A**2 (strange that I didn't figure it out six years ago) and got a result of 70.6 seconds with an accuracy of 1.04E-03, or 0.00104 if you like. That brings it down to a position just about the same as the CoCo 3, but still quite decent.

I haven't yet found the time to rerun it on either of my BBC micros, might do later on.

In the meantime, I ran the Byte magazine benchmark suite from 1977 with the supplementary benchmark from Creative Computing 1983. This is different from Ahl's benchmark, but gives a wider picture. It puts the Laser 2001 among the faster 8-bit BASIC computers: slower than the BBC Micro, Acorn Atom and Oric Telestrat (which has a compiling BASIC), but faster than the VIC-20, original Apple ][, Dragon 32 and the rest. The same suite also illustrated how poor the results from CreatiVision BASIC are. As the two machines in practice have the same hardware, except that the 2001 has 32K RAM accessible by the CPU, either the VDP access makes a lot of difference or VTech, in the 2.5 to 3 years between the CreatiVision and the Laser 2001, actually managed to implement or rip off a better BASIC. JamesD previously pointed out that the Laser 2001 BASIC syntax to a great extent resembles Applesoft BASIC, but I don't know for sure. Indeed, VTech first released the Laser 3000 (Dick Smith CAT) and later the Laser 128, which was a true Apple clone, so it is not unthinkable that all three have common elements in their respective 6502-based BASICs.

Edited December 19, 2013 by carlsson

JamesD · December 20, 2013

For those who care, I reran Ahl's benchmark on my Laser 2001 using A**2 (strange that I didn't figure it out six years ago) and got a result of 70.6 seconds with an accuracy of 1.04E-03, or 0.00104 if you like. That brings it down to a position just about the same as the CoCo 3, but still quite decent.

That sounds more like what I would expect from a Microsoft BASIC given the CPU speed and differences between the 6809 and 6502.

In the meantime, I ran the Byte magazine benchmark suite from 1977 with the supplementary benchmark from Creative Computing 1983. This is different from Ahl's benchmark, but gives a wider picture. It puts the Laser 2001 among the faster 8-bit BASIC computers: slower than the BBC Micro, Acorn Atom and Oric Telestrat (which has a compiling BASIC), but faster than the VIC-20, original Apple ][, Dragon 32 and the rest. The same suite also illustrated how poor the results from CreatiVision BASIC are. As the two machines in practice have the same hardware, except that the 2001 has 32K RAM accessible by the CPU, either the VDP access makes a lot of difference or VTech, in the 2.5 to 3 years between the CreatiVision and the Laser 2001, actually managed to implement or rip off a better BASIC. JamesD previously pointed out that the Laser 2001 BASIC syntax to a great extent resembles Applesoft BASIC, but I don't know for sure. Indeed, VTech first released the Laser 3000 (Dick Smith CAT) and later the Laser 128, which was a true Apple clone, so it is not unthinkable that all three have common elements in their respective 6502-based BASICs.

I'm not sure where VTech got CreatiVision BASIC but it doesn't look like a Microsoft BASIC to me. VTech wasn't exactly shy about ripping off TRS-80 Level II BASIC so I wouldn't rule out it being Microsoft. If it is, it would probably have the Microsoft Easter Egg.

http://www.pagetable.com/?p=43

Perhaps it grew out of Tiny BASIC. Tiny BASIC is actually more primitive but CreatiVision's almost looks like a mix of old and new approaches to an interpreter.

The Laser 2001 brochure says Microsoft BASIC so it must be licensed. I'm pretty sure the Laser 128's is licensed but I'm not sure about the Laser 3000.

JamesD · March 12, 2017

There are a few things I've run across since this thread was last visited.

The Samsung SPC-1000 supposedly uses a BASIC from Hudson Soft.
Some of the Z80 BASICs of unknown origin may be based on some version of that.

I've been looking at the Tandy MC-10 ROM disassembly, and Microsoft used their 6800 BASIC with little change.
6803 optimizations should let the MC-10 overtake the Apple II, C64, etc... in Ahl's benchmark, and the space savings might have allowed the addition of the ELSE statement.

ELSE would have made MC-10 BASIC almost fully compatible with CoCo Color BASIC, and would give the MC-10 an advantage over every 6502 BASIC made prior to the Oric (1983).

I can only imagine how much better the 6809 version could have been. The CoCo benchmarked slower than the MC-10.

JamesD · March 12, 2017

...

I've been looking at the Tandy MC-10 ROM disassembly, and Microsoft used their 6800 BASIC with little change.

6803 optimizations should let the MC-10 overtake the Apple II, C64, etc... in Ahl's benchmark, and the space savings might have allowed the addition of the ELSE statement.

ELSE would have made MC-10 BASIC almost fully compatible with CoCo Color BASIC, and would give the MC-10 an advantage over every 6502 BASIC made prior to the Oric (1983).

I can only imagine how much better the 6809 version could have been. The CoCo benchmarked slower than the MC-10.

I started with the math library. It seems the code dealing with line numbers does use D.

But the math library doesn't, and there are a couple of other places that might benefit from its use.

777ismyname · March 27, 2017

An Atari 800XL using the Altirra emulator (version 2.81), with Altirra's OS, Altirra BASIC and Altirra's replacement FP math pack. It smokes other Atari BASICs. phaeron has done a phenomenal job with Altirra.

ACCURACY 5.317E-03
RANDOM 4.867369
7.95 SECONDS

Someone in a previous thread had wondered how a 14 MHz Atari would run, so I used Altirra's System>CPU Options to change the CPU type and speed.

3.58MHz 65C816

ACCURACY 5.317E-03
RANDOM 27.65767
3.08333333 SECONDS

14.28MHz 65C816

ACCURACY 5.317E-03
RANDOM 8.685467
0.6833333333 SECONDS

21.48MHz 65C816

ACCURACY 5.317E-03
RANDOM 10.311608
0.4666666666 SECONDS

The code is exactly as listed at the Implementation of SieveAhl website (http://www.floodgap.com/retrotech/mac/ahl/imp.html), the only addition being the usual Atari BASIC trick of zeroing and then reading the timer registers at locations 18, 19 and 20 to time the program run.

5 POKE 18,0:POKE 19,0:POKE 20,0
10 REM AHL'S SIMPLE BENCHMARK
20 FOR N=1 TO 100:A=N
30 FOR I=1 TO 10
40 A=SQR(A):R=R+RND(1)
50 NEXT I
60 FOR I=1 TO 10
70 A=A^2:R=R+RND(1)
80 NEXT I
90 S=S+A:NEXT N
100 PRINT "ACCURACY ";ABS(1010-S/5)
110 PRINT "RANDOM ";ABS(1000-R)
120 A=PEEK(18):B=PEEK(19):C=PEEK(20)
130 SC=C+256*B+65536*A
140 SC=SC/60:? SC;" SECONDS"
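
For anyone porting the timing lines to another language, the arithmetic in lines 120-140 can be sketched in Python. This assumes the three bytes at locations 18, 19 and 20 form a 24-bit frame counter with location 20 as the least significant byte, ticking 60 times a second on NTSC (a PAL machine would tick at 50). The function name and the hz parameter are made up for illustration.

```python
def jiffies_to_seconds(hi, mid, lo, hz=60):
    """Convert the three counter bytes (PEEK(18), PEEK(19), PEEK(20)) to seconds."""
    for byte in (hi, mid, lo):
        if not 0 <= byte <= 255:
            raise ValueError("each counter byte must be in 0..255")
    ticks = lo + 256 * mid + 65536 * hi
    return ticks / hz


# 477 ticks at 60 Hz corresponds to the 7.95 SECONDS result above
print(jiffies_to_seconds(0, 1, 221))
```

The odd-looking 0.6833333333 and 0.4666666666 results are simply 41 and 28 ticks divided by 60, so the resolution of this timer is one frame.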

JamesD · March 27, 2017

An Atari 800XL using the Altirra emulator (version 2.81), with Altirra's OS, Altirra BASIC and Altirra's replacement FP math pack. It smokes other Atari BASICs. phaeron has done a phenomenal job with Altirra.

ACCURACY 5.317E-03

RANDOM 4.867369

7.95 SECONDS

...

That puts it up there with fast compiled BASICs.

You sure that's the correct CPU settings?

If so, it shows just how much time is wasted by interpreters searching for line numbers and variables.

+Larry · March 27, 2017

"There's something strange..."

Ahl's benchmark on my Altirra 2.7x version does run at ~8 seconds with the Fast Floating Point setting checked. But on real hardware, Altirra Basic with the fast math pack XL/XE ROM runs at 79 seconds, or approx. 10X slower.

Maybe that "Fast Math Package" on Altirra uses the PC math calculations?

The fastest (Atari) non-compiled Basic for Ahl has been Turbo Basic XL which runs at ~ 41 seconds on real hardware. Basic XE just a tick slower at ~43 sec.

Basic XL which runs just a bit faster than Atari 8K Basic runs at 10.2 seconds on the emulator. If I uncheck the Altirra setting for the Fast Floating Point Math, it runs at 205 seconds. It also runs at 108 seconds on real hardware with a fast math XL/XE rom OS and 396 seconds on real hardware with the stock XL/XE rom.

So somehow that Fast Math Pack setting is doing a whole lot more than we see on real hardware. These all seem to be running at about 10X with the FMP set.
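
The ratios behind that "about 10X" are easy to check. A small Python sketch using only the timings quoted in this post (the dictionary keys and function name are made up for illustration, and the ~8 seconds figure is taken as 8.0):

```python
# Timings in seconds, copied from the post above.
timings = {
    "Altirra BASIC": {"emulator_fast_math": 8.0, "real_hw_fast_math_rom": 79.0},
    "BASIC XL": {"emulator_fast_math": 10.2, "real_hw_fast_math_rom": 108.0},
}


def slowdown(emulated, real):
    """How many times slower the real-hardware run is than the emulator run."""
    return real / emulated


for name, t in timings.items():
    ratio = slowdown(t["emulator_fast_math"], t["real_hw_fast_math_rom"])
    print(f"{name}: {ratio:.1f}x")
```

Both combinations land close to a factor of ten, which fits the idea that the emulator setting removes the cost of the math calls rather than just speeding them up a little.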

Maybe someone who is more experienced at using the emulator can explain this better.

-Larry

JamesD · March 27, 2017

"There's something strange..."

Ahl's benchmark on my Altirra 2.7x version does run at ~8 seconds with the Fast Floating Point setting checked. But on real hardware, Altirra Basic with the fast math pack XL/XE ROM runs at 79 seconds, or approx. 10X slower.

Maybe that "Fast Math Package" on Altirra uses the PC math calculations?

...

79 seconds sounds much more realistic for the MHz, and I think you hit the nail on the head with the math package.

If Atari BASIC is anything even close to what I'm seeing in the MC-10 ROM, speeding up the math library can make a huge difference.

phaeron · March 28, 2017

Ahl's benchmark on my Altirra 2.7x version does run at ~8 seconds with the Fast Floating Point setting checked. But on real hardware, Altirra Basic with the fast math pack XL/XE ROM runs at 79 seconds, or approx. 10X slower.

Maybe that "Fast Math Package" on Altirra uses the PC math calculations?

The Acceleration > Fast Math option in Altirra intercepts all math calls with native math replacements and effectively makes all math pack functions take zero time. The math pack in the built-in AltirraOS does not do this; it is pure 6502 code and will also run faster on the real hardware if you have a way to replace the OS ROM. It will never be as fast as the Fast Math option, though.

thorfdbg · March 28, 2017

Ahl's benchmark on my Altirra 2.7x version does run at ~8 seconds with the Fast Floating Point setting checked. But on real hardware, Altirra Basic with the fast math pack XL/XE ROM runs at 79 seconds, or approx. 10X slower.

Maybe that "Fast Math Package" on Altirra uses the PC math calculations?

The fastest (Atari) non-compiled Basic for Ahl has been Turbo Basic XL which runs at ~ 41 seconds on real hardware. Basic XE just a tick slower at ~43 sec.

The main ingredient to this "miracle" is that the emulator replaces the math code with PC traps, so the math is performed on the PC instead of in the math pack. That has already been explained, and is nothing to scratch your head about.

To complete the picture, Basic++ runs the same code without a math pack patch at around 45 seconds, though the main contribution here is the smarter SQR() and the faster ^ (pow) algorithm which is quite a lot faster than the original algorithm. I wonder why Altirra basic doesn't use it.

You can also put this the other way around: The only reason why it is so slow on Atari basic is not really the relatively simple implementation of FOR/NEXT or the basic interpreter in general, it is rather that both ^ and SQR are dominating the whole timing. Or to put it even differently, the only thing this test measures is the performance of ^ and SQR.
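
To see how much the result depends on how ^ is computed, here is a Python sketch of the benchmark's arithmetic with RND and the timing left out. It assumes, purely as an illustration, that an interpreter evaluates A^2 with a generic power routine of the form EXP(2*LOG(A)); that is a common approach, but nothing here is taken from any particular ROM. Python's doubles are far more precise than an 8-bit math pack, so only the comparison between the two variants means anything, not the absolute numbers. All function names are made up.

```python
import math


def ahl_accuracy(square):
    """Run the SQR and squaring loops of Ahl's benchmark; return ABS(1010-S/5)."""
    s = 0.0
    for n in range(1, 101):
        a = float(n)
        for _ in range(10):
            a = math.sqrt(a)
        for _ in range(10):
            a = square(a)
        s += a
    return abs(1010 - s / 5)


def square_by_multiply(a):
    return a * a


def square_by_exp_log(a):
    # Hypothetical stand-in for a generic power routine: A^2 = EXP(2*LOG(A))
    return math.exp(2.0 * math.log(a))


print("A*A        :", ahl_accuracy(square_by_multiply))
print("EXP(2*LOG) :", ahl_accuracy(square_by_exp_log))
```

The ten squarings magnify whatever rounding error the square roots and the power routine leave behind by roughly a factor of a thousand, which is why the ACCURACY figure says more about those two functions than about the interpreter.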

phaeron · March 29, 2017

To complete the picture, Basic++ runs the same code without a math pack patch at around 45 seconds, though the main contribution here is the smarter SQR() and the faster ^ (pow) algorithm which is quite a lot faster than the original algorithm. I wonder why Altirra basic doesn't use it.

I have deliberately written Altirra BASIC without using source code from any other BASIC interpreters. Basic++ is a derivative of Atari BASIC, so I would not be able to pull from it in this regard. Also, many programs spend more time in control flow than in these particular functions. Running Ahl's Benchmark faster and/or more accurately is nice but I don't consider it a primary goal, since it is a synthetic benchmark.

thorfdbg · March 29, 2017

I have deliberately written Altirra BASIC without using source code from any other BASIC interpreters. Basic++ is a derivative of Atari BASIC, so I would not be able to pull from it in this regard. Also, many programs spend more time in control flow than in these particular functions. Running Ahl's Benchmark faster and/or more accurately is nice but I don't consider it a primary goal, since it is a synthetic benchmark.

I don't think anyone from Atari or SMI would be able to claim copyright on something they have not written, and you certainly have my permission to use these functions.

JamesD · August 3, 2017

...

I've been looking at the Tandy MC-10 ROM disassembly, and Microsoft used their 6800 BASIC with little change.

6803 optimizations should let the MC-10 overtake the Apple II, C64, etc... in Ahl's benchmark, and the space savings might have allowed the addition of the ELSE statement.

ELSE would have made MC-10 BASIC almost fully compatible with CoCo Color BASIC, and would give the MC-10 an advantage over every 6502 BASIC made prior to the Oric (1983).

I can only imagine how much better the 6809 version could have been. The CoCo benchmarked slower than the MC-10.

Just a little followup to this comment.

I added ELSE and made some minor speed improvements to the MC-10 ROM for this year's Retro Challenge.

Not every optimization I worked on made it into the release though.

So now I'm adding the fast multiply that didn't make it in. It uses the 6803 MUL (multiply) instruction instead of the add-shift loop.
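
For readers who haven't looked at this kind of routine, here is a rough Python sketch of the two approaches for multiplying two mantissas. It only illustrates the idea and is not the MC-10 ROM code: the function names are invented, and the 32-bit (4-byte) mantissa size is an assumption. The first version loops once per bit; the second builds the same product from 8x8-bit partial products, which is the kind of result a hardware MUL instruction delivers in one go.

```python
def multiply_shift_add(x, y, bits=32):
    """Classic add-shift loop: one pass per multiplier bit."""
    product = 0
    for i in range(bits):
        if (y >> i) & 1:
            product += x << i
    return product


def mul8(a, b):
    """Stand-in for an 8x8 -> 16 bit hardware multiply (hypothetical helper)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def multiply_bytewise(x, y, nbytes=4):
    """Same product assembled from nbytes*nbytes 8x8 partial products."""
    xb = [(x >> (8 * i)) & 0xFF for i in range(nbytes)]
    yb = [(y >> (8 * j)) & 0xFF for j in range(nbytes)]
    product = 0
    for i, a in enumerate(xb):
        for j, b in enumerate(yb):
            product += mul8(a, b) << (8 * (i + j))
    return product


x, y = 0xC90FDAA2, 0xADF85458
print(multiply_shift_add(x, y) == multiply_bytewise(x, y) == x * y)
```

With these sizes the bit loop runs 32 times while the bytewise version needs 16 multiplies plus the additions to combine them. Whether that wins on real hardware depends on instruction timings, which this sketch doesn't model.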

It still needs some work, but preliminary tests show Ahl's benchmark finishing in around 1 minute 7 seconds.

The Hitachi 6303 should be able to run it in about 46 seconds assuming around a 20% improvement due to the prefetch. :-o

If I had more ROM I could make it even faster.

With another 100-200 bytes of ROM, I could also include a fast SQR function and the 6803 time might drop to under 50 seconds.

Plus there are a couple other optimizations I haven't tried to squeeze in yet.

Not bad for 0.89 MHz.

zzip · August 3, 2017

I remember reading somewhere that Commodore 64 BASIC's integers are actually slower than its floating point, because the integers have to get converted back to FP any time they're used in calculations. Maybe Atari MS BASIC suffers the same problem?

(Being slower doesn't make them useless: integer variables still use less RAM than floating point variables)

Yeah, but my recollection of BASIC was that it doesn't differentiate between integers and floats. Maybe some dialects do, but not the ones I've used. The easiest way for an interpreter to implement that would be to store every number as a float.

Edited August 3, 2017 by zzip

JamesD · August 3, 2017

Yeah, but my recollection of BASIC was that it doesn't differentiate between integers and floats. Maybe some dialects do, but not the ones I've used. The easiest way for an interpreter to implement that would be to store every number as a float.

I think the only 6502 version of Microsoft BASIC that supports integer variables is the Atari version.

JamesD · August 3, 2017

Just a little followup to this comment.

I added ELSE and made some minor speed improvements to the MC-10 ROM for this year's Retro Challenge.

Not every optimization I worked on made it into the release though.

So now I'm adding the fast multiply that didn't make it in. It uses the 6803 MUL (multiply) instruction instead of the add-shift loop.

It still needs some work, but preliminary tests show Ahl's benchmark finishing in around 1 minute 7 seconds.

The Hitachi 6303 should be able to run it in about 46 seconds assuming around a 20% improvement due to the prefetch.

If I had more ROM I could make it even faster.

With another 100-200 bytes of ROM, I could also include a fast SQR function and the 6803 time might drop to under 50 seconds.

Plus there are a couple other optimizations I haven't tried to squeeze in yet.

Not bad for 0.89 MHz.

I looked at the scan of the article and that is an improvement of around 52 seconds.

It also jumps ahead of 34 machines in the article and into the first column of machines on the results page.

Every optimization and enhancement I've made was possible back when the machine was created.

I don't know whether to swear at Microsoft or the programmer.

The CoCo 1/2/3 might also benefit from this since the 6809 has a MUL instruction.

Faicuai · August 3, 2017

Latest update on this famous little thread (UPDATES):

(Implementation NOTES: ANTIC=OFF for maximizing 6502 CPU bandwidth, a=A*A)

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and ATARI BASIC (Rev.C) Interpreted:

Accuracy: 0.013649 (pretty steady)
Random: 11.306536 (varies all over the place)
Time (s): 46.3000
Time (s): 42.9166 (Inner For / Next Loops unrolled)

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and BASIC XE (v4.1p) Interpreted:

Accuracy: 0.013649 (pretty steady)
Random: 14.79776 (varies all over the place)
Time (s): 37.9666
Time (s): 35.5333 (Inner For / Next Loops unrolled)

=> (Altirra 2.90 w/ FP=OFF, MyDos, ALTIRRA ROM), and ALTIRRA BASIC Interpreted:

Accuracy: 0.000452 (WoW! BIG jump in precision !!!)
Random: 2.605347 (varies all over the place)
Time (s): 33.9833
Time (s): 32.7000 (Inner For / Next Loops unrolled)

=> (Altirra 2.90 w/ FP=OFF, MyDOS, XL ROM Rev.2 OEM), and TURBO BASIC (1.5):

Accuracy: 0.013649 (pretty steady)
Random: 2.10417 (varies all over the place)
Time (s): 26.68 (non-compiled)
Time (s): 25.50 (non-compiled, Inner For / Next Loops unrolled)
Time (s): 21.75 (compiled, Inner For / Next Loops unrolled)

=> (Altirra 2.90 w/ FP=OFF, XE ROM patched w/ optimized FP pack), and ALTIRRA BASIC:

Accuracy: 0.014842 (a bit lower precision)
Random: 0.7139 (varies all over the place)
Time (s): 19.7166
Time (s): 18.5800 (Inner For / Next Loops unrolled)

In short:

Compared to original 6'48s, (408s), it is clear how wasteful original ROMs+Atari Basic were, with respect to these types of tasks.
With above timings, no other 6502-based system touches it (not even with Atari Basic!) and it virtually matches non-compiled IBM/PC timings (with help of TBASIC optimizations).
No way to reach IBM/PC's compiled test (6 secs.)
No timing differences in actual HW (800/Incognito, 800XL/Ultimate) with same 800XE-FP_optimized ROM.
Overall resulting precision (on Atari Basic or Turbo Basic) is nothing to brag about, to be honest.
Have NOT tried Altirra Basic on real HW.
In all tests, Altirra's FP-calls intercept has been disabled.

Cheers!

JamesD · August 4, 2017

Well, after some fixes, the time is up to 68 seconds, which is still 51 seconds faster than the original.
The accuracy is also slightly reduced, but it's still better than the 6502 versions.
But now I've introduced a couple of other bugs, so it's still not ready. It shouldn't be slower for these though.

dmsc · August 6, 2017

Hi!

Latest update on this famous little thread:

=> (Altirra 2.90 w/ FP=OFF, SDX, XE ROM patched w/ optimized FP pack), and ATARI BASIC (Rev.C) Interpreted:

Accuracy: 0.013649 (pretty steady)

Random: 11.306536 (varies all over the place)

Time (s): 42.9166

=> (Altirra 2.90 w/ FP=OFF, MyDOS, XL ROM Rev.2 OEM), and TURBO BASIC (1.5):

Accuracy: 0.013649 (pretty steady)

Random: 2.10417 (varies all over the place)

Time (s): 25.50 (non-compiled)

Time (s): 21.75 (compiled)

=> (Altirra 2.90 w/ FP=OFF, XE ROM patched w/ optimized FP pack), and ALTIRRA BASIC:

Accuracy: 0.014842 (a bit lower precision)

Random: 0.7139 (varies all over the place)

Time (s): 18.58

In short:

Compared to original 6'48s, (408s), it is clear how wasteful original ROMs+Atari Basic were, with respect to these types of tasks.

With above timings, no other 6502-based system touches it (not even with Atari Basic!) and it virtually matches non-compiled IBM/PC timings (with help of TBASIC optimizations).

No way to reach IBM/PC's compiled test (6 secs.)

No timing differences in actual HW (800/Incognito, 800XL/Ultimate) with same 800XE-FP_optimized ROM.

Overall resulting precision (on Atari Basic or Turbo Basic) is nothing to brag about, to be honest.

Have NOT tried Altirra Basic on real HW.

In all tests, Altirra's FP-calls intercept has been disabled.

Cheers!

I don't get the same results as you. Are you sure you are using a stock Atari?

I get 41.5 s with TurboBasic XL, 117.6 s with Altirra Basic under Altirra OS, 190.7 s with Atari Basic under Altirra OS, 296.7 s with Altirra Basic under Atari XL OS, and 404.6 s with Atari Basic under Atari XL OS.

All results on NTSC Atari 800XL.

JamesD · August 7, 2017

Here's another little program that could be used as a benchmark.
It was published in a magazine (BYTE?) in the early personal computer days.
It draws a Mandelbrot set using only text. I believe the original version was designed for an 80 column screen.
Some BASICs will have issues with the length of line 110.

10 REM SIMPLE MANDELBROT GENERATOR USING TEXT
20 X0=-2: X1=0.5: Y0=-1: Y1=1: I1=20
30 X2=0.06: Y2=0.2: D$=" .-=#"
40 FOR Y=Y0 TO Y1 STEP Y2
50 FOR X=X0 TO X1 STEP X2
60 Z0=0: Z1=0
70 FOR I=1 TO 11
80 Z2=Z0*Z0-Z1*Z1: Z3=2*Z0*Z1
90 Z0=Z2+X: Z1=Z3+Y: IF Z0*Z0+Z1>4 THEN GOTO 110
100 NEXT I
110 IF Z0 AND Z1 > 0 THEN PA=SQR(Z0*Z0+Z1):A=(PA-(5*INT(PA/5)))+1:C$=MID$(D$,A,1):PRINT C$;
120 NEXT X: PRINT
130 NEXT Y
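
For comparison with other languages, here is a Python sketch of the inner loop (lines 60-100). Note that it uses the conventional escape test Z0*Z0+Z1*Z1>4, which is not character-for-character what line 90 of the listing does, so output and timings won't match the BASIC program exactly. The function name and the return convention are made up for illustration.

```python
def escape_iterations(x, y, max_iter=11):
    """Iterate z = z*z + c for c = x + iy.

    Returns the iteration number at which |z|^2 first exceeds 4,
    or None if that doesn't happen within max_iter iterations.
    """
    z0 = 0.0  # real part
    z1 = 0.0  # imaginary part
    for i in range(1, max_iter + 1):
        z2 = z0 * z0 - z1 * z1
        z3 = 2 * z0 * z1
        z0 = z2 + x
        z1 = z3 + y
        if z0 * z0 + z1 * z1 > 4:
            return i
    return None


print(escape_iterations(0.0, 0.0))
print(escape_iterations(2.0, 2.0))
```

Points that never escape cost the full 11 iterations, so most of the run time goes to the part of the picture inside the set.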

And here is a factorial generator

5 REM FACTORIAL GENERATOR
10 FOR Z=1 TO 100
20 FOR X=0 TO 33
30 GOSUB 80
40 PRINT Z;X;A
50 NEXT X
60 NEXT Z
70 END
80 A=1
90 IF X=0 THEN RETURN
100 FOR C=1 TO X
110 A=A*C
120 NEXT C
130 RETURN
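
The upper limit of 33 in line 20 is presumably there because 33! is about 8.68E+36, the largest factorial that stays below the roughly 1.7E+38 ceiling of the usual Microsoft floating point format; 34! would overflow. A Python sketch of the subroutine at lines 80-130 makes that easy to check. The names and the limit constant are assumptions for illustration.

```python
def factorial_loop(x):
    """Same method as the GOSUB 80 routine: multiply 1..X together."""
    a = 1
    for c in range(1, x + 1):
        a *= c
    return a


MS_BASIC_MAX = 1.70141183e38  # approximate overflow limit, assumed for illustration

for x in (0, 33, 34):
    value = factorial_loop(x)
    verdict = "fits" if value < MS_BASIC_MAX else "overflows"
    print(x, f"{value:.6e}", verdict)
```

A BASIC with a wider exponent range would handle larger values, so the limit in line 20 may need adjusting when comparing machines.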

On another note... the faster multiply in the new MC-10 ROM is actually just as accurate as the original.
One of my optimizations depended on memory order of variables and a variable got moved to the wrong floating point accumulator while trying to fix something else.

Faicuai · August 7, 2017

Hi!

I don't get the same results as you. Are you sure you are using a stock Atari?

I get 41.5 s with TurboBasic XL, 117.6 s with Altirra Basic under Altirra OS, 190.7 s with Atari Basic under Altirra OS, 296.7 s with Altirra Basic under Atari XL OS, and 404.6 s with Atari Basic under Atari XL OS.

All results on NTSC Atari 800XL.

Atari Basic has almost NONE of the Integer / FP optimizations that are obviously present in many of the Basic implementations reported on the original article. It was made to fit into 8 KB of space, not to perform optimally or fast. And this also applies to Atari 800/800XL original ROMs (!!!)

In order to obtain the timings shown above, you will need:

Turn off ANTIC during execution (as it grabs 25% or more of CPU cycles from 6502, which could be devoted to Floating Point computations).
DO NOT run X=X^2. Instead, run x=x*x (neither Atari Basic nor ROM's FP are optimized or smart enough to compute this efficiently).
Unroll two inner For-Next loops (1 to 10), as Atari Basic does not seem to handle For-Next arguments as true integers. About 8% of increased efficiency, here, at the expense of more line-lookups in Atari basic (which are not that fast, anyway).
For results under Atari OS & Atari Basic, use the attached Atari 800XL/XE optimized OS with high-performance FP routines. These run beautifully on A800 / Incognito, too.

AtariOS-800XE-Rev03-FastMath.rom

If anyone wants to perform above adjustments for other (comparable) machines / 6502 / Basic combos, that's fine. At this point, I believe they will have a hard time catching the Atari, anyway.

Cheers!

Faicuai · August 8, 2017

Hi!

I don't get the same results as you. Are you sure you are using a stock Atari?

I get 41.5 s with TurboBasic XL, 117.6 s with Altirra Basic under Altirra OS, 190.7 s with Atari Basic under Altirra OS, 296.7 s with Altirra Basic under Atari XL OS, and 404.6 s with Atari Basic under Atari XL OS.

All results on NTSC Atari 800XL.

Here, for you sir (click on images for maximum clarity):

1. [A800 + Incognito] + [800XL/XE-Rev3-FP Optimized] + [Atari Basic]:

2. [A800 + Incognito] + [800XL/XE-Rev3-FP Optimized] + [Turbo Basic, NON compiled]:

3. [A800 + Incognito] + [Colleen Mode + Newell FP roms] + [Atari Basic]:

Have fun!

JamesD · August 8, 2017

Atari Basic has almost NONE of the Integer / FP optimizations that are obviously present in many of the Basic implementations reported on the original article. It was made to fit into 8 KB of space, not to perform optimally or fast. And this also applies to Atari 800/800XL original ROMs (!!!)

In order to obtain the timings shown above, you will need:

Turn off ANTIC during execution (as it grabs 25% or more of CPU cycles from 6502, which could be devoted to Floating Point computations).

DO NOT run X=X^2. Instead, run x=x*x (neither Atari Basic nor ROM's FP are optimized or smart enough to compute this efficiently).

Unroll two inner For-Next loops (1 to 10), as Atari Basic does not seem to handle For-Next arguments as true integers. About 8% of increased efficiency, here, at the expense of more line-lookups in Atari basic (which are not that fast, anyway).

For results under Atari OS & Atari Basic, use the attached Atari 800XL/XE optimized OS with high-performance FP routines. These run beautifully on A800 / Incognito, too.

AtariOS-800XE-Rev03-FastMath.rom

If anyone wants to perform above adjustments for other (comparable) machines / 6502 / Basic combos, that's fine. At this point, I believe they will have a hard time catching the Atari, anyway.

Cheers!

And here we go...

1. If you turn off the display you can't see what a program is doing. That limits the results to specific tasks, not a general benchmark of the computer's speed. You are tuning the benchmark. Atari isn't the only machine that can do that. If that's the game you want to play, other people are going to do the same thing, but it's not a typical performance measurement.

2. I know we accepted A*A in place of A^2 earlier in the thread, but it's NOT the same thing. If you pass a variable like this A^B, you can't just use A*B as a substitute. It's a benchmark of that specific library function and you are bypassing it. Yes, in a specific application you can do that, but that's not the point here.

3. Unrolling loops is tuning the benchmark. Part of the reason for including the FOR NEXT loops is to see how well a BASIC performs FOR NEXT loops. You get an unrealistic picture of the speed of the machine.

Bottom line, your code isn't doing the same thing and you are defeating the purpose of the benchmark.

FWIW, I had forgotten about the A*A thing.

If I replace A^2 with A*A, the MC-10 finishes the benchmark in about 42 or 43 seconds using the new ROM.

That's without unrolling *any* loops, and that ROM doesn't have an optimized SQR() function.

Do you know what the algorithm to optimize the SQR() function depends on?

At least 4 floating point multiplies which the MC-10 is now very fast at.

Faicuai · August 8, 2017

And here we go...

1. If you turn off the display you can't see what a program is doing. That limits the results to specific tasks, not a general benchmark of the computer's speed. You are tuning the benchmark. Atari isn't the only machine that can do that. If that's the game you want to play, other people are going to do the same thing, but it's not a typical performance measurement.

2. I know we accepted A*A in place of A^2 earlier in the thread, but it's NOT the same thing. If you pass a variable like this A^B, you can't just use A*B as a substitute. It's a benchmark of that specific library function and you are bypassing it. Yes, in a specific application you can do that, but that's not the point here.

3. Unrolling loops is tuning the benchmark. Part of the reason for including the FOR NEXT loops is to see how well a BASIC performs FOR NEXT loops. You get an unrealistic picture of the speed of the machine.

Bottom line, your code isn't doing the same thing and you are defeating the purpose of the benchmark.

FWIW, I had forgotten about the A*A thing.

If I replace A^2 with A*A, the MC-10 finishes the benchmark in about 42 or 43 seconds using the new ROM.

That's without unrolling *any* loops, and that ROM doesn't have an optimized SQR() function.

Do you know what the algorithm to optimize the SQR() function depends on?

At least 4 floating point multiplies which the MC-10 is now very fast at.

There is nothing to show during compute-time. Zero. No point in wasting 25%-30% of 6502's output... because it is literally being halted by Antic. Moreover, stuff CAN BE SHOWN, even if Antic is "turned off". System Information 2.24 achieves exactly this.
Neither Atari Basic nor the Atari OS is aware of trivial arithmetic and basic optimizations... which they could otherwise have had with more memory to spare (instead of a "miserable" 8 Kbytes span).
Unrolling MAY or MAY NOT help. Atari Basic, for instance, does not seem to operate For-Next loops with pure integer arithmetic. Atari Basic is VERY, VERY constrained.
The system rom I am using (800XL/XE-Rev3-FP) runs add / subs. operations about 2.3x faster and Mult / Div. operations 5.0-5.8x faster than original Atari FP routines. That's the key.

Anyone here is welcome to post resulting times (and screen shots) from similar optimizations. Going from 400+ secs. down to 42 sec (still on ATARI Basic !!!) shows how wasteful and potentially pointless this benchmark is on Atari.

Cheers!

JamesD · August 8, 2017

And here we go...

1. If you turn off the display you can't see what a program is doing. That limits the results to specific tasks, not a general benchmark of the computer's speed. You are tuning the benchmark. Atari isn't the only machine that can do that. If that's the game you want to play, other people are going to do the same thing, but it's not a typical performance measurement.

2. I know we accepted A*A in place of A^2 earlier in the thread, but it's NOT the same thing. If you pass a variable like this A^B, you can't just use A*B as a substitute. It's a benchmark of that specific library function and you are bypassing it. Yes, in a specific application you can do that, but that's not the point here.

3. Unrolling loops is tuning the benchmark. Part of the reason for including the FOR NEXT loops is to see how well a BASIC performs FOR NEXT loops. You get an unrealistic picture of the speed of the machine.

Bottom line, your code isn't doing the same thing and you are defeating the purpose of the benchmark.

FWIW, I had forgotten about the A*A thing.

If I replace A^2 with A*A, the MC-10 finishes the benchmark in about 42 or 43 seconds using the new ROM.

That's without unrolling *any* loops, and that ROM doesn't have an optimized SQR() function.

Do you know what the algorithm to optimize the SQR() function depends on?

At least 4 floating point multiplies which the MC-10 is now very fast at.

It's actually only 2 floating point multiplies on the version I'm trying.

But if it drops at least 15 seconds like going to A*A that puts it around 46 seconds without A*A and around 30 with it.

And there are faster versions of the power function I could implement, so under 40 seconds without any tweaks to the benchmark is a real possibility.

But I'll have to dump something from the 8K ROM to do that.

I also just thought of a way to speed up the divide, but I should have just enough room for that. It won't impact Ahl #s since it doesn't test divide though.
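
The thread doesn't say which square root algorithm is meant. Just as one example of why multiply speed matters for SQR, here is a Python sketch of Newton's iteration for 1/sqrt(x), which needs only multiplies and a subtraction per step and no divides. This is not the MC-10 ROM routine; the function name, the fixed seed and the iteration count are arbitrary choices for illustration.

```python
import math


def sqrt_newton(x, iterations=8):
    """Square root via Newton's iteration on y ~ 1/sqrt(m): y = y*(1.5 - 0.5*m*y*y)."""
    if x < 0:
        raise ValueError("negative argument")
    if x == 0:
        return 0.0
    # Split x = m * 2**e with e even and 0.5 <= m < 2, so a fixed seed converges.
    m, e = math.frexp(x)          # 0.5 <= m < 1
    if e % 2:
        m *= 2.0
        e -= 1
    y = 1.0                       # crude seed for 1/sqrt(m)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)
    # sqrt(m) = m * (1/sqrt(m)); halve the exponent to finish.
    return math.ldexp(m * y, e // 2)


for value in (2.0, 100.0, 0.25):
    print(value, sqrt_newton(value), math.sqrt(value))
```

A real ROM routine would pick a better seed so that far fewer steps are needed, and the number of multiplies per call depends entirely on that choice.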
