Larry

Ahl's Benchmark?


Some emulators are fairly rudimentary in their implementation and timing.

 

But, WinVICE and Atari800Win+ are cycle-exact, and should return results extremely close to the real thing, provided the machine they run on can maintain 100% speed.

It still gets an asterisk until it's run on the real thing.

The difference in accuracy makes me nervous. They *should* be the same.

 

Dunno. That was the only difference, run on a real 64.

 

I've already got the stuff put away in the garage. Perhaps someone who uses the 64 emulator can replicate these.

 

-Larry


Apple IIgs 2.8MHz, hand timed

Time 42.5

Accuracy 1.04141235E-03

Random 8.42399192

 

There are aftermarket accelerators clocked at up to 18MHz.

One of those should be able to run it in about 6.5 seconds

*without* using native 65c816 instructions.
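That estimate is just the measured time scaled by the clock ratio. A minimal sketch of the arithmetic, assuming run time is inversely proportional to clock speed (wait states, caches and video contention on a real accelerator will shift the number); `scaleTime` is only an illustrative helper name:

```javascript
// Estimate run time at a new clock speed, assuming time scales as 1/clock.
// This ignores wait states, caches and video contention, so it is a rough guide only.
function scaleTime(measuredSeconds, measuredMHz, targetMHz) {
  return measuredSeconds * (measuredMHz / targetMHz);
}

// Stock IIgs: 42.5 s hand timed at 2.8 MHz, projected to an 18 MHz accelerator.
const estimate = scaleTime(42.5, 2.8, 18);
console.log(estimate.toFixed(1) + " s");
```

Under that assumption the projection comes out near 6.6 seconds, in line with the "about 6.5" quoted above.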

<edit>

I'm not sure if this is from the same benchmark

but the standard gs had a result of 42.8...

so the other benchmarks should be pretty close.

(hand timing could have been off that much)

TransWarp 11.25 MHz Time 8.4s

<edit> 5.4 estimated with A=A*A

http://home.swbell.net/rubywand/R026GSEMUS.htm

Edited by JamesD


After looking at some results people are getting I think A*A seems to be consistently faster than A^2.

Using that method cut results on the CoCo 3. I'll see if someone can rerun the benchmarks for it to improve times.

I'll have to check out the Apples again.

 

<edit>

The CoCo3 benchmark improved by over 38%.

The IIgs benchmark improved by 40%, give or take hand-timing error.

 

 

I think it's safe to say that an FPGA based machine with a 25MHz CPU could run this in under 1 second on many machines.

Edited by JamesD


The benchmark over on the C128 forum:

 

10 PRINT TIME$

20 A=2.71826

30 B=3.14159

40 C=1

50 FOR I=1 TO 5000

60 C=C*A

70 C=C*B

80 C=C/A

90 C=C/B

100 NEXT I

110 PRINT TIME$

120 END

 

 

Then they added these lines for a larger benchmark:

25 A$="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

35 B$="1234567890"

65 C$=LEFT$(A$,INT(A))

75 C$=LEFT$(B$,INT(B))

85 C$=RIGHT$(A$,INT(A))

95 C$=RIGHT$(B$,INT(B))

 

 

 

To give you an idea of how fast the SuperCPU 65c816 upgrade is:

PAL SuperCPU128 128 mode: 16 or 17

PAL SuperCPU128 64 mode: 10

 

CoCo 3 Emulator Fast clock: 141.6

NTSC C128 Fast: 220


On systems where A=A*A and A=A^2 give noticeably different results, it's because A=A*A is the more accurate of the two. If A=A*A appears less accurate, that is only because SQR() is as inaccurate as A^2.
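One way to see the difference is to run the round trip both ways. The sketch below is an illustration in JavaScript, not any particular BASIC's math library: `squareViaExpLog` is an assumed model of an interpreter that evaluates A^2 through LOG and EXP (the route described for Atari BASIC later in this thread), and both function names are made up for the example.

```javascript
// Square by a single multiply, as A=A*A does.
function squareViaMultiply(a) {
  return a * a;
}

// Square the way a power operator built on log/exp would (assumed model of A^2).
function squareViaExpLog(a) {
  return Math.exp(2 * Math.log(a));
}

// Ahl-style error figure: 10 square roots, 10 squarings, summed over n = 1..100.
function ahlError(square) {
  let s = 0;
  for (let n = 1; n <= 100; n++) {
    let a = n;
    for (let i = 0; i < 10; i++) a = Math.sqrt(a);
    for (let i = 0; i < 10; i++) a = square(a);
    s += a;
  }
  return Math.abs(1010 - s / 5);
}

console.log("A*A error:", ahlError(squareViaMultiply));
console.log("A^2 error:", ahlError(squareViaExpLog));
```

On a modern double-precision host both figures are tiny; the gap is far more visible on the 8-bit math packages discussed here.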

 

Now, even though A=A*A is significantly faster, its use doesn't test the efficiency of the math library functions and kinda defeats the purpose of the original benchmark. If you are using much larger exponents, ^ can be faster, btw.

 

If this were something other than a benchmark you could unroll the loop and the number of multiplies would be reduced to a fraction of what the loop does. The calculations could actually be executed in under a second.

Anyone got a BBC Model B?

I own the following Acorn computers: Electron, BBC Model B, BBC Master 128, BBC Master Compact. I can benchmark them all tonight (or at least before Christmas).

 

I also own the very rare computers vTech Laser 2001 (2 MHz 6502, compatible with creatiVision) and COMX-35 (RCA 1802, supposedly one of the slowest Basics ever). I could benchmark these as well. In particular the COMX would be interesting to benchmark compared to ZX-81 in SLOW mode, TI-99/4A, Atari 400/800 etc.

 

Furthermore, I have a Philips VG-8235 MSX2 to benchmark. Commodore (VIC-20, C64, C128) and Atari (800XL, 130XE) computers are already catered for, it seems.

 

While we're discussing benchmarks, do you remember that Byte Magazine published a suite of eight simple benchmarks back in 1977 (or was it 1979)? I think David Ahl's benchmark was published as an extension to these eight benchmarks. I think I posted about them on the newsgroup comp.sys.oric many years ago, and it might be interesting to run them all for a whole set of comparisons. Some Basics are lightning fast when they run an empty FOR-NEXT loop, but get dog slow as soon as you fill the loop with some content like variable assignments, IF statements etc.

Edited by carlsson

* Amstrad CPC 6128 Plus Locomotive BASIC 1.1

 

Accuracy 3.50475E-05

Random 3.07867885

Time 38.9666667 sec

 

 

* benchmarked on WinAPE

 

I performed this benchmark on WinAPE, after adding the following lines:

 

5 t=TIME

 

...

 

120 PRINT "Time: ";(TIME - t) / 300

 

WinAPE is cycle accurate, and it's using the internal TIME function. The TIME is incremented every 64 * 52 = 3328 microseconds, so the calculation for display should actually be (TIME - t) * 3328 / 1E6 = 38.904 sec. I will run this on my real Amstrad Plus today.
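The correction above can be checked by converting the displayed time back into raw ticks and applying the real tick period. A small sketch of that arithmetic; the constant and function names are illustrative, and the 3328 microsecond period is taken from the post rather than verified independently:

```javascript
// Nominal conversion used in line 120 of the listing: TIME counts 1/300 s ticks.
const NOMINAL_TICKS_PER_SECOND = 300;
// Tick period quoted in the post: 64 * 52 microseconds.
const TICK_MICROSECONDS = 64 * 52; // 3328

// Convert a displayed nominal time back to raw ticks, then to seconds at the quoted period.
function correctedSeconds(nominalSeconds) {
  const ticks = Math.round(nominalSeconds * NOMINAL_TICKS_PER_SECOND);
  return ticks * TICK_MICROSECONDS / 1e6;
}

console.log(correctedSeconds(38.9666667).toFixed(3)); // 38.904
```

The displayed 38.9666667 s corresponds to 11690 ticks, which gives the 38.904 s figure quoted.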

 

I can also run this benchmark on the BBC Model B. It could be at least twice as fast as the C64, since its 6502 is clocked at 2MHz and it doesn't lose any CPU cycles to video access like the C64 does.


Acorn Electron (0.6 - 2 MHz variable rate 6502)

Accuracy: 1.28746033E-5

Random: 5.17578411

Time: 28.5 seconds

 

Acorn BBC Model B (2 MHz 6502, test rerun twice)

Accuracy: 1.28746033E-5

Random: 10.0480092, 0.309859753

Time: 21 seconds, 20 seconds

 

Acorn BBC Master Compact (2 MHz R65C102, dunno if it matters)

Accuracy: 2.21729279E-5

Random: 28.1809998

Time: 7.5 seconds - yes, I double checked this benchmark result!

 

vTech Laser 2001 (2 MHz 6502, I believe)

Accuracy: 3.2722295E-04

Random: 11.8781874

Time: 45 seconds

 

COMX-35 (2 MHz RCA CDP-1802, like it matters)

Accuracy: 0.32489

Random: 1.26404

Time: approx 7.5 minutes!!! (I didn't measure on the exact seconds)

 

All times were hand measured using a wrist watch. I was unable to measure my BBC Master 128, as it has a partly malfunctioning keyboard (unable to enter the equals sign, among others). The Laser 2001 doesn't seem to have a ^ function, so I replaced it with A*A.

 

In any case, the COMX doesn't seem any slower than the original Atari Basic, which surprises me a bit. Of course, the COMX-35 was introduced in late 1983, by which time Atari computers had moved on to the XL series and should no longer have suffered from a dog-slow Basic. This is a candidate for another series of tests, to see which one really has the slowest Basic and should be poked fun at for it.

Edited by carlsson


Adding to an old thread, since it came up in the TI group... :) Testing against the TI using Classic99 emulator (which is close but not 100%). Timed with clock device, time ends before results are printed. Porting to TI Extended BASIC required using '::' as the multiple statement operator and removing the argument from RND. Porting to TI BASIC further required breaking up the multiple statement lines, as multiple statements are not supported there.

 

Texas Instruments 99/4A * (3MHZ TMS9900)

 

TI BASIC

Accuracy: 0.00000011

Random: 1.928494715

Time: 4m31s

 

TI Extended BASIC (with 32k memory)

Accuracy: 0.00000011

Random: 6.27999115

Time: 3m55s

 

The random factor varies a fair bit depending on the random seed. Both of those values are on the low end.
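That spread is roughly what statistics predicts. Assuming the usual form of the benchmark (20 RND calls for each of 100 values, so 2000 uniform numbers summed and compared with 1000), the sum has a standard deviation of sqrt(2000/12), about 12.9, so anything from near 0 up to the high 20s is unremarkable. A small sketch; `randomFigure` and its injectable `rng` parameter are illustrative names:

```javascript
// Number of random draws in one pass: 100 values x (10 + 10) loop iterations.
const DRAWS = 100 * 20;

// Standard deviation of a sum of DRAWS independent uniform(0,1) numbers.
// Each draw has variance 1/12, so the sum has variance DRAWS/12.
const sigma = Math.sqrt(DRAWS / 12);

// The "Random" figure: distance of the sum from its expected value DRAWS/2.
function randomFigure(rng = Math.random) {
  let r = 0;
  for (let i = 0; i < DRAWS; i++) r += rng();
  return Math.abs(DRAWS / 2 - r);
}

console.log("sigma:", sigma.toFixed(2)); // 12.91
console.log("one run:", randomFigure());
```

A figure far outside a few multiples of sigma would hint at a poor random number generator rather than bad luck.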


For reference: The following is (still) with the Atari 8K basic, but with the (improved) Mathpack that comes with Os++:

 

Accuracy = 1.18E-3 (that is, better by a factor of ten; note that "accuracy" should really read "error")

Random = 4.985358 (on this run)

Time = 230secs (measured on the machine, not with a stop-watch, i.e. by counting cycles which should be precise)

 

Thus, an only slightly more careful implementation is one order of magnitude more accurate, and faster by a factor of 2 here. Unfortunately, there is not enough ROM space to unroll the loop as TBXL did, so there is no chance to improve this much further.

 

And, finally, if the math is all done on the host (i.e. in an emulation), we get:

 

Accuracy: 0.014125

Random: 3.94

Time: 17.36 (also, machine time).

 

The accuracy is worse since numbers are converted back and forth between the Atari BCD and the machine-native IEEE format, but the execution speed is of course unbeatable. But that also means that only 17 seconds of the overall native >400 seconds are due to the poor loop management of Atari Basic.

 

If you ask me, my best guess as to what improved this is that: a) multiplication is a bit faster, but b) log and exp, as used by Atari Basic for the power operator, are quite a bit faster due to a smarter and more precise choice of the approximating polynomials.

 

(That said, Atari Basic rev. B and C still have a stupid bug that wants me to believe that 4^4 = 257. But that holds only for the Os++ mathpack, and is due to the outright stupid rounding mechanism of the BASIC and not due to the math itself. Basic A gives an unrounded result that is very close to the right one.)


Out of curiosity I tested this on a HP-35s calc and got 2:45 with 2×10^-8 'accuracy' and 12,0067 'randomness'. I can't compare with contemporary calculators as my HP-41 doesn't have a built-in random number function and my 15C (which has one) is a modern day clone much faster than the original. Like Atari BASIC they use BCD (apparently with more significant digits) but I suppose the programs are closer to assembler than to BASIC.


Out of curiosity I tested this on a HP-35s calc and got 2:45 with 2×10^-8 'accuracy' and 12,0067 'randomness'. I can't compare with contemporary calculators as my HP-41 doesn't have a built-in random number function and my 15C (which has one) is a modern day clone much faster than the original. Like Atari BASIC they use BCD (apparently with more significant digits) but I suppose the programs are closer to assembler than to BASIC.

 

You beat me to the punch... I was wondering how my 48GX (+Metakernel/Erable/Alg48) would perform. I will run this one today (or tomorrow), and check execution timings and precision.

 

I am preparing a special ROM-build for my Incognito (which will correct the FP-code of the included Atari 10K roms) and also compare with modified fast-FP rom for XL, and also with 48GX results...


I am preparing a special ROM-build for my Incognito (which will correct the FP-code of the included Atari 10K roms) and also compare with modified fast-FP rom for XL, and also with 48GX results...

Hope you'll share that incognito ROM, also waiting for a board. (And if you PM your 48 code, I'd like to try it on a 48G and 49G+, I still didn't grasp enough of RPL to do it myself. Maybe I'll try in the hpmuseum forum for someone to run it on a HP-15 as that was the calc I used in senior high during my "Atari prime".)


Hope you'll share that incognito ROM, also waiting for a board. (And if you PM your 48 code, I'd like to try it on a 48G and 49G+, I still didn't grasp enough of RPL to do it myself. Maybe I'll try in the hpmuseum forum for someone to run it on a HP-15 as that was the calc I used in senior high during my "Atari prime".)

 

Here you go:

 

Even though this is code pertaining to a different platform, I will post it here (anyway) because it certainly serves us as a solid reference for evaluating speed and accuracy of Atari's own results:

 

 

HP 48GX Results Summary (code samples shown further below):

  1. HW Config:
    • 128K MAIN RAM, 128K RAM card in slot 1, 1MB RAM card in slot 2
    • HP Saturn 4-bit/20-bit/64-bit CPU, running @ 3.9 MHz on 3x AAA 850 mAh Eneloop batteries (!!!)
  2. SW Config: Erable 3.2, ALG48, Java, 106K free MAIN RAM.
  3. Final Results:
    1. Test Version #1 (UserRPL code interpreter): 14.95 secs, Delta: 0.00000002 (2x10^-8, 12 digits), Random: 23.52.
    2. Test Version #2 (UserRPL code interpreter): 21.39 secs, Delta: 0.00000002 (2x10^-8, 12 digits), Random: 16.35.
    3. Test Version #3 (UserRPL code interpreter): 44.59 secs, Delta: 0.00000000 ("infinity" precision), Random: 3.60.

NOTES: "Random" sum varies from as little as 1.3 to 30+. SCREEN refresh turned off, for full CPU-processing power.

Here is the code for each Test Version:

 

Test Version #1 (unrolled inner-loops & optimized for on-Stack/RPL performance, sensibly faster than local variables, timed with Erable's TEVAL):

<<
0 0 1 100 FOR n n

SQRT SQRT SQRT SQRT SQRT SQRT SQRT SQRT SQRT SQRT

SQ SQ SQ SQ SQ SQ SQ SQ SQ SQ + SWAP

RAND RAND RAND RAND RAND RAND RAND RAND RAND RAND

RAND RAND RAND RAND RAND RAND RAND RAND RAND RAND

+ + + + + + + + + + + + + + + + + + + + SWAP

NEXT

5 / 1010 - ABS SWAP 1000 - ABS
>>

 

Test Version #2 (run-of-the-mill FOR-NEXT inner loops, but control variables "S" and "R" still tallied on-STACK, timed with Erable's TEVAL):

<<
0 0 1 100 FOR n n

1 10 FOR i SQRT SWAP RAND + SWAP NEXT

1 10 FOR i SQ SWAP RAND + SWAP NEXT

3 ROLL + SWAP NEXT

1000 - ABS SWAP 5 / 1010 - ABS
>>

 

Test Version #3 (On-Stack Algebraic solve of "10th Power of 10th Root of n", unrolled inner-loops, 12-digits RAND sum, requires ALG48 "ASIM" or similar):

<<
0 0 1 100 FOR n

'XROOT(10,n)^10' ASIM EVAL + SWAP

RAND RAND RAND RAND RAND RAND RAND RAND RAND RAND

RAND RAND RAND RAND RAND RAND RAND RAND RAND RAND

+ + + + + + + + + + + + + + + + + + + + SWAP

NEXT

5 / 1010 - ABS SWAP 1000 - ABS
>>

 

As for the Incognito special ROM-build, I will certainly share it. Please keep in mind that you will need a RELIABLE (notice the word) Flash/PLCC burner, so you do not get temporarily stuck (as I did 8-). It will be finished by Wed/Thursday this week. All ROMs (XL and legacy) will be updated from "binary-correct" versions of each, both the RevA and RevB 10K ROMs will be updated with Newell's FP code, and the Fast-FP XL ROM will also be included (the latest version of SDX will be there, too). Hopefully, it will be a nuts-and-bolts package.

 

Enjoy!


As for the Incognito special ROM-build, I will certainly share it. Please keep in mind that you will need a RELIABLE (notice the word) Flash/PLCC burner, so you do not get temporarily stuck (as I did 8-).

 

It won't be long before the Incognito Flasher is complete. I estimate in the next week that it will be done :)

Just have a couple of kinks to work out, then I'll ask the beta team to test it before a public release...


Probably 70 seconds on Atari BASIC.

 

Something like 6.25 seconds on TBXL.

 

Bob

 

 

 

I wonder how Ahl might run at 14 MHz? ;)

 

-Larry


Adding to an old thread, since it came up in the TI group... :) Testing against the TI using Classic99 emulator (which is close but not 100%). Timed with clock device, time ends before results are printed. Porting to TI Extended BASIC required using '::' as the multiple statement operator and removing the argument from RND. Porting to TI BASIC further required breaking up the multiple statement lines, as multiple statements are not supported there.

 

Texas Instruments 99/4A * (3MHZ TMS9900)

 

TI BASIC

Accuracy: 0.00000011

Random: 1.928494715

Time: 4m31s

 

TI Extended BASIC (with 32k memory)

Accuracy: 0.00000011

Random: 6.27999115

Time: 3m55s

 

The random factor varies a fair bit depending on the random seed. Both of those values are on the low end.

Well, the TI-99/4 beats the Sinclair Spectrum and standard Atari BASIC.

It's even close to the speed of Atari BASIC using the improved math pack.

However, it's not even 1/3 the speed of the faster machines.

 

 

I never did run the benchmark on the JR-200 or NEC TREK and my Thomson TO-8 needs repairs before I could test it.

I would expect the JR-200 to do well. The TREK would probably do ok in text mode but in hi-res graphics modes it would probably slow down.

FWIW, in the benchmarks I ran on the Tandy MC-10 it was faster than the C64 but I didn't run this benchmark.


For reference: The following is (still) with the Atari 8K basic, but with the (improved) Mathpack that comes with Os++:

 

Accuracy = 1.18E-3 (that is, better by a factor of ten; note that "accuracy" should really read "error")

Random = 4.985358 (on this run)

Time = 230secs (measured on the machine, not with a stop-watch, i.e. by counting cycles which should be precise)

 

Thus, an only slightly more careful implementation is one order of magnitude more accurate, and faster by a factor of 2 here. Unfortunately, there is not enough ROM space to unroll the loop as TBXL did, so there is no chance to improve this much further.

I wonder what that would do for BASIC XL. 220 secs or less I would guess. (I have a BASIC XL cart).


I hand timed an MC-10 emulator. The emulator clearly isn't cycle accurate or my timing is off. Probably both but it seems about right vs the C64 times given other benchmarks I've seen.

Time: 120-123 secs (timed multiple times)

Accuracy: 5.96284867E-04

Random: Yup, it's random


I translated Ahl's benchmark to Javascript just for curiosity.

 

<script>
// Ahl's simple benchmark in JavaScript
var a, n, i, r, s, t, m;
t = new Date();   // start time
m = 0;            // number of completed passes
do {
  r = 0;          // running sum of random numbers
  s = 0;          // running sum of round-tripped values
  for (n = 1; n <= 100; n++) {
    a = n;
    for (i = 1; i <= 10; i++) {
      a = Math.sqrt(a);
      r = r + Math.random();
    }
    for (i = 1; i <= 10; i++) {
      a = Math.pow(a, 2);
      r = r + Math.random();
    }
    s = s + a;
  }
  m++;
} while (new Date() - t < 1000);  // keep going until one second has passed
document.writeln("Times " + m);
document.writeln("Accuracy " + Math.abs(1010 - s / 5));
document.writeln("Random " + Math.abs(1000 - r));
</script>

Put it inside a file named benchmark.html and run it with your web browser.

 

As modern machines are too fast, the test runs until a whole second has passed and it shows how many times it ran.

 

In case anyone is wondering, the accuracy value indicates how many digits the floating-point core gets right, so it should be a very low value. (The trick is that it takes a value, takes the square root ten times and then squares the result ten times, so the value should theoretically come back the same, but cannot because of floating-point implementations.)
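To watch that drift for a single value instead of the summed figure, a short sketch along these lines can help. `roundTrip` is an illustrative helper, and it squares with a plain multiply, which is an assumption that differs from the `Math.pow` call in the listing:

```javascript
// Take the square root `steps` times, then square `steps` times.
function roundTrip(n, steps) {
  let a = n;
  for (let i = 0; i < steps; i++) a = Math.sqrt(a);
  for (let i = 0; i < steps; i++) a = a * a;
  return a;
}

// Print how far a few values drift from where they started.
for (const n of [2, 10, 100]) {
  const drift = Math.abs(roundTrip(n, 10) - n);
  console.log(n, drift.toExponential(2));
}
```

Each squaring roughly doubles the relative error left behind by the square roots, which is why ten steps are enough to expose a weak math package.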

Edited by nanochess


I think it's safe to say that just about any modern computer (including many cell phones) could probably run this faster than a single VBLANK interrupt could be used to time.
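For anyone who wants to check that claim on their own machine, here is a rough sketch that times a single pass against one frame period. It assumes a 60 Hz (NTSC) VBLANK and a runtime that exposes `performance.now()`, falling back to `Date.now()` otherwise; `timeOnePass` is an illustrative name and a single pass measured this way is noisy:

```javascript
// One VBLANK period on a 60 Hz (NTSC) machine, in milliseconds (assumed frame rate).
const FRAME_MS = 1000 / 60;

// Use a high-resolution clock when the runtime offers one, else fall back to Date.now().
const now = (typeof performance !== "undefined")
  ? () => performance.now()
  : () => Date.now();

// Time a single pass of the benchmark's arithmetic.
function timeOnePass() {
  const start = now();
  let r = 0, s = 0;
  for (let n = 1; n <= 100; n++) {
    let a = n;
    for (let i = 0; i < 10; i++) { a = Math.sqrt(a); r += Math.random(); }
    for (let i = 0; i < 10; i++) { a = a * a; r += Math.random(); }
    s += a;
  }
  return {
    elapsedMs: now() - start,
    accuracy: Math.abs(1010 - s / 5),
    random: Math.abs(1000 - r)
  };
}

const result = timeOnePass();
console.log(result.elapsedMs.toFixed(3) + " ms vs a frame of " + FRAME_MS.toFixed(2) + " ms");
```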

When you switch to other languages it probably just makes matters worse.


I just timed the NEC Trek/PC-6001 on the Virtual NEC Trek emulator by hand.

I don't know how accurate the emulator timing is but it's pretty interesting.

I knew the video hardware wasn't isolated from the CPU and added wait states to the CPU, but I didn't know it was this bad.

 

BTW, I mentioned in another post that hi-res mode would be worse than text but after some thought I realize the VDG has to read the same number of bytes from RAM in text mode as hi-res. The VDG doesn't remember what characters were read from one scan line to the next and there are the same number of horizontal bytes and vertical rows in both modes. The modes that allowed more colors on screen by switching palettes or modes would require more reads from RAM though and slow this down further.

 

NEC Trek/PC-6001 (4MHz PD780C-1 CPU, a Z80 Clone)

Time: 496 secs (8:16)

Accuracy: 4.78744507E-04

 

If that's correct, we have a new bottom of the heap by 46 secs.

I don't have an Extended BASIC ROM image to see if it alters the benchmark.


The MSX Turbo R

Time: 30 secs (Hand timed)

Accuracy: 2.058E-07

 

That's slightly faster than the Apple IIc+

That only leaves the Acorn machines as potentially faster but I'd like to make sure the square function was used and not A*A before I hand Acorn the speed title for standard speed machines.

