
Anyone for a little benchmarking?



A couple of weeks ago I came across a 1970s BASIC benchmark suite I had not heard of - it was mentioned here in the forums. So I wrote this:

 

https://en.wikipedia.org/wiki/Rugg/Feldman_benchmarks

 

Is anyone willing to run these on an NTSC machine in Atari BASIC and TURBO? Others would be nice too, but I'm especially curious about TURBO on the 8th test.

 

It's interesting that I did not hear of these tests at the time - they are much more useful than either Ahl or Sieve, and pre-date both.


As a matter of destiny (?), just a few weeks back I worked on compiling the entire suite into a SINGLE set that can be run sequentially (with a small penalty from BASIC line-search overhead, etc.). I actually generated two sets: one that can be run as-is in Atari BASIC and derivatives, and the other for FastBasic v4.0, where I discovered a few bugs...

 

Watch how the 800 / Incognito runs the suite, with commanding authority (Altirra Basic 1.56, Fast Basic 4.x, Altirra High-Performance FP pack, and ultra-high Performance FP pack for OS-XL):

 

First, on Altirra's FP pack:

 

D582AEA9-67A3-4A15-B9B1-3601B4EE6D1C.jpeg

 

71371038-90F8-4058-81CF-F7DEA5FE2FE5.jpeg

 

 

Then on XL-OS High-Performance FP pack:

 

C887F66D-E8E7-4B52-8E3D-A88DB12C6982.jpeg

 

1E72680A-C006-4F23-80BD-46B351A52155.jpeg

 

 

NOTES:

  1. Since FastBasic does NOT support arrays of reals (FP), these were first converted to STRINGS and then stored as such.
  2. Once I transfer the .FBA and .BAS sources into an .ATR, I will post it here, so everyone can download and play with them on their end with a finished product (thinking about JamesD! :-))) (==> UPDATE: ATTACHED !!!)  Scratchpad-DOS-90K-II.ATR

 

Have fun!

 

 

C611F1A2-51BC-4717-AC0D-61DF26F13442.jpeg

Edited by Faicuai
Typos, .ATR attachment...

A PAL machine should get a slightly faster result.

 

The comparative speeds, especially vs the C64, just go to show the inefficiency of Atari's Basic and FP implementation.

Floating-Point by BCD (and current thread elsewhere) is just plain slow to begin with.

Atari's Basic suffers badly from that and from its slow line search, which penalises it any time a program branch occurs, including loops and subroutine returns.

Going by relative net CPU speed, you should expect about 40% better than the C64 and about 30% less than the BBC (2 MHz and supposedly no DMA penalties).

Edited by Rybags

Here's the Atari pulling out a net 1.7x performance gain over the C64 (Vice-64), and that is with XL OS overhead still active, and NO, not in basic, but.... in CC65 (!)

 

Below are TEN (10) iterations of SIEVE, run on both machines, from the exact same code, and with the exact same compile optimizations. The Atari is almost at ONE SECOND (flat) per Sieve iteration (!!!)

 

 

0CA47D39-1BF6-45D8-80EE-2B75027A43DC.jpeg

 

0229D5F7-1D32-4196-A4C3-2B5B69633021.jpeg

 

Here are the sources and respective executables, for anyone to play with (.xex is the Atari executable, .prg is the C64's):

 

sieveBYT.c

sieveBYT.prg

sieveBYT.xex

 

Compiling commands:

 

Atari: cl65 --static-locals -o %1.xex -t atari -I include -L lib --asm-include-dir asminc --cfg-path cfg -Oir %1.c

C64: cl65 --static-locals -o %1.prg -t c64 -I include -L lib --asm-include-dir asminc --cfg-path cfg -Oirs %1.c

 

 

Edited by Faicuai

22 minutes ago, Faicuai said:

Here's the Atari pulling out a net 1.7x performance gain over the C64 (Vice-64), and that is with XL OS overhead still active, and NO, not in basic, but.... in CC65 (!)

...

 

Hmmm... it's almost like the Atari has a 1.7 x advantage in clock speed. 
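For reference, a quick check of the raw NTSC clock ratio behind that remark. The figures are assumptions taken from commonly published specs, not from this thread: both machines are assumed to derive the CPU clock from a 14.31818 MHz master clock, divided by 8 on the Atari and by 14 on the C64. Cycles lost to video DMA and refresh are ignored, so this is only a rough sketch.

```python
# Assumed NTSC values (not from this thread): 14.31818 MHz master clock,
# divided by 8 for the Atari CPU and by 14 for the C64 CPU.
MASTER_NTSC_HZ = 14_318_180

atari_hz = MASTER_NTSC_HZ / 8    # about 1.79 MHz
c64_hz = MASTER_NTSC_HZ / 14     # about 1.02 MHz

# Raw clock advantage only; DMA and refresh cycle stealing are not modelled.
clock_ratio = atari_hz / c64_hz

print(f"Atari {atari_hz / 1e6:.3f} MHz, C64 {c64_hz / 1e6:.3f} MHz, "
      f"ratio {clock_ratio:.2f}x")
```

Under those assumptions the ratio is simply 14/8, which sits close to the 1.7x measured with the compiled Sieve.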

13 minutes ago, bob1200xl said:

How do I run the .XEX file from APE?

 

Bob

 

You will be better off running it from the DOS 2.X, MyDos or SDX prompt.

 

Here's an .ATR with the .XEX, so you can proceed accordingly. If you run it on the 1200XL, turn off screen / ANTIC with 1200XL OS-enabled function keys, on your keyboard, for maximum execution speed.

Scratchpad-DOS-90K-II.ATR

Edited by Faicuai

Here's the performance summary of the BASIC benchmark suite (interpreter-level only):

 

1. Atari: 61.449 secs (Integer tests), 102.1498 secs (Integer + FP tests)
2. BBC Micro: 114.6 secs (Integer + FP)
3. Apple II: 74.1 secs (Integer), 192.6 secs (Integer + FP)
4. Z80 2 MHz: 166.3 secs (Integer)

 

Source code is attached on prior posts...


1 hour ago, JamesD said:

Hmmm... it's almost like the Atari has a 1.7 x advantage in clock speed. 

But that has been said already on this thread.

 

Only that we can now see it (confirmed) with a very efficient cross-platform compiler (CC65), instead of stupid-ass, inefficient Basic packages.

 

In this sense (and this is what really matters for those who are reading carefully), the super-nice BBC-Micro will hardly ever pull out a +30% gain over the Atari, assuming ANTIC is out of the way, or on these particular types of tasks... unless its core processor turns out to be inherently different than the 6502.

Edited by Faicuai

If you run the benchmark suite on a 14 MHz machine, there is only a minor improvement because the code runs mostly in the cartridge. The cart has to run at 1.79 MHz.

 

But, if you run the sieve, you are using the FP code, which can run in RAM at 14 MHz.

 

Like this:

 

DSC01724.thumb.JPG.1a685534c00498ab586bd94cfcb40183.JPG


4 minutes ago, bob1200xl said:

If you run the benchmark suite on a 14 MHz machine, there is only a minor improvement because the code runs mostly in the cartridge. The cart has to run at 1.79 MHz.

 

But, if you run the sieve, you are using the FP code, which can run in RAM at 14 MHz.

 

Like this:

 

DSC01724.thumb.JPG.1a685534c00498ab586bd94cfcb40183.JPG

 

 

You should be getting 1899 primes... not 1674... Mhmmm...
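For anyone who wants to check the expected count without an Atari at hand, here is a minimal sketch of one iteration of the classic BYTE-style sieve over 8190 flags. It is written from the well-known published form of that benchmark and is only assumed to match what sieveBYT.c does; the function name is made up for the example.

```python
def byte_sieve(size=8190):
    """One iteration of a BYTE-style sieve; returns the number of primes found."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3        # index i stands for the odd number 2i + 3
            k = i + prime            # index of 3 * prime, the first odd multiple
            while k <= size:
                flags[k] = False     # strike out odd multiples of prime
                k += prime
            count += 1
    return count

print(byte_sieve())
```

Any run that reports a different count for this array size is not completing the algorithm correctly, whatever its timing.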


Yes, you are correct. The 7 MHz machine gets 1899 primes in 3.700 seconds. The hardware is very similar except for the memory wrap around $10000. The 14 MHz machine wraps to $10000 as it should, while the 7 MHz one does not. Both have 65816s. If I run the OS in ROM at 14 MHz, I get 1899 and it is a tiny bit slower.

 

Like this:

 

DSC01725.thumb.JPG.0c76e5c249d05c5806b433bacaea5cb7.JPG


17 hours ago, bob1200xl said:

Yes, you are correct. The 7 MHz machine gets 1899 primes in 3.700 seconds. The hardware is very similar except for the memory wrap around $10000. The 14 MHz machine wraps to $10000 as it should, while the 7 MHz one does not. Both have 65816s. If I run the OS in ROM at 14 MHz, I get 1899 and it is a tiny bit slower.

 

Like this:

 

DSC01725.thumb.JPG.0c76e5c249d05c5806b433bacaea5cb7.JPG

 

Nice!!!
 

You are now in Moto 68000 territory... feel free to check against that CPU's results... Also (and at the very least) close to 8086, too...

 


On 6/21/2019 at 9:31 PM, Rybags said:

A PAL machine should get a slightly faster result.

 

The comparative speeds, especially vs the C64, just go to show the inefficiency of Atari's Basic and FP implementation.

Floating-Point by BCD (and current thread elsewhere) is just plain slow to begin with.

Atari's Basic suffers badly from that and from its slow line search, which penalises it any time a program branch occurs, including loops and subroutine returns.

Going by relative net CPU speed, you should expect about 40% better than the C64 and about 30% less than the BBC (2 MHz and supposedly no DMA penalties).

Why would PAL be faster, less memory contention?

 

BTW, how did TURBO fix that problem, did it build a list of GOTO targets using linenum/address?


On 6/22/2019 at 1:45 PM, Faicuai said:

Here's the performance summary of the BASIC benchmark suite (interpreter-level only):

 

1. Atari: 61.449 secs (Integer tests), 102.1498 secs (Integer + FP tests)
2. BBC Micro: 114.6 secs (Integer + FP)
3. Apple II: 74.1 secs (Integer), 192.6 secs (Integer + FP)
4. Z80 2 MHz: 166.3 secs (Integer)

 

Source code is attached on prior posts...

Which BASIC is this on the Atari? That's not stock I don't think?


2 hours ago, Maury Markowitz said:

Which BASIC is this on the Atari? That's not stock I don't think?

 

As mentioned above, Altirra Basic 1.56 was used as the interpreter of choice, which is an 8K drop-in ROM replacement for Atari Basic.

 

BTW, and now that you mention this, if my recollection is correct, I believe NONE of the non-Atari Basic Interpreters used by the machines I listed above actually fit in 8K, except (maybe) Apple Integer? 

 

It would be worth confirming this closely because, if that is the case, the results shown here would place the Atari in a class of its own, plain and simple.

 

Cheers!

Edited by Faicuai

8 minutes ago, flashjazzcat said:

Altirra BASIC is essentially a 10K BASIC just like Atari BASIC, since both depend on the 2K OS FP package.

NO.

 

Quite different from, for example, Microsoft Basic, or Turbo Basic, which come with THEIR own FP package, and once removed, that FP package is gone from the machine.

Edited by Faicuai

But FP is used. Do you think that by avoiding explicit FP calculations, you avoid calling the FP ROM entirely? Does your program have no line numbers? By this illusory logic, an MS BASIC program with no FP math makes MS BASIC 8K.

Edited by flashjazzcat

28 minutes ago, flashjazzcat said:

But FP is used. Do you think that by avoiding explicit FP calculations, you avoid calling the FP ROM entirely? Does your program have no line numbers? By this illusory logic, an MS BASIC program with no FP math makes MS BASIC 8K.

The BBC Micro runs its OS and Basic in a THIRTY TWO (32) KByte ROM package. We do not need to discuss anything else here, other than hypothetically allowing Atari Basic to be implemented in the same ROM space, for instance. That is a discussion we can certainly have.

 

The Atari ROM package was designed and conceived under a different vision, and you can of course have a Basic package relying entirely on integer computations, if anyone really wishes so... In the case of Atari Basic, they decided to over-rely on the FP package for the sake of space, but that is a decision pertaining to that particular implementation / model, though.

 

You don't need to resort to distorting rhetoric in an attempt to disqualify an irreducible and unquestionable point: more ROM space can easily lead you to better performance and extra optimizations (which the BBC Micro very well exploits).

 

In the above results summary, I already managed to extract higher speed from the Atari, by sticking to 16+8 KB rom space. Now, if you can prove that one of the two ROM banks on the BBC is half empty, that is a different story, though...

 

 

Edited by Faicuai

No-one would argue with the assertion that more ROM space can allow for performance and optimisation improvements. But you appeared to be under the impression that the Atari interpreters - unlike 'the non-Atari Basic Interpreters used by the machines I listed above' - are operable within 8K. I point out that neither Atari BASIC, nor your 'interpreter of choice' Altirra BASIC, will work without the 2K OS FP ROM. Both BASICs are essentially 10K implementations, then, whether you like it or not.

 

Try it for yourself: take an XL/XE OS, fill $D800-$DFFF with $FF, and replace the 'SEC' ($38) instruction at the end of the OS checksum test with 'CLC' ($18) so that the bad checksum is overlooked. You'll see the OS boots just fine but Atari BASIC and Altirra BASIC crash before they even get a 'READY' prompt on the screen. If you can demonstrate either BASIC being operational with such a patched, FP-less OS, that's a different story, of course. :)
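The patch described above could be sketched along these lines. Treat it as an illustration under stated assumptions: the image is taken to be a 16K XL/XE OS dump whose first byte maps to $C000, and the location of the checksum test's final SEC is not given in this thread, so `sec_offset` is a caller-supplied, hypothetical value that has to be found in a disassembly of the specific OS revision. The function name is made up for the example.

```python
FP_START, FP_END = 0xD800, 0xDFFF   # FP package range named in the post
ROM_BASE = 0xC000                   # assumption: 16K image, first byte at $C000
SEC, CLC = 0x38, 0x18               # 6502 opcodes quoted in the post

def strip_fp(rom, sec_offset=None):
    """Return a copy of the OS image with the FP area filled with $FF.

    sec_offset is the file offset of the SEC ending the checksum test.
    It differs per OS revision, so the caller must supply it.
    """
    if len(rom) != 0x4000:
        raise ValueError("expected a 16K OS image")
    patched = bytearray(rom)
    lo, hi = FP_START - ROM_BASE, FP_END - ROM_BASE
    patched[lo:hi + 1] = b"\xFF" * (hi - lo + 1)
    if sec_offset is not None:
        if patched[sec_offset] != SEC:
            raise ValueError("no SEC opcode at the given offset")
        patched[sec_offset] = CLC   # checksum failure is now ignored
    return bytes(patched)

# Demo on a synthetic all-zero image; 0x3000 is a made-up SEC location.
demo = bytearray(0x4000)
demo[0x3000] = SEC
out = strip_fp(bytes(demo), sec_offset=0x3000)
print(hex(out[0x1800]), hex(out[0x3000]))
```

The check for $38 before patching is there so a wrong offset fails loudly instead of silently corrupting some other instruction.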

 

I can see why Atari placed the FP package in the OS since there was space available, and this a) makes the FP package accessible by other applications, and b) allows the BASIC cartridge to be 8K instead of 10K. And that is the only reason the BASIC cartridge is 8K and not 10K. Altirra BASIC is superb, and is my default drop-in replacement for Atari BASIC. It's even clear that it's a marvel of conciseness, given the features Avery managed to pack into it. But it still relies on 2K of FP code, and since it relies on it at all, it relies on it 'entirely'. Maybe if Avery wrote a 10K version of Altirra BASIC with the FP code built in, it would come in at a little under 10K... more than likely. Whatever.

 


15 hours ago, Faicuai said:

 

As mentioned above, Altirra Basic 1.56 was used as the interpreter of choice, which is an 8K drop-in ROM replacement for Atari Basic.

The reason I ask is BM2 time. Apple II running Applesoft is 8.5 seconds, while a stock XL is 7.3.

 

Altirra is 1.9?

 

This suggests that the Altirra version is using some sort of fast GOTO like Turbo or BASIC XL? Does anyone know?

Edited by Maury Markowitz

46 minutes ago, Maury Markowitz said:

The reason I ask is BM2 time. Apple II running Applesoft is 8.5 seconds, while a stock XL is 7.3.

 

Altirra is 1.9?

 

This suggests that the Altirra version is using some sort of fast GOTO like Turbo or BASIC XL? Does anyone know?

Ask @phaeron - it’s his baby after all. 


14 hours ago, Faicuai said:

you can of course have a Basic Package relying entirely on integer computations, if anyone really wishes so...

A more interesting question, IMHO, is whether you can have improvements if you have both and still stay within a size limit.

 

MS BASIC 1.1 and on had the integer variable type, I%. However, this was used only to save RAM. If you did I%=I%+1 it would convert I% to float, add 1, and then INT the result to put it back in I%. So this was actually slower than using FP, although only a tiny amount.

 

Obviously, you could save some serious cycles if you had an inline 16-bit package that performed the math in 16-bit, not just stored it that way. However, deciding whether or not you can use the int math may be complex. Consider:

 

I%=I%+1

I%=A+1

I=A%+1

 

Of course we can see that the first is int, the second is a float+int and the third is float. But the instructions needed to determine this at run time would slow things down. Perhaps to the point where you should just do everything in FP.

 

But...

 

Atari BASIC has a number of unused tokens. Among these are two constant indicators, $10 and $11. It would be easy to modify the tokenizer to look at the format of the constant and put in a $10 if it is an integer, followed by 16 bits of value. This immediately saves 4 bytes of RAM for every int constant in your app (3 bytes instead of the 7 taken by the $0E token plus its 6-byte BCD value) - and I suspect those represent >>90% of the constants in a typical program.

 

One might also note that the vast majority of constants are small, -1 through 10 likely cover 95% of all the non-line-number related constants in your program. In this case one might consider using $11 as a 'small int' type that uses only one signed byte. This would require only two bytes more code in the decoder, a branch past the code that copies the high-order byte into the FP operand.
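A minimal sketch of how the two proposed constant forms could be encoded and decoded. The token values $10 and $11 follow the proposal above; everything else is an assumption made for illustration: the function names, little-endian byte order, and treating the 16-bit form as signed. Constants that do not fit return None, meaning the tokenizer would keep the stock $0E floating-point form.

```python
NUM_INT16, NUM_INT8 = 0x10, 0x11   # proposed tokens; byte layout is assumed

def encode_int_constant(value):
    """Encode an integer constant, or return None to fall back to $0E + FP."""
    if -128 <= value <= 127:
        return bytes([NUM_INT8]) + value.to_bytes(1, "little", signed=True)
    if -32768 <= value <= 32767:
        return bytes([NUM_INT16]) + value.to_bytes(2, "little", signed=True)
    return None

def decode_int_constant(tokens, pos):
    """Return (value, next_pos) for a $10 or $11 constant at tokens[pos]."""
    kind = tokens[pos]
    if kind == NUM_INT8:
        raw = tokens[pos + 1:pos + 2]
        return int.from_bytes(raw, "little", signed=True), pos + 2
    if kind == NUM_INT16:
        raw = tokens[pos + 1:pos + 3]
        return int.from_bytes(raw, "little", signed=True), pos + 3
    raise ValueError("not an integer constant token")

for n in (10, -1, 1000, 100000):
    print(n, encode_int_constant(n))
```

In a real 6502 decoder the small-int case would sign-extend into the high byte rather than call a library routine; the sketch only shows the token layout.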

 

Likewise there is ample room in the variable table's type map for an integer variable type. $60 seems like an obvious choice. One does not need to make any other change: if you're willing to burn the four unused bytes of the 6-byte value slot, then the rest of the variable handling works as-is. But one might also consider having two storage tables, one for FP and one for int, in the same way there is another storage system for handling strings.

 

To make this work, some minor changes also have to be made to the instructions that read the variables and constants and load them into the FP "registers" in zero page. But this is a few lines of code. At that point you have saved a bunch of memory in almost every program.

 

So then the last bit. While deciding if a set of instructions is int or FP at runtime would kill you, doing so at tokenizing time would not. It could even be done as a separate pass, first one tokenizes the line as normal, although using the $10 and $60 as above.

 

Then you take a second pass at it and see if every operand is an int. If so, you replace the math instructions with versions that do the calculations inline. So that means you have two sets of math tokens, for instance, FP+ and int+. You don't have a full set of the int instructions, only the most used ones like + and -, and maybe a few others like INT(), which becomes a no-op.

 

So in this second pass, if all the operands are int, and all the operators have int versions, you update the operator tokens. There's no need to actually parse the formulas again or do this during the original token run, you simply have a list of tokens that match int and ensure all the constants and vars have int flags on them.
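A rough sketch of that second pass, run on the three example statements from earlier. The token representation (tuples with an int flag) and the names "int=", "int+" and "int-" are hypothetical stand-ins chosen for the example; a real tokenizer would work on the byte stream. The assignment target is counted as an operand, which is what makes I=A%+1 stay floating point.

```python
# Hypothetical operator names: only these operators have inline int versions.
INT_VERSIONS = {"=": "int=", "+": "int+", "-": "int-"}

def second_pass(tokens):
    """tokens: list of (kind, value, is_int); kind is 'var', 'const' or 'op'."""
    operands_int = all(is_int for kind, _, is_int in tokens if kind != "op")
    ops_ok = all(value in INT_VERSIONS
                 for kind, value, _ in tokens if kind == "op")
    if not (operands_int and ops_ok):
        return list(tokens)              # leave the FP operators alone
    return [(kind, INT_VERSIONS[value], is_int) if kind == "op"
            else (kind, value, is_int)
            for kind, value, is_int in tokens]

def stmt(target, target_int, source, source_int):
    """Build tokens for: target = source + 1."""
    return [("var", target, target_int), ("op", "=", None),
            ("var", source, source_int), ("op", "+", None),
            ("const", 1, True)]

line1 = stmt("I%", True, "I%", True)     # I%=I%+1
line2 = stmt("I%", True, "A", False)     # I%=A+1
line3 = stmt("I", False, "A%", True)     # I=A%+1

for line in (line1, line2, line3):
    print([value for kind, value, _ in second_pass(line) if kind == "op"])
```

Only the first statement gets its operators swapped; the other two keep the FP tokens because one operand lacks the int flag.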

 

More importantly, there should be two versions of FOR/NEXT and GOTO/GOSUB based on the same logic. In the FOR/NEXT case you can inline all the steps, which should result in a noticeable speed increase. For GOTO/GOSUB there is slightly more complexity, the "int versions" of these could work like FOR/NEXT. But much more interesting, instead of an "int only" version of these, you might consider the "constant only" version which would skip any parsing of the line and just read the value and go. Since the use of GOTO A*100 is relatively rare, this could offer a significant boost for the much more common GOTO 100 as it would skip the (admittedly cheap) INT() on the constant.

 

One might go so far as to make the "constant version" even a little smarter. If the constant following the GOTO is a $0E or $10, that means it's a line number that has to be looked up. For a few extra lines of code, one could replace that value with a $11 constant and follow that with the 16-bit memory address that is returned from the line-number lookup. That would offer the same sort of boost that you see in TURBO or BASICXL, but would not require a separate lookup table (TURBO) or pre-parse (XL), thus saving even more time and memory.
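The patch-on-first-use idea can be sketched as follows. The class, the function names and the use of a list index as the "address" are all assumptions made to keep the example small; the token values follow the proposal above.

```python
LINE_CONST, ADDR_CONST = 0x10, 0x11   # token values follow the proposal above

class Program:
    """Toy stand-in for a tokenized program: a sorted list of line numbers."""

    def __init__(self, line_numbers):
        self.lines = sorted(line_numbers)
        self.searches = 0             # how many line-number lookups were done

    def find_line(self, number):
        """Linear search from the top of the program."""
        self.searches += 1
        for address, candidate in enumerate(self.lines):
            if candidate == number:
                return address        # list index plays the role of an address
        raise LookupError(f"line {number} not found")

def run_goto(program, operand):
    """operand is the mutable [token, value] pair that follows the GOTO."""
    if operand[0] == LINE_CONST:
        operand[1] = program.find_line(operand[1])   # slow path, first visit
        operand[0] = ADDR_CONST                      # patch the token in place
    return operand[1]                                # where to jump

prog = Program([10, 20, 30, 100])
goto_100 = [LINE_CONST, 100]
for _ in range(5):
    run_goto(prog, goto_100)
print(prog.searches, goto_100)
```

Five executions cost one search. The sketch leaves out one thing a real implementation would need: the patched addresses have to be converted back to line numbers whenever the program is edited or listed.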

 

 
