Maury Markowitz Posted June 21, 2019

A couple of weeks ago I came across a 1970s BASIC benchmark suite I had not heard of; it was mentioned here in the forums. So I wrote this: https://en.wikipedia.org/wiki/Rugg/Feldman_benchmarks

Is anyone willing to run these on an NTSC machine in Atari BASIC and TURBO? Others would be nice too, but I'm especially curious about TURBO on the 8th test.

It's interesting that I did not hear of these tests at the time. They are much more useful than either Ahl or Sieve, and pre-date both.
Faicuai Posted June 21, 2019

As a matter of destiny (?), just a few weeks back I worked on compiling the entire suite into a SINGLE set that can be run sequentially (with a small penalty associated with BASIC line-search overhead, etc.). I actually generated two sets: one that can be run as-is in Atari BASIC and derivatives, and the other for FastBasic v4.0, where I discovered a few bugs...

Watch how the 800 / Incognito runs the suite, with commanding authority (Altirra Basic 1.56, Fast Basic 4.x, Altirra High-Performance FP pack, and ultra-high Performance FP pack for OS-XL):

First, on Altirra's FP pack:

Then on XL-OS High-Performance FP pack:

NOTES: Since FastBasic does NOT support arrays of reals (FP), these were first converted to STRINGS and then stored as such.

Once I transfer the .FBA and .BAS sources into an .ATR, I will post it here, so everyone can download and play with them on their end with a finished product (thinking about JamesD! :-))) (==> UPDATE: ATTACHED !!!)

Scratchpad-DOS-90K-II.ATR

Have fun!

Edited June 21, 2019 by Faicuai (typos, .ATR attachment)
Rybags Posted June 22, 2019

A PAL machine should get a slightly faster result.

The comparative speeds, especially vs the C64, just go to show the inefficiency of Atari's BASIC and FP implementation. Floating point by BCD (see also the current thread elsewhere) is just plain slow to begin with. Atari BASIC suffers badly from that and from its slow line search, which penalises it any time a program branch occurs, including loops and subroutine returns.

Going by net CPU speed, you should expect it to be about 40% faster than the C64 and about 30% slower than the BBC (2 MHz and supposedly no DMA penalties).

Edited June 22, 2019 by Rybags
Faicuai Posted June 22, 2019

Here's the Atari pulling out a net 1.7x performance gain over the C64 (Vice-64), and that is with XL OS overhead still active, and NO, not in BASIC, but... in CC65 (!)

Below are TEN (10) iterations of SIEVE, run on both machines, from the exact same code, and with the exact same compile optimizations. The Atari is almost at ONE SECOND (flat) per Sieve iteration (!!!)

Here are the sources and respective executables, for anyone to play with (.xex is the Atari executable, .prg is the C64's):

sieveBYT.c
sieveBYT.prg
sieveBYT.xex

Compiling commands:

Atari:
    cl65 --static-locals -o %1.xex -t atari -I include -L lib --asm-include-dir asminc --cfg-path cfg -Oir %1.c

C64:
    cl65 --static-locals -o %1.prg -t c64 -I include -L lib --asm-include-dir asminc --cfg-path cfg -Oirs %1.c

Edited June 22, 2019 by Faicuai
JamesD Posted June 22, 2019

22 minutes ago, Faicuai said:
Here's the Atari pulling out a net 1.7x performance gain over the C64 (Vice-64), and that is with XL OS overhead still active, and NO, not in BASIC, but... in CC65 (!) ...

Hmmm... it's almost like the Atari has a 1.7x advantage in clock speed.
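The clock-speed ratio behind that remark can be checked with one line of arithmetic. The sketch below assumes the commonly quoted NTSC CPU clocks (about 1.79 MHz for the Atari 8-bit, about 1.02 MHz for the C64); those figures are not stated in the thread, and the calculation ignores cycles lost to ANTIC or VIC-II DMA.

```python
# Nominal NTSC CPU clocks in Hz. These values are assumptions taken from
# commonly quoted specifications, not from the posts in this thread.
ATARI_NTSC_HZ = 1_789_773
C64_NTSC_HZ = 1_022_727


def clock_ratio(fast_hz: int, slow_hz: int) -> float:
    """Return how many times faster the first clock is than the second."""
    return fast_hz / slow_hz


ratio = clock_ratio(ATARI_NTSC_HZ, C64_NTSC_HZ)
print(f"Atari/C64 raw clock ratio: {ratio:.2f}")
```

Under these assumptions the raw ratio comes out close to the 1.7x measured with the compiled Sieve, which is consistent with the test being bound by CPU clock rather than by the language runtime.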
+bob1200xl Posted June 22, 2019

How do I run the .XEX file from APE?

Bob
Faicuai Posted June 22, 2019

13 minutes ago, bob1200xl said:
How do I run the .XEX file from APE? Bob

You will be better off running it from the DOS 2.X, MyDos or SDX prompt. Here's an .ATR with the .XEX, so you can proceed accordingly.

If you run it on the 1200XL, turn off the screen / ANTIC with the 1200XL's OS-enabled function keys for maximum execution speed.

Scratchpad-DOS-90K-II.ATR

Edited June 22, 2019 by Faicuai
Faicuai Posted June 22, 2019

Here's the performance summary of the BASIC benchmark suite (interpreter-level only):

1. Atari: 61.449 secs (Integer tests), 102.1498 secs (Integer + FP tests)
2. BBC Micro: 114.6 secs (Integer + FP)
3. Apple II: 74.1 secs (Integer), 192.6 secs (Integer + FP)
4. Z80 2 MHz: 166.3 secs (Integer)

Source code is attached to prior posts...
Faicuai Posted June 22, 2019

1 hour ago, JamesD said:
Hmmm... it's almost like the Atari has a 1.7x advantage in clock speed.

But that has been said already in this thread. Only now we can see it confirmed with a very efficient cross-platform compiler (CC65), instead of stupid-ass, inefficient BASIC packages.

In this sense (and this is what really matters for those who are reading carefully), the super-nice BBC Micro will hardly ever pull out a +30% gain over the Atari, assuming ANTIC is out of the way, at least on these particular types of tasks... unless its core processor turns out to be inherently different from the 6502.

Edited June 22, 2019 by Faicuai
+bob1200xl Posted June 22, 2019

If you run the benchmark suite on a 14 MHz machine, there is only a minor improvement because the code runs mostly in the cartridge. The cart has to run at 1.79 MHz. But if you run the sieve, you are using the FP code, which can run in RAM at 14 MHz. Like this:
Faicuai Posted June 22, 2019

4 minutes ago, bob1200xl said:
If you run the benchmark suite on a 14 MHz machine, there is only a minor improvement because the code runs mostly in the cartridge. The cart has to run at 1.79 MHz. But if you run the sieve, you are using the FP code, which can run in RAM at 14 MHz. Like this:

You should be getting 1899 primes... not 1674... Mhmmm...
+bob1200xl Posted June 22, 2019

Yes, you are correct. The 7 MHz machine gets 1899 primes in 3.700 seconds. The hardware is very similar except for the memory wrap around $10000. The 14 MHz machine wraps to $10000 as it should, while the 7 MHz machine does not. Both have 65816s.

If I run the OS in ROM, I get 1899 and a tiny bit slower at 14 MHz. Like this:
Faicuai Posted June 23, 2019

17 hours ago, bob1200xl said:
Yes, you are correct. The 7 MHz machine gets 1899 primes in 3.700 seconds. The hardware is very similar except for the memory wrap around $10000. The 14 MHz machine wraps to $10000 as it should, while the 7 MHz machine does not. Both have 65816s. If I run the OS in ROM, I get 1899 and a tiny bit slower at 14 MHz. Like this:

Nice!!!

You are now in Moto 68000 territory... feel free to check against that CPU's results...

Also (and at the very least) close to the 8086, too...
Maury Markowitz (Author) Posted June 25, 2019

OMG, nice work! Any chance of running under stock TURBO? Or did you do that and I missed it?
Maury Markowitz (Author) Posted June 25, 2019

On 6/21/2019 at 9:31 PM, Rybags said:
A PAL machine should get a slightly faster result. The comparative speeds, especially vs the C64, just go to show the inefficiency of Atari's BASIC and FP implementation. Floating point by BCD (see also the current thread elsewhere) is just plain slow to begin with. Atari BASIC suffers badly from that and from its slow line search, which penalises it any time a program branch occurs, including loops and subroutine returns. Going by net CPU speed, you should expect it to be about 40% faster than the C64 and about 30% slower than the BBC (2 MHz and supposedly no DMA penalties).

Why would PAL be faster? Less memory contention?

BTW, how did TURBO fix that problem? Did it build a list of GOTO targets using linenum/address?
Maury Markowitz (Author) Posted June 25, 2019

On 6/22/2019 at 1:45 PM, Faicuai said:
Here's the performance summary of the BASIC benchmark suite (interpreter-level only):
1. Atari: 61.449 secs (Integer tests), 102.1498 secs (Integer + FP tests)
2. BBC Micro: 114.6 secs (Integer + FP)
3. Apple II: 74.1 secs (Integer), 192.6 secs (Integer + FP)
4. Z80 2 MHz: 166.3 secs (Integer)
Source code is attached to prior posts...

Which BASIC is this on the Atari? I don't think that's stock?
Faicuai Posted June 25, 2019

2 hours ago, Maury Markowitz said:
Which BASIC is this on the Atari? I don't think that's stock?

As mentioned above, Altirra Basic 1.56 was used as the interpreter of choice, which is an 8K drop-in ROM replacement for Atari BASIC.

BTW, now that you mention this, if my recollection is correct, I believe NONE of the non-Atari BASIC interpreters used by the machines I listed above actually fit in 8K, except (maybe) Apple Integer? It would be worth confirming this closely because, if that is the case, the results shown here would place the Atari in a class of its own, plain and simple.

Cheers!

Edited June 25, 2019 by Faicuai
flashjazzcat Posted June 25, 2019

Altirra BASIC is essentially a 10K BASIC just like Atari BASIC, since both depend on the 2K OS FP package.
Faicuai Posted June 25, 2019

8 minutes ago, flashjazzcat said:
Altirra BASIC is essentially a 10K BASIC just like Atari BASIC, since both depend on the 2K OS FP package.

NO. It is quite different from, for example, Microsoft Basic or Turbo Basic, which come with THEIR own FP package; once they are removed, that FP package is gone from the machine.

Edited June 25, 2019 by Faicuai
flashjazzcat Posted June 25, 2019

But FP is used. Do you think that by avoiding explicit FP calculations, you avoid calling the FP ROM entirely? Does your program have no line numbers? By this illusory logic, an MS BASIC program with no FP math makes MS BASIC 8K.

Edited June 25, 2019 by flashjazzcat
Faicuai Posted June 25, 2019

28 minutes ago, flashjazzcat said:
But FP is used. Do you think that by avoiding explicit FP calculations, you avoid calling the FP ROM entirely? Does your program have no line numbers? By this illusory logic, an MS BASIC program with no FP math makes MS BASIC 8K.

The BBC Micro runs its OS and BASIC in a THIRTY-TWO (32) KByte ROM package. We do not need to discuss anything else here, other than hypothetically allowing Atari BASIC to be implemented in the same ROM space, for instance. That is a discussion we can certainly have.

The Atari ROM package was designed and conceived under a different vision, and you can of course have a BASIC package relying entirely on integer computations, if anyone really wishes so... In the case of Atari BASIC, they decided to over-rely on the FP package for the sake of space, but that is a decision pertaining to that particular implementation / model.

You don't need to resort to distorting rhetoric in an attempt to disqualify an irreducible and unquestionable point: more ROM space can easily lead to better performance and extra optimizations (which the BBC Micro exploits very well). In the above results summary, I already managed to extract higher speed from the Atari while sticking to 16+8 KB of ROM space.

Now, if you can prove that one of the two ROM banks on the BBC is half empty, that is a different story, though...

Edited June 25, 2019 by Faicuai
flashjazzcat Posted June 26, 2019

No-one would argue with the assertion that more ROM space can allow for performance and optimisation improvements. But you appeared to be under the impression that the Atari interpreters - unlike 'the non-Atari Basic Interpreters used by the machines I listed above' - are operable within 8K. I point out that neither Atari BASIC, nor your 'interpreter of choice' Altirra BASIC, will work without the 2K OS FP ROM. Both BASICs are essentially 10K implementations, then, whether you like it or not.

Try it for yourself: take an XL/XE OS, fill $D800-$DFFF with $FF, and replace the 'SEC' ($38) instruction at the end of the OS checksum test with 'CLC' ($18) so that the bad checksum is overlooked. You'll see the OS boots just fine, but Atari BASIC and Altirra BASIC crash before they even get a 'READY' prompt on the screen. If you can demonstrate either BASIC being operational with such a patched, FP-less OS, that's a different story, of course.

I can see why Atari placed the FP package in the OS since there was space available, and this a) makes the FP package accessible by other applications, and b) allows the BASIC cartridge to be 8K instead of 10K. And that is the only reason the BASIC cartridge is 8K and not 10K.

Altirra BASIC is superb, and is my default drop-in replacement for Atari BASIC. It's even clear that it's a marvel of conciseness, given the features Avery managed to pack into it. But it still relies on 2K of FP code, and since it relies on it at all, it relies on it 'entirely'. Maybe if Avery wrote a 10K version of Altirra BASIC with the FP code built in, it would come in at a little under 10K... more than likely. Whatever.
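The patch procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes a 16K XL/XE OS image whose first byte maps to $C000 (so $D800 sits at file offset $1800), and it takes the offset of the checksum test's SEC instruction as a parameter, because the post does not say where that instruction lives. The name `patch_os_image` and the demo offset are hypothetical.

```python
FP_START = 0xD800      # first address of the OS floating-point package
FP_END = 0xDFFF        # last address of the OS floating-point package
ROM_BASE = 0xC000      # assumption: the 16K OS image starts at $C000
ROM_SIZE = 0x4000      # assumption: 16K XL/XE OS image
SEC, CLC = 0x38, 0x18  # 6502 opcodes named in the post


def patch_os_image(image: bytes, sec_offset: int) -> bytes:
    """Blank the FP package and neutralise the checksum failure path.

    sec_offset is the file offset of the SEC instruction at the end of
    the checksum test; the caller must supply it for their OS revision.
    """
    if len(image) != ROM_SIZE:
        raise ValueError("expected a 16K OS image")
    if image[sec_offset] != SEC:
        raise ValueError("no SEC opcode at the given offset")

    patched = bytearray(image)
    start = FP_START - ROM_BASE
    end = FP_END - ROM_BASE + 1
    patched[start:end] = b"\xFF" * (end - start)  # fill $D800-$DFFF with $FF
    patched[sec_offset] = CLC                     # bad checksum is overlooked
    return bytes(patched)


# Demo on a dummy all-zero image with a made-up SEC location.
DEMO_SEC_OFFSET = 0x0123  # hypothetical offset, for illustration only
demo_image = bytearray(ROM_SIZE)
demo_image[DEMO_SEC_OFFSET] = SEC
patched_demo = patch_os_image(bytes(demo_image), DEMO_SEC_OFFSET)
```

The function refuses to patch when the byte at the supplied offset is not $38, which guards against applying the change to the wrong OS revision.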
Maury Markowitz (Author) Posted June 26, 2019

15 hours ago, Faicuai said:
As mentioned above, Altirra Basic 1.56 was used as the interpreter of choice, which is an 8K drop-in ROM replacement for Atari BASIC.

The reason I ask is the BM2 time. An Apple II running Applesoft takes 8.5 seconds, while a stock XL takes 7.3. Altirra is 1.9? This suggests that the Altirra version is using some sort of fast GOTO like Turbo or BASIC XL. Does anyone know?

Edited June 26, 2019 by Maury Markowitz
+DrVenkman Posted June 26, 2019

46 minutes ago, Maury Markowitz said:
The reason I ask is the BM2 time. An Apple II running Applesoft takes 8.5 seconds, while a stock XL takes 7.3. Altirra is 1.9? This suggests that the Altirra version is using some sort of fast GOTO like Turbo or BASIC XL. Does anyone know?

Ask @phaeron - it's his baby after all.
Maury Markowitz (Author) Posted June 26, 2019

14 hours ago, Faicuai said:
you can of course have a Basic Package relying entirely on integer computations, if anyone really wishes so...

A more interesting question, IMHO, is whether you can have improvements if you have both and still stay within a size limit.

MS BASIC 1.1 and on had the integer variable type, I%. However, this was used only to save RAM. If you did I%=I%+1 it would convert I% to float, add 1, and then INT the result to put it back in I%. So this was actually slower than using FP, although only by a tiny amount.

Obviously, you could save some serious cycles if you had an inline 16-bit package that performed the math in 16-bit, not just stored it that way. However, deciding whether or not you can use the int math may be complex. Consider:

I%=I%+1
I%=A+1
I=A%+1

Of course we can see that the first is int, the second is a float+int and the third is float. But the instructions needed to determine this at run time would slow things down, perhaps to the point where you should just do everything in FP.

But... Atari BASIC has a number of unused tokens. Among these are two constant indicators, $10 and $11. It would be easy to modify the tokenizer to look at the format of the constant and put in a $10 if it is an integer, followed by 16 bits of value. This immediately saves 3 bytes of RAM for every int constant in your app, and I suspect those represent >>90% of the constants in a typical program.

One might also note that the vast majority of constants are small; -1 through 10 likely cover 95% of all the non-line-number related constants in your program. In this case one might consider using $11 as a 'small int' type that uses only one signed byte. This would require only two bytes more code in the decoder, a branch past the code that copies the high-order byte into the FP operand.

Likewise there is ample room in the variable table's type map for an integer variable type.
$60 seems like an obvious choice. One does not need to make any other change; if you're willing to burn the three bytes in the value storage area, then the rest of the variable handling works as-is. But one might also consider having two storage tables, one for FP and one for int, in the same way there is another storage system for handling strings.

To make this work, some minor changes also have to be made to the instructions that read the variables and constants and load them into the FP "registers" in zero page. But this is a few lines of code. At that point you have saved a bunch of memory in almost every program.

So then the last bit. While deciding if a set of instructions is int or FP at runtime would kill you, doing so at tokenizing time would not. It could even be done as a separate pass: first one tokenizes the line as normal, although using the $10 and $60 as above. Then you take a second pass at it and see if every operand is an int. If so, you replace the math instructions with versions that do the calculations inline.

So that means you have two sets of math tokens, for instance, FP+ and int+. You don't have a full set of the int instructions, only the most used ones like + and -, and maybe a few others like INT(), which becomes a no-op. So in this second pass, if all the operands are int, and all the operators have int versions, you update the operator tokens. There's no need to actually parse the formulas again or do this during the original token run; you simply have a list of tokens that match int and ensure all the constants and vars have int flags on them.

More importantly, there should be two versions of FOR/NEXT and GOTO/GOSUB based on the same logic. In the FOR/NEXT case you can inline all the steps, which should result in a noticeable speed increase. For GOTO/GOSUB there is slightly more complexity; the "int versions" of these could work like FOR/NEXT.
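The second pass described above can be sketched as follows. This is a toy model, not Atari BASIC's real token stream: the `Token` class, the `tag` helper and the set of operators with integer variants are all hypothetical, chosen only to show the "promote only if every operand is int and every operator has an int version" rule.

```python
from dataclasses import dataclass

OPERATORS = {"+", "-", "*", "/", "="}
# Operators assumed to have an inline integer variant (hypothetical set).
INT_CAPABLE_OPS = {"+", "-", "="}


@dataclass(frozen=True)
class Token:
    kind: str             # "const", "var" or "op"
    text: str             # source text, e.g. "1", "I%", "+"
    is_int: bool = False  # int flag set at tokenizing time


def tag(text: str) -> Token:
    """First pass: classify one lexeme and set its int flag."""
    if text in OPERATORS:
        return Token("op", text)
    if text[0].isalpha():
        return Token("var", text, is_int=text.endswith("%"))
    return Token("const", text, is_int=text.isdigit())


def promote_to_int(tokens: list[Token]) -> list[Token]:
    """Second pass: switch operators to int versions when it is safe."""
    operands_int = all(t.is_int for t in tokens if t.kind != "op")
    ops_have_int = all(t.text in INT_CAPABLE_OPS
                       for t in tokens if t.kind == "op")
    if not (operands_int and ops_have_int):
        return list(tokens)  # leave the FP operators untouched
    return [Token("op", t.text, is_int=True) if t.kind == "op" else t
            for t in tokens]


def uses_int_math(lexemes: list[str]) -> bool:
    """True if every operator in the statement was promoted."""
    promoted = promote_to_int([tag(x) for x in lexemes])
    return all(t.is_int for t in promoted if t.kind == "op")


print(uses_int_math(["I%", "=", "I%", "+", "1"]))  # all-int statement
print(uses_int_math(["I%", "=", "A", "+", "1"]))   # mixes in a float var
print(uses_int_math(["I", "=", "A%", "+", "1"]))   # float destination
```

Because the decision is made once per line at entry time, the runtime never inspects operand types; it just executes whichever operator token it finds.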
But much more interesting, instead of an "int only" version of these, you might consider the "constant only" version, which would skip any parsing of the line and just read the value and go. Since the use of GOTO A*100 is relatively rare, this could offer a significant boost for the much more common GOTO 100, as it would skip the (admittedly cheap) INT() on the constant.

One might go so far as to make the "constant version" even a little smarter. If the constant following the GOTO is a $0E or $10, that means it's a line number that has to be looked up. For a few extra lines of code, one could replace that value with a $11 constant and follow that with the 16-bit memory address returned by the line-number lookup. That would offer the same sort of boost that you see in TURBO or BASIC XL, but would not require a separate lookup table (TURBO) or pre-parse (XL), thus saving even more time and memory.
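The address-caching idea in the last paragraph can be sketched like this. It is a toy model under stated assumptions: the `Program` class, its linear `find_line` search and the two-element operand list are hypothetical stand-ins for the interpreter's statement table and token stream; only the token values $0E and $11 come from the post.

```python
LINE_CONST = 0x0E  # per the post: constant holding a line number to look up
ADDR_CONST = 0x11  # per the post's proposal: constant holding a cached address


class Program:
    """Toy statement table mapping line numbers to memory addresses."""

    def __init__(self, line_addresses: dict[int, int]):
        self.lines = sorted(line_addresses.items())
        self.lookups = 0  # counts how often the slow search runs

    def find_line(self, number: int) -> int:
        """Linear search from the top, like a simple interpreter."""
        self.lookups += 1
        for line, address in self.lines:
            if line == number:
                return address
        raise KeyError(f"line {number} not found")


def goto_target(program: Program, operand: list[int]) -> int:
    """Resolve a GOTO operand [token, value], caching the address in place."""
    token, value = operand
    if token == ADDR_CONST:
        return value  # fast path: address was cached on an earlier pass
    address = program.find_line(value)
    operand[0], operand[1] = ADDR_CONST, address  # rewrite the token stream
    return address


program = Program({10: 0x2000, 20: 0x2010, 100: 0x2040})
goto_operand = [LINE_CONST, 100]
first = goto_target(program, goto_operand)
second = goto_target(program, goto_operand)
```

One caveat this sketch does not handle: cached addresses go stale as soon as a line is inserted or deleted, so a real implementation would presumably need to restore the line-number form whenever the program is edited or listed.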