
Performance test (Atari BASIC vs. CP/M): Surprising results...


Faicuai

Well, it is no secret that among the latest waves of ALL sorts of developments for our favorite 8-bit computing platform, we have among those a new arsenal of BASIC interpreters and compilers...

 

So I momentarily focused on raw INTEGER handling performance, with an extremely simple test that can be typed in and executed immediately, and that needs only one-byte arguments (in essence) to run:

 

POKE 559,0: FOR i=0 to 255: FOR J=0 to 255: NEXT J: NEXT I: POKE 559,34

 

No line-numbers required, no nothing. Just straight type-up and hit ENTER.

 

Here are the results (all on Atari 800 / Incognito, running on high-performance OS ROMs I already posted in a couple of past threads):

 

INTERPRETERS:

 

1. Atari BASIC Rev. C: 82.50 secs

2. Indus CP/M (Microsoft Basic 5.29): 46.75 secs

3. Atari Microsoft Basic II: 41.41 secs

4. Altirra Basic 1.55: 37.70 secs

5. Turbo Basic v1.5: 36.30 secs

 

Now, for the COMPILERS results, here's where we are in for quite a surprise:

 

1. Atari FastBasic v3.5: 09.75 secs

2. Indus CP/M (Microsoft Compiler): 02.10 secs (!)

3. Indus CP/M (CBasic 2.0 Dig. Rsch): 01.10 secs (!!)

 

I know the Indus-GT is running a Z80-A at around 4 MHz... but notice how the Atari fares really well in the interpreted domain... Even in the cross-platform group, Atari's Microsoft Basic does better than CP/M Microsoft Basic (INTERPRETED, again).

 

HOWEVER, as soon as you move into the compiled domain, the CP/M versions just and purely KICK-ASS! The level of optimization of Digital Research's CBasic is impressive, yielding close to NINE (9) times the performance of the fastest Atari Basic compiler we know (the newly introduced FastBasic), and about SEVENTY-FIVE (75) times that of Atari Basic Rev. C (appalling, to say the least)...
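
For anyone who wants to re-check the ratios, here is a small Python sketch that recomputes them from the timings listed above. The dictionary names and the `speedup` helper are just illustrative; the numbers are copied from the two lists.

```python
# Timings in seconds, copied from the interpreter and compiler lists above.
interpreters = {
    "Atari BASIC Rev. C": 82.50,
    "Indus CP/M Microsoft BASIC 5.29": 46.75,
    "Atari Microsoft BASIC II": 41.41,
    "Altirra BASIC 1.55": 37.70,
    "Turbo BASIC 1.5": 36.30,
}
compilers = {
    "FastBasic 3.5": 9.75,
    "Indus CP/M Microsoft compiler": 2.10,
    "Indus CP/M CBasic 2.0": 1.10,
}


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


timings = {**interpreters, **compilers}
baseline = timings["Indus CP/M CBasic 2.0"]

# Ratio of every entry against the fastest result (CBasic), one decimal place.
ratios = {name: round(speedup(t, baseline), 1) for name, t in timings.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:34s} {ratio:5.1f}x the run time of CBasic")
```

Against CBasic, FastBasic comes out just under 9x and Atari BASIC Rev. C at about 75x.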

 

I wonder if FastBasic's 8K-footprint target is what keeps it from further optimizing integer arithmetic / operations, for instance (a footprint that none of these optimized CP/M compilers will ever meet, though)...

 

I will later add the floating point results, which also promise to be head-scratching...

Edited by Faicuai

Please take note that FastBasic is not a "compiled" language. It does not generate 6502 instructions... it is still interpreted by a runtime lib. The parser only generates the bytecodes to be interpreted from a free text listing. The bytecodes are like tokens from Atari BASIC, except that you cannot get the source listing back.
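
To illustrate the difference, here is a toy stack-based bytecode interpreter in Python running the same kind of empty nested loop. The opcodes and the `nested_loop` program are invented for this sketch and are NOT FastBasic's real bytecode; the point is only that every operation goes through a dispatch step in the runtime, which native compiled code avoids.

```python
# Hypothetical opcodes, invented for this sketch (not FastBasic's actual set).
PUSH, STORE, LOAD, ADD, JLE, HALT = range(6)


def run(program):
    """Interpret (opcode, operand) pairs; return the variables and the
    number of bytecodes dispatched."""
    stack, variables = [], {}
    pc = dispatched = 0
    while True:
        op, arg = program[pc]
        pc += 1
        dispatched += 1
        if op == PUSH:
            stack.append(arg)
        elif op == STORE:
            variables[arg] = stack.pop()
        elif op == LOAD:
            stack.append(variables[arg])
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == JLE:
            # Pop the limit, then the value; jump back while value <= limit.
            limit = stack.pop()
            value = stack.pop()
            if value <= limit:
                pc = arg
        elif op == HALT:
            return variables, dispatched


def nested_loop(limit):
    """Bytecode for: FOR I=0 TO limit: FOR J=0 TO limit: NEXT J: NEXT I"""
    return [
        (PUSH, 0), (STORE, "I"),                             # 0-1:   I = 0
        (PUSH, 0), (STORE, "J"),                             # 2-3:   J = 0
        (LOAD, "J"), (PUSH, 1), (ADD, None), (STORE, "J"),   # 4-7:   J = J + 1
        (LOAD, "J"), (PUSH, limit), (JLE, 4),                # 8-10:  NEXT J
        (LOAD, "I"), (PUSH, 1), (ADD, None), (STORE, "I"),   # 11-14: I = I + 1
        (LOAD, "I"), (PUSH, limit), (JLE, 2),                # 15-17: NEXT I
        (HALT, None),                                        # 18
    ]


variables, dispatched = run(nested_loop(255))
print(variables, dispatched)
```

In this toy design each pass through the inner loop costs seven dispatches, so the 256 x 256 loop dispatches several hundred thousand bytecodes before it finishes.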


Bill Wilkinson always took the position that CPUs like the Z80 were much better suited to compilers than the 6502. Probably a lot of folks here would have a good idea of the technical reasons, but IIRC, Bill said it was related to the registers of the Z80 and the lack thereof on the 6502, plus the obvious clock speed differences.

 

I know that I was shocked many years ago when I ran Basic XL against MS Basic on my 4.77 MHz PC and Basic XL won (by a small amount). And as we know now, Basic XL is not exactly a speed demon among our Basics. Probably it would not fare too well today against a 3+ GHz PC! ;)

But there we are -- our 8-bit "time in a bottle."

 

-Larry


6502 doesn't fare well re # of registers and ease of passing parameters via the stack (in the case of some CPUs you can cheaply implement user stacks with registers).

 

68000 is an example of a CPU which does well there - with address registers and plenty of instructions to do moves with pre-decrement or post-increment of the register.

Also 6502 doesn't have a good and quick method of doing software IRQ with parameter passing - BRK is somewhat weak there and rarely used other than in debuggers.


On integers, two versions of Microsoft BASIC running on a 3.5 MHz Z80 and 1 MHz 6502 should be more or less equal. The Atari runs faster, so it should yield better results as you observed. I think when you get into floating point and in particular trigonometry, Microsoft may have optimized and improved their BASIC in different ways for each architecture so you would get bigger differences. Try this benchmark from Creative Computing:

 

10 PRINT "START":K=0
20 K=K+1:A=K^2:B=LOG(K):C=SIN(K)
30 IF K<1000 THEN GOTO 20
40 PRINT "STOP"
The Atari Microsoft BASIC II shouldn't take more than 100 seconds, probably closer to 70. The Microsoft BASIC 5.29 in CP/M might take over 200 seconds, though those are mainly guessed numbers.

Got to be careful with integers. Supposedly the Commodore Basics just convert to FP and back, making them almost pointless. The lesser storage used by the variable becomes meaningless too given that each reference costs an extra byte for the % suffix. Though I suppose for arrays they'd have benefits.

 

With Atari stuff though - I'd be surprised by anything using the stock FP routines that won a contest against another computer. It is really that bad.


Well, oddly enough, your numbers seem quite OFF for CP/M... Here is the summary (again, under Atari800 Incognito, with XEv03-FP-f14-H high performance OS/ROM):

 

1. Indus CP/M, K*K, MSBasic v5.29: 29.85 secs

2. A800i, K*K, MSBasic vII: 34.83 secs

3. A800i, K*K, Altirra Basic v1.55: 38.75 secs

4. Indus CP/M, K^2, MSBasic v5.29: 51.19 secs

5. A800i, K^2, MSBasic vII: 57.95 secs

6. A800i, K^2, Altirra Basic v1.55: 74.35 secs

 

Both the transcendental (K^2) and non-transcendental (K*K) computations of the square are reported above, and all MSBasic versions use the % suffix on ALL integer variables. All results above are from the INTERPRETED domain (no compilers).
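
As a rough illustration of why K^2 costs so much more than K*K: a generic power operator in these old BASICs is commonly implemented through the logarithm and exponential routines, while K*K is a single multiply. The Python sketch below models that under this assumption; `power_via_log` is a hypothetical stand-in, not the actual routine of any of the interpreters tested.

```python
import math


def power_via_log(base, exponent):
    """Model of a generic '^' operator: exp(exponent * ln(base)).
    Only valid for positive bases, like the K values in the benchmark."""
    return math.exp(exponent * math.log(base))


def square_via_multiply(k):
    """The cheap path: one multiplication, no transcendental functions."""
    return k * k


# Compare both paths over the benchmark's range K = 1..1000.
mismatches = [k for k in range(1, 1001)
              if power_via_log(k, 2) != square_via_multiply(k)]
worst_error = max(abs(power_via_log(k, 2) - square_via_multiply(k))
                  for k in range(1, 1001))

print(f"{len(mismatches)} of 1000 results differ, worst error {worst_error:.3g}")
```

Besides being slower, the log/exp path can also return values that are slightly off from the exact square, which is one more reason to write K*K when only a square is needed.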

 

Also, the Atari's 6502 does not run anywhere near 1.0 MHz. It runs at about 1.8 MHz, and the Atari OS carries dormant code from 1983's 1200XL that allows ANTIC to be turned ON and OFF from ANYWHERE, independently of what you are doing, as long as the OS key mapper is not (illegally) bypassed or suppressed. The high-performance ROM used on my A800i includes cycle-exact patching to enable that dormant code, so including ANTIC control in the actual Basic code is optional.


Thanks!!!

 

In this case, it should be made clear that FastBasic is not a true compiler per se, even though its documentation gives the impression that it is. It is WAY, WAY faster than anything else out there for the Atari, but this explains why it lags well behind true CP/M compilers, like CBasic.

 

So our only option left on the table is to code directly in 6502 assembly, resembling as closely as possible the structure of the code that an optimizing compiler would output.

 

Coming right next...

Edited by Faicuai

Well, let's see how the 6502 handles integers, directly in Assembler.

 

Not being a 6502 machine language expert, I decided to dust off my VERY old Atari Assembler manual, plus the 6502 Quick Reference card. In doing so, I wanted to achieve what (by my criteria) would look like well-optimized compiler output:

  1. Fully-parametric code (e.g. gets all of the loops' "FOR: [START] to [END]" parameters from RAM, not hardcoded).
  2. Stores and maintains the runtime values of all loop variables, PER computation, in RAM, not in the CPU's registers.
  3. Code structure allows for indefinite / infinite FOR-NEXT nesting, as long as RAM is available.
  4. Uses or re-uses only ONE CPU register, and NOT the Accumulator, regardless of any given FOR-NEXT nesting depth.
  5. NO hardcoded jumps, so the code can be moved around relatively freely, and anyone who wants to run it on another 6502 machine can do so.
  6. For maximum performance, Page-0 addressing (per the 6502 reference card) is used during memory-address decoding. It seems very little happens in the top half of that page, as far as I could monitor.

After considering the above premises, I came up with the following creature, which I tried as hard as I humanly could to keep from looking like a crude hack-job:

 

[Attached image: 6502 assembly listing, post-29379-0-30003600-1541797716_thumb.jpg]

 

 

When handling nested-loop levels, the right fly-back entry is "NXT_" and when full-cycling loops locally, "CMP_" is the correct re-entry, and in any case, runtime status of loop-variables is kept on RAM in every step of the way.

 

In short, 65,536 additions, 65,536 comparisons, 65,536 Page0-writes, and 768 page0-reads.
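
Since the listing is only available as an image, here is a Python model of the structure described in the premises (NOT a transcription of the attached code). It keeps START, END and the current value of every loop level in a simulated zero page and walks the levels like an odometer, so any nesting depth works. The three-bytes-per-level layout, the base address $80 and the `run_nested` name are assumptions made for this sketch.

```python
def run_nested(ram, base, depth):
    """Run `depth` nested empty FOR-NEXT loops whose parameters live in `ram`.

    Level i uses three consecutive bytes starting at base + 3*i:
    START, END and the current loop variable (hypothetical layout).
    Returns how many times the innermost (empty) body executed.
    """
    # Initialise every loop variable from its START parameter in RAM.
    for level in range(depth):
        slot = base + 3 * level
        ram[slot + 2] = ram[slot]

    iterations = 0
    while True:
        iterations += 1                     # the empty innermost body
        level = depth - 1
        while level >= 0:
            slot = base + 3 * level
            if ram[slot + 2] != ram[slot + 1]:
                # NEXT: bump this level's variable in RAM and loop again.
                ram[slot + 2] = (ram[slot + 2] + 1) & 0xFF
                break
            # This level reached END: reset it and carry to the outer level.
            ram[slot + 2] = ram[slot]
            level -= 1
        else:
            return iterations               # outermost loop finished


zero_page = bytearray(256)
BASE = 0x80                                 # top half of page zero (assumption)
zero_page[BASE:BASE + 6] = bytes([0, 255, 0,    # FOR I = 0 TO 255
                                  0, 255, 0])   # FOR J = 0 TO 255
print(run_nested(zero_page, BASE, depth=2))
```

With both levels set to 0..255 the empty body runs 65,536 times, matching the iteration count quoted above; adding a third level only takes three more parameter bytes.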

 

Now, as for the results? It runs in approx. 550 milliseconds (0.55 secs), time after time. That is almost exactly HALF the run time of CBasic's compiled code (1100 milliseconds, 1.10 secs). That is a 2:1 performance advantage, which, in my opinion, is paltry, as it would only take a similar hand-coded approach on the Z80A to blow that out of the water.
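
To put those two timings in perspective, here is a back-of-the-envelope Python sketch that converts them into clock cycles per inner-loop iteration. The clock figures are assumptions: an NTSC Atari at 1,789,773 Hz with ANTIC DMA off, and the Indus GT's Z80A at a nominal 4 MHz ("around 4 MHz" per the first post).

```python
ITERATIONS = 256 * 256          # FOR I=0 TO 255: FOR J=0 TO 255


def cycles_per_iteration(seconds, clock_hz, iterations=ITERATIONS):
    """Average clock cycles spent per inner-loop iteration."""
    return seconds * clock_hz / iterations


ATARI_NTSC_HZ = 1_789_773       # assumption: NTSC machine, screen DMA disabled
INDUS_Z80_HZ = 4_000_000        # assumption: nominal Z80A clock in the Indus GT

asm_6502 = cycles_per_iteration(0.55, ATARI_NTSC_HZ)
cbasic_z80 = cycles_per_iteration(1.10, INDUS_Z80_HZ)

print(f"6502 assembly: {asm_6502:.1f} cycles per iteration")
print(f"CBasic on Z80: {cbasic_z80:.1f} T-states per iteration")
```

Under these assumptions the hand-written loop spends roughly 15 cycles per iteration, against roughly 67 T-states per iteration for the CBasic output. The two counts are not directly comparable, since Z80 instructions take more clock ticks each than 6502 instructions.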

 

 

Attached is an .ATR image (SDX) which you can open in Altirra or any SDX-equipped machine, for your inspection.

 

Scratchpad-II.atr

 

 

I can only say my HAT's off to those guys at Digital Research... They really nailed some serious optimizations on CBasic, dating all the way back to April, 1983 (which is what is reported on-screen during compiling time).

Edited by Faicuai

Back in the day, I used the CB80 compiler on a 4 MHz TeleVideo system running CBIS Network-OS (hacked by me to support 4 hard disks and also become Network-NZ). This was a full-blown multi-user NZ-COM system. CB80 was AWESOME!

 

I thoroughly enjoyed using that system. CB80 is screaming fast!

 

I haven't looked online yet (the thought just popped into my mind). Does anyone have / know where this is available? I want to run it on my Indus.

 

I used WordStar Non-Document mode to write the source, then saved and compiled it.

 

It's a beautiful thing.


I will get a CP/M image of CBasic80 II for you, if that is what you refer to. I will post it here.


Here you go (ignore one-time binary transfer procedure, if you already know about it, images attached below):

  1. Set your IndusGT drive to SIO ID #1 (Drive 1), and find a BLANK, unused floppy disk.
  2. Load the CBasic II image (CPM-CBasicII-1.atr) on an SIO-attached storage device set to SIO ID #2 (Drive 2).
  3. Load this binary-copy utility image into an SIO storage device on SIO ID #3 or #4 (Sector_COPIERS.ATR).
  4. Select D3 or D4 (as set in #3 above) as your boot image, and TURN OFF the IndusGT.
  5. On Atari 800i, MUST go to BIOS and boot image in #3 (above) in Colleen Mode (52KBytes). In Ultimate1MB, XL/XE OS will work.
  6. Once copy-utility menu shows up, select US Doubler copier (MyCopyr 2.1 will be the title when it loads).
  7. Set "source" to SIO ID Drive 2 (as set in #2 above), and set "Destination" to SIO ID Drive 1 (as set in #1 above).
  8. Set FORMAT ON, and set Verify and FORMAT SKEW to OFF. Start binary transfer by pressing START.
  9. When finished, leave IndusGT ON (with floppy inserted), and reboot Atari back on your preferred DOS.
  10. Attach this image (DOS-CPM_Scratchpad-2.atr) anywhere you want (preferably on HD/SIDE), on either Incognito / Ultimate SIDE or an attached SIO device (if an SIO device, make sure it is attached as D2 or above).
  11. Go to above image, and run CPM-TOOL v5, and select FIX BOOT sector on DRIVE 1 (default). Your IndusGT will spin very shortly.
  12. Exit CPM_Tool, and load TRUBT05C.COM from image above, and press DRIVE+ERROR buttons and voila!

Cheers!

Edited by Faicuai

  • 4 months later...

Faicuai, you really seem to have done a fair bit more with your 800 and Incognito, with OSes etc., than I have (I have not logged in much over the last year). Do you know of a thread on the optimized OS you talk about?

I would love to also get this CP/M going. I also have a couple of Indus drives. One good, one not.

 

But first would love to find out the latest and greatest.. seems like you might be in the know ;)

 

James


Just a point of order. Atari FP uses more significant figures compared to your typical MS or DR BASIC. Slower that way of course but much more accurate. I think it came up for me doing base 85 math and using a pycnometer which are admittedly rare cases.

 

The Master had some things to say about BASIC benchmarks. Still fresh to read it now. https://www.atarimagazines.com/compute/issue57/insight_atari.html


I can't say the results are a surprise to me.
You are comparing modern optimized interpreters against an old Z80 interpreter. Microsoft didn't really optimize any of their interpreters very much, speed-wise.
In the compiler group, you are also comparing a bytecode interpreter against native-code compilers.

As for the clock ratio between the 6502 and the Z80, I've found the Z80 needs around a 2.2 to 1 clock speed advantage to keep up, rather than the 3.5 to 1 mentioned above.
But this varies a LOT depending on what you are doing. For large memory moves, the Z80 stomps on the 6502 due to the special instructions.
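
As a quick sanity check on what those ratios imply, this Python snippet converts a Z80 clock into a "6502-equivalent" clock for each rule of thumb. The helper name is made up for the sketch, and the 4 MHz figure is the nominal Indus GT clock mentioned earlier in the thread.

```python
def equivalent_6502_mhz(z80_mhz, ratio):
    """6502 clock that would roughly match a Z80 at z80_mhz,
    given a Z80:6502 clock ratio rule of thumb."""
    return z80_mhz / ratio


INDUS_Z80_MHZ = 4.0             # nominal clock, per the first post

for ratio in (2.2, 3.5):
    mhz = equivalent_6502_mhz(INDUS_Z80_MHZ, ratio)
    print(f"ratio {ratio}:1 -> about {mhz:.2f} MHz 6502-equivalent")
```

At 2.2 to 1, a 4 MHz Z80 lands at about 1.82 MHz, close to the Atari's own clock. At 3.5 to 1 it would be closer to 1.14 MHz.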
