Performance test (Atari BASIC vs. CP/M): Surprising results...


12 replies to this topic

#1 Faicuai

    Dragonstomper

  • 900 posts
  • Location:Florida, U.S.A.

Posted Thu Nov 8, 2018 4:29 PM

Well, it is no secret that the latest waves of ALL sorts of developments for our favorite 8-bit computing platform include a new arsenal of BASIC interpreters and compilers...

So I momentarily focused on raw INTEGER-handling performance, with an extremely simple test that can be typed in and executed immediately, and that needs only one-byte loop arguments (in essence):

POKE 559,0: FOR I=0 TO 255: FOR J=0 TO 255: NEXT J: NEXT I: POKE 559,34

No line-numbers required, no nothing. Just straight type-up and hit ENTER.

Here are the results (all on Atari 800 / Incognito, running on high-performance OS ROMs I already posted in a couple of past threads):

INTERPRETERS:

1. Atari Basic Rev. C: 82.50 secs
2. Indus CP/M (Microsoft Basic 5.29): 46.75 secs
3. Atari Microsoft Basic II: 41.41 secs
4. Altirra Basic 1.55: 37.70 secs
5. Turbo Basic v1.5: 36.30 secs

Now, for the COMPILERS results, here's where we are in for quite a surprise:

1. Atari FastBasic v3.5: 09.75 secs
2. Indus CP/M (Microsoft Compiler): 02.10 secs (!)
3. Indus CP/M (CBasic 2.0 Dig. Rsch): 01.10 secs (!!)

I know the Indus-GT is running a Z80-A at around 4 MHz... but notice how the Atari fares really well in the interpreted domain... Even in the cross-platform group, Atari's Microsoft Basic does better than CP/M Microsoft Basic (INTERPRETED, again).

HOWEVER, as soon as you move into the compiled domain, the CP/M versions just and purely KICK-ASS! The level of optimization of Digital Research's CBasic is impressive, yielding about NINE (9) times the performance of the fastest Atari Basic compiler we know (the newly introduced FastBasic), and about SEVENTY-FIVE (75) TIMES the performance of Atari Basic Rev. C (appalling, to say the least)...
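
As a quick sanity check on those ratios, here is a minimal Python sketch that uses only the timings listed above. The dictionary keys and the `speedup` helper are illustrative names, not anything from the tested systems.

```python
# Timings in seconds, copied from the results listed in this post.
timings = {
    "Atari Basic Rev. C": 82.50,
    "Indus CP/M Microsoft Basic 5.29": 46.75,
    "Atari Microsoft Basic II": 41.41,
    "Altirra Basic 1.55": 37.70,
    "Turbo Basic 1.5": 36.30,
    "FastBasic 3.5": 9.75,
    "Indus CP/M Microsoft compiler": 2.10,
    "Indus CP/M CBasic 2.0": 1.10,
}

# Inner-loop passes in the test line: I and J each run from 0 to 255.
ITERATIONS = 256 * 256


def speedup(slower: str, faster: str) -> float:
    """Return how many times faster `faster` ran than `slower`."""
    return timings[slower] / timings[faster]


if __name__ == "__main__":
    print(round(speedup("FastBasic 3.5", "Indus CP/M CBasic 2.0"), 2))
    print(round(speedup("Atari Basic Rev. C", "Indus CP/M CBasic 2.0"), 2))
    # Average milliseconds per inner-loop pass for the slowest interpreter.
    print(round(timings["Atari Basic Rev. C"] / ITERATIONS * 1000, 3))
```

Taken at face value, CBasic comes out a little under 9 times faster than FastBasic and about 75 times faster than Atari Basic Rev. C on this particular loop.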

I wonder if FastBasic's 8K-footprint target is what keeps it from further optimizing integer arithmetic / operations, for instance (a footprint that none of these optimized CP/M compilers will ever meet, though)...

I will later add the floating point results, which also promise to be head-scratching...

Edited by Faicuai, Thu Nov 8, 2018 4:32 PM.


#2 Stephen

    Quadrunner

  • 7,220 posts
  • A8 Gear Head
  • Location:No longer in Crakron, Ohio

Posted Thu Nov 8, 2018 4:48 PM

Sweet test idea!  I love messing with the CP/M stuff I have collected since I upgraded my Indus.



#3 vitoco

    Moonsweeper

  • 310 posts

Posted Fri Nov 9, 2018 4:02 AM

Please take note that FastBasic is not a "compiled" language. It does not generate 6502 instructions... it is still interpreted by a runtime lib. The parser only generates the bytecodes to be interpreted from a free text listing. The bytecodes are like tokens from Atari BASIC, except that you cannot get the source listing back.
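
To illustrate the general idea of a runtime that interprets bytecodes (as opposed to emitting native instructions), here is a toy sketch in Python. The opcodes and the `run` function are hypothetical, made up for this example; they are not FastBasic's actual bytecode or runtime.

```python
# Hypothetical opcodes for illustration only; NOT FastBasic's real bytecode.
PUSH, STORE, LOAD, ADD, JUMP_IF_LE, HALT = range(6)


def run(program):
    """Interpret (opcode, operand) pairs; return (variables, dispatch count)."""
    stack, variables, pc, dispatched = [], {}, 0, 0
    while True:
        op, arg = program[pc]
        pc += 1
        dispatched += 1  # every bytecode costs one trip through this loop
        if op == PUSH:
            stack.append(arg)
        elif op == STORE:
            variables[arg] = stack.pop()
        elif op == LOAD:
            stack.append(variables[arg])
        elif op == ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == JUMP_IF_LE:
            limit = stack.pop()
            if stack.pop() <= limit:
                pc = arg
        elif op == HALT:
            return variables, dispatched


# Rough equivalent of: FOR J=0 TO 255: NEXT J
LOOP = [
    (PUSH, 0), (STORE, "J"),                            # J = 0
    (LOAD, "J"), (PUSH, 1), (ADD, None), (STORE, "J"),  # J = J + 1
    (LOAD, "J"), (PUSH, 255), (JUMP_IF_LE, 2),          # repeat while J <= 255
    (HALT, None),
]

if __name__ == "__main__":
    final_vars, count = run(LOOP)
    print(final_vars["J"], count)
```

The point of the sketch is the `dispatched` counter: each bytecode pays for a fetch and a dispatch before any useful work happens, and that is overhead a native-code compiler does not carry.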



#4 Larry

    River Patroller

  • 4,034 posts
  • Location:U.S. -- Midwest

Posted Fri Nov 9, 2018 5:02 AM

Bill Wilkinson always took the position that CPUs like the Z80 were much better suited to compilers than the 6502.  Probably a lot of folks here would have a good idea of the technical reasons, but IIRC, Bill said it was related to the registers of the Z80 and the lack thereof on the 6502, plus the obvious clock speed differences.

 

I know that I was shocked many years ago when I ran Basic XL against MS Basic on my 4.77 MHz PC and Basic XL won (by a small amount).  And as we know now, Basic XL is not exactly a speed demon among our Basics.  Probably it would not fare too well today against a 3+ GHz PC!  ;)

But there we are -- our 8-bit "time in a bottle."

 

-Larry



#5 Rybags

    Quadrunner

  • 15,849 posts
  • Location:Australia

Posted Fri Nov 9, 2018 5:16 AM

The 6502 doesn't fare well regarding the number of registers and the ease of passing parameters via the stack (on some CPUs you can cheaply implement user stacks with registers).

 

68000 is an example of a CPU which does well there - with address registers and plenty of instructions to do moves with pre-decrement or post-increment of the register.

Also 6502 doesn't have a good and quick method of doing software IRQ with parameter passing - BRK is somewhat weak there and rarely used other than in debuggers.



#6 carlsson

    Metagalactic Mule

  • 7,857 posts
  • Location:Västerås, Sweden

Posted Fri Nov 9, 2018 7:21 AM

On integers, two versions of Microsoft BASIC running on a 3.5 MHz Z80 and 1 MHz 6502 should be more or less equal. The Atari runs faster, so it should yield better results as you observed. I think when you get into floating point and in particular trigonometry, Microsoft may have optimized and improved their BASIC in different ways for each architecture so you would get bigger differences. Try this benchmark from Creative Computing:
 
10 PRINT "START":K=0
20 K=K+1:A=K^2:B=LOG(K):C=SIN(K)
30 IF K<1000 THEN GOTO 20
40 PRINT "STOP"

The Atari Microsoft BASIC II shouldn't take more than 100 seconds, probably closer to 70. Microsoft BASIC 5.29 under CP/M might take over 200 seconds, though those numbers are mostly guesses.

#7 Rybags

Posted Fri Nov 9, 2018 7:42 AM

Got to be careful with integers.  Supposedly the Commodore Basics just convert to FP and back, making them almost pointless.  The lesser storage used by the variable becomes meaningless too given that each reference costs an extra byte for the % suffix.  Though I suppose for arrays they'd have benefits.

 

With Atari stuff though - I'd be surprised by anything using the stock FP routines that won a contest against another computer.  It is really that bad.



#8 Faicuai

Posted Fri Nov 9, 2018 1:22 PM

Well, oddly enough, your numbers seem quite OFF for CP/M... Here is the summary (again, under Atari800 Incognito, with XEv03-FP-f14-H high performance OS/ROM):

1. Indus CP/M   K*K   MSBasic v5.29         29.85 secs
2. A800i        K*K   MSBasic vII           34.83 secs
3. A800i        K*K   Altirra Basic v1.55   38.75 secs
4. Indus CP/M   K^2   MSBasic v5.29         51.19 secs
5. A800i        K^2   MSBasic vII           57.95 secs
6. A800i        K^2   Altirra Basic v1.55   74.35 secs

Both the transcendental (K^2, via the power operator) and non-transcendental (K*K, plain multiplication) ways of squaring are reported above, and all MSBasic runs use the % suffix on ALL integer variables. All results above are from the INTERPRETED domain (no compilers).
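
To put a number on the gap between the two ways of squaring, here is a small Python sketch based only on the timings in the summary above. The `power_via_log` function is an assumption about how a ^ operator is commonly evaluated (as exp(y * ln(x))); it is not taken from any of the tested interpreters.

```python
import math

# Timings in seconds, copied from the summary above: (K*K run, K^2 run).
results = {
    "Indus CP/M MSBasic v5.29": (29.85, 51.19),
    "A800i MSBasic vII": (34.83, 57.95),
    "A800i Altirra Basic v1.55": (38.75, 74.35),
}


def power_via_log(base: float, exponent: float) -> float:
    """Assumed strategy for ^: exp(exponent * ln(base)), valid for base > 0."""
    return math.exp(exponent * math.log(base))


def power_penalty(times) -> float:
    """Ratio of the K^2 timing to the K*K timing for one system."""
    multiply_time, power_time = times
    return power_time / multiply_time


if __name__ == "__main__":
    for name, times in results.items():
        print(f"{name}: K^2 run takes {power_penalty(times):.2f}x as long")
    # Both routes agree numerically, up to floating point rounding.
    print(power_via_log(7.0, 2.0), 7.0 * 7.0)
```

On these figures the K^2 runs take roughly 1.7 to 1.9 times as long as the K*K runs, which would be consistent with the power operator going through much heavier routines than a single multiplication.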

Also, the Atari's 6502 does not run anywhere near 1.0 MHz. It runs at about 1.8 MHz, and the Atari OS carries dormant code from 1983's 1200XL that allows ANTIC to be turned ON and OFF from ANYWHERE, independently of what you are doing, as long as the OS key mapper is not (illegally) bypassed or suppressed. The high-performance ROM used on my A800i includes cycle-exact patching to enable that dormant code, so including ANTIC control in the actual Basic code is optional.

#9 Faicuai

Posted Fri Nov 9, 2018 1:28 PM

Thanks!!!

In this case, it should be made clear that FastBasic is not a true compiler per se, since its documentation gives the impression that it somehow is. Even though it is WAY, WAY faster than anything else out there for the Atari, this explains why it lags well behind true CP/M compilers like CBasic.

So the only option left on the table is to code directly in 6502 assembly, resembling as closely as possible the structure of the synthetic code that an optimized compiler would output.

Coming right next...

Edited by Faicuai, Fri Nov 9, 2018 1:35 PM.


#10 Faicuai

Posted Fri Nov 9, 2018 3:20 PM

Well, let's see how the 6502 handles integers, directly in Assembler.

 

Not being a 6502 machine language expert, I decided to dust off my VERY old Atari Assembler manual, plus the 6502 Quick Reference card. In doing so, I wanted to achieve what (under my criteria) would look like well-optimized compiler output:

  1. Fully-parametric code (e.g. gets all of the loops' "For:[START] to [END]" parameters from RAM, not hardcoded).
  2. Stores and maintains the runtime values of all loop variables, PER computation, in RAM, not in CPU registers.
  3. Code structure allows for indefinite / infinite FOR-NEXT nesting, as long as RAM is available.
  4. Uses or re-uses only ONE CPU register, and NOT the Accumulator, regardless of any given FOR-NEXT nesting depth.
  5. NO hardcoded jumps, so the code can be moved around relatively freely, and anyone who wants to run it on another 6502 machine can do so.
  6. For maximum performance, Page-0 addressing (per the 6502 reference card) to be used during memory-address decoding. It seems very little happens in the top half of that page, as far as I could monitor.

After considering the above premises, I came up with the following creature, which I tried as hard as I humanly could to keep from looking like a crude hack-job:

 

A800_Programming-ASM-1.jpg

 

 

When moving between nested-loop levels, the right fly-back entry is "NXT_"; when cycling a loop locally, "CMP_" is the correct re-entry. In either case, the runtime state of the loop variables is kept in RAM every step of the way.

 

In short, 65,536 additions, 65,536 comparisons, 65,536 Page0-writes, and 768 page0-reads.

 

Now, as for the results? It runs in approx. 550 milliseconds (0.55 secs), time after time. That is almost exactly HALF the run time of CBasic's compiled code (1100 milliseconds, 1.10 secs). That 2:1 performance advantage is, in my opinion, paltry, as it would only take a similar hand-coded approach on the Z80A to blow it out of the water.
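
For a rough feel of what 0.55 seconds means per loop pass, here is a back-of-the-envelope Python sketch. It assumes an NTSC machine with a CPU clock of about 1.789773 MHz and ignores memory-refresh cycles and interrupts, so treat the result as an approximate upper bound rather than a measured figure. The constant and function names are illustrative.

```python
# Assumption: NTSC Atari CPU clock in Hz; PAL machines run slightly slower.
CPU_HZ = 1_789_773
ASM_ELAPSED_S = 0.55      # assembly run time reported above
CBASIC_ELAPSED_S = 1.10   # CBasic run time reported earlier in the thread
INNER_PASSES = 256 * 256  # 65,536 inner-loop passes


def cycles_per_pass(elapsed_s: float, clock_hz: int, passes: int) -> float:
    """Average CPU cycles available per inner-loop pass."""
    return elapsed_s * clock_hz / passes


if __name__ == "__main__":
    print(round(cycles_per_pass(ASM_ELAPSED_S, CPU_HZ, INNER_PASSES), 1))
    print(round(CBASIC_ELAPSED_S / ASM_ELAPSED_S, 2))
```

Under those assumptions the budget works out to roughly 15 CPU cycles per inner-loop pass, which is in the range you would expect for a handful of zero-page instructions plus a branch.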

 

 

Attached is an .ATR image (SDX) which you can open in Altirra or any SDX-equipped machine, for your inspection.

 

Attached File: Scratchpad-II.atr (179.64KB)

 

 

I can only say my HAT's off to those guys at Digital Research... They really nailed some serious optimizations on CBasic, dating all the way back to April, 1983 (which is what is reported on-screen during compiling time). 


Edited by Faicuai, Fri Nov 9, 2018 3:32 PM.


#11 Kyle22

    River Patroller

  • 3,660 posts
  • Call my BBS! telnet://broadway1.lorexddns.net
  • Location:McKees Rocks (Pittsburgh), PA

Posted Fri Nov 9, 2018 10:25 PM

Back in the day, I used the CB80 compiler on a 4 MHz TeleVideo system running CBIS Network-OS (hacked by me to support 4 hard disks and also become Network-NZ). This was a full-blown multi-user NZ-COM system. CB80 was AWESOME!

 

I thoroughly enjoyed using that system. CB80 is screaming fast!

 

I haven't looked online yet (the thought just popped into my mind). Does anyone have / know where this is available? I want to run it on my Indus.

 

I used WordStar Non-Document mode to write the source, then saved and compiled it.

 

It's a beautiful thing.



#12 Faicuai

Posted Sat Nov 10, 2018 10:23 AM

I will get a CP/M image of CBasic80 II for you, if that is what you refer to. I will post it here.

#13 Faicuai

Posted Sat Nov 10, 2018 3:23 PM

Here you go (images attached below; skip the one-time binary transfer procedure if you already know it):

  1. Set your IndusGT drive to SIO ID #1 (Drive 1), and find a BLANK, unused floppy disk.
  2. Load the CBasic II image (Attached File: CPM-CBasicII-1.atr, 179.64KB) on an SIO-attached storage device at SIO ID #2 (Drive 2).
  3. Load this binary-copy utility image (Attached File: Sector_COPIERS.ATR, 130.02KB) into an SIO storage device at SIO ID #3 or #4.
  4. Select D3 or D4 (as chosen in step 3) as your boot image, and TURN OFF the IndusGT.
  5. On the Atari 800i, you MUST go to the BIOS and boot the image from step 3 in Colleen Mode (52 KBytes). On Ultimate1MB, the XL/XE OS will work.
  6. Once the copy-utility menu shows up, select the US Doubler copier (MyCopyr 2.1 will be the title when it loads).
  7. Set "Source" to SIO ID Drive 2 (as set in step 2), and set "Destination" to SIO ID Drive 1 (as set in step 1).
  8. Set FORMAT to ON, and set VERIFY and FORMAT SKEW to OFF. Start the binary transfer by pressing START.
  9. When finished, leave the IndusGT ON (with the floppy inserted), and reboot the Atari into your preferred DOS.
  10. Attach this image (Attached File: DOS-CPM_Scratchpad-2.atr, 179.64KB) anywhere you want (preferably on HD/SIDE), on either Incognito / Ultimate SIDE or an attached SIO device (if an SIO device, make sure you attach it to D2 or above).
  11. From that image, run CPM-TOOL v5, and select FIX BOOT sector on DRIVE 1 (default). Your IndusGT will spin very briefly.
  12. Exit CPM-TOOL, load TRUBT05C.COM from the same image, press the DRIVE+ERROR buttons, and voilà!

Cheers!


Edited by Faicuai, Sat Nov 10, 2018 3:29 PM.




