Jump to content

Photo

BASIC Interpreter Speed Comparison: Prime Number Generation

BASIC Performance Primes

55 replies to this topic

#1 Darklantha OFFLINE  

Darklantha

    Combat Commando

  • 5 posts

Posted Sun Nov 12, 2017 4:24 PM

Hello all,

 

I've recently completed a series of videos that compare the BASIC interpreter performance of a variety of 8-bit vintage computers.  I use prime number generation as the test bed.  I think folks in this forum will be pleasantly surprised by the results!

 

https://www.youtube....rm64PQifIOApqdp

 

Keep it vintage,

 

-Rusty


Edited by Darklantha, Sun Nov 12, 2017 4:24 PM.


#2 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,662 posts
  • Location:10-0-11-00:02

Posted Sun Nov 12, 2017 5:18 PM

This very nicely done. Having always been a proponent of Turbo Basic it's nice to see it came out on top. During the early year I was almost always required to use some form of Microsoft basic which I called slow and missing in features. It looks like being number two wasn't as bad as it seemed when bench marked. What was unexpected was how distant the the third and lower rankings were in speed. Being the person I am, I decided to use the old stop watch method to roughly test what you have presented.. It was pretty much in agreement with your findings. Within fractions of a second on most and as much as a second on others

 

one question though. My Atari turboxl timed out at 52.01 yours on screen shows 52 and your spread sheet shows 53 seconds.

 

Might that be a typo or a rounding?  my  .01 could be due to slow reflexes or stop watch response. Very good test none the less.



#3 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Mon Nov 13, 2017 11:46 AM

While that certainly compares programs running the same code, it isn't the fastest way to calculate primes.
If you use a temporary variable to store I/J, it cuts the number of divides in half.
That accounts for 20+ seconds on some machines and it should also benefit the Atari.
What it means for the benchmark, is your version will heavily favor a BASIC with a highly optimized divide.

Using exactly the same code also eliminates any potential optimizations available under other BASICs.
For example, some BASICs let you pack a lot on a single line allowing you to eliminate calling GOTO and searching for line numbers.

Here is an optimized CoCo version.  It takes about 43 1/2 seconds to complete. 
The POKE sets the CoCo 1/2 into high speed mode.
Notice that the use of inverse logic and ELSE eliminates the GOTOs entirely.
This is also the standard BASIC.  My version of BASIC for the MC-10 completes this in about 66 seconds in the current version.

0 POKE 65495,0100 CLS
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
125 TIMER=0
130 FOR I = 3 TO N
140  FOR J = 2 TO (I-1):X=I/J:IF X <> INT(X) THEN NEXT J:PRINT I;:ELSE PRINT ".";
210 NEXT I
220 FINAL = TIMER/60
230 PRINT
240 PRINT "TIME ELAPSED IN SECONDS: ";FINAL
260 END

Edited by JamesD, Mon Nov 13, 2017 11:48 AM.


#4 krslam OFFLINE  

krslam

    Moonsweeper

  • 462 posts
  • Location:Seattle

Posted Mon Nov 13, 2017 12:39 PM

Could you just tabulate the results for those of us too lazy to watch 30 minutes of video?



#5 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Mon Nov 13, 2017 1:14 PM

 

While that certainly compares programs running the same code, it isn't the fastest way to calculate primes.
If you use a temporary variable to store I/J, it cuts the number of divides in half.
That accounts for 20+ seconds on some machines and it should also benefit the Atari.
What it means for the benchmark, is your version will heavily favor a BASIC with a highly optimized divide.

Using exactly the same code also eliminates any potential optimizations available under other BASICs.
For example, some BASICs let you pack a lot on a single line allowing you to eliminate calling GOTO and searching for line numbers.

Here is an optimized CoCo version.  It takes about 43 1/2 seconds to complete. 
The POKE sets the CoCo 1/2 into high speed mode.
Notice that the use of inverse logic and ELSE eliminates the GOTOs entirely.
This is also the standard BASIC.  My version of BASIC for the MC-10 completes this in about 66 seconds in the current version.

0 POKE 65495,0100 CLS
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
125 TIMER=0
130 FOR I = 3 TO N
140  FOR J = 2 TO (I-1):X=I/J:IF X <> INT(X) THEN NEXT J:PRINT I;:ELSE PRINT ".";
210 NEXT I
220 FINAL = TIMER/60
230 PRINT
240 PRINT "TIME ELAPSED IN SECONDS: ";FINAL
260 END

 

 

FWIW, it's just over 45 seconds if you use this line 140.  It requires more parsing.

140  FOR J = 2 TO (I-1):X=I/J:IF X = INT(X) THEN PRINT ".";: ELSE NEXT J:PRINT I;


#6 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Mon Nov 13, 2017 1:50 PM

Could you just tabulate the results for those of us too lazy to watch 30 minutes of video?

He did, he listed the address of this spreadsheet in the description of the video.
https://docs.google....1tI8/edit#gid=0



#7 KLund1 OFFLINE  

KLund1

    Moonsweeper

  • 478 posts

Posted Mon Nov 13, 2017 6:01 PM

I have not watched the video, but why no TRS-80, Apple II, Pet? Also, are there only 3rd party versions for Atari. If you take those out, things look a bit different. There must have been 3rd party manufacturer compatible versions for the other machines? Plus some not so compatible, like Atari MS Basic. Some of the more popular 3rd party Basics should have been included, Or possible in a next installment episode?



#8 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,662 posts
  • Location:10-0-11-00:02

Posted Mon Nov 13, 2017 7:17 PM

The point he made was clear, he used both built in and third party to benchmark them to see how they stacked up and the Atari took top honors with it's official MS basic and third party TurboBasicXL

Why no acorn why no vax why no  bbc micro spectrum ti calculator s-100?... Maybe he could not get every machine on the on earth....so he did what mostly everyone had within the that time period and grouped likewise tech as generality...

 

The TRS 80 model 102 was in the mix, Tandy Radio Shack model 102... you might wanna see the video, most people didn't have TRS in general till the late COCO era, I was the only one to have one in my geographic area for years, and that was for software dev purposes


Edited by _The Doctor__, Mon Nov 13, 2017 7:29 PM.


#9 krslam OFFLINE  

krslam

    Moonsweeper

  • 462 posts
  • Location:Seattle

Posted Mon Nov 13, 2017 9:10 PM

I'm surprised the C64 and Atari MS BASIC came out so close (79 vs 76 seconds), considering the Atari is clocked at 1.79MHz vs the 64's 1.02MHz.



#10 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 279 posts
  • Location:Viņa del Mar, Chile

Posted Mon Nov 13, 2017 9:22 PM

Hi!
 

Hello all,
 
I've recently completed a series of videos that compare the BASIC interpreter performance of a variety of 8-bit vintage computers.  I use prime number generation as the test bed.  I think folks in this forum will be pleasantly surprised by the results!
 
https://www.youtube....rm64PQifIOApqdp
 
Keep it vintage,
 
-Rusty


A really slow prime testing algorithm ;)

With my FastBasic, run-time is 11 seconds (NTSC Atari):
Screenshot from 2017-11-14 00-17-39.png

Program source is (note that FastBasic does not support GOTO):
Screenshot from 2017-11-14 00-20-35.png

#11 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Tue Nov 14, 2017 1:39 AM

Technically, you can skip checking even numbers since they are never prime.  
That cuts results in half even on stock BASICs.


The "IF I/J * J = I" test yielded strange results for me..
*edit*
Maybe it's due to different accuracy in the math libraries.


Edited by JamesD, Tue Nov 14, 2017 1:51 AM.


#12 pirx OFFLINE  

pirx

    Moonsweeper

  • 377 posts
  • Location:Poland

Posted Tue Nov 14, 2017 6:20 AM

additionally you need to check only J less or equal to square root of I (calculated once). Rough binary approximation may be used instead, just few binary operations for this and no loop. this is possibly not important here as this is not meant to be a fast algo, but a benchmark.

 

"IF I/J * J = I" makes sense only when using integer arithmetic and this is the case for FB.



#13 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 279 posts
  • Location:Viņa del Mar, Chile

Posted Tue Nov 14, 2017 8:50 AM

Hi!
 

additionally you need to check only J less or equal to square root of I (calculated once). Rough binary approximation may be used instead, just few binary operations for this and no loop. this is possibly not important here as this is not meant to be a fast algo, but a benchmark.
 
"IF I/J * J = I" makes sense only when using integer arithmetic and this is the case for FB.


Yes. Also, I wrote the code that way to better approximate the original. It's faster to simply compare " I MOD J = 0 " , giving a run-time of 7.6 seconds:
Screenshot from 2017-11-14 11-38-23.png

And for the small numbers in the test, actually using "INT(SQR(I))" as the limit slows down to more than 10 seconds, this is because on most numbers the FOR bails out a lot earlier than that (not primes). It starts to get faster than the full loop if you go to about 1500.

A much better speedup is using "STEP 2" in the FOR loops, avoiding testing even numbers, this gives 4 seconds of run-time:
Screenshot from 2017-11-14 11-39-17.png

The test program is:
Screenshot from 2017-11-14 11-40-28.png

But, as you said, this is only a simple benchmark of BASIC dialects, not a fast prime test.

#14 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Tue Nov 14, 2017 9:19 AM

...

"IF I/J * J = I" makes sense only when using integer arithmetic and this is the case for FB.

I would definitely call that a difference in precision!   :grin: 
It also explains the speed.

This would work on Apple's Integer BASIC, and on machines with Level II BASIC since they support integers as well as single and double precision.
That includes the TRS-80 Model I, III, IV, VZ-200, and VZ-300.
The IV and VZ should do well do to the higher clock speed.

I would expect BASIC-09 on the CoCo to do really well, it had impressive times on Ahl's benchmark, though it is a compiler and part of the reason it did so well was probably due to using the hardware multiply instruction in it's math library.  I don't know that for sure but it makes sense given the result.
 


Edited by JamesD, Tue Nov 14, 2017 9:21 AM.


#15 thorfdbg OFFLINE  

thorfdbg

    Dragonstomper

  • 739 posts

Posted Tue Nov 14, 2017 3:43 PM

 

I've recently completed a series of videos that compare the BASIC interpreter performance of a variety of 8-bit vintage computers.  I use prime number generation as the test bed.  I think folks in this forum will be pleasantly surprised by the results!

This test is mostly limited by the math pack, and the poor performance of the bad math pack implementation of the Ataris. Thus, I would say that it measures mostly the math speed,especially the division.



#16 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,662 posts
  • Location:10-0-11-00:02

Posted Tue Nov 14, 2017 3:47 PM

So provide him with the bestest fastest third party basics/math packs and let him have at with those.. :) no picking just offer it and your program as well :)

 

If they don't find ya handsome at least let em find ya handy!


Edited by _The Doctor__, Tue Nov 14, 2017 3:49 PM.


#17 Darklantha OFFLINE  

Darklantha

    Combat Commando

  • Topic Starter
  • 5 posts

Posted Tue Nov 14, 2017 9:43 PM

Hi folks,

 

This is exactly the kind of discussion that I was hoping the video series would spark.  Good stuff!  Yes, my prime number program is not optimized but I was interested in seeing how a bog standard, potentially naive, but highly compatible program might run on each BASIC.  Something that your average user might have written back in the day.

 

BTW, dmsc, your fast basic looks really awesome.


  • jhd likes this

#18 Darklantha OFFLINE  

Darklantha

    Combat Commando

  • Topic Starter
  • 5 posts

Posted Tue Nov 14, 2017 9:53 PM

I'm surprised the C64 and Atari MS BASIC came out so close (79 vs 76 seconds), considering the Atari is clocked at 1.79MHz vs the 64's 1.02MHz.

 

The built in Atari BASIC is notoriously slow.  Also, if I remember correctly, the Antic chip locks out the system bus on the Atari some percentage of the time thereby reducing the 6502's effective clock rate.  I am not sure about this though.  Hopefully, someone on this forum can provide a more accurate insight.



#19 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,662 posts
  • Location:10-0-11-00:02

Posted Wed Nov 15, 2017 10:42 AM

all the machines have lower effective clock rates as cycles get used for various things... ram video and some for sound etc.....

on some platforms people also claim their machines are doubling their clock rates and accessing ram 2x at both rise and fall but inspection with a decent logic analyzer quickly disproves such nonsense...

 

of course you can choose different graphics modes on the Atari to speed things up (3,4,5,6) or even shut antic off and get a boost that way.....custom displays also... If you want to explore some possibilities


Edited by _The Doctor__, Wed Nov 15, 2017 10:44 AM.


#20 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Wed Nov 15, 2017 4:46 PM

Level II BASIC seems to automatically switch to float if the numerator is less than the divisor. 
So you have to take the INT(I/J)*J=I which is slower due to the type conversions.
Apple's Integer BASIC should still work though.



#21 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Thu Nov 16, 2017 3:53 PM

This is the fastest version I tried, and it should work as is if a machine has a BASIC that supports ELSE, long enough lines, and Microsoft like syntax.

Line 0 can be deleted for anything other than a CoCo.  The TIMER related stuff is also CoCo specific, and should be replaced with any systems specific time code.

This algorithm only checks odd numbers for I since evens are never primes, and it only needs to check numbers for J up to I/2 since anything larger won't divide into I at least 2 times. 

Defining the variables at the top cut about a second off of the time.  This is common on all Microsoft BASICs.  
Placing most accessed variables at the start of the variable table first makes it quicker to find them, though I'm not sure if X or J should be first..
Deleting spaces would also make it faster but I left them in for readability.
*edit* Deleting spaces took almost a second off of the times below.

This takes about 32 seconds on my enhanced MC-10 BASIC, and the MC-10 is only clocked at .89 MHz.
I could run it on the CoCo but I have to type it in by hand since the VCC emulator doesn't have a quick type feature, but it should take about 20-22 seconds.

0 POKE 65495,0
10 X=0:J=0:I=0
100 CLS
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
125 TIMER=0
130 FOR I = 3 TO N STEP 2:FOR J = 2 TO I/2:X=I/J:IF X <> INT(X) THEN NEXT J:PRINT I;:NEXT I:ELSE PRINT ".";:NEXT I
220 FINAL = TIMER/60
230 PRINT
240 PRINT "TIME ELAPSED IN SECONDS: ";FINAL
260 END

Edited by JamesD, Thu Nov 16, 2017 4:00 PM.


#22 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Thu Nov 16, 2017 5:04 PM

I figured SQR(I) would be slower than I/2 (when limit is 250) so I never tried it. 
That does not appear to be the case when using floating point!  LOL
The MC-10 finished it in around 10 1/2 seconds.

Maybe I/2 for integer BASICs when counting primes to less than 1500.
 


Edited by JamesD, Thu Nov 16, 2017 5:15 PM.


#23 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 279 posts
  • Location:Viņa del Mar, Chile

Posted Thu Nov 16, 2017 6:13 PM

Hi!
 

I figured SQR(I) would be slower than I/2 (when limit is 250) so I never tried it. 
That does not appear to be the case when using floating point!  LOL
The MC-10 finished it in around 10 1/2 seconds.

Maybe I/2 for integer BASICs when counting primes to less than 1500.


A better approximation for the integer square-root is the following:
 
? "Prime Number Generator"
Input "Upper Limit?",N
START=TIME
E=0 : L=0             ' Initial E/L
FOR I=3 TO N STEP 2

 ' Approximation for "E=INT(SQR(I))"
 IF L<1 : INC E : L=E: ENDIF
 DEC L

 FOR J=3 TO E STEP 2
  IF I MOD J = 0
   EXIT
  ENDIF
 NEXT J
 IF J>E
  ? I;".";
 ELSE
  ? ;"..";
 ENDIF
NEXT I

FINAL=TIME
?
? "Start Time: ";START
? "End Time: ";FINAL
? "Total: "; (FINAL-START)/60.0
Using this, I get 1.23 seconds:
Screenshot from 2017-11-16 21-09-05.png

Note that in FastBasic the comments and code structure does not affect runtime, but "INC X" and "DEC X" are faster than the corresponding "X=X+1" and "X=X-1".

Perhaps you could port FastBasic to the MC-10 (and the 6803 CPU), it will probably will be faster!.

Edited by dmsc, Thu Nov 16, 2017 6:16 PM.


#24 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Thu Nov 16, 2017 6:31 PM

An Apple II+ (emulation) runs this in about 12 1/2 seconds
 

10 X=0:J=0:I=0
100 HOME
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
130 FORI=3TONSTEP2:FORJ=2TOSQR(I):X=I/J:IFX<>INT(X)THENNEXTJ:PRINTI;:NEXTI:GOTO230
140 PRINT".";:NEXTI
230 PRINT
240 PRINT "FINISHED"
260 END

The Plus/4 (emulation) takes just over 14 seconds. 
Adding support for more RAM really slowed down the interpreter on it and the C128.
The Plus/4 supports ELSE, but it's editor didn't like the long line.

10 X=0:J=0:I=0
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
130 FORI=3TONSTEP2:FORJ=2TOSQR(I):X=I/J:IFX<>INT(X)THENNEXTJ:PRINTI;:NEXTI:GOTO230
140 PRINT".";:NEXTI
230 PRINT
240 PRINT "FINISHED"
260 END

The CoCo finishes it in 8.75 seconds.
If you remove spaces from the last CoCo version above (a previous post), line 130 must have a space between N and STEP2 due to a bug in the parser.
 


Edited by JamesD, Thu Nov 16, 2017 6:48 PM.


#25 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,750 posts
  • Location:Flyover State

Posted Thu Nov 16, 2017 6:45 PM

Hi!
 

A better approximation for the integer square-root is the following:
<snip>


Note that in FastBasic the comments and code structure does not affect runtime, but "INC X" and "DEC X" are faster than the corresponding "X=X+1" and "X=X-1".

Perhaps you could port FastBasic to the MC-10 (and the 6803 CPU), it will probably will be faster!.

 

Someone was working on a BASIC compiler for the MC-10 at one time.  I'll have to see if that's available.
I was thinking of porting the UCSD P-Machine to it so it could run any of the UCSD compilers.  It could take advantage of some of the RAM expansions and drivewire disk interface.

If I go beyond 8K for the BASIC ROM, I can use several faster math functions including SQR and divide.  

I figure even with floating point I might drop the time of this benchmark by a couple seconds.

The "Idiot Compiler" for the Apple II should make this really fast on it.  Old BASIC compilers for it sped up code by 30% or more and Idiot leaves those in the dust.
 







Also tagged with one or more of these keywords: BASIC, Performance, Primes

1 user(s) are browsing this forum

1 members, 0 guests, 0 anonymous users