BASIC Interpreter Speed Comparison: Prime Number Generation

thorfdbg · December 5, 2017

What do you mean by better math model?

Microsoft Basic uses binary to represent floating point numbers, Atari Basic a kludgy decimal format with the base 100 which makes multiplications and divisions unbearably slow. The latter does not matter in this case - for additions and subtractions, it is somewhat ok.

JamesD · December 5, 2017

Microsoft Basic uses binary to represent floating point numbers, Atari Basic a kludgy decimal format with the base 100 which makes multiplications and divisions unbearably slow. The latter does not matter in this case - for additions and subtractions, it is somewhat ok.

You mean binary coded decimal?

Well, Microsoft's math library isn't exactly the most efficient thing I've ever seen either.

But standard Atari BASIC C turned in a time of 4.17 seconds vs 3.133 on the C64 and the Atari is clocked faster.

Faster times that have been posted for the Atari are due to optimized math libs, or through the use of a compiler.

tecci06 · December 5, 2017

had to do a change in the code for the ACORN Electron:

10CLS:CLEAR
20K%=0:I%=0:T%=0:P%=0
100PRINT"Prime Number Generator"
110INPUT"Upper Limit";N%
120ETIME=TIME
130T%=INT((N%-3)/2)
140DIMA%(T%+1)
160FORI%=0TOT%:A%(I%)=0:NEXT
200FORI%=0TOT%:IFA%(I%)THENPRINT"..";:NEXT:GOTO330
220P%=I%+I%+3:PRINT;P%;".";:K%=I%+P%:IFK%<=T%FORK%=K%TOT%STEPP%:A%(K%)=1:NEXT
260NEXT
330ETIME=(TIME-ETIME)/100
340PRINT:PRINT"Total: ";ETIME
360END

Line 330: it has to divide by 100, not 60 to get the correct seconds.

Did check this value by hand timing with a limit of 10000.

Now the results are:

Limit 250: 1,41
Limit 10000: 52,11

For the C64, the division by 60 is correct.

Edited December 5, 2017 by tecci06

JamesD · December 5, 2017

I'm guessing that will change the BBC Micro results as well.

_The Doctor__ · December 5, 2017

standard Atari basic is not built in Atari basic... Atari's official Basics are BASIC A,B,C and MicroSoft Basic.... built in basic is something done for convenience later in the line, and was simply to get people started. It was implemented as it was ready to go first...

Edited December 5, 2017 by _The Doctor__

tecci06 · December 6, 2017

I compiled the Plus/4 version with Austrospeed.

With the blanking code:

Limit 250 = 1.05 seconds

Limit 10000 = 43.3333333 seconds

Limit 20000 = 87.2333333 seconds

Skipping the parser makes a huge difference, but it's obviously not as efficient as FastBASIC.

It's probably making ROM calls and bankswitching between ROM and RAM.

It still handles larger arrays.

Made this on the Commodore 64 with Basic-Boss Compiler V2.4

Limit 250: 0,33 seconds

Limit 10000: 26,283 seconds

Limit 20000: 53,7 seconds

Edited December 6, 2017 by tecci06

JamesD · December 6, 2017

Last night I created a simple patch for the Plus/4's CHRGOT code that reads the next byte of the program because it banks in and out memory for every byte.
The result is about 4% faster than the original code here, but it still banks in and out RAM/ROM when copying variables to/from the floating point registers.
There is still more speed to be had with more patching but I don't know what % improvement it would offer.
Another 4% would certainly be worth pursuing, but it's not going to make the Plus/4 one of the leaders on this benchmark.
It would need another 50% to get into that category and I think that would require a new ROM if it's even possible.

Standard BASIC without screen blanking:
Limit 250 3.61666667

Limit 10000 147.216667
Limit 20000 295.65

Patched CHRGOT without screen blanking:

Limit 250 3.48333333
Limit 10000 141.833333
Limit 20000 284.866667

Patched CHRGOT with screen blanking
Limit 250 2.31666667
Limit 10000 94.51966667

Limit 20000 189.8

*NOTE: Subsequent runs turned in slightly different times. It may depend on where the clock was when the program started.

Edited December 6, 2017 by JamesD

JamesD · December 6, 2017

Plus/4 code.
The latest version of the benchmark is listed first, then the added patch (which only needs to be executed once after startup), and then the code to blank the screen.
The patch actually has to be deleted (at least lines 0 and 5) before you can run the code additional times


10 K=0:I=0:T=0:P=0
30 SCNCLR
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N

120  eTime=TIME
130  T=(N-3)/2
140  DIMA(T+1)

160 FORI=0TOT:A(I)=0:NEXT
200 FORI=0TOT:IFA(I)THENPRINT"..";:NEXT:GOTO330
210P=I+I+3:PRINTP;".";:K=I+P:IFK<=TTHENFORK=KTOTSTEPP:A(K)=1:NEXT:NEXT:GOTO330
260 NEXT

330  eTime=(TIME-eTime)/60
340  PRINT
350  PRINT "Total: ";eTime
360 END

0 REM012345678901234567890123456789012
5 A=4102:FORI=0 TO 30:READ T:POKE A+I,T :NEXT I:SYS A

10000 DATA 160,18,185,18,16,153,121,4,136,16,247,96
10010 DATA 160,0,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96


115 POKE65286,PEEK(65286)AND239
335 POKE65286,PEEK(65286)OR16

Edited December 6, 2017 by JamesD

JamesD · December 7, 2017

A little fix to the patch so you don't have to delete line 0.

You can delete line 1 and the lines with DATA statements after the first RUN.

Then you can save the code with the patch already embedded in line 0.


0 REM012345678901234567890123456789012345
1 FORI=0 TO 35:READ T:POKE 4102+I,T :NEXT I
2 SYS 4102
 
10000 DATA 160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96
10010 DATA 160,01,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96

Edited December 7, 2017 by JamesD

evilmoo · December 17, 2017

I'm not sure if anyone has taken this into account or not, but as you test each number, you only have to test with the primes you've found so far. For example, if you've already determined the number isn't divisible by three, there's no way it's divisible by 9, 15, 21, 33, etc. So if you make the effort to save your discovered primes, you can minimize your division attempts.

JamesD · December 17, 2017

I'm not sure if anyone has taken this into account or not, but as you test each number, you only have to test with the primes you've found so far. For example, if you've already determined the number isn't divisible by three, there's no way it's divisible by 9, 15, 21, 33, etc. So if you make the effort to save your discovered primes, you can minimize your division attempts.

The more recent code uses a large array to flag which numbers and their multiples have been found already.

That's the limiting factor on largest prime number the machines can calculate with this code.

evilmoo · December 17, 2017

The more recent code uses a large array to flag which numbers and their multiples have been found already.

That's the limiting factor on largest prime number the machines can calculate with this code.

You can further encode that data by saving the distance from the previous prime, rather than all the possible numbers.

so 3, 5, 7, 11, 13, 17, 19, 23, 29 becomes

(start at 3) 2, 2, 4, 2, 4, 2, 4, 6

and since the distance is always even, you can divide it by 2

or 1,1,2,1,2,1,2,3.

And at least on Atari, you can save it as a CHR$() in a string array rather than a numeric array. I'm not sure when the distance goes over 512, but that should keep you guys busy for a little while.

JamesD · January 17, 2018

I ran across an article in a 1982 issue of BYTE magazine where they were comparing several Pascal compilers for CP/M.
One of the benchmarks involved calculating the first 1000 primes.
The fastest did it in 24.1 seconds. Twice what it takes several machines to do that in BASIC here.

jvas · January 17, 2018

I ran across an article in a 1982 issue of BYTE magazine where they were comparing several Pascal compilers for CP/M.

One of the benchmarks involved calculating the first 1000 primes.

The fastest did it in 24.1 seconds. Twice what it takes several machines to do that in BASIC here.

If the very same algorithm had been implemented in pascal as in basic, it would have been faster, I bet.

JamesD · January 17, 2018

If the very same algorithm had been implemented in pascal as in basic, it would have been faster, I bet.

JamesD · February 8, 2018

My latest version of BASIC for the MC-10 finishes 10,000 in around 96 seconds.

Faicuai · December 14, 2018

Little old (since Feb.) but nonethelss VERY COOL thread!

Thought some may be interested in latest figures with Altirra Basic 1.55, Fast Basic and Digital Research's CBasic-80 under CP/M (IndusGT+RamCharger). All this on i-800 (Incognito), running XL03-FP high-performance OS, and SDX 4.49c:

Altirra Basic 1.55:
- N=250: 1.2166 secs.
- N=2500: 14.25 secs.
- N=10000: 59.2166 secs.
FastBasic 3.6:
- N=250: 0.3833 secs.
- N=2500: 4.1 secs
- N=10000: 16.7333 secs.
CBasic-80 CP/M:
- N=10000: 16.2000 secs.
- (outputting on SLOW, emulated terminal !!!)

All of the above runs with ANTIC-DMA turned off, which I can control at will via OS (and keyboard) at any point during run-time (no need to reflect in benchmark code). I can also afford the luxury of turning Antic off INSIDE CP/M terminal emulation, without ever leaving, but to no effect, because the CP/M computer is handling screen output, directly, through its own BIOS.

FastBasic results are really fast (even faster than C64 Basic Boss TRUE compiler) and all this with a puny 8-9KB footprint (!!!), and I just wonder how far the Atari could go with compiled machine-code, instead. In any case, CP/M CBasic compiler from DRI is the real boss here!

Edited December 14, 2018 by Faicuai

777ismyname · March 1, 2019

I suggest you test all the computers in that video on Assembly for speed and for highest number of digits accuracy.

You will find the TI99/4A does 10 digits and the rest top out at 6 to 8 at best.

The reason TI99/4A would win hands down is 16 bit CPU vs the rest are all 8 bit CPU's.

Do you have results for your RXB extended BASIC? I have a few TI99/4A in storage; I've got to get over there and get some retro gear and will be grabbing a TI to mod this spring and summer.

Sent from my ASUS PadFone X using Tapatalk

777ismyname · March 1, 2019

Little old (since Feb.) but nonethelss VERY COOL thread!

Thought some may be interested in latest figures with Altirra Basic 1.55, Fast Basic and Digital Research's CBasic-80 under CP/M (IndusGT+RamCharger). All this on i-800 (Incognito), running XL03-FP high-performance OS, and SDX 4.49c:

Altirra Basic 1.55:

N=250: 1.2166 secs.

N=2500: 14.25 secs.

N=10000: 59.2166 secs.

FastBasic 3.6:

N=250: 0.3833 secs.

N=2500: 4.1 secs

N=10000: 16.7333 secs.

CBasic-80 CP/M:

N=10000: 16.2000 secs.

(outputting on SLOW, emulated terminal !!!)

All of the above runs with ANTIC-DMA turned off, which I can control at will via OS (and keyboard) at any point during run-time (no need to reflect in benchmark code). I can also afford the luxury of turning Antic off INSIDE CP/M terminal emulation, without ever leaving, but to no effect, because the CP/M computer is handling screen output, directly, through its own BIOS.

FastBasic results are really fast (even faster than C64 Basic Boss TRUE compiler) and all this with a puny 8-9KB footprint (!!!), and I just wonder how far the Atari could go with compiled machine-code, instead. In any case, CP/M CBasic compiler from DRI is the real boss here!

Faicuai, is the source you used for Fast BASIC different than what dsmc/Daniel has on the Fast BASIC atr image? I'd be curious to see it run under 4.0, as well as in Rapidus mode

dsmc has done an absolutely stellar job with FB, and is equally as nice and helpful; he is a top notch asset to the Atari community.

Sent from my ASUS PadFone X using Tapatalk

RXB · March 2, 2019

Do you have results for your RXB extended BASIC? I have a few TI99/4A in storage; I've got to get over there and get some retro gear and will be grabbing a TI to mod this spring and summer.

Sent from my ASUS PadFone X using Tapatalk

It is a known fact that 16 bit CPU in the TI99/4A can do double the number of decimal places due to 16 bits it twice 8 bit CPU's.

This has always been a bragging point of the 9900 CPU as the only 16 bit CPU.

777ismyname · March 7, 2019

It is a known fact that 16 bit CPU in the TI99/4A can do double the number of decimal places due to 16 bits it twice 8 bit CPU's.

This has always been a bragging point of the 9900 CPU as the only 16 bit CPU.

Ummm, okay? Do you have any comparative results for Sieve or similar benchmarks for RXB? As far as floating point math, for what we all use these for these days, the "bitness" between the 6502 and 9900 doesn't amount to a hill of beans.

You should fire up an instance of Fast BASIC 4.0 in Altirra and check out the PI demo. It is quite amazing what it does calculating PI to a few hundred places and the speed of how it does it.

Sincerely, this isn't an electronic dick measuring contest for number of bits of a given CPU, I am genuinely interested in knowing what I asked. I haven't cranked up a TI99 4A for BASIC - or any other programming ad far that goes - in over 30 years. I have an 8'x8' area that will be ready for some retro gear as soon as I get the desks and shelves built, and one of my TIs will have a dedicated spot.

Sent from my Moto G (5) Plus using Tapatalk

Edited March 7, 2019 by 777ismyname

dmsc · March 8, 2019

Hi!

Ummm, okay? Do you have any comparative results for Sieve or similar benchmarks for RXB? As far as floating point math, for what we all use these for these days, the "bitness" between the 6502 and 9900 doesn't amount to a hill of beans.

You should fire up an instance of Fast BASIC 4.0 in Altirra and check out the PI demo. It is quite amazing what it does calculating PI to a few hundred places and the speed of how it does it.

About that sample program (PI.BAS), it can be made twice as fast with very little changes, see current version at https://github.com/dmsc/fastbasic/blob/master/samples/int/pi.bas, calculating 254 digits in 134 seconds.

Also, by using a different formula ( PI/4 = 12*ATN(1/18) + 8*ATN(1/18) - 5*ATN(1/18) ) you can reach 432 digits in 430 seconds, see attached program.

PI-254.XEX

PI-432.BAS

PI-432.XEX

777ismyname · March 8, 2019

Hi!

About that sample program (PI.BAS), it can be made twice as fast with very little changes, see current version at https://github.com/dmsc/fastbasic/blob/master/samples/int/pi.bas, calculating 254 digits in 134 seconds.

Also, by using a different formula ( PI/4 = 12*ATN(1/18) + 8*ATN(1/18) - 5*ATN(1/18) ) you can reach 432 digits in 430 seconds, see attached program.

You are the man, Daniel! I am setting up another laptop right now with Atari stuff and will try this out! I may port it to other BASICs and do another side by side video.

Sent from my Moto G (5) Plus using Tapatalk

RXB · March 8, 2019

You are the man, Daniel! I am setting up another laptop right now with Atari stuff and will try this out! I may port it to other BASICs and do another side by side video.

Sent from my Moto G (5) Plus using Tapatalk

I was talking about at the time in 1979 not today.

Or 10 years later.

Just giving you a history lesson and you did not need to be a jerk about it.

Perfect example was in 1979 using PI everyone else vs TI99/4A that was 6 more decimal places.

Yes many were faster but also totally less accurate.

Having less bits in a CPU means you are forced to round up and down by half as many bits, just a physical limitation.

Edited March 8, 2019 by RXB

dmsc · March 8, 2019

Hi!

I was talking about at the time in 1979 not today.

Or 10 years later.

Just giving you a history lesson and you did not need to be a jerk about it.

Perfect example was in 1979 using PI everyone else vs TI99/4A that was 6 more decimal places.

Yes many were faster but also totally less accurate.

Having less bits in a CPU means you are forced to round up and down by half as many bits, just a physical limitation.

Sorry, but that is not how it works, it is like saying that because you have only ten fingers you can only count up to 10.

The advantage of a 16 bit CPU over an 8 bit one is that you can process twice the number of bits in one instruction, so you have the potential to make math operations faster. And of course, if you need more memory, having more bits in your registers allows to access that memory faster.

The problem with the TMS9900 (the CPU inside the TI99/4A) is that it was really slow, taking too much cycles to perform any operation. For example, to add one 16 bit number in a register to a one in a stack (as used in FastBasic) you do:

A *R1+, R0

This takes 14 cycles for the operation, plus 1 cycle to read the instruction, 2 cycles to read R0 and R1, 4 cycles to read *R1, 8 cycles to increment R1 by two and 1 cycle to write the result to R0, so a grand total of 30 cycles!

In 6502, it is the same as (from FastBasic source):

        clc
        adc     stack_l, y
        pha
        txa
        adc     stack_h, y
        tax
        pla
        inc     sptr

All of those 8 instructions execute in 2+4+3+2+4+2+4+5 = 26 cycles!

So, even as the 6502 can only process 8 bits a time, it does it in less cycles that the TMS9900, negating any real advantage. And of course, many operations (like text processing) don't require 16 bits, so they are a lot faster.

BASIC Interpreter Speed Comparison: Prime Number Generation

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members