Jump to content

Photo

BASIC Interpreter Speed Comparison: Prime Number Generation

BASIC Performance Primes

133 replies to this topic

#26 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 287 posts
  • Location:Viņa del Mar, Chile

Posted Thu Nov 16, 2017 7:10 PM

Hi!
 

Someone was working on a BASIC compiler for the MC-10 at one time.  I'll have to see if that's available.
I was thinking of porting the UCSD P-Machine to it so it could run any of the UCSD compilers.  It could take advantage of some of the RAM expansions and drivewire disk interface.
If I go beyond 8K for the BASIC ROM, I can use several faster math functions including SQR and divide.  
I figure even with floating point I might drop the time of this benchmark by a couple seconds.

The "Idiot Compiler" for the Apple II should make this really fast on it.  Old BASIC compilers for it sped up code by 30% or more and Idiot leaves those in the dust.


I looked at the P-Machine before writing the FastBasic runtime, but I decided to use simpler tokens for the parsed code, to minimize code size. Also, as the parser is designed to run in the 6502, the tokens are similar to the language.

Currently FastBasic for 6502 is 8.2K for the integer version and 9.4K for the floating point version, this includes the full-screen editor, the parser, the runtime and the compiler. I suspect that the 6803 code should be smaller.

Did you try my integer-SQR approximation?

#27 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,855 posts
  • Location:10-0-11-00:02

Posted Thu Nov 16, 2017 7:43 PM

Holy crap Atari at 1.23 seconds, now that is fast....

 

just for overkill, shits &giggles switch off antic then back on when done with resulting time...


Edited by _The Doctor__, Thu Nov 16, 2017 7:45 PM.


#28 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 287 posts
  • Location:Viņa del Mar, Chile

Posted Thu Nov 16, 2017 8:46 PM

Hi!,
 

Holy crap Atari at 1.23 seconds, now that is fast....
 
just for overkill, shits &giggles switch off antic then back on when done with resulting time...


Just 0.83 seconds (NTSC atari). Note that about 50% of the time is spent inside "PRINT" routine, not in the actual calculations.

This is the main program:
Screenshot from 2017-11-16 23-44-13.png

 
? "Prime Number Generator"
Input "Upper Limit?",N
START=TIME
POKE 559,0 ' Disable DMA
E=0 : L=0  ' Integer SQR approx.

FOR I=3 TO N STEP 2
 IF L<1 : INC E : L=E: ENDIF
 DEC L
 FOR J=3 TO E STEP 2
  IF I MOD J = 0
   EXIT
  ENDIF
 NEXT J
 IF J>E
  ? I;".";
 ELSE
  ? ;"..";
 ENDIF
NEXT I

POKE 559,34 ' Enable DMA
FINAL=TIME
? : ? "Start Time: ";START
? "End Time: ";FINAL
? "Total: "; (FINAL-START)/60.0

Edited by dmsc, Thu Nov 16, 2017 9:01 PM.


#29 MrFish ONLINE  

MrFish

    River Patroller

  • 4,273 posts
  • Location:1010-1010

Posted Thu Nov 16, 2017 9:24 PM

Note that about 50% of the time is spent inside "PRINT" routine, not in the actual calculations.

 

I was thinking about that earlier myself.

 

Why not do the calculations and save to variables/array (whatever), then stop the timer and print the results after.



#30 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Thu Nov 16, 2017 9:25 PM

Hi!
 

I looked at the P-Machine before writing the FastBasic runtime, but I decided to use simpler tokens for the parsed code, to minimize code size. Also, as the parser is designed to run in the 6502, the tokens are similar to the language.

Currently FastBasic for 6502 is 8.2K for the integer version and 9.4K for the floating point version, this includes the full-screen editor, the parser, the runtime and the compiler. I suspect that the 6803 code should be smaller.

Did you try my integer-SQR approximation?

Native code generation on the 6803 can be a bit inefficient unless you take an unusual approach, so bytecode compilers are a lot easier to make for it.
If I were making a bytecode interpreter from scratch there are several things I'd do differently than UCSD.
#1 being I'd switch to a register based rather than stack based machine because it's faster on machines without stack relative addressing, and especially those with a direct page.
It would also be easy to automate conversion from bytecodes to native code for speed critical code. It would be possible to have just in time compilation of code modules when they are loaded.
But there are existing compilers for Pascal, BASIC Fortran, C, and supposedly ADA, that support the UCSD P-Machine.  I have never found the last two though.
I'm certainly not against the idea of porting FastBasic.
I have also toyed with the idea of porting D-Pascal, which is a Pascal subset, it is available for the Apple II, C64, and Plus/4.

Code density is definitely better on the 6803 than the 6502.  The MC-10 only has an 8K ROM and the original Applesoft BASIC (not Applesoft II) is 12K.  There are definitely differences in what is supported, but that should give you an idea of how they compare. 

I haven't messed with any of the integer code beyond the initial test on the VZ.  I'm pretty sure it wouldn't matter on anything but Apple Integer BASIC, and I'd have to dig that up to test it.  It should run in a little over 3 seconds on a standard Apple II.  I probably couldn't even accurately time it by hand.  The MC-10 was bad enough already.  The IIc Plus would surely run it in under 1 second using integer BASIC, and the Idiot Compiler will probably be under 5 seconds with standard Applesoft II BASIC, I haven't tried it though so I could be wrong.
 


Edited by JamesD, Thu Nov 16, 2017 9:27 PM.


#31 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Fri Nov 17, 2017 9:23 AM

Using the SQR update and some creative code entry to fit as much on one line as the 64 character input buffer will hold... the VZ took about 19 seconds.

Not terrible, but over 2 times the CoCo.  It's almost 2 times the MC-10 even though it has almost 4x the clock speed.

I've been working on a disassembly of the VZ ROM and it's not very efficient code, so it's not really surprising.
ROM space was the concern rather than speed.
 



#32 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Fri Nov 17, 2017 11:13 AM

After some messing around, this seems to be the best combination for the VZ and it takes just over 18 seconds, but this is hand timing so I'm not sure exactly how much difference it is vs just using single precision or without initializing the variables.  Running versions side by side show a clear difference though.
If you don't initialize the variables like this, it is no faster than defining all variables as single precision numbers.

10 DEFSNG X:DEFINT J,I:X=0:J=0:I=0
100 CLS
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
130 FORI=3TONSTEP2
140 FORJ=2TOSQR(I):X=I/J:IFX<>INT(X)NEXTJ:?I;:NEXTI:GOTO230
150 PRINT".";:NEXTI
230 PRINT
240 PRINT "Finished"
260 END


Edited by JamesD, Fri Nov 17, 2017 11:17 AM.


#33 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Fri Nov 17, 2017 11:16 AM

Holy crap Atari at 1.23 seconds, now that is fast....

 

just for overkill, shits &giggles switch off antic then back on when done with resulting time...

Keep in mind the other versions are using floating point, so technically, it's not doing the same work.
If you are using some math that doesn't work with integers, you couldn't get away with this sort of optimization.



#34 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,855 posts
  • Location:10-0-11-00:02

Posted Fri Nov 17, 2017 11:39 AM

what was the last fast math pack with unroll optimizations for the Atari, and has it been made a part of Basic/OS choices? I remember some work in those areas but don't know if they were completed or just slowly faded of thread lists and obscurity. TurboBasic XL and Microsoft Basic still did an impressive job.



#35 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Fri Nov 17, 2017 11:58 AM

what was the last fast math pack with unroll optimizations for the Atari, and has it been made a part of Basic/OS choices? I remember some work in those areas but don't know if they were completed or just slowly faded of thread lists and obscurity. TurboBasic XL and Microsoft Basic still did an impressive job.

OSS BASIC XE included a faster math library if I remember right, and it is faster than BASIC XL on other benchmarks.
I'm not sure how it compared to a strait fast math ROM.  The Ahl's Benchmark thread might shed some light on this.

I think the BASIC with Altirra and BASIC++ are faster than any Atari BASIC derivative from this test.
Some of the optimizations really don't come into play until you have hundreds of lines of code and the interpreter has to spend more time hunting for lines.

*edit*
If you really want speed, you have to go with some sort of compiler.
There's no searching for line numbers, no searching through the variable table, no parsing, no converting between ASCII and numeric values at runtime, etc...
Compilers with bytecode interpreters and constant use of ROM routines typically show at least a 30% speed increase.
Native code compilers are even faster.
No matter what optimizations you make to these interpreters, they will never equal a compiler.


Edited by JamesD, Fri Nov 17, 2017 12:08 PM.


#36 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Sat Nov 18, 2017 5:21 PM

...
I have also toyed with the idea of porting D-Pascal, which is a Pascal subset, it is available for the Apple II, C64, and Plus/4.
...
 

G-Pascal, not D-Pascal



#37 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 287 posts
  • Location:Viņa del Mar, Chile

Posted Sat Nov 18, 2017 9:04 PM

Hi!
 

Native code generation on the 6803 can be a bit inefficient unless you take an unusual approach, so bytecode compilers are a lot easier to make for it.
If I were making a bytecode interpreter from scratch there are several things I'd do differently than UCSD.
#1 being I'd switch to a register based rather than stack based machine because it's faster on machines without stack relative addressing, and especially those with a direct page.


I tried a register based byte-code, but it was slower because one of the most expensive operations in the interpreter is *reading* the byte-code, so minimizing the byte-code size make the code faster. In my interpreter, this is optimized by using a routine located in page zero, so the pointer and the load instructions are the same, this is the main interpreter loop (from https://github.com/d...interpreter.asm):
cload:  ldy     $1234           ;4
        inc     z:cload+1       ;5
        bne     adj             ;2
        inc     z:cload+2       ;1.016
adj:    sty     z:jump+1        ;3
ldsptr: ldy     #0              ;2
jump:   jmp     (OP_JUMP)       ;5 = total 27.016 cycles per call

stack_ptr               =       ldsptr+1
code_ptr                =       cload+1
The code jumps through the jump-table to the opcode execution, with the stack pointer in the Y register and the 16 bit top of the stack in A/X register pair, so operation on the top of the stack only modify A/X. As is, the interpreter can execute about 60k instructions per second.

One big optimization is using a split stack (with the low-part and high-part of the value stored at different locations).

In my test of a (simple) register based byte-code, the register-move operations were expensive, so the final code was slower.
 

It would also be easy to automate conversion from bytecodes to native code for speed critical code. It would be possible to have just in time compilation of code modules when they are loaded.


This is certainly possible with a stack-based byte-code, by keeping track of the stack pointer in the compiler you can generate machine code for a sequence, you only have to be careful with recursion. In FastBasic, the language does not support GOTOs to keep the stack always consistent.
 

But there are existing compilers for Pascal, BASIC Fortran, C, and supposedly ADA, that support the UCSD P-Machine.  I have never found the last two though.
I'm certainly not against the idea of porting FastBasic.
I have also toyed with the idea of porting D-Pascal, which is a Pascal subset, it is available for the Apple II, C64, and Plus/4.


Didn't knew G-Pascal, on a simple look it also uses a stack-based byte-code, but as a joined stack with a 16bit stack-pointer, making stack operations slow.
 

Code density is definitely better on the 6803 than the 6502.  The MC-10 only has an 8K ROM and the original Applesoft BASIC (not Applesoft II) is 12K.  There are definitely differences in what is supported, but that should give you an idea of how they compare.


I did a quick look to the opcodes, and I think that all 16bit math code would be a lot smaller, and the interpreter could keep the top-of-stack always in A/B registers, but I don't know if it's possible to implement the jump-table without using A/B (like in the 6502).

#38 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Sun Nov 19, 2017 12:57 AM

I know we have hijacked the thread, but I'll let it die after this comment.
 

Hi!
 

I tried a register based byte-code, but it was slower because one of the most expensive operations in the interpreter is *reading* the byte-code, so minimizing the byte-code size make the code faster. In my interpreter, this is optimized by using a routine located in page zero, so the pointer and the load instructions are the same, this is the main interpreter loop (from https://github.com/d...interpreter.asm):
<snip>
Didn't knew G-Pascal, on a simple look it also uses a stack-based byte-code, but as a joined stack with a 16bit stack-pointer, making stack operations slow.
 

I did a quick look to the opcodes, and I think that all 16bit math code would be a lot smaller, and the interpreter could keep the top-of-stack always in A/B registers, but I don't know if it's possible to implement the jump-table without using A/B (like in the 6502).

G-Pascal as a direct implementation of a Niklaus Wirth P-Machine but with editor and compiler coded in assembly instead of Pascal, and it is a subset for size and complexity purposes somewhat like Tiny Pascal.

16 bit index and data registers make a huge difference.  If the 6803 had a 2nd index register like the 6809, 68hc11, etc... you could dedicate an index register to the bytecode program counter.
The 68hc12 has 3 index registers.

I came up with the following code off the top of my head, but it should be accurate.
I think these even work on the 6800... except register D and where I mention the 6809.
 
 
Maintaining a pointer to the top of the stack is a no brainter 
You just push stuff onto the hardware stack, and for a stack relative addressing you grab a copy of the stack pointer into X and can index off of that
 

TSX
LDAA  0,X

LDD   3,X
STD 3,x

 
 
To pop the last call off the stack you can simply do this.  The 6809 can directly modify the stack pointer with LEAS and are way better at this.
  

TSX
LDAB  #NumberOfBytesToPop 
ABX
TXS
 

To avoid this sort of thing it's best to just pop parameters from the stack if possible which automatically adjusts the stack, and you don't need to modify X.

  
The 6809 has stack relative addressing and you just index off of the stack pointer.  It's almost cheating.
You can even use the user stack pointer if you want to keep a separate stack pointer from the OS..
 

Incrementing the program counter is as simple as incrementing X or loading B with the number of bytes the opcode required and then adding them to the program counter in X.
 

;bottom of a bytecode function that requires 3 bytes for parameters. 4 = opcode + parameters
 LDAB  #4
 JMP   MainLoop

 
Note that all opcodes requiring a single byte jump to SingleBytePCInc.  This should really be whatever is most common and B should be set accordingly.
For space purposes, there might be a TwoBytePCInc before this that BRAnches to MainLoop.  Apple Pascal does this.

This is the slow version. 
 

SingleBytePCInc:
LDAB #1
MainLoop:
LDX    ProgramCounter ;4
ABX ;3
STX     ProgramCounter ;4

LDAB  ,X ;get bytecode 4
ROLB ;multipy by 2 2
LDX   #BYTECODEPOINTERTABLE ;point to jump table 3
ABX ;add offset for bytecode 3
JMP 0,X ;call it 3

You don't have to maintain anything on the direct page other than keeping a copy of the program counter.
 
All the X shuffling is costly, but it would make other functions smaller and this avoids self modifying code.


The fast version uses self modifying code and assumes X contains the program counter on entry.
Any bytecode routine that modifies X would have to restore X to the program counter before exit .
The table must be on a 256 byte boundary,
 

AdjustPC:
ABX ;                               3
BRA MainLoop ;                      3

IncPC:
 INX ;increment the program counter 3

MainLoop:
 LDAB  ,X ;get next bytecode        4
 ROLB ;multipy by 2                 2
 STAB Jump+2 ;update jump address   4
Jump:
 JMP  BYTECODEPOINTERTABLE ; Jump   3

In spite of not having an instruction prefetch like the 6502, the 6803 is somewhat faster than the 6502 here, but some of that may be lost within a given bytecode routine to use X.  Even then, it only adds 8 clock cycles and still less than the 6502.  The code is also smaller than the 6502 code.

 



#39 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Sun Nov 19, 2017 1:20 AM

Actually, the program counter would usually be restored here most of the time it's modified.
Best case is 16 clock cycles

RestorePCAndAdjust:
 LDX ProgramCounter ;               4
AdjustPC:
 ABX ;                              3
 BRA MainLoop ;                     3

RestorePCAndInc:
 LDX ProgramCounter ;               4
IncPC:
 INX ;increment the program counter 3

MainLoop:
 LDAB  ,X ;get next bytecode        4
 ROLB ;multipy by 2                 2
 STAB Jump+2 ;update jump address   4
Jump:
 JMP  BYTECODEPOINTERTABLE ; Jump   3



Edited by JamesD, Sun Nov 19, 2017 1:23 AM.


#40 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,855 posts
  • Location:10-0-11-00:02

Posted Sun Nov 19, 2017 12:35 PM

I hate all the 'if' and 'most' of the time...... really don't like the uncertainty and then mixing two processor to make points...

if you stick with the concrete and make all the points for one processor then make all the points for the other processor it would make sense and we wouldn't end up with such stuff

 

usually it works most of the time but if this happens then it might or might not..... no way !


Edited by _The Doctor__, Sun Nov 19, 2017 12:37 PM.


#41 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Sun Nov 19, 2017 4:45 PM

I hate all the 'if' and 'most' of the time...... really don't like the uncertainty and then mixing two processor to make points...

if you stick with the concrete and make all the points for one processor then make all the points for the other processor it would make sense and we wouldn't end up with such stuff

 

usually it works most of the time but if this happens then it might or might not..... no way !

Haters gonna hate.

The kind of comparison you want would require me to I port his bytecode interpreter and I'm not sure I want to due that.  
It's a non-standard language and I could spend the same time porting E-BASIC or Pascal. 
I don't know if they are as fast, but the code would be more portable..

Just keep in mind we are comparing clock cycles and that's at the same MHz.   
But the Atari is clocked twice as fast.  The best case 16 clock cycles on the MC-10 is the same amount of time as 32 clock cycles on the Atari.  5 more than the best case on the 6502.
The MC-10 would still be executing this benchmark in under 2 seconds.
 



#42 _The Doctor__ OFFLINE  

_The Doctor__

    River Patroller

  • 2,855 posts
  • Location:10-0-11-00:02

Posted Sun Nov 19, 2017 7:11 PM

Go for it! everyone would enjoy having all of them ported over :)

 

sorry you feel that way, I can follow dmsc and he's provided good evidence. Very solid. his works all of the time. I'm sure you can post yours and we can run them on either real machines or emulation in a pinch, that would be pretty cool.

 

I'll have to dig out the computers again. Last time I had the MC-10 out was for taipan


Edited by _The Doctor__, Sun Nov 19, 2017 7:17 PM.


#43 Level42 OFFLINE  

Level42

    Stargunner

  • 1,643 posts
  • Location:Ridderkerk, The Netherlands

Posted Mon Nov 20, 2017 6:24 AM

I'm surprised the C64 and Atari MS BASIC came out so close (79 vs 76 seconds), considering the Atari is clocked at 1.79MHz vs the 64's 1.02MHz.


Interesting stuff. I usually show the original version of Dropzone to quite down C64 fanboys when it comes to speed......especially at NTSC :D

#44 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Mon Nov 20, 2017 9:48 AM

My enhanced BASIC on the MC-10 uses the hardware multiply for the floating point multiplication.
So if we can multiply by 1/n, it should be faster than dividing by n, but it's only faster if it can be pre-calculated such as with a constant since 1/n is in itself, a division.

Out of curiosity, I tried precalculating an array with 1/J values and then multiplying I by Array(J) instead of dividing by J.
The array lookups offset the faster math calculations on the MC-10, so it's pretty close to the same speed, possibly even slightly slower.

However, this made me realize there is some merit to this approach.
The proper way to do this, is to only store primes in the array, then divide with those in the inner loop which counts to the size of the array, .
If array(j) > SQR(I) we can stop testing, it's a prime.  If we reach the end of the array, we have to test SQR(I).
If it's a prime, add it to the array and bump the array size.
This eliminates duplication of effort.  For example, 6, 9, and 15, are multiples of 3 and testing anything but 3 is redundant.
It's the same reason we eliminated even numbers, they are all divisible by 2.
This may be too slow unless it's compiled, I haven't worked out the code yet.  It's worth a try.
If this is faster on a regular interpreter, then storing the numbers as 1/n and multiplying should make it faster on the MC-10.  



#45 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 287 posts
  • Location:Viņa del Mar, Chile

Posted Mon Nov 20, 2017 2:24 PM

Hi!
 

My enhanced BASIC on the MC-10 uses the hardware multiply for the floating point multiplication.
So if we can multiply by 1/n, it should be faster than dividing by n, but it's only faster if it can be pre-calculated such as with a constant since 1/n is in itself, a division.

Out of curiosity, I tried precalculating an array with 1/J values and then multiplying I by Array(J) instead of dividing by J.
The array lookups offset the faster math calculations on the MC-10, so it's pretty close to the same speed, possibly even slightly slower.

However, this made me realize there is some merit to this approach.
The proper way to do this, is to only store primes in the array, then divide with those in the inner loop which counts to the size of the array, .
If array(j) > SQR(I) we can stop testing, it's a prime.  If we reach the end of the array, we have to test SQR(I).
If it's a prime, add it to the array and bump the array size.
This eliminates duplication of effort.  For example, 6, 9, and 15, are multiples of 3 and testing anything but 3 is redundant.
It's the same reason we eliminated even numbers, they are all divisible by 2.
This may be too slow unless it's compiled, I haven't worked out the code yet.  It's worth a try.
If this is faster on a regular interpreter, then storing the numbers as 1/n and multiplying should make it faster on the MC-10.


After all that, I think you are so far of the original algorithm that's better to implement a proper sieve. This is the so called "sieve of Sundaram", that is an implementation of the Eratosthenes sieve that only stores odd numbers, with A(I)==Prime(2*I+3):
  ? "Prime Number Generator"
  Input "Upper Limit?",N
  sTime = TIME
  Top = (N-3)/2
  DIM A(Top+1) Byte
  ' Mark all as primes
  FOR I = 0 TO Top
    A(I) = 1
  NEXT I
  ' For each odd number,
  FOR I = 0 TO Top
    ' If 2*I+3 is prime:
    IF A(I)
      ' Mark all odd multiples as composite
      Prime = I + I + 3
      PRINT Prime;".";
      FOR K = I + Prime TO Top STEP Prime
        A(K) = 0
      NEXT K
    ELSE
      PRINT "..";
    ENDIF
  NEXT I
  eTime = TIME
  PRINT
  PRINT "Total: "; (eTime-sTime)/60.0
In my FastBasic, this takes less than one second to search up to 250, but about 50% of the time is spent inside "PRINT", and the percentage increases to 66% when searching primes up to 21846.

#46 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Mon Nov 20, 2017 7:37 PM

That takes around 3 - 3.5 seconds on the MC-10 without any kind of compiler, but I have to verify the accuracy of my port..  
*edit* Fixed a bug.
*edit* It's skipping 169 but the previous version said it was a prime.  I looked it up and 169 is not a prime.
Not only is this faster, it's more accurate.  I'm not fixing the other one.

 

*edit*  The CoCo runs it in 2.217 seconds.  I haven't tried defining the variables at the top yet.

*edit* 2.183 for the coco and around 3 for the MC-10.

Clearing the screen first drops it to 2.1 and 2.5 - 3 

 

10 POKE65497,0
20 K=0:T=0:P=0:I=0
30 CLS
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N
120  TIMER=0
130  T = (N-3)/2
140  DIM A(T+1)
160  FOR I=0 TO T:A(I)=1:NEXT I
200  FOR I=0 TO T
220   IF A(I) THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN FOR K=K TO T STEP P:A(K)=0:NEXT K:ELSE ELSE PRINT "..";
320  NEXT I
330  eTime = TIMER/60
340  PRINT
350  PRINT "Total: ";eTime
360 END

Edited by JamesD, Mon Nov 20, 2017 8:33 PM.


#47 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Mon Nov 20, 2017 11:00 PM

The Plus/4 version.  4.216 seconds.

20 K=0:T=0:P=0:I=0
30 SCNCLR
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N
120  eTime=TIME
130  T = (N-3)/2
140  DIM A(T+1)
160  FOR I=0 TO T:A(I)=1:NEXT I
200  FOR I=0 TO T
220   IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN 226 
225   PRINT "..";:NEXTI:GOTO330
226   FOR K=K TO T STEP P:A(K)=0:NEXT K,I
330  eTime = (TIME - eTime)/60
340  PRINT
350  PRINT "Total: ";eTime
360 END


Edited by JamesD, Mon Nov 20, 2017 11:25 PM.


#48 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Mon Nov 20, 2017 11:21 PM

Apple II, a little over 3 seconds.

20 K=0:T=0:P=0:I=0
30 HOME
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N
130  T = (N-3)/2
140  DIM A(T+1)
160  FOR I=0 TO T:A(I)=1:NEXT I
200  FOR I=0 TO T
220   IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN 226 
225   PRINT "..";:NEXTI:GOTO340
226   FOR K=K TO T STEP P:A(K)=0:NEXT K,I
340  PRINT
350  PRINT "FINISHED "
360 END


#49 dmsc OFFLINE  

dmsc

    Moonsweeper

  • 287 posts
  • Location:Viņa del Mar, Chile

Posted Tue Nov 21, 2017 6:07 AM

Hi!
 

Apple II, a little over 3 seconds.

20 K=0:T=0:P=0:I=0
30 HOME
100  PRINT "Prime Number Generator"
110  INPUT "Upper Limit";N
130  T = (N-3)/2
140  DIM A(T+1)
160  FOR I=0 TO T:A(I)=1:NEXT I
200  FOR I=0 TO T
220   IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN 226 

Following the nested IF/THEN in one line is not easy, but I think the last IF has a problem, as if the condition is not meet, this falls through the PRINT "..";, but you already did PRINT P;"."; , so you end with two extra dots.

Perhaps changing lines 220 and 226 with:
220   IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:GOTO226
226   IF K<=T THEN FOR K=K TO T STEP P:A(K)=0:NEXT K,I:GOTO340
227   NEXTI
Also, if this is MS Basic, I suppose you could begin the program with a "DEFINT A-Z" to simply use integer variables for all calculations?

#50 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,859 posts
  • Location:Flyover State

Posted Tue Nov 21, 2017 12:39 PM

Hi!
 
Following the nested IF/THEN in one line is not easy, but I think the last IF has a problem, as if the condition is not meet, this falls through the PRINT "..";, but you already did PRINT P;"."; , so you end with two extra dots.

Perhaps changing lines 220 and 226 with:

220   IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:GOTO226
226   IF K<=T THEN FOR K=K TO T STEP P:A(K)=0:NEXT K,I:GOTO340
227   NEXTI
Also, if this is MS Basic, I suppose you could begin the program with a "DEFINT A-Z" to simply use integer variables for all calculations?

 

Nice catch.  I saw the extra dots but ignored it.  It was getting late and printing 2 characters takes very little time. 
Without a system clock to time with I couldn't even say how much faster this is. 
It should definitely be a measurable difference though, especially if the screen has to scroll due to those dots..

I *think* you can use this but not 100% sure (untested).

The =1 was added when I was trying to track down a bug.
Applesoft does have a few syntax differences to be more compatible with Integer BASIC, so it may not though.

220   IF A(I) THEN P=I+I+3:PRINT P;".";:K=I+P:GOTO226

Applesoft doesn't support integers, only some Z80 versions and the Atari version of Microsoft BASIC do... at least I think the Atari version of Microsoft does.

Using int for the loop counters made the VZ version faster but only when I initialized the variables in the order I wanted them to appear in the variable table.

The 6803 and 6809 versions don't support integers either and their math library uses floating point numbers with an additional digit for greater accuracy.
In spite of what should be slower math routines, the enhanced MC-10 ROM shows they are clearly capable of performing faster floating point math when 16 bit registers are used
And not all math functions were optimized due to space limitations.
The 65816 would benefit from some of the same optimizations, but mode switching would steal some clock cycles limiting where you can use them.

*edit*
BTW, this is one of the best examples of why having the ELSE statement is important.


*edit*
I tried the fix on the Plus/4 version, but it skips the last prime.   The time was 4.2 though that's faster than a fully working version.  
So the dots add hundredths of a second to the result.
 


Edited by JamesD, Tue Nov 21, 2017 12:57 PM.






Also tagged with one or more of these keywords: BASIC, Performance, Primes

0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users