dmsc Posted November 17, 2017

Hi!

> Someone was working on a BASIC compiler for the MC-10 at one time. I'll have to see if that's available. I was thinking of porting the UCSD P-Machine to it so it could run any of the UCSD compilers. It could take advantage of some of the RAM expansions and the DriveWire disk interface. If I go beyond 8K for the BASIC ROM, I can use several faster math functions, including SQR and divide. I figure even with floating point I might drop the time of this benchmark by a couple of seconds. The "Idiot Compiler" for the Apple II should make this really fast on it. Old BASIC compilers for it sped up code by 30% or more, and Idiot leaves those in the dust.

I looked at the P-Machine before writing the FastBasic runtime, but I decided to use simpler tokens for the parsed code, to minimize code size. Also, as the parser is designed to run on the 6502, the tokens are similar to the language. Currently FastBasic for the 6502 is 8.2K for the integer version and 9.4K for the floating-point version; this includes the full-screen editor, the parser, the runtime and the compiler. I suspect that the 6803 code would be smaller.

Did you try my integer-SQR approximation?
_The Doctor__ Posted November 17, 2017 (edited)

Holy crap, Atari at 1.23 seconds, now that is fast... Just for overkill, shits & giggles: switch off ANTIC, then back on when done, with the resulting time...

Edited November 17, 2017 by _The Doctor__
dmsc Posted November 17, 2017 (edited)

Hi!

> Holy crap, Atari at 1.23 seconds, now that is fast... Just for overkill, shits & giggles: switch off ANTIC, then back on when done, with the resulting time...

Just 0.83 seconds (NTSC Atari). Note that about 50% of the time is spent inside the "PRINT" routine, not in the actual calculations. This is the main program:

? "Prime Number Generator"
Input "Upper Limit?",N
START=TIME
POKE 559,0   ' Disable DMA
E=0 : L=0    ' Integer SQR approx.
FOR I=3 TO N STEP 2
  IF L<1 : INC E : L=E : ENDIF
  DEC L
  FOR J=3 TO E STEP 2
    IF I MOD J = 0
      EXIT
    ENDIF
  NEXT J
  IF J>E
    ? I;".";
  ELSE
    ? ;"..";
  ENDIF
NEXT I
POKE 559,34  ' Enable DMA
FINAL=TIME
? : ? "Start Time: ";START
? "End Time: ";FINAL
? "Total: "; (FINAL-START)/60.0

Edited November 17, 2017 by dmsc
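For readers puzzling over the E/L bookkeeping in that listing: E is an incremental integer approximation of SQR(I), bumped whenever the countdown L runs out, so no square root is ever computed. Here is a minimal sketch in Python (Python chosen purely for illustration; the function name is mine, not from the thread) of the same trial-division scheme:

```python
def primes_up_to(n):
    """Trial division over odd numbers using the incremental
    integer-SQR approximation from the FastBasic listing above:
    E consecutive odd candidates share the same approximation E,
    then E grows by one.  E stays close enough to sqrt(i) that no
    composite slips through (its smallest factor p satisfies
    p*p <= i <= E*E + E + 1 < (E+1)^2, hence p <= E)."""
    primes = [2] if n >= 2 else []
    e = 0  # running approximation of int(sqrt(i))
    l = 0  # odd candidates remaining before e must grow
    for i in range(3, n + 1, 2):
        if l < 1:
            e += 1
            l = e
        l -= 1
        # divide only by odd trial divisors up to e
        if all(i % j != 0 for j in range(3, e + 1, 2)):
            primes.append(i)
    return primes
```

Cross-checking against naive trial division confirms the approximation never lets a composite through.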
+MrFish Posted November 17, 2017

> Note that about 50% of the time is spent inside the "PRINT" routine, not in the actual calculations.

I was thinking about that earlier myself. Why not do the calculations and save to variables/an array (whatever), then stop the timer and print the results after?
JamesD Posted November 17, 2017 (edited)

> Hi! I looked at the P-Machine before writing the FastBasic runtime, but I decided to use simpler tokens for the parsed code, to minimize code size. <snip> Did you try my integer-SQR approximation?

Native code generation on the 6803 can be a bit inefficient unless you take an unusual approach, so bytecode compilers are a lot easier to write for it. If I were making a bytecode interpreter from scratch, there are several things I'd do differently than UCSD, #1 being I'd switch to a register-based rather than stack-based machine, because it's faster on machines without stack-relative addressing, and especially those with a direct page. It would also be easy to automate conversion from bytecodes to native code for speed-critical code, and it would be possible to have just-in-time compilation of code modules when they are loaded.

But there are existing compilers for Pascal, BASIC, Fortran, C, and supposedly Ada that target the UCSD P-Machine. I have never found the last two, though. I'm certainly not against the idea of porting FastBasic. I have also toyed with the idea of porting D-Pascal, which is a Pascal subset; it is available for the Apple II, C64, and Plus/4.

Code density is definitely better on the 6803 than the 6502. The MC-10 has only an 8K ROM, and the original Applesoft BASIC (not Applesoft II) is 12K. There are definitely differences in what is supported, but that should give you an idea of how they compare.

I haven't messed with any of the integer code beyond the initial test on the VZ. I'm pretty sure it wouldn't matter on anything but Apple Integer BASIC, and I'd have to dig that up to test it. It should run in a little over 3 seconds on a standard Apple II; I probably couldn't even time it accurately by hand. The MC-10 was bad enough already. The IIc Plus would surely run it in under 1 second using Integer BASIC, and the Idiot Compiler will probably be under 5 seconds with standard Applesoft II BASIC. I haven't tried it, though, so I could be wrong.

Edited November 17, 2017 by JamesD
JamesD Posted November 17, 2017

Using the SQR update and some creative code entry to fit as much on one line as the 64-character input buffer will hold... the VZ took about 19 seconds. Not terrible, but over 2 times the CoCo, and almost 2 times the MC-10 even though it has almost 4x the clock speed. I've been working on a disassembly of the VZ ROM and it's not very efficient code, so it's not really surprising. ROM space was the concern rather than speed.
JamesD Posted November 17, 2017 (edited)

After some messing around, this seems to be the best combination for the VZ, and it takes just over 18 seconds. This is hand timing, so I'm not sure exactly how much difference it makes vs. just using single precision or not initializing the variables; running versions side by side shows a clear difference, though. If you don't initialize the variables like this, it is no faster than defining all variables as single-precision numbers.

10 DEFSNG X:DEFINT J,I:X=0:J=0:I=0
100 CLS
110 PRINT "Welcome to the prime number generator."
115 N = 32767
120 INPUT "Upper limit (32767)";N
130 FORI=3TONSTEP2
140 FORJ=2TOSQR(I):X=I/J:IFX<>INT(X)NEXTJ:?I;:NEXTI:GOTO230
150 PRINT".";:NEXTI
230 PRINT
240 PRINT "Finished"
260 END

Edited November 17, 2017 by JamesD
JamesD Posted November 17, 2017

> Holy crap, Atari at 1.23 seconds, now that is fast... Just for overkill, shits & giggles: switch off ANTIC, then back on when done, with the resulting time...

Keep in mind the other versions are using floating point, so technically it's not doing the same work. If you were using some math that doesn't work with integers, you couldn't get away with this sort of optimization.
_The Doctor__ Posted November 17, 2017

What was the last fast math pack with unrolled optimizations for the Atari, and has it been made a part of the BASIC/OS choices? I remember some work in those areas but don't know if it was completed or just slowly faded off the thread lists into obscurity. TurboBasic XL and Microsoft Basic still did an impressive job.
JamesD Posted November 17, 2017 (edited)

> What was the last fast math pack with unrolled optimizations for the Atari, and has it been made a part of the BASIC/OS choices? <snip>

OSS BASIC XE included a faster math library if I remember right, and it is faster than BASIC XL on other benchmarks. I'm not sure how it compares to a straight fast-math ROM; the Ahl's Benchmark thread might shed some light on this. I think the BASIC shipped with Altirra and BASIC++ are faster than any Atari BASIC derivative on this test. Some of the optimizations really don't come into play until you have hundreds of lines of code and the interpreter has to spend more time hunting for lines.

*edit* If you really want speed, you have to go with some sort of compiler. There's no searching for line numbers, no searching through the variable table, no parsing, no converting between ASCII and numeric values at runtime, etc. Compilers with bytecode interpreters and constant use of ROM routines typically show at least a 30% speed increase, and native-code compilers are even faster. No matter what optimizations you make to these interpreters, they will never equal a compiler.

Edited November 17, 2017 by JamesD
JamesD Posted November 18, 2017

> ... I have also toyed with the idea of porting D-Pascal, which is a Pascal subset; it is available for the Apple II, C64, and Plus/4. ...

G-Pascal, not D-Pascal.
dmsc Posted November 19, 2017

Hi!

> Native code generation on the 6803 can be a bit inefficient unless you take an unusual approach, so bytecode compilers are a lot easier to write for it. If I were making a bytecode interpreter from scratch, there are several things I'd do differently than UCSD, #1 being I'd switch to a register-based rather than stack-based machine, because it's faster on machines without stack-relative addressing, and especially those with a direct page.

I tried a register-based byte-code, but it was slower, because one of the most expensive operations in the interpreter is *reading* the byte-code, so minimizing the byte-code size makes the code faster. In my interpreter, this is optimized by using a routine located in page zero, so the pointer and the load instruction are the same. This is the main interpreter loop (from https://github.com/dmsc/fastbasic/blob/master/src/interpreter.asm):

cload:  ldy $1234      ; 4
        inc z:cload+1  ; 5
        bne adj        ; 2
        inc z:cload+2  ; 1.016
adj:    sty z:jump+1   ; 3
ldsptr: ldy #0         ; 2
jump:   jmp (OP_JUMP)  ; 5 = total 27.016 cycles per call

stack_ptr = ldsptr+1
code_ptr  = cload+1

The code jumps through the jump table to the opcode implementation, with the stack pointer in the Y register and the 16-bit top of the stack in the A/X register pair, so operations on the top of the stack only modify A/X. As is, the interpreter can execute about 60k instructions per second. One big optimization is using a split stack (with the low part and high part of each value stored at different locations). In my test of a (simple) register-based byte-code, the register-move operations were expensive, so the final code was slower.

> It would also be easy to automate conversion from bytecodes to native code for speed-critical code. It would be possible to have just-in-time compilation of code modules when they are loaded.

This is certainly possible with a stack-based byte-code: by keeping track of the stack pointer in the compiler, you can generate machine code for a sequence; you only have to be careful with recursion. In FastBasic, the language does not support GOTOs, to keep the stack always consistent.

> But there are existing compilers for Pascal, BASIC, Fortran, C, and supposedly Ada that target the UCSD P-Machine. <snip> I have also toyed with the idea of porting D-Pascal, which is a Pascal subset; it is available for the Apple II, C64, and Plus/4.

I didn't know G-Pascal; at a quick look it also uses a stack-based byte-code, but with a joined stack and a 16-bit stack pointer, making stack operations slow.

> Code density is definitely better on the 6803 than the 6502. The MC-10 has only an 8K ROM, and the original Applesoft BASIC (not Applesoft II) is 12K. There are definitely differences in what is supported, but that should give you an idea of how they compare.

I had a quick look at the opcodes, and I think all the 16-bit math code would be a lot smaller, and the interpreter could keep the top of stack always in the A/B registers, but I don't know if it's possible to implement the jump table without using A/B (like on the 6502).
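For readers who haven't written one, the dispatch structure described above (fetch opcode, bump program counter, jump through a table, keep the top of stack cached in registers) can be sketched language-neutrally. This toy Python model is mine, not from FastBasic: the opcode numbers are invented, and the split-stack and self-modifying-code details are omitted.

```python
def run(bytecode):
    """Minimal stack-machine dispatch loop: `tos` caches the top of
    stack (standing in for the A/X register pair), and the rest of
    the values live on a separate list (the operand stack)."""
    PUSH, ADD, MUL, HALT = range(4)  # toy opcode numbering (assumed)
    stack = []
    tos = 0   # cached top-of-stack; most opcodes touch only this
    pc = 0
    while True:
        op = bytecode[pc]; pc += 1       # fetch + increment, like cload
        if op == PUSH:
            stack.append(tos)            # spill cached value
            tos = bytecode[pc]; pc += 1  # immediate operand
        elif op == ADD:
            tos = stack.pop() + tos
        elif op == MUL:
            tos = stack.pop() * tos
        elif op == HALT:
            return tos

# (2 + 3) * 4  ->  PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT
prog = [0, 2, 0, 3, 1, 0, 4, 2, 3]
```

The `if/elif` chain plays the role of the jump table; a real interpreter indexes a table of handler addresses instead of comparing opcodes one by one.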
JamesD Posted November 19, 2017

I know we have hijacked the thread, but I'll let it die after this comment.

> Hi! I tried a register-based byte-code, but it was slower, because one of the most expensive operations in the interpreter is *reading* the byte-code, so minimizing the byte-code size makes the code faster. <snip> I don't know if it's possible to implement the jump table without using A/B (like on the 6502).

G-Pascal is a direct implementation of a Niklaus Wirth P-Machine, but with the editor and compiler coded in assembly instead of Pascal, and it is a subset for size and complexity purposes, somewhat like Tiny Pascal.

16-bit index and data registers make a huge difference. If the 6803 had a 2nd index register like the 6809, 68HC11, etc., you could dedicate an index register to the bytecode program counter. (The 68HC12 has 3 index registers.) I came up with the following code off the top of my head, but it should be accurate. I think these even work on the 6800, except register D and where I mention the 6809.

Maintaining a pointer to the top of the stack is a no-brainer. You just push stuff onto the hardware stack, and for stack-relative addressing you grab a copy of the stack pointer into X and index off of that:

        TSX
        LDAA 0,X
        LDD  3,X
        STD  3,X

To pop the last call off the stack you can simply do this (the 6809 can directly modify the stack pointer with LEAS and is way better at this):

        TSX
        LDAB #NumberOfBytesToPop
        ABX
        TXS

To avoid this sort of thing, it's best to just pop parameters from the stack where possible, which adjusts the stack automatically, and you don't need to modify X. The 6809 has stack-relative addressing and you just index off of the stack pointer; it's almost cheating. You can even use the user stack pointer if you want to keep a stack separate from the OS's.

Incrementing the program counter is as simple as incrementing X, or loading B with the number of bytes the opcode required and adding it to the program counter in X:

; bottom of a bytecode function that requires 3 bytes for parameters
; 4 = opcode + parameters
        LDAB #4
        JMP  MainLoop

Note that all opcodes requiring a single byte jump to SingleBytePCInc. This should really be whatever is most common, and B should be set accordingly. For space purposes, there might be a TwoBytePCInc before this that BRAnches to MainLoop. Apple Pascal does this. This is the slow version:

SingleBytePCInc:
        LDAB #1
MainLoop:
        LDX  ProgramCounter        ; 4
        ABX                        ; 3
        STX  ProgramCounter        ; 4
        LDAB ,X                    ; get bytecode            4
        ASLB                       ; multiply by 2           2
        LDX  #BYTECODEPOINTERTABLE ; point to jump table     3
        ABX                        ; add offset for bytecode 3
        JMP  0,X                   ; call it                 3

You don't have to maintain anything on the direct page other than a copy of the program counter. All the X shuffling is costly, but it would make other functions smaller, and this version avoids self-modifying code.

The fast version uses self-modifying code and assumes X contains the program counter on entry. Any bytecode routine that modifies X would have to restore X to the program counter before exit. The table must be on a 256-byte boundary:

AdjustPC:
        ABX                        ; 3
        BRA  MainLoop              ; 3
IncPC:
        INX                        ; increment the program counter 3
MainLoop:
        LDAB ,X                    ; get next bytecode       4
        ASLB                       ; multiply by 2           2
        STAB Jump+2                ; update jump address     4
Jump:
        JMP  BYTECODEPOINTERTABLE  ; jump                    3

In spite of not having an instruction prefetch like the 6502, the 6803 is somewhat faster than the 6502 here, though some of that may be lost within a given bytecode routine that uses X. Even then, it only adds 8 clock cycles and is still less than the 6502. The code is also smaller than the 6502 code.
JamesD Posted November 19, 2017 (edited)

Actually, the program counter would usually be restored here most of the time it's modified. Best case is 16 clock cycles:

RestorePCAndAdjust:
        LDX  ProgramCounter        ; 4
AdjustPC:
        ABX                        ; 3
        BRA  MainLoop              ; 3
RestorePCAndInc:
        LDX  ProgramCounter        ; 4
IncPC:
        INX                        ; increment the program counter 3
MainLoop:
        LDAB ,X                    ; get next bytecode       4
        ASLB                       ; multiply by 2           2
        STAB Jump+2                ; update jump address     4
Jump:
        JMP  BYTECODEPOINTERTABLE  ; jump                    3

Edited November 19, 2017 by JamesD
_The Doctor__ Posted November 19, 2017 (edited)

I hate all the 'if' and 'most of the time'... I really don't like the uncertainty, and then mixing two processors to make points. If you stick with the concrete and make all the points for one processor, then make all the points for the other processor, it would make sense, and we wouldn't end up with stuff like "usually it works most of the time, but if this happens then it might or might not"... no way!

Edited November 19, 2017 by _The Doctor__
JamesD Posted November 19, 2017

> I hate all the 'if' and 'most of the time'... I really don't like the uncertainty, and then mixing two processors to make points. <snip>

Haters gonna hate. The kind of comparison you want would require me to port his bytecode interpreter, and I'm not sure I want to do that. It's a non-standard language, and I could spend the same time porting E-BASIC or Pascal. I don't know if they are as fast, but the code would be more portable.

Just keep in mind we are comparing clock cycles, and that's at the same MHz; the Atari is clocked twice as fast. The best case of 16 clock cycles on the MC-10 takes the same amount of time as 32 clock cycles on the Atari, 5 more than the best case on the 6502. The MC-10 would still be executing this benchmark in under 2 seconds.
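The clock-rate arithmetic above is easy to check. A quick sketch (the MHz figures are my assumptions, roughly 0.89 MHz for the MC-10's 6803 and 1.79 MHz for the NTSC Atari's 6502; they are not stated in the thread):

```python
# Back-of-the-envelope dispatch-time comparison at assumed clock rates.
MC10_MHZ, ATARI_MHZ = 0.89, 1.79

mc10_dispatch_cycles = 16        # best case from the 6803 loop above
atari_dispatch_cycles = 27.016   # from dmsc's 6502 interpreter loop

mc10_us = mc10_dispatch_cycles / MC10_MHZ    # microseconds per dispatch
atari_us = atari_dispatch_cycles / ATARI_MHZ

# 16 MC-10 cycles correspond to roughly 32 Atari-clock cycles:
equivalent_atari_cycles = mc10_us * ATARI_MHZ
```

So in wall-clock time the 6502 loop still dispatches slightly faster, which is exactly the point being made: fewer cycles on the 6803 does not mean less time once the clock difference is factored in.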
_The Doctor__ Posted November 20, 2017 (edited)

Go for it! Everyone would enjoy having all of them ported over. Sorry you feel that way; I can follow dmsc, and he's provided good evidence. Very solid. His works all of the time. I'm sure you can post yours and we can run them on either real machines or emulation in a pinch; that would be pretty cool. I'll have to dig out the computers again. Last time I had the MC-10 out was for Taipan.

Edited November 20, 2017 by _The Doctor__
Level42 Posted November 20, 2017

I'm surprised the C64 and Atari MS BASIC came out so close (79 vs 76 seconds), considering the Atari is clocked at 1.79MHz vs the 64's 1.02MHz. Interesting stuff. I usually show the original version of Dropzone to quiet down C64 fanboys when it comes to speed... especially on NTSC.
JamesD Posted November 20, 2017

My enhanced BASIC on the MC-10 uses the hardware multiply for the floating-point multiplication. So if we can multiply by 1/n, it should be faster than dividing by n, but it's only faster if 1/n can be pre-calculated, such as with a constant, since 1/n is itself a division. Out of curiosity, I tried precalculating an array with 1/J values and then multiplying I by Array(J) instead of dividing by J. The array lookups offset the faster math calculations on the MC-10, so it's pretty close to the same speed, possibly even slightly slower.

However, this made me realize there is some merit to this approach. The proper way to do this is to store only primes in the array, then divide by those in the inner loop, which counts up to the size of the array. If Array(J) > SQR(I), we can stop testing; it's a prime. If we reach the end of the array, we have to test up to SQR(I); if it's a prime, add it to the array and bump the array size. This eliminates duplication of effort. For example, 6, 9, and 15 are multiples of 3, and testing anything but 3 is redundant. It's the same reason we eliminated even numbers: they are all divisible by 2. This may be too slow unless it's compiled; I haven't worked out the code yet, but it's worth a try. If this is faster on a regular interpreter, then storing the numbers as 1/n and multiplying should make it faster on the MC-10.
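The divide-only-by-stored-primes idea sketches out like this in Python (illustration only; the function name is mine, and the interpreter-BASIC details like reciprocal storage are omitted):

```python
import math

def primes_by_stored_primes(n):
    """Trial-divide each odd candidate only by the primes found so
    far, stopping as soon as a stored prime exceeds sqrt(candidate),
    as described in the post above.  Any new prime is appended to
    the array for use against later candidates."""
    if n < 3:
        return [2] if n == 2 else []
    primes = [3]  # odd primes found so far; 2 is handled separately
    for i in range(5, n + 1, 2):
        limit = math.isqrt(i)
        is_prime = True
        for p in primes:
            if p > limit:
                break              # no stored prime can divide i
            if i % p == 0:
                is_prime = False   # found a divisor
                break
        if is_prime:
            primes.append(i)       # "bump the array size"
    return [2] + primes
```

Compared with dividing by every odd number up to SQR(I), this skips all the redundant composite divisors (9, 15, 21, ...), at the cost of the array storage and lookups the post discusses.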
dmsc Posted November 20, 2017

Hi!

> My enhanced BASIC on the MC-10 uses the hardware multiply for the floating-point multiplication. <snip> If this is faster on a regular interpreter, then storing the numbers as 1/n and multiplying should make it faster on the MC-10.

After all that, I think you are so far off the original algorithm that it's better to implement a proper sieve. This is the so-called "sieve of Sundaram", an implementation of the Eratosthenes sieve that stores only odd numbers, with A(I)==Prime(2*I+3):

? "Prime Number Generator"
Input "Upper Limit?",N
sTime = TIME
Top = (N-3)/2
DIM A(Top+1) Byte
' Mark all as primes
FOR I = 0 TO Top
  A(I) = 1
NEXT I
' For each odd number,
FOR I = 0 TO Top
  ' If 2*I+3 is prime:
  IF A(I)
    ' Mark all odd multiples as composite
    Prime = I + I + 3
    PRINT Prime;".";
    FOR K = I + Prime TO Top STEP Prime
      A(K) = 0
    NEXT K
  ELSE
    PRINT "..";
  ENDIF
NEXT I
eTime = TIME
PRINT
PRINT "Total: "; (eTime-sTime)/60.0

In my FastBasic, this takes less than one second to search up to 250, but about 50% of the time is spent inside "PRINT", and the percentage increases to 66% when searching primes up to 21846.
JamesD Posted November 21, 2017 (edited)

That takes around 3 to 3.5 seconds on the MC-10 without any kind of compiler, but I have to verify the accuracy of my port.

*edit* Fixed a bug.
*edit* It's skipping 169, but the previous version said it was a prime. I looked it up, and 169 is not a prime. Not only is this faster, it's more accurate. I'm not fixing the other one.
*edit* The CoCo runs it in 2.217 seconds. I haven't tried defining the variables at the top yet.
*edit* 2.183 for the CoCo and around 3 for the MC-10. Clearing the screen first drops it to 2.1 and 2.5 to 3.

10 POKE65497,0
20 K=0:T=0:P=0:I=0
30 CLS
100 PRINT "Prime Number Generator"
110 INPUT "Upper Limit";N
120 TIMER=0
130 T = (N-3)/2
140 DIM A(T+1)
160 FOR I=0 TO T:A(I)=1:NEXT I
200 FOR I=0 TO T
220 IF A(I) THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN FOR K=K TO T STEP P:A(K)=0:NEXT K:ELSE ELSE PRINT "..";
320 NEXT I
330 eTime = TIMER/60
340 PRINT
350 PRINT "Total: ";eTime
360 END

Edited November 21, 2017 by JamesD
JamesD Posted November 21, 2017 (edited)

The Plus/4 version. 4.216 seconds.

20 K=0:T=0:P=0:I=0
30 SCNCLR
100 PRINT "Prime Number Generator"
110 INPUT "Upper Limit";N
120 eTime=TIME
130 T = (N-3)/2
140 DIM A(T+1)
160 FOR I=0 TO T:A(I)=1:NEXT I
200 FOR I=0 TO T
220 IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN 226
225 PRINT "..";:NEXTI:GOTO330
226 FOR K=K TO T STEP P:A(K)=0:NEXT K,I
330 eTime = (TIME - eTime)/60
340 PRINT
350 PRINT "Total: ";eTime
360 END

Edited November 21, 2017 by JamesD
JamesD Posted November 21, 2017

Apple II, a little over 3 seconds.

20 K=0:T=0:P=0:I=0
30 HOME
100 PRINT "Prime Number Generator"
110 INPUT "Upper Limit";N
130 T = (N-3)/2
140 DIM A(T+1)
160 FOR I=0 TO T:A(I)=1:NEXT I
200 FOR I=0 TO T
220 IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:IF K<=T THEN 226
225 PRINT "..";:NEXTI:GOTO340
226 FOR K=K TO T STEP P:A(K)=0:NEXT K,I
340 PRINT
350 PRINT "FINISHED "
360 END
dmsc Posted November 21, 2017

Hi!

> Apple II, a little over 3 seconds. <snip>

Following the nested IF/THEN on one line is not easy, but I think the last IF has a problem: if the condition is not met, it falls through to the PRINT "..";, but you already did PRINT P;".";, so you end up with two extra dots. Perhaps change lines 220 and 226 to:

220 IF A(I)=1 THEN P=I+I+3:PRINT P;".";:K=I+P:GOTO226
226 IF K<=T THEN FOR K=K TO T STEP P:A(K)=0:NEXT K,I:GOTO340
227 NEXTI

Also, if this is MS BASIC, I suppose you could begin the program with a "DEFINT A-Z" to simply use integer variables for all calculations?
JamesD Posted November 21, 2017 (edited)

> Following the nested IF/THEN on one line is not easy, but I think the last IF has a problem <snip> I suppose you could begin the program with a "DEFINT A-Z" to simply use integer variables for all calculations?

Nice catch. I saw the extra dots but ignored it; it was getting late, and printing 2 characters takes very little time. Without a system clock to time with, I couldn't even say how much faster this is. It should definitely be a measurable difference, though, especially if the screen has to scroll because of those dots.

I *think* you can use this, but I'm not 100% sure (untested). The =1 was added when I was trying to track down a bug. Applesoft does have a few syntax differences to be more compatible with Integer BASIC, so it may not work:

220 IF A(I) THEN P=I+I+3:PRINT P;".";:K=I+P:GOTO226

Applesoft doesn't support integers; only some Z80 versions and the Atari version of Microsoft BASIC do... at least I think the Atari version of Microsoft does. Using integers for the loop counters made the VZ version faster, but only when I initialized the variables in the order I wanted them to appear in the variable table. The 6803 and 6809 versions don't support integers either, and their math library uses floating-point numbers with an additional digit for greater accuracy. In spite of what should be slower math routines, the enhanced MC-10 ROM shows they are clearly capable of performing faster floating-point math when 16-bit registers are used, and not all math functions were optimized, due to space limitations.

The 65816 would benefit from some of the same optimizations, but mode switching would steal some clock cycles, limiting where you can use them.

*edit* BTW, this is one of the best examples of why having the ELSE statement is important.
*edit* I tried the fix on the Plus/4 version, but it skips the last prime. The time was 4.2, though; that's faster than a fully working version. So the dots add hundredths of a second to the result.

Edited November 21, 2017 by JamesD