thorfdbg Posted December 5, 2017
What do you mean by better math model? Microsoft Basic uses binary to represent floating point numbers; Atari Basic uses a kludgy decimal format with base 100, which makes multiplications and divisions unbearably slow. The latter does not matter in this case: for additions and subtractions, it is somewhat OK.
JamesD Posted December 5, 2017
Quote: Microsoft Basic uses binary to represent floating point numbers, Atari Basic a kludgy decimal format with the base 100 which makes multiplications and divisions unbearably slow. The latter does not matter in this case - for additions and subtractions, it is somewhat ok.
You mean binary coded decimal? Well, Microsoft's math library isn't exactly the most efficient thing I've ever seen either. But standard Atari BASIC C turned in a time of 4.17 seconds vs 3.133 on the C64, and the Atari is clocked faster. The faster times that have been posted for the Atari are due to optimized math libs or the use of a compiler.
tecci06 Posted December 5, 2017
I had to make a change in the code for the ACORN Electron:
10CLS:CLEAR
20K%=0:I%=0:T%=0:P%=0
100PRINT"Prime Number Generator"
110INPUT"Upper Limit";N%
120ETIME=TIME
130T%=INT((N%-3)/2)
140DIMA%(T%+1)
160FORI%=0TOT%:A%(I%)=0:NEXT
200FORI%=0TOT%:IFA%(I%)THENPRINT"..";:NEXT:GOTO330
220P%=I%+I%+3:PRINT;P%;".";:K%=I%+P%:IFK%<=T%FORK%=K%TOT%STEPP%:A%(K%)=1:NEXT
260NEXT
330ETIME=(TIME-ETIME)/100
340PRINT:PRINT"Total: ";ETIME
360END
Line 330: it has to divide by 100, not 60, to get the correct seconds. I checked this value by hand timing with a limit of 10000. Now the results are:
Limit 250: 1,41
Limit 10000: 52,11
For the C64, the division by 60 is correct.
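For readers who do not have one of these machines handy, here is a minimal Python sketch of what the listing above appears to do. It is an illustration only, not the benchmark itself: the function name `odd_sieve` is made up for this note, and the printing and timing parts of the BASIC program are left out. The array holds one flag per odd number, so index `i` stands for `2*i + 3`.

```python
def odd_sieve(limit):
    """Return the odd primes <= limit using the thread's odd-only sieve layout."""
    if limit < 3:
        return []
    top = (limit - 3) // 2            # T% in the listing: index of the largest odd candidate
    composite = [False] * (top + 1)   # A%(): index i stands for the number 2*i + 3
    primes = []
    for i in range(top + 1):
        if composite[i]:
            continue                  # already flagged as a multiple of a smaller prime
        p = i + i + 3                 # P% = I% + I% + 3
        primes.append(p)
        # K% starts at I% + P%, which is the index of 3*p.
        # Stepping the index by p moves the number by 2*p, so only odd multiples are hit.
        for k in range(i + p, top + 1, p):
            composite[k] = True
    return primes


if __name__ == "__main__":
    print(odd_sieve(30))
```

Note that 2 never shows up, because the array only covers odd numbers from 3 upward. That is also why the array needs only about half as many entries as the upper limit.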
JamesD Posted December 5, 2017
I'm guessing that will change the BBC Micro results as well.
_The Doctor__ Posted December 5, 2017
Standard Atari BASIC is not the built-in Atari BASIC... Atari's official BASICs are BASIC A, B, C and Microsoft Basic.... Built-in BASIC is something done for convenience later in the line, and was simply to get people started. It was implemented because it was ready to go first...
tecci06 Posted December 6, 2017
I compiled the Plus/4 version with Austrospeed. With the blanking code:
Limit 250 = 1.05 seconds
Limit 10000 = 43.3333333 seconds
Limit 20000 = 87.2333333 seconds
Skipping the parser makes a huge difference, but it's obviously not as efficient as FastBASIC. It's probably making ROM calls and bankswitching between ROM and RAM. It still handles larger arrays.
Made this on the Commodore 64 with Basic-Boss Compiler V2.4:
Limit 250: 0,33 seconds
Limit 10000: 26,283 seconds
Limit 20000: 53,7 seconds
JamesD Posted December 6, 2017
Last night I created a simple patch for the Plus/4's CHRGOT code, which reads the next byte of the program, because it banks memory in and out for every byte. The result is about 4% faster than the original code here, but it still banks RAM/ROM in and out when copying variables to/from the floating point registers. There is still more speed to be had with more patching, but I don't know what % improvement it would offer. Another 4% would certainly be worth pursuing, but it's not going to make the Plus/4 one of the leaders on this benchmark. It would need another 50% to get into that category, and I think that would require a new ROM, if it's even possible.
Standard BASIC without screen blanking:
Limit 250 3.61666667
Limit 10000 147.216667
Limit 20000 295.65
Patched CHRGOT without screen blanking:
Limit 250 3.48333333
Limit 10000 141.833333
Limit 20000 284.866667
Patched CHRGOT with screen blanking:
Limit 250 2.31666667
Limit 10000 94.51966667
Limit 20000 189.8
*NOTE: Subsequent runs turned in slightly different times. It may depend on where the clock was when the program started.
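As a quick check on the "about 4%" figure, the improvement can be worked out from the times in the post. This is a small Python sketch; the helper name `percent_faster` is made up for this note, and the inputs are the Limit 10000 times quoted above.

```python
def percent_faster(before_seconds, after_seconds):
    """Percentage of run time saved when going from 'before' to 'after'."""
    return (before_seconds - after_seconds) / before_seconds * 100.0


if __name__ == "__main__":
    standard = 147.216667         # standard BASIC, no screen blanking, limit 10000
    patched = 141.833333          # patched CHRGOT, no screen blanking
    patched_blank = 94.51966667   # patched CHRGOT with screen blanking
    print(f"patch alone:      {percent_faster(standard, patched):.1f}% less time")
    print(f"patch + blanking: {percent_faster(standard, patched_blank):.1f}% less time")
```

On these numbers the patch alone saves a little under 4% of the run time, while screen blanking accounts for most of the combined gain.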
JamesD Posted December 6, 2017
Plus/4 code. The latest version of the benchmark is listed first, then the added patch (which only needs to be executed once after startup), and then the code to blank the screen. The patch actually has to be deleted (at least lines 0 and 5) before you can run the code additional times.
Benchmark:
10 K=0:I=0:T=0:P=0
30 SCNCLR
100 PRINT "Prime Number Generator"
110 INPUT "Upper Limit";N
120 eTime=TIME
130 T=(N-3)/2
140 DIMA(T+1)
160 FORI=0TOT:A(I)=0:NEXT
200 FORI=0TOT:IFA(I)THENPRINT"..";:NEXT:GOTO330
210P=I+I+3:PRINTP;".";:K=I+P:IFK<=TTHENFORK=KTOTSTEPP:A(K)=1:NEXT:NEXT:GOTO330
260 NEXT
330 eTime=(TIME-eTime)/60
340 PRINT
350 PRINT "Total: ";eTime
360 END
Patch:
0 REM012345678901234567890123456789012
5 A=4102:FORI=0 TO 30:READ T:POKE A+I,T :NEXT I:SYS A
10000 DATA 160,18,185,18,16,153,121,4,136,16,247,96
10010 DATA 160,0,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96
Screen blanking:
115 POKE65286,PEEK(65286)AND239
335 POKE65286,PEEK(65286)OR16
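The two POKE lines are plain bit masking: AND 239 clears bit 4 of the byte at 65286 and OR 16 sets it again, leaving the other seven bits alone. The Python sketch below only models that arithmetic. The idea that bit 4 of this register is the display-enable bit is an assumption drawn from how the post uses it, and the names `blank_screen` and `unblank_screen` are hypothetical.

```python
DISPLAY_BIT = 16               # bit 4; assumed here to be the display-enable bit
CLEAR_MASK = 255 - DISPLAY_BIT  # 239, the value used with AND in line 115


def blank_screen(register_value):
    """Model of POKE 65286,PEEK(65286) AND 239: clear bit 4, keep the rest."""
    return register_value & CLEAR_MASK


def unblank_screen(register_value):
    """Model of POKE 65286,PEEK(65286) OR 16: set bit 4, keep the rest."""
    return register_value | DISPLAY_BIT


if __name__ == "__main__":
    value = 0b00011011          # arbitrary example register contents
    blanked = blank_screen(value)
    print(f"{value:08b} -> {blanked:08b} -> {unblank_screen(blanked):08b}")
```

Reading the register first and masking, rather than poking a fixed value, is what lets lines 115 and 335 toggle one bit without disturbing whatever else the register controls.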
JamesD Posted December 7, 2017
A little fix to the patch so you don't have to delete line 0. You can delete line 1 and the lines with DATA statements after the first RUN. Then you can save the code with the patch already embedded in line 0.
0 REM012345678901234567890123456789012345
1 FORI=0 TO 35:READ T:POKE 4102+I,T :NEXT I
2 SYS 4102
10000 DATA 160,18,185,23,16,153,121,4,136,16,247,200,152,153,122,4,96
10010 DATA 160,01,177,59,201,58,144,1,96,233,47,201
10020 DATA 240,240,235,56,233,208,96
evilmoo Posted December 17, 2017
I'm not sure if anyone has taken this into account or not, but as you test each number, you only have to test with the primes you've found so far. For example, if you've already determined the number isn't divisible by three, there's no way it's divisible by 9, 15, 21, 33, etc. So if you make the effort to save your discovered primes, you can minimize your division attempts.
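A minimal Python sketch of the suggestion above: keep the primes found so far and divide only by those. The function name is made up for this note. The early exit once `p * p > n` is an extra refinement that the post does not mention; it is included because it is the usual companion to this trick.

```python
def primes_by_trial_division(limit):
    """Find primes <= limit, dividing each candidate only by earlier primes."""
    primes = []
    for n in range(2, limit + 1):
        is_prime = True
        for p in primes:
            if p * p > n:        # added refinement: no divisor above sqrt(n) is needed
                break
            if n % p == 0:       # divisible by a smaller prime, so n is composite
                is_prime = False
                break
        if is_prime:
            primes.append(n)     # save it for testing later candidates
    return primes


if __name__ == "__main__":
    print(primes_by_trial_division(30))
```

This still performs divisions, which are the slow operation on these machines, whereas the sieve in the more recent listings only uses additions and array writes.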
JamesD Posted December 17, 2017
Quote: I'm not sure if anyone has taken this into account or not, but as you test each number, you only have to test with the primes you've found so far. For example, if you've already determined the number isn't divisible by three, there's no way it's divisible by 9, 15, 21, 33, etc. So if you make the effort to save your discovered primes, you can minimize your division attempts.
The more recent code uses a large array to flag which numbers and their multiples have been found already. That's the limiting factor on the largest prime number the machines can calculate with this code.
evilmoo Posted December 17, 2017
Quote: The more recent code uses a large array to flag which numbers and their multiples have been found already. That's the limiting factor on largest prime number the machines can calculate with this code.
You can further encode that data by saving the distance from the previous prime, rather than all the possible numbers. So 3, 5, 7, 11, 13, 17, 19, 23, 29 becomes (start at 3) 2, 2, 4, 2, 4, 2, 4, 6, and since the distance is always even, you can divide it by 2 to get 1, 1, 2, 1, 2, 1, 2, 3. And at least on Atari, you can save it as a CHR$() in a string array rather than a numeric array. I'm not sure when the distance goes over 512, but that should keep you guys busy for a little while.
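Here is a short Python sketch of the gap encoding described above, using a `bytes` object to play the role of the CHR$() string. The function names are made up for this note, and the choice to raise an error once a halved gap no longer fits in one byte is an assumption, since the post leaves that case open.

```python
def encode_gaps(odd_primes):
    """Store each prime as half its distance from the previous one, one byte each."""
    if not odd_primes:
        return b""
    encoded = bytearray()
    previous = odd_primes[0]
    for prime in odd_primes[1:]:
        half_gap = (prime - previous) // 2   # gaps between odd primes are always even
        if half_gap > 255:
            raise ValueError("gap too large for one byte")
        encoded.append(half_gap)
        previous = prime
    return bytes(encoded)


def decode_gaps(start, encoded):
    """Rebuild the prime list from the starting prime and the halved gaps."""
    primes = [start]
    for half_gap in encoded:
        primes.append(primes[-1] + 2 * half_gap)
    return primes


if __name__ == "__main__":
    sample = [3, 5, 7, 11, 13, 17, 19, 23, 29]
    packed = encode_gaps(sample)
    print(list(packed))
    print(decode_gaps(3, packed))
```

The packed form is good for storing results compactly, but it cannot replace the flag array during sieving, because finding the n-th entry means adding up all the gaps before it.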
JamesD Posted January 17, 2018
I ran across an article in a 1982 issue of BYTE magazine where they were comparing several Pascal compilers for CP/M. One of the benchmarks involved calculating the first 1000 primes. The fastest did it in 24.1 seconds. That is twice what it takes several machines to do that in BASIC here.
jvas Posted January 17, 2018
Quote: I ran across an article in a 1982 issue of BYTE magazine where they were comparing several Pascal compilers for CP/M. One of the benchmarks involved calculating the first 1000 primes. The fastest did it in 24.1 seconds. Twice what it takes several machines to do that in BASIC here.
If the very same algorithm had been implemented in Pascal as in BASIC, it would have been faster, I bet.
JamesD Posted January 17, 2018
Quote: If the very same algorithm had been implemented in pascal as in basic, it would have been faster, I bet.
JamesD Posted February 8, 2018
My latest version of BASIC for the MC-10 finishes 10,000 in around 96 seconds.
Faicuai Posted December 14, 2018
A little old (since Feb.) but nonetheless VERY COOL thread! Thought some may be interested in the latest figures with Altirra Basic 1.55, Fast Basic and Digital Research's CBasic-80 under CP/M (IndusGT+RamCharger). All this on i-800 (Incognito), running XL03-FP high-performance OS, and SDX 4.49c:
Altirra Basic 1.55:
N=250: 1.2166 secs.
N=2500: 14.25 secs.
N=10000: 59.2166 secs.
FastBasic 3.6:
N=250: 0.3833 secs.
N=2500: 4.1 secs.
N=10000: 16.7333 secs.
CBasic-80 CP/M:
N=10000: 16.2000 secs. (outputting on SLOW, emulated terminal !!!)
All of the above runs were done with ANTIC-DMA turned off, which I can control at will via OS (and keyboard) at any point during run-time (no need to reflect it in benchmark code). I can also afford the luxury of turning Antic off INSIDE CP/M terminal emulation, without ever leaving, but to no effect, because the CP/M computer is handling screen output directly, through its own BIOS.
FastBasic results are really fast (even faster than the C64 Basic Boss TRUE compiler), and all this with a puny 8-9KB footprint (!!!), and I just wonder how far the Atari could go with compiled machine code instead. In any case, the CP/M CBasic compiler from DRI is the real boss here!
777ismyname Posted March 1, 2019
Quote: I suggest you test all the computers in that video on Assembly for speed and for highest number of digits accuracy. You will find the TI99/4A does 10 digits and the rest top out at 6 to 8 at best. The reason TI99/4A would win hands down is 16 bit CPU vs the rest are all 8 bit CPU's.
Do you have results for your RXB extended BASIC? I have a few TI99/4As in storage; I've got to get over there and get some retro gear, and will be grabbing a TI to mod this spring and summer.
777ismyname Posted March 1, 2019
Quote: Little old (since Feb.) but nonethelss VERY COOL thread! Thought some may be interested in latest figures with Altirra Basic 1.55, Fast Basic and Digital Research's CBasic-80 under CP/M (IndusGT+RamCharger). All this on i-800 (Incognito), running XL03-FP high-performance OS, and SDX 4.49c: Altirra Basic 1.55: N=250: 1.2166 secs. N=2500: 14.25 secs. N=10000: 59.2166 secs. FastBasic 3.6: N=250: 0.3833 secs. N=2500: 4.1 secs N=10000: 16.7333 secs. CBasic-80 CP/M: N=10000: 16.2000 secs. (outputting on SLOW, emulated terminal !!!) All of the above runs with ANTIC-DMA turned off, which I can control at will via OS (and keyboard) at any point during run-time (no need to reflect in benchmark code). I can also afford the luxury of turning Antic off INSIDE CP/M terminal emulation, without ever leaving, but to no effect, because the CP/M computer is handling screen output, directly, through its own BIOS. FastBasic results are really fast (even faster than C64 Basic Boss TRUE compiler) and all this with a puny 8-9KB footprint (!!!), and I just wonder how far the Atari could go with compiled machine-code, instead. In any case, CP/M CBasic compiler from DRI is the real boss here!
Faicuai, is the source you used for Fast BASIC different from what dmsc/Daniel has on the Fast BASIC atr image? I'd be curious to see it run under 4.0, as well as in Rapidus mode. dmsc has done an absolutely stellar job with FB, and is equally nice and helpful; he is a top-notch asset to the Atari community.
RXB Posted March 2, 2019
Quote: Do you have results for your RXB extended BASIC? I have a few TI99/4A in storage; I've got to get over there and get some retro gear and will be grabbing a TI to mod this spring and summer.
It is a known fact that the 16 bit CPU in the TI99/4A can do double the number of decimal places, because 16 bits is twice that of the 8 bit CPUs. This has always been a bragging point of the 9900 CPU as the only 16 bit CPU.
777ismyname Posted March 7, 2019
Quote: It is a known fact that 16 bit CPU in the TI99/4A can do double the number of decimal places due to 16 bits it twice 8 bit CPU's. This has always been a bragging point of the 9900 CPU as the only 16 bit CPU.
Ummm, okay? Do you have any comparative results for Sieve or similar benchmarks for RXB? As far as floating point math goes, for what we all use these for these days, the "bitness" between the 6502 and 9900 doesn't amount to a hill of beans. You should fire up an instance of Fast BASIC 4.0 in Altirra and check out the PI demo. It is quite amazing what it does calculating PI to a few hundred places, and the speed at which it does it. Sincerely, this isn't an electronic dick measuring contest for number of bits of a given CPU; I am genuinely interested in knowing what I asked. I haven't cranked up a TI99/4A for BASIC - or any other programming as far as that goes - in over 30 years. I have an 8'x8' area that will be ready for some retro gear as soon as I get the desks and shelves built, and one of my TIs will have a dedicated spot.
dmsc Posted March 8, 2019
Hi!
Quote: Ummm, okay? Do you have any comparative results for Sieve or similar benchmarks for RXB? As far as floating point math, for what we all use these for these days, the "bitness" between the 6502 and 9900 doesn't amount to a hill of beans. You should fire up an instance of Fast BASIC 4.0 in Altirra and check out the PI demo. It is quite amazing what it does calculating PI to a few hundred places and the speed of how it does it.
About that sample program (PI.BAS): it can be made twice as fast with very few changes. See the current version at https://github.com/dmsc/fastbasic/blob/master/samples/int/pi.bas, calculating 254 digits in 134 seconds. Also, by using a different formula ( PI/4 = 12*ATN(1/18) + 8*ATN(1/57) - 5*ATN(1/239) ) you can reach 432 digits in 430 seconds; see the attached program.
Attachments: PI-254.XEX, PI-432.BAS, PI-432.XEX
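To show how an arctangent formula of this kind produces digits using integers only, here is a minimal Python sketch. It is not a port of PI.BAS: it uses Gauss's three-term formula PI/4 = 12*ATN(1/18) + 8*ATN(1/57) - 5*ATN(1/239), leans on Python's big integers instead of digit arrays, and the names `arctan_inverse` and `pi_scaled` plus the choice of ten guard digits are assumptions made for this note.

```python
def arctan_inverse(x, unit):
    """Return arctan(1/x) * unit as an integer, via the alternating Taylor series."""
    power = unit // x            # unit / x^(2k+1), starting with k = 0
    total = power
    x_squared = x * x
    divisor = 3
    sign = -1
    while power:
        power //= x_squared      # next odd power of 1/x
        total += sign * (power // divisor)
        divisor += 2
        sign = -sign
    return total


def pi_scaled(digits):
    """Return floor(pi * 10**digits), computed with 10 extra guard digits."""
    guard = 10
    unit = 10 ** (digits + guard)
    quarter_pi = (12 * arctan_inverse(18, unit)
                  + 8 * arctan_inverse(57, unit)
                  - 5 * arctan_inverse(239, unit))
    return (4 * quarter_pi) // 10 ** guard


if __name__ == "__main__":
    print(pi_scaled(50))
```

The larger the denominators, the fewer series terms each arctangent needs, which is the usual reason for preferring one such formula over another on a slow machine.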
777ismyname Posted March 8, 2019
Quote: Hi! About that sample program (PI.BAS), it can be made twice as fast with very little changes, see current version at https://github.com/dmsc/fastbasic/blob/master/samples/int/pi.bas, calculating 254 digits in 134 seconds. Also, by using a different formula ( PI/4 = 12*ATN(1/18) + 8*ATN(1/57) - 5*ATN(1/239) ) you can reach 432 digits in 430 seconds, see attached program.
You are the man, Daniel! I am setting up another laptop right now with Atari stuff and will try this out! I may port it to other BASICs and do another side by side video.
RXB Posted March 8, 2019
Quote: You are the man, Daniel! I am setting up another laptop right now with Atari stuff and will try this out! I may port it to other BASICs and do another side by side video.
I was talking about the time in 1979, not today, or 10 years later. I was just giving you a history lesson, and you did not need to be a jerk about it. A perfect example was PI in 1979: the TI99/4A gave 6 more decimal places than everyone else. Yes, many were faster, but also totally less accurate. Having fewer bits in a CPU means you are forced to round up and down by half as many bits; it's just a physical limitation.
dmsc Posted March 8, 2019
Hi!
Quote: I was talking about at the time in 1979 not today. Or 10 years later. Just giving you a history lesson and you did not need to be a jerk about it. Perfect example was in 1979 using PI everyone else vs TI99/4A that was 6 more decimal places. Yes many were faster but also totally less accurate. Having less bits in a CPU means you are forced to round up and down by half as many bits, just a physical limitation.
Sorry, but that is not how it works; it is like saying that because you have only ten fingers you can only count up to 10. The advantage of a 16 bit CPU over an 8 bit one is that you can process twice the number of bits in one instruction, so you have the potential to make math operations faster. And of course, if you need more memory, having more bits in your registers allows you to access that memory faster.
The problem with the TMS9900 (the CPU inside the TI99/4A) is that it was really slow, taking too many cycles to perform any operation. For example, to add one 16 bit number in a register to one on a stack (as used in FastBasic) you do:
A *R1+, R0
This takes 14 cycles for the operation, plus 1 cycle to read the instruction, 2 cycles to read R0 and R1, 4 cycles to read *R1, 8 cycles to increment R1 by two and 1 cycle to write the result to R0, for a grand total of 30 cycles!
On the 6502, the equivalent is (from the FastBasic source):
clc
adc stack_l, y
pha
txa
adc stack_h, y
tax
pla
inc sptr
All of those 8 instructions execute in 2+4+3+2+4+2+4+5 = 26 cycles!
So, even though the 6502 can only process 8 bits at a time, it does it in fewer cycles than the TMS9900, negating any real advantage. And of course, many operations (like text processing) don't require 16 bits, so they are a lot faster.
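The "ten fingers" point can be made concrete with a small Python sketch: a wide addition built purely from 8 bit additions, with the carry passed from one byte to the next, which is what the clc/adc/adc pair in the 6502 listing does for two bytes. The function names are made up for this note, and the cycle counts discussed above are not modelled.

```python
def add8(a, b, carry_in):
    """One 8-bit add with carry, like a single 6502 ADC: returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_wide(x, y, num_bytes):
    """Add two unsigned integers of num_bytes bytes using only 8-bit adds."""
    result = 0
    carry = 0                                 # CLC before the first ADC
    for i in range(num_bytes):
        shift = 8 * i
        byte_sum, carry = add8((x >> shift) & 0xFF, (y >> shift) & 0xFF, carry)
        result |= byte_sum << shift           # low byte first, then higher bytes
    return result                             # final carry is dropped, so the sum wraps


if __name__ == "__main__":
    print(hex(add_wide(0x12FF, 0x0001, 2)))      # 16-bit add from two 8-bit adds
    print(hex(add_wide(0x00FFFFFF, 0x01, 4)))    # same loop handles 32 bits
```

Precision is therefore set by how many bytes the software chooses to chain together, and the CPU's word size only affects how many steps that takes.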