Asmusr Posted December 4, 2020
How come GCC is faster than assembly?

+TheBF Posted December 4, 2020
As Tursi said in a post here, GCC did some optimizations that he would not have considered normally. GCC is a monster and uses the latest ideas developed over 20+ years to make fast code, including keeping up to 8 parameters in registers, from what I understand (which is not much).

apersson850 Posted December 5, 2020
8 hours ago, TheBF said:

    Language        First Pass   Optimized
    GCC             15 sec       5 sec
    Assembly        17 sec       5 sec
    Machine Forth   17 sec       7 sec
    TurboForth      48 sec       29 sec
    Compiled XB     51 sec       none yet
    FbForth         70 sec       26 sec
    GPL             80 sec       none yet
    ABASIC          490 sec      none yet
    XB              2000 sec     none yet
    UCSD Pascal     7300 sec     780 sec

It's not relevant when the intention is to compete with GCC, but the optimized Pascal program actually did reach 263 seconds.

+TheBF Posted December 5, 2020
Well then, we shall update the official record. Thank you for keeping us honest. I am not sure we are "competing" with GCC, but it does provide something of a gold standard for flat-out performance in compilers. However, I bet you can squeeze a hell of a lot more program into a given chunk of memory with UCSD Pascal. Byte code rules that space.

Language        First Pass   Optimized
GCC             15 sec       5 sec
Assembly        17 sec       5 sec
Machine Forth   17 sec       7 sec
TurboForth      48 sec       29 sec
Compiled XB     51 sec       none yet
FbForth         70 sec       26 sec
GPL             80 sec       none yet
ABASIC          490 sec      none yet
XB              2000 sec     none yet
UCSD Pascal     7300 sec     263 sec

GDMike Posted December 5, 2020
3 hours ago, TheBF said: "including keeping up to 8 parameters in registers"

Interesting ideas I may want to consider.

apersson850 Posted December 5, 2020
13 hours ago, TheBF said: "However I bet you can squeeze a hell of lot more program into a given chunk of memory with UCSD Pascal. Byte code rules that space."

That's all a question of what you want to benchmark. The traditional benchmark has always been about execution time.

The UCSD Pascal concept, as implemented on the TI 99/4A, had two main design objectives: portability between computer systems and the ability to run in small memories. Today, when "every" computer either runs Windows or can emulate it, portability is not important. Today, when a single personal computer has more memory than all TI 99/4A computers ever sold had together, small memory is not important. But for the 99/4A, at least the second aspect is still important. In spite of the various paged memory modules available today, there is still a CPU with a 64 K address range inside.

What if a "benchmark" could measure how much you have to implement yourself to be able to run an application that doesn't fit in memory, but has to be loaded piecemeal from disk as it runs?

An application where you can dynamically allocate buffer space you need temporarily, even if that involves moving a piece of code already loaded into memory, in the middle of running that code?

An application which benefits from a library of general functions, loaded into memory only on demand, as well as another library of functions, developed for this application but otherwise working in the same way?

An application where time-critical sections can be implemented in assembly language, with an assembler/linker that not only gives you access to the parameters sent to the assembly routine, but also links to the program's global data, if needed, and can save its own data in the global data pool between invocations, even if the assembly program has to be rolled out of memory on occasion to free up memory for other stuff?

You can of course do all this in assembly, in Forth, or in Extended BASIC (a bit awkward, probably, but doable). When you use Pascal with the UCSD p-system on the TI 99/4A, all this is supported by the system from day one. You just have to use it.

The largest application I ran on my 99/4A (for a purpose, not just for fun) was 4000+ lines of source code, with a substantial data part in a four-way (!) linked list, processed recursively. Unfortunately, no benchmark can ever measure how well such an application is supported by the system.

I ported the same program to a PC (using Turbo Pascal 4.0). On the PC it took a few more seconds than it took minutes on the 99/4A. But that's not the major thing. The major thing is that it does run on the TI too. Since Borland's Turbo Pascal 4.0 adopted several ideas from UCSD Pascal 4.0, the code is almost identical too. Just a few system access calls (like reading a function key from the keyboard) are different. There's no benchmark to measure such a transfer of an application from the TI to another platform either.

+TheBF Posted February 25, 2021
I came back here to see how my earlier systems did with the Tursi Benchmark. In Version 2.66 I made some improvements to the DO LOOP compiler and to my sprite library. Nice to see something made a difference, but wow, waaay too much time invested.

Language        First Pass   Optimized
GCC             15 sec       5 sec
Assembly        17 sec       5 sec
Machine Forth   17 sec       7 sec
Camel99 Forth   47.3 sec     28 sec
TurboForth      48 sec       29 sec
Compiled XB     51 sec       none yet
FbForth         70 sec       26 sec
GPL             80 sec       none yet
ABASIC          490 sec      none yet
XB              2000 sec     none yet
UCSD Pascal     7300 sec     263 sec

+TheBF Posted February 26, 2021
On 2/25/2021 at 12:06 AM, TheBF said: "I came back here to see how my earlier systems did with the Tursi Benchmark. In Version 2.66 I made some improvements to the DO LOOP compiler and to my sprite library. Nice to see something made a difference but wow, waaay too much time invested."

New VDP driver tricks from @Matthew180 and @Jedimatt changed the Camel99 numbers to 46.56 and 26.65, timed by hand. (My ELAPSE timer measures a little slower on VDP-heavy code because interrupts are off so much.)

And ... I have one extra level of optimization using INLINE[ ], and that gets down to 20.31.

INLINE[ ] re-compiles code primitives from the kernel into "super-instructions" that run without hitting the Forth list interpreter. It can't do loops yet. Is that cheating?

    : TURSI.INLINE
        100 0 DO
            INLINE[ 239 0 ] DO  INLINE[ I $301 VC! ]     LOOP
            INLINE[ 175 0 ] DO  INLINE[ I $300 VC! ]     LOOP
            INLINE[ 0 239 ] DO  INLINE[ I $301 VC! -1 ] +LOOP
            INLINE[ 0 175 ] DO  INLINE[ I $300 VC! -1 ] +LOOP
        LOOP ;
    \ v2.66 21.43   v2.67 new vdp driver 20.31

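For readers who have not met the idea, here is a toy model of why fusing primitives helps, written in Python rather than Forth. It is only a sketch: `run_threaded`, `make_super` and the three primitives are hypothetical names, and this is not how Camel99's INLINE[ ] is implemented. It only counts how many times the "list interpreter" has to dispatch a word.

```python
# Toy model of a threaded-code inner interpreter versus an inlined
# "super-instruction". Every name here is a hypothetical illustration.

dispatches = 0  # counts trips through the list interpreter


def lit5(stack):
    stack.append(5)


def dup(stack):
    stack.append(stack[-1])


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def run_threaded(thread, stack):
    """Execute a list of primitives one at a time, like a Forth NEXT loop."""
    global dispatches
    for word in thread:
        dispatches += 1  # one dispatch per word in the thread
        word(stack)


def make_super(words):
    """Fuse several primitives into one callable that is dispatched once."""
    def fused(stack):
        for w in words:  # stands in for machine code copied back to back
            w(stack)
    return fused


stack_a = []
run_threaded([lit5, dup, add], stack_a)
threaded_dispatches = dispatches

dispatches = 0
stack_b = []
run_threaded([make_super([lit5, dup, add])], stack_b)
inlined_dispatches = dispatches

print(stack_a, threaded_dispatches)
print(stack_b, inlined_dispatches)
```

Both runs leave the same value on the stack; the fused version simply goes through the dispatcher once instead of three times, which is the saving the post describes.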
+TheBF Posted April 22, 2022 (edited)
Staying up way too late because this new DTC system is working well, so I can't sleep. Running some old tests to see what happens.

Language        First Pass   Optimized
GCC             15 sec       5 sec
Assembly        17 sec       5 sec
Machine Forth   17 sec       7 sec
Camel99 (DTC)   43.2 sec     25 sec
TurboForth      48 sec       29 sec
Camel99 (ITC)   48 sec       28 sec
Compiled XB     51 sec       none yet
FbForth         70 sec       26 sec
GPL             80 sec       none yet
ABASIC          490 sec      none yet
XB              2000 sec     none yet
UCSD Pascal     7300 sec     263 sec

    ( vanilla Forth using new DSK1.DIRSPRIT library )
    DECIMAL
    : TURSI.FORTH
        100 0 DO
            239 0 DO  I 0 0 LOCATE      LOOP
            175 0 DO  239 I 0 LOCATE    LOOP
            0 239 DO  I 175 0 LOCATE  -1 +LOOP
            0 175 DO  0 I 0 LOCATE    -1 +LOOP
        LOOP ;

    HEX
    300 CONSTANT $300
    301 CONSTANT $301
    DECIMAL

    ( more direct translation of Tursi ASM code to Forth )
    : TURSI.OPT
        100 0 DO
            239 0 DO  I $301 VC!      LOOP
            175 0 DO  I $300 VC!      LOOP
            0 239 DO  I $301 VC!  -1 +LOOP
            0 175 DO  I $300 VC!  -1 +LOOP
        LOOP ;

Edited April 22, 2022 by TheBF: Edited to correct UCSD Pascal Optimized results

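To put the two columns of this table side by side, a small Python sketch that computes how much each entry gained from optimization. The timings are copied from the table in this post; the `speedup` helper is just an illustrative name, and rows with "none yet" are left out.

```python
# Timings in seconds copied from the table above: (first pass, optimized).
timings = {
    "GCC": (15, 5),
    "Assembly": (17, 5),
    "Machine Forth": (17, 7),
    "Camel99 (DTC)": (43.2, 25),
    "TurboForth": (48, 29),
    "Camel99 (ITC)": (48, 28),
    "FbForth": (70, 26),
    "UCSD Pascal": (7300, 263),
}


def speedup(first_pass, optimized):
    """Ratio of first-pass time to optimized time (higher is a bigger gain)."""
    return first_pass / optimized


for name, (first, opt) in timings.items():
    print(f"{name:15s} {speedup(first, opt):6.1f}x")
```

The ratio says nothing about absolute speed, only about how far the optimized version moved from its own starting point.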
apersson850 Posted April 22, 2022
Now you have that older table in here again, the one where the optimized Pascal program isn't 263 seconds.

+Vorticon Posted April 22, 2022
If anyone is interested, here's the original Byte article on benchmarking the common programming languages of the time using the sieve of Eratosthenes. It includes interesting comparative performance tables. I used it to benchmark the Pascal language on the ZX-81 recently (listing below).

Erastosthenes sieve benchmark.pdf

+TheBF Posted April 22, 2022
2 hours ago, apersson850 said: "Now you have that older table, where the optimized Pascal program isn't 263 seconds in here again."

My apologies. Corrected.

+TheBF Posted April 22, 2022
33 minutes ago, Vorticon said: "If anyone is interested here's the original Byte article regarding the benchmarking of the common programming languages of the time using the sieve of Eratosthenes and includes interesting comparative performance tables. I used it to benchmark the Pascal language on the ZX-81 recently (listing below)."

Thanks for posting this. It's great to see the results in the article and the ads. I am amazed at the COBOL results when it is a compiled language. WT*?

I remember discussions from back in the 20th century on comp.lang.forth. All the Forth guys would start talking like the four Yorkshiremen from Monty Python.

"That's not 'ow I'd code a sieve"
"Well if you ask me, that isn't even real Forth"
"Yes well when I was a boy we wrote all our Forth words in Assembler like real programmers"
"Assembler! You were lucky to 'ave Assembler. Why we wrote our code on the street with old piece of charcoal"
"And if you tell the kids that today they won't believe it!"

I will start a new topic: Byte Sieve Benchmark. We can post normal code results and optimized results, as with the Tursi benchmark. Everyone can add to it with their favourite language.

Could I volunteer you to try the FORTRAN version when you have time, @VORTICON? @Pixelpendant might have to invent one for us in LOGO.

The Byte mag. code may need adjustments for the local dialects, but we should try to remain close to the original Byte listing. The optimized versions are open season.

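For anyone who wants a reference point before porting, here is a Python sketch of the sieve as I understand the Byte listing to work. Treat the details as assumptions rather than a transcription: the flag-array size of 8190 is quoted from memory, the function name `byte_sieve` is mine, and the original repeats the whole pass several times for timing, which is left out here.

```python
SIZE = 8190  # assumed flag-array size; check it against the Byte listing


def byte_sieve(size=SIZE):
    """Count primes the way the classic sieve benchmark does.

    flags[i] stands for the odd number 2*i + 3, so even numbers are
    never stored and the prime 2 is never counted.
    """
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3      # the odd number this flag represents
            k = i + prime          # index of the first odd multiple (3 * prime)
            while k <= size:
                flags[k] = False   # strike out every odd multiple of prime
                k += prime
            count += 1
    return count


print(byte_sieve())
```

Whatever the dialect, a port can be checked by comparing its prime count with the one this version prints for the same array size.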
+TheBF Posted April 22, 2022
2 hours ago, apersson850 said: "Now you have that older table, where the optimized Pascal program isn't 263 seconds in here again."

Done.

+Vorticon Posted April 22, 2022
Incidentally, this topic has complete scans of most of the Byte issues. I've downloaded and archived every single one of them. The ads alone are worth their weight in gold, not to mention the in-depth articles that put to shame any modern computing magazine.

RXB Posted April 22, 2022
Where is the GPL code for this? I think I could punch it up a little faster. Also, where is the XB code?

FOUND IT!

    100 CALL CLEAR
    110 CALL MAGNIFY(2)
    120 CALL SPRITE(#1,42,2,1,1)
    130 CNT=100
    140 FOR X=1 TO 240 :: CALL LOCATE(#1,1,X):: NEXT X
    150 FOR Y=1 TO 176 :: CALL LOCATE(#1,Y,240):: NEXT Y
    160 FOR X=240 TO 1 STEP -1 :: CALL LOCATE(#1,176,X):: NEXT X
    170 FOR Y=176 TO 1 STEP -1 :: CALL LOCATE(#1,Y,1):: NEXT Y
    180 CNT=CNT-1 :: IF CNT>0 THEN 140
    190 END

Hmm, how come sprite auto motion is not being used? Could you think of a worse example for XB to move sprites in a single direction?

Also, why is line 130 not FOR CNT=1 TO 100 and line 180 not NEXT CNT?

+TheBF Posted April 22, 2022
16 minutes ago, RXB said: "Hmm how come Sprite Auto motion is not being used? Could you think of a worse example for XB to move sprites in a single direction?"

Super idea, Rich. That could go in the Optimized column for XB. Write it up.

RXB Posted April 22, 2022
Tried this, but XB COINC is just too freaking slow most of the time:

    100 CALL CLEAR
    110 CALL MAGNIFY(2)
    120 CALL SPRITE(#1,42,2,1,1)
    130 FOR CNT=1 TO 100
    140 CALL LOCATE(#1,1,1) :: CALL MOTION(#1,0,127) ! right along the top
    141 CALL COINC(#1,1,240,8,X) :: IF X THEN 150 ELSE 141
    150 CALL LOCATE(#1,1,240) :: CALL MOTION(#1,127,0) ! down the right side
    151 CALL COINC(#1,176,240,8,Y) :: IF Y THEN 160 ELSE 151
    160 CALL LOCATE(#1,176,240) :: CALL MOTION(#1,0,-127) ! left along the bottom
    161 CALL COINC(#1,176,1,8,X) :: IF X THEN 170 ELSE 161
    170 CALL LOCATE(#1,176,1) :: CALL MOTION(#1,-127,0) ! up the left side
    171 CALL COINC(#1,1,1,8,Y) :: IF Y THEN 180 ELSE 171
    180 NEXT CNT
    190 END

Wonder if RXB CALL COLLIDE would work better?

Nope, the only way to make it work is slow sprites so you get a hit!

+TheBF Posted April 22, 2022
36 minutes ago, RXB said: "Tried this but XB COINC is just to freaking slow most of time [...] Nope only way to make it work is slow sprites so you get a hit!"

But can you make it spin the sprite faster than moving it manually?

+TheBF Posted April 22, 2022
I modified your idea, Rich, to use CALL POSITION, and it goes faster than using CALL LOCATE (manual movement). If SPEED is more than 45 it misses sometimes.

    100 CALL CLEAR
    110 LET SPEED=45
    120 CALL MAGNIFY(2)
    130 CALL SPRITE(#1,42,2,1,1)
    140 FOR CNT=1 TO 100
    150 CALL LOCATE(#1,2,7):: CALL MOTION(#1,0,SPEED)
    160 CALL POSITION(#1,ROW,COL):: IF COL<235 THEN 160
    170 CALL LOCATE(#1,1,236):: CALL MOTION(#1,SPEED,0)
    180 CALL POSITION(#1,ROW,COL):: IF ROW<171 THEN 180
    190 CALL LOCATE(#1,172,236):: CALL MOTION(#1,0,SPEED*-1)
    200 CALL POSITION(#1,ROW,COL):: IF COL>8 THEN 200
    210 CALL LOCATE(#1,172,7):: CALL MOTION(#1,SPEED*-1,0)
    220 CALL POSITION(#1,ROW,COL):: IF ROW>8 THEN 220
    230 NEXT CNT
    240 END

+TheBF Posted April 22, 2022
Of course there is always another way...

    100 REM TURSI'S BENCHMARK
    110 CALL CLEAR
    120 DISPLAY AT(10,10):"Extended Basic Rules!"
    130 CALL MAGNIFY(2)
    140 CALL SPRITE(#1,42,3,1,1)
    150 FOR N=1 TO 100
    160 FOR I=1 TO 240 STEP 15
    170 CALL LOCATE(#1,1,I)
    180 NEXT I
    190 FOR I=1 TO 176 STEP 15
    200 CALL LOCATE(#1,I,239)
    210 NEXT I
    220 FOR I=240 TO 1 STEP -15
    230 CALL LOCATE(#1,176,I)
    240 NEXT I
    250 FOR I=176 TO 1 STEP -15
    260 CALL LOCATE(#1,I,1)
    270 NEXT I
    280 NEXT N
    290 END

Cheat!

Classic99 QI399.046 2022-04-22 16-24-16.mp4

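To see how much work the STEP 15 version skips, this Python sketch counts the CALL LOCATE calls per lap for a given step size. The loop bounds are taken from the XB listings in this thread; `locate_calls_per_lap` is just an illustrative name, and Python's `range` is used to mirror the inclusive FOR ... TO ... STEP bounds.

```python
def locate_calls_per_lap(step):
    """Count CALL LOCATE calls in one lap of the four FOR loops."""
    top = len(range(1, 241, step))        # FOR I=1 TO 240 STEP step
    right = len(range(1, 177, step))      # FOR I=1 TO 176 STEP step
    bottom = len(range(240, 0, -step))    # FOR I=240 TO 1 STEP -step
    left = len(range(176, 0, -step))      # FOR I=176 TO 1 STEP -step
    return top + right + bottom + left


full = locate_calls_per_lap(1)
cheat = locate_calls_per_lap(15)
print(full, cheat, round(full / cheat, 1))
```

So the sprite still travels around the screen, but the program makes only a small fraction of the calls the one-pixel version makes, which is why it no longer measures the same thing.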
RXB Posted April 22, 2022
1 hour ago, TheBF said: "But can you make it spin the sprite faster than moving it manually?"

No, auto motion is way faster. Hardware is always faster than software!

apersson850 Posted April 22, 2022
It's no longer the same thing, though, at least not in the implementation. If we only consider the looks, then it is.

To truthfully follow the original, you could select a speed which spends one interrupt per pixel. Which speed is that?

Why code SPEED*-1 when -SPEED most certainly is faster?

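One way to reason about the "one interrupt per pixel" question, as a sketch only. It assumes the auto-motion velocity is counted in sixteenths of a pixel per VDP interrupt and that the console interrupts 60 times a second; neither figure is stated in this thread, so check both against the console documentation. The 832 pixels per lap is simply the number of LOCATE steps per lap in the original XB listing, and the function names are mine.

```python
SUBPIXELS_PER_PIXEL = 16    # assumption: velocity unit is 1/16 pixel per interrupt
INTERRUPTS_PER_SECOND = 60  # assumption: 60 Hz console


def velocity_for(pixels_per_interrupt):
    """Velocity value that moves the sprite this many pixels per interrupt."""
    return pixels_per_interrupt * SUBPIXELS_PER_PIXEL


def seconds_for_laps(laps, pixels_per_lap, velocity):
    """Time for the sprite to cover the distance at a given velocity."""
    interrupts = laps * pixels_per_lap * SUBPIXELS_PER_PIXEL / velocity
    return interrupts / INTERRUPTS_PER_SECOND


one_pixel_speed = velocity_for(1)
print(one_pixel_speed)
print(round(seconds_for_laps(100, 832, one_pixel_speed)))
```

Under those assumptions, one pixel per interrupt caps the run at the interrupt rate, so the result would say more about the frame clock than about the language.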
RXB Posted April 22, 2022
My first product ever produced was WINDYXB. Here is a demo of it:

2022-04-22 14-41-37.mkv

+Lee Stewart Posted April 22, 2022
On 1/24/2016 at 11:29 AM, Lee Stewart said:

    I would like to revise the fbForth optimized code. The following is more in line with the TurboForth code I was attempting to port. It defines V! similar to how it is defined in TurboForth:

    HEX
    ASM: V!
        *SP+ R0 MOV,        ( pop addr)
        *SP+ R1 MOV,        ( pop value)
        R1 SWPB,            ( get LSB of value into MSB)
        0 LIMI,             ( disable interrupts)
        R0 4000 ORI,        ( tell VDP processor "hey, this is a *write*")
        R0 SWPB,            ( get low byte of address)
        R0 8C02 @() MOVB,   ( write it to vdp address register)
        R0 SWPB,            ( get high byte of address)
        R0 8C02 @() MOVB,   ( write it)
        R1 8C00 @() MOVB,   ( write payload)
        2 LIMI,             ( enable interrupts)
    ;ASM

    : TEST
        GRAPHICS PAGE
        1 MAGNIFY
        0 0 1 02A 0 SPRITE
        064 0 DO
            0EF 0 DO  I 301 V!      LOOP
            0AF 0 DO  I 300 V!      LOOP
            0 0EF DO  I 301 V!  -1 +LOOP
            0 0AF DO  I 300 V!  -1 +LOOP
        LOOP
        MON ;
    DECIMAL

    This runs in 26 seconds!

    ...lee

I modified the above code to the following:

    HEX
    \ Sprite #0 x from loop index to Sprite Attribute Table
    ASM: SPR0IX!
        *RP R1 MOV,         \ get index value
        R0 301 LI,          \ sprite #0 x location
        R1 SWPB,            \ get LSB of value into MSB
        0 LIMI,             \ disable interrupts
        R0 4000 ORI,        \ tell VDP processor "hey, this is a *write*"
        R0 SWPB,            \ get low byte of address
        R0 8C02 @() MOVB,   \ write it to vdp address register
        R0 SWPB,            \ get high byte of address
        R0 8C02 @() MOVB,   \ write it
        R1 8C00 @() MOVB,   \ write payload
        2 LIMI,             \ enable interrupts
    ;ASM

    \ Sprite #0 y from loop index to Sprite Attribute Table
    ASM: SPR0IY!
        *RP R1 MOV,         \ get index value
        R0 300 LI,          \ sprite #0 y location
        R1 SWPB,            \ get LSB of value into MSB
        0 LIMI,             \ disable interrupts
        R0 4000 ORI,        \ tell VDP processor "hey, this is a *write*"
        R0 SWPB,            \ get low byte of address
        R0 8C02 @() MOVB,   \ write it to vdp address register
        R0 SWPB,            \ get high byte of address
        R0 8C02 @() MOVB,   \ write it
        R1 8C00 @() MOVB,   \ write payload
        2 LIMI,             \ enable interrupts
    ;ASM

    \ SPR0IX! and SPR0IY! read the loop index from the return stack and
    \ load the VDP address themselves, so nothing is pushed before them.
    : TEST
        GRAPHICS PAGE
        1 MAGNIFY               \ magnified single size sprites
        0 0 1 02A 0 SPRITE      \ define sprite #0
        064 0 DO
            0EF 0 DO  SPR0IX!      LOOP   \ sprite right across top of screen
            0AF 0 DO  SPR0IY!      LOOP   \ sprite down right side of screen
            0 0EF DO  SPR0IX!  -1 +LOOP   \ sprite left across bottom of screen
            0 0AF DO  SPR0IY!  -1 +LOOP   \ sprite up left side of screen
        LOOP
        BYE ;
    DECIMAL

I changed V! to the very specific SPR0IX! and SPR0IY! because the first two instructions in each are specific to the x or y value of sprite #0: they take the value from the index of a DO ... LOOP and the address from the x or y entry of sprite #0’s position in the Sprite Attribute Table. Of course, this compromises generalization, and I would say this is not in the spirit of Forth, but I wanted to show the speed difference that streamlining the interior of a loop can manage. The run time went down from 26 seconds to 18 seconds.

...lee

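The byte order in those listings is easy to lose track of, so here is a small Python model of the three writes that V! performs. It only mirrors what the assembly comments above say (OR the address with 4000 hex, send the low byte and then the high byte to 8C02, send the data byte to 8C00); the function name `vdp_write_sequence` and the tuple format are hypothetical, and nothing here touches real hardware.

```python
VDP_WRITE_FLAG = 0x4000  # OR-ed into the address, as the ORI instruction does
VDP_ADDR_PORT = 0x8C02   # address port used in the listing
VDP_DATA_PORT = 0x8C00   # data port used in the listing


def vdp_write_sequence(address, value):
    """Return the (port, byte) writes the listing performs, in order."""
    addr = address | VDP_WRITE_FLAG
    return [
        (VDP_ADDR_PORT, addr & 0xFF),          # low byte of the address first
        (VDP_ADDR_PORT, (addr >> 8) & 0xFF),   # then high byte, write flag set
        (VDP_DATA_PORT, value & 0xFF),         # then the payload byte
    ]


# Example: write x = 42 to sprite #0's x location at 301 hex.
for port, byte in vdp_write_sequence(0x301, 42):
    print(f"{port:04X} <- {byte:02X}")
```

Seen this way, SPR0IX! and SPR0IY! save time only on how the address and value reach the registers; the three port writes themselves are the same as in V!.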