Benchmarking Languages

Asmusr · December 4, 2020

How come GCC is faster than assembly?

+TheBF · December 4, 2020

As Tursi said in a post here, GCC did some optimizations that he would not normally have considered.

GCC is a monster. It uses the latest ideas, developed over 20+ years, to make fast code, including keeping up to 8 parameters in registers, from what I understand (which is not much).

apersson850 · December 5, 2020

8 hours ago, TheBF said:


Language     First Pass    Optimized
GCC            15 sec         5 sec
Assembly       17 sec         5 sec
Machine Forth  17 sec         7 sec
TurboForth     48 sec        29 sec
Compiled XB    51 sec       none yet
FbForth        70 sec        26 sec
GPL            80 sec       none yet
ABASIC        490 sec       none yet
XB           2000 sec      none yet
UCSD Pascal  7300 sec      780 sec

It's not relevant when the intention is to compete with GCC, but the optimized Pascal program actually did reach 263 seconds.

+TheBF · December 5, 2020

Well then we shall update the official record.

Thank you for keeping us honest. I am not sure we are "competing" with GCC, but it does provide something of a gold standard for flat-out performance in compilers.

However, I bet you can squeeze a hell of a lot more program into a given chunk of memory with UCSD Pascal. Byte code rules that space.

Language     First Pass    Optimized
GCC            15 sec         5 sec
Assembly       17 sec         5 sec
Machine Forth  17 sec         7 sec
TurboForth     48 sec        29 sec
Compiled XB    51 sec       none yet
FbForth        70 sec        26 sec
GPL            80 sec       none yet
ABASIC        490 sec       none yet
XB           2000 sec      none yet
UCSD Pascal  7300 sec       263 sec

GDMike · December 5, 2020

3 hours ago, TheBF said:

including keeping up to 8 parameters in registers

Interesting ideas I may want to consider

apersson850 · December 5, 2020

13 hours ago, TheBF said:

However, I bet you can squeeze a hell of a lot more program into a given chunk of memory with UCSD Pascal. Byte code rules that space.

That's all a question of what you want to benchmark.

The traditional benchmark has always been about execution time.

The UCSD Pascal concept, as implemented on the TI 99/4A, had two main design objectives: portability between computer systems and the ability to run in small memories.

Today, when "every" computer either runs Windows or can emulate it, portability is not important.

Today, when a single personal computer has more memory alone than all TI 99/4A computers ever sold had together, small memory is not important.

But for the 99/4A, at least the second aspect is still important. In spite of the various paged memory modules available today, there's still a CPU with a 64 K address range inside.

What if a "benchmark" could measure how much you have to implement yourself, to be able to run an application that doesn't fit in memory, but has to be loaded piecemeal from disk as it runs?

An application where you can dynamically allocate a buffer space you need temporarily, even if that involved having to move a piece of code already loaded into memory, in the middle of running that code?

An application which benefits from a library of general functions, loaded into memory only on demand, as well as another library of functions, developed for this application but otherwise working in the same way?

An application where time-critical sections can be implemented in assembly language, with an assembler/linker that not only gives you access to parameters sent to the assembly routine, but also links to the program's global data, if needed, and can save its own data in the global data pool between invocations, even if the assembly program has to be rolled out of memory on occasion to free up memory for other stuff?

You can of course do all this in assembly, in Forth, or in Extended BASIC (a bit awkward, probably, but doable).

When you use Pascal with the UCSD p-system on the TI 99/4A, all this is supported by the system from day one. You just have to use it.

The largest application I ran on my 99/4A (for a purpose, not just for fun) was 4000+ lines of source code, with a substantial data part in a four-way (!) linked list, processed recursively.

Unfortunately, no benchmark can ever measure how well such an application is supported by the system.

I ported the same program to a PC (using Turbo Pascal 4.0). It ran in about as many seconds on the PC as it took minutes on the 99/4A. But that's not the major thing. The major thing is that it does run on the TI too. Since Borland's Turbo Pascal 4.0 adopted several ideas from UCSD Pascal 4.0, the code is almost identical, too. Just a few system access calls (like reading a function key from the keyboard) are different.

There's no benchmark to measure such a transfer of an application from the TI to another platform either.

+TheBF · February 25, 2021

I came back here to see how my earlier systems did with the Tursi Benchmark.

In Version 2.66 I made some improvements to the DO LOOP compiler and to my sprite library.

Nice to see something made a difference but wow, waaay too much time invested.

Language     First Pass    Optimized
GCC            15 sec         5 sec
Assembly       17 sec         5 sec
Machine Forth  17 sec         7 sec
Camel99 Forth  47.3 sec      28 sec
TurboForth     48 sec        29 sec
Compiled XB    51 sec       none yet
FbForth        70 sec        26 sec
GPL            80 sec       none yet
ABASIC        490 sec       none yet
XB           2000 sec      none yet
UCSD Pascal  7300 sec       263 sec

+TheBF · February 26, 2021

On 2/25/2021 at 12:06 AM, TheBF said:
I came back here to see how my earlier systems did with the Tursi Benchmark.

In Version 2.66 I made some improvements to the DO LOOP compiler and to my sprite library.

Nice to see something made a difference but wow, waaay too much time invested.

New VDP driver tricks from @Matthew180 and @Jedimatt changed the Camel99 numbers to 46.56 sec and 26.65 sec, timed by hand.

(My ELAPSE timer measures a little slower on VDP heavy code because interrupts are off so much.)

And ... I have one extra level of optimization using INLINE[ ] that gets it down to 20.31 sec.

INLINE[ ] re-compiles code primitives from the kernel into "super-instructions" that run without going through the Forth list interpreter.

It can't do loops yet.

Is that cheating?

: TURSI.INLINE
      100 0
      DO
  INLINE[ 239 0 ] DO  INLINE[ I $301 VC! ]     LOOP
  INLINE[ 175 0 ] DO  INLINE[ I $300 VC! ]     LOOP
  INLINE[ 0 239 ] DO  INLINE[ I $301 VC! -1 ] +LOOP
  INLINE[ 0 175 ] DO  INLINE[ I $300 VC! -1 ] +LOOP
      LOOP ;
\ v2.66  21.43  v2.67 new vdp driver 20.31
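To make the INLINE[ ] idea concrete, here is a toy model in Python. It is not Camel99's mechanism (which copies TMS9900 machine code); it only counts how many times a minimal inner interpreter must dispatch when primitives run one at a time, versus when the same primitives are pre-fused into one "super-instruction". The primitives `dup` and `plus` and the dispatch counter are illustrative assumptions.

```python
from typing import Callable, List

Stack = List[int]
Primitive = Callable[[Stack], None]


def dup(stack: Stack) -> None:
    """( n -- n n ) copy the top of stack."""
    stack.append(stack[-1])


def plus(stack: Stack) -> None:
    """( a b -- a+b ) add the top two items."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def run_threaded(thread: List[Primitive], stack: Stack) -> int:
    """Toy inner interpreter: fetch and dispatch one primitive per step.

    Returns the number of dispatches, standing in for the NEXT overhead
    a threaded Forth pays between primitives.
    """
    dispatches = 0
    ip = 0
    while ip < len(thread):
        primitive = thread[ip]  # fetch the next cell of the thread
        ip += 1
        primitive(stack)        # dispatch to it
        dispatches += 1
    return dispatches


def fuse(primitives: List[Primitive]) -> Primitive:
    """Build one 'super-instruction' that runs the primitives back to back
    without returning to the inner interpreter in between."""
    def super_instruction(stack: Stack) -> None:
        for primitive in primitives:
            primitive(stack)
    return super_instruction


body = [dup, plus, dup, plus]  # ( n -- 4n )

plain_stack = [5]
plain_dispatches = run_threaded(body, plain_stack)

inlined_stack = [5]
inlined_dispatches = run_threaded([fuse(body)], inlined_stack)

print(plain_stack, plain_dispatches)      # [20] 4
print(inlined_stack, inlined_dispatches)  # [20] 1
```

Both runs leave the same result on the stack; only the number of trips through the interpreter changes, which is presumably where the time saving comes from.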

+TheBF · April 22, 2022

Staying up way too late because this new DTC system is working well, so I can't sleep.

Running some old tests to see what happens.

Language     First Pass    Optimized
GCC            15 sec         5 sec
Assembly       17 sec         5 sec
Machine Forth  17 sec         7 sec
Camel99 (DTC)  43.2 sec      25 sec
TurboForth     48 sec        29 sec
Camel99 (ITC)  48 sec        28 sec 
Compiled XB    51 sec       none yet
FbForth        70 sec        26 sec
GPL            80 sec       none yet
ABASIC        490 sec       none yet
XB           2000 sec      none yet
UCSD Pascal  7300 sec       263 sec
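The DTC and ITC rows differ in how the inner interpreter finds the machine code for each word. As a rough sketch of the generic textbook model (not Camel99's actual code), indirect threading needs two memory reads per NEXT (thread cell, then code field, then jump), while direct threading needs one, because the thread cell leads straight to executable code. The addresses below are made up for illustration.

```python
from typing import Dict, Tuple

Memory = Dict[int, int]


def next_itc(memory: Memory, ip: int) -> Tuple[int, int, int]:
    """Indirect-threaded NEXT: the cell at IP holds a code field address (CFA),
    and the CFA holds the address of the machine code to run.
    Returns (new_ip, code_address, memory_reads)."""
    cfa = memory[ip]              # read 1: thread cell -> CFA
    code_address = memory[cfa]    # read 2: CFA -> machine code address
    return ip + 2, code_address, 2  # cells are 2 bytes on the TMS9900


def next_dtc(memory: Memory, ip: int) -> Tuple[int, int, int]:
    """Direct-threaded NEXT: the cell at IP already points at executable code."""
    code_address = memory[ip]     # read 1: thread cell -> machine code
    return ip + 2, code_address, 1


# Hypothetical layout: a thread at >1000 calling a word whose code is at >A000.
itc_memory = {0x1000: 0x2000, 0x2000: 0xA000}
dtc_memory = {0x1000: 0xA000}

itc_result = next_itc(itc_memory, 0x1000)
dtc_result = next_dtc(dtc_memory, 0x1000)
print(itc_result)  # (4098, 40960, 2)
print(dtc_result)  # (4098, 40960, 1)
```

Saving one fetch on every word executed is one plausible reason the DTC build edges out the ITC build in the table; the trade-off in DTC is that each word's code field must itself contain machine code.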

( vanilla Forth using new DSK1.DIRSPRIT library )
DECIMAL
: TURSI.FORTH
       100 0
       DO
           239 0 DO    I   0  0 LOCATE       LOOP
           175 0 DO  239   I  0 LOCATE       LOOP
           0 239 DO    I 175  0 LOCATE   -1 +LOOP
           0 175 DO    0   I  0 LOCATE   -1 +LOOP
       LOOP ;

HEX 300 CONSTANT $300
    301 CONSTANT $301

DECIMAL
( more direct translation of Tursi ASM code to Forth)
: TURSI.OPT
      100 0
      DO
           239 0 DO   I $301 VC!     LOOP
           175 0 DO   I $300 VC!     LOOP
           0 239 DO   I $301 VC! -1  +LOOP
           0 175 DO   I $300 VC! -1  +LOOP
      LOOP ;

Edited April 22, 2022 by TheBF
Edited to correct UCSD Pascal Optimized results

apersson850 · April 22, 2022

Now you have the older table in here again, the one where the optimized Pascal program isn't 263 seconds.

+Vorticon · April 22, 2022

If anyone is interested, here's the original Byte article on benchmarking the common programming languages of the time with the sieve of Eratosthenes. It includes interesting comparative performance tables. I recently used it to benchmark the Pascal language on the ZX-81 (listing below).

Erastosthenes sieve benchmark.pdf
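For anyone who wants to check a port before timing it, here is a Python transcription of the sieve as it is commonly reproduced from the Byte benchmark: an array of 8191 flags standing for the odd numbers 3 to 16383, run 10 times. Treat it as a reference sketch written from the widely published listing rather than a copy of the PDF, and compare it against the article before relying on it. The count it reports, 1899, is the number of odd primes below 16384.

```python
SIZE = 8190  # flags[0..SIZE]; index i stands for the odd number 2*i + 3


def byte_sieve(size: int = SIZE) -> int:
    """One pass of the Byte-style sieve; returns the number of primes found."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3      # the odd number represented by index i
            k = i + prime          # index of 3 * prime
            while k <= size:
                flags[k] = False   # strike out an odd multiple of prime
                k += prime         # next odd multiple (the number grows by 2*prime)
            count += 1
    return count


def benchmark(iterations: int = 10) -> int:
    """Repeat the sieve like the original listing; return the last count."""
    count = 0
    for _ in range(iterations):
        count = byte_sieve()
    return count


print(benchmark(), "primes")  # 1899 primes
```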

+TheBF · April 22, 2022

2 hours ago, apersson850 said:

Now you have the older table in here again, the one where the optimized Pascal program isn't 263 seconds.

My apologies.

Corrected

+TheBF · April 22, 2022

33 minutes ago, Vorticon said:

If anyone is interested, here's the original Byte article on benchmarking the common programming languages of the time with the sieve of Eratosthenes. It includes interesting comparative performance tables. I recently used it to benchmark the Pascal language on the ZX-81 (listing below).

Erastosthenes sieve benchmark.pdf

Thanks for posting this. It's great to see the results in the article and the ads.

I am amazed at the COBOL results, given that it is a compiled language. WT*?

I remember discussion from back in the 20th century on comp.lang.forth.

All the Forth guys would start talking like the four Yorkshiremen from Monty Python.

"That's not 'ow I'd code a sieve"

"Well if you ask me, that isn't even real Forth"

"Yes well when I was a boy we wrote all our Forth words in Assembler like real programmers"

"Assembler! You were lucky to 'ave Assembler. Why we wrote our code on the street with old piece of charcoal"

"And if you tell the kids that today they won't believe it!"

I will start a new Topic: Byte Sieve Benchmark.

We can post normal code results and optimized results, as with the Tursi benchmark.

Everyone can add to it with their favourite language.

Could I volunteer you to try the FORTRAN version when you have time, @VORTICON?

@Pixelpendant might have to invent one for us in LOGO.

The Byte mag. code may need adjustments for the local dialects, but we should try to remain close to the original Byte listing. The optimized versions are open season.

+TheBF · April 22, 2022

2 hours ago, apersson850 said:

Now you have the older table in here again, the one where the optimized Pascal program isn't 263 seconds.

Done

+Vorticon · April 22, 2022

Incidentally, this topic has complete scans of most of the Byte issues. I've downloaded and archived every single one of them. The ads alone are worth their weight in gold, not to mention the in-depth articles that put to shame any modern computing magazine.

RXB · April 22, 2022

Where is the GPL code for this as I think I could punch it up a little faster.

Also where is the XB Code?

FOUND IT!

100 CALL CLEAR
110 CALL MAGNIFY(2)
120 CALL SPRITE(#1,42,2,1,1)
130 CNT=100
140 FOR X=1 TO 240 :: CALL LOCATE(#1,1,X):: NEXT X
150 FOR Y=1 TO 176 :: CALL LOCATE(#1,Y,240):: NEXT Y
160 FOR X=240 TO 1 STEP -1 :: CALL LOCATE(#1,176,X):: NEXT X
170 FOR Y=176 TO 1 STEP -1 :: CALL LOCATE(#1,Y,1):: NEXT Y
180 CNT=CNT-1 :: IF CNT>0 THEN 140
190 END

Hmm, how come sprite auto-motion is not being used?

Could you think of a worse example for XB to move sprites in a single direction?

Also, why is line 130 not FOR CNT=1 TO 100 and line 180 not NEXT CNT????
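The XB listing above also shows how much work the benchmark does. Each lap is 240 + 176 + 240 + 176 = 832 CALL LOCATE updates, and the program runs 100 laps. Dividing some of the first-pass times from the table by that count gives a rough cost per sprite update. This is back-of-envelope arithmetic: it assumes every implementation does the same 832 updates per lap (the Forth versions use slightly different loop bounds) and ignores setup time.

```python
# Loop bounds from the XB listing: 1..240, 1..176, 240..1, 176..1.
LAP_UPDATES = 240 + 176 + 240 + 176
LAPS = 100
TOTAL_UPDATES = LAP_UPDATES * LAPS

# First-pass times in seconds, copied from the results table.
first_pass_seconds = {
    "GCC": 15,
    "Assembly": 17,
    "TurboForth": 48,
    "ABASIC": 490,
    "XB": 2000,
}


def microseconds_per_update(seconds: float, updates: int = TOTAL_UPDATES) -> float:
    """Average time per sprite position update, in microseconds."""
    return seconds / updates * 1_000_000


for language, seconds in first_pass_seconds.items():
    print(f"{language:<11} {microseconds_per_update(seconds):9.1f} us/update")
```

By this measure XB spends roughly 24 ms per update, about 133 times as long as the GCC build.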

+TheBF · April 22, 2022

16 minutes ago, RXB said:
Where is the GPL code for this as I think I could punch it up a little faster.

Also where is the XB Code?

FOUND IT!
Hmm, how come sprite auto-motion is not being used?

Could you think of a worse example for XB to move sprites in a single direction?

Also, why is line 130 not FOR CNT=1 TO 100 and line 180 not NEXT CNT????

Super idea, Rich. That could go in the Optimized column for XB. Write it up.

RXB · April 22, 2022

Tried this, but XB COINC is just too freaking slow most of the time:

100 CALL CLEAR
110 CALL MAGNIFY(2)
120 CALL SPRITE(#1,42,2,1,1)
130 FOR CNT=1 TO 100
140 CALL LOCATE(#1,1,1) :: CALL MOTION(#1,0,127)  
141 CALL COINC(#1,1,240,8,X) :: IF X THEN 150 ELSE 141
150 CALL LOCATE(#1,1,240) :: CALL MOTION(#1,127,0) 
151 CALL COINC(#1,176,240,8,Y) :: IF Y THEN 160 ELSE 151
160 CALL LOCATE(#1,176,240) :: CALL MOTION(#1,0,-127)
161 CALL COINC(#1,176,1,8,X) :: IF X THEN 170 ELSE 161
170 CALL LOCATE(#1,176,1) :: CALL MOTION(#1,-127,0)
171 CALL COINC(#1,1,1,8,Y) :: IF Y THEN 180 ELSE 171
180 NEXT CNT
190 END

Wonder if RXB CALL COLLIDE would work better?

Nope, the only way to make it work is slow sprites, so you get a hit!
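Why slow sprites help: if the sprite advances a fixed number of pixels between two COINC checks, the check only fires when some sampled position lands inside the tolerance window around the target. A tolerance of 8 gives a window 17 pixels wide, so a step of 17 or less can never jump over it, while larger steps can. The sketch models one axis only and treats "pixels moved between polls" as an assumed constant; how that relates to the MOTION value and XB's loop speed is not modelled.

```python
def coinc_hits(start: int, target: int, tolerance: int, step: int) -> bool:
    """Simulate polling a sprite moving toward `target` by `step` (> 0) pixels per poll.

    Returns True if any polled position is within `tolerance` of the target,
    i.e. a COINC-style test would have fired before the sprite overshot.
    """
    position = start
    while position <= target + tolerance:
        if abs(position - target) <= tolerance:
            return True
        position += step
    return False


def max_safe_step(tolerance: int) -> int:
    """Largest step that can never skip the 2*tolerance+1 pixel window."""
    return 2 * tolerance + 1


print(max_safe_step(8))           # 17
print(coinc_hits(1, 240, 8, 17))  # True
print(coinc_hits(1, 240, 8, 25))  # False: 226 falls short, 251 overshoots
```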

+TheBF · April 22, 2022

36 minutes ago, RXB said:

Tried this, but XB COINC is just too freaking slow most of the time:



Wonder if RXB CALL COLLIDE would work better?

Nope, the only way to make it work is slow sprites, so you get a hit!

But can you make it spin the sprite faster than moving it manually?

+TheBF · April 22, 2022

I modified your idea, Rich, to use CALL POSITION, and it goes faster than using CALL LOCATE (manual movement).

If SPEED is more than 45 it misses sometimes.


100 CALL CLEAR
110 LET SPEED=45
120 CALL MAGNIFY(2)
130 CALL SPRITE(#1,42,2,1,1)
140 FOR CNT=1 TO 100
150 CALL LOCATE(#1,2,7):: CALL MOTION(#1,0,SPEED)
160 CALL POSITION(#1,ROW,COL):: IF COL<235 THEN 160
170 CALL LOCATE(#1,1,236):: CALL MOTION(#1,SPEED,0)
180 CALL POSITION(#1,ROW,COL):: IF ROW<171 THEN 180
190 CALL LOCATE(#1,172,236):: CALL MOTION(#1,0,SPEED*-1)
200 CALL POSITION(#1,ROW,COL):: IF COL>8 THEN 200
210 CALL LOCATE(#1,172,7):: CALL MOTION(#1,SPEED*-1,0)
220 CALL POSITION(#1,ROW,COL):: IF ROW>8 THEN 220
230 NEXT CNT
240 END

+TheBF · April 22, 2022

Of course there is always another way...


100 REM TURSI'S BENCHMARK
110 CALL CLEAR
120 DISPLAY AT(10,10):"Extended Basic Rules!"
130 CALL MAGNIFY(2)
140 CALL SPRITE(#1,42,3,1,1)
150 FOR N=1 TO 100
160 FOR I=1 TO 240 STEP 15
170 CALL LOCATE(#1,1,I)
180 NEXT I
190 FOR I=1 TO 176 STEP 15
200 CALL LOCATE(#1,I,239)
210 NEXT I
220 FOR I=240 TO 1 STEP -15
230 CALL LOCATE(#1,176,I)
240 NEXT I
250 FOR I=176 TO 1 STEP -15
260 CALL LOCATE(#1,I,1)
270 NEXT I
280 NEXT N
290 END

Cheat!
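The "cheat" works because STEP 15 cuts the number of CALL LOCATE updates per lap. Counting the FOR loop iterations shows by how much. The helper below assumes the usual FOR semantics (keep looping while the control variable has not passed the limit in the direction of the step).

```python
def for_count(start: int, limit: int, step: int) -> int:
    """Number of iterations of FOR I=start TO limit STEP step (step != 0)."""
    if step > 0:
        return max(0, (limit - start) // step + 1)
    return max(0, (start - limit) // (-step) + 1)


def lap_updates(step: int) -> int:
    """CALL LOCATE calls per lap for the four edges of the benchmark rectangle."""
    return (for_count(1, 240, step) + for_count(1, 176, step)
            + for_count(240, 1, -step) + for_count(176, 1, -step))


full = lap_updates(1)     # 240 + 176 + 240 + 176
cheat = lap_updates(15)   # 16 + 12 + 16 + 12
print(full, cheat, round(full / cheat, 1))  # 832 56 14.9
```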

RXB · April 22, 2022

1 hour ago, TheBF said:

But can you make it spin the sprite faster than moving it manually?

No, auto-motion is way faster. Hardware is always faster than software!

apersson850 · April 22, 2022

It's no longer the same thing in the implementation, though. If we only consider the looks, then it is.

To truthfully follow the original you could select a speed which spends one interrupt per pixel. Which speed is that?

Why code SPEED*-1 when -SPEED most certainly is faster?

RXB · April 22, 2022

The first product I ever produced was WINDYXB. Here is a demo of it:

2022-04-22 14-41-37.mkv

+Lee Stewart · April 22, 2022

On 1/24/2016 at 11:29 AM, Lee Stewart said:

I would like to revise the fbForth optimized code. The following is more in line with the TurboForth code I was attempting to port. It defines V! similar to how it is defined in TurboForth:




HEX
ASM: V!
   *SP+ R0 MOV,         ( pop addr)
   *SP+ R1 MOV,         ( pop value)
   R1 SWPB,             ( get LSB of value into MSB)
   0 LIMI,              ( disable interrupts)
   R0 4000 ORI,         ( tell VDP processor "hey, this is a *write*")
   R0 SWPB,             ( get low byte of address)
   R0 8C02 @() MOVB,    ( write it to vdp address register)
   R0 SWPB,             ( get high byte of address)
   R0 8C02 @() MOVB,    ( write it)
   R1 8C00 @() MOVB,    ( write payload)
   2 LIMI,              ( enable interrupts)
;ASM

: TEST
   GRAPHICS
   PAGE
   1 MAGNIFY
   0 0 1 02A 0 SPRITE
   064 0 DO
      0EF 0 DO I 301 V! LOOP
      0AF 0 DO I 300 V! LOOP
      0 0EF DO I 301 V! -1 +LOOP
      0 0AF DO I 300 V! -1 +LOOP
   LOOP
   MON
;
DECIMAL

This runs in 26 seconds! ...lee
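The V! word follows the VDP write sequence visible in its assembly: the VRAM address, OR'ed with >4000 to mark a write, goes to the address port at >8C02 low byte first and high byte second, and the data byte then goes to the data port at >8C00. The Python sketch below only reproduces that byte sequence so it can be checked without hardware; `vdp_write_sequence` is a hypothetical helper. Addresses >0300 and >0301 are sprite 0's y and x positions in the Sprite Attribute Table, matching the 300/301 used in the Forth code.

```python
from typing import List, Tuple

VDP_DATA_PORT = 0x8C00   # data written here goes into VRAM
VDP_ADDR_PORT = 0x8C02   # address (or register) bytes go here
WRITE_FLAG = 0x4000      # OR'ed into the address to mark a write


def vdp_write_sequence(address: int, value: int) -> List[Tuple[int, int]]:
    """Return the (port, byte) writes a V!-style routine performs.

    Mirrors the assembly: OR the address with >4000, send its low byte and then
    its high byte to the address port, then send the value's low byte as data.
    """
    flagged = address | WRITE_FLAG
    return [
        (VDP_ADDR_PORT, flagged & 0xFF),         # low byte of address first
        (VDP_ADDR_PORT, (flagged >> 8) & 0xFF),  # then high byte with write flag
        (VDP_DATA_PORT, value & 0xFF),           # payload: LSB of the stack cell
    ]


# Put sprite 0 at x = >EF (239): write to >0301 in the Sprite Attribute Table.
for port, byte in vdp_write_sequence(0x301, 0xEF):
    print(f">{port:04X} <- >{byte:02X}")
```

For this call the sequence is >01 then >43 to >8C02, followed by >EF to >8C00.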

I modified the above code to the following:



HEX
\ Sprite #0 x from loop index to Sprite Attribute Table
ASM: SPR0IX!
   *RP R1 MOV,          \ get index value
   R0 301 LI,           \ sprite #0 x location
   R1 SWPB,             \ get LSB of value into MSB
   0 LIMI,              \ disable interrupts
   R0 4000 ORI,         \ tell VDP processor "hey, this is a *write*"
   R0 SWPB,             \ get low byte of address
   R0 8C02 @() MOVB,    \ write it to vdp address register
   R0 SWPB,             \ get high byte of address
   R0 8C02 @() MOVB,    \ write it
   R1 8C00 @() MOVB,    \ write payload
   2 LIMI,              \ enable interrupts
;ASM

\ Sprite #0 y from loop index to Sprite Attribute Table
ASM: SPR0IY!   
   *RP R1 MOV,          \ get index value
   R0 300 LI,           \ sprite #0 y location
   R1 SWPB,             \ get LSB of value into MSB
   0 LIMI,              \ disable interrupts
   R0 4000 ORI,         \ tell VDP processor "hey, this is a *write*"
   R0 SWPB,             \ get low byte of address
   R0 8C02 @() MOVB,    \ write it to vdp address register
   R0 SWPB,             \ get high byte of address
   R0 8C02 @() MOVB,    \ write it
   R1 8C00 @() MOVB,    \ write payload
   2 LIMI,              \ enable interrupts
;ASM

: TEST
   GRAPHICS
   PAGE
   1 MAGNIFY                           \ magnified single size sprites
   0 0 1 02A 0 SPRITE                  \ define sprite #0
   064 0 DO
      \ SPR0IX!/SPR0IY! read I from the return stack and load their own address, so nothing is pushed
      0EF 0 DO SPR0IX! LOOP            \ sprite right across top of screen
      0AF 0 DO SPR0IY! LOOP            \ sprite down right side of screen
      0 0EF DO SPR0IX! -1 +LOOP        \ sprite left across bottom of screen
      0 0AF DO SPR0IY! -1 +LOOP        \ sprite up left side of screen
   LOOP
   BYE
;
DECIMAL

I changed V! to the very specific SPR0IX! and SPR0IY! because the first two instructions in each are specific to sprite #0: they take the x or y value from the index of a DO ... LOOP and load the address of sprite #0's x or y position in the Sprite Attribute Table. Of course, this compromises generalization, and I would say it is not in the spirit of Forth, but I wanted to show the speed difference that streamlining the interior of a loop can make: the run time went down from 26 seconds to 18 seconds.

...lee
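Lee's change is a form of specialization: the constant VRAM address is baked into the routine, so it no longer has to travel through the data stack. A small Python analogy (hypothetical names, counting data-stack operations rather than measuring real TMS9900 cycles) shows that the generic and specialized versions leave the same result while the specialized one skips all the stack traffic.

```python
from typing import Callable, Dict, List, Tuple

Vram = Dict[int, int]


def generic_store(stack: List[int], vram: Vram) -> None:
    """Like V!: pop the address, then the value, from the data stack."""
    address = stack.pop()
    value = stack.pop()
    vram[address] = value & 0xFF


def make_specialized_store(address: int) -> Callable[[int, Vram], None]:
    """Like SPR0IX!/SPR0IY!: the address is fixed when the word is built,
    and the value comes straight from the loop index."""
    def store(index: int, vram: Vram) -> None:
        vram[address] = index & 0xFF
    return store


def run_generic(count: int) -> Tuple[Vram, int]:
    """Loop body 'I 301 V!': two pushes and two pops per iteration."""
    vram: Vram = {}
    stack: List[int] = []
    stack_ops = 0
    for i in range(count):
        stack.append(i)       # I
        stack.append(0x301)   # 301
        generic_store(stack, vram)
        stack_ops += 4        # 2 pushes + 2 pops
    return vram, stack_ops


def run_specialized(count: int) -> Tuple[Vram, int]:
    """Loop body 'SPR0IX!': index read in place, no data-stack traffic."""
    vram: Vram = {}
    spr0ix = make_specialized_store(0x301)
    for i in range(count):
        spr0ix(i, vram)
    return vram, 0


generic_vram, generic_ops = run_generic(239)
special_vram, special_ops = run_specialized(239)
print(generic_vram == special_vram, generic_ops, special_ops)  # True 956 0
```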
