Benchmarking Languages


159 replies to this topic

#1 Tursi OFFLINE  

Tursi

    River Patroller

  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Fri Jan 22, 2016 10:33 AM

While you're at it, how do you think GPL will compare with p-code? I have a very soft spot in my heart for Pascal, and would eventually like to develop a program for it on the TI, but I am concerned about its performance...


I couldn't say, I've never run it beyond the work I did debugging Classic99. In all that the best I did was assemble some Hello World programs...

For benchmarking languages, really... just write comparable programs. Trying to compare languages and implementations was always a battle, even back in the day, since algorithm matters, what parts of the language you touch matters, what parts of the hardware you need to use matters, etc. But off the top of my head, a good quick one for the TI might be something like manually moving a sprite around the outer edge of the screen, one pixel at a time (no auto-motion). See how fast you can get it whipping around. ;) Make it loop 100 times and then exit, so that you can time the total runtime.

Starting with the simple in XB...
 
100 CALL CLEAR
110 CALL MAGNIFY(2)
120 CALL SPRITE(#1,42,2,1,1)
130 CNT=100
140 FOR X=1 TO 240 :: CALL LOCATE(#1,1,X):: NEXT X
150 FOR Y=1 TO 176 :: CALL LOCATE(#1,Y,240):: NEXT Y
160 FOR X=240 TO 1 STEP -1 :: CALL LOCATE(#1,176,X):: NEXT X
170 FOR Y=176 TO 1 STEP -1 :: CALL LOCATE(#1,Y,1):: NEXT Y
180 CNT=CNT-1 :: IF CNT>0 THEN 140
190 END

ASM and TurboForth in the spoiler tag.

Spoiler


If porting - note how the corners overlap for one frame each! (For example, the X loop positions at 1,240, and then the Y loop ALSO positions at 1,240).
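For anyone porting this, the corner overlap can be checked with a quick sketch. The Python below is only an illustration: `lap_positions` is a made-up helper name, and the coordinates simply mirror the four XB loops above.

```python
# Sketch only: reproduces the (row, column) pairs visited by the XB loops.
# lap_positions is a hypothetical helper, not part of any TI toolchain.

def lap_positions():
    """Return the list of (row, col) sprite positions for one lap."""
    top = [(1, x) for x in range(1, 241)]            # line 140: X = 1..240 on row 1
    right = [(y, 240) for y in range(1, 177)]        # line 150: Y = 1..176 on column 240
    bottom = [(176, x) for x in range(240, 0, -1)]   # line 160: X = 240..1 on row 176
    left = [(y, 1) for y in range(176, 0, -1)]       # line 170: Y = 176..1 on column 1
    return top + right + bottom + left

lap = lap_positions()

# Positions written twice in a row inside one lap (the corner overlaps).
repeats = [a for a, b in zip(lap, lap[1:]) if a == b]

print(len(lap))            # LOCATE calls per lap: 832
print(repeats)             # [(1, 240), (176, 240), (176, 1)]
print(lap[-1] == lap[0])   # True: the lap ends where the next one starts
```

So a faithful port makes 832 position writes per lap, with each corner written twice back to back.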

Alllllso, for XB you might want to only time one lap and multiply it by 100. ;)

My tests for the above test come out like so:

XB (estimated): 2000 seconds (33 mins)
Assembly (8-bit code): 7 seconds
TurboForth: 48 seconds
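To put those totals on a per-update footing, here is a rough sketch in Python (used purely as a calculator). It assumes every version makes the same number of position writes as the XB listing, which may not hold exactly for the code in the spoiler; the names are illustrative.

```python
# Sketch: turn the total runtimes above into a rough cost per sprite update.
# UPDATES_PER_LAP follows from the XB loops (240 + 176 + 240 + 176); the
# timings are the figures quoted in this post. Assumes all versions do the
# same number of updates as the XB listing.

UPDATES_PER_LAP = 240 + 176 + 240 + 176
LAPS = 100
TOTAL_UPDATES = UPDATES_PER_LAP * LAPS

timings_sec = {
    "XB (estimated)": 2000,
    "Assembly (8-bit code)": 7,
    "TurboForth": 48,
}

def per_update_us(total_sec, updates=TOTAL_UPDATES):
    """Average microseconds spent per sprite position update."""
    return total_sec / updates * 1_000_000

for name, secs in timings_sec.items():
    print(f"{name}: {per_update_us(secs):.0f} us per update")
```

Under that assumption the figures work out to roughly 24 ms per update for XB, about 84 us for assembly and about 577 us for TurboForth.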

I attempted a UCSD Pascal version, but it kept saying it couldn't find the library on 'USES SPRITE' when I tried to compile, so I gave up... and I'm out of time for the GPL version.

#2 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 10:56 AM

Hmm.... it's academic but you might be able to make TF go faster by making it more like the assembly version, i.e. use V! to poke VDP memory. I'll have a look this evening and see if it'll be any faster. I was disappointed when I saw 48 seconds, but on the other hand SPRLOC and friends actually update a copy of the sprite attribute list in CPU RAM and copy portions of it to VDP, so there's a lot going on under the covers.

Edited by Willsy, Fri Jan 22, 2016 10:57 AM.


#3 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,907 posts
  • Location:Denmark

Posted Fri Jan 22, 2016 11:09 AM

Hmm.... it's academic but you might be able to make TF go faster by making it more like the assembly version, i.e. use V! to poke VDP memory. I'll have a look this evening and see if it'll be any faster. I was disappointed when I saw 48 seconds, but ...

 

It would then only be fair that time is spent to make the 2 other implementations faster. ;)



#4 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 11:14 AM

Yes of course!

#5 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 12:45 PM

This one is based on Tursi's code, but pokes VDP directly. Some other little optimisations:
  
VARIABLE cnt
 
hex
: asterisk DATA 4 0028 107C 1028 0000 12a dchar ;
decimal
 
: test
1 gmode
page
1 magnify
asterisk
0 0 0 42 1 sprite
100 cnt !
begin
  cnt @ 0> while
  239 0 do i $301 v! loop
  175 0 do i $300 v! loop
  0 239 do i $301 v! -1 +loop
  0 175 do i $300 v! -1 +loop
  -1 cnt +!
repeat
bye
;

And here's one that removes the need for a variable:
 
hex
: asterisk DATA 4 0028 107C 1028 0000 12a dchar ;
decimal
 
: test
    1 gmode
    page
    1 magnify
    asterisk
    0 0 0 42 1 sprite
    100 0 do
      239 0 do i $301 v! loop
      175 0 do i $300 v! loop
      0 239 do i $301 v! -1 +loop
      0 175 do i $300 v! -1 +loop
    loop
    bye 
;

Both of them take 29 seconds. So that's 3.6 times slower than assembler and 69 times faster than XB.
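A quick check of those ratios, as a sketch using only the numbers posted so far in this thread. With the 7-second assembly time from the first post, the first ratio comes out nearer 4.1 than 3.6; the XB ratio matches.

```python
# Sketch: recompute the ratios from the timings posted in this thread.
tf_optimized = 29     # seconds, the V! versions in this post
assembly = 7          # seconds, the 8-bit assembly figure from the first post
xb_estimated = 2000   # seconds, the XB estimate from the first post

slower_than_asm = tf_optimized / assembly
faster_than_xb = xb_estimated / tf_optimized

print(f"{slower_than_asm:.1f}x slower than assembly")   # 4.1x
print(f"{faster_than_xb:.0f}x faster than XB")          # 69x
```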

Rock on!

Edited by Willsy, Fri Jan 22, 2016 12:53 PM.


#6 InsaneMultitasker OFFLINE  

InsaneMultitasker

    Stargunner

  • 1,690 posts

Posted Fri Jan 22, 2016 1:21 PM

and here's one that removes the need for a variable:  Both of them take 29 seconds. So that's 3.6 times slower than assembler .

Rock on!

 

So it's about one 'forth' as fast?  ;)



#7 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 1:31 PM

So it's about one 'forth' as fast?  ;)


Ha ha yes!

#8 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,907 posts
  • Location:Denmark

Posted Fri Jan 22, 2016 3:01 PM

Both of them take 29 seconds. So that's 3.6 times slower than assembler and 69 times faster than XB.

 

So the newer language, with quite a few updates, gets optimized by its creator, and is then compared with the unoptimized versions.

 

Now let's compile the XB and have the ASM run on the GPU. It can be done.  :-D

 

 



#9 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 3:20 PM

Well not really. I was just seeing if I could improve tursi's time of 48 seconds.

#10 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,907 posts
  • Location:Denmark

Posted Fri Jan 22, 2016 3:40 PM

Well not really. I was just seeing if I could improve tursi's time of 48 seconds.


Wow. Sure looks like you did compare them:
 

Both of them take 29 seconds. So that's 3.6 times slower than assembler and 69 times faster than XB.


As Tursi said
 

Trying to compare languages and implementations was always a battle, ...


:)

#11 lucien2 OFFLINE  

lucien2

    Moonsweeper

  • 282 posts
  • Location:Switzerland

Posted Fri Jan 22, 2016 5:12 PM

GPL: 80 seconds
When we compared TF and GPL with the bricks demo 4 1/2 years ago they were closer.

Spoiler


#12 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Fri Jan 22, 2016 9:35 PM

Right.

Hmm.... it's academic but you might be able to make TF go faster by making it more like the assembly version, i.e. use V! to poke VDP memory. I'll have a look this evening and see if it'll be any faster. I was disappointed when I saw 48 seconds, but on the other hand SPRLOC and friends actually update a copy of the sprite attribute list in CPU RAM and copy portions of it to VDP, so there's a lot going on under the covers.


Yeah, what I was trying to do was use the language's features. The intent was to compare to the baseline Extended BASIC code; once you start bypassing the language, it becomes a debate whether it's a sensible comparison. But the assembly version can be sped up with registers and scratchpad without changing the structure (also, the workspace was in 8-bit RAM, so I moved that too. That's actually a bug, I never intended to not have the workspace in scratchpad ;) ):

Spoiler


That gets it down to 4.5 seconds - and it's the scratchpad workspace that makes most of the difference (1.5s)... running this code in scratchpad only saved about 1s. Since it spends all its time writing to VDP this program is multiplexer bound. ;) So we'll round up for the table and say 5s. ;)

All that said, I totally get the desire to optimize and there's no actual cheating in the TF version directly hitting VDP RAM, since it's built in. If XB had the ability to VPOKE we could try it there -- maybe an RXB version to see if it's faster. :)

GPL: 80 seconds


Thanks Lucien! I was hoping someone would take that on. Looks pretty good!

I'll split up first pass and optimized times to be fair - barring extreme bugs the first pass may be how someone new to the language would write it, optimized will be any interested party's best time (without changing the output of the program).

To be fair there, I've retimed the assembly version using VSBW etc, since that's how a new assembly programmer would normally start. That actually takes 17 seconds!

Spoiler


So we have:

Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
GPL          80 sec       none yet
XB         2000 sec       none yet

Frankly it's looking good for all of them so far versus XB. ;)

#13 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Fri Jan 22, 2016 10:00 PM

Ah. I see. Yes I think that's fair and I see what Sometimes was saying now.

#14 InsaneMultitasker OFFLINE  

InsaneMultitasker

    Stargunner

  • 1,690 posts

Posted Fri Jan 22, 2016 10:21 PM

For giggles I typed the program into Myarc's Advanced BASIC for the Geneve.  It took approximately 8.2 minutes (490 or so seconds) to complete.  Considering this BASIC is written in assembly (no GPL) I would have expected it to be a bit faster.   I wonder if some of the sluggishness in both XB and ABASIC isn't related to all the floating point manipulation.



#15 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 908 posts
  • Location:Lansing, NY, USA

Posted Fri Jan 22, 2016 10:27 PM

51 seconds for compiled XB 8 bit bus

37 seconds for compiled XB 16 bit bus



#16 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Sat Jan 23, 2016 2:40 AM

51 seconds for compiled XB 8 bit bus
37 seconds for compiled XB 16 bit bus


Wow that's really good!

#17 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 2,422 posts
  • Location:Denmark

Posted Sat Jan 23, 2016 2:46 AM

This seems like a straightforward benchmark, but what does it actually mean to move a sprite around at a rate faster than 1/60s, resulting in visual frames being skipped? ;)  



#18 globeron ONLINE  

globeron

    Dragonstomper

  • 596 posts

Posted Sat Jan 23, 2016 11:17 AM

(I think I have the software somewhere; it was in the Tijdingen TI-GG NL magazine in the '80s.) There was this fun thing where changing the screen color continuously generated a kind of moving bars on the screen. I think it only works on CRT televisions (50 Hz/60 Hz), as I tried it on an LCD but did not see it happening.

 

It is very simple, something like

100 CALL SCREEN(4)
110 CALL SCREEN(5)
120 GOTO 100

 

and did the same in TI-Basic, Extended Basic, TP99 (Turbo Pascal), C99 (C) and Assembler.

The difference was that the stripes increased (e.g. Basic had 2 or 3 large bars alternating,  but TP99 had several small stripes, and Assembler was very fast switching colours)

 

Not sure if it is a good benchmark to compare languages, but it was visual. I just tried in Classic99, but here colours switch fast.



#19 Retrospect OFFLINE  

Retrospect

    Dragonstomper

  • 866 posts
  • Location:Wakefield, England

Posted Sat Jan 23, 2016 12:18 PM

I didn't think BASIC on a TI would be able to do the raster CRT bars! ... because it uses CALLs, which, I recently read, are one of the reasons for the slow speed. I did this trick on a Spectrum though.



#20 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 2,422 posts
  • Location:Denmark

Posted Sat Jan 23, 2016 12:36 PM

If it doesn't work, it is because of the emulator: in some emulators the screen is drawn too fast or is not drawn concurrently with the CPU (it does work in MESS). You should always get some type of raster bars on the hardware if you change the background color at random intervals (and are not timing it with the vertical refresh). It has nothing to do with CRT vs LCD AFAIK. The problem on the TI is keeping the bars steady, because the clocks of the CPU and the VDP are not synchronized. The only way I'm aware of to get a stable raster effect is to use the 5th sprite flag to measure when the VDP is reaching a specific scan line.

 

Edit: sorry for polluting this thread, the benchmark is fine as long as you realize it's basically about how fast you can update one VDP RAM byte with increasing or decreasing values.



#21 Lee Stewart OFFLINE  

Lee Stewart

    River Patroller

  • 3,311 posts
  • Location:Silver Run, Maryland

Posted Sat Jan 23, 2016 6:15 PM

Here are the fbForth equivalents(?) of the two TurboForth sprite runs.

 

First pass:

 

HEX
064 VARIABLE CNT
: TEST
   GRAPHICS
   PAGE
   1 MAGNIFY
   0 0 1 02A 0 SPRITE
   BEGIN CNT @ WHILE
      0EF 0 DO I 0 0 SPRPUT LOOP
      0AF 0 DO 0EF I 0 SPRPUT LOOP
      0 0EF DO I 0AF 0 SPRPUT -1 +LOOP
      0 0AF DO 0 I 0 SPRPUT -1 +LOOP
      -1 CNT +!
   REPEAT
   MON
;
DECIMAL
 
And a port of the TF optimized pass:
 
HEX
: TEST
    GRAPHICS
    PAGE
    1 MAGNIFY
    0 0 1 02A 0 SPRITE
    064 0 DO
      0EF 0 DO I 301 VSBW LOOP
      0AF 0 DO I 300 VSBW LOOP
      0 0EF DO I 301 VSBW -1 +LOOP
      0 0AF DO I 300 VSBW -1 +LOOP
    LOOP
    MON 
;
DECIMAL
 
The first took 70 seconds and the second took 58 seconds.
 
I might be able to optimize further; but, fbForth cannot really compete with the scratchpad-optimized words of TurboForth that run on the 16-bit bus.
 
...lee


#22 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Jan 24, 2016 9:21 AM

Thanks for the continued updates folks! I'm finding this pretty interesting. :)

 

And yeah, the output to the screen is irrelevant, it's just about taking a normal operation to hardware (moving a sprite) and using it to benchmark the performance of the language. This is certainly not comprehensive, but I wanted something that was quick to implement and still at least somewhat real-world. :)

 

So what I see so far:

Language   First Pass    Optimized
Assembly     17 sec         5 sec
TurboForth   48 sec        29 sec
Compiled XB  51 sec        37 sec
FbForth      70 sec        58 sec
GPL          80 sec       none yet
ABASIC      490 sec       none yet
XB         2000 sec       none yet

(I included ABASIC although I don't know if it's a fair comparison since it's a different computer! ;) )
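For a rough sense of how much each optimized pass gained over its first pass, here is a small sketch using only the table above (Python as a calculator; entries with no optimized time yet are left out, and `speedup` is just an illustrative helper).

```python
# Sketch: gain from first pass to optimized, using the table above.
# Values are (first_pass_sec, optimized_sec); rows without an optimized
# time are omitted.
results_sec = {
    "Assembly": (17, 5),
    "TurboForth": (48, 29),
    "Compiled XB": (51, 37),
    "FbForth": (70, 58),
}

def speedup(first_pass, optimized):
    """How many times faster the optimized run is than the first pass."""
    return first_pass / optimized

for language, (first, best) in results_sec.items():
    print(f"{language:<12} {speedup(first, best):.2f}x")
```

By these numbers assembly gained the most (about 3.4x) and the others between roughly 1.2x and 1.7x.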

 



#23 Tursi OFFLINE  

Tursi

    River Patroller

  • Topic Starter
  • 4,746 posts
  • HarmlessLion
  • Location:BUR

Posted Sun Jan 24, 2016 9:23 AM

The original question was 'how does GPL compare?'... to be honest I'm surprised. While it is the slowest (non-BASIC) tested so far, it's not the slowest by much. Any of those languages would be just fine. :)

 

If I posted my Pascal attempt, would someone be able to help figure out why it doesn't compile?


Edited by Tursi, Sun Jan 24, 2016 9:24 AM.


#24 Lee Stewart OFFLINE  

Lee Stewart

    River Patroller

  • 3,311 posts
  • Location:Silver Run, Maryland

Posted Sun Jan 24, 2016 10:29 AM

I would like to revise the fbForth optimized code.  The following is more in line with the TurboForth code I was attempting to port.  It defines V! similar to how it is defined in TurboForth:

 

HEX
ASM:  V!
   *SP+ R0 MOV,         ( pop addr)
   *SP+ R1 MOV,         ( pop value)
   R1 SWPB,             ( get LSB of value into MSB)
   0 LIMI,              ( disable interrupts)
   R0 4000 ORI,         ( tell VDP processor "hey, this is a *write*")
   R0 SWPB,             ( get low byte of address)
   R0 8C02 @() MOVB,    ( write it to vdp address register)
   R0 SWPB,             ( get high byte of address)
   R0 8C02 @() MOVB,    ( write it)
   R1 8C00 @() MOVB,    ( write payload)
   2 LIMI,              ( enable interrupts)
;ASM
 
: TEST
    GRAPHICS
    PAGE
    1 MAGNIFY
    0 0 1 02A 0 SPRITE
    064 0 DO
      0EF 0 DO I 301 V! LOOP
      0AF 0 DO I 300 V! LOOP
      0 0EF DO I 301 V! -1 +LOOP
      0 0AF DO I 300 V! -1 +LOOP
    LOOP
    MON 
;
DECIMAL
 

This runs in 26 seconds!
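As a sketch of what the V! word above puts on the bus: before the data byte goes to >8C00, two address bytes go to >8C02, low byte first, with the >4000 bit set to mark a write. The Python below just works out those bytes; `vdp_write_address_bytes` is a hypothetical helper, and >0300 and >0301 are taken to be the Y and X bytes of sprite 0 in the sprite attribute table, which matches how the loops use them.

```python
# Sketch: the two bytes V! sends to the VDP address port (>8C02) before the
# data byte is written to >8C00. vdp_write_address_bytes is a made-up name.

def vdp_write_address_bytes(addr):
    """Return (first_byte, second_byte) for a VDP write to a 14-bit address."""
    # The 0x3FFF mask is an addition here for safety; the Forth word
    # only does the OR with >4000 (R0 4000 ORI,).
    word = (addr & 0x3FFF) | 0x4000
    low = word & 0xFF    # sent first (after the first SWPB)
    high = word >> 8     # sent second (after the second SWPB)
    return low, high

for addr in (0x0300, 0x0301):
    low, high = vdp_write_address_bytes(addr)
    print(f">{addr:04X}: low=>{low:02X} high=>{high:02X}")
```

For >0301 that gives >01 followed by >43, so each sprite move costs three byte writes to the VDP plus the Forth loop overhead.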

 

...lee



#25 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 908 posts
  • Location:Lansing, NY, USA

Posted Sun Jan 24, 2016 10:46 AM

Doggone it, now I suppose I'll have to do the program in XB using CALL LOADs.  Results later today.

 

Oops, just remembered that I need to write to VDP, not CPU.  So maybe no results today.


Edited by senior_falcon, Sun Jan 24, 2016 10:48 AM.




