Jump to content

Photo

Benchmarking Languages


159 replies to this topic

#126 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,341 posts
  • Location:Silver Run, Maryland

Posted Wed Jul 26, 2017 9:55 PM

  Lee, 

              I can't figure out how you got FB-Forth to go faster than Turbo Forth in the optimized version.

              Can you double check it when you have nothing better to do?  

 

I am sure the reason is that TurboForth's V! calls vsbw in TurboForth, whereas my port of the same code has all of the relevant ALC within V! .  The “bl @vsbw” and return would do it, I think.

 

...lee



#127 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sat Aug 12, 2017 7:01 PM

I thought I would expand this conversation by using a classic benchmark, the Sieve of Erathosenes.

I know it's an old chestnut, but it is commonly used to see how a language does a few things.

 

I tried it first with TI-BASIC.  I used the MSX version from Rosetta code and fixed a few things so it would run on the TI-99

100 REM  Eratosthenes Sieve 
110 REM    for TI BASIC 
120 REM  ****************** 
130 REM  MSX BRRJPA
140 INPUT "Search until: ":L
150 DIM P(1500)
160 FOR N=2 TO SQR(L+1000)
170 IF P(N)<>0 THEN 210
180 FOR K=N*N TO L STEP N
190 LET P(K)=1
200 NEXT K
210 NEXT N
220 FOR N=2 TO L
230 IF P(N)<>0 THEN 250
240 PRINT N;
250 NEXT N

I use 1000 for the input value because it takes along time. The big shock for me was that XB was slower than TI-BASIC.  :? 

 

Then of course I did a version for Forth and tested it on Turbo Forth and Camel99 a couple of ways.

Here is that code which is also from Rosetta Code but slightly modified by me.

I don't agree with the way they did the looping in the WHILE statement. It's a slow thing to have multiplication to calculate a loop

I will see if I can make it more like the C version...

\ Sieve of Erathosenes in Forth
\ Tested with TurboForth and CAMEL99 Forth

\ array calculators
: []@ ( n addr -- ? )  + C@  ;
: []! ( n addr -- )    + C! ;

: ERASE  ( addr n -- ) 0 FILL ;

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < ABORT" Out of memory" ;

\ byte array uses unallocated memory at 'HERE'
DECIMAL
: PRIMES ( n -- )
        ?MEM
        CR ." Running..."
        HERE OVER ERASE
        1 0 HERE []!       \ mark as prime like 'C' version
        1 1 HERE []!
        2                  \ start at 2
        BEGIN
           2DUP DUP * >
        WHILE
           DUP HERE []@ 0=
           IF  2DUP DUP *
               DO
                  1 I HERE []!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        ." Primes: " 2 DO  I HERE []@ 0= IF I . THEN  LOOP
;

And here are the timings.  BASIC can't allocate enough memory for more than about  1500 elements because of the big floating point numbers so I limited it to 1000.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

I would be interested in seeing the GCC results using the Rosetta code version or something similar.

 

Spoiler


#128 RXB ONLINE  

RXB

    River Patroller

  • 2,763 posts
  • Location:Vancouver, Washington, USA

Posted Sun Aug 13, 2017 1:27 AM

I use 1000 for the input value because it takes along time. The big shock for me was that XB was slower than TI-BASIC.   :? 

 

 

I side by side compared RXB and TI Basic and RXB beat TI Basic easy.


Edited by RXB, Sun Aug 13, 2017 1:28 AM.


#129 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Aug 13, 2017 8:10 AM

I have no doubt about it Rich.

 

Do you have an approximate number of secs?  

I did not make it clear, but I only time the loop that does the calculation.

The print out of the results I ignored.

 

I also did some testing and realized the Forth example is pretty silly in one other way.

The HERE word reads an address from a variable and is used as the base address of the data array.

So that's not at all like using a fixed memory address for the data array.

When I replaced a Forth version of HERE with an Assembly language version of HERE the ITC benchmark improved 20%!

 

"HERE" is a convenient memory location for throw away calculations where you don't need to create a variable, but not appropriate for a speed benchmark IMHO.

 

So I will change that and republish the results for all the Forth examples.



#130 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Aug 13, 2017 9:07 AM

So I changed the benchmark to use a VALUE to hold the array address.

It improved TF and CAMEL99 ITC version, but it actually slowed down the directed threaded version a little. 

The DTC version has HERE written in Assembler so this only proves that my code for CONSTANT is slower than fetching the contents of a variable in ASM.

 

Anyway here is result. 

\ Sieve of Erathosenes in Forth using a VALUE as array pointer
\ Tested with TurboForth and CAMEL99 Forth

0 VALUE SIEVE[]           \ holds array address

\ array calculators
: SIEVE[]@ ( n addr -- ? ) SIEVE[] + C@  ;
: SIEVE[]! ( n addr -- )   SIEVE[] + C! ;

: ERASE  ( addr n -- ) 0 FILL ;

HEX
: FREEMEM  ( -- n) FF00 HERE - ;

: ?MEM     ( n -- n )  FREEMEM OVER < ABORT" Out of memory" ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
        ?MEM
        HERE TO SIEVE[]       \ assign address to the array

        CR ." Running..."
        SIEVE[] OVER ERASE
        1 0 SIEVE[]!          \ mark as prime like 'C' version
        1 1 SIEVE[]!
        2                     \ start at 2
        BEGIN
           2DUP DUP * >
        WHILE
           DUP SIEVE[]@ 0=
           IF  2DUP DUP *
               DO
                  1 I SIEVE[]!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        ." Primes: " 2 DO  I SIEVE[]@ 0= IF I . THEN  LOOP
;
10,000 primes    W/HERE   W/VALUE  
TF               8.0       7.7
CAMEL99 ITC     10.28      8.4
CAMEL99 DTC      7.25      7.5

Edited by TheBF, Sun Aug 13, 2017 8:44 PM.


#131 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,341 posts
  • Location:Silver Run, Maryland

Posted Sun Aug 13, 2017 9:58 AM

Cut-and-paste did this, I’ll bet:

 

: SIEVE[]@ ( n addr -- ? ) SIEVE[] + C@  ;

 

should be

 

: SIEVE[]@ ( addr -- ? ) SIEVE[] + C@  ;

 

...lee



#132 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Aug 13, 2017 10:15 AM

Yup!

 

And in fact in should be 

 

: SIEVE[]@ ( n -- ? ) SIEVE[] + C@  ;

 

I merged the address into these new definitions to make it read nicer.

 



#133 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,341 posts
  • Location:Silver Run, Maryland

Posted Sun Aug 13, 2017 11:32 AM

I would say that the stack effects for ?MEM should be “( n -- n )” because n survives execution.

 

Also, why did you define SQRT ?  You don’t use it.

 

...lee



#134 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Aug 13, 2017 3:25 PM

I was originally working replicating the C version

A forgot to remove SQRT from this file.

Had to go socialize. Wife's orders

B

#135 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,011 posts
  • Location:Uzbekistan (no, really!)

Posted Mon Aug 14, 2017 4:05 AM

Nice!

Does the sieve just produce primes? If so there's a primes finder on the TF disk. Just type DEMOS after it boots and it'll tell you which block it's on.

Run it with 1000 PRIMES and it should do its stuff.

It can find primes to 1000 in around 2.5 seconds if I remember correctly.

If I remember correctly I adapted it from a Brodie program in Starting Forth. It was one of the first programs TF ever ran :-)

#136 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Mon Aug 14, 2017 7:25 AM

Nice!

Does the sieve just produce primes? If so there's a primes finder on the TF disk. Just type DEMOS after it boots and it'll tell you which block it's on.

Run it with 1000 PRIMES and it should do its stuff.

It can find primes to 1000 in around 2.5 seconds if I remember correctly.

If I remember correctly I adapted it from a Brodie program in Starting Forth. It was one of the first programs TF ever ran :-)

 

 

Yes this one just finds PRIME numbers.  It look like it was done by Anton, but I can't be sure.

So if we scale the Starting Forth version to 10000 it would take 25 seconds, so this one is quite a bit faster. 

 

With this version in TF it should search to over 20,000 because it uses all available memory in the expansion card.



#137 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 421 posts

Posted Wed Aug 23, 2017 1:21 PM

I entered the Pascal code as it is on the Rosetta page, using a set to keep track of the primes.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9
Pascal           2.869 s

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

It seems the sets aren't too efficiently implemented in Pascal. Probably because they are represented by a bit in a packed field, so there's a lot of packing and unpacking.
 
I changed from using a set to an array of booleans instead. That reduced the time from 10.558 s to 4.663 s.
Then I suspected some time could be saved in setting up the array, so I used the faster system intrinsic moveleft to do that. Now the time is 2.869 seconds.
With this solution, a boolean bit is stored as an integer, so 1000 elements require 2000 bytes.

Spoiler

Edited by apersson850, Thu Aug 24, 2017 12:47 AM.


#138 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Wed Aug 23, 2017 4:25 PM

That's great optimization.

Thanks telling us about it.

#139 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 926 posts
  • Location:Lansing, NY, USA

Posted Wed Aug 23, 2017 8:38 PM


1000 primes
X BASIC         14.3 secs
TI BASIC        11.9

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

 

After compiling the BASIC program in post 127 I got the following results:

For 1000 primes, compiled XB takes about 1 second.  For 8,500 primes it takes 10 seconds.  So the speeds are similar but somewhat slower.  Over the winter I will do some rewriting on this so it can load a larger program.



#140 RXB ONLINE  

RXB

    River Patroller

  • 2,763 posts
  • Location:Vancouver, Washington, USA

Posted Wed Aug 23, 2017 11:21 PM

Pretty sure XB has more accurate Primes to 10 decimal places then TI Basic does, they did put a ton of Floating Point into XB for more accuracy over TI Basic.



#141 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 421 posts

Posted Thu Aug 24, 2017 12:49 AM

Hahaha, that was a good one! :D



#142 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,922 posts
  • Location:Denmark

Posted Thu Aug 24, 2017 2:36 AM

 

It can find primes to 1000 in around 2.5 seconds if I remember correctly.


So if we scale the Starting Forth version to 10000 it would take 25 seconds, so this one is quite a bit faster.

 

 
So you just multiplied both values with ten ?  :waving:



#143 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 3,922 posts
  • Location:Denmark

Posted Thu Aug 24, 2017 2:39 AM

Pretty sure XB has more accurate Primes to 10 decimal places then TI Basic does, they did put a ton of Floating Point into XB for more accuracy over TI Basic.

 

If I may say so, - "more accurate Primes" is actually rather important.  :thumbsup:



#144 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,011 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Aug 24, 2017 5:07 AM

 
If I may say so, - "more accurate Primes" is actually rather important.  :thumbsup:

Boo! You're just being prejudiced against inaccurate primes.

WHAT DO WE WANT?
EQUAL RIGHTS FOR INACCURATE PRIMES
WHEN DO WE WANT IT?
SOME TIME IN THE NOT-TOO-DISTANT FUTURE PLUS OR MINUS AN INDETERMINATE AND VARIABLE AMOUNT OF TIME!

:-)

Edited by Willsy, Sun Aug 27, 2017 2:16 PM.


#145 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 421 posts

Posted Thu Aug 24, 2017 5:36 AM

As long as we stay within the range of signed integers, as we do here, it doesn't make any difference.



#146 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Thu Aug 24, 2017 6:12 AM

 
So you just multiplied both values with ten ?  :waving:

 

Yes.  I did not run your code with 10000 primes



#147 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 421 posts

Posted Sun Aug 27, 2017 1:27 PM

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.



#148 TheBF ONLINE  

TheBF

    Moonsweeper

  • 319 posts
  • Location:The Great White North

Posted Sun Aug 27, 2017 3:17 PM

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.

 

It makes me wonder how big a "Boolean" type is in the P-Code system. Any idea?  

 

Oops re-read your earlier post completely.  16 bits for a Boolean.

How big can you go before it hits the limit?

 

Could you try ARRAY of CHAR ?


Edited by TheBF, Sun Aug 27, 2017 3:22 PM.


#149 JamesD ONLINE  

JamesD

    Quadrunner

  • 7,653 posts
  • Location:Flyover State

Posted Sun Aug 27, 2017 3:34 PM

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.

Boolean is probably 8 bit, so that would only be 10K.
The P-Machine may not be able to deal with arrays that cross memory pages.
 



#150 RXB ONLINE  

RXB

    River Patroller

  • 2,763 posts
  • Location:Vancouver, Washington, USA

Posted Sun Aug 27, 2017 8:04 PM

I suppose I could write a RXB version using up to 10 digits (999999999) and make them strings 10 digits in length and store in SAMS.

 

SAMS memory 960 Meg (1966080 bytes) to store 196608 ten digit primes.

 

So this would allow for up to 6 digit primes to be solved vs the TI Basic version that can only store 1000 primes or XB that could go as high as 3500 Primes.






0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users