Benchmarking Languages

+Lee Stewart · July 27, 2017

Lee,

I can't figure out how you got FB-Forth to go faster than Turbo Forth in the optimized version.

Can you double check it when you have nothing better to do?

I am sure the reason is that TurboForth's V! calls vsbw in TurboForth, whereas my port of the same code has all of the relevant ALC within V! . The “bl @vsbw” and return would do it, I think.

...lee

+TheBF · August 13, 2017

I thought I would expand this conversation by using a classic benchmark, the Sieve of Erathosenes.

I know it's an old chestnut, but it is commonly used to see how a language does a few things.

I tried it first with TI-BASIC. I used the MSX version from Rosetta code and fixed a few things so it would run on the TI-99

100 REM  Eratosthenes Sieve 
110 REM    for TI BASIC 
120 REM  ****************** 
130 REM  MSX BRRJPA
140 INPUT "Search until: ":L
150 DIM P(1500)
160 FOR N=2 TO SQR(L+1000)
170 IF P(N)<>0 THEN 210
180 FOR K=N*N TO L STEP N
190 LET P(K)=1
200 NEXT K
210 NEXT N
220 FOR N=2 TO L
230 IF P(N)<>0 THEN 250
240 PRINT N;
250 NEXT N

I use 1000 for the input value because it takes along time. The big shock for me was that XB was slower than TI-BASIC.

Then of course I did a version for Forth and tested it on Turbo Forth and Camel99 a couple of ways.

Here is that code which is also from Rosetta Code but slightly modified by me.

I don't agree with the way they did the looping in the WHILE statement. It's a slow thing to have multiplication to calculate a loop

I will see if I can make it more like the C version...

\ Sieve of Erathosenes in Forth
\ Tested with TurboForth and CAMEL99 Forth

\ array calculators
: []@ ( n addr -- ? )  + C@  ;
: []! ( n addr -- )    + C! ;

: ERASE  ( addr n -- ) 0 FILL ;

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < ABORT" Out of memory" ;

\ byte array uses unallocated memory at 'HERE'
DECIMAL
: PRIMES ( n -- )
        ?MEM
        CR ." Running..."
        HERE OVER ERASE
        1 0 HERE []!       \ mark as prime like 'C' version
        1 1 HERE []!
        2                  \ start at 2
        BEGIN
           2DUP DUP * >
        WHILE
           DUP HERE []@ 0=
           IF  2DUP DUP *
               DO
                  1 I HERE []!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        ." Primes: " 2 DO  I HERE []@ 0= IF I . THEN  LOOP
;

And here are the timings. BASIC can't allocate enough memory for more than about 1500 elements because of the big floating point numbers so I limited it to 1000.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

I would be interested in seeing the GCC results using the Rosetta code version or something similar.

#include <stdlib.h>
#include <math.h>
 
char*
eratosthenes(int n, int *c)
{
	char* sieve;
	int i, j, m;
 
	if(n < 2)
		return NULL;
 
	*c = n-1;     /* primes count */
	m = (int) sqrt((double) n);
 
	/* calloc initializes to zero */
	sieve = calloc(n+1,sizeof(char));
	sieve[0] = 1;
	sieve[1] = 1;
	for(i = 2; i <= m; i++)
		if(!sieve[i])
			for (j = i*i; j <= n; j += i)
				if(!sieve[j]){
					sieve[j] = 1; 
					--(*c);
				}
  	return sieve;
}

RXB · August 13, 2017

I use 1000 for the input value because it takes along time. The big shock for me was that XB was slower than TI-BASIC.

I side by side compared RXB and TI Basic and RXB beat TI Basic easy.

Edited August 13, 2017 by RXB

+TheBF · August 13, 2017

I have no doubt about it Rich.

Do you have an approximate number of secs?

I did not make it clear, but I only time the loop that does the calculation.

The print out of the results I ignored.

I also did some testing and realized the Forth example is pretty silly in one other way.

The HERE word reads an address from a variable and is used as the base address of the data array.

So that's not at all like using a fixed memory address for the data array.

When I replaced a Forth version of HERE with an Assembly language version of HERE the ITC benchmark improved 20%!

"HERE" is a convenient memory location for throw away calculations where you don't need to create a variable, but not appropriate for a speed benchmark IMHO.

So I will change that and republish the results for all the Forth examples.

+TheBF · August 13, 2017

So I changed the benchmark to use a VALUE to hold the array address.

It improved TF and CAMEL99 ITC version, but it actually slowed down the directed threaded version a little.

The DTC version has HERE written in Assembler so this only proves that my code for CONSTANT is slower than fetching the contents of a variable in ASM.

Anyway here is result.

\ Sieve of Erathosenes in Forth using a VALUE as array pointer
\ Tested with TurboForth and CAMEL99 Forth

0 VALUE SIEVE[]           \ holds array address

\ array calculators
: SIEVE[]@ ( n addr -- ? ) SIEVE[] + C@  ;
: SIEVE[]! ( n addr -- )   SIEVE[] + C! ;

: ERASE  ( addr n -- ) 0 FILL ;

HEX
: FREEMEM  ( -- n) FF00 HERE - ;

: ?MEM     ( n -- n )  FREEMEM OVER < ABORT" Out of memory" ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
        ?MEM
        HERE TO SIEVE[]       \ assign address to the array

        CR ." Running..."
        SIEVE[] OVER ERASE
        1 0 SIEVE[]!          \ mark as prime like 'C' version
        1 1 SIEVE[]!
        2                     \ start at 2
        BEGIN
           2DUP DUP * >
        WHILE
           DUP SIEVE[]@ 0=
           IF  2DUP DUP *
               DO
                  1 I SIEVE[]!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        ." Primes: " 2 DO  I SIEVE[]@ 0= IF I . THEN  LOOP
;

10,000 primes    W/HERE   W/VALUE  
TF               8.0       7.7
CAMEL99 ITC     10.28      8.4
CAMEL99 DTC      7.25      7.5

Edited August 14, 2017 by TheBF

+Lee Stewart · August 13, 2017

Cut-and-paste did this, I’ll bet:

: SIEVE[]@ ( n addr -- ? ) SIEVE[] + C@ ;

should be

: SIEVE[]@ ( addr -- ? ) SIEVE[] + C@ ;

...lee

+TheBF · August 13, 2017

Yup!

And in fact in should be

: SIEVE[]@ ( n -- ? ) SIEVE[] + C@ ;

I merged the address into these new definitions to make it read nicer.

B

+Lee Stewart · August 13, 2017

I would say that the stack effects for ?MEM should be “( n -- n )” because n survives execution.

Also, why did you define SQRT ? You don’t use it.

...lee

+TheBF · August 13, 2017

I was originally working replicating the C version

A forgot to remove SQRT from this file.

Had to go socialize. Wife's orders

B

Willsy · August 14, 2017

Nice!

Does the sieve just produce primes? If so there's a primes finder on the TF disk. Just type DEMOS after it boots and it'll tell you which block it's on.

Run it with 1000 PRIMES and it should do its stuff.

It can find primes to 1000 in around 2.5 seconds if I remember correctly.

If I remember correctly I adapted it from a Brodie program in Starting Forth. It was one of the first programs TF ever ran :-)

+TheBF · August 14, 2017

Nice!

Does the sieve just produce primes? If so there's a primes finder on the TF disk. Just type DEMOS after it boots and it'll tell you which block it's on.

Run it with 1000 PRIMES and it should do its stuff.

It can find primes to 1000 in around 2.5 seconds if I remember correctly.

If I remember correctly I adapted it from a Brodie program in Starting Forth. It was one of the first programs TF ever ran :-)

Yes this one just finds PRIME numbers. It look like it was done by Anton, but I can't be sure.

So if we scale the Starting Forth version to 10000 it would take 25 seconds, so this one is quite a bit faster.

With this version in TF it should search to over 20,000 because it uses all available memory in the expansion card.

apersson850 · August 23, 2017

I entered the Pascal code as it is on the Rosetta page, using a set to keep track of the primes.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9
Pascal           2.869 s

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

It seems the sets aren't too efficiently implemented in Pascal. Probably because they are represented by a bit in a packed field, so there's a lot of packing and unpacking.

I changed from using a set to an array of booleans instead. That reduced the time from 10.558 s to 4.663 s.
Then I suspected some time could be saved in setting up the array, so I used the faster system intrinsic moveleft to do that. Now the time is 2.869 seconds.
With this solution, a boolean bit is stored as an integer, so 1000 elements require 2000 bytes.

                                        Created at Aug 23, Wed, 21:50.27
                                        RAMDISK:BENCH8.TEXT

program sieve;

uses
  support,
  realtime;
  
const
  primelimit = 1000;
   
var
  primes: array[1..primelimit] of boolean;
  n,k: integer;
  needcomma: boolean;
  timer: timerid;
  elapsed: ttime;
  ch: char;
      
begin
  writeln('Sieve starts running');

  tmrnew(timer);
  tmrreset(timer);
  tmrstart(timer);
  
  (* Calcuate the primes *)
  primes[1] := false;
  primes[2] := true;
  moveleft(primes[2],primes[3],primelimit-4);
  for n := 1 to trunc(sqrt(primelimit)) do
  begin
    if primes[n] then
    begin
      k := n*n;
      while k<primelimit do
      begin
        primes[k] := false;
        k := k+n;
      end;
    end;
  end;

  tmrstop(timer);
  tmrread(timer,elapsed);

  (* Output the primes *)
  needcomma := false;
  for n := 1 to primelimit do
    if primes[n] then
    begin
      if needcomma then
        write(', ');
      write(n);
      needcomma := true;
    end;

  writeln;
  with elapsed do
  begin
    write('Time ',hour,':');
    if minute<10 then
      write('0');
    write(minute,':');
    if second<10 then
      write('0');
    write(second,',');
    if fract<100 then
      write('0');
    if fract<10 then
      write('0');
    writeln(fract);
  end;
  read(ch);
end.

Edited August 24, 2017 by apersson850

+TheBF · August 23, 2017

That's great optimization.

Thanks telling us about it.

senior_falcon · August 24, 2017

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

After compiling the BASIC program in post 127 I got the following results:

For 1000 primes, compiled XB takes about 1 second. For 8,500 primes it takes 10 seconds. So the speeds are similar but somewhat slower. Over the winter I will do some rewriting on this so it can load a larger program.

RXB · August 24, 2017

Pretty sure XB has more accurate Primes to 10 decimal places then TI Basic does, they did put a ton of Floating Point into XB for more accuracy over TI Basic.

apersson850 · August 24, 2017

Hahaha, that was a good one!

sometimes99er · August 24, 2017

It can find primes to 1000 in around 2.5 seconds if I remember correctly.

So if we scale the Starting Forth version to 10000 it would take 25 seconds, so this one is quite a bit faster.

So you just multiplied both values with ten ? :waving:

sometimes99er · August 24, 2017

Pretty sure XB has more accurate Primes to 10 decimal places then TI Basic does, they did put a ton of Floating Point into XB for more accuracy over TI Basic.

If I may say so, - "more accurate Primes" is actually rather important. :thumbsup:

Willsy · August 24, 2017

If I may say so, - "more accurate Primes" is actually rather important.

Boo! You're just being prejudiced against inaccurate primes.

WHAT DO WE WANT?

EQUAL RIGHTS FOR INACCURATE PRIMES

WHEN DO WE WANT IT?

SOME TIME IN THE NOT-TOO-DISTANT FUTURE PLUS OR MINUS AN INDETERMINATE AND VARIABLE AMOUNT OF TIME!

:-)

Edited August 27, 2017 by Willsy

apersson850 · August 24, 2017

As long as we stay within the range of signed integers, as we do here, it doesn't make any difference.

+TheBF · August 24, 2017

So you just multiplied both values with ten ?

Yes. I did not run your code with 10000 primes

apersson850 · August 27, 2017

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.

+TheBF · August 27, 2017

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.

~~It makes me wonder how big a "Boolean" type is in the P-Code system. Any idea?~~

Oops re-read your earlier post completely. 16 bits for a Boolean.

How big can you go before it hits the limit?

Could you try ARRAY of CHAR ?

Edited August 27, 2017 by TheBF

JamesD · August 27, 2017

I didn't expect it to work, but I tried making the array large enough to accomodate numbers up to 10000. But the p-system responded "Can't allocate global data", which is what I expected.

Boolean is probably 8 bit, so that would only be 10K.

The P-Machine may not be able to deal with arrays that cross memory pages.

RXB · August 28, 2017

I suppose I could write a RXB version using up to 10 digits (999999999) and make them strings 10 digits in length and store in SAMS.

SAMS memory 960 Meg (1966080 bytes) to store 196608 ten digit primes.

So this would allow for up to 6 digit primes to be solved vs the TI Basic version that can only store 1000 primes or XB that could go as high as 3500 Primes.

Benchmarking Languages

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members