Jump to content
Tursi

Benchmarking Languages

Recommended Posts

 

It makes me wonder how big a "Boolean" type is in the P-Code system. Any idea?

 

Oops re-read your earlier post completely. 16 bits for a Boolean.

How big can you go before it hits the limit?

 

Could you try ARRAY of CHAR ?

That's still only 20K. Seems a bit wasteful for what could be a single bit.

I think the array of char sounds like a reasonable idea, but if the P-Machine doesn't handle arrays across boundaries, it could still be a problem.

It sounds odd that it wouldn't support that though.

Share this post


Link to post
Share on other sites

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.

Share this post


Link to post
Share on other sites

I realized I made a mistake in declaring the array of booleans. Corrected that and updated the list.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9
Pascal           2.786 s

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

When the program is running, function memavail reports 5697 words free. My unit realtime consumes some memory too, especially as it has a segment with assembly support that's not dynamically relocatable.

Share this post


Link to post
Share on other sites

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.

Isn't the memory expansion paged?

Share this post


Link to post
Share on other sites

Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes.

Share this post


Link to post
Share on other sites

Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes.

Page as in turning a page, changing a page, having to change with RAM appears at an address to use it page

Share this post


Link to post
Share on other sites

Well, that's the same definition, then. And in that case it's not paged. It's the same memory expansion as we always have had.

Edited by apersson850

Share this post


Link to post
Share on other sites

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.

 

I always like the way Wirth's languages have these different data types. Forth makes you work hard for that nice stuff.

So in my Pascal Envy fever I thought I would see what it would take to make a "packed array of bits" and it is more code than I thought.

 

The overhead just to set 1 bit, according to the CLASSIC99 9901 timer is 1.8 milli-seconds in my Forth system.

 

EDIT: New times 1.5mS to set a bit.

1.1mS to read a bit ([email protected])

 

This could be really sped up with assembly language code words but it's still a lot of work to strip out bits.

 

 

EDIT: Here is the revised code with fixed RSHIFT and uses CELL+ and CELLS to be less platform dependant.

\ BOOLEAN array experiment

\ BOOLEAN data is one CELL (16 bits on TMS9900)

HEX
\ create & erase memory area for 'n' bits
: BITS:  ( n -- ) CREATE    8 /  HERE OVER 0 FILL  CELL+  ALLOT  ;  \ added 2 bytes extra

\ compute bit# in a cell & cell address
: BITFLD     ( bit# addr[] -- bit#' addr)
               SWAP 10 /MOD CELLS ROT +  ;

: [email protected]      ( bit# addr -- ? )
              BITFLD @               \ compute bit# & fetch bits in cell
              SWAP RSHIFT            \ if bit#<>0 RSHIFT,
              0001 AND ;             \ mask 1 bit
              
: BIT#>MASK ( bit# -- n )  0001 SWAP LSHIFT ;

: BSET      ( bit# addr[] -- )
              BITFLD                   ( -- bit# addr)
              SWAP BIT#>MASK >R        \ save the mask
              DUP @                    \ -- addr bits
              R> OR SWAP ! ;           \ or mask into bit, store in addr

: BRST      ( bit# addr[] -- )
              BITFLD                   ( -- bit# addr)
              SWAP BIT#>MASK INVERT >R  \ invert and save mask
              DUP @                     \ -- addr bits
              R> AND SWAP ! ;           \ mask out bits, store back in addr

\ test code
 DECIMAL
  300 BITS: ]X      \ make array X of 1000 bits

: FILLBITS   300 0 DO  I ]X BSET   LOOP ;
: CLRBITS    300 0 DO  I ]X BRST   LOOP ;
: EVENBITS   ." Erasing..."  CLRBITS 300 0 DO  I ]X BSET   2 +LOOP ;
: SHOWBITS   300 0 DO  I ]X [email protected] . LOOP ;

Edited by TheBF

Share this post


Link to post
Share on other sites

So I ran some of the *HAYES tester on my Forth system and guess what?

 

I coded my bit shifting words in a way that made them FAIL. No shock because I didn't read the spec. (DUH)

 

In the WORD [email protected] in my earlier post, I added an IF statement to cope with a 0 bit shift. Well the ANS/ISO Forth standard already anticipated that.

So I recoded my shift words (LSHIFT & RSHIFT) to pass the test and now [email protected] is much faster and much simpler.

 

I have edited the previous POST to reflect the correction.

: [email protected]      ( bit# addr -- ? )
              BITFLD @               \ compute bit# & fetch bits in cell
              SWAP RSHIFT            \ if bit#<>0 RSHIFT,
              0001 AND ;             \ mask 1 bit

* The Hayes tester is set of test words and test suites for every standard Forth word that let's you confirm that a Forth WORD is doing what it's supposed to do.

It looks like this for a simple word like SWAP:

 

T{ 1 2 3 SWAP -> 1 3 2 }T

 

​You get OK if the output swap matches the '1 3 2' after the arrow and an error message if not.

 

To read more: http://forth-standard.org/standard/testsuite

Edited by TheBF
  • Like 1

Share this post


Link to post
Share on other sites

I had always wanted to come back to this little demo to properly do this in Forth in an idiomatic way and for the best performance.
This of course means using the Forth assembler. There is no other way to match compilers that generate native code with a Threaded code Forth.

So I took a look at Tursi's code.
I made a some slight improvements by keeping the loop limits in a register rather than re-loading the register every time through the loop.

 

I used the Forth compiler to make byte directive so signed bytes can be used for the counters. I liked the way that made the code look. ;-)
This happens at compile time so it costs nothing in run-time.

Then I chopped up the big program into 4 routines which would be the Forth approach. This meant I could test each movement independantly.
I used Forth for all the setup stuff, clearing the screen and creating the sprite.

After that it's just 4 words in a 100 iteration loop.

The spoiler has the new code and the video shows the timing using the VDP interrupt on the PC. It seems to agree with my iPhone stopwatch.

 

 

 

\ * assumes startup from CAMEL99 Forth
\ * Translated from ASM code by Tursi
\ stripped out graphics set up and used Forth.
\ Uses Forth Workspace at >8300

 \ NEEDS DUMP FROM DSK1.TOOLS
NEEDS MOV,   FROM DSK1.ASM9900
NEEDS CLEAR  FROM DSK1.GRAFIX
NEEDS SPRITE FROM DSK1.DIRSPRIT
NEEDS ELAPSE FROM DSK1.ELAPSE

HEX
8C00 CONSTANT VDPWD         \ vdp ram write data
8C02 CONSTANT VDPWA         \ vdp ram read/write address

\ MACRO to setup VDP write address from a register argument
: VDPWA, ( reg -- )
       DUP VDPWA @@  MOVB,   \ write 1st byte of address to VDP chip
       DUP           SWPB,
           VDPWA @@  MOVB,   \ write 2nd byte of address to VDP chip
                     NOP,
;

\ Macro to convert integer to signed byte at compile time
: byte  ( n -- c )  00FF AND >< ;

CODE MOVERIGHT ( -- )
          R3 CLR,            \ for x=0
DECIMAL   R8 239 byte LI,    \ to 239
          R13 01 byte LI,    \ step 1
          BEGIN,
HEX          R0 4301 >< LI,   ( >< swaps bytes at compile time)
             R0 VDPWA,
             R3 VDPWD @@ MOVB,
             R13 R3 ADD,      \ next x
             R3 R8 CMP,
          EQ UNTIL,
          NEXT,
          ENDCODE

CODE MOVEDOWN
          R4 CLR,              \ for y=0
DECIMAL   R8 175 byte LI,      \ to 175
          BEGIN,
HEX          R0 4300 >< LI,
             R0 VDPWA,
             R4 VDPWD @@ MOVB,
             R13 R4 ADD,        \ next y
             R4 R8 CMP,
          EQ UNTIL,
          NEXT,
          ENDCODE

CODE MOVELEFT
DECIMAL   R3 239 byte LI,       \ for x=239 to 0
HEX       R13 -1 byte LI,       \ step -1
          BEGIN, 
             R0 4301 >< LI,
             R0 VDPWA,
             R3 VDPWD @@ MOVB,
             R13 R3 ADD,        \ next x
          EQ UNTIL,
          NEXT,
          ENDCODE

CODE MOVEUP
DECIMAL   R4 175 byte LI,        \ * for y=175 to 0
          R13 -1 byte LI,         \ step -1
          BEGIN,
HEX          R0 4300 >< LI,
             R0 VDPWA,
             R4 VDPWD @@ MOVB,
             R13 R4 ADD,       \ * next y
          EQ UNTIL,
          NEXT,
          ENDCODE

DECIMAL
: START
    CLEAR
\    Ascii   color  y x spr#
    [CHAR] *   2    1 1  0 SPRITE
    1 MAGNIFY
    100 0
    DO
       MOVERIGHT MOVEDOWN MOVELEFT MOVEUP
    LOOP ;

 

 

Tursi's Benchmark.mp4

  • Like 1

Share this post


Link to post
Share on other sites

By the way, for reference, when I did a straight translation to Forth Assembler of Tursi's original code it was about 6.2 seconds on my Classic99/PC combo.

With the loop limit in a register it came down to 5.91

 

The setup Forth code takes about 22mS measured with the 9901 timer so it's insignificant.

 

I never saw anything close to the published 5 seconds. Is this a hardware issue on my end?

Share this post


Link to post
Share on other sites

Since I found a way to take some instructions out of CAMEL99 Forth direct threaded version I re-did some times on the Sieve of Eratosthenes.
Each Forth system ran the same code in the spoiler.

Again we can see TurboForth's strategy of installing a maximum number of Forth routines in 16bit RAM as the one to beat.

But using the BL instruction to assist in making direct threaded code shaved the time down from 7.25 to 7.1 seconds.
And the newest version of Camel Forth, where more of the primitive routines are written in Assembler improves the time versus regular Camel Forth by 20%.

I believe the FBFORTH performance is being held back by the way FIG Forth implements calling Forth's "inner interpreters" per this discussion:

http://http://www.bradrodriguez.com/papers/moving3.htm

Two approaches have been used in the past:
1. The fig-Forth solution. Fig-Forth reserved the first cell of
the Parameter Field to hold the address of the high-level code.
The DODOES routine then obtained the Parameter Field address,
pushed the address of the actual data (typically PFA+2) onto
the stack, fetched the address of the high-level routine,
and EXECUTEd.


Manual timings
1000 primes
X BASIC 14.3 secs
TI BASIC 11.9

10000 Primes
----------------------
TurboForth 8.0
CAMEL99 10.28
CAMEL99G 8.3
CAMEL99 DTC 7.1 (new version)
FBFORTH 15.0



 

\ Sieve of Erathosenes in Forth
\ Tested with TurboForth FBFORTH and CAMEL99 Forth

\ array calculators
: []@ ( n addr -- c )    + [email protected] ;
: []! ( n ndx addr -- )  + C! ;

: ERASE  ( addr n -- ) 0 FILL ;

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < IF ." Out of memory"  ABORT THEN ;

: SEEPRIMES ( -- )
        CR ." Primes: "
        2 DO
            I HERE []@ 0= IF I . THEN
            ?TERMINAL IF ." Primes halted" ABORT THEN
        LOOP ;

\ byte array uses un-allocated memory at HERE
DECIMAL
: PRIMES ( n -- )
        ?MEM
        CR ." Running..."
        HERE OVER ERASE
        1 0 HERE []!       \ mark as prime like 'C' version
        1 1 HERE []!
        2                  \ start at 2
        BEGIN
           OVER OVER DUP * >
        WHILE
           DUP HERE []@ 0=
           IF  OVER OVER DUP *
               DO
                  1 I HERE []!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        CR ." Press ENTER to see primes:" KEY 13 =
        IF   SEEPRIMES   THEN
;


 

Edited by TheBF
  • Like 3

Share this post


Link to post
Share on other sites

The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4:

HEX 0 83C4 !

...lee

  • Like 2

Share this post


Link to post
Share on other sites

I will do that and edit the result Lee when I get back to my desk.

Share this post


Link to post
Share on other sites

Youtube deleted most of my videos off my channel and no reasons why or what the problem is.....?

 

Youtube management are IDIOTS at best and the software they use to find Porn or Hate Speech is pure crap.

 

Example is Youtube deleted a PBS channel as Porn and it was Sesame Street?

 

Could Youtube write a worse junk program?

Share this post


Link to post
Share on other sites

The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4:

HEX 0 83C4 !

...lee

 

Lee,

 

I tried this twice just now.

After FBForth started I got 15 seconds.

Then again after 0 83C4 ! I got 14.86.

 

Your interrupts are not taking very much time away from the CPU when they are idling it seems.

 

When I look at your inner interpreter I cannot see where the overhead is coming from?

  • Like 1

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...