Jump to content

Photo

Benchmarking Languages


167 replies to this topic

#151 JamesD OFFLINE  

JamesD

    Quadrunner

  • 8,482 posts
  • Location:Flyover State

Posted Sun Aug 27, 2017 11:36 PM

 

It makes me wonder how big a "Boolean" type is in the P-Code system. Any idea?  

 

Oops re-read your earlier post completely.  16 bits for a Boolean.

How big can you go before it hits the limit?

 

Could you try ARRAY of CHAR ?

That's still only 20K.  Seems a bit wasteful for what could be a single bit.
I think the array of char sounds like a reasonable idea, but if the P-Machine doesn't handle arrays across boundaries, it could still be a problem.


It sounds odd that it wouldn't support that though.



#152 apersson850 OFFLINE  

apersson850

    Dragonstomper

  • 605 posts

Posted Mon Aug 28, 2017 2:39 AM

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.



#153 apersson850 OFFLINE  

apersson850

    Dragonstomper

  • 605 posts

Posted Mon Aug 28, 2017 12:30 PM

I realized I made a mistake in declaring the array of booleans. Corrected that and updated the list.

1000 primes
X BASIC         14.3 secs
TI BASIC        11.9
Pascal           2.786 s

10,000 primes
TF               8.0
CAMEL99 ITC     10.28
CAMEL99 DTC      7.25

When the program is running, function memavail reports 5697 words free. My unit realtime consumes some memory too, especially as it has a segment with assembly support that's not dynamically relocatable.



#154 JamesD OFFLINE  

JamesD

    Quadrunner

  • 8,482 posts
  • Location:Flyover State

Posted Mon Aug 28, 2017 1:33 PM

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.

Isn't the memory expansion paged?



#155 apersson850 OFFLINE  

apersson850

    Dragonstomper

  • 605 posts

Posted Mon Aug 28, 2017 3:10 PM

Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes.



#156 JamesD OFFLINE  

JamesD

    Quadrunner

  • 8,482 posts
  • Location:Flyover State

Posted Mon Aug 28, 2017 4:52 PM

Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes.

Page as in turning a page, changing a page, having to change with RAM appears at an address to use it page



#157 apersson850 OFFLINE  

apersson850

    Dragonstomper

  • 605 posts

Posted Mon Aug 28, 2017 5:00 PM

Well, that's the same definition, then. And in that case it's not paged. It's the same memory expansion as we always have had.


Edited by apersson850, Tue Aug 29, 2017 12:30 AM.


#158 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Wed Aug 30, 2017 2:36 PM

array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array.

 

Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time.

 

I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case.

 

I always like the way Wirth's languages have these different data types. Forth makes you work hard for that nice stuff.  

So in my Pascal Envy fever I thought I would see what it would take to make a "packed array of bits" and it is more code than I thought.

 

The overhead just to set 1 bit, according to the CLASSIC99 9901 timer is 1.8 milli-seconds in my Forth system.

 

EDIT: New times   1.5mS to set a bit.

                              1.1mS to read a bit (BIT@)

 

This could be really sped up with assembly language code words but it's still a lot of work to strip out bits.

 

 

EDIT: Here is the revised code with fixed RSHIFT and uses CELL+ and CELLS to be less platform dependant.

\ BOOLEAN array experiment

\ BOOLEAN data is one CELL (16 bits on TMS9900)

HEX
\ create & erase memory area for 'n' bits
: BITS:  ( n -- ) CREATE    8 /  HERE OVER 0 FILL  CELL+  ALLOT  ;  \ added 2 bytes extra

\ compute bit# in a cell & cell address
: BITFLD     ( bit# addr[] -- bit#' addr)
               SWAP 10 /MOD CELLS ROT +  ;

: BIT@      ( bit# addr -- ? )
              BITFLD @               \ compute bit# & fetch bits in cell
              SWAP RSHIFT            \ if bit#<>0 RSHIFT,
              0001 AND ;             \ mask 1 bit
              
: BIT#>MASK ( bit# -- n )  0001 SWAP LSHIFT ;

: BSET      ( bit# addr[] -- )
              BITFLD                   ( -- bit# addr)
              SWAP BIT#>MASK >R        \ save the mask
              DUP @                    \ -- addr bits
              R> OR SWAP ! ;           \ or mask into bit, store in addr

: BRST      ( bit# addr[] -- )
              BITFLD                   ( -- bit# addr)
              SWAP BIT#>MASK INVERT >R  \ invert and save mask
              DUP @                     \ -- addr bits
              R> AND SWAP ! ;           \ mask out bits, store back in addr

\ test code
 DECIMAL
  300 BITS: ]X      \ make array X of 1000 bits

: FILLBITS   300 0 DO  I ]X BSET   LOOP ;
: CLRBITS    300 0 DO  I ]X BRST   LOOP ;
: EVENBITS   ." Erasing..."  CLRBITS 300 0 DO  I ]X BSET   2 +LOOP ;
: SHOWBITS   300 0 DO  I ]X BIT@ . LOOP ;


Edited by TheBF, Thu Sep 14, 2017 2:52 PM.


#159 apersson850 OFFLINE  

apersson850

    Dragonstomper

  • 605 posts

Posted Wed Aug 30, 2017 3:40 PM

I've not checked how the PME routines for packing and unpacking works. But obviously they take more time than not having to run them.



#160 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Thu Sep 14, 2017 2:16 PM

So I ran some of the *HAYES tester on my Forth system and guess what?

 

I coded my bit shifting words in a way that made them FAIL.  No shock because I didn't read the spec. (DUH)

 

In the WORD BIT@ in my earlier post, I added an IF statement to cope with a 0 bit shift.  Well the ANS/ISO Forth standard already anticipated that.

So I recoded my shift words (LSHIFT & RSHIFT) to pass the test and now BIT@ is much faster and much simpler.

 

I have edited the previous POST to reflect the correction.

: BIT@      ( bit# addr -- ? )
              BITFLD @               \ compute bit# & fetch bits in cell
              SWAP RSHIFT            \ if bit#<>0 RSHIFT,
              0001 AND ;             \ mask 1 bit

* The Hayes tester is set of test words and test suites for every standard Forth word that let's you confirm that a Forth WORD is doing what it's supposed to do.

It looks like this for a simple word like SWAP:  

 

T{ 1 2 3 SWAP -> 1 3 2 }T

 

​You get OK if the output swap matches the '1 3 2' after the arrow and an error message if not.

 

To read more: http://forth-standar...ndard/testsuite


Edited by TheBF, Thu Sep 14, 2017 2:36 PM.


#161 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Fri Apr 26, 2019 11:53 AM

I had always wanted to come back to this little demo to properly do this in Forth in an idiomatic way and for the best performance.
This of course means using the Forth assembler.  There is no other way to match compilers that generate native code with a Threaded code Forth.

So I took a look at Tursi's code. 
I made a some slight improvements by keeping the loop limits in a register rather than re-loading the register every time through the loop.

 

I  used the Forth compiler to make byte directive so signed bytes can be used for the counters. I liked the way that made the code look. ;-)
This happens at compile time so it costs nothing in run-time.

Then I chopped up the big program into 4 routines which would be the Forth approach. This meant I could test each movement independantly.
I used Forth for all the setup stuff, clearing the screen and creating the sprite.
 
After that it's just 4 words in a 100 iteration loop.
 
The spoiler has the new code and the video shows the timing using the VDP interrupt on the PC. It seems to agree with my iPhone stopwatch.

 

Spoiler

Attached Files



#162 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Fri Apr 26, 2019 12:30 PM

By the way, for reference,  when I did a straight translation to Forth Assembler of Tursi's original code it was about  6.2 seconds on my Classic99/PC combo.

With the loop limit in a register it came down to 5.91

 

The setup Forth code takes about 22mS measured with the 9901 timer so it's insignificant.

 

I never saw anything close to the published 5 seconds.  Is this a hardware issue on my end?



#163 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Fri May 3, 2019 9:24 AM

Since I found a way to take some instructions out of CAMEL99 Forth direct threaded version I re-did some times on the Sieve of Eratosthenes. 
Each Forth system ran the same code in the spoiler.
 
Again we can see TurboForth's strategy of installing a maximum number of Forth routines in 16bit RAM as the one to beat.
 
But using the BL instruction to assist in making direct threaded code shaved the time down from 7.25 to 7.1 seconds.
And the newest version of Camel Forth, where more of the primitive routines are written in Assembler improves the time versus regular Camel Forth by 20%.
 
I believe the FBFORTH performance is being held back by the way FIG Forth implements calling Forth's "inner interpreters" per this discussion:

http://http://www.br...ers/moving3.htm

 Two approaches have been used in the past:
1. The fig-Forth solution. Fig-Forth reserved the first cell of
the Parameter Field to hold the address of the high-level code.
The DODOES routine then obtained the Parameter Field address,
pushed the address of the actual data (typically PFA+2) onto
the stack, fetched the address of the high-level routine,
and EXECUTEd.


 Manual timings
 1000 primes
 X BASIC         14.3 secs
 TI BASIC        11.9

 10000 Primes
 ----------------------
 TurboForth       8.0
 CAMEL99         10.28
 CAMEL99G         8.3
 CAMEL99 DTC      7.1   (new version)
 FBFORTH         15.0

 
 

Spoiler


Edited by TheBF, Fri May 3, 2019 9:29 AM.


#164 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,991 posts
  • Location:Silver Run, Maryland

Posted Fri May 3, 2019 11:43 AM

The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4:

HEX 0 83C4 !

...lee



#165 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Fri May 3, 2019 12:58 PM

I will do that and edit the result Lee when I get back to my desk.

#166 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,105 posts
  • Location:Uzbekistan (no, really!)

Posted Fri May 3, 2019 2:03 PM

Nice!



#167 RXB OFFLINE  

RXB

    River Patroller

  • 3,601 posts
  • Location:Vancouver, Washington, USA

Posted Fri May 3, 2019 4:50 PM

Youtube deleted most of my videos off my channel and no reasons why or what the problem is.....?

 

Youtube management are IDIOTS at best and the software they use to find Porn or Hate Speech is pure crap.

 

Example is Youtube deleted a PBS channel as Porn and it was Sesame Street? 

 

Could Youtube write a worse junk program?



#168 TheBF ONLINE  

TheBF

    Stargunner

  • 1,044 posts
  • Location:The Great White North

Posted Sat May 4, 2019 1:03 PM

The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4:

HEX 0 83C4 !

...lee

 

Lee,

 

I tried this twice just now.

After FBForth started I got 15 seconds.

Then again after  0 83C4 !  I got 14.86.

 

Your interrupts are not taking very much time away from the CPU when they are idling it seems.

 

When I look at your inner interpreter I cannot see where the overhead is coming from?






0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users