Jump to content

Photo

Squeezing Forth for more performance with INLINE[ ]


5 replies to this topic

#1 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 371 posts
  • Location:The Great White North

Posted Thu Aug 3, 2017 2:14 PM

When the Forth language first came onto the world stage it was implemented as something called an "indirect threaded code" language. This is a very clever way to make a language that fit a lot of stuff in a small space. You can see that with the big Forth systems that are running on the TI-99.  An interpreter, a compiler, a text editor and a virtual memory disk system typically can fit in 16K using this approach.
 
The secret to this code density is that every routine is identified by just one address.  The secret to making it run quickly is creating a little routine to read those addresses and do the code associated with them as fast as possible. On the TMS9900 that little routine is 3 instructions of assembly language so it's pretty fast. But as we have seen in the benchmarks with GCC and assembly language code there is a price penalty with threaded code that can range from 2.5 to as much as 10X or more in the worst cases. 
 
It turns out that ITC Forth spends about 50% of the time running those three instructions that read the address which sometimes us called the inner interpreter.  (as opposed to the "outer" text interpreter)
 
At the bottom of every Forth system are a pile of little routines written in assembler that do all the real work. They are simple things that read the contents of a memory address or add two numbers together. The code is all there and it is normally just called by the inner interpreter. Each routine ends with a call back to the inner interpreter.

Wouldn't it be handy if we could use all that code the way a conventional compiler does and copy the routines into memory all in row and eliminate the inner interpreter?
 
This little piece of Forth code does just that.  It reads a string of Forth words that are assembly language words and strings the code together as one long routine. This is not as space efficient, but it is about 2.4 times faster!
 
Here is an example that takes one number on the stack and performs a pile of operations on it. 
HEX
: OPTEST      \ mixed
          3000 0   
          DO                     
               AAAA 
               DUP SWAP OVER ROT DROP DUP AND 
               DUP OR DUP XOR 1+ 1- 2+  2- 2*
               2/ NEGATE ABS + 2* DROP
          LOOP  ;

Here it is with the operations compiled into a new routine called F(X)

The speed increased by 2.4 times vs ITC version
 
CAMEL99 ITC:   11.7 secs
CAMEL99 DTC:    9.1 secs
CAMEL99 F(X)     4.9 secs
HEX
CODE F(X)  ( n -- )
     INLINE[ DUP SWAP OVER ROT DROP DUP AND ]
     INLINE[ DUP OR DUP XOR 1+ 1- 2+  2- 2* ]
     INLINE[ 2/ NEGATE ABS + 2* DROP ]
     NEXT,
END-CODE

: ILTEST   
          3000 0
          DO
             AAAA  F(X)
          LOOP  ;

And here are the extensions to the language that let us do the magic.

With some small changes this can work just the same in Turbo-Forth or FB-Forth.

 

Spoiler

 


Edited by TheBF, Thu Aug 3, 2017 2:16 PM.


#2 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,024 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Aug 3, 2017 4:27 PM

Really nice. I think you may have written about this a couple of years back on CLF. Inspired by that I wrote one for TF. I'll have to see if I still have it.

Top work!

#3 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 371 posts
  • Location:The Great White North

Posted Thu Aug 3, 2017 4:58 PM

We were talking by email when you first got the disease ... ummm I mean got interested in writing a Forth system.
:-)

#4 Lee Stewart OFFLINE  

Lee Stewart

    River Patroller

  • 3,391 posts
  • Location:Silver Run, Maryland

Posted Thu Aug 3, 2017 7:55 PM

Very nice, indeed!   :)  A version of this would work quite nicely for fbForth 2.0 words in the user dictionary in RAM.  However, I am afraid it would not work very well for code (ALC) words in the fbForth 2.0 resident dictionary (RD).  Though the cfa and pfa of virtually every word in the RD are in ROM bank 0, their bodies are scattered over all four ROM banks, which would not yield to a linear copy of their machine language.

 

...lee



#5 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 371 posts
  • Location:The Great White North

Posted Thu Aug 3, 2017 8:31 PM

Ah yes. The dreaded bank switching.  I am living in my little utopian world of RAM.

 

Is the code for ASM: words in the kernel not contiguous for each word. 

ie:. Are there a lot words that BL to other words in another bank?



#6 Lee Stewart OFFLINE  

Lee Stewart

    River Patroller

  • 3,391 posts
  • Location:Silver Run, Maryland

Posted Thu Aug 3, 2017 9:26 PM

Ah yes. The dreaded bank switching.  I am living in my little utopian world of RAM.

 

Is the code for ASM: words in the kernel not contiguous for each word. 

ie:. Are there a lot words that BL to other words in another bank?

 

Yes.  In fact, there are words that jump to code in another word in the same bank and finish up in the jumped-to code because it is identical and saves space.

 

I think all of the words in your example are safe except NEGATE , which is MINUS in fbForth 2.0 (inherited from fig-Forth through TI Forth).

 

...lee






0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users