When the Forth language first came onto the world stage it was implemented as something called an "indirect threaded code" language. This is a very clever way to make a language that fit a lot of stuff in a small space. You can see that with the big Forth systems that are running on the TI-99. An interpreter, a compiler, a text editor and a virtual memory disk system typically can fit in 16K using this approach.
The secret to this code density is that every routine is identified by just one address. The secret to making it run quickly is creating a little routine to read those addresses and do the code associated with them as fast as possible. On the TMS9900 that little routine is 3 instructions of assembly language so it's pretty fast. But as we have seen in the benchmarks with GCC and assembly language code there is a price penalty with threaded code that can range from 2.5 to as much as 10X or more in the worst cases.
It turns out that ITC Forth spends about 50% of the time running those three instructions that read the address which sometimes us called the inner interpreter. (as opposed to the "outer" text interpreter)
At the bottom of every Forth system are a pile of little routines written in assembler that do all the real work. They are simple things that read the contents of a memory address or add two numbers together. The code is all there and it is normally just called by the inner interpreter. Each routine ends with a call back to the inner interpreter.
Wouldn't it be handy if we could use all that code the way a conventional compiler does and copy the routines into memory all in row and eliminate the inner interpreter?
This little piece of Forth code does just that. It reads a string of Forth words that are assembly language words and strings the code together as one long routine. This is not as space efficient, but it is about 2.4 times faster!
Here is an example that takes one number on the stack and performs a pile of operations on it.
: OPTEST \ mixed
DUP SWAP OVER ROT DROP DUP AND
DUP OR DUP XOR 1+ 1- 2+ 2- 2*
2/ NEGATE ABS + 2* DROP
Here it is with the operations compiled into a new routine called F(X)
The speed increased by 2.4 times vs ITC version
CAMEL99 ITC: 11.7 secs
CAMEL99 DTC: 9.1 secs
CAMEL99 F(X) 4.9 secs
CODE F(X) ( n -- )
INLINE[ DUP SWAP OVER ROT DROP DUP AND ]
INLINE[ DUP OR DUP XOR 1+ 1- 2+ 2- 2* ]
INLINE[ 2/ NEGATE ABS + 2* DROP ]
And here are the extensions to the language that let us do the magic.
With some small changes this can work just the same in Turbo-Forth or FB-Forth.
Edited by TheBF, Thu Aug 3, 2017 2:16 PM.