Jump to content
IGNORED

Squeezing Forth for more performance with INLINE[ ]


TheBF

Recommended Posts

When the Forth language first came onto the world stage it was implemented as something called an "indirect threaded code" language. This is a very clever way to make a language that fit a lot of stuff in a small space. You can see that with the big Forth systems that are running on the TI-99. An interpreter, a compiler, a text editor and a virtual memory disk system typically can fit in 16K using this approach.
The secret to this code density is that every routine is identified by just one address. The secret to making it run quickly is creating a little routine to read those addresses and do the code associated with them as fast as possible. On the TMS9900 that little routine is 3 instructions of assembly language so it's pretty fast. But as we have seen in the benchmarks with GCC and assembly language code there is a price penalty with threaded code that can range from 2.5 to as much as 10X or more in the worst cases.
It turns out that ITC Forth spends about 50% of the time running those three instructions that read the address which sometimes us called the inner interpreter. (as opposed to the "outer" text interpreter)
At the bottom of every Forth system are a pile of little routines written in assembler that do all the real work. They are simple things that read the contents of a memory address or add two numbers together. The code is all there and it is normally just called by the inner interpreter. Each routine ends with a call back to the inner interpreter.

Wouldn't it be handy if we could use all that code the way a conventional compiler does and copy the routines into memory all in row and eliminate the inner interpreter?
This little piece of Forth code does just that. It reads a string of Forth words that are assembly language words and strings the code together as one long routine. This is not as space efficient, but it is about 2.4 times faster!
Here is an example that takes one number on the stack and performs a pile of operations on it.
HEX
: OPTEST      \ mixed
          3000 0   
          DO                     
               AAAA 
               DUP SWAP OVER ROT DROP DUP AND 
               DUP OR DUP XOR 1+ 1- 2+  2- 2*
               2/ NEGATE ABS + 2* DROP
          LOOP  ;

Here it is with the operations compiled into a new routine called F(X)

The speed increased by 2.4 times vs ITC version
CAMEL99 ITC: 11.7 secs
CAMEL99 DTC: 9.1 secs
CAMEL99 F(X) 4.9 secs
HEX
CODE F(X)  ( n -- )
     INLINE[ DUP SWAP OVER ROT DROP DUP AND ]
     INLINE[ DUP OR DUP XOR 1+ 1- 2+  2- 2* ]
     INLINE[ 2/ NEGATE ABS + 2* DROP ]
     NEXT,
END-CODE

: ILTEST   
          3000 0
          DO
             AAAA  F(X)
          LOOP  ;

And here are the extensions to the language that let us do the magic.

With some small changes this can work just the same in Turbo-Forth or FB-Forth.

 

 

 

\ inline.fth  a simple speedup for ITC FORTH July 2017  B Fox

HEX 045A CONSTANT $NEXT    \ code for CAMEL99 NEXT (B *R10)

: >BODY  ( cfa -- pfa ) 2+ ;
: CODE      !CSP  HEADER  HERE >BODY , ;
: NEXT,     $NEXT , ;
: END-CODE  ?CSP  ;

\ CFA of a code word contains the address of the next cell
: ?CODE ( cfa -- ) DUP @ 2- - ABORT" Not code word" ;  \ works only for ITC

\ scan MACHINE code looking for the NEXT, routine.
\ abort if NEXT is not found after 256 bytes. This is an arbitrary size
\ but most Forth code words are much smaller than 256 bytes.
: TONEXT ( adr --  adr2 )
           0                 \ this is a flag that falls thru if we don't succeed
           SWAP
          ( ADR) 80         \ max length of the allowed code word is $80 CELLS
           BOUNDS
           DO
             I @  $NEXT =   \ test each CELL for presence of "B *R10"
             IF   DROP I LEAVE
             THEN
           2 +LOOP
           DUP 0= ABORT" can't find NEXT" ;

: RANGE  ( cfa -- addr cnt )   >BODY DUP TONEXT OVER  - ;

: TRANSCRIBE ( srcadr cnt -- ) HERE OVER ALLOT SWAP CMOVE  ;   \ write code block to HERE

: INLINE,  ( $addr -- )    \ compile code for $addr word
           FIND 0= ABORT" not found"
           DUP ?CODE
           RANGE TRANSCRIBE ;

\ inline a string of code words all at once
: INLINE[                        \ usage: inline[ over + swap dup = OR ]
           BEGIN
             BL WORD COUNT PAD PLACE   \ move word to buffer (PAD)
             PAD 1+ C@ [CHAR] ] <>     \ 1st char <> ']'
           WHILE                       \ while true
             PAD INLINE,               \ compile the word in buffer
           REPEAT ;

 

 

 

Edited by TheBF
  • Like 4
Link to comment
Share on other sites

Very nice, indeed! :) A version of this would work quite nicely for fbForth 2.0 words in the user dictionary in RAM. However, I am afraid it would not work very well for code (ALC) words in the fbForth 2.0 resident dictionary (RD). Though the cfa and pfa of virtually every word in the RD are in ROM bank 0, their bodies are scattered over all four ROM banks, which would not yield to a linear copy of their machine language.

 

...lee

Link to comment
Share on other sites

Ah yes. The dreaded bank switching. I am living in my little utopian world of RAM.

 

Is the code for ASM: words in the kernel not contiguous for each word.

ie:. Are there a lot words that BL to other words in another bank?

Link to comment
Share on other sites

Ah yes. The dreaded bank switching. I am living in my little utopian world of RAM.

 

Is the code for ASM: words in the kernel not contiguous for each word.

ie:. Are there a lot words that BL to other words in another bank?

 

Yes. In fact, there are words that jump to code in another word in the same bank and finish up in the jumped-to code because it is identical and saves space.

 

I think all of the words in your example are safe except NEGATE , which is MINUS in fbForth 2.0 (inherited from fig-Forth through TI Forth).

 

...lee

Link to comment
Share on other sites

  • 3 months later...

Finally go around to making INLINE[ ] work without the need to create a new CODE word.

 

This version relies on 2 new words. ASM[ ]ASM.

By themselves these words let you insert inline assembly language inside a Forth definition, like you might see in a C compiler.

ASM[ simply compiles the addresses into the Forth list of addresses that make it look like a new CODE word.

(Notice why it's called "indirect threaded code". (ITC) It take 2 addresses to point to a machine language word.)

 

ASM[ also turns off the compiler!

Fun FACT: The Forth assembler interprets text.

The only compiler is the comma operator which "compiles" a number into memory.

 

 

]ASM is a macro that compiles 2 instructions after the inline machine code

LI R9,HERE+4
B *R10

R9 is the Forth instruction pointer (IP). We need to advance IP past the machine code that we compiled into the definition

R10 holds the address of the NEXT routine, which is the 3 instruction Forth "inner" interpreter in ITC Forth.

 

An example of what can be done with it is shown in this COMPARE routine that is written in Forth.

There are long strings of CODE WORDs in this routine that are wasting time between each word

by going through the inner interpreter. When we inline[ ] the words inside the loop we get a 56% speed up.

There is no reason to inline the words outside the loop because the size will increase with little improvement.

HEX

: A$  S" THIS IS A LONG STRING FOR COMPARING TO ANOTHER" ;
: B$  S" THIS IS A LONG STRING FOR COMPARING TO ANOTHER TOO" ;


: COMPARE       ( addr1 u1 addr2 u2 --- diff )
\ Compare two strings. diff is negative if addr1 u1 is smaller, 0 if it
\ is equal and positive if it is greater than addr2 u2.
                ROT 2DUP - >R                    \ compute U2-U1, push to RSTACK.
                MIN DUP
                IF
                    >R                           \ loop counter push to RSTACK
                    BEGIN
                       OVER C@ OVER C@ -
                       IF  SWAP C@ SWAP C@ -
                           2R> 2DROP             \ clean Rstack
                           EXIT                  \ & get out of this word
                       THEN
                       CHAR+ SWAP CHAR+ SWAP     \ incr. addr2 & addr1
                       R> 1- DUP >R              \ get rstack loop counter, decr, copy, push back
                     0= UNTIL                    \ loop until loop counter=0
                    R>                           \ get the U2-U1 value
                THEN
                DROP 2DROP R> NEGATE ;

: COMPTEST   100 0 DO  A$ B$ COMPARE DROP LOOP ;   \ 9.4 seconds

: FCOMPARE       ( addr1 u1 addr2 u2 --- diff )
                ROT 2DUP - >R  MIN DUP
                IF
                   >R
                   BEGIN
                      INLINE[  OVER C@ OVER C@ - ]
                       IF  INLINE[ SWAP C@ SWAP C@ - 2R> 2DROP ]
                           EXIT
                       THEN
                         INLINE[ CHAR+ SWAP CHAR+ SWAP R> 1- DUP >R ]
                   0= UNTIL
                   R>
                THEN
                DROP 2DROP R> NEGATE ;

: FCOMPTEST   100 0 DO  A$ B$ FCOMPARE DROP LOOP ;   \ 6 seconds



In the new version I also removed some the word names and just put the code inside the definition of INLINE[.

This bad form in Forth which favours short definitions, but I wanted to save some space.

TI-99 is not the most generous environment.

 

New version is in the spoiler for those who are curious.

 

 

 

HEX

: CODE        HEADER  HERE >BODY , !CSP ;
: NEXT,       045A , ;        \ compile code for "B *R10"
: END-CODE    ?CSP  ;         \ check if any junk was left on the stack

\ ==============
\ TEST for CODE word
\ CFA of a code word contains the address of the next cell

: ?CODE ( cfa -- ) DUP @ 2- - ABORT" Not code word" ;  \ works only for ITC

\ scan MACHINE code looking for the NEXT, routine.
\ abort if NEXT is not found after 127 bytes. This is an arbitrary size
\ but most Forth code words are much smaller than 127 bytes.
: TONEXT ( adr --  adr2 )
           0                 \ this is a flag that falls thru if we don't succeed
           SWAP
          ( ADR) 80         \ max length of the allowed code word is $80 CELLS
           BOUNDS
           DO
             I @  045A   =   \ test each CELL for CAMEL99 NEXT (B *R10)
             IF   DROP I LEAVE
             THEN
           2 +LOOP
           DUP 0= ABORT" can't find NEXT" ;

\ : RANGE  ( cfa -- addr cnt )
\         >BODY DUP TONEXT OVER  -  ;  \ calc.  start and length of code fragment


: ASM[
           HERE CELL+ ,            \ compile a pointer to the next cell
           HERE CELL+ ,            \ which is the CFA of the inline code pointing to the next cell...
           [  ;  IMMEDIATE         \ switch OFF compiler and go to interpreter mode

: ]ASM     0209 ,  HERE 2 CELLS + ,  \ code macro:  LI R9,HERE+4  (moves Forth IP reg.)
           NEXT,
           ] ;   IMMEDIATE         \ switch ON compiler


: INLINE[
           POSTPONE ASM[
           BEGIN
             BL WORD COUNT PAD PLACE
             PAD CHAR+ C@ [CHAR] ] <>
           WHILE
             PAD FIND 0= ABORT" not found"
             DUP ?CODE 
             >BODY DUP TONEXT OVER  -     \ calc.  start and length of code fragment
             HERE OVER ALLOT SWAP CMOVE   \ transcribe the code to HERE
           REPEAT 
           POSTPONE ]ASM ;  IMMEDIATE

 

 

Edited by TheBF
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...