+TheBF Posted August 3, 2017 Share Posted August 3, 2017 (edited) When the Forth language first came onto the world stage it was implemented as something called an "indirect threaded code" language. This is a very clever way to make a language that fit a lot of stuff in a small space. You can see that with the big Forth systems that are running on the TI-99. An interpreter, a compiler, a text editor and a virtual memory disk system typically can fit in 16K using this approach. The secret to this code density is that every routine is identified by just one address. The secret to making it run quickly is creating a little routine to read those addresses and do the code associated with them as fast as possible. On the TMS9900 that little routine is 3 instructions of assembly language so it's pretty fast. But as we have seen in the benchmarks with GCC and assembly language code there is a price penalty with threaded code that can range from 2.5 to as much as 10X or more in the worst cases. It turns out that ITC Forth spends about 50% of the time running those three instructions that read the address which sometimes us called the inner interpreter. (as opposed to the "outer" text interpreter) At the bottom of every Forth system are a pile of little routines written in assembler that do all the real work. They are simple things that read the contents of a memory address or add two numbers together. The code is all there and it is normally just called by the inner interpreter. Each routine ends with a call back to the inner interpreter. Wouldn't it be handy if we could use all that code the way a conventional compiler does and copy the routines into memory all in row and eliminate the inner interpreter? This little piece of Forth code does just that. It reads a string of Forth words that are assembly language words and strings the code together as one long routine. This is not as space efficient, but it is about 2.4 times faster! Here is an example that takes one number on the stack and performs a pile of operations on it. HEX : OPTEST \ mixed 3000 0 DO AAAA DUP SWAP OVER ROT DROP DUP AND DUP OR DUP XOR 1+ 1- 2+ 2- 2* 2/ NEGATE ABS + 2* DROP LOOP ; Here it is with the operations compiled into a new routine called F(X) The speed increased by 2.4 times vs ITC version CAMEL99 ITC: 11.7 secsCAMEL99 DTC: 9.1 secsCAMEL99 F(X) 4.9 secs HEX CODE F(X) ( n -- ) INLINE[ DUP SWAP OVER ROT DROP DUP AND ] INLINE[ DUP OR DUP XOR 1+ 1- 2+ 2- 2* ] INLINE[ 2/ NEGATE ABS + 2* DROP ] NEXT, END-CODE : ILTEST 3000 0 DO AAAA F(X) LOOP ; And here are the extensions to the language that let us do the magic. With some small changes this can work just the same in Turbo-Forth or FB-Forth. \ inline.fth a simple speedup for ITC FORTH July 2017 B Fox HEX 045A CONSTANT $NEXT \ code for CAMEL99 NEXT (B *R10) : >BODY ( cfa -- pfa ) 2+ ; : CODE !CSP HEADER HERE >BODY , ; : NEXT, $NEXT , ; : END-CODE ?CSP ; \ CFA of a code word contains the address of the next cell : ?CODE ( cfa -- ) DUP @ 2- - ABORT" Not code word" ; \ works only for ITC \ scan MACHINE code looking for the NEXT, routine. \ abort if NEXT is not found after 256 bytes. This is an arbitrary size \ but most Forth code words are much smaller than 256 bytes. : TONEXT ( adr -- adr2 ) 0 \ this is a flag that falls thru if we don't succeed SWAP ( ADR) 80 \ max length of the allowed code word is $80 CELLS BOUNDS DO I @ $NEXT = \ test each CELL for presence of "B *R10" IF DROP I LEAVE THEN 2 +LOOP DUP 0= ABORT" can't find NEXT" ; : RANGE ( cfa -- addr cnt ) >BODY DUP TONEXT OVER - ; : TRANSCRIBE ( srcadr cnt -- ) HERE OVER ALLOT SWAP CMOVE ; \ write code block to HERE : INLINE, ( $addr -- ) \ compile code for $addr word FIND 0= ABORT" not found" DUP ?CODE RANGE TRANSCRIBE ; \ inline a string of code words all at once : INLINE[ \ usage: inline[ over + swap dup = OR ] BEGIN BL WORD COUNT PAD PLACE \ move word to buffer (PAD) PAD 1+ C@ [CHAR] ] <> \ 1st char <> ']' WHILE \ while true PAD INLINE, \ compile the word in buffer REPEAT ; Edited August 3, 2017 by TheBF 4 Quote Link to comment Share on other sites More sharing options...
Willsy Posted August 3, 2017 Share Posted August 3, 2017 Really nice. I think you may have written about this a couple of years back on CLF. Inspired by that I wrote one for TF. I'll have to see if I still have it. Top work! Quote Link to comment Share on other sites More sharing options...
+TheBF Posted August 3, 2017 Author Share Posted August 3, 2017 We were talking by email when you first got the disease ... ummm I mean got interested in writing a Forth system. :-) Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted August 4, 2017 Share Posted August 4, 2017 Very nice, indeed! A version of this would work quite nicely for fbForth 2.0 words in the user dictionary in RAM. However, I am afraid it would not work very well for code (ALC) words in the fbForth 2.0 resident dictionary (RD). Though the cfa and pfa of virtually every word in the RD are in ROM bank 0, their bodies are scattered over all four ROM banks, which would not yield to a linear copy of their machine language. ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted August 4, 2017 Author Share Posted August 4, 2017 Ah yes. The dreaded bank switching. I am living in my little utopian world of RAM. Is the code for ASM: words in the kernel not contiguous for each word. ie:. Are there a lot words that BL to other words in another bank? Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted August 4, 2017 Share Posted August 4, 2017 Ah yes. The dreaded bank switching. I am living in my little utopian world of RAM. Is the code for ASM: words in the kernel not contiguous for each word. ie:. Are there a lot words that BL to other words in another bank? Yes. In fact, there are words that jump to code in another word in the same bank and finish up in the jumped-to code because it is identical and saves space. I think all of the words in your example are safe except NEGATE , which is MINUS in fbForth 2.0 (inherited from fig-Forth through TI Forth). ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted November 29, 2017 Author Share Posted November 29, 2017 (edited) Finally go around to making INLINE[ ] work without the need to create a new CODE word. This version relies on 2 new words. ASM[ ]ASM. By themselves these words let you insert inline assembly language inside a Forth definition, like you might see in a C compiler. ASM[ simply compiles the addresses into the Forth list of addresses that make it look like a new CODE word. (Notice why it's called "indirect threaded code". (ITC) It take 2 addresses to point to a machine language word.) ASM[ also turns off the compiler! Fun FACT: The Forth assembler interprets text. The only compiler is the comma operator which "compiles" a number into memory. ]ASM is a macro that compiles 2 instructions after the inline machine code LI R9,HERE+4 B *R10 R9 is the Forth instruction pointer (IP). We need to advance IP past the machine code that we compiled into the definition R10 holds the address of the NEXT routine, which is the 3 instruction Forth "inner" interpreter in ITC Forth. An example of what can be done with it is shown in this COMPARE routine that is written in Forth. There are long strings of CODE WORDs in this routine that are wasting time between each word by going through the inner interpreter. When we inline[ ] the words inside the loop we get a 56% speed up. There is no reason to inline the words outside the loop because the size will increase with little improvement. HEX : A$ S" THIS IS A LONG STRING FOR COMPARING TO ANOTHER" ; : B$ S" THIS IS A LONG STRING FOR COMPARING TO ANOTHER TOO" ; : COMPARE ( addr1 u1 addr2 u2 --- diff ) \ Compare two strings. diff is negative if addr1 u1 is smaller, 0 if it \ is equal and positive if it is greater than addr2 u2. ROT 2DUP - >R \ compute U2-U1, push to RSTACK. MIN DUP IF >R \ loop counter push to RSTACK BEGIN OVER C@ OVER C@ - IF SWAP C@ SWAP C@ - 2R> 2DROP \ clean Rstack EXIT \ & get out of this word THEN CHAR+ SWAP CHAR+ SWAP \ incr. addr2 & addr1 R> 1- DUP >R \ get rstack loop counter, decr, copy, push back 0= UNTIL \ loop until loop counter=0 R> \ get the U2-U1 value THEN DROP 2DROP R> NEGATE ; : COMPTEST 100 0 DO A$ B$ COMPARE DROP LOOP ; \ 9.4 seconds : FCOMPARE ( addr1 u1 addr2 u2 --- diff ) ROT 2DUP - >R MIN DUP IF >R BEGIN INLINE[ OVER C@ OVER C@ - ] IF INLINE[ SWAP C@ SWAP C@ - 2R> 2DROP ] EXIT THEN INLINE[ CHAR+ SWAP CHAR+ SWAP R> 1- DUP >R ] 0= UNTIL R> THEN DROP 2DROP R> NEGATE ; : FCOMPTEST 100 0 DO A$ B$ FCOMPARE DROP LOOP ; \ 6 seconds In the new version I also removed some the word names and just put the code inside the definition of INLINE[. This bad form in Forth which favours short definitions, but I wanted to save some space. TI-99 is not the most generous environment. New version is in the spoiler for those who are curious. HEX : CODE HEADER HERE >BODY , !CSP ; : NEXT, 045A , ; \ compile code for "B *R10" : END-CODE ?CSP ; \ check if any junk was left on the stack \ ============== \ TEST for CODE word \ CFA of a code word contains the address of the next cell : ?CODE ( cfa -- ) DUP @ 2- - ABORT" Not code word" ; \ works only for ITC \ scan MACHINE code looking for the NEXT, routine. \ abort if NEXT is not found after 127 bytes. This is an arbitrary size \ but most Forth code words are much smaller than 127 bytes. : TONEXT ( adr -- adr2 ) 0 \ this is a flag that falls thru if we don't succeed SWAP ( ADR) 80 \ max length of the allowed code word is $80 CELLS BOUNDS DO I @ 045A = \ test each CELL for CAMEL99 NEXT (B *R10) IF DROP I LEAVE THEN 2 +LOOP DUP 0= ABORT" can't find NEXT" ; \ : RANGE ( cfa -- addr cnt ) \ >BODY DUP TONEXT OVER - ; \ calc. start and length of code fragment : ASM[ HERE CELL+ , \ compile a pointer to the next cell HERE CELL+ , \ which is the CFA of the inline code pointing to the next cell... [ ; IMMEDIATE \ switch OFF compiler and go to interpreter mode : ]ASM 0209 , HERE 2 CELLS + , \ code macro: LI R9,HERE+4 (moves Forth IP reg.) NEXT, ] ; IMMEDIATE \ switch ON compiler : INLINE[ POSTPONE ASM[ BEGIN BL WORD COUNT PAD PLACE PAD CHAR+ C@ [CHAR] ] <> WHILE PAD FIND 0= ABORT" not found" DUP ?CODE >BODY DUP TONEXT OVER - \ calc. start and length of code fragment HERE OVER ALLOT SWAP CMOVE \ transcribe the code to HERE REPEAT POSTPONE ]ASM ; IMMEDIATE Edited November 30, 2017 by TheBF 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.