Decompiling Camel Forth
Last week I asked the author of Camel Forth, Brad Rodriguez, if he had decompiler code for Camel Forth.
He told me he had never written a de-compiler for any Forth that he had written but he was sure "you can handle it".
So I took a run at it.
Forth has a "dictionary" which is a linked list that contains the names of all the words in the language.
Along with the name there are a couple of extra fields that let the system find the code that goes with the word name.
Camel Forth, like FB Forth and Turbo Forth is a "threaded code" system.
This means that the compiler does not generate machine code, but rather creates lists of addresses.
These addresses can point to other routine addresses but eventually they point to real machine code.
So to decompile this kind of Forth means you have to read through each list and find the name of the
word associated with the address, print the name and move ahead two bytes and read the next address.
You continue doing this until you get to the address of the routine called EXIT, which is the Forth
equivalent of "RETURN" or RT in assembler.
Each Forth word begins with the address of a machine code routine that is the actual "interpreter" for that kind of word.
There is a special routine for variables, constants, new routine definitions (colon definitions) and one for something
called USER variables that are used when tasks need their own copy of some variables. These special
routines let you identify the "type" of each Forth word even though Forth is not a "typed" language.
Armed with that "type" information you can figure out how to print out the info for any given Forth word.
There are three different "decompilers" in this implementation:
One for the "primitves" which are real machine code.
When I encounter a "CODE" word in this decompiler I resorted to simply printing our the machine code since
making a dis-assembler would be a whole other project. The printed CODE word could actually be pasted
into the CAMEL Forth and used as is so that has some benefit.
Another decompiler is for variables, constants and USER variables so you see what there value is.
This is a little bit frivolous because you can do that with the Forth interpreter anytime you want but it makes
the thing consistent for the end user.
And a final decompiler is for "colon" definitions. (Forth words)
For some language constructs I elected to show what is compiled in the Forth routine rather than re-creating
the exact source code. This can be a difficult thing for newbies to grok about Forth.
The IF ELSE THEN and BEGIN UNTIL words actually compile little code routines called BRANCH and
?BRANCH which are like assembler B @xxx and JEQ @xxxx instructions. In fact the word BEGIN doesn't
compile anything into a program, It's just a general purpose label for loops to branch back too! So BEGIN
disappears when you decompile Forth code. The branching words are always followed by a number
which is the how many bytes the program has to jump. Positive numbers jump forward and negative numbers
cause the program to jump back.
I also take this approach for ." and DO/LOOP constructs. You see what the compiler did rather than the
de-compiler re-creating pretty source code.
The traditional name for a decompiler in Forth is "SEE".
Here are the example routines we will de-compile:
*EDIT* Corrected TEST2 per Lee's observation. GIF is still erroneous. Thou hast been warned.
\ Examples to DE-COMPILE
: TEST1 1 2 3 + + . ;
: TEST2 100 0 DO I . LOOP ;
: TEST3 BEGIN ." HELLO WORLD!" SPACE AGAIN ;
: TEST4 TEST1 TEST2 TEST3 ;
The de-compiler code is in the spoiler and the GIF shows the decompiler in action.
Edited by TheBF, Sat Jul 28, 2018 8:27 AM.