TheBF

Machine Forth OMG


So although machine Forth is nice to play with, I am still longing to create a threaded system that uses the BL-instruction-based calling system, with the return stack keeping old R11 values.

I know it will get bigger than ITC, so what about "indirect sub-routine threading", where we keep the Forth instruction pointer but DOCOL looks more like:

 

( I am making this up here)

MOV R11 *RP

MOV *IP+ W

BL *W

 

I don't have the details clear in my head but does it inspire any thoughts from you guys?

 

BF

 

Here is fbForth’s inner interpreter (sans DODOES):

 

DOCOL  DECT R
       MOV  IP,*R
       MOV  W,IP
$NEXT  MOV  *IP+,W
DOEXEC MOV  *W+,R1
       B    *R1
$SEMIS MOV  *R+,IP
       MOV  *IP+,W
       MOV  *W+,R1
       B    *R1
Obviously, this is indirect threading, but I show it for comparison because it is going to take me awhile to wrap my head around direct threading and machine Forth (mainly, because I have never thought about them seriously). At the risk of stating the obvious, B *NEXT (NEXT contains $NEXT most of the time) ends ALC bodies and the cfa of ; ends all : word bodies, causing execution to continue at $SEMIS. Pardon any ineptitude in my attempts to understand what you are doing. I promise to try harder.
...lee


 

Same here—though I am not sure what you mean by “ms timing value”. If you mean the 4.096 value, that was just >1000/1000 = 4096/1000 = 4.096, which should be the comparison factor for how Mark first ran the program in TF.

 

...lee

 

Ahh... 4096 = 2^12 DUH

 

BF


 

I think direct threaded might be better suited to the 9900:

 

Let R0 be instruction pointer. So, you'd have:

next:
    b *r0+

That's it.

 

High level and low-level Forth words would appear the same. No difference. You'd still have DOCOL and EXIT but they would just be the addresses of the routines, just like DUP etc.

 

There would be no CFA. The first cell of a definition is the address (direct address) of some executable code.
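
The dispatch described above can be modelled on a host Forth. This is only a sketch in standard Forth, not 9900 code: execution tokens stand in for the "direct addresses of executable code", the names NEXT, RUN and SQUARE-THREAD are made up for the illustration, and nesting via DOCOL and EXIT is left out.

```forth
\ Host-level model of a direct-threaded dispatch loop (illustration only).
VARIABLE IP                            \ models the Forth instruction pointer

: NEXT ( -- )                          \ fetch the next cell, advance IP, jump
   IP @ @   1 CELLS IP +!   EXECUTE ;

: RUN ( thread n -- )                  \ execute n cells of a thread
   SWAP IP !   0 ?DO NEXT LOOP ;

CREATE SQUARE-THREAD   ' DUP ,  ' * ,  \ a thread is just a list of addresses

7 SQUARE-THREAD 2 RUN .                \ prints 49
```

Every cell in the thread is treated the same way, which is the point made above: NEXT does not care whether the address belongs to a primitive or to a routine such as DOCOL.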

 

So direct threading would add 2 bytes to every header for a Branch instruction vs ITC right?

 

That is definitely smaller than push R11 and BL in the header.

 

I should be able to coax the cross-compiler to do that once I get my head around it...

 

And why R0 for IP versus any other register?

 

 

BF


 

 

Here is fbForth’s inner interpreter (sans DODOES):

 

DOCOL  DECT R
       MOV  IP,*R
       MOV  W,IP
$NEXT  MOV  *IP+,W
DOEXEC MOV  *W+,R1
       B    *R1
$SEMIS MOV  *R+,IP
       MOV  *IP+,W
       MOV  *W+,R1
       B    *R1
Obviously, this is indirect threading, but I show it for comparison because it is going to take me awhile to wrap my head around direct threading and machine Forth (mainly, because I have never thought about them seriously). At the risk of stating the obvious, B *NEXT (NEXT contains $NEXT most of the time) ends ALC bodies and the cfa of ; ends all : word bodies, causing execution to continue at $SEMIS. Pardon any ineptitude in my attempts to understand what you are doing. I promise to try harder.
...lee

 

 

 

I am playing games really. I am creating assembler macros to simulate Forth words and tying it all together with yet another macro called "CALL", and ending each sub-routine with RT, . I was just amazed at how much faster native code goes.

CALL    RP DECT
        MOV R11 *RP
        BL  @> "some code"
        MOV *RP+ R11 

The fun thing is that the Forth Assembler is so flexible that you can do this with little effort.

 

The other piece that you need is a target compiler.

It works just like the Forth compiler.

 

And for your interest a basic target compiler starts like this:

VARIABLE TARGMEM   64k ALLOT   \ target memory image ("64k" stands for the image size in bytes)
VARIABLE TDP                   \ target dictionary pointer (offset into TARGMEM)

: THERE    ( -- addr)    TARGMEM TDP @ + ;  \  Target memory so not HERE but THERE :-)
: TALLOT   ( u -- )      TDP +!  ;
: T!       ( n addr -- ) TARGMEM +  ! ;
: TC!      ( c addr -- ) TARGMEM + C! ;
: T,       ( n -- )      THERE  !   2 TALLOT ;  \ like "comma"
: TC,      ( c -- )      THERE  C!  1 TALLOT ;  \ like C,

That's about it. That's the basic framework for a target compiler.

Rewrite the basic parts of Forth but use T, and TC, and you have a system in a new memory space.

 

*edit* Actually, rewrite your Forth assembler first, using T, and TC, etc.
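
A host-independent variation on the framework above, as a sketch in standard Forth (for example gforth). It departs from the post in a few places, all of them assumptions: THERE returns a target address rather than a host address, cells are stored as 16-bit big-endian values byte by byte so the host cell size does not matter, the image is a small hypothetical 8K buffer, and T@ and TC@ are added only so the result can be inspected.

```forth
\ Sketch only: a tiny target image on an ANS Forth host.
HEX
2000 CONSTANT IMAGE-SIZE              \ hypothetical 8K image keeps it small
CREATE TARGMEM  IMAGE-SIZE ALLOT      \ host buffer standing in for target RAM
TARGMEM IMAGE-SIZE ERASE
VARIABLE TDP   0 TDP !                \ target dictionary pointer

: THERE   ( -- taddr )    TDP @ ;                 \ next free TARGET address
: TALLOT  ( n -- )        TDP +! ;
: TC!     ( c taddr -- )  TARGMEM + C! ;
: TC@     ( taddr -- c )  TARGMEM + C@ ;
: T!      ( n taddr -- )  2DUP SWAP 8 RSHIFT SWAP TC!  1+ TC! ;  \ high byte first
: T@      ( taddr -- n )  DUP TC@ 8 LSHIFT  SWAP 1+ TC@ OR ;
: TC,     ( c -- )        THERE TC!  1 TALLOT ;   \ like C,
: T,      ( n -- )        THERE T!   2 TALLOT ;   \ like "comma"

\ Lay down two cells and a byte, then read them back from the image.
BEEF T,  045A T,  2A TC,
0 T@ .  2 T@ .  4 TC@ .  THERE .      \ prints BEEF 45A 2A 5
DECIMAL
```

Keeping THERE in target terms means the same address can be handed to T! or patched into a branch later without converting back from a host address.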

 

At MPE they name these things like this: ,(t) C,(t) HERE(t) ALLOT(t)

Pretty, but more typing.

 

B

Edited by TheBF

I spent some time reworking my cross-compiler to compile the little code fragments into the word headers (B DOVAR, B DOCOL, etc.) for DIRECT threaded code, but it's not running yet.

The kernel compiles and loads, but in the debugger I can see that I am not pulling the correct addresses yet as the code travels through NEXT.

Which is only 2 instructions now! It did make the kernel, with tools installed, go from 7022 bytes to 7124 (or so).

I am away from home at the moment so I don't have the exact numbers in front of me.

 

Thanks Willsy, now I have ANOTHER project to complete.

 

But thanks for real because it forced me to re-organize the cross-compiler code.

 

QUESTION:

 

This cross compiler is running on DOS because I am using my almost ANS Forth update of HsForth for DOS.

 

If I were to port this for general usage, should I go with GForth or Win32Forth, or just release it to the world for DOSBOX?

 

 

BF

Edited by TheBF


LOL!

 

So the answer is ALL of the above + VFX. OK. Sounds like for some, I could release it for DOSBOX first.

 

(VFX is pretty amazing. I always liked MPE products but they are bleeding edge of Forth these days)

 

BF


Well it's a cross-compiler, right? Runs on a PC? In that case I'd target MPE's free version of VFX and only write it once ;-)


There again, if it's written in ANS Forth then it *should* run on all of the big-name compilers.

 

*cough cough* :-)


Thanks be to G_d we are not on comp.lang.forth :-)))

 

Battle plans would be in progress.

 

B


What is the state of the art for '99 Forth cross compilers regarding macro inlining of small code words?

 

I was thinking about how to improve the speed of my 6809 Vectrex/Camel Forth and came to a similar conclusion as this thread, i.e. instead of rewriting the compiler as STC (not enough time, never going to happen) I could make it STC-ish by inlining code and reducing the call overhead.

 

(I remembered the inlining thread that came after this one and searched for it, but first came across this thread - will reread the inlining thread next. Simple inlining was actually what I was thinking about using initially, but of course the mind wanders...)

In-lining Assembly language in indirect threaded code is pretty space wasteful.  I wrote a way to do it and here is how it goes.

 

You need to create  the indirect links to move from threaded code to native machine code.  That adds 4 bytes to enter the code.

After the machine code runs, the CPU's program counter has moved forward but the Forth interpreter doesn't know about it, so you have to move the Forth IP register forward too, which I did by compiling more machine code. :)  That added 4 more bytes.  Then you compile a way to run the Forth inner interpreter, which in my case is another 2 bytes.

 

So all together you add 10 bytes just to enter and exit machine code from within ITC Forth code.

Not great.

\ put inline ASM in colon definitions
: ASM[
           HERE CELL+ ,            \ compile a pointer to the next cell
           HERE CELL+ ,            \ which is the CFA of the inline code
           [  ;  IMMEDIATE         \ switch to interpreter mode

: ]ASM     0209 ,  HERE 2 CELLS + , \ macro:  LI R9,HERE+4
                                    \ (moves Forth IP reg.)
           NEXT,
           ] ;   IMMEDIATE          \ switch ON compiler

Better to just write some code words because at least they are re-useable elsewhere. 

However for 1 use the size of the dictionary entry is probably bigger than using ASM[  ]ASM so … maybe it's useful.
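
The 10-byte figure can be read straight off the two definitions. The sketch below is just that bookkeeping in standard Forth; TCELL, ENTRY-BYTES, EXIT-BYTES and FRAME-BYTES are hypothetical names, and the offsets in the comment are illustrative, assuming 2-byte cells and a one-word NEXT as on the 9900.

```forth
\ What ASM[ ... ]ASM leaves inside a colon definition (offsets from the
\ start of the frame, n = size of the inline machine code plus 4):
\
\   +0     pointer to +2     \ the cell the thread "executes"
\   +2     pointer to +4     \ acts as the CFA of the inline code
\   +4     ...inline machine code...
\   +n     LI R9,+n+6        \ 0209 plus operand: reload the Forth IP
\   +n+4   NEXT              \ one instruction word
\   +n+6   ...thread continues...

DECIMAL
2 CONSTANT TCELL                               \ target cell size in bytes
: ENTRY-BYTES ( -- n )  2 TCELL * ;            \ two pointers from ASM[
: EXIT-BYTES  ( -- n )  2 TCELL *  TCELL + ;   \ LI R9,addr  plus  NEXT
: FRAME-BYTES ( -- n )  ENTRY-BYTES EXIT-BYTES + ;

FRAME-BYTES .    \ prints 10
```

The overhead is fixed, so it is amortised better the longer the inline code is.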

 

Now in a sub-routine threaded system it is beautiful. In fact, on some processors like the 9900, the machine code for a simple Forth instruction can be the same size as the CALL <ADDRESS> combination.

So for STC systems it's best to inline most of the Forth intrinsic instructions.  ( + - @ ! etc.) 

 

 

 


Thanks for the explanation. It all seems so simple, but I don't yet understand the inner workings enough to really judge 🙂

 

10 bytes doesn't really seem much of an overhead...does anyone care about memory usage these days? Maybe it's a problem on the '99, I know it has some strange architectural challenges, maybe that's one of them.

 

I read also the Inlining thread, it wasn't how I remembered it, but it's food for thought for my own Vectrex future enhancements!

 

Currently I'm working on interfacing the Vectrex BIOS routines from Forth i.e. creating an API. Nothing public yet, but I'll be putting v1 on Github eventually. It actually already is on Github, but Private.


Typically the ASM code you need is not very big so 10 bytes may or may not be important. Speed/size. It's the classic trade-off.

BTW on the 6809 it might be smaller because of 8-bit op-codes in some cases.  However the ASM[ will be pretty much the same for any ITC Forth. (I think)

7 hours ago, TheBF said:

In-lining Assembly language in indirect threaded code is pretty space wasteful.  I wrote a way to do it and here is how it goes.

        <snip>

So all together you add 10 bytes just to enter and exit machine code from within ITC Forth code.

Not great.

 

I might need to translate that to fbForth. I suspect it will cost me more memory real estate, however—what with vocabularies and all. |:)

 

...lee


It's a neat little hack. I think you just need the macro to use your IP register instead of R9 to make it work.  ??

16 hours ago, Lee Stewart said:

 

I might need to translate that to fbForth. I suspect it will cost me more memory real estate, however—what with vocabularies and all. |:)

 

...lee

In case you haven't tried this yet, I was mistaken.  Very sorry.

 

The only time I used these two words was in a bigger program. It worked fine.

The concept is valid but I think it needs to separate the  '['  ']'  words to operate on their own. More testing needed to use it independently.

Here is the original usage.

\ inline.fth  a simple speedup for ITC FORTH July 2017  B Fox

\ Premise:
\ An indirect threaded code (ITC) system can spend up to 50% of its time 
\ running the Forth thread interpreter, typically called NEXT.
\ The ITC NEXT routine is three instructions on the TMS9900.
\ The Forth Kernel contains many words called primitives, that are coded
\ in Assembler.
\ Many of these primitives are only 1 or 2 instructions.
\ INLINE[ ... ] copies the code from a primitive and compiles it in a new 
\ definition but removes the call to NEXT at the end of each primitive.
\ This can double the speed of chains of CODE words.

\ **not portable Forth code**  Uses TMS9900/CAMEL99 CARNAL Knowledge

\ INCLUDE DSK1.CODE

HEX
\ TEST for CODE word
\ CFA of a code word contains the address of the next cell
: ?CODE ( cfa -- ) DUP @ 2- - ABORT" Not code word" ;

\ scan MACHINE code looking for the NEXT, routine.
\ abort if NEXT is not found within $80 (128) bytes. This is an arbitrary size
\ but most Forth code words are much smaller than 128 bytes.
: TONEXT ( adr --  adr2 )
           0                \ flag that falls thru if we don't succeed
           SWAP
          ( ADR) 80         \ max length of code word is $80 bytes (64 cells)
           BOUNDS
           DO
             I @  045A   =   \ test each CELL for CAMEL99 NEXT (B *R10)
             IF   DROP I LEAVE
             THEN
           2 +LOOP
           DUP 0= ABORT" can't find NEXT" ;

\ : RANGE  ( cfa -- addr cnt )
\         >BODY DUP TONEXT OVER  -  ;  \ calc.  start and length of code

\ put inline ASM in colon definitions
: ASM[
           HERE CELL+ ,            \ compile a pointer to the next cell
           HERE CELL+ ,            \ which is the CFA of the inline code
           [  ;  IMMEDIATE         \ switch to interpreter mode

: ]ASM     0209 ,  HERE 2 CELLS + , \ macro:  LI R9,HERE+4
                                    \ (moves Forth IP reg.)
           NEXT,
           ] ;    IMMEDIATE          \ switch ON compiler

\ create code words using primitives
: CODE[    BEGIN
             BL PARSE-WORD PAD PLACE
             PAD CHAR+ C@ [CHAR] ] <>
           WHILE
             PAD FIND 0= ABORT" not found"
             DUP ?CODE
             >BODY DUP TONEXT OVER  -     \ calc. start and len. of code
             HERE OVER ALLOT SWAP CMOVE   \ transcribe the code to HERE
           REPEAT ; IMMEDIATE

\ embed  a literal number as machine code  *HUGE* 8 bytes!!
\ equivalent: TOS PUSH,  LI TOS ( n ) , ;
: :ARG   ( n -- ) 0646 , C584 , 0204 , ( n) ,  ;  IMMEDIATE

\ compile primitives inline inside a colon definition
: INLINE[
           POSTPONE ASM[
           POSTPONE CODE[
           POSTPONE ]ASM ;  IMMEDIATE
\ ===================================


\ EXAMPLES
\ CODE 1+!  ASM[ *TOS INC,  TOS POP, ]ASM  NEXT,

\ CREATE Q  20 ALLOT
\ CODE ]Q    CODE[ 2* ]  Q :ARG  CODE[ + ] NEXT, END-CODE

 : *+       INLINE[ * + ]    ;

 : [email protected]    INLINE[ DUP [email protected] ] ;
\ : DUP>R    INLINE[ DUP >R ] ;
 : ^2       INLINE[ DUP *  ] ;
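
The scan that TONEXT performs can be tried on a host Forth without a 9900. This is a sketch in standard Forth: BE-W@ and FIND-NEXT are hypothetical names, the 16-bit fetch is done byte by byte so the host cell size does not matter, and the sample bytes are arbitrary filler followed by 045A. Unlike TONEXT it returns 0 rather than aborting when nothing is found.

```forth
\ Sketch only: look for the NEXT opcode in a block of 16-bit code.
HEX
045A CONSTANT NEXT-OPCODE         \ B *R10, the value TONEXT tests for

: BE-W@ ( addr -- u )             \ fetch a 16-bit big-endian cell
   DUP C@ 8 LSHIFT  SWAP CHAR+ C@ OR ;

: FIND-NEXT ( addr len -- addr' | 0 )
   0 ROT ROT                      \ default result sits under the limits
   OVER + SWAP ?DO                \ walk addr .. addr+len-1
      I BE-W@ NEXT-OPCODE = IF  DROP I LEAVE  THEN
   2 +LOOP ;                      \ 9900 instructions are word aligned

CREATE SAMPLE  A1 C, 36 C,  04 C, 5A C,   \ one filler cell, then NEXT
SAMPLE 4 FIND-NEXT  SAMPLE -  .           \ prints 2
DECIMAL
```

A plain value scan like this stops at the first cell equal to 045A, so an instruction operand that happens to hold that value would end the copy early. That may be worth keeping in mind when choosing which primitives to inline.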


 

 

14 hours ago, TheBF said:

It's a neat little hack. I think you just need the macro to use your IP register instead of R9 to make it work.  ??

 

Unfortunately, fbForth (as does TI Forth) requires a vocabulary change to ASSEMBLER to expose the Forth Assembler and a return to the FORTH vocabulary when done with in-lining the ALC. ASM: and ;ASM do this for defining words, but invoking the ASSEMBLER vocabulary would need to be added to ASM[ to enable the Assembler words. ]ASM already covers the return to the FORTH vocabulary because it contains NEXT, , which does that for both fbForth and TI Forth.

 

...lee

5 minutes ago, TheBF said:

In case you haven't tried this yet, I was mistaken.  Very sorry.

 

The only time I used these two words was in a bigger program. It worked fine.

The concept is valid but I think it needs to separate the  '['  ']'  words to operate on their own. More testing needed to use it independently.

 

Probably only need to [COMPILE] [ (immediate) but not ] (not immediate). The following works in fbForth:

HEX
: ASM[   \ Begin Forth Assembly Code within high-level Forth
   HERE 2+ ,            \ compile a pointer to the next cell
   HERE 2+ ,            \ which is the CFA of the inline code
   [COMPILE] [          \ switch to interpreter mode  
   [COMPILE] ASSEMBLER  \ switch to ASSEMBLER vocabulary
;  IMMEDIATE

: ]ASM   \ Back to high-level Forth
   020D , HERE 4 + ,    \ LI R13,HERE+4  (move Forth IP to after NEXT,)
   ASSEMBLER NEXT,      \ NEXT, in ASSEMBLER vocabulary in kernel
   FORTH ]              \ back to FORTH vocabulary and switch ON compiler
;    IMMEDIATE
DECIMAL

 

...lee
