
Camel99 Forth Information goes here


TheBF


For a little more clarity here is the actual code I used as my example.

EVENT1 CLR  R12         CRU base of the TMS9901 
       SBO  0           Enter timer mode 
       LI   R1,>3FFF    Maximum value
       INCT R12         Address of bit 1 
       LDCR R1,14       Load value 
       DECT R12         There is a faster way (see below) 
       SBZ  0           Exit clock mode, start decrementer 
      
 
EVENT2 CLR  R12 
       SBO  0           Enter timer mode 
       STCR R2,15       Read current value (plus mode bit)
       SRL  R2,1        Get rid of mode bit
       LDCR R12,15      Clear Clock register, and exit timer mode 
       S    R2,R1       How many cycles were done? 

My apparently unique use is to let the timer run continuously.

When I want to time something, I read the timer with code like EVENT2, but I don't subtract anything; I simply store the timer value.

Then some time later (less than 349 ms) I read the timer again, subtract the two readings and take the ABS value.

 

Simple and faster than reloading each time.
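
The free-running approach can be sketched in Python (this is not the Forth code from the post). Taking ABS of the difference works as long as the timer has not reloaded between the two readings. The sketch below instead assumes the decrementer reloads from >3FFF when it reaches zero and uses a modulo so that a single reload is tolerated. The tick length (one decrement per 64 cycles of a 3 MHz clock) and the function names are assumptions for illustration.

```python
TIMER_RELOAD = 0x3FFF            # value loaded by TMR! in the post
TICK_SECONDS = 64 / 3_000_000    # assumption: one decrement per 64 cycles at 3 MHz


def elapsed_ticks(first, second, period=TIMER_RELOAD):
    """Ticks between two readings of a down-counter, tolerating one reload."""
    # The counter counts DOWN, so the earlier reading is normally the larger one.
    return (first - second) % period


def ticks_to_ms(ticks):
    """Convert timer ticks to milliseconds under the assumed tick length."""
    return ticks * TICK_SECONDS * 1000.0


if __name__ == "__main__":
    print(elapsed_ticks(0x3000, 0x2F00))      # no reload between readings
    print(elapsed_ticks(100, 16000))          # timer reloaded once in between
    print(round(ticks_to_ms(TIMER_RELOAD)))   # longest measurable interval, in ms
```

Under these assumptions a full count of >3FFF ticks comes to roughly 349 ms, which matches the limit quoted above.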

 

Here is the actual code in RPN assembler. Notice I had to disable interrupts.

CODE: TMR!   (  -- )         \ load TMS9901 timer to max value 3FFF
             W 3FFF LI,      \ load scratch register W with MAXIMUM timer value
             R12 CLR,        \ CRU addr of TMS9901 = 0
             0 LIMI,
             0   SBO,        \ SET bit 0 to 1, Enter timer mode
             R12 INCT,       \ CRU Address of bit 1 = 2 , I'm not kidding
             W 0E LDCR,      \ Load 14 BITs from W into timer
             R12  DECT,      \ go back to address 0
             0    SBZ,       \ reset bit 0, Exits clock mode, starts decrementer
             2 LIMI,
             NEXT,           \ 16 bytes
             END-CODE

CODE: TMR@   ( -- n)         \ read the TMS9901 timer
             TOS PUSH,
             R12 CLR,
             0 LIMI,
             0 SBO,          \ SET bit 0 TO 1, ie: Enter timer mode
             TOS 0F STCR,    \ READ TIMER (14 bits plus mode bit) into TOS
             TOS  1 SRL,     \ Get rid of mode bit
             0 SBZ,          \ RESET bit 0, ie: exit timer mode
             2 LIMI,
             NEXT,
             END-CODE

BTW my source for the code was here:

 

http://www.unige.ch/medecine/nouspikel/ti99/tms9901.htm#Timer

 

"The decrementer can be stopped by simply writing a zero to the clock register, and leaving timer mode."

 

I'm afraid Thierry is wrong here. I used Tursi's program (slightly edited):

 

    DEF START

START LIMI 0
    CLR  R6
    LI   R7,>2000
    LI   R0,>3000   * Start value
    LI   R12,2
    SBO  -1         * Enter clock mode
    LDCR R0,14      * Load clock register

LP  SBO -1          * Enter clock mode
    STCR R5,14      * Read register
    SBZ -1          * Leave clock mode

    C R5,R6         * Keep highest value
    JL J1
    MOV R5,R6
J1  CLR R0
    MOVB R0,@>8C02
    MOVB R0,@>8C02

    MOV R5,R0
    BL @DIG

    MOVB R7,@>8C00

    MOV R6,R0
    BL @DIG

    TB  27          * Mouse button (R12=2) on the Geneve

    JEQ LP
    BLWP @0

* PRINT A HEX VALUE FROM R0
DIG   MOV  R0,R1
      LI   R3,4
DIGL  SRC  R1,12
      MOV  R1,R4
      ANDI R4,>000F
      MOVB @HEX(R4),@>8C00
      DEC  R3
      JNE  DIGL
      RT

HEX TEXT '0123456789ABCDEF'

    END

When you set the start value to 0000, the clock is still counting down. You can try this program for yourself; just leave out the TB 27 check when you run it on a TI-99/4A (it is the left mouse button on the Geneve; when I press it, the program exits).

 

So I turn on the clock mode, load the register with 0, and leave clock mode, and it still counts.


 


 

Ok thanks. I have not actually tried loading the timer with 0 using my code. All I knew was that it worked as expected on real iron.

 

I will write a version that lets me load the initial value interactively in Forth so I can play with it.

Link to comment
Share on other sites

So I re-wrote my code and made it more like Tursi's in terms of setting up the CRU address. (it took 1 less instruction :))

 

I re-built Forth and put it on the old machine and sure enough the timer keeps running even when I load it with 0 as you can see in the screen shot.

CODE: TMR!   ( n -- )         \ load TMS9901 timer from stack
             0 LIMI,
             R12 CLR,        \ CRU addr of TMS9901 = 0
             0   SBO,        \ SET bit 0 to 1, Enter timer mode
             R12 INCT,       \ CRU Address of bit 1 = 2 , I'm not kidding
             TOS 0E LDCR,    \ Load 14 BITs from TOS into timer
            -1  SBZ,         \ reset bit 0, Exits clock mode, starts decrementer
             2 LIMI,
             TOS POP,
             NEXT,          
             END-CODE

CODE: TMR@   ( -- n)         \ read the TMS9901 timer
             TOS PUSH,
             0 LIMI,
             R12 2 LI,      \ CRU address 2 = bit 1 (honest, 2=1)
            -1 SBO,         \ SET bit 0 TO 1, Enter timer mode
             TOS 0E STCR,   \ READ TIMER (14 bits)
            -1 SBZ,         \ RESET bit 0, exit timer mode
             2 LIMI,
             NEXT,
             END-CODE
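
The "2 = 1" oddity comes from how the TMS9900 forms a CRU bit address: as I understand it, the bit number is R12 shifted right by one, and the signed displacement of SBO, SBZ or TB is added to that. A small Python sketch of the arithmetic (the function name is made up for illustration):

```python
def cru_bit(r12, displacement=0):
    """Return the CRU bit addressed by SBO/SBZ/TB for a given R12 value."""
    # R12 holds the bit number times two; the instruction's signed
    # displacement is then added to select the final bit.
    return (r12 >> 1) + displacement


if __name__ == "__main__":
    print(cru_bit(0, 0))    # R12 CLR,  0 SBO,   -> bit 0 (timer mode bit)
    print(cru_bit(2, 0))    # R12 = 2            -> bit 1 (first timer bit)
    print(cru_bit(2, -1))   # R12 = 2, -1 SBZ,   -> bit 0 again
```

This is why the rewritten words can leave R12 at 2 and still flip the mode bit with a displacement of -1.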

post-50750-0-39699100-1558034653_thumb.jpg


Usually, TI's specification documents are very precise, but this one is ambiguous at best, and it has led to several misunderstandings of the same kind. Saying that a clock is "enabled" or "disabled" is normally understood as running or stopped. In fact, the wording problems already start with the name "clock mode", which could make you think you have to turn on this mode to run the clock. Rather, this mode is used to read or write the clock register, while the clock runs in interrupt mode.

 

And there are some more open questions that I will need to check to make sure:

 

- A soft reset (SBZ 15 in clock mode) resets all I/O ports to input. Does it also reset the interrupt mask? (not explicitly stated)

- If you set a port to output mode (e.g. P15), can it trigger the interrupt line with which it shares the pin (/INT7)? (not explicitly stated)

 

Raphael Nabet (the original author of the TI emulation in MESS) assumed that the latter is not possible; but I'll try to test it on my Geneve. The point is that I can rewrite the interrupt handler on the Geneve, as it resides in RAM. On the TI, if the interrupt source is not the VDP, the handler searches the DSRs.


Step 1 to generating native 9900 code from Forth.

 

I have ripped up parts of the XFCC99 Forth cross-compiler, kept other parts, and created NATIVE99, the beginning of a Forth cross-compiler that generates native code.

It's still very manual. I can compile colon definitions but they don't know how to call themselves yet.

There is no Forth dictionary, that is all kept in the PC more like a C compiler would do. Most of the Forth primitives compile inline at the moment but nothing is compiled into the binary unless you use it in a program, so the programs can be very small.

 

The spoiler has the first program which is displaying all the dirty details.

(but it works as expected) ;)

 

Here is the compiler summary

Program Summary:
  A000  40960 Load address
    90    144 Code size
  A074  41076 boot address
   116    278 Image size
     4      4 Code words
     0      0 Forth words

 

 

\ Native99 test program 1

TARGET-COMPILING
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         ABSOLUTE A000 ORIGIN.
         TI-99.EA5           

[CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
     INCLUDE CC9900\NATIVE\NCCOLON.FTH

[TC]
VARIABLE X
VARIABLE Y
VARIABLE Z

\ nested sub-routines
: SUB3     Y @  Z ! ;
: SUB2     X @  Y ! SUB3 CALL, ;
: SUB1     X 1+!    SUB2 CALL, ;

PROGRAM: RUN               \ sets the entry address
         8300 WORKSPACE
         FF00 RSTACK
         FEA0 DSTACK
         BEGIN
            SUB1 CALL,
         AGAIN
         BYE
END.

[CC] FILENAME: NCPROG1
     FILENAME$ $SAVE-EA5.     \ FILENAME$ was set by FILENAME:

// copy NCPROG1 cc9900\clssic99\dsk1\

CR ." === COMPILE ENDED PROPERLY ==="

 

 


Step 2: Colon definitions call themselves

 

In Step 1 we used a sub-routine definer aliased as colon to create Forth words that had to be explicitly called.

 

In step 2 we defined a proper <DOCOL> routine with CREATE DOES>.

CREATE lets us define the compile-time activity.

DOES> lets us define what happens when we run the WORD that CREATE created, if that makes sense. It's like a simple object constructor.

\ Native code COLON COMPILER

: <DOCOL> ( n -- )
            CROSS-ASSEMBLING
            CREATE           \ create the word in compiler dictionary
                  THERE ,    \ remember my address in compiler Forth
                  R11 RPUSH, \ compile TI-99 entry code into target program

           \ runtime:
            DOES>  @  @@ BL, \ fetch my address, branch&link indirect
;

\ define the cross-compiler's TI-99 colon and semi-colon
CROSS-ASSEMBLING
HOST: :       TFORTHWORDS [ FORTH ] 1+! \ count the word for reporting
              <DOCOL>  ;HOST

HOST: ;        RET,  ENDSUB  ;HOST
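
As a rough model of what CREATE and DOES> are doing here, the Python sketch below (not the compiler's code) keeps a host-side dictionary: defining a word records the current target address and emits the entry code, and using the word later emits a BL to that address. The class and method names are invented. The instruction strings are meant to mirror the entry and exit sequences the compiler generates (push R11 on the return stack, pop it and branch back), and instruction sizes are simplified to 2 or 4 bytes.

```python
class CrossCompiler:
    """Toy model: the dictionary lives on the host, only code goes to the target."""

    def __init__(self, origin=0xA000):
        self.there = origin      # next free target address (THERE)
        self.image = []          # (address, instruction text) pairs
        self.words = {}          # host-side dictionary: name -> action

    def emit(self, text, size):
        self.image.append((self.there, text))
        self.there += size

    def colon(self, name):
        entry = self.there                     # CREATE part: remember my address
        self.emit("DECT R7", 2)                # entry code: push R11 on return stack
        self.emit("MOV R11,*R7", 2)
        # DOES> part: what happens when the new word is used later
        self.words[name] = lambda: self.emit(f"BL @>{entry:04X}", 4)

    def semicolon(self):
        self.emit("MOV *R7+,R11", 2)           # pop return address
        self.emit("B *R11", 2)                 # and return to the caller

    def use(self, name):
        self.words[name]()                     # the word compiles a call to itself


if __name__ == "__main__":
    cc = CrossCompiler()
    cc.colon("SUB3"); cc.semicolon()
    cc.colon("SUB2"); cc.use("SUB3"); cc.semicolon()
    for addr, text in cc.image:
        print(f"{addr:04X}  {text}")
```

Nothing about SUB3 is stored in the target image except its code; the name and address stay on the host, which is the point made above about there being no Forth dictionary in the binary.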

The spoiler has the new sub-program that no longer needs CALL.

 

 

 

\ Native99 test program 2

TARGET-COMPILING
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         ABSOLUTE A000 ORIGIN.
         TI-99.EA5

[CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
     INCLUDE CC9900\NATIVE\NCCOLON.FTH

[TC]
VARIABLE X
VARIABLE Y
VARIABLE Z

\ nested sub-routines
: SUB3     Y @  Z ! ;
: SUB2     X @  Y ! SUB3  ;
: SUB1     X 1+!    SUB2  ;

PROGRAM: RUN               \ sets the entry address
         8300 WORKSPACE
         FF00 RSTACK
         FEA0 DSTACK
         BEGIN
            SUB1
         AGAIN
         BYE
END.

 

 

 


Native Code Compiler (NCC) Preliminary Performance Comparison

 

It is clear to me now that I will need to make a much more sophisticated compiler to come close to Assembler performance.

Here is a comparison of three Forth compilers and the equivalent written in Forth assembler.

: TEST  FFFF
        BEGIN
           1- DUP
        0= UNTIL
        BYE ;

Forth Assembler that does the same thing.

         TOS PUSH,
         TOS 0FFFF LI,
         BEGIN,
            TOS DEC,
         EQ UNTIL,
         BYE

ITC 10.9 sec
DTC 8.6 sec
NCC 4.4 sec ( all inline primitives)
ASM < 1 sec ;-)

 

Some obvious points to optimize...

  • 1- DUP should be smarter to remove the TOS POP from 1- so the DUP is not needed. This would become TOS DEC,
  • 0= UNTIL should be optimized to a JNE @BEGIN

This is called "peephole" optimization in Forth compilers so that is where the focus has to be to make this closer to ASM code.

 

Edit: I should add that this is the worst case example.

In real programs ITC Forth tends to be 3 to 4 times slower than optimized compilers as seen in the Benchmarking languages thread.


I cannot speak to the NCC for fbForth, but, substituting DROP for BYE so I do not need to reload everything while testing, TEST becomes in fbForth (12.7 seconds)

HEX
: TEST  FFFF
   BEGIN
      1- DUP
   0= UNTIL
   DROP  ;

which is ~1.5 seconds slower than (11.2 seconds)

HEX
: TEST2  FFFF DUP
   BEGIN
   WHILE
      1- DUP
   REPEAT
   DROP  ;

which, not needing the stack in fbForth ALC, becomes (~1 second)

HEX
ASM: TEST3
   R0 FFFF LI,
   BEGIN,
      R0 DEC,
   NE WHILE,
   REPEAT,
;ASM

Using BEGIN, ... UNTIL, (as in your code) saves one ALC instruction and is marginally faster (10%?—stopwatch timing):

HEX
ASM: TEST4
   R0 FFFF LI,
   BEGIN,
      R0 DEC,
   EQ UNTIL,
;ASM

...lee


Thanks Lee,

 

That confirms my results. Yes now I remember, you have the BYE that writes to low RAM. Mine is just 2 instructions.

 

The problem with the NCC concept at the moment is that to compute 0= as a primitive it is a TOS TOS MOV, and then set/reset the TOS register appropriately,

followed by UNTIL which is another TOS TOS MOV, and then drop the TOS and jump back or jump forward, so it is a pile of instructions on the Forth VM.

I am going to see if I can make something smarter for 0= like is done in the Forth Assembler and potentially use that idea for branching as well.

 

It's a challenge to do it really efficiently without leveraging the machine's native branching, but Forth uses the TOS as the status flag...

 

It might be better to just create a bunch of machine macros like @, !, etc and use ALC. :-)

 

Without the Forth headers in the code however you save about 25% of the space so that leaves room for code bloat due to the VM concept.

 

I am going to take a page from TI-Forth and put the return macro in a register so RET, will be as below. This will save me 2 bytes per colon definition.

Sub-routines (colon defs.) are just called with BL because each colon definition begins by pushing R11 onto the rstack.

CODE RET,   *R10 B,

\ R10 will contain this code located in scratchpad RAM
     *RP+ R11 MOV,
         *R11 B,

Ultimately if I can get it in reasonable shape it would be fun to generate a working Forth kernel that is all native code.

It will be bigger than 8K I am sure, but maybe only 50% if I make it smart enough.

 

In the shorter term I want to make it complete enough to compile some real program and see how it all works and how big it is.

I should be able to do the Sieve benchmark in a little while.

(He said over-optimistically)

 

 

Anyway it's a good education. Thanks for following the progress.

 

How's that knee?

 

 

 

 

 

 



         SP DECT,
         R4 *SP MOV,   ( PUSH, macro)
         R4 0FFFF LI,
         BEGIN,
            R4 DEC,
         EQ UNTIL,
         BYE

 

BTW for clarity my assembler code is actually this, because TOS is just an alias for R4 in my Forth assembler and PUSH, is a 2 instruction macro.

So our ALC tests are functionally the same.

 

When I first started this project I was so confused I used every trick I could to simplify the code. ;-)


NCC Program 3

 

Here is something that I didn't expect. In native code the Forth colon is just an Assembly language sub-routine.

I was looking at the docs for MeCrisp Forth which is a native code Forth for a number of processors. I was shocked to see the colon used with Assembly language inside.

Guess what? It works in my crude compiler as well.

 

This program compiles to 112 bytes. :-D

\ Native99 test program 3

CROSS-COMPILING
\ Compiler pre-amble
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         A000 ORIGIN.
         TI-99.EA5

[CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
[CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
[CC] INCLUDE CC9900\NATIVE\NCCOLON.FTH

[CC] HEX

CROSS-ASSEMBLING
FFFF CONSTANT >FFFF

: FTEST
       >FFFF
       BEGIN
         1- DUP
      0= UNTIL
;

: ATEST
        TOS PUSH,
        TOS FFFF LI,
        BEGIN,
          TOS DEC,
        EQ UNTIL,
;

PROGRAM: RUN               \ sets the entry address
         8300 WORKSPACE
         FF00 RSTACK
         FEA0 DSTACK
         FTEST
         ATEST
         BYE
END.

NCC Program 4 Keyboard

 

This is becoming very interesting. :)

 

I stole code from CAMEL99 Forth and created Forth KEY very quickly.

\ Native99 test program 4  Keyboard interface

CROSS-COMPILING
\ Compiler pre-amble
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         A000 ORIGIN.
         TI-99.EA5

\ first build the compiler
[CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
[CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
[cc] INCLUDE CC9900\NATIVE\NCCOLON.FTH

CROSS-ASSEMBLING

: KEY?  ( -- char | 0)
        TOS PUSH,
        TOS CLR,            \ TOS is output
        0 LIMI,
        83E0 LWPI,          \ switch to GPL workspace
        000E @@ BL,         \ call ROM keyboard scanning routine
        8300 LWPI,          \ return to Forth's workspace , interrupts are restored
        2 LIMI,
        837C @@ R0 MOVB,    \ read GPL status byte (=2000 if key pressed)
        NE IF,
            8374 @@ TOS MOV, \ read the key into TOS (R4)
        ENDIF,
;

: KEY   ( -- char)
        BEGIN
          KEY?
          ?DUP
        UNTIL
;

PROGRAM: RUN               \ set the entry address
         8300 WORKSPACE
         FF00 RSTACK
         FEA0 DSTACK

         KEY DROP          \ wait for key & drop it
         BYE               \ return to boot screen
END.

And here is the compiled code from CLASSIC99 with my comments

EDIT: Found a bug with ?DUP

A05C  02E0  lwpi >8300         * workspace
A060  0207  li   R7,>ff00      * return stack pointer
A064  0206  li   R6,>fea0      * data stack pointer
A068  06A0  bl   @>a040        * call KEY
A040  0647  dect R7            * enter sub-routine
A042  C5CB  mov  R11,*R7
A044  06A0  bl   @>a014        * call "key?"          
A048  C104  mov  R4,R4         * ?DUP (inline)        
A04A  1302  jeq  >a050       *** FIXED THIS JUMP         
A04C  0646  dect R6            * DUP (inline)        
A04E  C584  mov  R4,*R6        *           
A050  C104  mov  R4,R4         * UNTIL (inline)        
A052  1602  jne  >a058                 
A054  C136  mov  *R6+,R4               
A056  10F6  jmp  >a044                 
A058  C2F7  mov  *R7+,R11      * ';' (inline)         
A05A  045B  b    *R11                 

I do not suppose there is any easy way to remove superfluous line A050—perhaps through “peephole” optimizing?

 

...lee

That's the challenge for sure. At the moment this is a very naive compiler.

 

My strategy is to get a lot working before attempting to get clever. I am working at my personal "bleeding edge" as it is. :-)



 

I just tried this and it failed but... I removed MOV R4,R4 from the 0= code and it works. This is because the DUP is operating on the EQ flag.

It also goes faster. 3.99 seconds by the armstrong method.

So your idea was a good one. Thanks!

HOST: 0=    ( n -- ? )   
\      TOS TOS MOV,
        3  JEQ,
        TOS CLR,
        2  JMP,
        TOS SETO, ;HOST IMMEDIATE
HOST: UNTIL    ( n --)
             TOS TOS MOV,  \ test tos=0
             3 JNE,        \ if tos=0
             TOS POP,      \ drop n
             BACK JMP,     \ loop
             TOS POP,      \ elseif tos<>0, drop
             ;HOST IMMEDIATE

I just stole some stuff from the TI-FORTH assembler that we all use to create BEGIN WHILE REPEAT and it seems to work.

Removed compile time error detection for now to keep the code easier to understand.

I might have to add a DROP in here somewhere.

EDIT: Yes IF needed a DROP (TOS POP,) and so did THEN, because you need to drop the TOS if you take the jump AND if you don't take the jump just like UNTIL above.

HOWEVER, If you make a word called DUPWHILE the decrementing WHILE loop executes in 1 second vs 4 seconds. Worth adding that to the peephole optimizer.

\ Branch calculators taken from TI-FORTH Assembler
 HOST: AHEAD   ( -- addr)  THERE 2-      ;HOST
 HOST: RESOLVE ( addr -- ) THERE OVER - 2- 2/ SWAP 1+ TC!   ;HOST

\ here we use parts of the Assembler directly
 HOST: IF    ( n --)  NE CJMP AHEAD  TOS POP,  ;HOST IMMEDIATE
 HOST: ELSE  ( -- )   0 JMP,  RESOLVE          ;HOST IMMEDIATE
 HOST: THEN  ( addr -- )  RESOLVE   TOS POP,   ;HOST IMMEDIATE

HOST: WHILE  ( ) POSTPONE IF 2+  ;HOST IMMEDIATE
HOST: REPEAT
          >R     POSTPONE AGAIN
          R>  2- POSTPONE THEN
          ;HOST  IMMEDIATE
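
RESOLVE patches the low byte of a jump instruction with a displacement counted in words from the instruction that follows the jump. A Python sketch of that arithmetic, checked against two jumps from the KEY listing earlier in the thread (the function names are hypothetical):

```python
def jump_displacement(jump_addr, target):
    """Displacement byte for a TMS9900 jump at jump_addr going to target."""
    disp = (target - jump_addr - 2) // 2      # words, relative to next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return disp & 0xFF                        # stored as a signed byte


def jump_target(jump_addr, opcode):
    """Inverse: where does the jump word at jump_addr go?"""
    disp = opcode & 0xFF
    if disp >= 0x80:                          # sign-extend the byte
        disp -= 0x100
    return jump_addr + 2 + 2 * disp


if __name__ == "__main__":
    print(hex(jump_displacement(0xA052, 0xA058)))   # forward jump from the listing
    print(hex(jump_displacement(0xA056, 0xA044)))   # backward jump from the listing
    print(hex(jump_target(0xA056, 0x10F6)))
```

The `THERE OVER - 2- 2/` part of RESOLVE is the first line of `jump_displacement`, and `SWAP 1+ TC!` stores the resulting byte into the second byte of the jump instruction.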


I am having so much fun with this that I woke up early.

So once I figured out how to do WHILE loops it was simple to make a Chuck Moore style FOR NEXT loop.

When Chuck started building Forth architecture CPUs he found that it was always faster to just run a down counter for a loop. He stopped using DO/LOOP.
He made FOR NEXT which in its simplest form loads a number somewhere and NEXT decrements the number and loops WHILE the number<>0.
We do this all the time in Assembly Language.
This version uses the return stack to hold the number; it could be made a little faster by using a spare register.

So with the WHILE structure as an example, here is all it took to add FOR/NEXT

 

EDIT: The empty FOR/NEXT loop with 64K iterations runs in 1 second. Less if we put the loop counter in a register.

HOST: FOR   TOS RPUSH,   
            TOS POP,    \ refill data stack cache register
            THERE       \ leave compiler's current working address on PC Forth stack
;HOST  IMMEDIATE

HOST: NEXT  *RP DEC,
             NE CJMP AHEAD 2+  \ same as WHILE but no need to DROP data stack
            >R BACK JMP,       \ loop not finished, jump back to THERE
             R> 2- RESOLVE     \ compute the address needed by AHEAD and put it in the code
             RP INCT,          \ drop the index from the return stack
;HOST IMMEDIATE

Here is how it is used (compiler pre-amble removed)

CROSS-ASSEMBLING
0 CONSTANT 0
1 CONSTANT 1
2 CONSTANT 2
FFFF CONSTANT >FFFF

VARIABLE X

PROGRAM: RUN               \ set the entry address
         8300 WORKSPACE
         FFFF RSTACK
         FF00 DSTACK
         
         FFFF FOR
            X 1+!
         NEXT
         BYE
END.
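
To make the loop semantics concrete, here is a Python model of the FOR/NEXT shown above (an illustration, not the compiler's code). The body runs first, then the counter is decremented as a 16-bit value and the loop repeats while it is non-zero, so in this model a count of 0 wraps around and gives 65536 passes.

```python
def for_next(count, body):
    """Run body the way FOR ... NEXT would, with a 16-bit down-counter."""
    counter = count & 0xFFFF
    passes = 0
    while True:
        body()                               # loop body comes before NEXT
        passes += 1
        counter = (counter - 1) & 0xFFFF     # NEXT: decrement the counter
        if counter == 0:                     # fall through when it reaches zero
            break
    return passes


if __name__ == "__main__":
    x = [0]

    def bump():                              # stands in for  X 1+!
        x[0] = (x[0] + 1) & 0xFFFF

    print(for_next(0xFFFF, bump), x[0])
```

Because the test happens after the body, the body always runs at least once, which is the same shape as the hand-written DEC / JNE loops in the assembler examples.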

It was trivial to use R9 as the loop counter so I just did it. R9 is the IP register in threaded Camel99 Forth but we don't need that anymore. :-)

 

By pushing R9 onto the return stack when we enter FOR and RPOPing it when we leave, FOR/NEXT is nestable.

A 64K loop nested 10 times takes about 9 seconds so each 64K loop is 900 ms. Not too shabby!

HOST: FOR   R9  RPUSH,    \ R9 will be the loop counter
            TOS R9 MOV,
            TOS POP,
            THERE
;HOST  IMMEDIATE

HOST: NEXT   R9 DEC,
             NE CJMP AHEAD 2+  \ while R9<>0
             >R BACK JMP,
             R> 2- RESOLVE
             R9 RPOP,
;HOST IMMEDIATE



NATIVE99 CROSS-COMPILER is starting to work like Forth

 

After I got the VDP code moved over from Camel99 Forth I put a cheap and dirty set of screen routines together and a string compiler word.

Armed with that and the FOR/NEXT loop I made a little demo. I have a bug in my foreground color setting which I need to find.

But this shows me how much faster this is than threaded Forth. Wow!

 

Also I have had to relearn how to code because now I can intermix ALC and Forth.

If you make a colon definition it looks like Forth but it's really a machine code sub-routine that calls itself when invoked.

(And it is nestable)

So if you use it in a bunch of Assembly language you don't have to BL to it! It does it by itself. :-o

 

Example:

: RMODE       W  8C02  LI,        \ VDP port address into working register
              0 LIMI,             \ enter a critical section
              R0 SWPB,            \ R0= VDP-adr we are using. Set up 1st byte to send
              R0 *W MOVB,         \ send low byte of vdp ram write address
              R0 SWPB,
              R0 *W MOVB,         \ send high byte of vdp ram write address
              2 LIMI,             \ leave the critical section
;

: WMODE      R0 4000  ORI,       \ set control bits to write mode
             RMODE               \ we can mix Forth and Assembler :-)
;

The spoiler shows the code for this text demo and the MP4 shows it running. I also compiled the same code (using the same simple VDP driver) on CAMEL99 Forth for a comparison.

 

The native99 version + libraries compiles to 950 bytes. ( Because the libraries are source, I could remove some things if I really wanted a smaller final program.)

 

The threaded Forth version compiles to 252 bytes, but there is an 8K byte compiler/interpreter underneath it.

 

I have not yet made the COLON compiler handle literal numbers properly meaning it needs to compile literals into the TOS register with LI when they are encountered in a program.

That's why all the constants are defined. I wanted to have as many of the core routines running properly before I mess with that.

 

 

 

\ Native99 test program A  VDP Text speed

CROSS-COMPILING
\ Compiler pre-amble
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         A000 ORIGIN.
         TI-99.EA5

\ first build the Native code compiler words
[CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
\ add inline primitives to the compiler
[CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
\ create the colon and semi-colon words
[cc] INCLUDE CC9900\NATIVE\NCCOLON.FTH

\ Now we have forth, we can pull in some libraries
[cc] INCLUDE CC9900\NATIVE\RUNTIME.FTH
[CC] INCLUDE CC9900\NATIVE\LIB.NCC\KEY.FTH
[CC] INCLUDE CC9900\NATIVE\LIB.NCC\VDP9918.FTH


[CC] HEX
CROSS-ASSEMBLING
 0 CONSTANT 0
 1 CONSTANT 1
 2 CONSTANT 2
 3 CONSTANT 3
 4 CONSTANT 4
 5 CONSTANT 5
 7 CONSTANT #7
0E CONSTANT >1E
20 CONSTANT BL
41 CONSTANT 'A'
28 CONSTANT #40
3C0 CONSTANT C/SCR
FFFF CONSTANT >FFFF

CREATE A$  ," Forth Native Code "
CREATE B$  ," is pretty fast... "
VARIABLE VROW
VARIABLE VCOL

: VPOS  ( -- vaddr)   VROW @ #40 *  VCOL @ + ;
: EMIT  ( char --)   VPOS VC!  VCOL 1+! ;
: TYPE  ( addr len --) FOR  DUP C@ EMIT  1+ NEXT DROP ;
: CR    ( -- )  VROW 1+!  VCOL OFF ;
: PRINT ( $addr -- )  COUNT TYPE ;

: PAGE  ( -- )
        VCOL OFF  VROW OFF
        0 C/SCR BL VFILL ;

: DELAY   >FFFF FOR NEXT ;
: TIMES     FOR  DUP PRINT NEXT DROP ;

PROGRAM: RUN
         8300 WORKSPACE  \ setup Forth VM
         FFFF RSTACK
         FF00 DSTACK

         >1E #7 VWTR
         BEGIN
           PAGE  A$ #40 TIMES  DELAY
           PAGE  B$ #40 TIMES  DELAY
           KEY?
         UNTIL
         BYE
END.

 

 

 

 

 

Native99 Forth.mp4

Threaded Forth.mp4


STOP THE PRESSES

 

I have been struggling with how to make a native code Forth compiler take advantage of the memory to memory architecture of the 9900.

 

Typically to deal with getting the value of a variable, for example, Forth will move the address from the stack to a register and then "fetch" the value from that address and put the value onto the stack.

To assign a value to a variable the address must be moved into a register and the value on the stack moved to the address.

 

However the 9900 can do things like:

X     DATA 17
Y     DATA 0

      MOV @X,@Y

I had played with a Native code Forth compiler for DOS 30 years ago by a guy named Tom Almy and it was screaming fast. I found a paper by Tom from 1986 and my eyes have been opened.

 

It turns out that if you make all data declarations put their data (addresses or literal numbers) onto a "LITERAL stack", you have access to these numbers as literal numbers or addresses at compile time, and therefore you can compile code that takes advantage of the CPU's best features.

 

So in the example above it would look like this in Forth ( I have created a new operator called := as in the Pascal 'assignment' operator for the example)

VARIABLE X
VARIABLE Y

   X Y :=

The VARIABLE declarations do the same as a label and DATA directive in Assembler. Nothing more.

 

When we invoke the variables in our code they don't compile any code either. They just push their addresses onto the literal stack (LSTACK).

When we invoke the := operator it grabs those addresses from the LSTACK and compiles them in the optimal way to move data from X into Y

In the case of the 9900 we use memory to memory move. The Forth data stack is not required!

How cool is that?

 

The resultant Assembly code looks a little odd (OK a lot) but it is much more efficient than all the stack juggling.

 

I have a lot more testing to do, but this method has made my day.

 

Edit: Replaced experimental code with final code

HOST: !    ( n -- )  ( l:  addr -- )
           TOS POPARG @@ MOV,
           TOS POP,
;HOST

HOST: @  ( -- n)  ( l: addr -- )
          ?TOSPUSH,
          POPARG @@ TOS MOV,
;HOST

HOST: C@  ( -- char) ( l:addr --)
          ?TOSPUSH,
          POPARG @@ TOS MOVB,
          TOS 8 SRL,
;HOST

HOST: +!  ( l:n addr --)
          TOS POPARG @@ ADD,
          TOS POP,
;HOST

HOST: 1+!    ( l: addr -- )  POPARG  @@ INC,  ;HOST
HOST: 2+!    ( l: addr -- )  POPARG  @@ INCT, ;HOST
HOST: 1-!    ( l: addr -- )  POPARG  @@ DEC,  ;HOST
HOST: 2-!    ( l: addr -- )  POPARG  @@ DECT, ;HOST
HOST: ON     ( L:adr -- )    POPARG  @@ SETO, ;HOST
HOST: OFF    ( L:adr -- )    POPARG  @@ CLR,  ;HOST
HOST: :=    ( L: src dst -- ) POPARG @@  POPARG @@  2SWAP MOV,  ;HOST
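
A rough Python model of the literal-stack idea (not Tom Almy's implementation and not the compiler above): naming a variable at compile time only pushes its address on a host-side stack, and an operator such as := or 1+! pops what it needs and emits a single memory-to-memory instruction. The class, the method names and the sample addresses are invented for illustration.

```python
class LiteralStackCompiler:
    def __init__(self):
        self.lstack = []     # compile-time literal stack (LSTACK)
        self.code = []       # emitted assembly text
        self.vars = {}       # variable name -> target address

    def variable(self, name, addr):
        self.vars[name] = addr                 # like a label plus DATA; no code

    def ref(self, name):
        self.lstack.append(self.vars[name])    # using a variable emits nothing

    def poparg(self):
        return self.lstack.pop()

    def assign(self):                          # := ( L: src dst -- )
        dst = self.poparg()                    # destination is on top
        src = self.poparg()
        self.code.append(f"MOV @>{src:04X},@>{dst:04X}")

    def inc(self):                             # 1+! ( L: addr -- )
        self.code.append(f"INC @>{self.poparg():04X}")


if __name__ == "__main__":
    c = LiteralStackCompiler()
    c.variable("X", 0xA004)
    c.variable("Y", 0xA006)
    c.ref("X"); c.ref("Y"); c.assign()         # X Y :=
    c.ref("X"); c.inc()                        # X 1+!
    print("\n".join(c.code))
```

The run-time data stack never appears in the emitted code; everything the operators need was already known while compiling.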

After a lot of fiddling I think I have it working. One bug left with initializing the Literal stack...

Here is the Forth program: (I have added a report control, and PROGRAM: now sets the TI-99 filename in the program image.)

CROSS-ASSEMBLING
         START.                \ sets a timer
         NEW.                  \ init target memory to FFFF
         A000 ORIGIN.
         TI-99.EA5
         REPORT ON

VARIABLE X 

PROGRAM: NCPROG1              \ we are interpreting now
         8300 WORKSPACE       \ make the Forth VM
         FF00 RSTACK
         FFA0 DSTACK
         X 1+!
         X 1-!
         X ON
         X OFF
         BYE
END.

Here is the output code:

   A006  02E0  lwpi >8300                  (18)
         8300
   A00A  0207  li   R7,>ff00               (20)
         FF00
   A00E  0206  li   R6,>ffa0               (20)
         FFA0
>  A012  05A0  inc  @>a004                
         A004
   A016  0620  dec  @>a004                
         A004
   A01A  0720  seto @>a004                
         A004
   A01E  04E0  clr  @>a004                
         A004
   A022  0300  limi >0000                 
         0000
   A026  0420  blwp @>0000          


PUSH/POP Optimization :-o

 

So Tom Almy's paper mentioned looking backwards in the compiled program to find optimizations.

The 9900 is kind of "verbose" when it comes to generating stack code since we have to make the stacks with normal registers.

 

Since I cache the top of stack (TOS) in a register for code efficiency, it can happen that at the end of a routine, like a store to a variable, I have to "refill" the TOS register (POP it). That is one instruction.

If the next thing that happens is I need to load the TOS register with a new value, like a literal number for example, then I immediately have to PUSH the TOS back onto the stack. That is two instructions!

So that is 6 bytes and three needless instructions. :_(

 

I use a MACRO called PUSH, because so many routines have a line "TOS PUSH," in them.

I added this code to the compiler and replaced all the TOS PUSH lines with a smart ?TOSPUSH and voila! push/pop optimization appears when you add OPTIMIZE ON to the program.

Edit: Replaced with final code


VARIABLE OPTIMIZE      \ control switch variable

\ =======================================================================
\ ?TOSPUSH gives us simple push/pop optimization for the TOS register R4
\ It saves 3 instructions when it detects the condition so very valuable

HEX
C136 CONSTANT TOSPOP   \ machine code for *SP+ R4 MOV,

\ look back in the compiled code 1 cell and get the data
: PREVINSTR   ( -- n)  THERE CELL-  T@ ;

: ?TOSPUSH,
        OPTIMIZE @
        IF
             PREVINSTR TOSPOP =
             IF   1 CELLS NEGATE TALLOT  \ erase the TOS POP, ie: de-allocate it
             ELSE TOS PUSH,              \ too bad. We gotta do it
             THEN
        ELSE
            TOS PUSH,                    \ normal operation
        THEN ;
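
For anyone who wants to try the look-back idea without the cross-compiler, here is a rough sketch in standard Forth (it should load on a desktop system such as gforth). It is not the compiler's actual code: TBUF, TPTR, T, TOS-PUSH, and TOS-POP, are made-up stand-ins for the target image and the PUSH/POP macros, and the two opcodes for the push are my assumption about how a push of R4 onto an R6 stack would be encoded. Only C136 comes from the post.

```forth
\ Sketch only: a host-side buffer pretends to be target memory.
HEX
C136 CONSTANT TOSPOP        \ from the post: machine code for  *SP+ R4 MOV,
0646 CONSTANT SP-DECT       \ assumed encoding of  DECT R6
C584 CONSTANT TOS>STACK     \ assumed encoding of  MOV R4,*R6
DECIMAL

CREATE TBUF 64 CELLS ALLOT  \ one host cell per compiled instruction
VARIABLE TPTR   0 TPTR !    \ how many instructions are compiled so far

: T,        ( n -- )  TBUF TPTR @ CELLS + !   1 TPTR +! ;
: PREVINSTR ( -- n )  \ last compiled instruction, or 0 if the buffer is empty
    TPTR @ IF  TBUF TPTR @ 1- CELLS + @  ELSE 0 THEN ;

: TOS-POP,  ( -- )  TOSPOP T, ;                \ one instruction
: TOS-PUSH, ( -- )  SP-DECT T,  TOS>STACK T, ; \ two instructions

: ?TOSPUSH, ( -- )
    PREVINSTR TOSPOP =
    IF   -1 TPTR +!         \ un-compile the pop instead of adding a push
    ELSE TOS-PUSH,
    THEN ;

\ Usage: a pop followed by a push leaves nothing behind.
TOS-POP,  ?TOSPUSH,  TPTR @ .   \ 0 : the pop and the push cancelled out
?TOSPUSH,            TPTR @ .   \ 2 : no pop to cancel, so a real push is compiled
```

The sketch leaves out the OPTIMIZE switch and counts instructions rather than bytes, so it only shows the cancelling step itself.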

Edited by TheBF

My Brain is on Fire

 

Making a native code compiler is hard but fun.

 

After taking the idea of a "literal stack" from Thomas Almy's paper* "Compiling Forth for Performance", I built up a handy dandy set of routines to manage this new thing.

It has a word LDEPTH that lets me know how "deep" the literal stack is, in other words how many parameters are sitting on it at the moment.
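
As a rough illustration of what such a set of routines could look like, here is a minimal compile-time literal stack in standard Forth. This is a guess at the shape, not the compiler's real source: LDEPTH and POPARG are names used in this thread, while LSTACK, LSP, LSIZE and PUSHARG are names invented for the sketch.

```forth
\ Sketch only: a small stack that lives in the compiler, not in the target.
DECIMAL
16 CONSTANT LSIZE                    \ capacity chosen arbitrarily
CREATE LSTACK  LSIZE CELLS ALLOT
VARIABLE LSP   0 LSP !               \ number of items currently held

: LDEPTH  ( -- n )  LSP @ ;          \ how many literals are waiting

: PUSHARG ( n -- )                   \ remember a literal seen at compile time
    LDEPTH LSIZE = ABORT" literal stack full"
    LSTACK LDEPTH CELLS + !   1 LSP +! ;

: POPARG  ( -- n )                   \ take back the most recent literal
    LDEPTH 0= ABORT" literal stack empty"
    -1 LSP +!   LSTACK LDEPTH CELLS + @ ;

\ Usage: a value and then an address are seen while compiling.
HEX 1234 PUSHARG  A004 PUSHARG  DECIMAL
LDEPTH .                \ 2
HEX POPARG U. DECIMAL   \ A004 : the address was pushed last, so it pops first
LDEPTH .                \ 1
```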

 

Tom was very light on exactly how to use this stack, but so far it lets me know about my input parameters at compile time (while the program is being read).

This adds some surprising features and also some complexity.

 

I have been playing with the Forth word store (!). It needs a value and an address to put it in.

Typically these two arguments are sitting on the Forth data stack for store to grab them and put the value into the address.

 

I may be overcomplicating things, but I think I need to handle 3 possible cases now with this "literal stack" while compiling:

  1. There is nothing on the literal stack because some other code left my arguments on the data stack. ie: Normal Forth
  2. There is a value on the data stack, but some code has put the address on the literal stack.
  3. Both arguments are on the literal stack.

So with that assumption here is what it takes to compile the Forth ! operator: :)

 

Edit: Updated code.

I quickly realized there could be other stuff on the literal stack from unresolved operations, so now if there is more than 1 item, we fall through to the default, which is case 3.

HOST: !    ( n -- )  ( l: n addr -- )
          LDEPTH
          CASE
\ ** Here we have no args at compile time. It's like regular Forth
             0 OF  *SP+ *TOS MOV,
                    TOS POP,          \ refill TOS
                                      ENDOF
\ ** Here we know the address at compile time, value is on Forth stack
             1 OF  TOS POPARG @@ MOV,
                   TOS POP,         \ refill TOS
                                      ENDOF
\ DEFAULT: ** here we know both arguments at compile time.
\             No need to bother the stack at all
                 POPARG  >R       \ address -> Rstack
                 R0  POPARG LI,   \ load value into R0
                 R0  R> @@ MOV,   \ store R0 at address
         ENDCASE
;HOST

The very cool thing is in case #3. Here we know everything at compile time and therefore do not need to play with the stack. I used R0 to do the job.
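
To make the three-way decision easier to follow, here is a stand-alone sketch in standard Forth that prints the instructions each case would produce instead of assembling them. PLAN-STORE and .HEX are hypothetical names, the assembler text uses conventional TI source,destination order, and the tiny literal stack is included only so the sketch runs on its own; none of this is the compiler's real code.

```forth
\ Sketch only: prints a plan for "!" based on the literal-stack depth.
DECIMAL
CREATE LSTACK 8 CELLS ALLOT   VARIABLE LSP   0 LSP !
: LDEPTH  ( -- n )  LSP @ ;
: PUSHARG ( n -- )  LSTACK LDEPTH CELLS + !   1 LSP +! ;
: POPARG  ( -- n )  -1 LSP +!   LSTACK LDEPTH CELLS + @ ;
: .HEX    ( n -- )  BASE @ >R   HEX ." >" U.   R> BASE ! ;

: PLAN-STORE ( -- )
    LDEPTH
    CASE
      \ case 1: nothing known, both arguments are on the data stack
      0 OF  CR ." MOV *SP+,*TOS"
            CR ." MOV *SP+,TOS"          ENDOF
      \ case 2: address known at compile time, value is in TOS
      1 OF  CR ." MOV TOS,@" POPARG .HEX
            CR ." MOV *SP+,TOS"          ENDOF
      \ default (case 3): both known, the data stack is never touched
      POPARG POPARG                      ( selector addr value )
      CR ." LI  R0,"  .HEX               \ value goes into R0
      CR ." MOV R0,@" .HEX               \ R0 goes to the address
    ENDCASE ;                            \ ENDCASE drops the selector

\ Usage: value 1234 and address A004 are both known while compiling.
HEX 1234 PUSHARG  A004 PUSHARG  DECIMAL
PLAN-STORE
```

Only in the default branch does the plan contain no reference to SP or TOS, which is the point of case 3.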

I suspect there will be other cases like this. Tom mentions that array address calculations can be done at compile time for example.

 

This means the compiler will use the CASE statement heavily, which slows down compilation, but at the moment it's all happening in a DOS box on the PC.

 

 

* Journal of Forth Application and Research Volume 4, Number 3

Edited by TheBF