Machine Forth OMG

+TheBF · November 16, 2022

I have been noodling on how best to add conventional comparison operations to native code Forth, efficiently, for way too many days now.

(I was thinking that dementia was starting to set in...)

The mismatch is that the CPU reports comparison results in its status register, whereas Forth expects a flag on the top of the stack.

In frustration I cheated and took a look at Mecrisp Forth for the TI MSP430. There I found that even Mathias has to use extra instructions.

The MSP430 has a SUBC instruction which he uses a fair bit, but we don't have that.

I did see that he defined the masks for the status bits.

He also SETs the bits in the status register. We can't do that either.

I think I finally have a use for the STST instruction.

Here is where I am going for these comparisons.

These two seem to work in my test environment.

(I am using ASMFORTH for testing because it is interactive, and I have added conventional IF and UNTIL that drop their argument)

I can't think of a way to make these smaller/faster.

\ status register masks
HEX
8000 CONSTANT L>
4000 CONSTANT A>
2000 CONSTANT EQ>
1000 CONSTANT C>

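: 0= ( n -- ?)
     TOS 0 CI,       \ compare TOS with 0; sets the EQ bit if TOS is zero
     TOS STST,       \ copy the status register into TOS
     TOS EQ> ANDI,   \ keep only the EQ bit (>2000)
;

: =  ( n n -- ?)
     *SP+ TOS CMP,   \ compare NOS with TOS; sets the EQ bit if they are equal
     TOS STST,       \ copy the status register into TOS
     TOS EQ> ANDI,   \ keep only the EQ bit (>2000)
;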
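For anyone who wants to poke at the idea off the TI-99, here is a rough Python model of what the two words above compute. It only models the three status bits that a 9900 compare updates; the function names and the operand order in `compare_status` are my own assumptions, not part of ASMForth.

```python
# Status-register masks, matching the constants in the post.
L_GT = 0x8000    # logical (unsigned) greater than
A_GT = 0x4000    # arithmetic (signed) greater than
EQ = 0x2000      # equal
CARRY = 0x1000   # carry (not touched by a compare)

def to_signed(x):
    """Interpret a 16-bit value as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def compare_status(a, b):
    """Model the bits a compare updates: L>, A> and EQ.
    Assumption: 'a' is the operand tested for 'greater than' against 'b'."""
    a &= 0xFFFF
    b &= 0xFFFF
    st = 0
    if a > b:
        st |= L_GT
    if to_signed(a) > to_signed(b):
        st |= A_GT
    if a == b:
        st |= EQ
    return st

def zero_equals(tos):
    """CI TOS,0 / STST TOS / ANDI TOS,>2000 leaves 0 or 0x2000."""
    return compare_status(tos, 0) & EQ

def equals(nos, tos):
    """C *SP+,TOS / STST TOS / ANDI TOS,>2000 leaves 0 or 0x2000."""
    return compare_status(nos, tos) & EQ
```

Note that the "true" flag in this scheme is >2000 rather than the conventional all-bits-set Forth flag, so whatever consumes it only has to tell zero from non-zero.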

Where the Test code is:

CODE MFTEST2B  ( 3.8 seconds)
	FFFF #
	BEGIN
	   1-
	   DUP 
	0= UNTIL
	DROP
;CODE

Where UNTIL is defined as:

: UNTIL  
   TOS DEC,               
   TOS POP,               \ refill TOS register
   CS> HERE  0 JNC, <BACK \ CS> pops address of BEGIN from control stack
                          \ <BACK computes difference and modifies the JNC instruction
;
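The trick in UNTIL is that DEC turns "zero or non-zero" into the carry bit, and the MOV that refills TOS leaves carry alone. A small Python sketch of that step, assuming DEC behaves like adding >FFFF so that carry is set for every operand except 0 (the helper names are mine):

```python
def dec_with_carry(value):
    """Model DEC on a 16-bit register as 'add 0xFFFF'.
    Returns (result, carry). Assumption: carry is the carry-out of that
    addition, so it is set for every operand except 0."""
    total = (value & 0xFFFF) + 0xFFFF
    return total & 0xFFFF, total > 0xFFFF

def until_loops_again(flag):
    """True when the JNC that follows DEC would branch back to BEGIN."""
    _, carry = dec_with_carry(flag)
    return not carry
```

So a false flag (0) decrements to >FFFF with no carry and the JNC jumps back, while the >2000 left by 0= decrements to >1FFF with carry set and the loop falls through.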

Way too long but I think I can work with this.

Open to comments from the Assembly language experts.

Here is the output code from the loop.

DE14  0646  dect R6                   
DE16  C584  mov  R4,*R6               
DE18  0204  li   R4,>ffff      \ push FFFF onto the stack 
\ BEGIN
DE1C  0604  dec  R4            \ 1- 
DE1E  0646  dect R6           
DE20  C584  mov  R4,*R6       
DE22  0284  ci   R4,>0000      \ 0=
DE26  02C4  stst R4                    
DE28  0244  andi R4,>2000              

DE2C  0604  dec  R4            \ UNTIL
DE2E  C136  mov  *R6+,R4       \ drop       
DE30  17F5  jnc  >de1c         \ jump back       
DE32  C136  mov  *R6+,R4       \ drop  

DE34  045A  b    *R10          \ return to Forth

+Lee Stewart · November 16, 2022

2 hours ago, TheBF said:

He also SETs the bits in the status register. We can't do that either.

Actually, you can—though what you might need to do is likely slower than your solution. You can BLWP to a small routine in scratchpad RAM with new workspace also in scratchpad RAM, by passing a byte in R0 that contains the status bits to change. Change those bits in R15 and return with RTWP. R15 will be copied to the status register upon return. With judicious use of the workspace, you could make it a pretty small routine. I mean by “judicious use of the workspace” that for one register, use R12; for 2, R11 and R12, etc. That way you will only use the end of the workspace and can overlay working code with the unused registers without fear. Obviously, 3 registers (R13, R14, R15) are used by BLWP/RTWP, so you must work outside of those, but you get the picture.

...lee

apersson850 · November 17, 2022

Another way is to store the address of the currently used workspace in R13, the desired content of the status register in R15 and the address after the next instruction in R14, then execute RTWP. This will be an "inline" return, since the workspace will not change, the branch will be to the instruction you would execute anyway and the status register will change.

In a standard assembly program it works fine. May be a bit tougher to include in Forth.

For efficiency you let R13 be what it is the whole time and only change R14 if you really have to execute this in more than one place. That's the price to pay for this way to create an LST instruction.

The LWP instruction can be created in the same way.

Both LWP and LST are present in the TMS 9995, so it's obvious that at TI, they also realized that these instructions could be handy to have.
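A rough Python model of the inline RTWP idea described above, for readers who want to see the register shuffle spelled out. The class, its method names and the addresses in the usage example are hypothetical; the only behaviour taken from the posts is that RTWP reloads WP, PC and ST from R13, R14 and R15.

```python
class Cpu9900:
    """Just enough state to show what RTWP restores (hypothetical model)."""
    def __init__(self, wp, pc, st):
        self.wp, self.pc, self.st = wp, pc, st
        self.mem = {}                      # RAM: address -> 16-bit value

    def reg_addr(self, n):
        return self.wp + 2 * n             # registers live in RAM at WP + 2*n

    def set_reg(self, n, value):
        self.mem[self.reg_addr(n)] = value & 0xFFFF

    def rtwp(self):
        # RTWP reloads WP, PC and ST from R13, R14 and R15 of the workspace.
        r13 = self.mem[self.reg_addr(13)]
        r14 = self.mem[self.reg_addr(14)]
        r15 = self.mem[self.reg_addr(15)]
        self.wp, self.pc, self.st = r13, r14, r15

def load_status(cpu, new_st, resume_addr):
    """'Inline' RTWP: same workspace, continue at resume_addr, new status."""
    cpu.set_reg(13, cpu.wp)
    cpu.set_reg(14, resume_addr)
    cpu.set_reg(15, new_st)
    cpu.rtwp()
```

For example, `cpu = Cpu9900(0x8300, 0xA000, 0)` followed by `load_status(cpu, 0x2000, 0xA00A)` leaves the workspace pointer unchanged, continues at the chosen address and has the new status bits in place.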

+TheBF · November 17, 2022

I completely forgot about that way of getting at the status register. Thanks to both of you.

By the way I am working in Assembler here. It's just RPN Assembler, so there are no limitations that way.

The challenge is how best to translate Forth source code to native 9900 code which is what all modern Forth systems do these days.

I am working way above my pay grade but it's a good challenge.

If you don't mind reading German code comments, here is a link to Mecrisp Forth for the MSP430, which is about as close as you get to the 9900 these days.

Mecrisp generates native code interactively, with constant folding and other optimizations. Mathias Koch is a wizard to a guy like me.

Oops: Wrong link was posted. Fixed.

mecrisp/mecrisp-source/common at master · kevinfish/mecrisp · GitHub

Edited November 17, 2022 by TheBF
bad link

apersson850 · November 17, 2022

Also remember that when you do a BLWP, you can let the branch vector contain the same workspace as you are coming from. It will then be like BL - you can only do it once (not nested).

The only penalty otherwise is that R13-R15 of the calling workspace will be used to contain return data, not just the called workspace. And the reason is of course that they are effectively the same.

+TheBF · November 20, 2022

So all this time playing with the status register and with coaching from @Lee Stewart and @apersson850 I went back to the Machine Forth compiler.

As I understand it, Chuck's Machine Forth branched only on the carry flag of his CPU. (Or carry was set from the data on the stack; I am not sure of the hardware details.)

Now in MachForth I am back to branching on true or false with these macros.

: IF     ( -- $$)   THERE 0 JNC, ; 
: -IF    ( -- $$)   THERE 0 JOC, ;

Then for consistency, I just re-use the branching words with <BACK to make loops.

<BACK knows how to resolve a backwards branch: it computes the offset and writes it into the jump instruction's displacement field (the low byte).

: BEGIN    THERE ;  IMMEDIATE
: WHILE    IF  SWAP ;
: -WHILE   -IF SWAP ;

: UNTIL   ( addr --)  IF <BACK ;
: -UNTIL  ( addr --) -IF <BACK ;
: AGAIN   ( addr --) THERE  0 JMP, <BACK ;
: REPEAT  ( addr -- ) AGAIN THEN ;
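Here is a minimal Python sketch of the arithmetic that a word like <BACK has to do. It is not the actual ASMForth source, just the 9900 short-jump encoding as I understand it: the opcode sits in the high byte and a signed word displacement, counted from the address after the jump, sits in the low byte. The function name is made up.

```python
def back_patch(opcode, jump_addr, target_addr):
    """Return the 16-bit jump instruction located at jump_addr that
    branches to target_addr. The displacement is counted in words from
    the address following the jump instruction."""
    disp = (target_addr - (jump_addr + 2)) // 2
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return (opcode & 0xFF00) | (disp & 0xFF)
```

Checking it against the listing from the first post: a JNC (>17xx) at >DE30 that jumps back to >DE1C gives `back_patch(0x1700, 0xDE30, 0xDE1C) == 0x17F5`, which is the word the disassembly shows.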

I also now understand why Chuck abandoned making IF and WHILE and UNTIL consume their arguments when he started coding for his own hardware.

The extra trouble to do that slows the empty loops down by ~3X.

The Machine Forth WHILE loop runs in 0.9 seconds.

COMPILER  \ name space that has compiler directives
   NEW.
   HEX 2000 ORIGIN.

TARGET
PROG: DEMO1  \ .9  seconds 
    FFFF
    BEGIN
      1-
    WHILE
    REPEAT
    DROP

    NEXT,         \ Return to Forth console
END.

\ Usage from Forth command line:
\ DEMO1 RUN

By comparison, the equivalent loop in ASMForth, which follows the normal Forth convention of consuming arguments but still compiles to native code, runs in 2.8 seconds.

HEX
CODE DOWHILE
      FFFF #
      BEGIN
        1- DUP
      WHILE
      REPEAT
      DROP
;CODE

+TheBF · November 20, 2022

One big breakthrough on MachForth comes from the creation of SUPERSAVE.

The MachForth compiler is now a 20k dictionary because I include the Assembler, dev. tools like DUMP and ELAPSE for timing code.

It also uses WORDLISTS to manage Forth words that are the same but do VERY different things to compile native code versus threaded code.

It also reserves the entire 8K of low RAM for the program image, so our little machine is getting pretty full.

I recompiled a version of MachForth under the SuperCart version of Camel99 and now there is room to write a serious program in MachForth.

apersson850 · November 21, 2022

Which means that you kind of accomplish the same thing as the p-system, which couldn't have existed properly in a low-memory device like the TI 99/4A unless there was the p-code card, which acts like a 48 Kbyte ROM-disk with the operating system plus a 12 Kbyte code storage for the p-machine interpreter and BIOS. With later RAM solutions similar things can be accomplished with "standard" hardware.

+TheBF · November 21, 2022

3 hours ago, apersson850 said:

Which means that you kind of accomplish the same thing as the p-system, which couldn't have existed properly in a low-memory device like the TI 99/4A unless there was the p-code card, which acts like a 48 Kbyte ROM-disk with the operating system plus a 12 Kbyte code storage for the p-machine interpreter and BIOS. With later RAM solutions similar things can be accomplished with "standard" hardware.

I forgot that the P-code card had 48K of storage. That is nice.

I am pretty much walking that path but using the Cartridge memory space and the SAMS card.

P-system was there first.

apersson850 · November 22, 2022

The only difference is that the p-code card uses DSR memory space (4000H-5FFFH). Because it's a PEB card, of course. 12 K ROM squeezed into 8 K address space, by bank-switching the upper 4 K. The 48 K ROM-disk is really a GROM-disk, so it's accessed via the normal system of address and data ports. They just reside in DSR space instead in this case, and only if the p-code card is enabled.

It's probably because TI chose to use GROM chips for this code repository that some people have got the idea that GPL is involved in the p-system in some way.

Anyway, this means that the p-system doesn't use the cartridge space at all. If you have memory products that reside there, they can be used by your program without messing up anything else in the p-system.

Edited November 22, 2022 by apersson850

+TheBF · March 3, 2023

I have been "pushing a rope" for two years with Machine Forth.

Here is part of the intro paragraph for the manual I am writing for my latest flavour of machine Forth called ASMForthII.

The author has spent a considerable amount of time adapting Chuck Moore’s Machine Forth concept to the TMS9900
and the results are good but not great. The hypothesis is that the lack-lustre performance of Machine Forth on
the 9900 is due to the hardware mis-match between the 9900 and the F21 Forth CPU.  Machine Forth is actually 
the Assembly language for Chuck’s F21. 
The conclusion is that like the original machine Forth, any machine Forth must leverage underlying hardware
to be efficient.

So all that to say that a "machine Forth" for TI-99 needs to use the registers and other features of the machine.

The end result is a very low level compiler with a near one-to-one correspondence with the machine instructions, plus hi-level branching and looping, and the convenience of the HOST Forth system for console I/O, debugging and disk.

It was @Reciprocating Bill and @lucien2, with their sieve benchmarks in Assembler and GCC, who showed me what was really possible.

Current Feature List

Forth-like syntax (fetch/store architecture) (OK not a feature for everybody)
8 free registers, but you can also use the DATA stack for extra parameters or to push registers as needed
Seamlessly interleaves with Forth RPN Assembly language mnemonics
DATA stack with PUSH POP pseudo-instructions
Return stack with RPUSH RPOP pseudo-instructions
Structured branching and looping
- Fast Nest-able FOR NEXT down-counting loop
Nest-able sub-routines with : ;
Tail-call optimization under programmer control with -; operator
Standard Forth CODE ;CODE words are directly callable from Forth. They can call ASMFORTH subroutines.
Compiler is an E/A5 program

Future

Compile stand alone E/A5 programs
Create a STDIO library for VDP and keyboard
Create a DISKIO library

Here is the Sieve program using Forth for data and console I/O and ASMForth for the computation as the language looks now.

The ASMForth section is a translation of the business end of @Reciprocating Bill 's program.

It compiles in 3.8 seconds and runs in ~~10 seconds.~~

Edit: Removed one instruction and used #CMP. 9.8 seconds

\ SIEVE in ASMFORTH for Camel99 Forth                 Feb 2023 Brian Fox
\ based on code by @Reciprocating Bill atariage.com 

\ Original notes by Bill.
\ * SIEVE OF ERATOSTHENES ------------------------------------------
\ * WSM 4/2022
\ * TMS9900 assembly adapted from BYTE magazine 9/81 and 1/83 issues
\ * 10 iterations 6.4 seconds on 16-bit console
\ * ~10 seconds on stock console

\ * ASMForth version runs in 10 seconds 
HOST 
DECIMAL 8190 CONSTANT SIZE
HEX     2000 CONSTANT FLAGS   \ array in Low RAM 

ASMFORTH 
: FILLW ( addr size cell --) \ nestable sub-routine 
    R0 POP            \ size 
    R1 POP            \ base of array
    BEGIN
        TOS R1 @+ !   \ write ones to FLAGS
        R0 2-
    NC UNTIL  
    DROP 
; 

HEX
CODE DO-PRIME ( -- n)  
  FLAGS # SIZE # 0101 # FILLW
\ inits 
  R0 OFF        \ clear loop index 
  R3 OFF        \ 0 constant
  FLAGS R5 #!   \ array base address 
  0 #           \ counter on top of Forth stack 

  SIZE # FOR 
    R5 @+ R3 CMPB        \ FLAGS C@+ byte-compared to R3 (ie: 0)
    <> IF                \ not equal to zero ? 
          R0 R1 !        \ I -> R1
          R1 2*  R1 3 #+ \ R1 2* 3+
          R0 R2 !        \ I -> R2 ( R2 is K index) 
          R1 R2 +        \ PRIME K +! 
          BEGIN  
            R2 SIZE #CMP \ K SIZE compare 
          < WHILE  
            R3 FLAGS (R2) C! \ reset byte FLAGS(R2)
            R1 R2 +      \ PRIME K +! 
          REPEAT 
          TOS 1+         \ increment count of primes
    THEN 
    R0 1+                \ bump index register
  NEXT 
;CODE  

HOST   ( Switch back to Host Forth )
DECIMAL 
: PRIMES ( -- )
   PAGE ."  10 Iterations"
   10 0 DO   DO-PRIME  CR . ." primes"  LOOP
   CR ." Done!"
;

+TheBF · March 3, 2023

Effect of Tail Call Optimization in ASMForth.

It was pretty simple to port this nesting demo that I found on the internet.

The results are in the source code for other computers and TI-99 with Camel99 , TurboForth and ASMForth.

I figured out how to do tail-call optimization in threaded Forth and it has a similar improvement to native code (BL) subroutines.

ASMForth is pushing R11 onto the return stack on entry and popping it back before doing RT.

\ For  ASMForth II 

\ Amstrad 6128+ Z80A 4Mhz	Uniforth  Nesting 1Mil  3:26
\ ZX Spectrum 2+  FIG-Forth 1.1a	  Nesting 1Mil  3:15
\ C64 (normal)	  Forth64	          Nesting 1Mil	6:20
\ PDP11           FIG-Forth 1.3        Nesting 1Mil  0:49

\ TI99            Camel99 Forth       Nesting 1Mil  2:30.7
\                 w/tail-call optimization          1:54.6

\                 TurboForth 1.21     Nesting 1Mil  2:29

\                 ASMForth II         Nesting 1Mil  1:28.7
\                 W/tail-call optimization          0:54.23

HOST INCLUDE DSK1.ELAPSE 

ASMFORTH 
: BOTTOM ;
: 1st BOTTOM BOTTOM ;  : 2nd 1st 1st ;      : 3rd 2nd 2nd ;
: 4th 3rd 3rd ;        : 5th 4th 4th ;      : 6th 5th 5th ;
: 7th 6th 6th ;        : 8th 7th 7th ;      : 9th 8th 8th ;
: 10th 9th 9th ;       : 11th 10th 10th ;   : 12th 11th 11th ;
: 13th 12th 12th ;     : 14th 13th 13th ;   : 15th 14th 14th ;
: 16th 15th 15th ;     : 17th 16th 16th ;   : 18th 17th 17th ;
: 19th 18th 18th ;     : 20th 19th 19th ;   

CODE RUN    20th  ;CODE 

HOST 
:  1MILLION   CR ."  1 million nest/unnest operations" RUN ;

CR .( enter 1million or 32million )
\ ELAPSE 1MILLION 


\ recompile with tailcall optimization operator ( -; )
ASMFORTH 
: BOTTOM  ;  \ can't optimize this one because there is no function in it. 
: 1ST BOTTOM BOTTOM -;  : 2ND 1ST 1ST -;      : 3RD 2ND 2ND -;
: 4TH 3RD 3RD -;        : 5TH 4TH 4TH -;      : 6TH 5TH 5TH -;
: 7TH 6TH 6TH -;        : 8TH 7TH 7TH -;      : 9TH 8TH 8TH -;
: 10TH 9TH 9TH -;       : 11TH 10TH 10TH -;   : 12TH 11TH 11TH -;
: 13TH 12TH 12TH -;     : 14TH 13TH 13TH -;   : 15TH 14TH 14TH -;
: 16TH 15TH 15TH -;     : 17TH 16TH 16TH -;   : 18TH 17TH 17TH -;
: 19TH 18TH 18TH -;     : 20TH 19TH 19TH -;   

CODE RUN    20TH  ;CODE 

HOST 
:  1MILLION2   CR ." Optimized 1M nest/unnest operations" RUN ;
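To see where the saving could come from, here is a small Python count of how many return instructions the nesting benchmark executes. It is only a model: it assumes that -; compiles the last call of a word as a plain branch, so that the word never executes a return of its own. I have not checked that against the generated code, and the function name is mine.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def returns_executed(level, tail_call):
    """Count return instructions executed by one call of the word at 'level'.
    Level 0 is BOTTOM; every higher word calls the level below it twice.
    Assumption: with tail_call=True the second call is compiled as a plain
    branch, so the word never executes a return of its own."""
    if level == 0:
        return 1                      # BOTTOM always returns
    own_return = 0 if tail_call else 1
    return 2 * returns_executed(level - 1, tail_call) + own_return
```

Under that assumption, the 20-level chain runs BOTTOM 2^20 times (the "1 million") and the optimized version executes roughly half as many returns as the plain one. It says nothing about cycle counts.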

+TheBF · March 7, 2023

I am writing a manual for ASMForth II and using code examples.

I am borrowing ideas from Chuck Moore's CPU which had hardware registers called TOS (top of stack) and NOS (next on stack).

In the case of the 9900 NOS is actually *SP.

To indicate NOS with an automatic pop, which is *SP+, I now use NOS^ .

So this little example shows ASMForth versus 9900 Assembly Language.

AsmForth Addition Example:

    45 #  7 #     \ push 2 numbers onto the DATA stack. NOS=45 TOS=7 
    NOS^ TOS +   \ Add NOS to TOS. NOS^ pops the stack automatically

This compiles the following TI Assembler code:

    DECT SP 
    MOV  R4,*SP 
    LI   R4,45
    DECT SP 
    MOV  R4,*SP 
    LI   R4,7
    A    *SP+,R4

Writing the doc is helping me refine the "language".

I should have something for public consumption by the weekend.

I think it is a viable contender to replace a Forth Assembler since that's really all it is, with some macros added.

If you don't want to use the data stack you could use registers like this:

       45 R0 #!
        7 R1 #!
      R0  R1 +

+TheBF · March 8, 2023

For anyone brave enough here is the repository for ASMForth II.

GitHub - bfox9900/ASMFORTH: Experimental Assembler using Forth like syntax

The manual is rough but if you have more questions you can read the source code or just ask me.

The demo programs give enough examples for you to play around.

Please tell me about bugs that you find and ANY suggestions you think would improve it.

The bin folder has the two files that are needed to be loaded from E/A Option 5.

It is just: Camel99 kernel + Tools + Assembler + ASMFORTH II rolled up into an E/A5 program.

I will be making a front-end on the Assembler to allow vectoring ASMForth code to different memory, which will allow making stand alone binaries as I was doing with Machine Forth, which will be pretty neat.

I have to do a VDP/Console library which I will write in ASMForth rather than Assembler just because I now can. (I think)

Of course you are free to make your own. There are macros for the string words COUNT and /STRING in ASMForth.

Make yourself EMIT and TYPE to hit the screen.

I gotta stop for awhile.

+TheBF · March 8, 2023

I am adding a few extra Forth primitives to ASMForth and updating the manual.

I am beginning to like the clarity of this notation to write stack operations.

Here is 2DUP in ASMForth

: 2DUP  ( a b -- a b a b ) 
    SP -4 #+      \  make room for 2 cells
    4TH NOS !      
    TOS 3RD ! ;
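A quick way to convince yourself that the three instructions really produce ( a b a b ) is to model the stack in Python: TOS in a "register", everything else in memory growing downward. The class and its method names are hypothetical, and the cell numbering (NOS is item 2, then 3RD, then 4TH) is my reading of the notation.

```python
class Stack:
    """TOS lives in a register; everything below it is in memory,
    with the stack growing toward lower addresses."""
    def __init__(self, sp=0x100):
        self.mem = {}
        self.sp = sp
        self.tos = 0

    def push(self, value):            # DECT SP / MOV TOS,*SP / LI TOS,value
        self.sp -= 2
        self.mem[self.sp] = self.tos
        self.tos = value

    def add(self):                    # NOS^ TOS +   i.e.  A *SP+,TOS
        self.tos = (self.tos + self.mem[self.sp]) & 0xFFFF
        self.sp += 2

    def cell(self, n):                # n = 2 is NOS, 3 is 3RD, 4 is 4TH
        return self.sp + 2 * (n - 2)

    def two_dup(self):
        self.sp -= 4                                      # SP -4 #+
        self.mem[self.cell(2)] = self.mem[self.cell(4)]   # 4TH NOS !
        self.mem[self.cell(3)] = self.tos                 # TOS 3RD !

    def items(self, depth):
        """Return the top 'depth' items, deepest first."""
        below = [self.mem[self.sp + 2 * i] for i in range(depth - 1)]
        return list(reversed(below)) + [self.tos]
```

Pushing two values and calling `two_dup()` gives the two values twice over in `items(4)`, and `add()` reproduces the `45 # 7 # NOS^ TOS +` example from the earlier post.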

+TheBF · March 8, 2023

One minor annoyance with the 9900 is dealing with bytes and swapping them in register to do math operations.

I have ignored that up until now thinking I would leave it to the programmer to do the swapping as necessary.

I think I made ASMForth just a bit smarter with this code. It could also be going too far beyond "Assembler".

The C! compiler tests for symbolic addressing in the destination argument.

If it is, then we take the args and compile the MOVB instruction.

If we have a register destination then it compiles the MOVB and follows it with an 8 bit SRL on that same register.

This way if you are using bytes in the TOS register or on the Forth stack they work as expected.

The Assembler MOVB, word is always there if you didn't want this.

\ Add some smarts to C! 
: C!   ( c dst -- )  
    DUP SYMBOL? IF  MOVB,  EXIT THEN  \ do MOVB and get out 

\ dst must be a register    
    DUP>R       \ save copy of DST register 
    MOVB,       \ compile instruction 
    R> 8 SRL,   \ perform swap byte on that register 
;

Here is a little test. ~~I think the logic is correct.~~

Not good for every circumstance. I think I will go back to MOVB = C! for now until I think of something better.

\ BYTEOPS.FTH demo code 

\ The 9900 handles bytes in a weird way. 
\ C! will shift the bytes if the destination argument is a register. 

HOST 
HEX 
CREATE X  AABB ,
CREATE Y  0000 ,

ASMFORTH 
CODE TEST-C! ( -- n n n)
    DUP               \ free up TOS 
    TOS OFF           \ clr tos  

    X @@  TOS MOVB,   \ assembler version 

    DUP 
    X @@  TOS C!     \ C! has bit shifting

    X @@  Y @@ C!    \ C! has no bit shifting mem 2 mem 
    
    DUP 
    Y @@  TOS C!     \ with bit shifting 

;CODE      

TEST-C!
.S 
Y ?
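The byte behaviour being tested can be modelled in a few lines of Python. This is a sketch of my understanding of the 9900 (MOVB to a register replaces only the high byte, and a byte read from an even address is the high byte of the word), not output captured from the test above; the helper names are made up.

```python
def high_byte(word):
    """The byte MOVB reads from an even memory address."""
    return (word >> 8) & 0xFF

def movb_to_register(reg, byte):
    """MOVB with a register destination replaces the high byte only."""
    return ((byte & 0xFF) << 8) | (reg & 0x00FF)

def srl8(reg):
    """SRL reg,8: logical shift right by eight bits."""
    return (reg & 0xFFFF) >> 8
```

With X holding >AABB, plain MOVB into a cleared register gives >AA00, and the extra SRL turns that into >00AA, which is the form ordinary 16-bit arithmetic expects. The catch is that the 9900's byte instructions work on the high byte, so a shifted value is in the wrong place for them; that may be part of why the post calls it "not good for every circumstance", though that is my guess.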

+Lee Stewart · March 9, 2023

I know I haven’t been paying enough attention to your ASMForth posts, but is this C! in the ASSEMBLER (or maybe ASMFORTH ) vocabulary (word list) and different from the C! in the FORTH vocabulary?

...lee

+TheBF · March 9, 2023

Just now, Lee Stewart said:

I know I haven’t been paying enough attention to your ASMForth posts, but is this C! in the ASSEMBLER (or maybe ASMFORTH ) vocabulary (word list) and different from the C! in the FORTH vocabulary?

...lee

Yes. At the moment, after my moment of "cleverness" failed, it is just a synonym for MOVB again.

I am finding that getting too cute with syntax just adds needless complication.

Basically ASMForth is just changing the nomenclature from TI mnemonics to Forth-like mnemonics and adding some handy "pseudo-instructions" ie: macros.

I will push it and see how far it goes.

But I think it has some merit.

I am just making a post on making the FOR NEXT loop more versatile.

+TheBF · March 9, 2023

Freeing FOR NEXT

I started on a VDP driver and learned that it was a mistake to force FOR NEXT to take its argument from TOS.

I kept the one I had but renamed it #FOR so it takes a literal number, like the other instructions with # sign.

The magic of the Forth Assembler is that it uses the DATA stack to pass the pieces of an assembly language instruction as arguments.

The instruction mnemonic is the actual "Assembler".

It picks up the pieces on the DATA stack and combines them correctly to make an instruction. It's magic.

So... Why couldn't FOR take any valid argument (Register/Addressing mode) for its loop limit? It can!

Here is the new code for the FOR NEXT loop.

: FOR   ( arg --)  RPUSH BEGIN ; 
: NEXT  ( -- )   RP @ 1-   NC UNTIL RDROP ;

The code for RPUSH is:

: RPUSH,  ( src -- ) RP DECT,   ( src) *RP   MOV, ;

You can see that it is an incomplete MOV instruction. The source argument is passed to it on the Forth data stack.

So now this little FOR NEXT loop can do this.

CODE FASTER  \ 14.18 seconds
    100 R0 #! \ loop limit in a register 
    R0 FOR    \ pass Register to FOR :-) 
      R0 FOR
          R0 FOR
           NEXT
        NEXT
    NEXT
;CODE

AND this:

HOST 
VARIABLE X  100 X ! 

ASMFORTH
CODE TESTARG
    X @@ FOR    \ pass a variable to FOR 🙂
      X @@ FOR
          X @@ FOR
           NEXT
        NEXT
    NEXT
;CODE
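For readers following along without the hardware, here is a Python sketch of how I read the FOR NEXT definitions: the limit goes onto the return stack, NEXT decrements it in place, and the loop repeats while the decrement sets carry. It assumes DEC sets carry unless the value before the decrement was 0 and that NC UNTIL exits when carry is clear. `for_next` and `body` are names I made up.

```python
def for_next(limit, body):
    """Run body the way FOR ... NEXT is described: push the limit on the
    return stack, then DEC it and loop back while the decrement sets carry.
    Assumption: DEC sets carry unless the value before it was 0."""
    rstack = [limit & 0xFFFF]                 # RPUSH
    while True:
        body(rstack[-1])
        total = rstack[-1] + 0xFFFF           # DEC modelled as add 0xFFFF
        rstack[-1] = total & 0xFFFF
        if total <= 0xFFFF:                   # no carry: fall out of the loop
            break
    rstack.pop()                              # RDROP
```

Under those assumptions the body sees the counter values limit, limit-1, ... 0, so it runs limit+1 times. Because the counter lives on the return stack, it does not matter whether the limit came from a literal, a register or a variable.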

+TheBF · March 9, 2023

Holy Crap. It works.

VDP Driver in ASMForth and test program that reads the screen, clears the screen and puts it back.

Edit: Had to add a couple of DROP instructions that I forgot I needed at the end of DELAY and VC@

\ VDPLIB.FTH library  for ASMForth II           2023 Mar Brian Fox

HOST
HEX
8800 EQU VDPRD
8802 EQU VDPSTS
8C00 EQU VDPWD
8C02 EQU VDPWA

ASMFORTH  
\ VDPA! takes arg from TOS but leaves it on the stack 
: VDPA! ( Vaddr -- Vaddr) \ set vdp address (read mode)
    R1 STWP,
    0 LIMI,
    9 (R1)  VDPWA @@ C!  \ write odd byte from TOS (ie: R4)
    TOS     VDPWA @@ C!  \ MOVB writes the even (high) byte of TOS to the port address
;

: VC@   ( addr -- c)
    VDPA! 
    TOS OFF
    VDPRD @@  9 (R1) C!  \ read data into odd byte of R4
    DROP 
;

: VC! ( c Vaddr -- )
    TOS 4000 #OR VDPA! 
    9 (R1) VDPWD @@ C!    \ Odd byte R4, write to screen
    DROP                  \ refill TOS
;

HEX
\ * VDP write to register. Kept the TI name
: VWTR   ( c reg -- )   \ Usage: 5 7 VWTR
    TOS >< 
    NOS^ TOS +         \ combine 2 bytes to one cell
    TOS 8000 #OR  VDPA!
    DROP 
;

: VFILL ( Vaddr cnt char -- )
    TOS R5 !       \ R5 = CHAR
    R5 ><
    R0 POP         \ cnt to R0
    TOS POP        \ Vaddr to TOS 
    TOS 4000 #OR  VDPA! 
    VDPWD R3 #! 
    R0 FOR
        R5 *R3 C!
    NEXT
    DROP 
;

: VREAD ( Vaddr addr n --)
    TOS R0 !
    R5 POP
    TOS POP  VDPA!  
    VDPRD R3 #! 
    R0 FOR
        *R3 *R5+ C!
    NEXT
    DROP 
;

: VWRITE ( addr Vaddr len -- )
    TOS R0 !
    TOS POP       \ Vaddr in TOS 
    TOS 4000 #OR VDPA! \ set write address 
    TOS POP       \  pop RAM addr into TOS 
    VDPWD R3 #! 
    R0 FOR 
        *TOS+ *R3 C!
    NEXT    
    DROP
;

\ test code 

HEX 
\ high level words 
: READSCR       0 # 2000 #  3C0 # VREAD ; 
: WRITESCR   2000 #    0 #  3C0 # VWRITE ;
: CLS           0 #  3C0 #   BL #  VFILL ;
: DELAY      TOS FOR NEXT DROP ;

CODE TEST-R/W
    READSCR 
    CLS  
    FFFF # DELAY 
    WRITESCR 
;CODE

Edited March 9, 2023 by TheBF
Updated code

+TheBF · March 9, 2023

I want the ability to create standalone E/A5 programs with this ASMForth tool.

In the past I modified the entire Assembler program, which is dumb. I didn't trust myself to get it correct any other way.

So I thought it was time to make something smarter.

This code is a preamble to control the actions of these dictionary management words:

HERE
C,
ALLOT
CREATE

I discovered I had to fix CODE as well.

With this little preamble and the magic of DEFER words we can compile the stock, unaltered Assembler in the dictionary with the >RAM directive.

Then with >HEAP we alter the action of critical parts of the Assembler so code goes into the dictionary or the HEAP memory.

Much clearer and more versatile.

Edit: added the use of SYNONYM

\ MEMVECTOR.FTH   Brian Fox                         Mar 9 2023
\ redirect Assembler code to different memory locations

INCLUDE DSK1.TOOLS  \ debugging only
INCLUDE DSK1.DEFER 
INCLUDE DSK1.SYNONYM 

\ HEAP memory managers
: HEAP    ( -- addr)  H @ ;
: HALLOT  ( n -- ) H +! ;
: HC,     ( c -- ) HEAP C! 1 HALLOT ;  \ store a byte, so C! rather than !
: H,      ( n -- ) HEAP ! 2 HALLOT ;
: HCREATE HEAP CONSTANT ;
: HCODE   HEADER  HEAP ,   !CSP ; 

\ alias the RAM managers to avoid name conflicts 
SYNONYM <HERE>   HERE 
SYNONYM <ALLOT>  ALLOT 
SYNONYM <C,>     C, 
SYNONYM <,>      , 
SYNONYM <CREATE> CREATE 
SYNONYM <CODE>   CODE 

DEFER HERE 
DEFER ALLOT 
DEFER C,
DEFER , 
DEFER CREATE 
DEFER CODE 

: >RAM  
    ['] <HERE>  IS HERE   
    ['] <ALLOT> IS ALLOT 
    ['] <C,>    IS C, 
    ['] <,>     IS , 
    ['] <CREATE> IS CREATE 
    ['] <CODE>   IS CODE 
; IMMEDIATE 

: >HEAP 
    ['] HEAP   IS HERE 
    ['] HALLOT IS ALLOT      
    ['] HC,    IS C, 
    ['] H,     IS ,
    ['] HCREATE IS CREATE
    ['] HCODE   IS CODE 
; IMMEDIATE 

\ Test 
>RAM 
INCLUDE DSK1.ASM9900

>HEAP 
CODE HEAPTEST   TOS INCT,  NEXT, ENDCODE 

>RAM
CODE RAMTEST   TOS INCT,  NEXT, ENDCODE
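The vectoring idea is easy to show outside Forth as well. Below is a rough Python analogue, not a translation of MEMVECTOR.FTH: two bump allocators stand in for the dictionary and the heap, and a `Target` object rebinds its "deferred" words the way >RAM and >HEAP do. All class and function names are invented for the sketch; >05C4 is the encoding of INCT R4 as far as I know.

```python
class Allocator:
    """A bump allocator over its own bytearray; stands in for the
    dictionary or the heap."""
    def __init__(self, size):
        self.mem = bytearray(size)
        self.ptr = 0

    def here(self):
        return self.ptr

    def allot(self, n):
        self.ptr += n

    def c_comma(self, byte):
        self.mem[self.ptr] = byte & 0xFF
        self.allot(1)

    def comma(self, cell):
        self.mem[self.ptr:self.ptr + 2] = (cell & 0xFFFF).to_bytes(2, "big")
        self.allot(2)

class Target:
    """Holds the 'deferred' words; to_ram/to_heap rebind them."""
    def __init__(self):
        self.ram = Allocator(64)
        self.heap = Allocator(64)
        self.to_ram()

    def _vector(self, alloc):
        self.here, self.allot = alloc.here, alloc.allot
        self.c_comma, self.comma = alloc.c_comma, alloc.comma

    def to_ram(self):
        self._vector(self.ram)

    def to_heap(self):
        self._vector(self.heap)

def assemble_inct_tos(t):
    """An 'assembler' word that only knows the deferred names."""
    t.comma(0x05C4)      # INCT R4 (TOS)
```

The point of the design is that `assemble_inct_tos` never changes: the same unmodified "assembler" writes wherever the deferred words currently point.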

Edited March 9, 2023 by TheBF
Updated code

+TheBF · October 3, 2023

2nd Pass Optimizer for MachForth (I jumped back to my first native code compiler)

One of the performance-enhancing tricks of making a Forth system is to cache the top of the DATA stack (TOS) in a machine register. This creates an "accumulator" register.

Some operations are slower but on balance this can improve system performance by about 10% over using a memory only data stack. Math operations seem to benefit.

When compiling Forth code to inline machine code for such a system, repeating patterns of DROP followed by DUP can appear in the code.

This sequence serves no purpose, but it is there because various primitive Forth instructions need to refill the top of stack register from the memory stack (the DROP operation), or need to make room in the TOS register for a new value (the DUP operation).

Let's use the example below.

Store a number in a variable followed by a literal number going onto the stack to be used by the next code in the sequence.

BEEF X !  DEAD Y !

The Store (!) operator generates this code:

: !  ( n TOSaddr --)  
  *SP+ *TOS MOV, 
  *SP+  TOS MOV,   ( DROP operation)
;

Next the literal DEAD must go onto the stack.

The code to do that must perform a DUP , meaning it saves the value in the TOS register onto the memory stack.

: DOLITERAL    
     SP DECT,     ( these two instructions are DUP )
     TOS *SP MOV, 
     TOS SWAP LI, ( this loads the tos register with the literal in the source code) 
;

The end result is that we refill the TOS register from memory and then immediately put it back.

I tried a number of ways to detect this situation while compiling code but there seemed to be a SNAFU in looping code where the DROP and the DUP were separated at opposite ends of the loop.

The logic to detect that was more than my brain could handle.

Suffice to say I think doing a 2nd pass is a more reliable way to get the job done and simpler to understand.

The problem code is a simple set of 3 memory words. The solution is to find them, remove them and move on.

And the benefits really add up because you are removing 3 instructions per occurrence that you find.

Here is what I came up with.

\ optimizer.fth  for MachForth 

\ Stack machine primitives can create overhead by needlessly
\ DROPing then DUPing the top of stack.
\ This program scans for the troublesome code sequence 
\ and removes the 6 byte sequence wherever it is found. 

HOST 

HEX 2000 CONSTANT CODEIMAGE 

\ search for the cell value u in memory block (addr,len)
\ return the address of the match and the remaining length.
\ len' is 0 if u was not found.
: SCANW (  addr len u -- addr' len'|0)
        >R     \ remember u
        BEGIN 
          DUP
        WHILE ( len<>0)
          OVER @ R@ <>
        WHILE ( cell <> u)
           2 /STRING  \ advance to next cell address
        REPEAT
        THEN
        R> DROP     \ 32 bytes
;

: D=   ( d d -- ?) ROT = -ROT = AND ;

: 2CONSTANT  CREATE SWAP  ,  ,  DOES> 2@ ;

HEX
     C136 CONSTANT 'DROP'
0646 C584 2CONSTANT 'DUP'

: FINDDROP  ( addr len -- addr' len' ?)
    'DROP' SCANW  DUP 0> ;

: DROP/DUP? ( addr len -- addr' len' ?)
    FINDDROP >R
    OVER  CELL+ 2@ 'DUP' D=
    R> AND ;

\ EXTRACT moves the binary program in memory
\ to remove the DROP/DUP sequence
: EXTRACT ( addr size -- )
  3 CELLS - 0 MAX >R       \ bytes that follow the 3-cell sequence
  DUP  3 CELLS +           \ src: first cell after the sequence
  SWAP                     \ dst: addr, where the DROP was found
  R>                       \ restore the byte count
  ( src dst size ) MOVE    \ slide the remaining code down
  3 CELLS NEGATE TDP +!    \ adjust target program end pointer
;

: PROGRAM  ( -- addr len ) CODEIMAGE TDP @ OVER - ;

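: OPTIMIZER
    PROGRAM                            \ -- addr len
    BEGIN
      DUP 0>
    WHILE
      DROP/DUP?  OVER 3 CELLS < 0= AND \ match must lie inside the program
      IF    2DUP EXTRACT  3 CELLS -    \ removed: re-test the same address
      ELSE  DUP IF 2 /STRING THEN      \ lone DROP: step past it
      THEN
    REPEAT
    2DROP                              \ leave the stack clean
;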

Now, after compiling a Machine Forth program, you just invoke the optimizer before saving the program image.

Seems to work as planned. I will need to do more testing.
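
For anyone who wants to try the idea away from the target system, here is a minimal sketch of the same pass in plain standard Forth. It is only a model and not the MachForth code: opcodes sit one per host cell, and IMAGE, #CODE, CODE, OP@ DROP/DUP-AT? REMOVE3 and SQUEEZE are names made up for this illustration. The sketch cannot tell an opcode from a data word that happens to hold the same value, and it does nothing about addresses or jump offsets that point past a removed sequence.

```forth
\ Host-side model of the DROP/DUP removal (illustration only).
\ All names below are hypothetical; one opcode is stored per host cell.
HEX
C136 CONSTANT OP-DROP     \ mov  *R6+,R4
0646 CONSTANT OP-DECT     \ dect R6
C584 CONSTANT OP-PUSH     \ mov  R4,*R6
DECIMAL

CREATE IMAGE  16 CELLS ALLOT      \ small stand-in for the code image
VARIABLE #CODE   0 #CODE !        \ number of cells in use

: CODE,  ( op -- )   IMAGE #CODE @ CELLS + !   1 #CODE +! ;
: OP@    ( i -- op ) CELLS IMAGE + @ ;

\ true if cells i, i+1, i+2 hold DROP followed by the two DUP opcodes
: DROP/DUP-AT? ( i -- ? )
    DUP 2 + #CODE @ < 0= IF DROP FALSE EXIT THEN
    DUP      OP@ OP-DROP =
    OVER 1+  OP@ OP-DECT = AND
    SWAP 2 + OP@ OP-PUSH = AND ;

\ slide everything after the 3-cell sequence down over it
: REMOVE3 ( i -- )
    CELLS IMAGE +                     \ dst
    DUP 3 CELLS + SWAP                \ src dst
    IMAGE #CODE @ CELLS + 2 PICK -    \ src dst bytes-after-sequence
    MOVE
    -3 #CODE +! ;

\ one pass over the buffer; re-test the same index after a removal
: SQUEEZE ( -- )
    0 BEGIN DUP #CODE @ < WHILE
        DUP DROP/DUP-AT? IF DUP REMOVE3 ELSE 1+ THEN
    REPEAT DROP ;

\ usage: the tail of ! followed by a literal
HEX
C136 CODE,     \ mov  *R6+,R4   DROP at the end of !
0646 CODE,     \ dect R6        DUP at the start of the literal
C584 CODE,     \ mov  R4,*R6
0204 CODE,     \ li   R4,>DEAD  the literal itself
DEAD CODE,
DECIMAL
SQUEEZE
#CODE @ .      \ cells left in the buffer
```

After SQUEEZE only the two cells of li R4,>DEAD should be left in the buffer.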

+TheBF · January 9

I don't know where else to put this so it's going here.

I am watching the guys configure GCC for our favorite machine and it has me thinking again about generating efficient 9900 code from Forth (like) syntax.

I heard a talk by Chuck Moore on how he is experimenting using more registers in his current Forth systems.

I started down that path with ASMForth because registers are how you get performance from a register machine.

In the process I noticed that when you use the raw instructions as Forth primitives you have so much freedom.

Assuming the top of stack is cached in a register we can do this:

! becomes MOV

+! becomes Add

etc.

Something cool then happens in that you can convert some stack operations into simpler machine code.

So where DUP in this architecture is:

    SP INCT,
TOS *SP MOV,

DUP can be replaced with

TOS TOS

SWAP means we reverse the order of the arguments that we feed to the instruction.

(I think this will work. It might need a flag somewhere to signal we are swapped)

TOS *SP

Where OVER is something like: (notice we DUP first)

             SP INCT,
        TOS *SP MOV,
     2 (SP) TOS MOV,

OVER can be replaced by

*SP TOS

since we just want to use the 2nd stack item but not destroy it.

The phrase "OVER =" becomes:

     *SP TOS CMP,

This is as far as I got in this line of thought.

I think ROT might not be so easy to convert but I will explore this as time permits.

And all of this raises the question: can we just allocate 3 registers for the top of the stack and use logic to remember what item is in what register?

That makes my head hurt at this stage.

+FarmerPotato · January 10

This merits another OMG. Or three!

I'm trying to understand. You are converting OVER to a setup for the next word which must be an instruction. Got to get my brain around how it's all just assembler directives, not threaded tokens.

Then is the word OVER not available for immediate use?

What then is the convenient way to get the runtime effect of OVER:

TOS *SP+ MOV,

-4 (SP) MOV,

( assuming you have stack growing upwards in MachForth?)

Machine Forth is still very exciting, and hard to understand.

+TheBF · January 10

Correct. Machine Forth is a compiler like C or Pascal. However, since it is compiling from the Forth console you can run the code afterwards like a Forth word.

All you need is a Forth way to jump into the code (BL *TOS) and a word to return to Forth.

CODE RUN   ( entry_address -- )  *TOS  BL,  NEXT,  ENDCODE

After seeing Bill Sullivan's result coding the sieve in Assembler I realized you are never going to get full performance from the 9900 with stack operations.

( My ASMForth sieve based on his code with a few tweaks is actually a hair faster. )

ASMForth is me taking the Machine Forth idea that was for Chuck's CPUs and mutating it into something with a closer fit to 9900 but also using Forth-like syntax.

ASMForthII uses registers like Assembler but you also have PUSH, POP, RPUSH and RPOP macros so you have the stack.

Colon definitions automatically push R11 on entry and POP R11 before RT.

The docs are here: https://docs.google.com/document/d/1h-qVQeD6_b58DywrzGZphmSSxUNMKBbY7QCByMFKFOM/edit?usp=sharing

Not the best read.

Here is an ugly HELLO World. ASMForth is a work in progress. I want to get back closer to normal Forth syntax in future if possible.

\ tiny hello world in ASMForth II
\ Translated from hello.c by Tursi for comparison

ASMFORTH
HEX
8C02 CONSTANT VDPWA   \ Write Address port 
8C00 CONSTANT VDPWD   \ Write Data port

\ define the string
CREATE TXT  S" Hello World!" S,

: VDPADDR 
    TOS ><        \ swap bytes 
    TOS VDPWA C!  \ VDP address LSB character store 
    TOS ><        \ swap bytes 
    TOS VDPWA C!  \ VDP address MSB + "write" bit character store 
    DROP 
;

MACRO: EMIT+  ( addr -- addr++)  VDPWD C!  ;MACRO

CODE MAIN 
    0 LIMI,        \ disable interrupts 
\ set the VDP address to >0000 with write bit set
    4000 # VDPADDR
    TXT # 
    *TOS R0 C!     \ byte count -> R0
    R0 8 RSHIFT 
    R0 1-          \ for loop needs 1 less
    TOS 1+         \ skip past byte count 
    R0 FOR         \ get argument from R0 
       TOS @+ EMIT+  \ @+ is indirect auto-inc.
    NEXT
    DROP

    NEXT,
ENDCODE 

\ usage: PAGE MAIN CR

Here is the code generated

   DF00  0300  limi >0000                  (24)
    DF04  0646  dect R6                     (14)
   DF06  C584  mov  R4,*R6                 (30)
   DF08  0204  li   R4,>4000               (20)
   DF0C  06A0  bl   @>dece                 (32)
   DECE  0647  dect R7                     (14)
   DED0  C5CB  mov  R11,*R7                RPush R11
   DED2  06C4  swpb R4                     (14)
   DED4  D804  movb R4,@>8c02              (38)
   DED8  06C4  swpb R4                     (14)
   DEDA  D804  movb R4,@>8c02              (38)
   DEDE  C136  mov  *R6+,R4                (30)
>  DEE0  C2F7  mov  *R7+,R11
   DEE2  045B  b    *R11

   DF10  0646  dect R6                     (14)
   DF12  C584  mov  R4,*R6                 (30)
   DF14  0204  li   R4,>deb2               (20)
   DF18  D014  movb *R4,R0                 (26)
   DF1A  0880  sra  R0,8                   (32)
   DF1C  0600  dec  R0                     (14)
   DF1E  0584  inc  R4                     (14)

   DF20  0647  dect R7
>  DF22  C5CB  mov  R11,*R7        Rpush loop counter R11
   DF24  C2C0  mov  R0,R11         Set R11 loop counter for FOR NEXT
   DF26  D834  movb *R4+,@>8c00
   DF2A  060B  dec  R11            R11 is the return stack CACHE
   DF2C  18FC  joc  >df26
   DF2E  C2F7  mov  *R7+,R11
   DF30  C136  mov  *R6+,R4        DROP refills TOS
   DF32  045A  b    *R10           return to Forth
