Camel99 Forth Information goes here


Strange Forth thought for the day.


This paraphrased statement came from Charles Moore, the inventor of Forth.


I now had an interpreter which could interpret assembler, which could assemble a compiler, which could compile an interpreter.



Dutch Flag Problem using Dijkstra's 3 Colour Algorithm


Well, if we ever needed proof that the correct algorithm will outperform a poor one, here it is.


I noticed that Rosetta Code did not have a Forth entry for the Dutch Flag problem. In the posting I found a paper on Dijkstra's method with pseudo-code.

So I re-did my COMBSORT version with the algorithm. It's SEVEN times faster!


See the new version run here:



*EDIT* Added a GIF here for simpler viewing


Here is the code:



\ Dutch flag DEMO for CAMEL99 Forth using Dijkstra's Algorithm

\ INCLUDE DSK1.TOOLS.F   ( for debugging)

\ TMS9918 Video chip Specific code

\ define colors and characters
24 32 *  CONSTANT SIZE     \ flag will fill GRAPHICS screen
SIZE 3 / CONSTANT #256     \ 256 chars per segment of flag
1        CONSTANT REDSQR   \ red character
9        CONSTANT WHTSQR   \ white character
19       CONSTANT BLUSQR   \ blue character
28       CONSTANT PTR1

\ color constants


\ charset  FG    BG
  0        RED TRANS COLOR
  1        WHT TRANS COLOR
  2        BLU TRANS COLOR

\ screen fillers
: RNDI    ( -- n ) SIZE RND ;   \ random VDP screen address 0..SIZE-1 (assumes RND returns 0..n-1)

: NOTRED    (  -- n ) \ return rnd index that is not RED
           BEGIN
              RNDI DUP VC@ REDSQR =
           WHILE DROP
           REPEAT ;

: NOTREDWHT    ( -- n ) \ return rnd index that is not RED or WHITE
           BEGIN  RNDI DUP
              VC@  DUP REDSQR =
              SWAP WHTSQR = OR
           WHILE DROP
           REPEAT ;

: RNDRED  (  -- ) \ Random RED on VDP screen
          #256 0 DO   REDSQR NOTRED VC!   LOOP ;

: RNDWHT (  -- ) \ place white where there is no red or white
          #256 0 DO   WHTSQR NOTREDWHT VC!   LOOP ;

: BLUSCREEN ( -- )  
           0 768 BLUSQR VFILL ;

\ load the screen with random red,white&blue squares
: RNDSCREEN ( -- )  BLUSCREEN RNDRED RNDWHT ;

: CHECKERED  ( -- ) \ red,wht,blue checker board
         SIZE 0 DO
            BLUSQR I VC!
            WHTSQR I 1+ VC!
            REDSQR I 2+ VC!
         3 +LOOP ;

: RUSSIAN  \ Russian flag
            0  0 WHTSQR 256 HCHAR
            0  8 BLUSQR 256 HCHAR
            0 16 REDSQR 256 HCHAR ;

: FRENCH  \ kind of a French flag
           0  0 BLUSQR 256 VCHAR
          10 16 WHTSQR 256 VCHAR
          21  8 REDSQR 256 VCHAR ;

\ Algorithm Dijkstra(A)  \ A is an array of three colors
\ begin
\   r <- 1;
\   b <- n; 
\   w <- n;
\ while (w>=r)
\       check the color of A[w]
\       case 1: red
\               swap(A[r],A [w]);
\                r<-r+1
\       case 2: white
\               w<-w-1
\       case 3: blue
\               swap(A[w],A[b]);
\               w<-w-1;
\               b<-b-1
\ end

\ support routines
: WAIT   11 11 AT-XY ." Finished!" 1500 MS ;

: XCHG  ( adr1 adr2 -- )
      OVER VC@ OVER VC@        \ read the chars in VDP RAM
      SWAP ROT VC! SWAP VC! ;  \ exchange the characters

\ address pointer variables
VARIABLE R   VARIABLE W   VARIABLE B

: DIJKSTRA ( -- )
           0 R !
           SIZE 1- DUP  B !  W !
           BEGIN
               W @  R @  1- >          \ while w >= r
           WHILE
               W @ VC@
               CASE
                 REDSQR OF  R @ W @  XCHG
                            1 R +!           ENDOF

                 WHTSQR OF -1 W +!           ENDOF

                 BLUSQR OF  W @ B @  XCHG
                           -1 W +!
                           -1 B +!           ENDOF
               ENDCASE
           REPEAT ;

: RUN ( -- ) \ test with different input patterns
         CR ." Dijkstra Dutchflag Demo"  CR
         CR ." Sorted in-place in Video RAM" CR
         CR ." Using the 3 colour algorithm" CR
         CR ." Press any key to begin" KEY DROP
         RNDSCREEN DIJKSTRA WAIT
         CHECKERED DIJKSTRA WAIT
         RUSSIAN   DIJKSTRA WAIT
         FRENCH    DIJKSTRA WAIT
         0 23 AT-XY
         CR ." Completed" ;
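For anyone without a TI-99 handy, here is a minimal sketch of the same three-pointer loop in portable ANS Forth. It sorts a cell array in ordinary RAM instead of VDP RAM, so XCHG becomes SWAP-CELLS and VC@ becomes @. The names (COLORS, ]C, SWAP-CELLS, REDP, WHTP, BLUP, DUTCHFLAG) and the colour codes 0/1/2 are hypothetical, chosen just for this illustration; it is not the CAMEL99 code.

```forth
\ Portable sketch of Dijkstra's 3-colour partition in ANS Forth.
\ Sorts a cell array in ordinary RAM, not VDP RAM.
\ All names and the colour codes below are hypothetical.
0 CONSTANT RED#     1 CONSTANT WHITE#     2 CONSTANT BLUE#

12 CONSTANT #ITEMS
CREATE COLORS  2 , 0 , 1 , 2 , 1 , 0 , 0 , 2 , 1 , 2 , 0 , 1 ,

: ]C ( i -- addr )  CELLS COLORS + ;      \ address of element i

: SWAP-CELLS ( i j -- )                   \ exchange elements i and j
    ]C SWAP ]C SWAP                       \ ( addr-i addr-j )
    OVER @ OVER @                         \ ( addr-i addr-j val-i val-j )
    SWAP ROT ! SWAP ! ;                   \ same stack dance as XCHG

VARIABLE REDP   VARIABLE WHTP   VARIABLE BLUP   \ r, w and b in the pseudo-code

: DUTCHFLAG ( -- )
    0 REDP !
    #ITEMS 1- DUP BLUP ! WHTP !
    BEGIN  WHTP @ REDP @ < 0=  WHILE      \ while w >= r
        WHTP @ ]C @
        CASE
          RED#   OF  REDP @ WHTP @ SWAP-CELLS  1 REDP +!  ENDOF
          WHITE# OF  -1 WHTP +!  ENDOF
          BLUE#  OF  WHTP @ BLUP @ SWAP-CELLS  -1 WHTP +!  -1 BLUP +!  ENDOF
        ENDCASE                           \ any other value would loop forever
    REPEAT ;
```

After `DUTCHFLAG` the sample array holds four 0s, then four 1s, then four 2s. As in the VDP version, each element is examined once, which is why this beats a general-purpose sort like COMBSORT.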




This old document is very interesting for anyone who would like to know how and why Forth is the unusual language that it is.

It is also a little amusing for younger people to read when you see references to 80-column punched cards and tape I/O.

Although 99ers are familiar with tape in a special way.


Personally I like these words from the book:


"So to offer guidance when the trade-offs become obscure, I am going to define the Basic Principle:


Keep it Simple
As the number of capabilities you add to a program increases, the complexity of the program increases exponentially. The problem of maintaining compatibility among these capabilities, to say nothing of some sort of internal consistency in the program, can easily get out of hand. You can avoid this if you apply the Basic Principle.
You may be acquainted with an operating system that ignored the Basic Principle." ;-)
Recent updates to CAMEL99 Forth

Aug 4, 2018 V2.0.20


  • Kernel is 16 bytes smaller
  • Removed word INCLD from kernel and put code in body of INCLUDED to save space
  • Changed INIT code to use structured assembler loop
  • Comment improvements here and there.
  • Moved DATA stack reset in COLD word to just before QUIT. This fixed the first-error bug (the first bad word entered at the console gave an "empty stack" error).
  • Removed SPRITE support word DXY from KERNEL, moved to DIRSPRIT (direct sprite control) as a machine code word.
  • Added SEE.F to DSK1, which is a Forth decompiler.
  • Changed PONG.F game to use INCLUDE from library files.
Crack the "Bullwhip" on the Camel


High-level languages are great for getting work done, but sometimes they don't let us use the cool features of a specific CPU.

The 9900 has a cool memory-to-memory architecture, and the BLWP instruction lets us put our registers anywhere that's convenient.


I had to add a BLWP "WORD" to CAMEL99 Forth to link to device service routines and I wondered if I could use it somewhere else.

(I probably should get a life...)


It was pretty efficient to add BLWP to the Camel99 KERNEL because it keeps the top of the Forth stack cached in Register 4.

So here is all I had to do using the XFC99 Cross-compiler/Assembler.


"TOS" below is just an alias for R4, and TOS POP, is a macro for: *SP+ TOS MOV,

CODE: BLWP  ( vector -- )  \ "BULLWHIP" takes a vector address as input arg.
           *TOS BLWP,      \ loads the workspace and program counter into the CPU and branches
            TOS POP,       \ refill Forth TOS when we get back from our journey...
            NEXT,          \ return to Forth's inner interpreter

It needs a TMS9900 vector address on the Forth stack as the input argument and when you say BLWP … the machine goes off to another world... well umm workspace.
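To make the BLWP mechanics concrete without a TI-99/4A, here is a toy model in ANS Forth. It shows what BLWP saves in the new workspace's R13, R14 and R15, and how RTWP uses those three registers to get back. Everything here (WP, PC, ST, REG, VBLWP, VRTWP, VCALL and the demo workspaces) is hypothetical and is not CAMEL99 code. Registers are modelled as cells; on the real 9900 they are 16-bit words, so register n sits 2n bytes above WP.

```forth
\ Toy model of TMS9900 BLWP/RTWP in ANS Forth (not real machine code).
\ Registers are cells in memory; all names here are hypothetical.
VARIABLE WP   VARIABLE PC   VARIABLE ST   \ the CPU's only on-chip registers

: REG ( n -- addr )  CELLS WP @ + ;       \ Rn lives in memory at WP + n cells

: VBLWP ( vector -- )                     \ emulate  BLWP @vector
    WP @ PC @ ST @                        \ ( vector oldWP oldPC oldST )
    3 PICK @ WP !                         \ 1st vector cell -> new WP
    15 REG !  14 REG !  13 REG !          \ old ST, PC, WP -> new R15, R14, R13
    CELL+ @ PC ! ;                        \ 2nd vector cell -> new PC

: VRTWP ( -- )                            \ emulate  RTWP
    13 REG @  14 REG @  15 REG @          \ read R13-R15 before WP changes
    ST !  PC !  WP ! ;

: VCALL ( vector -- )                     \ switch context, run the code, come back
    VBLWP  PC @ EXECUTE  VRTWP ;

\ Two workspaces of 16 registers each
CREATE MAIN-WS 16 CELLS ALLOT   MAIN-WS 16 CELLS ERASE
CREATE SUB-WS  16 CELLS ALLOT   SUB-WS  16 CELLS ERASE

: SUB-CODE ( -- )                         \ runs in SUB-WS: R0 := R1 + R2
    1 REG @  2 REG @ +  0 REG ! ;

CREATE SUB-VECT  SUB-WS ,  ' SUB-CODE ,   \ a vector: workspace, then code address
```

Usage: `MAIN-WS WP !  5 SUB-WS CELL+ !  7 SUB-WS 2 CELLS + !  SUB-VECT VCALL` leaves 12 in SUB-WS's R0, WP pointing at MAIN-WS again, and MAIN-WS still recorded in SUB-WS's R13.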


I wondered where this would be useful so I imagined a program needing to sum 12 numbers really fast. Perhaps they were 12 variables in a game that determine total Health.


So here is a way you might do it in Forth (it could be made a little faster, but for demonstration this is clearer):

CREATE ARGS   \ this is our workspace
        0 ,         \ will contain the answer
        2 , 2 , 2 , \ 12 numbers we want to sum
        2 , 2 , 2 ,
        2 , 2 , 2 ,
        2 , 2 , 2 ,
        0 ,         \ R13  reserved so we can get back to Forth
        0 ,         \ R14
        0 ,         \ R15
\ expose ARGS as an array for 12 ints.
\ Note: 0 ]ARG is reserved for the SUM, 13,14,15 cannot be used
: ]ARG ( n -- addr ) CELLS ARGS + ;

: FORTHSUM  ( -- )
       0              \ this is an accumulator on stack
       13  1 DO
          I ]ARG @ +  \ add values into accumulator
       LOOP
       0 ]ARG ! ;     \ put accumulator back into memory

The FORTHSUM routine took 249 ticks of the TMS9901 timer. That is 5.3 milliseconds. Not shabby, but what happens if we crack the bullwhip?


Using the same ARGS data I did this:

CODE SUM ( -- )   \ the fast summing program
        R1  R0 MOV,   \ start with R1 (MOV, not ADD,) so a second call doesn't add to the old answer
        R2  R0 ADD,
        R3  R0 ADD,
        R4  R0 ADD,
        R5  R0 ADD,
        R6  R0 ADD,
        R7  R0 ADD,
        R8  R0 ADD,
        R9  R0 ADD,
        R10 R0 ADD,
        R11 R0 ADD,
        R12 R0 ADD,
        RTWP,      \ get back home to Forth

\ A BLWP vector is a workspace address followed by a code address, just like in Assembler.
\ ARGS returns its own address onto the Forth stack.
\ The comma compiles the number into memory and bumps the memory pointer.
\ Forth ' looks up the "execution token" of a Forth word.
\ Forth >BODY converts the execution token into the address of the real code.
CREATE SUMVECT   ARGS ,   ' SUM >BODY ,

: FASTSUM ( -- )  SUMVECT BLWP ;   \ crack the bullwhip



The FASTSUM program ran in 17 TMS9901 timer ticks, or 0.362 milliseconds!

That's a speed up of over 14 times.
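As a quick consistency check on those numbers: the FORTHSUM figures imply the length of one timer tick, and that tick length reproduces the FASTSUM time and the speed-up.

```latex
t_{\text{tick}} \approx \frac{5.3\ \text{ms}}{249} \approx 21.3\ \mu\text{s}, \qquad
17 \times 21.3\ \mu\text{s} \approx 0.362\ \text{ms}, \qquad
\frac{249}{17} \approx 14.6
```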


Cracking the Bullwhip makes that old Camel run.

For sure. It makes interrupt handlers a thing of beauty compared to other machines.


*EDIT* and also a pretty cool multi-tasking context switcher.

RTWP does it in one instruction if you preset the workspaces correctly.

Sooooo cool.



Okay, so you're pre-loading R13, R14 and R15 as appropriate and then executing an RTWP to jump to the new task, yes? That's a little counter-intuitive. Isn't that what BLWP is for? :D


How do you get back from the second task to the first task?


At first I thought I could use BLWP, but I realized that I would need to keep a separate table of vectors because BLWP overwrites the contents of R13, R14 and R15. That ticked me off, because those registers could hold the task state perpetually.


The only instruction that can jump to a new workspace and NOT destroy the context registers is... RTWP.

So by changing context with RTWP, you don't need any extra memory to store the task vectors.


The graphic is an illustration from the doc.

The latest code version is here:



Here is how the code changes context.

CODE: YIELD  ( -- )
              BEGIN,
                 RTWP,               \ one instruction switches context        14
\ ---------------------------------------------------------------------------------
 l: _TSTAT       R1  STWP,           \ NEXT TASK: store NEW workspace in R1     8
                 32 (R1) R0 MOV,     \ Read local TFLAG to see if I am awake   28
              NE UNTIL,              \ loop thru tasks until TFLAG is <> 0     10
              NEXT,                  \ if task is awake, run next  (60 cycles * 0.333 uS = 20 uS)

Two other counter-intuitive things here:

  1. The label _TSTAT is plugged into R14 of every task when it is FORKed. This routine checks the TFLAG variable to see if the task is awake.
  2. TFLAG is a USER variable. Where TurboForth puts code in PAD RAM, I have put the USER variable table. This lets me use the CPU WP register as the Forth User Pointer (UP) register, another space saving.
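Here is a rough model of that round-robin search in ANS Forth, to show why no separate task table is needed: each workspace's R13 already points at the next task's workspace, which is exactly what RTWP loads into WP. This is a hypothetical sketch (the names /TASK, TASKS, TASK#, R13, TFLAG, LINK-TASKS and this YIELD are illustrative, not the CAMEL99 words); it only follows R13 and tests TFLAG, and does not model restoring PC or the Forth IP. The real code keeps TFLAG at byte offset 32, right after the 16 registers; here it is cell 16.

```forth
\ Toy model of the YIELD task search in ANS Forth (not the real code).
\ Each "workspace" is 16 register cells followed by a TFLAG cell.
\ R13 of every workspace holds the next task's workspace address.
17 CELLS CONSTANT /TASK
CREATE TASKS  3 /TASK * ALLOT               \ three tasks for the demo
: TASK# ( n -- ws )    /TASK * TASKS + ;
: R13   ( ws -- addr ) 13 CELLS + ;
: TFLAG ( ws -- addr ) 16 CELLS + ;         \ first "user variable" after the registers

VARIABLE WP                                  \ current workspace (the CPU's WP)

: LINK-TASKS ( -- )                          \ ring: 0 -> 1 -> 2 -> 0
    1 TASK# 0 TASK# R13 !
    2 TASK# 1 TASK# R13 !
    0 TASK# 2 TASK# R13 ! ;

: YIELD ( -- )                               \ loops forever if every task sleeps
    BEGIN
        WP @ R13 @ WP !                     \ like RTWP: WP <- R13
        WP @ TFLAG @                        \ like _TSTAT: is this task awake?
    UNTIL ;                                 \ sleeping tasks are skipped
```

With task 1 asleep, a YIELD from task 0 lands on task 2, and the next YIELD wraps back to task 0.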

Here is the code for USER variables when you use WP as the UP register. It's pretty efficient.

(this was stuff I dreamt about 30... years ago so it's been festering for a long time) ;-)

CODE: DOUSER ( -- addr)    \ Executor for a "USER VARIABLE" (local to each task)
              TOS PUSH,    \ make room: push the old TOS
              TOS STWP,    \ store workspace register WP in TOS
             *W TOS ADD,   \ add the offset stored in the USER variable's parameter field
              NEXT,


Oh, one more cool thing about using WP as the user pointer.


All the CPU registers become user variables!


So I can define the Forth Virtual machine registers like this:

  12 USER 'SP    \ the task's Forth SP register ( R6)
  14 USER 'RP    \ the task's Forth RP register ( R7)
  18 USER 'IP    \ the task's Forth IP register ( R9)
  20 USER 'R10   \ address of R10 (holds the NEXT routine)
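The idea generalizes to any Forth: a USER variable is just an offset, and at run time it is added to a per-task base pointer, which is the job DOUSER does with WP. Below is a minimal portable sketch; UBASE (standing in for WP), UVAR (standing in for USER), TASK-A and TASK-B are hypothetical names. Offsets here are in cells; CAMEL99 uses byte offsets, which is why R6 is 12 USER 'SP there.

```forth
\ Sketch of USER variables addressed from a per-task base pointer,
\ the role the 9900's WP plays in CAMEL99. Names are hypothetical.
VARIABLE UBASE                          \ stands in for the WP register

: UVAR ( offset "name" -- )             \ like USER: compile an offset
    CREATE ,
    DOES> ( -- addr )  @ UBASE @ + ;    \ run time: base + offset, like DOUSER

\ Two task areas of 18 cells: 16 "registers", then user variables
CREATE TASK-A 18 CELLS ALLOT
CREATE TASK-B 18 CELLS ALLOT

 6 CELLS UVAR 'SP      \ register R6 is just user offset 6 (cells here)
16 CELLS UVAR TFLAG    \ first user variable after the registers
```

Switching UBASE between TASK-A and TASK-B gives each task its own TFLAG, and `'SP` returns the address of that task's R6 slot.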

I kinda love the 9900

Me too. I don't see why so many wanted BL @address over BLWP @address. With BL you have a limited number of registers, which means in big programs you waste more time swapping or switching out variables. Yes, BLWP uses 32 more bytes, but it is like CALL SUB in XB, with extra variables that do not need to interact with the main ones.



That's a good point. I am contemplating writing a driver for the serial ports and I thought of making a receive queue so as not to miss characters without using interrupts. (work in progress)

My current view is to allocate 32 extra bytes at the end of the receive buffer and use them as the workspace that houses the queue's input/output pointers.

The code will be super simple and all the circular queue data and registers will be in one block.
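Here is a minimal sketch of that layout in portable Forth: the data bytes and the head/tail indices sit in one block, with the indices right after the buffer. It only shows the index arithmetic; the real driver would hold the pointers in workspace registers and be written in 9900 code. RXQ, QSIZE, QHEAD, QTAIL, >Q and Q> are hypothetical names. The size is assumed to be a power of two so AND can wrap the indices, and one slot is kept empty so a full queue can be told apart from an empty one.

```forth
\ Sketch of a receive queue with head/tail indices stored after the data.
\ QSIZE must be a power of two. All names are hypothetical.
16 CONSTANT QSIZE
QSIZE 1- CONSTANT QMASK
CREATE RXQ  QSIZE CHARS ALLOT  ALIGN  0 , 0 ,   \ data, then head, then tail

: QHEAD ( -- addr )  RXQ QSIZE CHARS + ALIGNED ; \ index of next slot to write
: QTAIL ( -- addr )  QHEAD CELL+ ;               \ index of next slot to read

: Q-EMPTY? ( -- f )  QHEAD @ QTAIL @ = ;
: Q-FULL?  ( -- f )  QHEAD @ 1+ QMASK AND  QTAIL @ = ;  \ one slot kept free

: >Q ( c -- )                        \ enqueue; drop the char if full
    Q-FULL? IF DROP EXIT THEN
    QHEAD @ CHARS RXQ + C!
    QHEAD @ 1+ QMASK AND QHEAD ! ;

: Q> ( -- c )                        \ dequeue; caller checks Q-EMPTY? first
    QTAIL @ CHARS RXQ + C@
    QTAIL @ 1+ QMASK AND QTAIL ! ;
```

A polling loop would call `>Q` whenever the UART has a character and the application would call `Q>` while `Q-EMPTY?` is false.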


I also realized yesterday that if I don't refill the Forth TOS register after BLWP *TOS, I can return a single item to Forth's top of stack with just one indexed MOV through R13.

So I can use that to get a character out of the queue and back to Forth.
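To spell out why one indexed move is enough: each 9900 register is a 16-bit word in memory, so register n of a workspace lives 2n bytes above its WP. BLWP leaves the caller's WP in the new workspace's R13, so the caller's R4 (the Forth TOS) is at

```latex
\text{addr}(\text{caller's } R4) = (R13) + 2 \times 4 = (R13) + 8
```

which is the @8(R13) operand. The same 2n rule gives the byte offsets used for the register USER variables (R6 at 12, R10 at 20).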


There is a temptation inside Forth to just use the stacks because they are there, but with a BLWP Forth instruction the world opens up nicely.


Here is the previous example using the mechanism for returning a parameter to Forth.

\ Using Forth to make a macro that names the operand @8(R13),
\ i.e. the Forth TOS register (R4) of the workspace that called the function
: [TOS]     8 R13 () ;

\ *NEW*  BLWP() leaves TOS register un-touched. (no refill)
\  Return 1 arg to Forth TOS
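CODE BLWP() ( addr -- arg)
            *TOS BLWP,     \ the called code writes its result into [TOS]
            NEXT,          \ no TOS POP, here: TOS already holds the returned arg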

CODE SUM ( -- )   \ the fast summing program
        R1  R0 MOV,    \ start with R1 (MOV, not ADD,) so repeated calls don't accumulate
        R2  R0 ADD,
        R3  R0 ADD,
        R4  R0 ADD,
        R5  R0 ADD,
        R6  R0 ADD,
        R7  R0 ADD,
        R8  R0 ADD,
        R9  R0 ADD,
        R10 R0 ADD,
        R11 R0 ADD,
        R12 R0 ADD,
        R0 [TOS] MOV,  \ *NEW* R0 to Forth TOS register
        RTWP,          \ get back home to Forth


: FASTSUM  ( -- answer) SUMVECT BLWP() ; 

\ Usage at Forth console

FASTSUM .  24 ok
