Jump to content
TheBF

Camel99 Forth Information goes here

Recommended Posts

8 hours ago, TheBF said:

I took a look my inline optimizer to see if it was possible optimize Forth loop structures as code. While I was at it, things were getting a little complicated so I reduced the business end of the process, copying kernel code snippets, into one word nice word call CODE, .  I can now optimize DO LOOP , BEGIN UNTIL and BEGIN AGAIN with this version. I don't think I will go any further. In theory one could build a recursive descent compiler over the Forth code but I think that's above my pay grade. :) 

 

In order to keep the optimized loop info from getting mixed up with the Forth DATA stack, I make a little secondary LIFO called a control stack. This made it much simpler and I can do nested loops without losing my mind managing mix data on the Forth data stack.

 

Wow! That is truly impressive! :o  I am afraid I would never be able to do that in fbForth 2.0 because of the convoluted way I manage definitions in the ROM kernel. At the very least, I could similarly manage the loop words, but the rest would be far more complicated and likely not worth the effort even were I capable.

 

...lee

  • Like 2

Share this post


Link to post
Share on other sites

Thanks. It's been a long time coming as I get my head back into these realms.  It starts to give diminishing returns this way if I try to go much farther with optimizing.

Technically I am getting close to the realm of GForth where the compiler does this kind of thing by default. I am tempted to try making that work but it gets pretty complex for our little 99. :) 

 

I remember years back getting a copy of GForth and compiling it for DOS.

I used a commercial system called HsForth at the time which I paid $450 US dollars for in 1991 and purchased updates over the decade. It was pretty fast to me.

I was shocked to discover that GForth-fast.exe ran significantly faster than HsForth.  Never looked into it because my career was moving out of technology at that time.

Now I know why it was faster. I guess that's progress.

 

To his credit however Jim Kalihan provided an optimizer with HsForth that worked like INLINE[ ]  but it was  OPT"  ".

I was a huge addition to the compiler and it was a bit buggy on some code, but it gave about a 2X speedup on the code that I tested. He went down the rabbit-hole of trying make it optimize any Forth code.

 

I have a native code Machine Forth cross-compiler that worked before I got distracted by all this VDP stuff, that I must get back to.

While trying to fit 9900 to Chucks Machine Forth ideas I suddenly realized that I could make a "Forth Assembly language" for the 9900.

A large number of instructions overlap so well  1+  2+  1-  2-   "@"  (indirect addressing) for "fetch. etc.

I think MOV would do double duty as store and fetch but that's not clear yet.

This would allow writing Forth syntax but the code would be real close to the metal 9900 code. 

 

The other thing I am now acutely aware of is how bloated sub-routines are on this machine. Unless your sub-routine is 4 instructions or more the compiler should just inline that darn code.

So that means most of the Forth primitives become inline code. Fast but not very Forthy.

 

Enough of my dreaming for today. 

 

 

 

 

  • Like 1

Share this post


Link to post
Share on other sites

Thanks for your explanations.

 

I like this inline optimisation, because (I think) I can understand it! However, it looks to me that you can only inline code-primitives, loops and data words i.e. you couldn't put one of your own non-primative words in the loop. Am I right?

 

On one hand, putting your own words in the loop would dilute the inline optimisation significantly anyway, but on the other hand, if you inlined primatives only,  couldn't you could put it in a dedicated code word instead quite easily?

 

Also, didn't you show an inliner some time back that did inline non-primatives, but not at the code level, at the reference level (using the CFA? Sorry, I'm no guru). IIRC, then I guess this is a different inliner?

 

I'm always interested in ways of speeding up my CamelForth 6809 system, but have more learning to do first 👍

Share this post


Link to post
Share on other sites
2 hours ago, D-Type said:

Thanks for your explanations.

 

I like this inline optimisation, because (I think) I can understand it! However, it looks to me that you can only inline code-primitives, loops and data words i.e. you couldn't put one of your own non-primative words in the loop. Am I right?

 

On one hand, putting your own words in the loop would dilute the inline optimisation significantly anyway, but on the other hand, if you inlined primatives only,  couldn't you could put it in a dedicated code word instead quite easily?

 

Also, didn't you show an inliner some time back that did inline non-primatives, but not at the code level, at the reference level (using the CFA? Sorry, I'm no guru). IIRC, then I guess this is a different inliner?

 

I'm always interested in ways of speeding up my CamelForth 6809 system, but have more learning to do first 👍

You are correct. This exercise began as a way to remove NEXT from in between CODE words.

Then I realized data words could be sped up and now I realized that since I had compile native code that loops could also be changed to run native code.

 

It is possible to take a Forth word apart and if you find another Forth word  recursively drill down by calling "optimize" until you find the 1st occurrence of a code word and compile that code.

Then return back up one level, keep doing that until you return to the top level and are at the ';'  of the top level Forth word. (makes my head hurt)

In the process you would recompile end to end ALL of the native code that runs when that word executes.  It would be huge piece of code.

 

VFX Forth is a native code compiler and it has software control to tell it how many levels to drill down when it optimizes ie: expands the code into a long routine with no sub-routine calls.

The control names are:

  • no inlining
  • normal inlining
  • agressive inlining
  • absurd inlining  :)  

Where absurd inlining unravels everything right down to the most primitive piece of code. It's not recommended.

You can learn more here:

 

https://www.mpeforth.com/vfxcom.htm

 

 

Share this post


Link to post
Share on other sites
2 hours ago, D-Type said:

...couldn't you could put it in a dedicated code word instead quite easily?

I didn't address this.

 

This inliner is actually creating headless "code" words in heap memory.  So no memory is used for the name, count byte, Link and precedence field, which can really add up on small systems.

The code created is  like :NONAME words in that there is just an XT.  That XT points to the next address which contains the beginning of the machine code.

At the top of the loop we grab the next free address in heap to hold the XT (code field address as it is traditionally called) ,  point it to the next cell which is where the code will be compiled.

At the bottom of the loop  we just compile that XT into the word  we are compiling  with nothing more than comma ","   after terminating the new code with NEXT, 

Pretty neat if I do say so myself. :) 

 

  • Like 1

Share this post


Link to post
Share on other sites

 

New VDP Driver 

 

Using this version that ran on Weiand Forth I reported at 2:29 using Camel99 V2.51 and stop-watch timing.

Spoiler
\ fig-forth to ANS Forth translation harness

: VARIABLE     CREATE  , ;
: CREATE2   CREATE ;
: CLS   PAGE ;
: ENDIF   POSTPONE THEN ; IMMEDIATE

DECIMAL

\ ===[ ORIGINAL FIG-FORTH CODE ]===
-1 CONSTANT TRUE
0 CONSTANT FALSE

180 CONSTANT SIZE
0 VARIABLE V1
0 VARIABLE V2
0 VARIABLE POWER
CREATE2 A1 SIZE CELLS ALLOT
0 VARIABLE LENGTH

: A1*7->A1 ( -- )
  0 V1 !
  0 V2 !
  SIZE 0 DO
    A1 V1 @ CELLS + @ DUP
    0= V1 @ LENGTH @ > AND V2 @ 0= AND IF
      DROP LEAVE
    ELSE
      7 * V2 @ + DUP
      10 MOD A1 V1 @ CELLS + !
      10 / V2 !
      V1 @ 1+ V1 !
    ENDIF
  LOOP
  V1 @ LENGTH ! ;

: TYPE-A1 ( -- )
  0 V1 !
  FALSE V2 !
  -1 LENGTH @ 1- DO
    A1 I CELLS + @ 48 + DUP
    48 = 0= IF TRUE V2 ! ENDIF
    V2 @ IF
      PAD V1 @ 1+ + C!
      V1 @ 1+ V1 !
    ELSE
      DROP
    ENDIF
  -1 +LOOP
  V1 @ PAD C!
  CR PAD COUNT TYPE CR ;

: TEST-A1 ( -- f )
  0 V1 !
  FALSE V2 !
  LENGTH @ 0 DO
    A1 I CELLS + @
    7 = 0= IF
      0 V1 !
    ELSE
      V1 @ 1+ V1 !
    ENDIF
    V1 @ 5 > IF
      TRUE V2 !
    ENDIF
  LOOP
  V2 @ ;

: 7'S-PROBLEM
  CLS
  A1 SIZE CELLS 0 FILL
  7 A1 !
  2 POWER !
  BEGIN
    A1*7->A1
    CR ." 7 ^ SEVEN TO THE POWER OF " POWER @ . ." IS"
    POWER @ 1+ POWER !
    TYPE-A1
  TEST-A1 UNTIL ;

 

 

On Version 2.67 which I am beating up now, the original code runs in 2:07. 

Still not as fast as the compiled BASIC or TF that's a 21% improvement. 

I know that 7 seconds of my time is from scrolling.

 

For Reference

Compiled BASIC       1:38

TurboForth               1:49

Weiand Forth            2:53 

 

Something to note is the Lee's version written in Forth coding style, using the data stack rather than literally translating from BASIC, runs in one minute on Camel99 v2.67

 

 

 

 

  • Like 2

Share this post


Link to post
Share on other sites

So there were some re-runs of the The Matrix on TV...

Multi-tasking + Magellan + Google Images and you get this. :) 

 

Spoiler
\ THE MATRIX

NEEDS DUMP   FROM DSK1.TOOLS
NEEDS RND    FROM  DSK1.RANDOM
NEEDS COLOR  FROM DSK1.GRAFIX
NEEDS MALLOC FROM DSK1.MALLOC
NEEDS FORK   FROM DSK1.MTASK99

MARKER /MATRIX
\ Japanese characters
DECIMAL
  S" 007E087E08300000" 128 CALLCHAR
  S" 007E020202027E00" 129 CALLCHAR
  S" 0044442404043800" 130 CALLCHAR
  S" 0000600464087000" 131 CALLCHAR
  S" 0004081030501000" 132 CALLCHAR
  S" 0028282828284400" 133 CALLCHAR
  S" 0000107C107C1000" 134 CALLCHAR
  S" 003C448404041800" 135 CALLCHAR
  S" 003C000000007E00" 136 CALLCHAR
  S" 003E020214080400" 137 CALLCHAR
  S" 0004040404043800" 138 CALLCHAR
  S" 0042424242023C00" 139 CALLCHAR
  S" 007C107C100C0000" 140 CALLCHAR
  S" 007C007C007C0000" 141 CALLCHAR
  S" 007C007C04380000" 142 CALLCHAR
  S" 007C44A404380800" 143 CALLCHAR
  S" 007E020438448000" 144 CALLCHAR
  S" 0020203824202000" 145 CALLCHAR
  S" 00107C1424480000" 146 CALLCHAR
  S" 00087C0808300000" 147 CALLCHAR
  S" 00407C4040403C00" 148 CALLCHAR
  S" 00007C007C106000" 149 CALLCHAR
  S" 00287C2808301400" 150 CALLCHAR
  S" 0060600404047800" 151 CALLCHAR
  S" 0054540404381400" 152 CALLCHAR
  S" 007C04281028C400" 153 CALLCHAR
  S" 007C040404043800" 154 CALLCHAR
  S" 0000107C04043800" 155 CALLCHAR
  S" 007C101010107C00" 156 CALLCHAR
  S" 00207C2420202000" 157 CALLCHAR
  S" 00107C0438540000" 158 CALLCHAR

   128 SET# 158 SET#  4 1 COLORS
   CHAR 0 SET# 16 1 COLOR   ( numbers are white)

\ Test
: .JAPAN  CR 159 128 DO I EMIT LOOP ; 
.JAPAN

: CLIP     ROT MIN MAX ;
: RNDCHAR ( -- c)  31 RND 128 +  ;
: RNDX    32 RND 0 31 CLIP ;
: RNDY    5 RND ;
: RNDL    19 RND 4 + ;
: RND#    9 RND ;

: VPUT ( c --) VPOS VC! PAUSE  ;
: VROW++  ( -- ) VROW DUP @ 1+ 23 MIN SWAP ! ;
: FALLING ( length col row  -- )
          AT-XY
          0
          ?DO
             RNDCHAR VPUT
             VROW++
             50 MS
          LOOP
          5 RND 3 > IF RND# >DIGIT VPUT THEN 
;

\ define the three jobs as endless loops
: FALLER
       BEGIN
         RNDL  RNDX  0 FALLING
       AGAIN ;

: ERASER
       BEGIN
          RNDX 0 AT-XY
          24 0
          DO
             BL VPUT
             VROW++
             30 MS
          LOOP
          500 MS
       AGAIN ;

CREATE GREENS   13 , 3 , 4 ,
: RNDGREEN ( -- n)  3 RND 0 2 CLIP CELLS GREENS + @ ;
: SPARKLE  ( -- )   3 RND 16 +  16 19 CLIP RNDGREEN 1 COLOR ;

: SPARKLER
      BEGIN
         PAUSE
         SPARKLE
         [CHAR] 0 SET# 16 1 COLOR
         PAUSE
         [CHAR] 0 SET# 15 1 COLOR
         PAUSE
         [CHAR] 9 SET# 16 1 COLOR
         PAUSE
         [CHAR] 9 SET# 15 1 COLOR
      AGAIN
;

: TASK:  ( -- ) USIZE MALLOC DUP FORK CONSTANT ;

TASK: JOB1  ' FALLER   JOB1 ASSIGN
TASK: JOB2  ' FALLER   JOB2 ASSIGN
TASK: JOB3  ' ERASER   JOB3 ASSIGN
TASK: JOB4  ' SPARKLER JOB4 ASSIGN
TASK: JOB5  ' FALLER   JOB5 ASSIGN

JOB1 WAKE  JOB2 WAKE  JOB3 WAKE  JOB4 WAKE  JOB5 WAKE

: MAIN
    CLEAR
    1 SCREEN
    128 SET# 158 SET#  4 1 COLORS
  ( numbers are white)
    MULTI
    BEGIN
      PAUSE
      ?TERMINAL
    UNTIL
    SINGLE
    8 SCREEN
    BL SET#  [CHAR] Z SET#  2 1 COLORS ;

 

 

  • Like 2

Share this post


Link to post
Share on other sites

Just when you think you've seen everything...

 

Ed down-under, author of DxForth:  Ed's Forth Topics (minimaltype.com)

...was showing some code on comp.lang.forth a while back (last year maybe?)

 

Anyway he had this unique way to do jumps in his assembler. I never saw this before.

I figured out how to make in work in Camel99.

If you prefer more conventional labels instead of Forth's  structured jumps and loops, it is not a bad way to go. 

I added some useful error checking and changed it up to use a small stack to hold the forward references.

 

The code looks like this meaningless little snippet:

CODE TEST ( n -- n')
1 $:    R4 DEC,
        R4 400 CI,
        2 $ JEQ,
        R1  DEC,
        R2  INCT,
        R4 R4 MOV,
        1 $ JNE,
2 $:    NEXT,
ENDCODE

THIS CODE HAS A FLAW. It does not deal with multiple forward jumps to the same label. I need a loop to process the label stack items and resolve all the entries.

Spoiler
\ Numbered assembler labels: Idea from DxForth Ported to Camel99  Mar 2021 Fox
\ NEEDS DUMP FROM DSK1.TOOLS
NEEDS MOV, FROM DSK1.ASM9900
NEEDS PUSH FROM DSK1.STACKS

MARKER REMOVE

DECIMAL
CREATE REFERS   15 CELLS ALLOT
: CLRJMPS ( -- )  REFERS 15 CELLS 0 FILL ;
: ]REF    ( n -- addr) CELLS REFERS + ;  \ reference array

HEX
10 LIFO: ASMLS  \ assembler label stack holds location of jump instructions

: >LS  ( addr -- ) ASMLS PUSH ;
: LS>  ( addr -- ) ASMLS POP ;
: FWD? ( -- n) ASMLS STACK-DEPTH ;

\ Defined in ASM9900:
\ Compute offset, compile into 'jmp
\ : RESOLVE ( 'jmp offset --)  2- 2/ SWAP 1+ C! ;

: $:  ( n -- )
      DUP ]REF @   ABORT" Duplicate label"
      FWD?  \ is there a forward branch on the label stack?
      IF
           DROP  ( don't need the label# now)
           LS> HERE OVER -  RESOLVE   \ resolve fwd jump
      ELSE
           HERE SWAP ]REF !   \ record address for backward jump
     THEN
;

: $   ( n -- offset ) \ offset is compiled by the Jump instruction
        ]REF @ DUP 0=
        IF  ( un-resolved fwd jump)
            ( -- 0)        \ zero offset compiled by jump instruction
            HERE >LS       \ record where the jump instruction is

        ELSE ( ]REF<>0. Therefore it's a backward jump )
            ( -- addr) HERE -  2- 2/ ( -- offset)
        THEN ;

: CODE    ( -- ) CODE CLRJMPS ;  \ re-define code to clr jump table
: ENDCODE ( -- ) ?CSP FWD? ABORT" Un-resolved forward jump"  ;

 

 

Share this post


Link to post
Share on other sites

I went looking for a 16 bit program that could generate PI digits.

I found one on the DxForth site.  I made some changes to allow it to compile on Camel99 Forth, mostly library includes.

I also removed things that were for creating a binary program and added a fast MOVE routine to replace CMOVE which really helps the old girl "move" faster.

 

So it runs but it does not generate PI.  :( 

I will have to try it on another DOS Forth to see if it's the program or an ENDIAN problem or maybe my D+ routine?

 

I posted the code for Lee because it has some unusual Forth usages. 

I am thinking Dr. Stewart will not approve of the BEGIN WHILE UNTIL THEN  loop! :)

But the dynamic allocation of three arrays in the startup word is pretty cool.

 

Spoiler
\ PI.FTH
\
\ Revised 2015-02-09  es
\
\ Compute Pi to an arbitrary precision. Uses Machin's
\ formula:  pi/4 = 4 arctan(1/5) - arctan(1/239)
\
\ Compile with 16-bit DX-Forth: FORTH - INCLUDE PI.F BYE
\ Compile with CAMEL99 Forth: INCLUDE DSK*.PI  ( where * is your drive no.)
\
\ This 16-bit implementation allows up to 45,808 digits
\ to be computed before arithmetic overflow occurs.
\
\ The code can be used on 32-bit targets with appropriate
\ changes:
\
\   16-bit             32-bit
\
\   10000 Multiply     100000000 Multiply
\   <# # # # # #>      <# # # # # # # # # #>
\   4 +loop            8 +loop
\   525 um/mod         1050 um/mod
\                      remove 'digits > 45808' warning
\
\ Acknowledgements:
\
\   Roy Williams, Feb 1994
\   J. W. Stumpel, May 1991
\   E. Ford, Aug 2009
\   R. Bishop, Aug 1978
\
\ This code is PUBLIC DOMAIN. Use at your own risk.

\ Modified for Camel99 Forth  Mar 2021 Fox
NEEDS DUMP  FROM DSK1.TOOLS
NEEDS VALUE FROM DSK1.VALUES
NEEDS D+	  FROM DSK1.DOUBLE
NEEDS .R    FROM DSK1.UDOTR
NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS MALLOC FROM DSK1.MALLOC

\ proper MOVE for 9900, cell wide
HEX
CODE MOVE
    C036 , C076 , C104 , 1306 , 0584 ,
    0244 , FFFE , CC31 , 0644 , 15FD ,
    C136 ,
NEXT,
ENDCODE

DECIMAL

0 VALUE POWER  ( adr)
0 VALUE TERM   ( adr)
0 VALUE RESULT ( adr)
0 VALUE SIZE   ( n)

VARIABLE CARRY

: ADD ( -- )
  0 CARRY !
  RESULT   0 SIZE 1- DO
    I CELLS OVER + ( res) DUP @ 0
    I CELLS TERM + @ 0  D+  CARRY @ M+
    ( hi) CARRY !  ( lo) SWAP ( res) !
  -1 +LOOP  DROP ;

: SUBTRACT ( -- )
  0 CARRY !
  RESULT   0 SIZE 1- DO
    I CELLS OVER + ( RES) DUP @ 0
    I CELLS TERM + @ 0  D-  CARRY @ M+
    ( HI) CARRY !  ( LO) SWAP ( RES) !
  -1 +LOOP  DROP ;

0 VALUE FACTOR

\ scan forward for cell containing non-zero
: +INDEX ( ADR -- ADR INDEX )
    -1
    BEGIN 1+ DUP SIZE -
    WHILE
       2DUP CELLS + @
    UNTIL
    THEN ;

: DIVIDE ( ADR FACTOR -- )
  TO FACTOR   0 CARRY !  +INDEX
  ( adr index )  SIZE SWAP
  ?DO
     I CELLS OVER + ( res)
     DUP @  CARRY @  FACTOR  UM/MOD
    ( quot) ROT ( res) !  ( rem) CARRY !
  LOOP
  DROP ;

\ scan backward for cell containing non-zero
: -INDEX ( adr -- adr index )
    SIZE
    BEGIN 1- DUP
    WHILE
       2DUP CELLS + @
    UNTIL
    THEN ;

: MULTIPLY ( adr factor -- )
  TO FACTOR   0 CARRY !  -INDEX
  ( adr index )  0 SWAP
  DO
    I CELLS OVER + ( res)
    DUP @  FACTOR  UM*  CARRY @ M+
    ( hi) CARRY !  ( lo) SWAP ( res) !
  -1 +LOOP
  DROP ;

: COPY ( -- ) POWER TERM SIZE CELLS MOVE ; \ changed CMOVE to MOVE

\ : ZERO? ( result -- f )  +INDEX NIP SIZE = ;
: ZERO? ( result -- F ) SIZE CELLS 0 SKIP NIP 0= ;

0 VALUE PASS
VARIABLE EXP
VARIABLE SIGN

: DIVISOR ( -- N )
  PASS 1 = IF  5  ELSE  239  THEN ;

: ERASE  0 FILL ;

: INITIALIZE ( -- )
  POWER SIZE CELLS ERASE
  TERM  SIZE CELLS ERASE
  PASS 1 = IF  RESULT SIZE CELLS ERASE  THEN
  16  PASS DUP * / POWER !
  POWER  DIVISOR  DIVIDE
  1 EXP !  PASS 1- SIGN ! ;

0 VALUE NDIGIT

: CalcPi ( -- )
  NDIGIT 45800 U> IF
    ." Warning: digits > 45808 will be in error " CR
  THEN

  2 1+ 1
  DO
    I TO PASS
    INITIALIZE
    BEGIN
      COPY
      TERM  EXP @ DIVIDE
      SIGN @  DUP IF  SUBTRACT  ELSE  ADD  THEN
      0= SIGN !  2 EXP +!
      POWER  DIVISOR DUP *  DIVIDE
      POWER ZERO?
    UNTIL
  LOOP ;

\ VARIABLE OUT
\ : CR  CR  0 OUT ! ;
\ : #   #   1 OUT +! ;

DECIMAL
: PRINT ( -- )
  CR
  RESULT  DUP @ 0 .R  [CHAR] . EMIT SPACE
  NDIGIT 0
  ?DO
    0 OVER !
    DUP 10000 MULTIPLY
    DUP @  0 <# # # # # #> TYPE SPACE
\    OUT @ C/L @ > IF CR THEN  \ not needed for Camel99
    4  +LOOP
  DROP  CR ;

: GetNumber ( -- n )
  CR ." How many digits do you want? "
  PAD DUP 20 ACCEPT NUMBER? ABORT" Invalid" CR ;

: PI ( n -- )
( GetNumber ) DUP TO NDIGIT

  \ array size = ceil(ndigit / log10(2^16))
  109 UM* 525 UM/MOD SWAP ( rem) IF  1+  THEN
  ( extra for accurate last digits)
  2 +  TO SIZE

  \ create arrays in un-allocated memory
  HERE TO POWER   SIZE CELLS ALLOT
  HERE TO TERM    SIZE CELLS ALLOT
  HERE TO RESULT  SIZE CELLS ALLOT
  50 ALLOT  ( hold buffer space)


  CalcPi
\  PRINT
;
\ end

 

 

  • Like 2

Share this post


Link to post
Share on other sites
2 hours ago, TheBF said:

I posted the code for Lee because it has some unusual Forth usages. 

I am thinking Dr. Stewart will not approve of the BEGIN WHILE UNTIL THEN  loop! :)

 

It makes Dr. Stewart’s head hurt. 😵  I cannot do that in fbForth without redefining words.

 

Now, of course, I must sift through that code. 🤓

 

...lee

  • Like 3

Share this post


Link to post
Share on other sites

LOL.  I could never do that in the past either. Still makes my head spin.

This is this new way to write loop words with no protections that seems to be the norm now with the standards team.

 

It's interesting never the less.

 

I don't know why this math doesn't work on my system. This is the second program I have found. The other was Mandelbrot.

It compiles and runs but the results are not correct. I am wondering if it is the ENDIAN difference with 9900.

And of course that ship has sailed and nobody else does it like TI did these days so nobody thinks about it when they write programs.

 

  • Like 3

Share this post


Link to post
Share on other sites

I have been reading on comp.lang.forth that many Forth systems use a fully hashed lookup for their dictionary.  It got me thinking about a way to map hashing onto Camel99 Forth. This was how it was done on HsForth.

The default system had a simple linked list.  HASHING was a Forth source file that you loaded and then you had a hashed dictionary system.

Compiling speeds were almost 10X faster on my old IBM PC clone! It was amazing for the time.

 

I have FIND as vectored word, through a variable, so that parts is fixable.  I would have to re-write all dictionary creating words but that's short list for my system.

 

HEADER,  ( addr len -- ) is at the root for dictionary entry creation and then I would just need to re-define CREATE CONSTANT VARIABLE and USER to use the new HEADER, word.

 

HEADER, is this below. It got a little more complicated with "wordlist" support in the latest versions.

: HEADER, ( addr len --)
      ALIGN
      CURRENT @  DUP>R   \ Fetch CURRENT wordlist id, make a copy for later
\ === compile the header fields ===
      @ ,                \ fetch the NFA & compile in the LFA field
      0 C,               \ compile the precedence byte (immediate flag)
      HERE DUP LATEST !  \ HERE is now a new NFA, store NFA in LATEST
      R>  !              \ & also in the current 'Wordlist ID' we saved earlier
      S, ;               \ compile the (addr len) string as the name.

HEADER would need to be enhanced to also do:

  1. hash the name  giving hash#
  2. compute the CFA from the NFA that it just created
  3. put that new CFA in the hashtable[hash#]

 

Find would be an amazing improvement.  

  1. hash the name
  2. use that hash# as an index into the hashtable
  3. Fetch hashtable[index] and voila! That's the CFA of the word.

If the hash function is simple and written as CODE, FIND will be much much faster.

 

Hash Function of Today

In that past I always used a simple hash function that just summed up the ascii values in a string and modulo divided by the number of locations in the hash-table  where that number was always prime.

That's closed hashing. My old projects were limited and hashing collisions were handled by just looking at the next hash-table cell. So it degenerated to a linear search when the hash-table was very full.

 

I went looking for what's that state of the art in the 21st Century and found this site: FNV Hash (isthe.com)

This is really good and information I found on stackoverflow indicate the FNV1A method really is very close to optimal.  The interesting thing is that nobody cares about 16 bit values. They assume 32 bit integers.  I could used mixed math but I wanted to test what happens with 16 bit computation.  I got some reasonable results using different values for FNV_prime.  The best one so far is 17, at least for single bytes. 

My value for offset_basis was chosen by using the same ratio of the 32 bit number from the web page

 

 offset_basis:2^32     ->   offset_basis:2^16 

and then I picked a close prime number. What the hell? I am in un-charted waters. :) 

 

Here is my initial version of FNV1A in Forth:

\ FNV1A adapted for 16 bits   Mar 15 2021  Fox

\ hash = offset_basis
\ for each octet_of_data to be hashed
\     hash = hash xor octet_of_data
\     hash = hash * FNV_prime
\ next octet_of_data
\ return hash

NEEDS DUMP   FROM DSK1.TOOLS
NEEDS ?BREAK FROM DSK1.BREAK
NEEDS VALUE  FROM DSK1.VALUES
MARKER REMOVE

\ FNV-1 Has no values for 16 bits hashes so I made these up based on
\ discussions at: http://www.isthe.com/chongo/tech/comp/fnv/

DECIMAL
 2707 CONSTANT FNV_prime    \  2^11 + 2^8 + 74 = 2707
47351 CONSTANT offset_basis  \ prime# in the upper range of 2^16
\ 33767 prime

\ single character hasher
: FNV1A ( c -- hash)  offset_basis XOR FNV_prime * ;

\ string hasher
: $FNV1A ( addr len -- hash)
       offset_basis -ROT
       BOUNDS
       DO
         I [email protected] XOR FNV_prime *
       LOOP
;

Right now I am working a test framework to hash the Forth dictionary into a table in low ram and see how it works.

I will use an open hash method where we start in table of 512 cells.  If there is a hashing collision, I will add 512 to base address and use that address. So in 4K I will have room for 7 collisions at any hash#

We shall see if it actually works. :) 

 

  • Like 4

Share this post


Link to post
Share on other sites

I was thinking about some of this today, wondering.... how the dictionary stored WORDS, values for those and tabling the addresses. I, like I say, just thought about it, I was ok with the basics. But until you see it, you really don't know. And I've not done that.

Share this post


Link to post
Share on other sites
3 minutes ago, GDMike said:

I was thinking about some of this today, wondering.... how the dictionary stored WORDS, values for those and tabling the addresses. I, like I say, just thought about it, I was ok with the basics. But until you see it, you really don't know. And I've not done that.

It seems a little harder than it really is in common Forth systems.

 

If you think about just the Forth words, so one vocabulary, it is a simple linked list.

I will do Camel Forth's dictionary. There are LOTS of variations since so many people make their own Forth systems.

The header data structure is:

cell: LFA   \ link field to previous word
byte:  0     \ immdiate field
byte: len    \ word name lenght in bytes
text: WORD   \ text field

 

The linked list looks like the text picture below: (crude pic sorry) 

 

The search starts  at the bottom with a variable somewhere, Camel's is called LATEST in the plain version of the system.

Fetch LATEST and you get the address of a name field, for a word. That address is the byte count of the text.

Go back 3 bytes from the count byte and you find the next link field,  fetch that link field and you get the next count byte and so on...

When the link=0 you are at the end.

last entry in the dictionary
  <0><00>
       ^
       ^---<link><04>EXIT 
                   ^  
                   ^---<link><04>DOCOL<---
                              ^
       >-------more of this----
       ^
       ^
       ^<link> <05>FORTH
                ^
                ^
VARIABLE LATEST ^    

\ Lastest holds the name field byte of the last word created 
          

 

 

  • Thanks 1

Share this post


Link to post
Share on other sites
52 minutes ago, TheBF said:

I have been reading on comp.lang.forth that many Forth systems use a fully hashed lookup for their dictionary.  It got me thinking about a way to map hashing onto Camel99 Forth. This was how it was done on HsForth.  The default system had a simple linked list.  HASHING was a Forth source file that you loaded and then you had a hashed dictionary system.

Compiling speeds were almost 10X faster on my old IBM PC clone! It was amazing for the time.

.

.

.

Right now I am working a test framework to hash the Forth dictionary into a table in low ram and see how it works.

I will use an open hash method where we start in table of 512 cells.  If there is a hashing collision, I will add 512 to base address and use that address. So in 4K I will have room for 7 collisions at any hash#.  We shall see if it actually works. :) 

 

Just stop it! This could seriously lead to fbForth 3.0.  |:)

 

...lee

  • Like 2
  • Haha 2

Share this post


Link to post
Share on other sites
On 3/13/2021 at 3:59 PM, TheBF said:

 

New VDP Driver 

 

Using this version that ran on Weiand Forth I reported at 2:29 using Camel99 V2.51 and stop-watch timing.

  Hide contents

\ fig-forth to ANS Forth translation harness

: VARIABLE     CREATE  , ;
: CREATE2   CREATE ;
: CLS   PAGE ;
: ENDIF   POSTPONE THEN ; IMMEDIATE

DECIMAL

\ ===[ ORIGINAL FIG-FORTH CODE ]===
-1 CONSTANT TRUE
0 CONSTANT FALSE

180 CONSTANT SIZE
0 VARIABLE V1
0 VARIABLE V2
0 VARIABLE POWER
CREATE2 A1 SIZE CELLS ALLOT
0 VARIABLE LENGTH

: A1*7->A1 ( -- )
  0 V1 !
  0 V2 !
  SIZE 0 DO
    A1 V1 @ CELLS + @ DUP
    0= V1 @ LENGTH @ > AND V2 @ 0= AND IF
      DROP LEAVE
    ELSE
      7 * V2 @ + DUP
      10 MOD A1 V1 @ CELLS + !
      10 / V2 !
      V1 @ 1+ V1 !
    ENDIF
  LOOP
  V1 @ LENGTH ! ;

: TYPE-A1 ( -- )
  0 V1 !
  FALSE V2 !
  -1 LENGTH @ 1- DO
    A1 I CELLS + @ 48 + DUP
    48 = 0= IF TRUE V2 ! ENDIF
    V2 @ IF
      PAD V1 @ 1+ + C!
      V1 @ 1+ V1 !
    ELSE
      DROP
    ENDIF
  -1 +LOOP
  V1 @ PAD C!
  CR PAD COUNT TYPE CR ;

: TEST-A1 ( -- f )
  0 V1 !
  FALSE V2 !
  LENGTH @ 0 DO
    A1 I CELLS + @
    7 = 0= IF
      0 V1 !
    ELSE
      V1 @ 1+ V1 !
    ENDIF
    V1 @ 5 > IF
      TRUE V2 !
    ENDIF
  LOOP
  V2 @ ;

: 7'S-PROBLEM
  CLS
  A1 SIZE CELLS 0 FILL
  7 A1 !
  2 POWER !
  BEGIN
    A1*7->A1
    CR ." 7 ^ SEVEN TO THE POWER OF " POWER @ . ." IS"
    POWER @ 1+ POWER !
    TYPE-A1
  TEST-A1 UNTIL ;

 

 

On Version 2.67 which I am beating up now, the original code runs in 2:07. 

Still not as fast as the compiled BASIC or TF that's a 21% improvement. 

I know that 7 seconds of my time is from scrolling.

 

For Reference

Compiled BASIC       1:38

TurboForth               1:49

Weiand Forth            2:53 

 

Something to note is the Lee's version written in Forth coding style, using the data stack rather than literally translating from BASIC, runs in one minute on Camel99 v2.67

 

 

 

 

I was wanting to try this code on Forth+, but without a screen to store, I was left to type into the command line, but after reaching the 79th char position, I got the big forth finger..

Then I found CELLS doesn't exist.

But =CELLS does. 

Edited by GDMike

Share this post


Link to post
Share on other sites

And is 2*  just 2 * because that 2* is not defined either. Ahhh. I got it

IMG_20210317_100428297.jpg

Edited by GDMike

Share this post


Link to post
Share on other sites

I've asked bill how can I get to a screen file to where I can edit, write code.

Rt now I'm able to get the program,FORTH+ to boot, but nothing else, except write in the command line.

But I'm trying this program again...lol

It's not letting me, it can't create CREATE2, and other words, I'll have to wait until I can edit a workspace. This program won't let me get or create a screen/block file because I don't know the commands yet.

Edited by GDMike
  • Like 1

Share this post


Link to post
Share on other sites

Update on FNV1A Hash

 

I think it's a bust. I built a test program to hash the dictionary. It takes about 3.5 seconds for 400 words with no CODE speedups.

But I am getting 10% collisions in a 2K location hash-table trying all difference prime numbers!!! Yuk.

 

I  found something at York U. up here in Toronto. Wish I saw it earlier. :)

www.cse.yorku.ca/~oz/hash.html

/* djb2
this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm
(now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than 
many other constants, prime or not) has never been adequately explained. */
    unsigned long
    hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while (c = *str++)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }

I think the comment is wrong.    hash << 5 is * 32   not 33. 

Kids these days! :) 

Old guys these days.  ((hash<<5)+hash) is times 33 I reckon. 

 

  • Haha 2

Share this post


Link to post
Share on other sites

DJB2 in Forth

DECIMAL
: DJB2 ( addr len -- hash)
       5381 -ROT  \ initial long hash seed goes under string
       BOUNDS
       DO
          33 *  I [email protected] +
       LOOP
;

: DJB2A ( addr len -- hash)
       5381 -ROT  \ initial long hash seed goes under string
       BOUNDS
       DO
          33 *  I [email protected] XOR
       LOOP
;

(Multiplication was used for clarity above. ASM Code will use shift and add)

 

Where I was getting  39..40 collisions with FNV1A 2K cell has table (4Kbytes)

DJB2 is giving me 47 in 2K cell hash table.

DJB2A which uses XOR instead of +  gives me 44.

 

Looks like I need a good collision handler no matter which one I use.

The DJB2A is much easier to code up in Assembler so that's that one I will use.

 

 

  • Like 2

Share this post


Link to post
Share on other sites
On 3/17/2021 at 4:28 PM, TheBF said:

I think the comment is wrong.    hash << 5 is * 32   not 33. 

Kids these days! :) 

Old guys these days.  ((hash<<5)+hash) is times 33 I reckon. 

 

I am sure it is purely coincidence, but 33 ( ! ) does happen to be the lowest ASCII value possible for characters (with normal keystrokes, anyway) that can be part of Forth names. At least, that is the case since figForth, which has <null> (ASCII 0) as a word!

 

Were I to go down this road, I might take the opportunity to eliminate the word, <null>, and use another method for exiting the outer interpreter.

 

...lee

Share this post


Link to post
Share on other sites

From what I see I think you are correct in the coincidence.  I have built up a test jig to try different things hashing a real dictionary and I cannot predict anything except maybe that prime numbers give best results for MOD wrapping the size of the hash values.  I want to keep the has table within 4K but you still get collisions so I am trying some different ways to do that.  I will post something shortly so you can see what I am talking about.

 

It's a lot of code and memory for our little 99. But it will be fun to see how it performs. In preliminary tests finding EXIT at the end my dictionary it's about 5X faster. :)

Collisions will be less so but still the search depth is never more that 4 items from what I can see.

 

 

Share this post


Link to post
Share on other sites

Here is what I have so far in the Hashed dictionary testing arena.

  1. Hash the dictionary use Bernstein DJB2A string hashing method. 
    The hashing word uses 2039 prime number to MOD the output hash value to limit the number of buckets in the hash table to stay in a table of 4K bytes.
     
  2. Test jig counts the words and reports some info when completed
     
  3. Collisions are handled by just testing the next hashtable location. If it's empty use it. If not try the next one.
    The test jig prints the word and the bucket address IF there is a collision but not otherwise.
     
  4. This test jig version records the highest address used in the hashtable. That's less needed now that I understand how it behaves.
     
  5. H' can find all words that are not a in a collision bucket.  I need to make a looping word that does that.

     
Spoiler
\ DJB2 Bernstein hashing method for Camel99 Forth     Mar 17 2021 Fox
\ djb2
\   unsigned long
\    hash(unsigned char *str)
\    {
\        unsigned long hash = 5381;
\        int c;
\        while (c = *str++)
\            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
\        return hash;
\    }

NEEDS DUMP    FROM DSK1.TOOLS
NEEDS ELAPSE  FROM DSK1.ELAPSE
NEEDS VALUE   FROM DSK1.VALUES
NEEDS COMPARE FROM DSK1.COMPARE

MARKER REMOVE
HEX
   800 CONSTANT 2K
  1000 CONSTANT 4K
  2000 CONSTANT BUCKETS  \ hash table base address

DECIMAL
2031 CONSTANT PRIMEDIV
: DJB2A ( addr len -- hash)
       5381 -ROT  \ initial long hash seed goes under string
       BOUNDS
       DO
          33 *  I [email protected] XOR
       LOOP
       PRIMEDIV MOD  \ "lazy" mod limits hash table size
;

HEX
: CLRTABLE ( -- ) BUCKETS 2000 0 FILL   3000 H !  ;  \ MOVE
: ]BUCKET ( -- )  CELLS BUCKETS +  ; \ hash table as array

\ ==================================
VARIABLE WORDCNT
VARIABLE COLLISION

\ analysis code records highest address used in the table.
VARIABLE MAXADDR
: KEEPMAX  MAXADDR @ MAX MAXADDR !  ;
: ]BUCKET!  ( cfa ndx -- ) ( DUP . ) ]BUCKET DUP KEEPMAX ! ;

: NAME$ ( nfa -- addr len ) COUNT 1F AND  ;

: HASHNAME  ( addr len -- ndx)
         2DUP CR TYPE  \ debug
         DJB2A
         BEGIN
         DUP ]BUCKET @ WHILE ( bucket<>0) 
         \ **we have a collision**
           COLLISION 1+!   \ debug count the collisions
           1+              \  try next bucket
           DUP ."  :" .    \ debug
         REPEAT
;

: HASHDICT (  -- )
           CLRTABLE
           WORDCNT OFF
           COLLISION OFF
           MAXADDR OFF
           CONTEXT @ @ ( nfa)
           BEGIN
             ( nfa) DUP
             DUP NAME$ HASHNAME ( nfa Hash# ) ]BUCKET!
             KEY? IF KEY DROP THEN  \ debug, stop scroll
             NFA>LFA @ DUP          \ fetch next nfa
           WHILE
             WORDCNT 1+!            \ nice to know
           REPEAT
           DROP
           DECIMAL
           CR WORDCNT @ . SPACE ." words hashed"
           CR COLLISION @ .  ."  collisions"
           CR ." Max Address = "  HEX MAXADDR @ .
           CR ." ---"
;

: H'   ( -- nfa)
         BL PARSE-WORD 2DUP
         DJB2A ]BUCKET @ ( addr len nfa) DUP>R
         NAME$ COMPARE ABORT" Not found"
         R> NFA>CFA ;

 

 

  • Like 1

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...