+Lee Stewart Posted March 8, 2021 Share Posted March 8, 2021 8 hours ago, TheBF said: I took a look my inline optimizer to see if it was possible optimize Forth loop structures as code. While I was at it, things were getting a little complicated so I reduced the business end of the process, copying kernel code snippets, into one word nice word call CODE, . I can now optimize DO LOOP , BEGIN UNTIL and BEGIN AGAIN with this version. I don't think I will go any further. In theory one could build a recursive descent compiler over the Forth code but I think that's above my pay grade. In order to keep the optimized loop info from getting mixed up with the Forth DATA stack, I make a little secondary LIFO called a control stack. This made it much simpler and I can do nested loops without losing my mind managing mix data on the Forth data stack. Wow! That is truly impressive! I am afraid I would never be able to do that in fbForth 2.0 because of the convoluted way I manage definitions in the ROM kernel. At the very least, I could similarly manage the loop words, but the rest would be far more complicated and likely not worth the effort even were I capable. ...lee 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 8, 2021 Author Share Posted March 8, 2021 Thanks. It's been a long time coming as I get my head back into these realms. It starts to give diminishing returns this way if I try to go much farther with optimizing. Technically I am getting close to the realm of GForth where the compiler does this kind of thing by default. I am tempted to try making that work but it gets pretty complex for our little 99. I remember years back getting a copy of GForth and compiling it for DOS. I used a commercial system called HsForth at the time which I paid $450 US dollars for in 1991 and purchased updates over the decade. It was pretty fast to me. I was shocked to discover that GForth-fast.exe ran significantly faster than HsForth. Never looked into it because my career was moving out of technology at that time. Now I know why it was faster. I guess that's progress. To his credit however Jim Kalihan provided an optimizer with HsForth that worked like INLINE[ ] but it was OPT" ". I was a huge addition to the compiler and it was a bit buggy on some code, but it gave about a 2X speedup on the code that I tested. He went down the rabbit-hole of trying make it optimize any Forth code. I have a native code Machine Forth cross-compiler that worked before I got distracted by all this VDP stuff, that I must get back to. While trying to fit 9900 to Chucks Machine Forth ideas I suddenly realized that I could make a "Forth Assembly language" for the 9900. A large number of instructions overlap so well 1+ 2+ 1- 2- "@" (indirect addressing) for "fetch. etc. I think MOV would do double duty as store and fetch but that's not clear yet. This would allow writing Forth syntax but the code would be real close to the metal 9900 code. The other thing I am now acutely aware of is how bloated sub-routines are on this machine. Unless your sub-routine is 4 instructions or more the compiler should just inline that darn code. So that means most of the Forth primitives become inline code. Fast but not very Forthy. Enough of my dreaming for today. 1 Quote Link to comment Share on other sites More sharing options...
D-Type Posted March 8, 2021 Share Posted March 8, 2021 Thanks for your explanations. I like this inline optimisation, because (I think) I can understand it! However, it looks to me that you can only inline code-primitives, loops and data words i.e. you couldn't put one of your own non-primative words in the loop. Am I right? On one hand, putting your own words in the loop would dilute the inline optimisation significantly anyway, but on the other hand, if you inlined primatives only, couldn't you could put it in a dedicated code word instead quite easily? Also, didn't you show an inliner some time back that did inline non-primatives, but not at the code level, at the reference level (using the CFA? Sorry, I'm no guru). IIRC, then I guess this is a different inliner? I'm always interested in ways of speeding up my CamelForth 6809 system, but have more learning to do first ? Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 8, 2021 Author Share Posted March 8, 2021 2 hours ago, D-Type said: Thanks for your explanations. I like this inline optimisation, because (I think) I can understand it! However, it looks to me that you can only inline code-primitives, loops and data words i.e. you couldn't put one of your own non-primative words in the loop. Am I right? On one hand, putting your own words in the loop would dilute the inline optimisation significantly anyway, but on the other hand, if you inlined primatives only, couldn't you could put it in a dedicated code word instead quite easily? Also, didn't you show an inliner some time back that did inline non-primatives, but not at the code level, at the reference level (using the CFA? Sorry, I'm no guru). IIRC, then I guess this is a different inliner? I'm always interested in ways of speeding up my CamelForth 6809 system, but have more learning to do first ? You are correct. This exercise began as a way to remove NEXT from in between CODE words. Then I realized data words could be sped up and now I realized that since I had compile native code that loops could also be changed to run native code. It is possible to take a Forth word apart and if you find another Forth word recursively drill down by calling "optimize" until you find the 1st occurrence of a code word and compile that code. Then return back up one level, keep doing that until you return to the top level and are at the ';' of the top level Forth word. (makes my head hurt) In the process you would recompile end to end ALL of the native code that runs when that word executes. It would be huge piece of code. VFX Forth is a native code compiler and it has software control to tell it how many levels to drill down when it optimizes ie: expands the code into a long routine with no sub-routine calls. The control names are: no inlining normal inlining agressive inlining absurd inlining Where absurd inlining unravels everything right down to the most primitive piece of code. It's not recommended. You can learn more here: https://www.mpeforth.com/vfxcom.htm Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 8, 2021 Author Share Posted March 8, 2021 2 hours ago, D-Type said: ...couldn't you could put it in a dedicated code word instead quite easily? I didn't address this. This inliner is actually creating headless "code" words in heap memory. So no memory is used for the name, count byte, Link and precedence field, which can really add up on small systems. The code created is like :NONAME words in that there is just an XT. That XT points to the next address which contains the beginning of the machine code. At the top of the loop we grab the next free address in heap to hold the XT (code field address as it is traditionally called) , point it to the next cell which is where the code will be compiled. At the bottom of the loop we just compile that XT into the word we are compiling with nothing more than comma "," after terminating the new code with NEXT, Pretty neat if I do say so myself. 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 13, 2021 Author Share Posted March 13, 2021 New VDP Driver Using this version that ran on Weiand Forth I reported at 2:29 using Camel99 V2.51 and stop-watch timing. Spoiler \ fig-forth to ANS Forth translation harness : VARIABLE CREATE , ; : CREATE2 CREATE ; : CLS PAGE ; : ENDIF POSTPONE THEN ; IMMEDIATE DECIMAL \ ===[ ORIGINAL FIG-FORTH CODE ]=== -1 CONSTANT TRUE 0 CONSTANT FALSE 180 CONSTANT SIZE 0 VARIABLE V1 0 VARIABLE V2 0 VARIABLE POWER CREATE2 A1 SIZE CELLS ALLOT 0 VARIABLE LENGTH : A1*7->A1 ( -- ) 0 V1 ! 0 V2 ! SIZE 0 DO A1 V1 @ CELLS + @ DUP 0= V1 @ LENGTH @ > AND V2 @ 0= AND IF DROP LEAVE ELSE 7 * V2 @ + DUP 10 MOD A1 V1 @ CELLS + ! 10 / V2 ! V1 @ 1+ V1 ! ENDIF LOOP V1 @ LENGTH ! ; : TYPE-A1 ( -- ) 0 V1 ! FALSE V2 ! -1 LENGTH @ 1- DO A1 I CELLS + @ 48 + DUP 48 = 0= IF TRUE V2 ! ENDIF V2 @ IF PAD V1 @ 1+ + C! V1 @ 1+ V1 ! ELSE DROP ENDIF -1 +LOOP V1 @ PAD C! CR PAD COUNT TYPE CR ; : TEST-A1 ( -- f ) 0 V1 ! FALSE V2 ! LENGTH @ 0 DO A1 I CELLS + @ 7 = 0= IF 0 V1 ! ELSE V1 @ 1+ V1 ! ENDIF V1 @ 5 > IF TRUE V2 ! ENDIF LOOP V2 @ ; : 7'S-PROBLEM CLS A1 SIZE CELLS 0 FILL 7 A1 ! 2 POWER ! BEGIN A1*7->A1 CR ." 7 ^ SEVEN TO THE POWER OF " POWER @ . ." IS" POWER @ 1+ POWER ! TYPE-A1 TEST-A1 UNTIL ; On Version 2.67 which I am beating up now, the original code runs in 2:07. Still not as fast as the compiled BASIC or TF that's a 21% improvement. I know that 7 seconds of my time is from scrolling. For Reference Compiled BASIC 1:38 TurboForth 1:49 Weiand Forth 2:53 Something to note is the Lee's version written in Forth coding style, using the data stack rather than literally translating from BASIC, runs in one minute on Camel99 v2.67 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 14, 2021 Author Share Posted March 14, 2021 So there were some re-runs of the The Matrix on TV... Multi-tasking + Magellan + Google Images and you get this. Spoiler \ THE MATRIX NEEDS DUMP FROM DSK1.TOOLS NEEDS RND FROM DSK1.RANDOM NEEDS COLOR FROM DSK1.GRAFIX NEEDS MALLOC FROM DSK1.MALLOC NEEDS FORK FROM DSK1.MTASK99 MARKER /MATRIX \ Japanese characters DECIMAL S" 007E087E08300000" 128 CALLCHAR S" 007E020202027E00" 129 CALLCHAR S" 0044442404043800" 130 CALLCHAR S" 0000600464087000" 131 CALLCHAR S" 0004081030501000" 132 CALLCHAR S" 0028282828284400" 133 CALLCHAR S" 0000107C107C1000" 134 CALLCHAR S" 003C448404041800" 135 CALLCHAR S" 003C000000007E00" 136 CALLCHAR S" 003E020214080400" 137 CALLCHAR S" 0004040404043800" 138 CALLCHAR S" 0042424242023C00" 139 CALLCHAR S" 007C107C100C0000" 140 CALLCHAR S" 007C007C007C0000" 141 CALLCHAR S" 007C007C04380000" 142 CALLCHAR S" 007C44A404380800" 143 CALLCHAR S" 007E020438448000" 144 CALLCHAR S" 0020203824202000" 145 CALLCHAR S" 00107C1424480000" 146 CALLCHAR S" 00087C0808300000" 147 CALLCHAR S" 00407C4040403C00" 148 CALLCHAR S" 00007C007C106000" 149 CALLCHAR S" 00287C2808301400" 150 CALLCHAR S" 0060600404047800" 151 CALLCHAR S" 0054540404381400" 152 CALLCHAR S" 007C04281028C400" 153 CALLCHAR S" 007C040404043800" 154 CALLCHAR S" 0000107C04043800" 155 CALLCHAR S" 007C101010107C00" 156 CALLCHAR S" 00207C2420202000" 157 CALLCHAR S" 00107C0438540000" 158 CALLCHAR 128 SET# 158 SET# 4 1 COLORS CHAR 0 SET# 16 1 COLOR ( numbers are white) \ Test : .JAPAN CR 159 128 DO I EMIT LOOP ; .JAPAN : CLIP ROT MIN MAX ; : RNDCHAR ( -- c) 31 RND 128 + ; : RNDX 32 RND 0 31 CLIP ; : RNDY 5 RND ; : RNDL 19 RND 4 + ; : RND# 9 RND ; : VPUT ( c --) VPOS VC! PAUSE ; : VROW++ ( -- ) VROW DUP @ 1+ 23 MIN SWAP ! ; : FALLING ( length col row -- ) AT-XY 0 ?DO RNDCHAR VPUT VROW++ 50 MS LOOP 5 RND 3 > IF RND# >DIGIT VPUT THEN ; \ define the three jobs as endless loops : FALLER BEGIN RNDL RNDX 0 FALLING AGAIN ; : ERASER BEGIN RNDX 0 AT-XY 24 0 DO BL VPUT VROW++ 30 MS LOOP 500 MS AGAIN ; CREATE GREENS 13 , 3 , 4 , : RNDGREEN ( -- n) 3 RND 0 2 CLIP CELLS GREENS + @ ; : SPARKLE ( -- ) 3 RND 16 + 16 19 CLIP RNDGREEN 1 COLOR ; : SPARKLER BEGIN PAUSE SPARKLE [CHAR] 0 SET# 16 1 COLOR PAUSE [CHAR] 0 SET# 15 1 COLOR PAUSE [CHAR] 9 SET# 16 1 COLOR PAUSE [CHAR] 9 SET# 15 1 COLOR AGAIN ; : TASK: ( -- ) USIZE MALLOC DUP FORK CONSTANT ; TASK: JOB1 ' FALLER JOB1 ASSIGN TASK: JOB2 ' FALLER JOB2 ASSIGN TASK: JOB3 ' ERASER JOB3 ASSIGN TASK: JOB4 ' SPARKLER JOB4 ASSIGN TASK: JOB5 ' FALLER JOB5 ASSIGN JOB1 WAKE JOB2 WAKE JOB3 WAKE JOB4 WAKE JOB5 WAKE : MAIN CLEAR 1 SCREEN 128 SET# 158 SET# 4 1 COLORS ( numbers are white) MULTI BEGIN PAUSE ?TERMINAL UNTIL SINGLE 8 SCREEN BL SET# [CHAR] Z SET# 2 1 COLORS ; THEMATRIX.mp4 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 14, 2021 Author Share Posted March 14, 2021 Just when you think you've seen everything... Ed down-under, author of DxForth: Ed's Forth Topics (minimaltype.com) ...was showing some code on comp.lang.forth a while back (last year maybe?) Anyway he had this unique way to do jumps in his assembler. I never saw this before. I figured out how to make in work in Camel99. If you prefer more conventional labels instead of Forth's structured jumps and loops, it is not a bad way to go. I added some useful error checking and changed it up to use a small stack to hold the forward references. The code looks like this meaningless little snippet: CODE TEST ( n -- n') 1 $: R4 DEC, R4 400 CI, 2 $ JEQ, R1 DEC, R2 INCT, R4 R4 MOV, 1 $ JNE, 2 $: NEXT, ENDCODE THIS CODE HAS A FLAW. It does not deal with multiple forward jumps to the same label. I need a loop to process the label stack items and resolve all the entries. Spoiler \ Numbered assembler labels: Idea from DxForth Ported to Camel99 Mar 2021 Fox \ NEEDS DUMP FROM DSK1.TOOLS NEEDS MOV, FROM DSK1.ASM9900 NEEDS PUSH FROM DSK1.STACKS MARKER REMOVE DECIMAL CREATE REFERS 15 CELLS ALLOT : CLRJMPS ( -- ) REFERS 15 CELLS 0 FILL ; : ]REF ( n -- addr) CELLS REFERS + ; \ reference array HEX 10 LIFO: ASMLS \ assembler label stack holds location of jump instructions : >LS ( addr -- ) ASMLS PUSH ; : LS> ( addr -- ) ASMLS POP ; : FWD? ( -- n) ASMLS STACK-DEPTH ; \ Defined in ASM9900: \ Compute offset, compile into 'jmp \ : RESOLVE ( 'jmp offset --) 2- 2/ SWAP 1+ C! ; : $: ( n -- ) DUP ]REF @ ABORT" Duplicate label" FWD? \ is there a forward branch on the label stack? IF DROP ( don't need the label# now) LS> HERE OVER - RESOLVE \ resolve fwd jump ELSE HERE SWAP ]REF ! \ record address for backward jump THEN ; : $ ( n -- offset ) \ offset is compiled by the Jump instruction ]REF @ DUP 0= IF ( un-resolved fwd jump) ( -- 0) \ zero offset compiled by jump instruction HERE >LS \ record where the jump instruction is ELSE ( ]REF<>0. Therefore it's a backward jump ) ( -- addr) HERE - 2- 2/ ( -- offset) THEN ; : CODE ( -- ) CODE CLRJMPS ; \ re-define code to clr jump table : ENDCODE ( -- ) ?CSP FWD? ABORT" Un-resolved forward jump" ; Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 16, 2021 Author Share Posted March 16, 2021 I went looking for a 16 bit program that could generate PI digits. I found one on the DxForth site. I made some changes to allow it to compile on Camel99 Forth, mostly library includes. I also removed things that were for creating a binary program and added a fast MOVE routine to replace CMOVE which really helps the old girl "move" faster. So it runs but it does not generate PI. I will have to try it on another DOS Forth to see if it's the program or an ENDIAN problem or maybe my D+ routine? I posted the code for Lee because it has some unusual Forth usages. I am thinking Dr. Stewart will not approve of the BEGIN WHILE UNTIL THEN loop! But the dynamic allocation of three arrays in the startup word is pretty cool. Spoiler \ PI.FTH \ \ Revised 2015-02-09 es \ \ Compute Pi to an arbitrary precision. Uses Machin's \ formula: pi/4 = 4 arctan(1/5) - arctan(1/239) \ \ Compile with 16-bit DX-Forth: FORTH - INCLUDE PI.F BYE \ Compile with CAMEL99 Forth: INCLUDE DSK*.PI ( where * is your drive no.) \ \ This 16-bit implementation allows up to 45,808 digits \ to be computed before arithmetic overflow occurs. \ \ The code can be used on 32-bit targets with appropriate \ changes: \ \ 16-bit 32-bit \ \ 10000 Multiply 100000000 Multiply \ <# # # # # #> <# # # # # # # # # #> \ 4 +loop 8 +loop \ 525 um/mod 1050 um/mod \ remove 'digits > 45808' warning \ \ Acknowledgements: \ \ Roy Williams, Feb 1994 \ J. W. Stumpel, May 1991 \ E. Ford, Aug 2009 \ R. Bishop, Aug 1978 \ \ This code is PUBLIC DOMAIN. Use at your own risk. \ Modified for Camel99 Forth Mar 2021 Fox NEEDS DUMP FROM DSK1.TOOLS NEEDS VALUE FROM DSK1.VALUES NEEDS D+ FROM DSK1.DOUBLE NEEDS .R FROM DSK1.UDOTR NEEDS ELAPSE FROM DSK1.ELAPSE NEEDS MALLOC FROM DSK1.MALLOC \ proper MOVE for 9900, cell wide HEX CODE MOVE C036 , C076 , C104 , 1306 , 0584 , 0244 , FFFE , CC31 , 0644 , 15FD , C136 , NEXT, ENDCODE DECIMAL 0 VALUE POWER ( adr) 0 VALUE TERM ( adr) 0 VALUE RESULT ( adr) 0 VALUE SIZE ( n) VARIABLE CARRY : ADD ( -- ) 0 CARRY ! RESULT 0 SIZE 1- DO I CELLS OVER + ( res) DUP @ 0 I CELLS TERM + @ 0 D+ CARRY @ M+ ( hi) CARRY ! ( lo) SWAP ( res) ! -1 +LOOP DROP ; : SUBTRACT ( -- ) 0 CARRY ! RESULT 0 SIZE 1- DO I CELLS OVER + ( RES) DUP @ 0 I CELLS TERM + @ 0 D- CARRY @ M+ ( HI) CARRY ! ( LO) SWAP ( RES) ! -1 +LOOP DROP ; 0 VALUE FACTOR \ scan forward for cell containing non-zero : +INDEX ( ADR -- ADR INDEX ) -1 BEGIN 1+ DUP SIZE - WHILE 2DUP CELLS + @ UNTIL THEN ; : DIVIDE ( ADR FACTOR -- ) TO FACTOR 0 CARRY ! +INDEX ( adr index ) SIZE SWAP ?DO I CELLS OVER + ( res) DUP @ CARRY @ FACTOR UM/MOD ( quot) ROT ( res) ! ( rem) CARRY ! LOOP DROP ; \ scan backward for cell containing non-zero : -INDEX ( adr -- adr index ) SIZE BEGIN 1- DUP WHILE 2DUP CELLS + @ UNTIL THEN ; : MULTIPLY ( adr factor -- ) TO FACTOR 0 CARRY ! -INDEX ( adr index ) 0 SWAP DO I CELLS OVER + ( res) DUP @ FACTOR UM* CARRY @ M+ ( hi) CARRY ! ( lo) SWAP ( res) ! -1 +LOOP DROP ; : COPY ( -- ) POWER TERM SIZE CELLS MOVE ; \ changed CMOVE to MOVE \ : ZERO? ( result -- f ) +INDEX NIP SIZE = ; : ZERO? ( result -- F ) SIZE CELLS 0 SKIP NIP 0= ; 0 VALUE PASS VARIABLE EXP VARIABLE SIGN : DIVISOR ( -- N ) PASS 1 = IF 5 ELSE 239 THEN ; : ERASE 0 FILL ; : INITIALIZE ( -- ) POWER SIZE CELLS ERASE TERM SIZE CELLS ERASE PASS 1 = IF RESULT SIZE CELLS ERASE THEN 16 PASS DUP * / POWER ! POWER DIVISOR DIVIDE 1 EXP ! PASS 1- SIGN ! ; 0 VALUE NDIGIT : CalcPi ( -- ) NDIGIT 45800 U> IF ." Warning: digits > 45808 will be in error " CR THEN 2 1+ 1 DO I TO PASS INITIALIZE BEGIN COPY TERM EXP @ DIVIDE SIGN @ DUP IF SUBTRACT ELSE ADD THEN 0= SIGN ! 2 EXP +! POWER DIVISOR DUP * DIVIDE POWER ZERO? UNTIL LOOP ; \ VARIABLE OUT \ : CR CR 0 OUT ! ; \ : # # 1 OUT +! ; DECIMAL : PRINT ( -- ) CR RESULT DUP @ 0 .R [CHAR] . EMIT SPACE NDIGIT 0 ?DO 0 OVER ! DUP 10000 MULTIPLY DUP @ 0 <# # # # # #> TYPE SPACE \ OUT @ C/L @ > IF CR THEN \ not needed for Camel99 4 +LOOP DROP CR ; : GetNumber ( -- n ) CR ." How many digits do you want? " PAD DUP 20 ACCEPT NUMBER? ABORT" Invalid" CR ; : PI ( n -- ) ( GetNumber ) DUP TO NDIGIT \ array size = ceil(ndigit / log10(2^16)) 109 UM* 525 UM/MOD SWAP ( rem) IF 1+ THEN ( extra for accurate last digits) 2 + TO SIZE \ create arrays in un-allocated memory HERE TO POWER SIZE CELLS ALLOT HERE TO TERM SIZE CELLS ALLOT HERE TO RESULT SIZE CELLS ALLOT 50 ALLOT ( hold buffer space) CalcPi \ PRINT ; \ end 2 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted March 16, 2021 Share Posted March 16, 2021 2 hours ago, TheBF said: I posted the code for Lee because it has some unusual Forth usages. I am thinking Dr. Stewart will not approve of the BEGIN WHILE UNTIL THEN loop! It makes Dr. Stewart’s head hurt. ? I cannot do that in fbForth without redefining words. Now, of course, I must sift through that code. ? ...lee 3 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 16, 2021 Author Share Posted March 16, 2021 LOL. I could never do that in the past either. Still makes my head spin. This is this new way to write loop words with no protections that seems to be the norm now with the standards team. It's interesting never the less. I don't know why this math doesn't work on my system. This is the second program I have found. The other was Mandelbrot. It compiles and runs but the results are not correct. I am wondering if it is the ENDIAN difference with 9900. And of course that ship has sailed and nobody else does it like TI did these days so nobody thinks about it when they write programs. 3 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 17, 2021 Author Share Posted March 17, 2021 I have been reading on comp.lang.forth that many Forth systems use a fully hashed lookup for their dictionary. It got me thinking about a way to map hashing onto Camel99 Forth. This was how it was done on HsForth. The default system had a simple linked list. HASHING was a Forth source file that you loaded and then you had a hashed dictionary system. Compiling speeds were almost 10X faster on my old IBM PC clone! It was amazing for the time. I have FIND as vectored word, through a variable, so that parts is fixable. I would have to re-write all dictionary creating words but that's short list for my system. HEADER, ( addr len -- ) is at the root for dictionary entry creation and then I would just need to re-define CREATE CONSTANT VARIABLE and USER to use the new HEADER, word. HEADER, is this below. It got a little more complicated with "wordlist" support in the latest versions. : HEADER, ( addr len --) ALIGN CURRENT @ DUP>R \ Fetch CURRENT wordlist id, make a copy for later \ === compile the header fields === @ , \ fetch the NFA & compile in the LFA field 0 C, \ compile the precedence byte (immediate flag) HERE DUP LATEST ! \ HERE is now a new NFA, store NFA in LATEST R> ! \ & also in the current 'Wordlist ID' we saved earlier S, ; \ compile the (addr len) string as the name. HEADER would need to be enhanced to also do: hash the name giving hash# compute the CFA from the NFA that it just created put that new CFA in the hashtable[hash#] Find would be an amazing improvement. hash the name use that hash# as an index into the hashtable Fetch hashtable[index] and voila! That's the CFA of the word. If the hash function is simple and written as CODE, FIND will be much much faster. Hash Function of Today In that past I always used a simple hash function that just summed up the ascii values in a string and modulo divided by the number of locations in the hash-table where that number was always prime. That's closed hashing. My old projects were limited and hashing collisions were handled by just looking at the next hash-table cell. So it degenerated to a linear search when the hash-table was very full. I went looking for what's that state of the art in the 21st Century and found this site: FNV Hash (isthe.com) This is really good and information I found on stackoverflow indicate the FNV1A method really is very close to optimal. The interesting thing is that nobody cares about 16 bit values. They assume 32 bit integers. I could used mixed math but I wanted to test what happens with 16 bit computation. I got some reasonable results using different values for FNV_prime. The best one so far is 17, at least for single bytes. My value for offset_basis was chosen by using the same ratio of the 32 bit number from the web page offset_basis:2^32 -> offset_basis:2^16 and then I picked a close prime number. What the hell? I am in un-charted waters. Here is my initial version of FNV1A in Forth: \ FNV1A adapted for 16 bits Mar 15 2021 Fox \ hash = offset_basis \ for each octet_of_data to be hashed \ hash = hash xor octet_of_data \ hash = hash * FNV_prime \ next octet_of_data \ return hash NEEDS DUMP FROM DSK1.TOOLS NEEDS ?BREAK FROM DSK1.BREAK NEEDS VALUE FROM DSK1.VALUES MARKER REMOVE \ FNV-1 Has no values for 16 bits hashes so I made these up based on \ discussions at: http://www.isthe.com/chongo/tech/comp/fnv/ DECIMAL 2707 CONSTANT FNV_prime \ 2^11 + 2^8 + 74 = 2707 47351 CONSTANT offset_basis \ prime# in the upper range of 2^16 \ 33767 prime \ single character hasher : FNV1A ( c -- hash) offset_basis XOR FNV_prime * ; \ string hasher : $FNV1A ( addr len -- hash) offset_basis -ROT BOUNDS DO I C@ XOR FNV_prime * LOOP ; Right now I am working a test framework to hash the Forth dictionary into a table in low ram and see how it works. I will use an open hash method where we start in table of 512 cells. If there is a hashing collision, I will add 512 to base address and use that address. So in 4K I will have room for 7 collisions at any hash# We shall see if it actually works. 4 Quote Link to comment Share on other sites More sharing options...
GDMike Posted March 17, 2021 Share Posted March 17, 2021 I was thinking about some of this today, wondering.... how the dictionary stored WORDS, values for those and tabling the addresses. I, like I say, just thought about it, I was ok with the basics. But until you see it, you really don't know. And I've not done that. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 17, 2021 Author Share Posted March 17, 2021 3 minutes ago, GDMike said: I was thinking about some of this today, wondering.... how the dictionary stored WORDS, values for those and tabling the addresses. I, like I say, just thought about it, I was ok with the basics. But until you see it, you really don't know. And I've not done that. It seems a little harder than it really is in common Forth systems. If you think about just the Forth words, so one vocabulary, it is a simple linked list. I will do Camel Forth's dictionary. There are LOTS of variations since so many people make their own Forth systems. The header data structure is: cell: LFA \ link field to previous word byte: 0 \ immdiate field byte: len \ word name lenght in bytes text: WORD \ text field The linked list looks like the text picture below: (crude pic sorry) The search starts at the bottom with a variable somewhere, Camel's is called LATEST in the plain version of the system. Fetch LATEST and you get the address of a name field, for a word. That address is the byte count of the text. Go back 3 bytes from the count byte and you find the next link field, fetch that link field and you get the next count byte and so on... When the link=0 you are at the end. last entry in the dictionary <0><00> ^ ^---<link><04>EXIT ^ ^---<link><04>DOCOL<--- ^ >-------more of this---- ^ ^ ^<link> <05>FORTH ^ ^ VARIABLE LATEST ^ \ Lastest holds the name field byte of the last word created 1 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted March 17, 2021 Share Posted March 17, 2021 52 minutes ago, TheBF said: I have been reading on comp.lang.forth that many Forth systems use a fully hashed lookup for their dictionary. It got me thinking about a way to map hashing onto Camel99 Forth. This was how it was done on HsForth. The default system had a simple linked list. HASHING was a Forth source file that you loaded and then you had a hashed dictionary system. Compiling speeds were almost 10X faster on my old IBM PC clone! It was amazing for the time. . . . Right now I am working a test framework to hash the Forth dictionary into a table in low ram and see how it works. I will use an open hash method where we start in table of 512 cells. If there is a hashing collision, I will add 512 to base address and use that address. So in 4K I will have room for 7 collisions at any hash#. We shall see if it actually works. Just stop it! This could seriously lead to fbForth 3.0. ...lee 2 2 Quote Link to comment Share on other sites More sharing options...
GDMike Posted March 17, 2021 Share Posted March 17, 2021 (edited) On 3/13/2021 at 3:59 PM, TheBF said: New VDP Driver Using this version that ran on Weiand Forth I reported at 2:29 using Camel99 V2.51 and stop-watch timing. Hide contents \ fig-forth to ANS Forth translation harness : VARIABLE CREATE , ; : CREATE2 CREATE ; : CLS PAGE ; : ENDIF POSTPONE THEN ; IMMEDIATE DECIMAL \ ===[ ORIGINAL FIG-FORTH CODE ]=== -1 CONSTANT TRUE 0 CONSTANT FALSE 180 CONSTANT SIZE 0 VARIABLE V1 0 VARIABLE V2 0 VARIABLE POWER CREATE2 A1 SIZE CELLS ALLOT 0 VARIABLE LENGTH : A1*7->A1 ( -- ) 0 V1 ! 0 V2 ! SIZE 0 DO A1 V1 @ CELLS + @ DUP 0= V1 @ LENGTH @ > AND V2 @ 0= AND IF DROP LEAVE ELSE 7 * V2 @ + DUP 10 MOD A1 V1 @ CELLS + ! 10 / V2 ! V1 @ 1+ V1 ! ENDIF LOOP V1 @ LENGTH ! ; : TYPE-A1 ( -- ) 0 V1 ! FALSE V2 ! -1 LENGTH @ 1- DO A1 I CELLS + @ 48 + DUP 48 = 0= IF TRUE V2 ! ENDIF V2 @ IF PAD V1 @ 1+ + C! V1 @ 1+ V1 ! ELSE DROP ENDIF -1 +LOOP V1 @ PAD C! CR PAD COUNT TYPE CR ; : TEST-A1 ( -- f ) 0 V1 ! FALSE V2 ! LENGTH @ 0 DO A1 I CELLS + @ 7 = 0= IF 0 V1 ! ELSE V1 @ 1+ V1 ! ENDIF V1 @ 5 > IF TRUE V2 ! ENDIF LOOP V2 @ ; : 7'S-PROBLEM CLS A1 SIZE CELLS 0 FILL 7 A1 ! 2 POWER ! BEGIN A1*7->A1 CR ." 7 ^ SEVEN TO THE POWER OF " POWER @ . ." IS" POWER @ 1+ POWER ! TYPE-A1 TEST-A1 UNTIL ; On Version 2.67 which I am beating up now, the original code runs in 2:07. Still not as fast as the compiled BASIC or TF that's a 21% improvement. I know that 7 seconds of my time is from scrolling. For Reference Compiled BASIC 1:38 TurboForth 1:49 Weiand Forth 2:53 Something to note is the Lee's version written in Forth coding style, using the data stack rather than literally translating from BASIC, runs in one minute on Camel99 v2.67 I was wanting to try this code on Forth+, but without a screen to store, I was left to type into the command line, but after reaching the 79th char position, I got the big forth finger.. Then I found CELLS doesn't exist. But =CELLS does. Edited March 17, 2021 by GDMike Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 17, 2021 Author Share Posted March 17, 2021 CELLS on a 16 bit computer is just 2*. 1 Quote Link to comment Share on other sites More sharing options...
GDMike Posted March 17, 2021 Share Posted March 17, 2021 (edited) And is 2* just 2 * because that 2* is not defined either. Ahhh. I got it Edited March 17, 2021 by GDMike Quote Link to comment Share on other sites More sharing options...
GDMike Posted March 17, 2021 Share Posted March 17, 2021 (edited) I've asked bill how can I get to a screen file to where I can edit, write code. Rt now I'm able to get the program,FORTH+ to boot, but nothing else, except write in the command line. But I'm trying this program again...lol It's not letting me, it can't create CREATE2, and other words, I'll have to wait until I can edit a workspace. This program won't let me get or create a screen/block file because I don't know the commands yet. Edited March 17, 2021 by GDMike 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 17, 2021 Author Share Posted March 17, 2021 Update on FNV1A Hash I think it's a bust. I built a test program to hash the dictionary. It takes about 3.5 seconds for 400 words with no CODE speedups. But I am getting 10% collisions in a 2K location hash-table trying all difference prime numbers!!! Yuk. I found something at York U. up here in Toronto. Wish I saw it earlier. www.cse.yorku.ca/~oz/hash.html /* djb2 this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained. */ unsigned long hash(unsigned char *str) { unsigned long hash = 5381; int c; while (c = *str++) hash = ((hash << 5) + hash) + c; /* hash * 33 + c */ return hash; } I think the comment is wrong. hash << 5 is * 32 not 33. Kids these days! Old guys these days. ((hash<<5)+hash) is times 33 I reckon. 2 Quote Link to comment Share on other sites More sharing options...
GDMike Posted March 17, 2021 Share Posted March 17, 2021 Kids' parents..lol 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 17, 2021 Author Share Posted March 17, 2021 DJB2 in Forth DECIMAL : DJB2 ( addr len -- hash) 5381 -ROT \ initial long hash seed goes under string BOUNDS DO 33 * I C@ + LOOP ; : DJB2A ( addr len -- hash) 5381 -ROT \ initial long hash seed goes under string BOUNDS DO 33 * I C@ XOR LOOP ; (Multiplication was used for clarity above. ASM Code will use shift and add) Where I was getting 39..40 collisions with FNV1A 2K cell has table (4Kbytes) DJB2 is giving me 47 in 2K cell hash table. DJB2A which uses XOR instead of + gives me 44. Looks like I need a good collision handler no matter which one I use. The DJB2A is much easier to code up in Assembler so that's that one I will use. 2 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted March 18, 2021 Share Posted March 18, 2021 On 3/17/2021 at 4:28 PM, TheBF said: I think the comment is wrong. hash << 5 is * 32 not 33. Kids these days! Old guys these days. ((hash<<5)+hash) is times 33 I reckon. I am sure it is purely coincidence, but 33 ( ! ) does happen to be the lowest ASCII value possible for characters (with normal keystrokes, anyway) that can be part of Forth names. At least, that is the case since figForth, which has <null> (ASCII 0) as a word! Were I to go down this road, I might take the opportunity to eliminate the word, <null>, and use another method for exiting the outer interpreter. ...lee Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 19, 2021 Author Share Posted March 19, 2021 From what I see I think you are correct in the coincidence. I have built up a test jig to try different things hashing a real dictionary and I cannot predict anything except maybe that prime numbers give best results for MOD wrapping the size of the hash values. I want to keep the has table within 4K but you still get collisions so I am trying some different ways to do that. I will post something shortly so you can see what I am talking about. It's a lot of code and memory for our little 99. But it will be fun to see how it performs. In preliminary tests finding EXIT at the end my dictionary it's about 5X faster. Collisions will be less so but still the search depth is never more that 4 items from what I can see. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 19, 2021 Author Share Posted March 19, 2021 Here is what I have so far in the Hashed dictionary testing arena. Hash the dictionary use Bernstein DJB2A string hashing method. The hashing word uses 2039 prime number to MOD the output hash value to limit the number of buckets in the hash table to stay in a table of 4K bytes. Test jig counts the words and reports some info when completed Collisions are handled by just testing the next hashtable location. If it's empty use it. If not try the next one. The test jig prints the word and the bucket address IF there is a collision but not otherwise. This test jig version records the highest address used in the hashtable. That's less needed now that I understand how it behaves. H' can find all words that are not a in a collision bucket. I need to make a looping word that does that. Spoiler \ DJB2 Bernstein hashing method for Camel99 Forth Mar 17 2021 Fox \ djb2 \ unsigned long \ hash(unsigned char *str) \ { \ unsigned long hash = 5381; \ int c; \ while (c = *str++) \ hash = ((hash << 5) + hash) + c; /* hash * 33 + c */ \ return hash; \ } NEEDS DUMP FROM DSK1.TOOLS NEEDS ELAPSE FROM DSK1.ELAPSE NEEDS VALUE FROM DSK1.VALUES NEEDS COMPARE FROM DSK1.COMPARE MARKER REMOVE HEX 800 CONSTANT 2K 1000 CONSTANT 4K 2000 CONSTANT BUCKETS \ hash table base address DECIMAL 2031 CONSTANT PRIMEDIV : DJB2A ( addr len -- hash) 5381 -ROT \ initial long hash seed goes under string BOUNDS DO 33 * I C@ XOR LOOP PRIMEDIV MOD \ "lazy" mod limits hash table size ; HEX : CLRTABLE ( -- ) BUCKETS 2000 0 FILL 3000 H ! ; \ MOVE : ]BUCKET ( -- ) CELLS BUCKETS + ; \ hash table as array \ ================================== VARIABLE WORDCNT VARIABLE COLLISION \ analysis code records highest address used in the table. VARIABLE MAXADDR : KEEPMAX MAXADDR @ MAX MAXADDR ! ; : ]BUCKET! ( cfa ndx -- ) ( DUP . ) ]BUCKET DUP KEEPMAX ! ; : NAME$ ( nfa -- addr len ) COUNT 1F AND ; : HASHNAME ( addr len -- ndx) 2DUP CR TYPE \ debug DJB2A BEGIN DUP ]BUCKET @ WHILE ( bucket<>0) \ **we have a collision** COLLISION 1+! \ debug count the collisions 1+ \ try next bucket DUP ." :" . \ debug REPEAT ; : HASHDICT ( -- ) CLRTABLE WORDCNT OFF COLLISION OFF MAXADDR OFF CONTEXT @ @ ( nfa) BEGIN ( nfa) DUP DUP NAME$ HASHNAME ( nfa Hash# ) ]BUCKET! KEY? IF KEY DROP THEN \ debug, stop scroll NFA>LFA @ DUP \ fetch next nfa WHILE WORDCNT 1+! \ nice to know REPEAT DROP DECIMAL CR WORDCNT @ . SPACE ." words hashed" CR COLLISION @ . ." collisions" CR ." Max Address = " HEX MAXADDR @ . CR ." ---" ; : H' ( -- nfa) BL PARSE-WORD 2DUP DJB2A ]BUCKET @ ( addr len nfa) DUP>R NAME$ COMPARE ABORT" Not found" R> NFA>CFA ; HASHED DICTIONARY DEMO.mp4 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.