Jump to content
IGNORED

Camel99 Forth Information goes here


TheBF

Recommended Posts

OK. That means that your code does run with interrupt service enabled, or the timing wouldn't work. When my program is running interrupts are disabled, so it saves some fraction there. My super-duper 99 also includes a real time clock, based on a clock chip from National, with a resolution of one millisecond. I check the time before and after launching the sort procedure, and get the time from that.

 

I've written a library unit called realtime, which allows access to my clock. The clock is my own design, so I had to make the drivers for it, of course. When the program includes uses realtime, a timer data type becomes available. The unit supports dynamic creation and disposal of timers. When a timer has been created, it can be reset, started and stopped. And read, of course.

 

Here is a syntactically incorrect example, which just shows the principle of how I use them. I could stop the timer before reading it, but it's not mandatory. Any number of timers can be created and run simultaneously.

 

uses realtime;

var timer: tmrtype;
  elapsed: timetype;

begin
  new(timer);
  tmrreset(timer);
  tmrstart(timer);
  quicksort(n,array);
  tmrread(timer,elapsed);
  with elapsed do
    write(hour); write(minute); write(second); writeln(fraction);
  end;
  dispose(timer);
end.

 

  • Like 3
Link to comment
Share on other sites

6 hours ago, apersson850 said:

OK. That means that your code does run with interrupt service enabled, or the timing wouldn't work. When my program is running interrupts are disabled, so it saves some fraction there. My super-duper 99 also includes a real time clock, based on a clock chip from National, with a resolution of one millisecond. I check the time before and after launching the sort procedure, and get the time from that.

 

I've written a library unit called realtime, which allows access to my clock. The clock is my own design, so I had to make the drivers for it, of course. When the program includes uses realtime, a timer data type becomes available. The unit supports dynamic creation and disposal of timers. When a timer has been created, it can be reset, started and stopped. And read, of course.

 

Here is a syntactically incorrect example, which just shows the principle of how I use them. I could stop the timer before reading it, but it's not mandatory. Any number of timers can be created and run simultaneously.

 


uses realtime;

var timer: tmrtype;
  elapsed: timetype;

begin
  new(timer);
  tmrreset(timer);
  tmrstart(timer);
  quicksort(n,array);
  tmrread(timer,elapsed);
  with elapsed do
    write(hour); write(minute); write(second); writeln(fraction);
  end;
  dispose(timer);
end.

 

Very nice. That's the way to do it. Hardware!  :)

 

I have the 9901 timer running continuously in Camel99 Forth and I use it primarily for timing the "MS" word in Forth which waits for milli-seconds.

( I limit resolution to 10mS because I poll it in Forth (slow) and it also yields to any other tasks while it's waiting)

 

( the code below is in my Cross-compiler Forth. That's the reason for the special incantations.

[CC] "cross-compiling" lets me use the host Forth interpreter for immediate commands

[TC]  "target-compiling" All definitions goe into the target image )

 

\ TICKTOCK.HSF  TMS9901 hardware timer interface for Camel 99 Forth

\ credit to: http://www.unige.ch/medecine/nouspikel/ti99/tms9901.htm#Timer
\ impovements based on code from Tursi Atariage
\ TMR! now loads from the Forth stack
\ Jan 31, 2021 simplified JIFFS
\ Dec 2021  removed JIFFS , replaced with TICKS which gives ~10mS resolution

\ timer resolution:  64 clock periods, thus 64*333 = 21.3 microseconds
\ Max Duration    :  ($3FFF) 16383 *64*333 ns = 349.2 milliseconds
CROSS-ASSEMBLING
[CC] DECIMAL [TC]
CODE TMR!   ( n -- )         \ load TMS9901 timer from stack
             0 LIMI,
             R12 CLR,        \ CRU addr of TMS9901 = 0
             0   SBO,        \ SET bit 0 to 1, Enter timer mode
             R12 INCT,       \ CRU Address of bit 1 = 2 , I'm not kidding
             TOS 14 LDCR,    \ Load 14 BITs from TOS into timer
            -1  SBZ,         \ reset bit 0, Exits clock mode, starts decrementer
             2 LIMI,
             TOS POP,
             NEXT,
             ENDCODE

CODE TMR@   ( -- n)         \ read the TMS9901 timer
             0 LIMI,
             TOS PUSH,
             R12 2 LI,      \ cru = 1 (honest, 2=1)
            -1 SBO,         \ SET bit 0 TO 1, Enter timer mode
             TOS 14 STCR,   \ READ TIMER (14 bits)
            -1 SBZ,         \ RESET bit 1, exit timer mode
             2 LIMI,
             NEXT,
             ENDCODE


[CC] DECIMAL [TC]
: TICKS  ( n -- )   \ n must be less than 4000. 4000 TICKS ~= 100 mS
            TMR@ >R
            BEGIN
              PAUSE
              TMR@ R@ - ABS  OVER >
            UNTIL
            R> 2DROP
;

: MS   ( n -- ) 10 /  0 ?DO  420 TICKS LOOP ;

[CC] HEX [TC]

 

I also use it for testing Assembly language code routines. Of it course it rolls over every 349 mS or so. :( 

I explored some code last year that reads the 9901 on the VDP interrupt and adds the difference from last reading to a 32bit int. I didn't go back to it to really test it.

I should probably look into making that work as it would be a "reasonably" accurate fined grained timer.

 

For a time I had a an 8K RAM chip in my SuperCart that had battery backed-up clock built in.  The batteries were dead or dying and it was unreliable so I took it out.

In Classic99 I could read the CLOCK file but that would be best for very long duration timing.

 

Your solution is the best of the lot but iI suppose part of the "fun" is trying make it work with the 1978 hardware.  

(sometimes its not fun) :) 

 

 

  • Like 3
Link to comment
Share on other sites

I revisited this 32 bit timer idea with fresh eyes and I think I have something that works well.

 

The idea is the 9901 can only time 349 mS.

The interrupt happens every 16mS but not reliably.

So...

 

Let's read the timer every interrupt and add it's value to a 32 bit accumulator variable.

It's not likely the interrupts will be off for 300 milli-seconds at least in my system.

That let's us keep a very big number as a timer. :) 

It's spinning away here on the screen.

 

NEEDS MOV,    FROM DSK1.ASM9900
NEEDS ISR'    FROM DSK1.ISRSUPPORT

CREATE T32  0  ,  0  ,  \ 32 bit timer accumulator

CODE TIMERISR
\ read the timer, which runs continuously in Camel99 Forth
         0 LIMI,
         R12 2 LI,
        -1 SBO,
         R0 14 STCR,
        -1 SBZ,
         2 LIMI,

\ add timer value to 32 bit accumulator
         R0  T32 2+ @@ ADD,
         OC IF,
            T32 @@ INC,
         ENDIF,
         RT,
ENDCODE

ISR' TIMERISR INSTALL

 

  • Like 4
Link to comment
Share on other sites

As a stock feature, the p-system has a 32-bit timer that's running all the time, driven by the VDP interrupt. It does of course stop each time you access a floppy disk or similar. It can be ccessed by the intrinsic Pascal procedure time.

 

My real-time clock has nothing to do with that timer, though. It's implemented with a National Semiconductor MM58167A RTC chip on a card in the PEB. It's visible in some of the pictures in this album.

 

 

  • Like 3
Link to comment
Share on other sites

17 hours ago, apersson850 said:

As a stock feature, the p-system has a 32-bit timer that's running all the time, driven by the VDP interrupt. It does of course stop each time you access a floppy disk or similar. It can be ccessed by the intrinsic Pascal procedure time.

 

My real-time clock has nothing to do with that timer, though. It's implemented with a National Semiconductor MM58167A RTC chip on a card in the PEB. It's visible in some of the pictures in this album.

 

 

That's is a great looking TI-99.  Thanks for sharing.

I guess if I really wanted a real time clock I should make a board for my PEB.

One day maybe. Still lots of S/W ideas to explore.

 

 

  • Like 2
Link to comment
Share on other sites

I am always amazed how a small change can make a difference.

So I was testing my latest kernel build and I found the Benchie benchmark.

 

It assigns a VALUE in the middle of the loop.

I was getting a timing of 26.25 seconds  whereas TurboForth could rip this off in 24.5 seconds. 

5 CONSTANT FIVE
0 VALUE BVAR
HEX
100 CONSTANT MASK

: BENCHIE
         MASK 0
         DO
            1
            BEGIN
              DUP SWAP DUP ROT DROP 1 AND
              IF FIVE +
              ELSE 1-
              THEN TO BVAR
              BVAR DUP MASK AND
            UNTIL
            DROP
         LOOP ; 

I wondered if assigning that VALUE with TO was the problem.  My TO  code just used LITERAL and !. Just bog standard Forth.

Seemed like a long shot to me but what the heck.

 

So I did this and created a literal operator that did the store and removes 2 instructions to push the stack down by using R1 for the address.

CODE LIT!  ( n addr -- ) \ combine function of LIT and !
           *IP+  R1 MOV,
            TOS  R1 ** MOV,
                 TOS POP,
           NEXT,
           ENDCODE
 .( .)
: VALUE   CONSTANT ;

: TO  ( n -- )
           ' >BODY   \ compute PFA at compile time
           STATE @
           IF  POSTPONE LIT!  ,  EXIT
           THEN  ! ; IMMEDIATE

And... just like that it's 24.5 seconds. 

 

  • Like 4
Link to comment
Share on other sites

Nice. I think TF does something similar? From memory, TF puts the address to write the value stack in line with a word called doTO. 

 

10 TO FRED

 

Might nievely compile to:

 

LIT 10 LIT FRED !

 

TF compiles it as LIT 10 doTo FRED

 

One less run through the inner interpreter. 

  • Like 2
Link to comment
Share on other sites

4 hours ago, Willsy said:

Nice. I think TF does something similar? From memory, TF puts the address to write the value stack in line with a word called doTO. 

 

10 TO FRED

 

Might nievely compile to:

 

LIT 10 LIT FRED !

 

TF compiles it as LIT 10 doTo FRED

 

One less run through the inner interpreter. 

Sounds the same yes. In my case Store is slower when you cache TOS in a register so I really win using the extra register instead of the stack.

But in the end it really is about removing cycles through the inner interpreter if you want to go faster.

I have broken my DTC version somewhere in the branching mechanism.  I don't think DTC is very practical for TI-99 because of size, but it it is certainly fun to watch it run with a two instruction NEXT.

 

*IP+ W MOV,

       *W B,

 

And the way I did it R11 becomes the W register which is pretty neat.

 

I started reviewing something I call ASM Forth that I started but didn't finish. When I can see that it works well enough, I will put it up on Github. We might have to start another topic for that one. :)

 

  • Like 4
Link to comment
Share on other sites

20 minutes ago, TheBF said:

... fun to watch it run with a two instruction NEXT.

 

*IP+ W MOV,

       *W B,

 

And the way I did it R11 becomes the W register which is pretty neat.

Yes, since the TMS 9900 doesn't have autoincrement deferred, like Digital's VAX, that's the best you can do.

For those unfamiliar with the architecture, the VAX 11 from Digital was a 32 bit architecture, with an ortoghonal instruction set and plenty of addressing modes. Where the TI in its general addressing mode has a two bit addressing mode specifier, the VAX 11 had four bits. Thus a large number of addressing modes was possible. Autoincrement implied indirect, but you could also do autoincrement deferred, which meant that the register pointed not to the data, but to the address of the data. Double indirection, in other words.

  • Like 5
Link to comment
Share on other sites

1 hour ago, apersson850 said:

Yes, since the TMS 9900 doesn't have autoincrement deferred, like Digital's VAX, that's the best you can do.

For those unfamiliar with the architecture, the VAX 11 from Digital was a 32 bit architecture, with an ortoghonal instruction set and plenty of addressing modes. Where the TI in its general addressing mode has a two bit addressing mode specifier, the VAX 11 had four bits. Thus a large number of addressing modes was possible. Autoincrement implied indirect, but you could also do autoincrement deferred, which meant that the register pointed not to the data, but to the address of the data. Double indirection, in other words.

I have never used it but I understand that the 6809 could do DTC NEXT in one instruction as well. 

 

There is lot of stuff I don't understand but when I looked at the RISC 5 instruction set it seems extremely verbose to do simple things.

I get the RISC idea but the choices made for what the instructions should be seem odd to me.

 

And... nobody seems to care about sub-routine calling overhead in these new designs. I suppose because memory is so cheap. Just put code inline.

Chuck Moore built machines that did sub-routine calls in 1 cycle and return was just bit 15 that could be set on your instruction so it was free. 

  • Like 2
Link to comment
Share on other sites

I have been looking at SCAN ( addr len char )   a non-standard word but one that is in GForth and many other systems.

It is a very handy primitive for finding a character in a string.

 

I wondered how to do it in Forth and it really responds well to the dual WHILE structure because there are two loop ending conditions.

It could be done with an AND as well I think but might need a more stack shuffles. 

: SCAN (  adr len char -- adr' len')
        >R     \ remember char
        BEGIN
          DUP
        WHILE ( len<>0)
          OVER C@ R@ <>
        WHILE ( R@<>char)
          1 /STRING            \ advance to next char address
        REPEAT
        THEN
        R> DROP  \ drop char   \ 32 bytes
;

This made me realize I could do the dual WHILE trick in Forth Assembler now that I have the ANSI style loops as part of ASM9900.

 

I am again impressed with the 9900 instruction set.  The point of making an interpreter is normally to save some space but the 9900 does the same function in 10 bytes less.

So smaller and faster.  This is partly because of using registers rather than stack operations for the variables and also because the jumps in Assembler are short jumps and so take only 2 bytes per jump.  

This make me wonder if I could use short jumps for Forth?  Hmmm...

CODE SCAN  ( adr len char -- adr' len' )    \ find matching char
        TOS SWPB,           \ char stays in TOS
        2 (SP) W MOV,       \ address->w
        *SP+ R1 MOV,        \ POP count into R1,
        BEGIN,
          R1 R1 MOV,
        NE WHILE, ( len<>0)
            *W TOS CMPB,
        NE WHILE, ( *R8<>R1)
         ( do: 1 \STRING )
             W INC,    \ inc. adr
             R1 DEC,   \ dec. len
        REPEAT,
        ENDIF,
        W *SP  MOV,                    \ store updated address on stack
        R1 TOS MOV,                    \ updated count to TOS
        NEXT,  \ 26 bytes
        ENDCODE

 

  • Like 2
Link to comment
Share on other sites

  • 2 weeks later...

Sometimes I can make myself feel pretty stupid. :) 

 

I am working on a little debugger Forth for @FarmerPotato  for Geneve 2020 and I wondered if I could incorporate wordlists into the system to hide the Assembler.  It took me down the rabbit hole of how to make sure the dictionary would wake up correctly because I had not made that work in Camel99 yet. :( 

 

One of the neat things about Forth is that while you are compiling code you have access to the interpreter and can do processing inside the source code as well. This works perfectly when loading source code but if you save that code as a binary executable YOU HAVE TO REMEMBER TO DO IT at runtime. 

 

So in my WORDLIST implementation I used this feature to initialize the new Forth wordlist in three lines.

Somehow I missed the importance of these lines all this time. DOH! 

One of those lines completely changes the dictionary search mechanism.  (how dumb am I)

 

Anyway I finally can build a big system with vocabularies, save it as a binary program and it starts with everything intact. 

This has great implications for the machine Forth compiler which needs many vocabularies to partition cross-compiler, cross-assembler and target Forth dictionaries.  Here is what I needed to add

: INIT-WORDLISTS
    ['] FIND12 'FIND !
    CONTEXT @ @ FORTH-WORDLIST !
    ONLY FORTH DEFINITIONS
;

 

And now here is what it takes to build a new Forth with more features:

\ building a big Forth system with WORD lists       FEB 2022 Brian Fox

\ starting point from the CAMEL99 Kernel 
INCLUDE DSK1.WORDLISTS

\ load the assembler in it's own vocabulary
VOCABULARY ASSEMBLER
ASSEMBLER DEFINITIONS
INCLUDE DSK1.ASM9900

\ load some tools
ONLY FORTH DEFINITIONS
INCLUDE DSK1.TOOLS

\ create a new BOOT word
: COLD
    WARM             \ Init hardware, restore Forth dictionary
    INIT-WORDLISTS   \ set find mechanism, init FORTH wordlist
    ABORT ;          \ reset the interpreter

LOCK                 \ lock end of dictionary at COLD

INCLUDE DSK1.SAVESYS

' COLD SAVESYS DSK6.FATFORTH

 

  • Like 6
Link to comment
Share on other sites

14 hours ago, TheBF said:

And now here is what it takes to build a newForth with more features

I’m very grateful! 
 

My understanding of how  VOCABULARY works is poor. I know it changes the results of a dictionary lookup, and the compiled results aren’t affected by anything later. (CFAs are just pointers.) 

 

But when you switch vocabularies , does it change some pointers in the dictionary? Is that the problem you had with saving the state, there are some pointers left in the wrong state?


 

I’ve used vocabulary in FORTI because the note compiler has words A B C D E F G. (And A# and A$ for flat.) 

 

These can only execute as words when the vocabulary is active, which is always  inside the compiling word VOICE: — ;


kind of like assembler, but for music. 


The A B C words compute a note number, modified by a LOT of context (like key and octave plus ornamentation)  

They push a word or two onto the dictionary. The VOICE: is used to make a sequence of notes , to be interpreted later by a player.
 

And outside a VOICE: definition you never, ever want A to be anything but a hexadecimal number! (Even the assembler vocabulary uses A, (A comma )

 

still needing a lot of studying to really comprehend what’s going on with VOCABULARY. 

 

 

 

 

 

 


 

 

  • Like 2
Link to comment
Share on other sites

19 hours ago, FarmerPotato said:

I’m very grateful! 
 

My understanding of how  VOCABULARY works is poor. I know it changes the results of a dictionary lookup, and the compiled results aren’t affected by anything later. (CFAs are just pointers.) 

 

But when you switch vocabularies , does it change some pointers in the dictionary? Is that the problem you had with saving the state, there are some pointers left in the wrong state?

 

I’ve used vocabulary in FORTI because the note compiler has words A B C D E F G. (And A# and A$ for flat.) 

These can only execute as words when the vocabulary is active, which is always  inside the compiling word VOICE: — ;

kind of like assembler, but for music. 


The A B C words compute a note number, modified by a LOT of context (like key and octave plus ornamentation)  

They push a word or two onto the dictionary. The VOICE: is used to make a sequence of notes , to be interpreted later by a player.
 

And outside a VOICE: definition you never, ever want A to be anything but a hexadecimal number! (Even the assembler vocabulary uses A, (A comma )

still needing a lot of studying to really comprehend what’s going on with VOCABULARY. 

 

 

It is probably beyond a simple post to fully explain this but here goes.

(I could not have done this 3 years ago so I guess that's progress for an old guy) :)

 

 

Since the Forth dictionary is a linked list that begins at the last word defined, that constitutes a vocabulary in the simplest sense.

 

However...  VOCABULARY  is not part of the ANSI Forth and has been replaced by WORDLIST.

WORDLIST is a simpler thing, a little 3 field data structure that just returns its address onto the data stack.

The address is called a 'wid'.  (wordlist identifier)

: WORDLIST ( -- wid)
   HERE
   0 ,               \ init nfa of last word in wordlist
   WID-LINK @ ,      \ compile link to previous wordlist
   DUP WID-LINK !    \ link to previous wordlist
   0 ,               \ name of this wordlist. Must be patched
;

 

To make Forth search a WORDLIST you make it the "CONTEXT" wordlist with SET-CONTEXT.

 

There are two variables that determine specific roles of  dictionary searches made by the compiler/interpreter.

VARIABLE CURRENT
VARIABLE CONTEXT 

The CONTEXT "WORDLIST" is what the compiler searches to find definitions to build new definitions or to run an immediate command.

CONTEXT holds a "pointer to a pointer"  to the last word created in a WORDLIST. 

 

The CURRENT wordlist is the name space (wordlist) that new words will become part of when you define them.

 

 

The good news is that it's simple to make a vocabulary (which most implementers do) with CREATE/DOES> .

Just create the wordlist data structure and at run-time have it set itself as the CONTEXT wordlist. :) 

: VOCABULARY  ( <text> )
   CREATE
   WORDLIST   
   LATEST @ SWAP 4 + !  \ update wordlist name field
   DOES> SET-CONTEXT ;

 

In Fig-Forth or Forth 79  systems FORTH is the default vocabulary that is searched and where new words are built.

If you invoke another vocabulary by using its name,  the new vocabulary is searched first and when you get to the end of the it, the search continues on into the Forth vocabulary. So they are effectively connected by default.

 

In ANSI Forth and Forth 83 we are given control of what gets searched and in what order. 

So if you have FORTH,  ASSEMBLER and EDITOR vocabularies, you can control which one gets searched first, second and third in any order.

This is more complicated by gives you great flexibility for complex jobs.

 

In the new model the CONTEXT variable becomes an array of addresses (in my implementation).

Each array cell points to the last word defined in each specific vocabulary. 

The system knows the number of vocabularies you want searched and it will search each one in the order that they are placed in the array.

 

There is still only one CURRENT vocabulary where new definitions are added.

 

I think I will leave it there for a little digestion.

 

More info can be found here: WORDLIST - SEARCH (forth-standard.org)

 

Here is my latest code for WORDLISTS

Spoiler

\ wordlist.fth   for CAMEL99 FORTH    Oct 2020 Brian Fox
\ Code adapted from Web: https://forth-standard.org/standard/search
\ Dec 2020: Removed SET-CURRENT to save precious bytes
\ Jan 5, 2020: back migrated some enhancements from CODEX work
\ Jun 4, 2021: Changed order of patching to work with TTY version
\ Sep 25, 2021: Corrected SET-CONTEXT, Removed ROOT to save space.
\ Feb 23, 2022: Added INIT-WORDLISTS for binary program startup
\ --------
\ 'wid' is a word-list ID.
\ In Camel Forth, wid is a pointer to a Name Field Address (NFA)
\ ie: a counted string of the last word defined in the wordlist.

\ The kernel program has a pre-defined CONTEXT array to hold the
\ Forth wordlist plus 8 user defined wordlists.

\ NEEDS .S   FROM DSK1.TOOLS ( Debugging)

HERE
DECIMAL
CREATE #ORDER  1 ,  \ No. of active wordlists starts at 1
VARIABLE WID-LINK   \ Pointer to the most recently defined wordlist

CREATE FORTH-WORDLIST    0 ,  0 ,  LATEST @ ,
 FORTH-WORDLIST WID-LINK !  \ set first WID in the chain

: WORDLIST ( -- wid)
   HERE
   0 ,               \ init nfa of last word in wordlist
   WID-LINK @ ,      \ compile link to previous wordlist
   DUP WID-LINK !    \ link to previous wordlist
   0 ,               \ name of this wordlist. Must be patched
;

HEX
: .WID  ( wid -- )
  [ 2 CELLS ] LITERAL + @
  ?DUP 0= IF EXIT THEN   \ name field is empty.
  COUNT 1F AND TYPE SPACE ;

\ : ]CONTEXT ( n -- addr) CELLS CONTEXT + ; \ context as array
HEX ( Machine code is same size but faster)
CODE ]CONTEXT ( n -- addr)
     A104 ,            \ TOS TOS ADD,
     0224 , CONTEXT ,  \ TOS CONTEXT AI,
     NEXT,
     ENDCODE

.( .)
: GET-ORDER ( -- widn ... wid1 n ) \ *reversed order on stack
     #ORDER @  0 DO   #ORDER @ I - 1- ]CONTEXT @   LOOP  #ORDER @  ;

DECIMAL
: SET-ORDER ( wid1x ... wid1 n -- )  \ n cannot be 0
     DUP 0< IF DROP  FORTH-WORDLIST DUP 2  THEN
     DUP #ORDER !  0 ?DO  I ]CONTEXT !  LOOP
;

: ONLY ( -- ) TRUE SET-ORDER ;  \ set search order to FORTH FORTH

: SET-CONTEXT ( wid -- )    \ place 'wid' at beginning of search order
     >R GET-ORDER NIP       \ remove 1st wordlist
     R> SWAP SET-ORDER      \ put 'wid' first
;

\ User API ...
: ALSO        ( -- ) GET-ORDER OVER SWAP 1+ SET-ORDER ;
: PREVIOUS    ( -- ) GET-ORDER NIP 1- SET-ORDER ;
: DEFINITIONS ( -- ) CONTEXT @ CURRENT ! ;
.( .)
\ non-standard but nice to have
: VOCABULARY  ( <text> )
   CREATE
   WORDLIST
   LATEST @ SWAP 4 + !  \ update wordlist name field
   DOES> SET-CONTEXT ;

: ORDER ( -- )
   CR  GET-ORDER 0 DO   .WID   LOOP
   CR ." Current: " CURRENT @ .WID CR ;

: FORTH  ( -- ) FORTH-WORDLIST SET-CONTEXT ;

\ patch FORTH-WORDLIST to existing dictionary
   CONTEXT @ @ FORTH-WORDLIST !
\ set the new search order and current vocabulary
   FORTH-WORDLIST DUP 2 SET-ORDER  DEFINITIONS

\ Forth 2012 6.1.1550, Extend FIND to search all active wordlists
: FIND12 ( FIND12) ( c-addr -- c-addr 0 | xt 1 | xt -1 )
      FALSE   \ default flag
      CONTEXT #ORDER @ CELLS ( -- addr size)
      BOUNDS
      ?DO
          OVER I @ @ (FIND)
          ?DUP
          IF
              2SWAP 2DROP
              LEAVE
          THEN
          DROP
      2 +LOOP ;

' FIND12 'FIND !
ONLY FORTH DEFINITIONS

: INIT-WORDLISTS
    ['] FIND12 'FIND !
    CONTEXT @ @ FORTH-WORDLIST !
    ONLY FORTH DEFINITIONS
;

INIT-WORDLISTS

CR HERE SWAP - DECIMAL SPACE . .( bytes)
HEX

 

 

 

 

 

  • Like 2
Link to comment
Share on other sites

17 hours ago, TheBF said:

In the new model the CONTEXT variable becomes an array of addresses (in my implementation).

Each array cell points to the last word defined in each specific vocabulary. 

The system knows the number of vocabularies you want searched and it will search each one in the order that they are placed in the array.

 

There is still only one CURRENT vocabulary where new definitions are added.

 

You must have written this too quickly—CURRENT and CONTEXT in the above context should be reversed.

—[Corrected]—

 

...lee

  • Like 2
Link to comment
Share on other sites

  • 2 weeks later...

While reading comp.lang.forth I saw a link to this page which I think is the best summation of Chuck Moore's approach to programming that I have seen.

 

It was written by the late Jeff Fox (no relation) who passed away of a heart attack at the age of 62 in 2011.

Jeff worked closely with Chuck creating software for Chuck's new CPU designs.  He is missed in the Forth community.

 

http://www.ultratechnology.com/method.htm

 

I personally like step 9 in this 10 step approach 

 

9. Code.
   Build custom tools if they help.
   Write code so simple and clear that bugs simply can't happen.
   Make the code "right by design."
   :define using one-liners about this long ;
   Interactively test each Forth word.
   Extend the core language making your custom language and moving
   you toward your solution.
   Return to 1. UNTIL the code solution falls out.

 

: HARD     REMEMBERING TO MAKE NICE SHORT WORDS  ;  :) 

 

  • Like 4
Link to comment
Share on other sites

Pythagoras + a bit of Fudge

 

I decided to remove my "improved" TI-FORTH SPRITE code from my DIRSPRIT ( direct sprites) library.

I changed my coincidence code awhile back to just compare numbers in the sprite x,y values in VDP RAM, so it was not really being used.

 

The TI-FORTH distance between sprites calculation was always sub-optimal returning the distance squared and if the sums overflowed it just returned 32767.

 

I think I remember Lee fixed this with floating point math but I went down the rabbit hole of how to do it with integers. :) 

 

I had a pretty quick square root word for 16 bits which I found somewhere years ago, in Forth magazine I think.

It's pretty clever but being 16 bits it doesn't work past 65535.

 

With a 255x195 screen summing the squares can be done to 32 bit resolution so that's no problem.

But I could not figure out how to get the square root of a 32 bit number without a lot more work. 

 

I realized that if when I did the 32 bit addition unsigned, I got a 1 in the upper bits of the double number.

Using that 1 as an overflow flag I could perform the square root on the lower 16 bits and if there was an overflow I can fudge the value to give something useful.

Not perfect Pythagorean computation but much easier to do than fighting with a 32 bit square root calculation I think. 

 

The fudge factor calculation takes the incorrect square root (in an over-flow), divides it by PI and adds it to 255. 

This cause the values to be slightly compressed as you get farther away but it would still work in a game.

A little bit like those mirrors on your car that say "Things appear closer than they are". :)

 

\ DISTANCE.FTH  compute distance between any two sprites         Mar 12 2022

NEEDS DUMP   FROM DSK1.TOOLS
NEEDS SPRITE FROM  DSK1.DIRSPRIT
NEEDS AUTOMOTION FROM DSK1.AUTOMOTION
MARKER NEW

DECIMAL
: ^2  ( n n -- n)  DUP * ;
: DIFF  ( x y x y -- dx dy)  ROT -  -ROT - ;
: SUM   ( dx^2 dy^2 -- d) 0 ROT 0 D+ ;  \ sum squares to 32 bit resolution
: SUM-SQUARES ( x y x y -- d ) DIFF  ^2 SWAP ^2  SUM ;

: SQRT   ( n -- n ) -1 TUCK DO  2+  DUP +LOOP  2/ ;

: PI/   ( n -- n ) 10000 31415 */ ;
: DISTANCE    ( x y x y -- n) SUM-SQUARES >R SQRT R> IF  PI/ 255 +  THEN ;
: SP.DISTXY   ( x  y  spr#  -- dist ) POSITION DISTANCE ;
: SP.DIST     ( spr#1 spr#2 -- dist ) POSITION ROT SP.DISTXY ;

And here is some test code

\ test code
DECIMAL
\  char clr   x    y  spr#
\ -------------------------
CHAR 0   6     0    0  0 SPRITE
CHAR 1  10   240  85  1 SPRITE
CHAR 2   9   255  255  2 SPRITE
CHAR 3  11   127   90  3 SPRITE
CHAR 4  13   199  149  4 SPRITE
1 MAGNIFY

CLEAR
0 0 SP.DIST .
0 1 SP.DIST .
1 2 SP.DIST .
2 3 SP.DIST .
0 3 SP.DIST .
0 4 SP.DIST .
0 2 SP.DIST .

 

You can see the compression in the screen capture.  Distance between sprite 0 and sprite 2 should 360, but it's only 335.

This error only occurs after the distance exceeds 255. 

I may still pursue a 32bit Square root but this seems useable.

 

image.png.51c94d82169b19ff6638f1c2a24499a7.png

Edited by TheBF
wrong image
  • Like 2
Link to comment
Share on other sites

52 minutes ago, TheBF said:

I think I remember Lee fixed this with floating point math but I went down the rabbit hole of how to do it with integers. :) 

  . . .

I may still pursue a 32bit Square root but this seems useable.

 

I ported a C program to take an unsigned double (32-bit) square to a single (16-bit) square root. It consumes 128 bytes in the fbForth dictionary:

 

Spoiler

/* from Craig McQueen’s response on stackoverflow.com to “Looking for
   an efficient integer square root algorithm for ARM Thumb2”: (https://stackoverflow.com/questions/1100090/looking-for-an-efficient-integer-square-root-algorithm-for-arm-thumb2)
*/
#define BITSPERLONG 32
#define TOP2BITS(x) ((x & (3L << (BITSPERLONG-2))) >> (BITSPERLONG-2))

struct int_sqrt {
    unsigned sqrt, frac;
};

/* usqrt:
    ENTRY x: unsigned long
    EXIT  returns floor(sqrt(x) * pow(2, BITSPERLONG/2))

    Since the square root never uses more than half the bits
    of the input, we use the other half of the bits to contain
    extra bits of precision after the binary point.

    EXAMPLE
        suppose BITSPERLONG = 32
        then    usqrt(144) = 786432 = 12 * 65536
                usqrt(32) = 370727 = 5.66 * 65536

    NOTES
        (1) change BITSPERLONG to BITSPERLONG/2 if you do not want
            the answer scaled.  Indeed, if you want n bits of
            precision after the binary point, use BITSPERLONG/2+n.
            The code assumes that BITSPERLONG is even.
        (2) This is really better off being written in assembly.
            The line marked below is really a "arithmetic shift left"
            on the double-long value with r in the upper half
            and x in the lower half.  This operation is typically
            expressible in only one or two assembly instructions.
        (3) Unrolling this loop is probably not a bad idea.

    ALGORITHM
        The calculations are the base-two analogue of the square
        root algorithm we all learned in grammar school.  Since we're
        in base 2, there is only one nontrivial trial multiplier.

        Notice that absolutely no multiplications or divisions are performed.
        This means it'll be fast on a wide range of processors.
*/

void usqrt(unsigned long x, struct int_sqrt *q)
{
      unsigned long a = 0L;                   /* accumulator      */
      unsigned long r = 0L;                   /* remainder        */
      unsigned long e = 0L;                   /* trial product    */

      int i;

      for (i = 0; i < BITSPERLONG; i++)   /* NOTE 1 */
      {
            r = (r << 2) + TOP2BITS(x); x <<= 2; /* NOTE 2 */
            a <<= 1;
            e = (a << 1) + 1;
            if (r >= e)
            {
                  r -= e;
                  a++;
            }
      }
      memcpy(q, &a, sizeof(long));
}

/**
 * \brief    Fast Square root algorithm
 *
 * Fractional parts of the answer are discarded. That is:
 *      - SquareRoot(3) --> 1
 *      - SquareRoot(4) --> 2
 *      - SquareRoot(5) --> 2
 *      - SquareRoot(8) --> 2
 *      - SquareRoot(9) --> 3
 *
 * \param[in] a_nInput - unsigned integer for which to find the square root
 *
 * \return Integer square root of the input value.
 */
uint32_t SquareRoot(uint32_t a_nInput)
{
    uint32_t op  = a_nInput;
    uint32_t res = 0;
    uint32_t one = 1uL << 30; // The second-to-top bit is set: use 1u << 14 for uint16_t type; use 1uL<<30 for uint32_t type


    // "one" starts at the highest power of four <= than the argument.
    while (one > op)
    {
        one >>= 2;
    }

    while (one != 0)
    {
        if (op >= res + one)
        {
            op = op - (res + one);
            res = res +  2 * one;
        }
        res >>= 1;
        one >>= 2;
    }
    return res;
}

HEX
\ Registers:   R0,R1 = udh,udl
\              R2,R3 = root (nh,nl)
\              R4,r5 = floating 1 (f1h,f1l)
\              R6,R7 = wkh,wkl
ASM: UDSQRT    ( ud -- n )
   *SP+ R0 MOV,      \ pop udh to R0
   *SP R1 MOV,       \ udl to R1
   R2 CLR,           \ clear running..
   R3 CLR,           \ ..root (nh,nl)
   R5 CLR,           \ set floating 1..
   R4 4000 LI,       \ ..to 4000 0000
   
   \ get highest power of 4 <= square (udh,udl)
   BEGIN,
      R4 R0 C,       \ f1h:udh?
      EQ IF,
         R5 R1 C,    \ f1l:udl?
      THEN,
   H WHILE,
      R4 R5 C,          \ f1h:f1l?
      H IF,             \ bit in f1h?
         R4 2 SRL,      \ yes..shift f1h right 2 bits
         EQ IF,         \ bit shifted out?
            R5 4000 LI, \ set f1l to shifted-out bit
         THEN,
      ELSE,             \ no..bit is in f1l
         R5 2 SRL,      \ shift f1l right 2 bits
      THEN, 
   REPEAT,

   \ calculate square root
   BEGIN,
      R4 R5 C,       \ f1h:f1l?
   NE WHILE,         \ f1h,f1l > 0?
      \ set up wkh,wkl = nh,nl + f1h,f1l to compare to udh,udl
      R2 R6 MOV,     \ nh to wkh
      R3 R7 MOV,     \ nl to wkl
      R5 R7 A,       \ f1l + wkl
      OC IF,         \ carry?
         R6 INC,     \ yes..increment wkh
      THEN,
      R4 R6 A,       \ f1h + wkh
      \ check if need to update running values of udh,udl & nh,nl
      R0 R6 C,       \ udh:wkh?
      EQ IF,         \ udh = wkh?
         R1 R7 C,    \ udl:wkl?
      THEN,
      HE IF,         \ udh,udl >= wkh,wkl?
         \ udh,udl = udh,udl - wkh,wkl
         R7 R1 S,    \ udl - wkl
         NC IF,
            R0 DEC,  \ reduce udh on no carry
         THEN,
         R6 R0 S,    \ udh - wkh
         \ nh,nl = wkh,wkl + f1h,f1l
         R5 R7 A,    \ wkl + f1l
         OC IF,      \ carry?
            R6 INC,  \ yes..increment wkh
         THEN,
         R4 R6 A,    \ wkh + f1h
         R6 R2 MOV,  \ wkh to nh
         R7 R3 MOV,  \ wkl to nl
      THEN,
      \ f1h,f1l >> 2 
      R4 R5 C,          \ f1h:f1l?
      H IF,             \ bit in f1h?
         R4 2 SRL,      \ yes..shift f1h right 2 bits
         EQ IF,         \ bit shifted out?
            R5 4000 LI, \ set f1l to shifted-out bit
         THEN,
      ELSE,             \ no..bit is in f1l
         R5 2 SRL,      \ shift f1l right 2 bits
      THEN,
      \ nh,nl >> 1
      R3 1 SRL,         \ shift nl right 1 bit
      R2 1 SRL,         \ shift nh right 1 bit
      OC IF,            \ carry?
         R3 8000 ORI,   \ set MSb of nl
      THEN,
   REPEAT,
   R3 *SP MOV,          \ return n on stack
;ASM
DECIMAL

 

 

1 hour ago, TheBF said:

You can see the compression in the screen capture.  Distance between sprite 0 and sprite 2 should be 360, but it's only 335.

 

The TI-99/4A’s 192x256 resolution has a maximum pixel distance, corner to corner, of ~319 pixels: √(2552 + 1912) ≈ 318.6. With anything more, one of the sprites is off screen, so I guess you are involving off-screen distances in your calculations, but to what end?

 

...lee

  • Like 1
Link to comment
Share on other sites

9 minutes ago, Lee Stewart said:

 

I ported a C program to take an unsigned double (32-bit) square to a single (16-bit) square root. It consumes 128 bytes in the fbForth dictionary:

 

The TI-99/4A’s 192x256 resolution has a maximum pixel distance, corner to corner, of ~319 pixels: √(2552 + 1912) ≈ 318.6. With anything more, one of the sprites is off screen, so I guess you are involving off-screen distances in your calculations, but to what end?

 

...lee

Ah yes.  Very nice work. Thanks.

Amazing how many instructions it takes in assembler. 

 

I was just trying to see how far off the calculations were with the off-screen sprite.

I will double check this thing with the 256 x 192 coordinates. I may be able to get closer on the reduced size.

 

 

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...