TheBF

Camel99 Forth Information goes here


I did a little "googling" to supplement my poor math skills and found this page. 

 

http://www.azillionmonkeys.com/qed/sqroot.html


Section 5  is interesting and describes what I think the TI Forth engineers were using.

 

Quote used without permission. Mea culpa.

"A common application in computer graphics, is to work out the distance between two points as √(Δx²+Δy²).

However, for performance reasons, the square root operation is a killer, and often, very crude approximations are acceptable.

So we examine the metrics (1 / √2)*(|x|+|y|), and max(|x|,|y|)"
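
As a rough sketch of how those two metrics could be combined in Forth (my own reading, not code from that page): neither value can exceed the true distance, so the larger of the two is an estimate that comes out at most about 8% low before integer truncation. APPROX-DIST is a name I made up, and 181/256 is my stand-in for 1/√2.

```forth
\ Cheap distance estimate: larger of max(|dx|,|dy|) and (|dx|+|dy|)/sqrt(2).
\ APPROX-DIST is a made-up name; 181/256 approximates 1/sqrt(2).
: APPROX-DIST ( dx dy -- n )
   ABS SWAP ABS          \ |dy| |dx|
   2DUP MAX >R           \ save max(|dx|,|dy|) on the return stack
   +  181 256 */         \ (|dx|+|dy|) * 181/256, double-cell intermediate
   R> MAX ;              \ keep the larger of the two estimates

30 40 APPROX-DIST .      \ prints 49 (the exact answer is 50)
```

There is no multiply of the coordinates themselves and no square root, so as a CODE word this should be very cheap.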

 

TI-FORTH limited the values to 32K.

My version shows that we can go un-signed and expand the range.

And with a 32 bit accumulator we get a good "out-of-range" flag.

This might be adequate for a wide range of applications, would significantly improve on the TI-FORTH range of measurement, and as a CODE word it would be very fast.

Keeps me occupied and out of trouble. ;) 

 

 


DIST^2  is a usable word.  It's not super speedy at 1.6 milliseconds, but it's not bad and it handles squared distances out to 65535.

DECIMAL
: ^2    ( n -- n^2)   S" DUP *" EVALUATE ; IMMEDIATE  \ text macro: compiles DUP * inline
\ 1.6 milli-seconds
: DIST^2 ( spr1 spr2 -- u ?)  \ u = distance squared; ? = 0, range is valid
      POSITION ROT POSITION ( -- x y x2 y2)
      ROT -  -ROT -         ( -- diffy diffx)
      ^2 SWAP ^2            ( -- dx^2 dy^2)
      0 ROT 0 D+ ;          \ convert to unsigned doubles and add; high cell is the flag

Combined with a 16 bit square root word it works like this:

: SQRT ( n -- n ) -1 TUCK DO   2+  DUP +LOOP 2/ ;

: DISTANCE ( n n -- n) DIST^2 IF DROP TRUE EXIT   THEN SQRT ;

If you get a -1 the sprite distance is out of range. 

The whole thing adds 134 bytes to the system.

 

Edit: ( If we remove the text macro for ^2  it uses 114 bytes) :)
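
For anyone puzzling over that one-line SQRT: as far as I can tell it relies on the fact that 1+3+5+...+(2k-1) = k^2. Here is a longer spelling of the same idea, written only to show the mechanism. ODD-SQRT is a made-up name and this is not the version in the system.

```forth
\ Square root by subtracting successive odd numbers: 1+3+...+(2k-1) = k^2.
\ ODD-SQRT is a hypothetical name used for illustration only.
: ODD-SQRT ( u -- root )
   0 SWAP 1                   \ root u odd
   BEGIN
      2DUP U< 0=              \ is odd <= u ?
   WHILE
      TUCK - SWAP 2 +         \ u = u - odd, then odd = odd + 2
      ROT 1+ -ROT             \ one more odd number removed: root = root + 1
   REPEAT
   2DROP ;                    \ leave root

16 ODD-SQRT .                 \ prints 4
```

The loop runs once per unit of the root, so the cost grows linearly with the answer. That is the weakness of this family of routines for large inputs.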

 


I had a memory of some work done by Albert Van der Horst on square roots on comp.lang.forth so I went looking.

He made a version using Newton's method (Newton-Raphson iteration).

You can play games to find the best seed but even using a seed of 1 the results are amazing. It is over 10 times faster!

Unfortunately the current version dies on negative numbers so I am back in the trench.

But it's a good start.

 

\ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017
\ For n return FLOOR of the square root of n.

VARIABLE seed 1  seed !
\ While calculating roots near each other, the seed can be kept.
\ Otherwise this can be used to save 10 iterations.
: init-seed  DUP 10 RSHIFT 1024 MAX seed ! ;

: SQRT ( n -- root )
  DUP
  IF
     >R
     seed @
     R@ OVER / OVER + 2/ NIP ( DUP . ) \ debug viewing
     BEGIN
        R@ OVER / OVER + 2/  ( DUP .)
        2DUP >
     WHILE
        NIP
     REPEAT
     DROP          \ was: seed !   discard the last estimate so one result is left
     R> DROP
  THEN ;

: TESTROOTAV  TMR@ SWAP SQRT TMR@ SWAP .
         CR  -  213 10 */  . ." uS"  ;


Bugs R Us

The stuff you find when you try to do serious work with homemade code.

I really should be more professional and run the HAYES test suite on this Forth.

 

The good news: To make Albert's SQRT work on unsigned numbers I needed an unsigned division word, and that was simple to add to Forth because the 9900 has an instruction for it.

The bad  news: I found that the word 2/ should be a logical shift not an arithmetic shift.

I never... actually, like, ummm... read the spec. :( Oops.

 

 

2/ was simple to fix but it means that Camel99 Forth has an official BUG in the wild. 🐛 (that's a caterpillar. We don't have a bug emoji)

 

If you need to use 2/ with negative numbers add this new definition to your program until I get a new release out.

HEX
CODE 2/  ( n -- n)  
      0914 , \  TOS 1 SRL,   \ **BUG FIX**  was SRA. DUH!
      NEXT,
 ENDCODE
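
One caution before pasting that in: as I read them, ANS Forth and Forth-2012 specify 2/ as the shift that leaves the most-significant bit unchanged, so changing 2/ itself trades one surprise for another. A sketch of an alternative, assuming RSHIFT is available (it is used elsewhere in this thread): leave 2/ alone and give the unsigned halving its own name. U2/ is a name I am assuming here, not an existing Camel99 word.

```forth
\ U2/ is an assumed name for an unsigned halving word; 2/ is left alone.
: U2/ ( u -- u/2 )  1 RSHIFT ;   \ logical shift: a zero comes in at the top

-8 2/ .          \ prints -4     with a sign-preserving 2/
65534 U2/ .      \ prints 32767  the top bit is treated as data, not sign
```

With that split, the unsigned SQRT would call U2/ and signed code keeps the behaviour it expects from 2/.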


 


After thinking about it I decided that DISTANCE was a good general purpose function and so it can stand on its own.

If you want to compute the DISTANCE between two sprites it is now trivial.

: SP.DIST  ( spr spr -- n) POSITION ROT POSITION DISTANCE ;

 

So here is how the DISTANCE library file looks now.  The only thing I might optimize as CODE is DXY.

There are a lot of ROTs in code when you are manipulating x,y coordinates, so removing two in an intermediate word makes me feel better. :)

 

Spoiler
\ DISTANCE.FTH  compute distance between 2 coordinates  Mar 14 2022 Brian Fox
\ Max range is 255 pixels with "out of range" flag.

HERE
\ machine code is same size as Forth
HEX \ : U/  ( u1 u2 -- u3 )  0 SWAP UM/MOD NIP ;
CODE U/   ( u1 u2 -- u3 )     \ unsigned division
    C004 ,  \   TOS R0 MOV,   \ divisor->R0
    04C4 ,  \      TOS CLR,   \ high word in TOS = 0
    C176 ,  \ *SP+  R5 MOV,   \ MOVE low word to r5
    3D00 ,  \   R0 TOS DIV,
    NEXT,
ENDCODE

\ SQRT by Albert Van der Horst, comp.lang.forth, Aug 29, 2017
\ Newton's method (Newton-Raphson). ~10X faster than linear method
\ Returns FLOOR of the square root of n.
DECIMAL
: SQRT ( n -- n')
  DUP
  IF >R
     1  \ 1st seed
     R@ OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing
     BEGIN
        R@ OVER U/ OVER + 2/  ( DUP .)
        2DUP >
     WHILE
        NIP
     REPEAT
     DROP
     R> DROP
  THEN ;
DECIMAL

: DXY  ( x y x y -- dx dy) ROT -  -ROT - ;
: SUMSQR   ( n1 n2 -- d) DUP * SWAP DUP * 0 ROT 0 D+ ;
: DISTANCE ( x y x y -- n) DXY SUMSQR IF DROP TRUE EXIT   THEN SQRT ;
HERE SWAP - .  ( 170 bytes)

 


 


Hi,

 

so far you have concentrated on Euclidean distance. I like the square root approximations if it is needed for say computing  gravitational attraction. But for games (imagine Asteroids) you want to detect coincidence between any or all sprites, every loop.
 

As a first pass, you want to know if square boundaries could intersect. 
 

The algorithm I learned is from Preparata & Shamos Computational Geometry (pretty old now) and probably Knuth before that. 
 

Keep the sprite list sorted by Y coordinate at all times. (X sort is not terribly useful on top of that.) 

 

As sprite positions typically change 1 pixel at a time, a bubble sort can be adequate. 
 

I like to have the sprite list in CPU Ram for updating, and to write the whole thing to VDP in each vertical interrupt interval (watch the VDPSTA bit.)

 

 

For coincidence, iterate over the list, keeping a window of the sprites that lie within 8, 16 or 32 Y pixels (depending on magnification) of the current one.

You only need to test the Ith sprite against the last few in this window (if any). You can reject any whose X distance is too large.

 

Any slow math or slower pixel-wise comparison can be done on just these few sprites. 
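
A minimal sketch of that sweep in Forth, assuming the list is already sorted by Y and assuming square boxes of one fixed size. Every name and the sample data are made up for illustration, and it only counts candidate pairs instead of handing them to a slower test.

```forth
\ Sweep over sprites already sorted by Y (ascending).
\ All names and the sample data are made up for illustration.
4 CONSTANT #SPR                    \ number of sprites in the list
8 CONSTANT BOX                     \ sprite box size in pixels
CREATE YS  10 ,  12 ,  40 ,  41 ,  \ Y of each sprite, sorted
CREATE XS 100 , 105 , 200 ,  30 ,  \ X of the same sprites
VARIABLE HITS                      \ number of candidate pairs found

: SPR-Y ( i -- y ) CELLS YS + @ ;
: SPR-X ( i -- x ) CELLS XS + @ ;
: X-NEAR? ( i j -- ? ) SPR-X SWAP SPR-X - ABS BOX < ;

: SWEEP ( -- )
   0 HITS !
   #SPR 1 ?DO                          \ for each sprite i after the first
      I                                \ j starts at i and walks backwards
      BEGIN 1- DUP 0< 0= WHILE         \ stop when j runs off the front
         I SPR-Y OVER SPR-Y - BOX < WHILE   \ stop when j leaves the Y window
            I OVER X-NEAR? IF 1 HITS +! THEN
      REPEAT THEN
      DROP
   LOOP ;

SWEEP HITS @ .                     \ prints 1: only sprites 0 and 1 are close
```

Because the list is sorted, the inner loop gives up at the first sprite that is too far away in Y, so most sprites are compared against only a few neighbours.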
 

 

1 hour ago, FarmerPotato said:

As a first pass, you want to know if square boundaries could intersect.

 

If I understand what you mean here*, this is how I manage fbForth COINC and COINCXY ( but not SPRDIST and SPRDISTXY ), so it is much quicker than calculating Δx²+Δy² or √(Δx²+Δy²).

____________

*What I understand this to mean is that Δx and Δy are computed and each compared to a tolerance.

 

...lee

On 3/13/2022 at 4:17 PM, TheBF said:
\ While calculating roots near each other., the seed can be kept. 
\ Otherwise this can be used to save 10 iterations. 
: init-seed DUP 10 RSHIFT 1024 MAX seed ! ;

 

 

How is init-seed used?

 

...lee


That's interesting stuff. I am not sure maintaining the sort would be able to keep up on our old girl here if the number of SPRITEs got too high but a neat idea just the same. I think it would have to be CODE and not Forth to work really well albeit reading and writing blocks of VDP RAM proceeds at machine speed.

 

In the past I had re-worked the old TI-Forth code to test the square boundaries like Lee is doing, but then I found I could make a faster routine that simply computed the difference between the x,y coordinates of two sprites and compare to a tolerance. It's brute force but it seems to take less time than what I had.

 

Here is the Forth version I had which runs in 1.4mS

: COINC ( spr#1 spr#2 tol -- ?)
          >R
          POSITION ROT POSITION ( -- x1 y1 x2 y2 )
          ROT - ABS R@ <
         -ROT - ABS R> < AND
;

 

And here is the slightly improved version from my recent work, which runs in 1.2mS

HEX
CODE DXY   ( x y x2 y2 -- dx dy)
  *SP+  R1 MOV, \ x2
  *SP+ TOS SUB, \ y2=y2-y
       TOS ABS,
   R1  *SP SUB, \ x=x-x2
       *SP ABS,
        NEXT,
ENDCODE

: COINC ( spr#1 spr#2 tol -- ?)
          >R
          POSITION ROT POSITION ( -- x1 y1 x2 y2 )
          DXY  R@  < 
          SWAP R> < AND
;

An important part of using these is to call COINCALL  in the primary loop which is very fast being just a byte fetch. 

 

However what I like about your idea is that it could probably handle more sprite coincidences simultaneously.

In a game like asteroids for example your method probably performs better. 

 

I also do something different for getting at the SPRITE table.  I have an integer fetch routine for VDP called V@ so I read x,y at once.

Then I have a SPLIT word that splits that into two bytes.

 

For other situations I have turned the SPRITE attribute table into 4 fast arrays so I can read each field independently.

 

: TABLE4: ( Vaddr -- )  \ create a table of 4 byte records
         CREATE    ,             \ compile base address into this word
        ;CODE ( n -- Vaddr')     \ RUN time
             0A24 ,  \ TOS 2 SLA,  ( tos = n x 4 )
             A118 ,  \ *W TOS ADD,
             NEXT,
ENDCODE

SAT     TABLE4: SP.Y
SAT 1+  TABLE4: SP.X
SAT 2+  TABLE4: SP.PAT
SAT 3 + TABLE4: SP.COLR
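
For readers who do not speak 9900 machine code, here is a high-level sketch of what I believe the ;CODE part does, using DOES> instead. TABLE4H: and DEMO.Y are made-up names, and HEX 300 is only a sample base address, not necessarily where SAT lives.

```forth
\ High-level stand-in for TABLE4: using DOES> instead of ;CODE.
\ TABLE4H: and DEMO.Y are made-up names; HEX 300 is only a sample base address.
: TABLE4H: ( base -- )
   CREATE ,                    \ store the base address in the new word
   DOES> ( n -- addr )
      @ SWAP 4 * + ;           \ base + n*4

HEX 300 DECIMAL TABLE4H: DEMO.Y

3 DEMO.Y .                     \ prints 780, i.e. 768 + 3*4
```

The ;CODE version does the same arithmetic in two instructions (a shift and an add), which is where the speed comes from.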

 

With this POSITION is defined as:  ( removed the limit tests) 

: POSITION  ( sprt# -- dx dy ) ( ?NDX) S" SP.Y V@ SPLIT" EVALUATE ; IMMEDIATE


 

 

 

6 minutes ago, Lee Stewart said:

 

How is init-seed used?

 

...lee

I never did figure that out. :)

But from reading the topic Albert indicated that it could save 10 iterations if you compute a good initial seed.

For our application with sprites there were not many iterations required so I just locked it to 1.

 

Here is the topic. Maybe you can glean something from it. 

Faster integer square roots through a seed (google.com)

 


Here is a test program that I am using to work on this.

Using this loop which is only polling for edges and coincidence it still misses a collision every now and then.

 

I am thinking about trying your idea, Erik, but running the Sprite table read/write, the Y sort and the collision detection in a separate process.

Between auto-motion running on the interrupt and a separate process for collision detection, it frees up the main program to run the game itself.

 

I have some very simple mailboxes for inter-task communication so the collision detector sends a message to the game.

 

The game just polls the mailbox and reads the message. Only then does it deal with the sprites.

One thing that might (?) improve things is stopping the motion of sprites that collided and wait for a reply message from the game to re-start them.

This prevents automotion from messing up your universe.

Lots to think about. Thanks Erik.

 

Spoiler
\ Sprite COINC and TRAP Test

NEEDS DUMP       FROM DSK1.TOOLS
NEEDS SPRITE     FROM DSK1.DIRSPRIT
NEEDS AUTOMOTION FROM DSK1.AUTOMOTION
NEEDS HZ         FROM DSK1.SOUND
NEEDS MARKER     FROM DSK1.MARKER
NEEDS RND        FROM DSK1.RANDOM

MARKER /TEST

DECIMAL
: BOUNCE.X  ( spr# --) ]SMT.X  DUP VC@ NEGATE  SWAP VC! ;
: BOUNCE.Y  ( spr# --) ]SMT.Y  DUP VC@ NEGATE  SWAP VC! ;
: BOUNCE    ( spr# --) DUP BOUNCE.X BOUNCE.Y  ;


: TINK    GEN1  1500 HZ  -6 DB 40 MS  ;
: BONK    GEN2  120 HZ  -4 DB  50 MS  ;

: TRAPX ( spr# -- )
      DUP SP.X VC@
      239 0 WITHIN IF   BOUNCE.X  TINK EXIT THEN
      DROP  ;

: TRAPY ( spr# -- )
      DUP SP.Y VC@
      185 0 WITHIN IF  BOUNCE.Y   TINK  EXIT THEN
      DROP  ;

: TRAP ( spr# -- ) DUP TRAPX TRAPY   ;

DECIMAL

: SPRITES ( -- ) \ makes 3 sprites
      (    char   colr x   y  sp# )
      [CHAR] 0     3   100  90  0 SPRITE
      [CHAR] 1     5   100  90  1 SPRITE
      [CHAR] 2     9   100  90  2 SPRITE
;

: RNDV   ( -- n)  70 RND 10 + 20 -  ;
: RNDXY  ( -- dx dy)  RNDV RNDV ;

: RUN ( -- )
    15 SCREEN
    1 MAGNIFY
    PAGE ." CAMEL99 Forth"
    CR   ." Trap/Coinc Test with Automotion"
    CR
    SPRITES
      25  27 0 MOTION
     -31 -33 1 MOTION
     -13  25 2 MOTION
    AUTOMOTION
    BEGIN
         0 TRAP  1 TRAP  2 TRAP
         0 1 7 COINC IF 0 BOUNCE 1 BOUNCE BONK THEN
         0 2 7 COINC IF 0 BOUNCE 2 BOUNCE BONK THEN
         1 2 7 COINC IF 1 BOUNCE 2 BOUNCE BONK THEN
        GEN1 MUTE
        GEN2 MUTE
    ?TERMINAL
    UNTIL
    STOPMOTION
\    DELALL
    8 SCREEN ;

CR .( Type RUN to start demo)

 

 


I hear you about the sprite auto-motion interrupt. That complicates things. But it's not hard to (disable automotion and) roll your own in a user interrupt routine. (If you want to be exact, find the ISR source code in, say, TI Intern.) There are just too many VDP reads and writes involved in the console routine for my liking.

 

For position and motion, I prefer fixed point 16-bit X.x where byte X is the screen coordinate, byte x is a fraction, and the velocity is simply added (a signed 16-bit quantity but preferably -256 to +256.)
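
A minimal sketch of that 8.8 scheme for one axis, in Forth. XPOS, XVEL, MOVE-X and PIXEL-X are names made up for this example, not part of any sprite engine mentioned here.

```forth
\ 8.8 fixed point: high byte is the pixel, low byte is the fraction.
\ XPOS, XVEL, MOVE-X and PIXEL-X are made-up names.
VARIABLE XPOS              \ position in 1/256 pixel units
VARIABLE XVEL              \ signed velocity in 1/256 pixel per frame

: MOVE-X  ( -- )    XVEL @ XPOS +! ;              \ one frame of motion
: PIXEL-X ( -- x )  XPOS @ 8 RSHIFT 255 AND ;     \ screen coordinate byte

100 256 * XPOS !           \ start at pixel 100.0
128 XVEL !                 \ half a pixel per frame

MOVE-X PIXEL-X .           \ prints 100  (position is 100.5)
MOVE-X PIXEL-X .           \ prints 101  (position is 101.0)
```

The whole motion update is a single add per axis, and slow speeds come for free because the fraction byte accumulates between frames.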

 

The sorting is done by swapping indices - not the actual sprite data. Because sprite #1 should always be the player, with bullets having next priority, the sprites are written to VDP in original order (nothing to do with the sorted indices.)

 

Another reason to write the whole sprite table to VDP on each interrupt, is that animating the sprite pattern can happen, too.

I also update the player sprite pattern definition, for rotation (it's expensive to keep ALL the patterns loaded for just one sprite.)

 

(the most recent time I used this sprite engine was in "parsec2020" which was only a demo of your ship and a map. But I tested automation with a bunch of asteroids flying around.) I didn't get to the coinc code. 

 

pseudocode:

 

Spoiler

 

 

9 hours ago, FarmerPotato said:

I hear you about the sprite auto-motion interrupt. That complicates things. But it's not hard to (disable automotion and) roll your own in a user interrupt routine. (If you want to be exact, find the ISR source code in, say, TI Intern.) There are just too many VDP reads and writes involved in the console routine for my liking.

I might even do that with a process. Since it's cooperative in my system the sprites will never be out of control.

When I read the motion code it surprised me how involved it was. I like using it because it saves space in RAM.

Quote

 

For position and motion, I prefer fixed point 16-bit X.x where byte X is the screen coordinate, byte x is a fraction, and the velocity is simply added (a signed 16-bit quantity but preferably -256 to +256.)

Ooo. I like the sound of that.

Edit. Wait... I think that's what happens in the ROM code.

 

Quote

The sorting is done by swapping indices - not the actual sprite data. Because sprite #1 should always be the player, with bullets having next priority, the sprites are written to VDP in original order (nothing to do with the sorted indices.)

Another great idea.

 

Quote

Another reason to write the whole sprite table to VDP on each interrupt, is that animating the sprite pattern can happen, too.

I also update the player sprite pattern definition, for rotation (it's expensive to keep ALL the patterns loaded for just one sprite.)

This depends on the application I guess. If a sprite needs to just change between 2 states very fast, two different characters are the way to go, but yes, you can blit patterns in fast enough for most needs.

 

Quote

 

(the most recent time I used this sprite engine was in "parsec2020" which was only a demo of your ship and a map. But I tested automation with a bunch of asteroids flying around.) I didn't get to the coinc code

 

You have given me lots to chew on. Thanks.  


On square roots, I was once fascinated by the long division algorithm, which was an appendix to TI’s Basic Electricity AC/DC Circuits textbook. (Community college level.) 

 

Here’s a web version:

https://byjus.com/maths/square-root-long-division-method/
 

I have a hunch that this can be applied to 16 bit numbers, where a digit is just 2 bits, and operations are mostly 1 bit shifts and JOC. 


My intuition is that the square root in binary has half the number of significant digits. In other words the square root of a 16 bit number is an 8 bit number. 
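
To make the hunch concrete, here is a sketch of the binary digit-by-digit method in Forth for inputs 0..65535. It takes the input two bits at a time and produces one result bit per pass, so eight passes give an 8 bit root. BIT-SQRT, REMAIN and ROOT are made-up names, and this is an untuned illustration rather than anything from the textbook.

```forth
\ Binary digit-by-digit (long division style) square root for 0..65535.
\ BIT-SQRT, REMAIN and ROOT are made-up names; this is a sketch, not tuned code.
VARIABLE REMAIN                      \ what is left of the input
VARIABLE ROOT                        \ result being built, pre-shifted

: BIT-SQRT ( u -- root )
   REMAIN !  0 ROOT !
   16384                             \ bit = highest power of 4 in 16 bits
   BEGIN DUP WHILE                   \ until bit has shifted down to 0
      DUP ROOT @ +                   \ bit trial     trial = root + bit
      DUP REMAIN @ SWAP U< IF        \ remain < trial : this digit is 0
         DROP  ROOT @ 1 RSHIFT ROOT !
      ELSE                           \ this digit is 1
         NEGATE REMAIN +!            \ remain = remain - trial
         ROOT @ 1 RSHIFT OVER + ROOT !   \ root = root/2 + bit
      THEN
      2 RSHIFT                       \ move on to the next pair of input bits
   REPEAT
   DROP  ROOT @ ;

64516 BIT-SQRT .                     \ prints 254
```

Only shifts, adds, subtracts and an unsigned compare are used, with no division at all, which is what should make it attractive on the 9900.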

 

Just now, FarmerPotato said:

On square roots, I was once fascinated by the long division algorithm, which was an appendix to TI’s Basic Electricity AC/DC Circuits textbook. (Community college level.) 

 

Here’s a web version:

https://byjus.com/maths/square-root-long-division-method/
 

I have a hunch that this can be applied to 16 bit numbers, where a digit is just 2 bits, and operations are mostly 1 bit shifts and JOC. 


My intuition is that the square root in binary has half the number of significant digits. In other words the square root of a 16 bit number is an 8 bit number. 

 

Your intuition is good.

 

256^2=65536

24 minutes ago, TheBF said:

Your intuition is good.

 

256^2=65536

 

Looking at that method I believe this is it in Forth:

: SQRT ( n -- n ) -1 TUCK DO   2+  DUP +LOOP   2/ ;

 

 


@Lee Stewart

 

I played with this and got it to work for 16 bit numbers, but it is still probably not optimal for small numbers.

For the biggest numbers we get about a 2x improvement using the ROOTS test word.

seed=1   :   31.1 secs

Init-seed :   16.4 secs

64516 5000 ELAPSE ROOTS 

But doing 

9 5000 ELAPSE ROOTS 

takes 9.7 seconds with seed=1 and 11.0 seconds with INIT-SEED, which gives a seed of 8 for this input.
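
To see where the time goes, here is a sketch that counts Newton steps for a given seed. It is a standalone model in plain Forth, not the SQRT in my file: USQRT, NEWT and STEPS are made-up names, and it assumes 0 < n (and n < 65535 on a 16 bit system so that x + n/x cannot wrap).

```forth
\ Count Newton steps for a given seed. USQRT, NEWT and STEPS are made-up names.
\ Assumes 0 < n; on a 16-bit system also n < 65535 so x + n/x cannot wrap.
VARIABLE STEPS
: U/   ( u1 u2 -- quot )  0 SWAP UM/MOD NIP ;
: NEWT ( n x -- n x' )    2DUP U/ + 1 RSHIFT  1 STEPS +! ;  \ x' = (x + n/x)/2

: USQRT ( n seed -- root )
   0 STEPS !
   NEWT                      \ first step lands at or above the root
   BEGIN
      TUCK NEWT              \ x n x'
      ROT 2DUP U<            \ n x' x flag    flag: x' < x, still descending
   WHILE
      DROP                   \ n x'   keep the smaller value and go again
   REPEAT
   NIP NIP ;                 \ the previous x is the floor of the root

64516 1  USQRT . STEPS @ .   \ prints 254 11
64516 63 USQRT . STEPS @ .   \ prints 254 5   (63 is 64516 shifted right by 10)
```

By my count the better seed roughly halves the number of divisions for a big input, which lines up with the timings above. For an input like 9 there are so few steps either way that the seed arithmetic is just extra work.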

 

Here is the file I am using to play around.

Spoiler
\ integer square root in Forth.  Not too fast but small
\ *WARNING* The 16 bit limit is:  65000 SQRT . 254

\ This is 10x faster than the linear method:
\ : SQRT ( n -- n ) -1 TUCK DO   2+  DUP +LOOP   2/ ;
\
INCLUDE DSK1.TOOLS
INCLUDE DSK1.ELAPSE

\ : U/  ( u1 u2 -- u3 )  0 SWAP UM/MOD NIP ;
\ machine code is same size as Forth
HEX
CODE U/   ( u1 u2 -- u3 ) \ unsigned division
    C004 ,  \   TOS R0 MOV,   \ divisor->R0
    04C4 ,  \      TOS CLR,   \ high word in TOS = 0
    C176 ,  \ *SP+  R5 MOV,   \ MOVE low word to r5
    3D00 ,  \   R0 TOS DIV,
    NEXT,
ENDCODE

\ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017
\ For n return FLOOR of the square root of n.
DECIMAL
: INIT-SEED ( n -- n n') DUP 10 RSHIFT 8 MAX  ; \ for 16 bits only

: SQRT ( n -- root )
  DUP
  IF
     DUP >R
  \   INIT-SEED   ( optimized seed value) \ 64516 SQRT : 5000x 16.4 seconds
      1   ( default seed value )          \ 64516 SQRT : 5000x 31.1 seconds 
     R@ OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing
     BEGIN
        R@ OVER U/ OVER + 2/  ( DUP .)
        2DUP >
     WHILE
        NIP
     REPEAT
     DROP
     NIP
     R> DROP
  THEN ;

: ROOTS ( n1 cnt -- n) 0 ?DO  DUP SQRT DROP  LOOP DROP ;

 

 

 

On 3/13/2022 at 4:17 PM, TheBF said:
Spoiler

: SQRT ( n -- ) 
   DUP IF 
      >R 
      seed @ 
      R@ OVER 
      / OVER 
      + 2/ 
      NIP ( DUP . ) \ debug viewing 
      BEGIN 
         R@ OVER 
         / OVER 
         + 2/ ( DUP .) 
         2DUP 
         > 
      WHILE 
         NIP 
      REPEAT 
      \ seed ! 
      R> DROP 
   THEN ;

 

 

 

This change was made before your update to using U/ , etc. It is more compact and about 6 % faster:

: SQRT ( n1 -- n2 )
  DUP                               
  IF                                
     >R                             
     seed @                         
     R@ OVER
     /                              
     + 2/ ( DUP . ) \ debug viewing
     BEGIN                          
        R@ OVER
        / OVER                      
        + 2/  ( DUP .)           
        SWAP OVER    
        >            
     WHILE                          
     REPEAT
\     seed !
     R> DROP                        
  THEN ;

 

However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .

 

...lee


 

5 minutes ago, Lee Stewart said:

 

This change was made before your update to using U/ , etc. It is more compact and about 6 % faster:

However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .

 

...lee

Good to know.

8 times is in line with what we see going from ITC Forth to code for many routines so that sounds right.

I didn't really take the time to understand the algorithm you are using. It looks very clever.  

With all the shifting it makes me wonder if the divisions in Albert's version would net out to similar speed on 9900.

 

I suppose to have good comparison between the two methods, I have to convert Albert's code to ALC. 

I wonder if I could write it in Machine Forth quicker?  Might try that too.

 

 

 

 

1 hour ago, Lee Stewart said:

However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .

 

...lee

I re-did my tests to do 10000 iterations and I get these results.

\   1 as seed value:
\  Forth:     64516 SQRT ->  10000x  62.2 seconds
\  Inlined:   64516 SQRT ->  10000x  42.6 seconds

 

So we are a bit faster by inlining the stuff between the loop words but still far off 17 seconds. :)

I also did Forth "hand optimization" by replacing DUP >R   with DUP>R  

( the inliner chokes on > because it doesn't end in next. I should fix that.) 

 

: SQRT ( n -- )
  DUP
  IF
     INLINE[ DUP>R 1 ]
     INLINE[ R@ OVER U/ OVER + 2/ NIP ]
     BEGIN
      INLINE[ R@ OVER U/ OVER + 2/ 2DUP ]
    > WHILE
        NIP
     REPEAT
    INLINE[ DROP NIP R> DROP ]
  THEN ;


 

12 hours ago, Lee Stewart said:

UDSQRT , 17 seconds

 

I forgot to account for the loop without UDSQRT , which is 3+ seconds, so UDSQRT itself takes ~14 seconds for 10,000 iterations—~1.4 ms for a single execution of UDSQRT .

 

...lee


Now you are just showing off. :) 

 

Truth be told I am totally impressed with how you converted the C program.

It's above my pay grade.

 


I just did a reality check and found that I have 63Mbytes of source code in my LIB.ITC folder. Yikes. That's a lot to "maintain".

 

Anyway while looking things over I found a limitation in my BLOCK implementation for SAMS.

It works but I was not using my WINDOWS data correctly when testing if a SAMS bank was in memory or not. 

Indexed addressing for the win!  I love this processor.

 

Here is the corrected code with better comments.

It has not been fully vetted but it works as expected at the command line.

\ BLOCK using 2 pages of SAMS memory in Low RAM    Mar 18 2022 Brian Fox
NEEDS .S    FROM DSK1.TOOLS
NEEDS MOV,  FROM DSK1.ASM9900

NEEDS SAMSINI FROM DSK1.SAMSINI \ *NEW* common code for SAMS card

\ Note:
\ I realized that I was not using the WINDOWS array as the source
\ of data for the 1st tests. With this code the two windows can be anywhere

\ For reference this is the data to manage what banks are in RAM
 VARIABLE USE                       \ index of the last bank# used
 CREATE BLK#S       0 ,    0 ,      \ SAMS bank# in the windows
 CREATE WINDOWS  2000 , 3000 ,      \ array of windows in CPU RAM

CODE BLOCK ( bank# -- buffer)
\ FAST test if we already have the bank# in one of windows
          W CLR,                   \ W is index register = 0
          BLK#S (W) TOS CMP,       \ do we have the requested bank#
          EQ IF,                   \ yes we do
             WINDOWS (W) TOS MOV,  \ use WINDOWS(0) ie: >2000
             NEXT,                 \ Return to Forth
          ENDIF,

          W INCT,                  \ bump index to next "cell"
          BLK#S (W) TOS CMP,
          EQ IF,
              WINDOWS (W) TOS MOV, \ use windows(2) ie: >3000
              NEXT,                \ Return to Forth
          ENDIF,

\ ** bank# is not in RAM. Get it

\ whatever blk# was last used, switch to the other one
           W  0001 LI,    \ init W to 1
         USE @@  W XOR,   \ toggle it with the last buffer we used
         W  USE @@ MOV,   \ update the USE variable. Can only be 1 or 0
         W       W ADD,   \ "do 2*" It now has the index we will use

     TOS BLK#S (W) MOV,   \ store the NEW bank# in BLK#S array
    WINDOWS (W) R1 MOV,   \ get the window to use

\ compute address of SAMS card register for this window
          R1    0B SRL,   \ divide by 2048
          R1  4000 AI,    \ Add base address of SAMS registers

          R12 1E00 LI,    \ select CRU address of SAMS card
                 0 SBO,   \ SAMS card on
              TOS  SWPB,  \ swap bytes on bank value
         TOS R1 ** MOV,   \ load bank into SAMS card register
                 0 SBZ,   \ SAMS card off
   WINDOWS (W) TOS MOV,   \ return buffer on TOS
                   NEXT,
ENDCODE

SAMSINI CR .( SAMS card initialized)
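
The register address arithmetic in the middle of BLOCK can be checked at the console with a one-liner. This is only a sketch of the SRL/AI pair in plain Forth; SAMS-REG is a name made up for the check and is not part of the library.

```forth
\ Same arithmetic as the SRL/AI pair in BLOCK, in plain Forth.
\ SAMS-REG is a made-up name used only for this check.
HEX
: SAMS-REG ( window-addr -- reg-addr )  0B RSHIFT 4000 + ;

2000 SAMS-REG .     \ prints 4004
3000 SAMS-REG .     \ prints 4006
DECIMAL
```

Dividing by 2048 instead of 4096 gives the 4K page number already doubled, which is what is needed to index a table of 16 bit registers.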

 


I was never happy with the way I created the ability to load code in temporary memory and then re-link the dictionary.

It seemed buggy and not clear to me and I wrote it!

It always makes my head spin when I have to re-work dictionary links. Camel Forth uses LINK->NAME ->LINK  linkage.

It's not as easy to hold in my head as LINK->LINK->LINK. 

With a small sketch I was able to get a better mental picture and that helped simplify the code. (But it still is hard to understand) 

 

I also removed the input argument to TRANSIENT. It now just uses the H variable.  Set that where you need it to be.

H= >2000 when the system boots.

 

Spoiler
\ transient compilation                        Mar 19 2022 Brian Fox
\ modified to default to use  H @ for TRANSIENT definitions memory

\ INCLUDE DSK1.TOOLS  \ for debugging
CR .( Compile transient code in LOW RAM and remove it later)
CR .( Remove temporary words with: DETACH )

HEX
VARIABLE OLDDP      \ remember the dictionary pointer
VARIABLE OLDH       \ remember the HEAP (low RAM)
VARIABLE OLDLINK    \ link field of a dummy word after PERMANENT

: TRANSIENT ( -- )
           H @ DUP>R  OLDH !
           HERE OLDDP !    \ save the dictionary pointer.
           R> DP !         \ Point DP to transient memory
;

: PERMANENT  ( Marks end of transient definitions )
        HERE H !                    \ update heap pointer (LOW RAM)
        S"  " HEADER,               \ DUMMY word is blank. Can't be found
        LATEST @ NFA>LFA OLDLINK !  \ Remember LFA of DUMMY
        OLDDP @ DP !                \ restore normal dictionary
        OLDDP OFF
;

\ removes everything from TRANSIENT to this definition
: DETACH    [ LATEST @ ] LITERAL  OLDLINK @ ! ;

 

 

DETACH is my new name to "detach" the TRANSIENT dictionary from the main dictionary. It replaces ERADICATE. :) 

Seemed like a better name.

 

So far it works as expected although it's not nestable.

So you use it to get the assembler, compile the code, DETACH. Then you could do that again for another file.

That's not a real hardship since mostly it's to get the assembler in the system without taking up memory space.

It seems to be a great use of the SUPERCART memory, especially if you are compiling programs to create EA5 executables.

It means you don't have to convert all the Assembler code to machine code to make room for your program but you can still test with real data in LOW RAM if needed.

CR .( SUPERTOOLS: utilities in SUPER Cart RAM  Mar 22 2022)
CR
NEEDS TRANSIENT FROM DSK1.TRANSIENT

CR .( Compile Tools in LOW RAM)
HEX 6000 H !  ( put heap in SUPER CART)
TRANSIENT
  INCLUDE DSK1.WORDLISTS

ONLY FORTH DEFINITIONS
  INCLUDE DSK1.ELAPSE
  INCLUDE DSK1.TOOLS

VOCABULARY ASSEMBLER
  ALSO ASSEMBLER DEFINITIONS
  INCLUDE DSK1.ASM9900

PERMANENT

HEX 2000 H !   \ restore heap to normal low ram
.FREE
DECIMAL
ONLY FORTH DEFINITIONS ALSO ASSEMBLER  
ORDER

 
