Jump to content


+AtariAge Subscriber
  • Content Count

  • Joined

  • Last visited

Posts posted by TheBF

  1. Just now, dhe said:

    It's all Clint's work. I liked the use of high speed ram also. On classic99 pumped up to make speed it finished in about 5 seconds.

    Running in CPU Overdrive?

    So that is about 23 times faster than normal so at normal speed ~115 seconds, 1:55 ish.


    I am just curious so I can benchmark my compiler projects.

    As I recall Small C was not too speedy in the old days but beat the heck out of BASIC. 


    • Like 2

  2. 2 hours ago, dhe said:

    In earlier release of c99 Clint wrote many example programs, attached is one I always loved. It mentally challenging (to me) to picture all that's happening - but it isn't much to look at on the screen. (I need to c if I can find a version of life written in c99).


    /* A c version of the Sieve program as suggested by KNUTH
    #define true 1
    #define false 0
    #define size 8190
    #define sized 8191
    int flags[sized];
     AORG >8330
    int i,k,prime,count,iter,strikout;
    { puts("10 iterations\n\n");
      { strikout=true;
        while(i<=size) flags[i++]=true;
        { if(flags[i])
          { prime=i+i+1;
            { if((k=((prime*prime)-1)>>1)<size)
              while(k<=size) {flags[k]=false; k=k+prime;}
              { strikout=false;
        puts(" working...\n");



    I think it is the Eratosthenes sieve. (?) 

    About how long does it take to run? 

  3. 1 hour ago, Lee Stewart said:

    However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .



    I re-did my tests to do 10000 iterations and I get these results.

    \   1 as seed value:
    \  Forth:     64516 SQRT ->  10000x  62.2 seconds
    \  Inlined:   64516 SQRT ->  10000x  42.6 seconds


    So we are a bit faster by inlining the stuff between the loop words but still far off 17 seconds. :)

    I also did Forth "hand optimization" by replacing DUP >R   with DUP>R  

    ( the inliner chokes on > because it doesn't end in next. I should fix that.) 


    : SQRT ( n -- )
         INLINE[ DUP>R 1 ]
         INLINE[ [email protected] OVER U/ OVER + 2/ NIP ]
          INLINE[ [email protected] OVER U/ OVER + 2/ 2DUP ]
        > WHILE
      THEN ;


    • Like 2


    5 minutes ago, Lee Stewart said:


    This change was made before your update to using U/ , etc. It is more compact and about 6 % faster:

    However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .



    Good to know.

    8 times is in line with what we see going from ITC Forth to code for many routines so that sounds right.

    I didn't really take the time to understand algorithm you are using. It looks very clever.  

    With all the shifting it makes me wonder if the divisions in Albert's version would net out to similar speed on 9900.


    I suppose to have good comparison between the two methods, I have to convert Albert's code to ALC. 

    I wonder if I could write it in Machine Forth quicker?  Might try that too.





    • Like 2

  5. 1 hour ago, Reciprocating Bill said:

    BL with >83E0 set up as the WS works fine for addition, subtraction, multiplication and division (I haven't tested anything else). Unfortunately, the actual increase in speed is minimal - on the order of 2% or so. This in a routine using Newton's method to calculate square roots that doesn't do much more than iteratively call FP arithmetic operations. A positive way of saying that is that the relative cost of BWLP @XMLLNK is apparently minimal. (Still, the routine is faster than calling the GPL routine for square root.)


    What I would really like is access to the Cortex Basic FP routines. On a 16-bit console running an algorithm with a lot of floating point, Cortex Basic (executing out of a FinalGrom) is actually much faster than the console routines called from assembler! Plus it includes equally fast transcendental routines. Sigh.  

    Do we know what format Cortex FP uses?  TI-99 is a very big format with decimal coded numbers. 8 bytes long.

    You can do very "adequate" FP with 16 bit mantissa and 16 bit exponent that could run much faster but of course not as accurate at TI-99's numbers.

  6. @Lee Stewart


    I played with this and got this to work for 16 bits numbers but it is still probably not optimal for small numbers. 

    In the case of biggest numbers we get about 2x improvement using the ROOTS test word.

    seed=1   :   31.1 secs

    Init-seed :   16.4 secs

    64516 5000 ELAPSE ROOTS 

    But doing 

    9 5000 ELAPSE ROOTS 

    is 9.7 seconds with seed=1  AND 11.0 seconds with INIT-SEED which is 8


    Here is the file I am using to play around.

    \ integer square root in Forth.  Not too fast but small
    \ *WARNING* The 16 bit limit is:  65000 SQRT . 254
    \ This is 10x faster than linear method
    \: SQRT ( n -- n ) -1 TUCK DO   2+  DUP +LOOP   2/ ;
    \ : U/  ( u1 u2 -- u3 )  0 SWAP UM/MOD NIP ;
    \ machine code is same size as Forth
    CODE U/   ( u1 u2 -- u3 ) \ unsigned division
        C004 ,  \   TOS R0 MOV,   \ divisor->R0
        04C4 ,  \      TOS CLR,   \ high word in TOS = 0
        C176 ,  \ *SP+  R5 MOV,   \ MOVE low word to r5
        3D00 ,  \   R0 TOS DIV,
    \ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017
    \ For n return FLOOR of the square root of n.
    : INIT-SEED ( n -- n n') DUP 10 RSHIFT 8 MAX  ; \ for 16 bits only
    : SQRT ( n -- )
         DUP >R
      \   INIT-SEED   ( optimized seed value) \ 64516 SQRT : 5000x 16.4 seconds
          1   ( default seed value )          \ 64516 SQRT : 5000x 31.1 seconds 
         [email protected] OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing
            [email protected] OVER U/ OVER + 2/  ( DUP .)
            2DUP >
         R> DROP
      THEN ;
    : ROOTS ( n1 cnt -- n) 0 ?DO  DUP SQRT DROP  LOOP DROP ;




  7. Just now, FarmerPotato said:

    On square roots, I was once fascinated by the long division algorithm, which was an appendix to TI’s Basic Electricity AC/DC Circuits textbook. (Community college level.) 


    Here’s a web version:


    I have a hunch that this can be applied to 16 but numbers, where a digit is just 2 bits, and operations are mostly 1 bit shifts and JOC. 

    My intuition is that the square root in binary has half the number of significant digits. In other words the square root of a 16 bit number is an 8 bit number. 


    Your intuition is good.



  8. 9 hours ago, FarmerPotato said:

    I hear you about the sprite auto-motion interrupt. That complicates things. But it's not hard to (disable automation and) roll your own in a user interrupt routine. (if you want to be exact, find the ISR source code in say TI Intern.) There's just too many VDP reads and writes involved in the console routine, for my liking.

    I might even do that with a process. Since it's cooperative in my system the sprites will never be out of control.

    When I read the motion code it surprised me how involved it was. I like using it because it save space in RAM.



    For position and motion, I prefer fixed point 16-bit X.x where byte X is the screen coordinate, byte x is a fraction, and the velocity is simply added (a signed 16-bit quantity but preferably -256 to +256.)

    Ooo. I like the sound of that.

    Edit. Wait... I think that's what happens in the ROM code.



    The sorting is done by swapping indices - not the actual sprite data. Because sprite #1 should always be the player, with bullets having next priority, the sprites are written to VDP in original order (nothing to do with the sorted indices.)

    Another great idea.



    Another reason to write the whole sprite table to VDP on each interrupt, is that animating the sprite pattern can happen, too.

    I also update the player sprite pattern definition, for rotation (it's expensive to keep ALL the patterns loaded for just one sprite.)

    This depend on the application I guess. If a sprite needs to just change between 2 states very fast two different characters are the way to go but yes you can blit paterns in fast enough for most needs.




    (the most recent time I used this sprite engine was in "parsec2020" which was only a demo of your ship and a map. But I tested automation with a bunch of asteroids flying around.) I didn't get to the coinc code


    You have given me lots to chew on. Thanks.  

  9. Here is a test program that I am using to work on this.

    Using this loop which is only polling for edges and coincidence it still misses a collision every now and then.


    I am thinking about trying your idea, Erik, but running the Sprite table read/write, sorting Y and collision detections it a separate process.

    Between auto-motion running on the interrupt and a separate process for collision detection it frees up the main program to the game itself.


    I have some very simple mailboxes for inter-task communication so the collision detector sends a message to the game.


    The game just polls the mailbox and reads the message. Only then does it deal with the sprites.

    One thing that might (?) improve things is stopping the motion of sprites that collided and wait for a reply message from the game to re-start them.

    This prevents automotion from messing up your universe.

    Lots to think about. Thanks Erik.


    \ Sprite COINC and TRAP Test
    : BOUNCE.X  ( spr# --) ]SMT.X  DUP [email protected] NEGATE  SWAP VC! ;
    : BOUNCE.Y  ( spr# --) ]SMT.Y  DUP [email protected] NEGATE  SWAP VC! ;
    : BOUNCE    ( spr# --) DUP BOUNCE.X BOUNCE.Y  ;
    : TINK    GEN1  1500 HZ  -6 DB 40 MS  ;
    : BONK    GEN2  120 HZ  -4 DB  50 MS  ;
    : TRAPX ( spr# -- )
          DUP SP.X [email protected]
          DROP  ;
    : TRAPY ( spr# -- )
          DUP SP.Y [email protected]
          DROP  ;
    : TRAP ( spr# -- ) DUP TRAPX TRAPY   ;
    : SPRITES ( n -- ) \ makes n sprites
          (    char   colr x   y  sp# )
          [CHAR] 0     3   100  90  0 SPRITE
          [CHAR] 1     5   100  90  1 SPRITE
          [CHAR] 2     9   100  90  2 SPRITE
    : RNDV   ( -- x y)  70 RND 10 + 20 -  ;
    : RNDXY  ( -- dx dy)  RNDV RNDV ;
    : RUN ( -- )
        15 SCREEN
        1 MAGNIFY
        PAGE ." CAMEL99 Forth"
        CR   ." Trap/Coinc Test with Automotion"
          25  27 0 MOTION
         -31 -33 1 MOTION
         -13  25 2 MOTION
             0 TRAP  1 TRAP  2 TRAP
             0 1 7 COINC IF 0 BOUNCE 1 BOUNCE BONK THEN
             0 2 7 COINC IF 0 BOUNCE 2 BOUNCE BONK THEN
             1 2 7 COINC IF 1 BOUNCE 2 BOUNCE BONK THEN
            GEN1 MUTE
            GEN2 MUTE
    \    DELALL
        8 SCREEN ;
    CR .( Type RUN to start demo)



    • Like 1

  10. 6 minutes ago, Lee Stewart said:


    How is init-seed used?



    I never did figure that out. :)

    But from reading the topic Albert indicated that it could save 10 iterations if you compute a good initial seed.

    For our application with sprites there were not many iterations required so I just locked it to 1.


    Here is the topic. Maybe you can glean something from it. 

    Faster integer square roots through a seed (google.com)


    • Like 3
    • Thanks 1

  11. That's interesting stuff. I am not sure maintaining the sort would be able to keep up on our old girl here if the number of SPRITEs got too high but a neat idea just the same. I think it would have to be CODE and not Forth to work really well albeit reading and writing blocks of VDP RAM proceeds at machine speed.


    In the past I had re-worked the old TI-Forth code to test the square boundaries like Lee is doing, but then I found I could make a faster routine that simply computed the difference between the x,y coordinates of two sprites and compare to a tolerance. It's brute force but it seems to take less time than what I had.


    Here is the Forth version I had which runs in 1.4mS

    : COINC ( spr#1 spr#2 tol -- ?)
              POSITION ROT POSITION ( -- x1 y1 x2 y2 )
              ROT - ABS [email protected] <
             -ROT - ABS R> < AND


    And here is the slightly improved version from my recent work, which runs in 1.2mS

    CODE DXY   ( x y x2 y2 -- dx dy)
      *SP+  R1 MOV, \ x2
      *SP+ TOS SUB, \ y2=y2-y
           TOS ABS,
       R1  *SP SUB, \ x=x-x2
           *SP ABS,
    : COINC ( spr#1 spr#2 tol -- ?)
              POSITION ROT POSITION ( -- x1 y1 x2 y2 )
              DXY  [email protected]  < 
              SWAP R> < AND

    An important part of using these is to call COINCALL  in the primary loop which is very fast being just a byte fetch. 


    However what I like about your idea is that is could probably handle more sprite coincidences simultaneously.

    In a game like asteroids for example your method probably performs better. 


    I also do something different for getting at the SPRITE table.  I have an integer fetch routine for VDP called [email protected] so I read x,y at once.

    Then I have a SPLIT word that splits that into two bytes.


    For other situations I have turned the SPRITE attribute table into 4 fast arrays so I can read each field independently.


    : TABLE4: ( Vaddr -- )  \ create a table of 4 byte records
             CREATE    ,             \ compile base address into this word
            ;CODE ( n -- Vaddr')     \ RUN time
                 0A24 ,  \ TOS 2 SLA,  ( tos = n x 4 )
                 A118 ,  \ *W TOS ADD,
    SAT     TABLE4: SP.Y
    SAT 1+  TABLE4: SP.X
    SAT 2+  TABLE4: SP.PAT


    With this POSITION is defined as:  ( removed the limit tests) 

    : POSITION  ( sprt# -- dx dy ) ( ?NDX) S" SP.Y [email protected] SPLIT" EVALUATE ; IMMEDIATE




    • Like 1

  12. After thinking about it I decided that DISTANCE was a good general purpose function and so it can stand on it's own.

    If you want to compute the DISTANCE between two sprites it is now trivial.



    So here is how the DISTANCE library file looks now.  The only thing I might optimize as CODE is DXY.

    There are lot of ROTs in code when your are manipulating x,y coordinates so removing two on an intermediate word make me feel better. :)


    \ DISTANCE.FTH  compute distance between 2 coordidates  Mar 14 2022 Brian Fox
    \ Max range is 255 pixels with "out of range" flag.
    \ machine code is same size as Forth
    HEX \ : U/  ( u1 u2 -- u3 )  0 SWAP UM/MOD NIP ;
    CODE U/   ( u1 u2 -- u3 )     \ unsigned division
        C004 ,  \   TOS R0 MOV,   \ divisor->R0
        04C4 ,  \      TOS CLR,   \ high word in TOS = 0
        C176 ,  \ *SP+  R5 MOV,   \ MOVE low word to r5
        3D00 ,  \   R0 TOS DIV,
    \ SQRT by Albert Van der Horst, comp.lang.forth, Aug 29, 2017
    \ Newtonian Interpolation. ~10X faster than linear method
    \ Returns FLOOR of the square root of n.
    : SQRT ( n -- n')
      IF >R
         1  \ 1st seed
         [email protected] OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing
            [email protected] OVER U/ OVER + 2/  ( DUP .)
            2DUP >
         R> DROP
      THEN ;
    : DXY  ( x y x y -- dx dy) ROT -  -ROT - ;
    : SUMSQR   ( n1 n2 -- d) DUP * SWAP DUP * 0 ROT 0 D+ ;
    HERE SWAP - .  ( 170 bytes)



    • Like 2

  13. Bugs R Us

    The stuff you find when you try to do serious work with homemade code.

    I really should be more professional and run the HAYES test suite on this Forth.


    The good news: It was simple to add an un-signed division word to Forth because 9900 has an instruction in order to make Albert's SQRT work on un-signed numbers.

    The bad  news: I found that the word 2/ should be a logical shift not an arithmetic shift.

    I never... actually, like, ummm... read the spec. :( Oops.



    2/ was simple to fix but it means that Camel99 Forth has an official BUG in the wild. 🐛 (that's a caterpillar. We don't have a bug emoji)


    If you need to use 2/ with negative numbers add this new definition to your program until I get a new release out.

    CODE 2/  ( n -- n)  
          0914 , \  TOS 1 SRL,   \ **BUG FIX**  was SRA. DUH!


    • Like 3

  14. I had a memory of some work done by Albert Van der Horst on square roots on comp.lang.forth so I went looking.

    He made a version using Newtonian interpolation. 

    You can play games to find the best seed but even using a seed of 1 the results are amazing. It is over 10 times faster!

    Unfortunately the current version dies on negative numbers so I am back in the trench.

    But it's a good start.


    \ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017
    \ For n return FLOOR of the square root of n.
    VARIABLE seed 1  seed !
    \ While calculating roots near each other., the seed can be kept.
    \ Otherwise this can be used to save 10 iterations.
    : init-seed  DUP 10 RSHIFT 1024 MAX seed ! ;
    : SQRT ( n -- )
         seed @
         [email protected] OVER / OVER + 2/ NIP ( DUP . ) \ debug viewing
            [email protected] OVER / OVER + 2/  ( DUP .)
            2DUP >
    \     seed !
         R> DROP
      THEN ;
    : TESTROOTAV  [email protected] SWAP SQRT [email protected] SWAP .
             CR  -  213 10 */  . ." uS"  ;


    • Like 2

  15. DIST^2  is a usable word.  It's not super speedy at 1.6 milli-seconds,  but it's not bad and it gives ranges out to 65535.

    : ^2    ( n -- d)   S" DUP *" EVALUATE ; IMMEDIATE 
    \ 1.6 milli-seconds
    : DIST^2 ( spr1 spr2 -- n ?)  \ ? = 0, range is valid
          POSITION ROT POSITION ( -- x y x2 y2)
          ROT -  -ROT -         ( -- diffy diffx)
          ^2 SWAP ^2            ( -- dx^2 dy^2)
          0 ROT 0 D+ ;          \ convert to Usigned doubles and add

    Combined with a 16 bit square root word it works like this:

    : SQRT ( n -- n ) -1 TUCK DO   2+  DUP +LOOP 2/ ;

    If you get a -1 the sprite distance is out of range. 

    The whole thing adds 134 bytes to the system.


    Edit: ( If we remove the text macro for ^2  it uses 114 bytes) :)


    • Like 1

  16. I did a little "googling" to supplement my poor math skills and found this page. 



    Section 5  is interesting and describes what I think the TI Forth engineers were using.


    Quote used without permission. Mea culpa.

    "A common application in computer graphics, is to work out the distance between two points as √(Δx2+Δy2).

    However, for performance reasons, the square root operation is a killer, and often, very crude approximations are acceptable.

    So we examine the metrics (1 / √2)*(|x|+|y|), and max(|x|,|y|)"


    TI-FORTH limited the values to 32K.

    My version shows that we can go un-signed and expand the range.

    And with a 32 bit accumulator we get a good "out-of-range" flag.

    This might be adequate for a wide range of applications , significantly improve on the TI-FORTH range of measurement and as a CODE word it would be very fast.

    Keeps me occupied and out of trouble. ;) 



    • Like 1

  17. 3 hours ago, GDMike said:

    It's going to be SNP all the way, with the ability to add a DB on any screen the user selects. Then everything SNP will still work PRIOR to the DB screen, as the rest of SAMs will be used for the DB and blocked to the user. Hopefully that helps clear that up.

    I will need additional program space, enter..  supercart. But the battery doesn't need to work as the program will just load up in that space at run time.

    Maybe I'm using the wrong version of classic 99, but I'm not able to write to >6000 ? 

    I'll have to research what I'm doing wrong.

    When I load something, CL99 reboots.

    It's something I'm doing or not and or wrong version I guess, I'll look for the latest tonight at work.

    As far as I know Classic99 has 8K and >6000 all the time. I use it a lot for utilities.


    • Like 2

  18. 9 minutes ago, Lee Stewart said:


    I ported a C program to take an unsigned double (32-bit) square to a single (16-bit) square root. It consumes 128 bytes in the fbForth dictionary:


    The TI-99/4A’s 192x256 resolution has a maximum pixel distance, corner to corner, of ~319 pixels: √(2552 + 1912) ≈ 318.6. With anything more, one of the sprites is off screen, so I guess you are involving off-screen distances in your calculations, but to what end?



    Ah yes.  Very nice work. Thanks.

    Amazing how many instructions it takes in assembler. 


    I was just trying to see how far off the calculations were with the off-screen sprite.

    I will double check this thing with the 256 x 192 coordinates. I may be able to get closer on the reduced size.



    • Like 1

  19. Pythagoras + a bit of Fudge


    I decided to remove my "improved" TI-FORTH SPRITE code from my DIRSPRIT ( direct sprites) library.

    I changed my coincidence code awhile back to just compare numbers in the sprite x,y values in VDP RAM, so it was not really being used.


    The TI-FORTH distance between sprites calculation was always sub-optimal returning the distance squared and if the sums overflowed it just returned 32767.


    I think I remember Lee fixed this with floating point math but I went down the rabbit hole of how to do it with integers. :) 


    I had a pretty quick square root word for 16 bits which I found somewhere years ago, in Forth magazine I think.

    It's pretty clever but being 16 bits it doesn't work past 65535.


    With a 255x195 screen summing the squares can be done to 32 bit resolution so that's no problem.

    But I could not figure out how to get the square root of a 32 bit number without a lot more work. 


    I realized that if when I did the 32 bit addition unsigned, I got a 1 in the upper bits of the double number.

    Using that 1 as an overflow flag I could perform the square root on the lower 16 bits and if there was an overflow I can fudge the value to give something useful.

    Not perfect Pythagorean computation but much easier to do than fighting with a 32 bit square root calculation I think. 


    The fudge factor calculation takes the incorrect square root (in an over-flow), divides it by PI and adds it to 255. 

    This cause the values to be slightly compressed as you get farther away but it would still work in a game.

    A little bit like those mirrors on your car that say "Things appear closer than they are". :)


    \ DISTANCE.FTH  compute distance between any two sprites         Mar 12 2022
    : ^2  ( n n -- n)  DUP * ;
    : DIFF  ( x y x y -- dx dy)  ROT -  -ROT - ;
    : SUM   ( dx^2 dy^2 -- d) 0 ROT 0 D+ ;  \ sum squares to 32 bit resolution
    : SUM-SQUARES ( x y x y -- d ) DIFF  ^2 SWAP ^2  SUM ;
    : SQRT   ( n -- n ) -1 TUCK DO  2+  DUP +LOOP  2/ ;
    : PI/   ( n -- n ) 10000 31415 */ ;
    : DISTANCE    ( x y x y -- n) SUM-SQUARES >R SQRT R> IF  PI/ 255 +  THEN ;
    : SP.DISTXY   ( x  y  spr#  -- dist ) POSITION DISTANCE ;
    : SP.DIST     ( spr#1 spr#2 -- dist ) POSITION ROT SP.DISTXY ;

    And here is some test code

    \ test code
    \  char clr   x    y  spr#
    \ -------------------------
    CHAR 0   6     0    0  0 SPRITE
    CHAR 1  10   240  85  1 SPRITE
    CHAR 2   9   255  255  2 SPRITE
    CHAR 3  11   127   90  3 SPRITE
    CHAR 4  13   199  149  4 SPRITE
    0 0 SP.DIST .
    0 1 SP.DIST .
    1 2 SP.DIST .
    2 3 SP.DIST .
    0 3 SP.DIST .
    0 4 SP.DIST .
    0 2 SP.DIST .


    You can see the compression in the screen capture.  Distance between sprite 0 and sprite 2 should 360, but it's only 335.

    This error only occurs after the distance exceeds 255. 

    I may still pursue a 32bit Square root but this seems useable.



    • Like 2
  • Create New...