
Content Count
3,170 
Joined

Last visited
Posts posted by TheBF


2 hours ago, dhe said:In earlier release of c99 Clint wrote many example programs, attached is one I always loved. It mentally challenging (to me) to picture all that's happening  but it isn't much to look at on the screen. (I need to c if I can find a version of life written in c99).
/* A c version of the Sieve program as suggested by KNUTH */ #define true 1 #define false 0 #define size 8190 #define sized 8191 /**/ int flags[sized]; #asm AORG >8330 #endasm int i,k,prime,count,iter,strikout; #asm RORG #endasm main() { puts("10 iterations\n\n"); iter=1; while(iter++<=10) { strikout=true; count=0; i=0; while(i<=size) flags[i++]=true; i=1; while(i<=size) { if(flags[i]) { prime=i+i+1; ++count; if(strikout) { if((k=((prime*prime)1)>>1)<size) while(k<=size) {flags[k]=false; k=k+prime;} else { strikout=false; continue; } } } ++i; } puts(" working...\n"); } puts("\nDone!\n"); }
I think it is the Eratosthenes sieve. (?)
About how long does it take to run?

I like the way you can drop int variables into hispeed RAM.

Now you are just showing off.
Truth be told I am totally impressed with how you converted the C program.
It's above my pay grade.
 1
 1

1 hour ago, Lee Stewart said:However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .
...lee
I redid my tests to do 10000 iterations and I get these results.
\ 1 as seed value:
\ Forth: 64516 SQRT > 10000x 62.2 seconds
\ Inlined: 64516 SQRT > 10000x 42.6 secondsSo we are a bit faster by inlining the stuff between the loop words but still far off 17 seconds.
I also did Forth "hand optimization" by replacing DUP >R with DUP>R
( the inliner chokes on > because it doesn't end in next. I should fix that.)
: SQRT ( n  ) DUP IF INLINE[ DUP>R 1 ] INLINE[ [email protected] OVER U/ OVER + 2/ NIP ] BEGIN INLINE[ [email protected] OVER U/ OVER + 2/ 2DUP ] > WHILE NIP REPEAT INLINE[ DROP NIP R> DROP ] THEN ;
 2

Just now, OLD CS1 said:Voilà
touché 🤣
 4

5 minutes ago, Lee Stewart said:This change was made before your update to using U/ , etc. It is more compact and about 6 % faster:
However, my UDSQRT is more than 8 times faster, i.e., the original routine takes 142 seconds for 10,000 iterations; the above change, 134 seconds; UDSQRT , 17 seconds—probably because there are 2 divisions in each of the first two and none in UDSQRT .
...lee
Good to know.
8 times is in line with what we see going from ITC Forth to code for many routines so that sounds right.
I didn't really take the time to understand algorithm you are using. It looks very clever.
With all the shifting it makes me wonder if the divisions in Albert's version would net out to similar speed on 9900.
I suppose to have good comparison between the two methods, I have to convert Albert's code to ALC.
I wonder if I could write it in Machine Forth quicker? Might try that too.
 2

48 minutes ago, GDMike said:And
wholla, a PC is born. Lol ...( voila )

1 hour ago, Reciprocating Bill said:BL with >83E0 set up as the WS works fine for addition, subtraction, multiplication and division (I haven't tested anything else). Unfortunately, the actual increase in speed is minimal  on the order of 2% or so. This in a routine using Newton's method to calculate square roots that doesn't do much more than iteratively call FP arithmetic operations. A positive way of saying that is that the relative cost of BWLP @XMLLNK is apparently minimal. (Still, the routine is faster than calling the GPL routine for square root.)
What I would really like is access to the Cortex Basic FP routines. On a 16bit console running an algorithm with a lot of floating point, Cortex Basic (executing out of a FinalGrom) is actually much faster than the console routines called from assembler! Plus it includes equally fast transcendental routines. Sigh.
Do we know what format Cortex FP uses? TI99 is a very big format with decimal coded numbers. 8 bytes long.
You can do very "adequate" FP with 16 bit mantissa and 16 bit exponent that could run much faster but of course not as accurate at TI99's numbers.

I played with this and got this to work for 16 bits numbers but it is still probably not optimal for small numbers.
In the case of biggest numbers we get about 2x improvement using the ROOTS test word.
seed=1 : 31.1 secs
Initseed : 16.4 secs
64516 5000 ELAPSE ROOTS
But doing
9 5000 ELAPSE ROOTS
is 9.7 seconds with seed=1 AND 11.0 seconds with INITSEED which is 8
Here is the file I am using to play around.
Spoiler\ integer square root in Forth. Not too fast but small \ *WARNING* The 16 bit limit is: 65000 SQRT . 254 \ This is 10x faster than linear method \: SQRT ( n  n ) 1 TUCK DO 2+ DUP +LOOP 2/ ; \ INCLUDE DSK1.TOOLS INCLUDE DSK1.ELAPSE \ : U/ ( u1 u2  u3 ) 0 SWAP UM/MOD NIP ; \ machine code is same size as Forth HEX CODE U/ ( u1 u2  u3 ) \ unsigned division C004 , \ TOS R0 MOV, \ divisor>R0 04C4 , \ TOS CLR, \ high word in TOS = 0 C176 , \ *SP+ R5 MOV, \ MOVE low word to r5 3D00 , \ R0 TOS DIV, NEXT, ENDCODE \ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017 \ For n return FLOOR of the square root of n. DECIMAL : INITSEED ( n  n n') DUP 10 RSHIFT 8 MAX ; \ for 16 bits only : SQRT ( n  ) DUP IF DUP >R \ INITSEED ( optimized seed value) \ 64516 SQRT : 5000x 16.4 seconds 1 ( default seed value ) \ 64516 SQRT : 5000x 31.1 seconds [email protected] OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing BEGIN [email protected] OVER U/ OVER + 2/ ( DUP .) 2DUP > WHILE NIP REPEAT DROP NIP R> DROP THEN ; : ROOTS ( n1 cnt  n) 0 ?DO DUP SQRT DROP LOOP DROP ;

24 minutes ago, TheBF said:Your intuition is good.
256^2=65536
Looking at that method I believe this is it in Forth:
: SQRT ( n  n ) 1 TUCK DO 2+ DUP +LOOP 2/ ;
 1

Just now, FarmerPotato said:On square roots, I was once fascinated by the long division algorithm, which was an appendix to TI’s Basic Electricity AC/DC Circuits textbook. (Community college level.)
Here’s a web version:
https://byjus.com/maths/squarerootlongdivisionmethod/
I have a hunch that this can be applied to 16 but numbers, where a digit is just 2 bits, and operations are mostly 1 bit shifts and JOC.
My intuition is that the square root in binary has half the number of significant digits. In other words the square root of a 16 bit number is an 8 bit number.Your intuition is good.
256^2=65536

9 hours ago, FarmerPotato said:I hear you about the sprite automotion interrupt. That complicates things. But it's not hard to (disable automation and) roll your own in a user interrupt routine. (if you want to be exact, find the ISR source code in say TI Intern.) There's just too many VDP reads and writes involved in the console routine, for my liking.
I might even do that with a process. Since it's cooperative in my system the sprites will never be out of control.
When I read the motion code it surprised me how involved it was. I like using it because it save space in RAM.
QuoteFor position and motion, I prefer fixed point 16bit X.x where byte X is the screen coordinate, byte x is a fraction, and the velocity is simply added (a signed 16bit quantity but preferably 256 to +256.)
Ooo. I like the sound of that.
Edit. Wait... I think that's what happens in the ROM code.
QuoteThe sorting is done by swapping indices  not the actual sprite data. Because sprite #1 should always be the player, with bullets having next priority, the sprites are written to VDP in original order (nothing to do with the sorted indices.)
Another great idea.
QuoteAnother reason to write the whole sprite table to VDP on each interrupt, is that animating the sprite pattern can happen, too.
I also update the player sprite pattern definition, for rotation (it's expensive to keep ALL the patterns loaded for just one sprite.)
This depend on the application I guess. If a sprite needs to just change between 2 states very fast two different characters are the way to go but yes you can blit paterns in fast enough for most needs.
Quote(the most recent time I used this sprite engine was in "parsec2020" which was only a demo of your ship and a map. But I tested automation with a bunch of asteroids flying around.) I didn't get to the coinc code
You have given me lots to chew on. Thanks.

Here is a test program that I am using to work on this.
Using this loop which is only polling for edges and coincidence it still misses a collision every now and then.
I am thinking about trying your idea, Erik, but running the Sprite table read/write, sorting Y and collision detections it a separate process.
Between automotion running on the interrupt and a separate process for collision detection it frees up the main program to the game itself.
I have some very simple mailboxes for intertask communication so the collision detector sends a message to the game.
The game just polls the mailbox and reads the message. Only then does it deal with the sprites.
One thing that might (?) improve things is stopping the motion of sprites that collided and wait for a reply message from the game to restart them.
This prevents automotion from messing up your universe.
Lots to think about. Thanks Erik.
Spoiler\ Sprite COINC and TRAP Test NEEDS DUMP FROM DSK1.TOOLS NEEDS SPRITE FROM DSK1.DIRSPRIT NEEDS AUTOMOTION FROM DSK1.AUTOMOTION NEEDS HZ FROM DSK1.SOUND NEEDS MARKER FROM DSK1.MARKER NEEDS RND FROM DSK1.RANDOM MARKER /TEST DECIMAL : BOUNCE.X ( spr# ) ]SMT.X DUP [email protected] NEGATE SWAP VC! ; : BOUNCE.Y ( spr# ) ]SMT.Y DUP [email protected] NEGATE SWAP VC! ; : BOUNCE ( spr# ) DUP BOUNCE.X BOUNCE.Y ; : TINK GEN1 1500 HZ 6 DB 40 MS ; : BONK GEN2 120 HZ 4 DB 50 MS ; : TRAPX ( spr#  ) DUP SP.X [email protected] 239 0 WITHIN IF BOUNCE.X TINK EXIT THEN DROP ; : TRAPY ( spr#  ) DUP SP.Y [email protected] 185 0 WITHIN IF BOUNCE.Y TINK EXIT THEN DROP ; : TRAP ( spr#  ) DUP TRAPX TRAPY ; DECIMAL : SPRITES ( n  ) \ makes n sprites ( char colr x y sp# ) [CHAR] 0 3 100 90 0 SPRITE [CHAR] 1 5 100 90 1 SPRITE [CHAR] 2 9 100 90 2 SPRITE ; : RNDV (  x y) 70 RND 10 + 20  ; : RNDXY (  dx dy) RNDV RNDV ; : RUN (  ) 15 SCREEN 1 MAGNIFY PAGE ." CAMEL99 Forth" CR ." Trap/Coinc Test with Automotion" CR SPRITES 25 27 0 MOTION 31 33 1 MOTION 13 25 2 MOTION AUTOMOTION BEGIN 0 TRAP 1 TRAP 2 TRAP 0 1 7 COINC IF 0 BOUNCE 1 BOUNCE BONK THEN 0 2 7 COINC IF 0 BOUNCE 2 BOUNCE BONK THEN 1 2 7 COINC IF 1 BOUNCE 2 BOUNCE BONK THEN GEN1 MUTE GEN2 MUTE ?TERMINAL UNTIL STOPMOTION \ DELALL 8 SCREEN ; CR .( Type RUN to start demo)
 1

6 minutes ago, Lee Stewart said:How is initseed used?
...lee
I never did figure that out.
But from reading the topic Albert indicated that it could save 10 iterations if you compute a good initial seed.
For our application with sprites there were not many iterations required so I just locked it to 1.
Here is the topic. Maybe you can glean something from it.
Faster integer square roots through a seed (google.com)
 3
 1

That's interesting stuff. I am not sure maintaining the sort would be able to keep up on our old girl here if the number of SPRITEs got too high but a neat idea just the same. I think it would have to be CODE and not Forth to work really well albeit reading and writing blocks of VDP RAM proceeds at machine speed.
In the past I had reworked the old TIForth code to test the square boundaries like Lee is doing, but then I found I could make a faster routine that simply computed the difference between the x,y coordinates of two sprites and compare to a tolerance. It's brute force but it seems to take less time than what I had.
Here is the Forth version I had which runs in 1.4mS
: COINC ( spr#1 spr#2 tol  ?) >R POSITION ROT POSITION (  x1 y1 x2 y2 ) ROT  ABS [email protected] < ROT  ABS R> < AND ;
And here is the slightly improved version from my recent work, which runs in 1.2mS
HEX CODE DXY ( x y x2 y2  dx dy) *SP+ R1 MOV, \ x2 *SP+ TOS SUB, \ y2=y2y TOS ABS, R1 *SP SUB, \ x=xx2 *SP ABS, NEXT, ENDCODE : COINC ( spr#1 spr#2 tol  ?) >R POSITION ROT POSITION (  x1 y1 x2 y2 ) DXY [email protected] < SWAP R> < AND ;
An important part of using these is to call COINCALL in the primary loop which is very fast being just a byte fetch.
However what I like about your idea is that is could probably handle more sprite coincidences simultaneously.
In a game like asteroids for example your method probably performs better.
I also do something different for getting at the SPRITE table. I have an integer fetch routine for VDP called [email protected] so I read x,y at once.
Then I have a SPLIT word that splits that into two bytes.
For other situations I have turned the SPRITE attribute table into 4 fast arrays so I can read each field independently.
: TABLE4: ( Vaddr  ) \ create a table of 4 byte records CREATE , \ compile base address into this word ;CODE ( n  Vaddr') \ RUN time 0A24 , \ TOS 2 SLA, ( tos = n x 4 ) A118 , \ *W TOS ADD, NEXT, ENDCODE SAT TABLE4: SP.Y SAT 1+ TABLE4: SP.X SAT 2+ TABLE4: SP.PAT SAT 3 + TABLE4: SP.COLR
With this POSITION is defined as: ( removed the limit tests)
: POSITION ( sprt#  dx dy ) ( ?NDX) S" SP.Y [email protected] SPLIT" EVALUATE ; IMMEDIATE
 1

After thinking about it I decided that DISTANCE was a good general purpose function and so it can stand on it's own.
If you want to compute the DISTANCE between two sprites it is now trivial.
: SP.DIST ( spr spr  n) POSITION ROT POSITION DISTANCE ;
So here is how the DISTANCE library file looks now. The only thing I might optimize as CODE is DXY.
There are lot of ROTs in code when your are manipulating x,y coordinates so removing two on an intermediate word make me feel better.
Spoiler\ DISTANCE.FTH compute distance between 2 coordidates Mar 14 2022 Brian Fox \ Max range is 255 pixels with "out of range" flag. HERE \ machine code is same size as Forth HEX \ : U/ ( u1 u2  u3 ) 0 SWAP UM/MOD NIP ; CODE U/ ( u1 u2  u3 ) \ unsigned division C004 , \ TOS R0 MOV, \ divisor>R0 04C4 , \ TOS CLR, \ high word in TOS = 0 C176 , \ *SP+ R5 MOV, \ MOVE low word to r5 3D00 , \ R0 TOS DIV, NEXT, ENDCODE \ SQRT by Albert Van der Horst, comp.lang.forth, Aug 29, 2017 \ Newtonian Interpolation. ~10X faster than linear method \ Returns FLOOR of the square root of n. DECIMAL : SQRT ( n  n') DUP IF >R 1 \ 1st seed [email protected] OVER U/ OVER + 2/ NIP ( DUP . ) \ debug viewing BEGIN [email protected] OVER U/ OVER + 2/ ( DUP .) 2DUP > WHILE NIP REPEAT DROP R> DROP THEN ; DECIMAL : DXY ( x y x y  dx dy) ROT  ROT  ; : SUMSQR ( n1 n2  d) DUP * SWAP DUP * 0 ROT 0 D+ ; : DISTANCE ( x y x y  n) DXY SUMSQR IF DROP TRUE EXIT THEN SQRT ; HERE SWAP  . ( 170 bytes)
 2

Bugs R Us
The stuff you find when you try to do serious work with homemade code.
I really should be more professional and run the HAYES test suite on this Forth.
The good news: It was simple to add an unsigned division word to Forth because 9900 has an instruction in order to make Albert's SQRT work on unsigned numbers.
The bad news: I found that the word 2/ should be a logical shift not an arithmetic shift.
I never... actually, like, ummm... read the spec. Oops.
2/ was simple to fix but it means that Camel99 Forth has an official BUG in the wild. 🐛 (that's a caterpillar. We don't have a bug emoji)
If you need to use 2/ with negative numbers add this new definition to your program until I get a new release out.
HEX CODE 2/ ( n  n) 0914 , \ TOS 1 SRL, \ **BUG FIX** was SRA. DUH! NEXT, ENDCODE
 3

I had a memory of some work done by Albert Van der Horst on square roots on comp.lang.forth so I went looking.
He made a version using Newtonian interpolation.
You can play games to find the best seed but even using a seed of 1 the results are amazing. It is over 10 times faster!
Unfortunately the current version dies on negative numbers so I am back in the trench.
But it's a good start.
\ By Albert Van der Horst, comp.lang.forth, Aug 29, 2017 \ For n return FLOOR of the square root of n. VARIABLE seed 1 seed ! \ While calculating roots near each other., the seed can be kept. \ Otherwise this can be used to save 10 iterations. : initseed DUP 10 RSHIFT 1024 MAX seed ! ; : SQRT ( n  ) DUP IF >R seed @ [email protected] OVER / OVER + 2/ NIP ( DUP . ) \ debug viewing BEGIN [email protected] OVER / OVER + 2/ ( DUP .) 2DUP > WHILE NIP REPEAT \ seed ! R> DROP THEN ; : TESTROOTAV [email protected] SWAP SQRT [email protected] SWAP . CR  213 10 */ . ." uS" ;
 2

DIST^2 is a usable word. It's not super speedy at 1.6 milliseconds, but it's not bad and it gives ranges out to 65535.
DECIMAL : ^2 ( n  d) S" DUP *" EVALUATE ; IMMEDIATE \ 1.6 milliseconds : DIST^2 ( spr1 spr2  n ?) \ ? = 0, range is valid POSITION ROT POSITION (  x y x2 y2) ROT  ROT  (  diffy diffx) ^2 SWAP ^2 (  dx^2 dy^2) 0 ROT 0 D+ ; \ convert to Usigned doubles and add
Combined with a 16 bit square root word it works like this:
: SQRT ( n  n ) 1 TUCK DO 2+ DUP +LOOP 2/ ; : DISTANCE ( n n  n) DIST^2 IF DROP TRUE EXIT THEN SQRT ;
If you get a 1 the sprite distance is out of range.
The whole thing adds 134 bytes to the system.
Edit: ( If we remove the text macro for ^2 it uses 114 bytes)
 1

I did a little "googling" to supplement my poor math skills and found this page.
http://www.azillionmonkeys.com/qed/sqroot.html
Section 5 is interesting and describes what I think the TI Forth engineers were using.Quote used without permission. Mea culpa.
"A common application in computer graphics, is to work out the distance between two points as √(Δx^{2}+Δy^{2}).
However, for performance reasons, the square root operation is a killer, and often, very crude approximations are acceptable.
So we examine the metrics (1 / √2)*(x+y), and max(x,y)"
TIFORTH limited the values to 32K.
My version shows that we can go unsigned and expand the range.
And with a 32 bit accumulator we get a good "outofrange" flag.
This might be adequate for a wide range of applications , significantly improve on the TIFORTH range of measurement and as a CODE word it would be very fast.
Keeps me occupied and out of trouble.
 1

3 hours ago, GDMike said:It's going to be SNP all the way, with the ability to add a DB on any screen the user selects. Then everything SNP will still work PRIOR to the DB screen, as the rest of SAMs will be used for the DB and blocked to the user. Hopefully that helps clear that up.
I will need additional program space, enter.. supercart. But the battery doesn't need to work as the program will just load up in that space at run time.
Maybe I'm using the wrong version of classic 99, but I'm not able to write to >6000 ?
I'll have to research what I'm doing wrong.
When I load something, CL99 reboots.
It's something I'm doing or not and or wrong version I guess, I'll look for the latest tonight at work.
As far as I know Classic99 has 8K and >6000 all the time. I use it a lot for utilities.
 2

9 minutes ago, Lee Stewart said:I ported a C program to take an unsigned double (32bit) square to a single (16bit) square root. It consumes 128 bytes in the fbForth dictionary:
The TI99/4A’s 192x256 resolution has a maximum pixel distance, corner to corner, of ~319 pixels: √(255^{2} + 191^{2}) ≈ 318.6. With anything more, one of the sprites is off screen, so I guess you are involving offscreen distances in your calculations, but to what end?
...lee
Ah yes. Very nice work. Thanks.
Amazing how many instructions it takes in assembler.
I was just trying to see how far off the calculations were with the offscreen sprite.
I will double check this thing with the 256 x 192 coordinates. I may be able to get closer on the reduced size.
 1

Pythagoras + a bit of Fudge
I decided to remove my "improved" TIFORTH SPRITE code from my DIRSPRIT ( direct sprites) library.
I changed my coincidence code awhile back to just compare numbers in the sprite x,y values in VDP RAM, so it was not really being used.
The TIFORTH distance between sprites calculation was always suboptimal returning the distance squared and if the sums overflowed it just returned 32767.
I think I remember Lee fixed this with floating point math but I went down the rabbit hole of how to do it with integers.
I had a pretty quick square root word for 16 bits which I found somewhere years ago, in Forth magazine I think.
It's pretty clever but being 16 bits it doesn't work past 65535.
With a 255x195 screen summing the squares can be done to 32 bit resolution so that's no problem.
But I could not figure out how to get the square root of a 32 bit number without a lot more work.
I realized that if when I did the 32 bit addition unsigned, I got a 1 in the upper bits of the double number.
Using that 1 as an overflow flag I could perform the square root on the lower 16 bits and if there was an overflow I can fudge the value to give something useful.
Not perfect Pythagorean computation but much easier to do than fighting with a 32 bit square root calculation I think.
The fudge factor calculation takes the incorrect square root (in an overflow), divides it by PI and adds it to 255.
This cause the values to be slightly compressed as you get farther away but it would still work in a game.
A little bit like those mirrors on your car that say "Things appear closer than they are".
\ DISTANCE.FTH compute distance between any two sprites Mar 12 2022 NEEDS DUMP FROM DSK1.TOOLS NEEDS SPRITE FROM DSK1.DIRSPRIT NEEDS AUTOMOTION FROM DSK1.AUTOMOTION MARKER NEW DECIMAL : ^2 ( n n  n) DUP * ; : DIFF ( x y x y  dx dy) ROT  ROT  ; : SUM ( dx^2 dy^2  d) 0 ROT 0 D+ ; \ sum squares to 32 bit resolution : SUMSQUARES ( x y x y  d ) DIFF ^2 SWAP ^2 SUM ; : SQRT ( n  n ) 1 TUCK DO 2+ DUP +LOOP 2/ ; : PI/ ( n  n ) 10000 31415 */ ; : DISTANCE ( x y x y  n) SUMSQUARES >R SQRT R> IF PI/ 255 + THEN ; : SP.DISTXY ( x y spr#  dist ) POSITION DISTANCE ; : SP.DIST ( spr#1 spr#2  dist ) POSITION ROT SP.DISTXY ;
And here is some test code
\ test code DECIMAL \ char clr x y spr# \  CHAR 0 6 0 0 0 SPRITE CHAR 1 10 240 85 1 SPRITE CHAR 2 9 255 255 2 SPRITE CHAR 3 11 127 90 3 SPRITE CHAR 4 13 199 149 4 SPRITE 1 MAGNIFY CLEAR 0 0 SP.DIST . 0 1 SP.DIST . 1 2 SP.DIST . 2 3 SP.DIST . 0 3 SP.DIST . 0 4 SP.DIST . 0 2 SP.DIST .
You can see the compression in the screen capture. Distance between sprite 0 and sprite 2 should 360, but it's only 335.
This error only occurs after the distance exceeds 255.
I may still pursue a 32bit Square root but this seems useable.
 2

Can't we just look as the assembler code output?
Glorious People's small c compiler
in TI99/4A Development
Posted
Running in CPU Overdrive?
So that is about 23 times faster than normal so at normal speed ~115 seconds, 1:55 ish.
I am just curious so I can benchmark my compiler projects.
As I recall Small C was not too speedy in the old days but beat the heck out of BASIC.