Posts posted by TheBF
-
I missed that, but I have that commented out in mine.
It's like
: COINC COINCALL 0= IF EXIT THEN ...CONTINUED CODE HERE
I will run that on the collider too. It should make the internal loop spin faster, but sometimes the overhead of making decisions is slower than just doing the test.
Now that it is reliable, I will also remove the BREAK code in the loop. Numbers should change a little.
-
I don't fully understand your trials and tribulations of course, but once you get it working the way you want it would be possible to retry SAMS.
Maybe even to have multiple files in memory that can be selected. Just page out the blocks where the text is located with some other pages.
Keep at it.
-
Been busy today but finally tried my sprite collider with some different versions of COINC.
So my new COINC is this. V@ reads 2 bytes. The code word SPLIT splits the word into 2 bytes on the stack.
This version measures 66 ticks of the 9901 clock, including parameters.
: COINC ( spr#1 spr#2 tol -- ? ) \ 1.4 mS, 1.1 mS optimized
    >R
    SP.Y V@ SPLIT ROT
    SP.Y V@ SPLIT   ( -- col row col row )
    ROT - ABS R@ <
    -ROT - ABS R> <  AND ;
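For anyone following along, the per-axis test above can be modeled in Python (names are mine; the coordinate pairs stand in for the sprite Y/X bytes read from VDP RAM):

```python
def coinc_window(p1, p2, tol):
    # Model of the per-axis COINC test: each coordinate
    # difference must be inside the tolerance window.
    (x1, y1), (x2, y2) = p1, p2
    return abs(x1 - x2) < tol and abs(y1 - y2) < tol

# Diagonal neighbours inside tolerance 7 coincide:
print(coinc_window((100, 100), (104, 103), 7))  # True
```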
The original TI code, renamed COINC.TI, is being tested with my new SP.DIST, which is about 2X faster than the original.
My minor change replaces the DUP + with the code word 2* . This version is called COINC.NEW.
My improved version times at 78 ticks
The TI version is 83 ticks.
The test:
1. Fire 2 sprites at each other at automotion speed 100
2. Read until COINC=TRUE
3. Report the distance between the sprites (uses SQRT on the output of SP.DIST to get a real value)
Assumption:
A faster COINC routine will halt with less overlap of the sprites, i.e. a greater distance.
Here is the code.
The word DETECTOR is a deferred word so we can change the action with the different versions of COINC. The video shows the result.
\ coincidence test
NEEDS RED        FROM DSK1.COLORS
NEEDS AUTOMOTION FROM DSK1.AUTOMOTION
NEEDS DEFER      FROM DSK1.DEFER

DEFER DETECTOR   ' COINC IS DETECTOR

MARKER /REMOVE

: SQRT ( n -- n ) -1 TUCK DO 2+ DUP +LOOP 2/ ;

: COINC.NEW ( sp#1 sp#2 tol -- ? ) DUP * 2* -ROT SP.DIST >= ;

: COINC.TI ( spr#1 spr#2 tol --- f ) ( 0= no coinc 1= coinc )
    DUP * DUP + >R  ( STACK: spr#1 spr#2  R: tol*tol+tol*tol)
    SP.DIST R>      ( STACK: dist^2 2*tol^2)
    > 0= ;          ( within tolerance? STACK: flag)

STOPMOTION
DECIMAL
: COLLIDE ( speed -- )
    PAGE ." Coincidence Collider"
    [CHAR] A DKRED 0 100   0 SPRITE
    [CHAR] B DKGRN 240 100 1 SPRITE
    1 OVER NEGATE 1 MOTION
    0 OVER        0 MOTION
    0 22 AT-XY ." Speed= " .
    0 23 AT-XY ." Press key to fire..." KEY DROP
    AUTOMOTION
    BEGIN
      0 1 7 DETECTOR
      ?TERMINAL ABORT" halted"
    UNTIL
    STOPMOTION
    CR ." Distance= " 0 1 SP.DIST SQRT . ;
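The SQRT in the listing leans on the fact that n^2 is the sum of the first n odd numbers. A Python model of the same loop (my naming), just to show the idea:

```python
def isqrt(n):
    # Model of:  : SQRT ( n -- n ) -1 TUCK DO 2+ DUP +LOOP 2/ ;
    # The index climbs by successive odd numbers (1, 3, 5, ...);
    # when it crosses n, the last odd number, halved, is the root.
    i, odd = -1, -1
    while i < n:
        odd += 2
        i += odd
    return odd // 2

print(isqrt(64))   # 8
```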
-
Very nice that you know where to find all this.
So we are keeping "kosher" if we use similar methods.
-
56 minutes ago, Lee Stewart said:
I don’t know whether this is enough of the code to get the gist, but here it is:
TSTRTN B    @RETNC          RESTORE PROGRAM COUNTER
* COINCIDENCE ROUTINE FOR INSERTION INTO REL4 INTERPRETER
* UPON ENTRACE TO THIS ROUTINE AT LABEL 'COINC' THE
* REGISTERS ARE ASSUMED TO BE SET UP:
*    MSBY R2=Y2 IN MSBY AND X2 IN LSBY;
*    MSBY R0=Y1 IN MSBY AND X1 IN LSBY;
* IT IS ALSO ASSUMED THAT THE GROM'S INTERNAL ADDRESS IS SET
* UP PREPARED TO READ (FOLLOWING THE COINC INSTRUCTION):
*  - A ONE BYTE GRANULARITY VALUE, FOLLOWED BY:
*  - A TWO BYTE ADR. POINTING TO THE COINCIDENCE TABLE.
* THE TABLE IS ASSUMED TO RESIDE IN GROM, AND HAVE THE
* FOLLOWING FORMAT:
*  BYTE 0- TV = VERTICAL BIT SIZE OF TABLE LESS 1
*  BYTE 1- TH = HORIZ. BIT SIZE OF TABLE LESS 1
*  BYTE 2- V1 = VERTICAL DOT SIZE OF OBJECT 1/2**GR
*  BYTE 3- H1 = HORIZ. DOT SIZE OF OBJECT 1/2**GR
*  BYTES 4 ON - THE BIT TABLE ITSELF; THE BITS ARE
* ARRANGED SUCH THAT THE FIRST (TH+1) BITS REPRESENT BOOLEAN
* CONICIDENCE VALUES CORRESPONDING TO A DELTA Y (Y1-Y2) OF -V1
* THRU -V1+TV AND DELTA X (X1-CX2) VALUES -H1 THRU -H1+TH
*
* ENTRY = BR TABLE
COINC  MOV  R0,R8
       MOV  R8,R3           FIRST GET DELTA Y AND DELTA X
       SB   R2,R3           R3= Y1-Y2= DELTA Y
       SWPB R8              GET X1 IN MSBY
       SWPB R2              GET X2 IN MSBY
       SB   R2,R8           R8 X1-X2 = DELTA X
       MOVB *R13,R0         SET RESLN AND TABLE POINTER
       SRL  R0,8            R0 = GRAN
       MOVB *R13,R5
       SWPB R5
       MOVB *R13,R5
       SWPB R5              R5 = TABLE POINTER
       BL   @PUTSTK         SAVE GROM PC
*
* NOW GET TV,TH,V1,H1, OUT OF THE 1ST 4 BYTES OF TABLE
*
       MOVB R5,@GWAOFF(R13) PUT OUT TABLE POINTER LSBY
       SWPB R5
       MOVB R5,@GWAOFF(R13) PUT OUT TABLE POINTER MSBY
       SWPB R5
       MOVB *R13,R2         R2=TV(MSBY)
       NOP
       MOVB *R13,R1         R1=TH(MSBY)
       NOP
       MOVB *R13,R6         R6=V1(MSBY)
       NOP
       MOVB *R13,R7         R7=H1(MSBY)
* NOW ON WITH THE SHOW, THE REGISTERS ARE NOW SET UP AS:
*  R0= GRANULARITY;
*  MSBY R1= TH = CONICIDENCE TABLE HORIZONTAL SIZE -1
*  MSBY R2= TV = CONICIDENCE TABLE VERTICAL SIZE -1
*  MSBY R3= Y1 - Y2 = DELTA Y
*  MSBY R8= X1 - X2 = DELTA X
*  R5= PNTR TO COINCIDENCE TABLE IN GROM
*  MSBY R6= V1 = VERTICAL SIZE OF OBJECT ONE IN DOTS
*  MSBY R7= H1 = HORIZ. SIZE OF OBJECT ONE IN DOTS
*  R13 = GROM READ ADR.
*
       MOV  R0,R0           IF GRANULARITY IS 0, DON'T SHIFT
       JEQ  DNTSHF          BECAUSE 9900 SHIFT BY 0 IS 16
       SRA  R3,R0           DIVIDE DELTA Y BY (2** GRAN)
       SRA  R8,R0           DIVIDE DELTA X BY (2** GRAN)
DNTSHF AB   R7,R8           R8 = B = H1 + DELTA X
       JLT  NOCOIN
       AB   R6,R3           R3 = A = V1 +DELTA Y
       JLT  NOCOIN
       CB   R3,R2           A::TV
       JGT  NOCOIN
       CB   R8,R1           B::TH
       JGT  NOCOIN          RANGE TEST PASSED?
       SRL  R1,8            NOW COMPUTE TABLE INDEX
       INC  R1              R1=TH+1
       SRL  R3,8            R3=A
       MPY  R3,R1           R2=A*(TH+1)
       SRL  R8,8            R8=B
       A    R8,R2           R2= INDEX. COMPUTE TABLE & BIT POSN
       MOV  R2,R0           R0 = INDEX ALSO
       ANDI R2,>FFF8        R2 = ROUNDED DOWN TO LOWER MULT OF 8
       S    R2,R0           R0 = BIT DISPLACEMENT (0= LEFTMOST)
       SRA  R2,3            R2 = BYTE INDEX INTO TABLE
       A    R5,R2           R2 = ACTUAL ADDRESS OF BYTE
       C    *R2+,*R2+       INC PNTR BY 4 FOR 4 BYTE HEADER
       MOVB R2,@GWAOFF(R13) PULL PROPER BYTE FROM GROM
       INC  R0
       MOVB @R2LSB,@GWAOFF(R13)
       LI   R2,>2000
       MOVB *R13,R3         R3 = THE BYTE FROM THE TABLE
       SLA  R3,R0           GET PROPER BIT INTO THE STATUS CARRY
       JOC  YUP             IF BIT IS 0, NO COINCIDENCE
NOCOIN CLR  R2              NO, WE HAVE COINCIDENCE
YUP    MOVB R2,@STATUS      YES, WE HAVE COINCIDENCE
       JMP  TSTRTN
Personally, I am not sure it is worth the effort. I am chewing on using just the tolerance square, as I think you are (were) doing—a lot quicker, for sure. I may try to use a user-settable flag to do it either way, but I only have 162 bytes left in that bank. It might be enough.
...lee
Wow! That is a lot of code.
Thanks for finding it.
back of the napkin... (0 wait state thinking just to compare things)
So it is 58 lines of code; if we say the 9900 averages 18 clocks per instruction, that is on the order of 350 uS.
With 20 clocks as an average, that's 386 uS.
My difference method in Forth, including putting 3 parameters on the stack, is ~1,500 uS, measured with the 9901 timer.
Putting the three parameters on the stack uses 234 uS, leaving 1,266 uS for the routine.
In code I should be able to make that 5x faster... 253 uS.
Still not that much better.
Will have to do some tests.
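Checking my napkin arithmetic in Python (assuming the 3 MHz 9900 clock, so clocks divided by 3 gives microseconds):

```python
CLOCK_MHZ = 3                       # 9900 clock, ignoring wait states
lines = 58
print(lines * 18 / CLOCK_MHZ)       # 348.0 uS -> "on the order of 350 uS"
print(lines * 20 / CLOCK_MHZ)       # ~386.7 uS
print(1500 - 234)                   # 1266 uS left for the routine
print((1500 - 234) / 5)             # ~253 uS if the code is 5x faster
```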
-
7 hours ago, Lee Stewart said:
I think I would rather trust the user to increase the tolerance. I just wish I could guess why they did it. Perhaps I should again try to grok how GPL does it using coincidence tables (see here). I think I had it once upon a time. It just does not seem worth the effort.
...lee
OK, I don't fully understand all the logic, but it again seems to be an exercise that leads to slower determination of coincidence, albeit one that creates symmetrical coincidence around the object from what I can see. It would be helpful to see how much code it takes to process these tables.
Is there a way to find that code in the GPL interpreter?
-
I agree that the user should control it and that it is all too much effort for something that needs to be fast on a very slow machine.
That's why I believe the pixel coordinate comparison makes more sense. The data is sitting there so just read it and difference it.
My preliminary collider tests showed that this VDP x,y comparison method works very well.
I have some stuff on my plate this week, but I want to run the tests with a deferred word COINC, plug in the different methods, and view the sprites and where they actually collide.
I will take a look at the GPL and see if any of it clicks.
-
Lol. I don't think I can.
I am just now looking over what I translated it into.
I did the same thing, just faster. I am writing a little "collider" to compare how well different COINC routines work.
It's like a particle collider for sprites, with one sprite coming from each side of the screen.
Perhaps they were purposely expanding the window to create a higher chance of coincidence?
Even in Forth it's hard to catch the asynchronous automotion sprites.
HEX
\ text macros improve speed of coincidence detection
: 2(X^2) ( n -- 2(n^2) ) S" DUP * 2* " EVALUATE ; IMMEDIATE
: <=     ( n n -- ? )    S" 1- <"      EVALUATE ; IMMEDIATE

\ simple machine code optimizers for DIST
CODE RDROP ( -- )
    05C7 ,   \ RP INCT,
    NEXT,
ENDCODE

CODE DXY ( x2 y2 x1 y1 --- dx dy ) \ common factor for SP.DIST, SP.DISTXY
    C036 ,   \ *SP+ R0 MOV,    pop x1 -> R0
    6136 ,   \ *SP+ TOS SUB,   pop y1-y2 -> tos
    6016 ,   \ *SP  R0 SUB,    x1-x2 -> R0, keep stack location
    C0C4 ,   \ TOS R3 MOV,     dup tos in R3, MPY goes into R4
    38C4 ,   \ TOS R3 MPY,     R3^2, result -> R4 (tos)
    C080 ,   \ R0 R2 MOV,      dup R0
    3802 ,   \ R2 R0 MPY,      R0^2
    C581 ,   \ R1 *SP MOV,     result to stack
    NEXT,    \ 16 bytes
ENDCODE

( factored DIST out from SPRDISTXY in TI-Forth)
: DIST ( x2 y2 x1 y1 -- distance^2 ) \ distance between 2 coordinates
    DXY 2DUP +        \ sum the squares (DXY is a code word)
    DUP >R            \ push a copy
    OR OR 8000 AND    \ check out of range
    IF   RDROP 7FFF   \ throw away the copy, return 32K
    ELSE R>           \ otherwise return the calculation
    THEN ;

: SP.DIST   ( #1 #2 -- dist^2 )  \ distance between 2 sprites
    POSITION ROT POSITION DIST ;

: SP.DISTXY ( x y # -- dist^2 )  POSITION DIST ;

( 0 means no coinc )
: COINC ( sp#1 sp#2 tol -- ? ) 2(X^2) >R SP.DIST R> <= ;
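For contrast with the coordinate-window method, the test these macros compile amounts roughly to squared distance against 2*tol^2. In Python terms (names are mine):

```python
def coinc_dist(p1, p2, tol):
    # Rough model of:  2(X^2) >R SP.DIST R> <=
    # i.e. squared distance compared against 2 * tol^2
    (x1, y1), (x2, y2) = p1, p2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    return d2 <= 2 * tol * tol

print(coinc_dist((100, 100), (105, 105), 8))  # True: 50 <= 128
```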
-
Extended BASIC for sure.
Less appealing these days but still amazingly powerful.
-
I was testing this yesterday for copying 4K SAMS pages as fast as I could.
This ASM code is reverse notation Forth Assembler so you might need to twist your head a bit.
The results are shown on the screen capture.
CMOVE is the same as MOVE16 below but uses the MOVB instruction, i.e. one byte at a time, and of course does not correct the byte count to an even number.
MOVE32 has no benefit for moving a 4K block, as you can see, but was 20% faster moving an 8K block, so it is better for >8K block moves.
Meanings, so you can translate:
-----------------------------------
BEGIN, is a universal label to jump back to in this assembler
OC WHILE, compiles to: JNC REPEAT+2
REPEAT, compiles to: JMP BEGIN
LTE UNTIL, compiles to: JGT BEGIN
TOS renamed R4
NEXT, returns to the Forth interpreter
CODE MOVE16 ( src dst n -- )  \ n= no. of CELLS to move
    *SP+ R0 MOV,        \ pop DEST into R0
    *SP+ R1 MOV,        \ pop source into R1
    TOS INC,            \ make sure n is even
    TOS -2 ANDI,
    BEGIN,
      TOS DECT,         \ dect by two, moving 2 bytes at once
    OC WHILE,           \ if n<0 get out
      R1 *+ R0 *+ MOV,  \ mem to mem move, auto increment
    REPEAT,
    TOS POP,
    NEXT,
ENDCODE

\ no improvement for 4K byte moves. 20% faster for 8K bytes
CODE MOVE32 ( src dst n -- )  \ n= no. of CELLS to move
    *SP+ R0 MOV,        \ pop DEST into R0
    *SP+ R1 MOV,        \ pop source into R1
    BEGIN,
      R1 *+ R0 *+ MOV,  \ memory to memory move, auto increment
      R1 *+ R0 *+ MOV,  \ memory to memory move, auto increment
      TOS -4 AI,        \ we are moving 4 bytes at once!
    LTE UNTIL,
    TOS POP,
    NEXT,
ENDCODE
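The shape of MOVE16 maps onto this Python sketch (names are mine; I'm treating n as a byte count here, which is what the INC/ANDI even-rounding suggests):

```python
def move16(mem, src, dst, n):
    # TOS INC, / TOS -2 ANDI, : round an odd byte count up to even
    n = (n + 1) & -2
    for i in range(0, n, 2):                 # BEGIN, ... REPEAT, loop
        mem[dst + i] = mem[src + i]          # R1 *+ R0 *+ MOV, copies
        mem[dst + i + 1] = mem[src + i + 1]  # a 16-bit word at once

ram = list(range(16))
move16(ram, 0, 8, 5)   # 5 rounds up to 6 bytes
print(ram[8:14])       # [0, 1, 2, 3, 4, 5]
```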
-
I guess it's a requirements compromise then. The code ran significantly faster than what I had (which was already 2X faster than the TI-Forth code) and when I used it in a test the result seemed to work just like the other version to my eye. Sprites collided and bounced back.
Different horses for different courses... ?
-
The Forth code I posted is used with a tolerance of 8 typically and it seems to work great even in Forth.
-
I am thinking that even in code, the extra cycles required to compute distance to determine coincidence is pretty slow compared to subtracting the actual coordinates and comparing each difference to a tolerance value.
I have not got around to converting my Forth words to code but with a few temp registers it should be pretty efficient.
-
Simplifed ALLOCATE, FREE, RESIZE in ANS/ISO Forth
I was reading a thread in comp.lang.forth about these words and discovered that a lot of people don't bother implementing the most formal interpretation of these words for small systems.
By formal I mean something that would allow allocation, freeing and resizing memory blocks in such a way that there would never be fragmentation. This requires a way to read all the allocations either in a table or as a linked list so you can examine the state of each allocation.
However, if you don't need all that, it becomes quite simple to make a simple system that does the same job, with the caveat that you have a more static allocation process, which is more in line with Forth thinking.
So instead of a full implementation that takes 768 bytes, here is one that takes 118 bytes. In fact, if you removed the luxury of remembering the size of an allocation, it would be even smaller.
This version includes the word SIZE which seems to be commonly written by others.
The Forth variable H is initialized to >2000 when Camel99 Forth starts and is used as the HEAP pointer for the lower 8K RAM.
To reset the heap you would use HEX 2000 H ! in Forth or make a word to do it.
\ Minimal ALLOCATE FREE RESIZE for Camel99 Forth  B Fox Sept 3 2020
\ Mostly static allocation
HEX
: HEAP,    ( n -- ) H @ !  [ 1 CELLS ] LITERAL H +! ;
: ALLOCATE ( n -- addr ? ) DUP HEAP,  H @  SWAP H +!  FALSE ;
: SIZE     ( addr -- n ) 2- @ ;  \ not ANS/ISO, commonly found

\ *warning* FREE removes everything above it as well
: FREE     ( addr -- ? ) 2- DUP OFF  H !  FALSE ;

\ *warning* RESIZE will fragment the HEAP
: RESIZE   ( n addr -- addr ? ) DROP ALLOCATE ;
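A Python model of this heap, to make the header layout explicit (the class and its names are mine): each allocation lays down a one-cell size header at H, hands back H+2, and bumps H past the block; FREE just winds H back.

```python
class Heap:
    def __init__(self, base=0x2000):   # H is initialized to >2000
        self.h = base
        self.headers = {}              # stand-in for the cells HEAP, writes
    def allocate(self, n):
        self.headers[self.h] = n       # HEAP, stores the size...
        addr = self.h + 2              # ...and H advances one cell
        self.h = addr + n              # H @ SWAP H +! bumps H past the block
        return addr                    # (the FALSE ior flag is omitted here)
    def size(self, addr):
        return self.headers[addr - 2]  # SIZE fetches the header at addr-2
    def free(self, addr):
        self.h = addr - 2              # *warning* frees everything above too

h = Heap()
x = h.allocate(50)
y = h.allocate(50)
print(hex(x), hex(y))  # 0x2002 0x2036
```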
Usage would typically be something like this:
\ protection and syntax sugar
: ?ALLOC ( ? -- ) ABORT" Allocate error" ;
: ->     ( -- addr ? ) ?ALLOC  POSTPONE TO ; IMMEDIATE

\ define the variables during compiling
0 VALUE X
0 VALUE Y

: START-PROGRAM
    50 ALLOCATE -> X
    50 ALLOCATE -> Y
    ....  PROGRAM continues
-
The only thing you might consider is writing these in CODE. I kept DIST for computing actual distance but I felt it was too much overhead for coincidence since it's all just sitting there in VDP RAM to read and compare.
I think these could be really fast using registers versus the stack juggling in the Forth version.
Notice that I purposely have code duplication in COINC rather than calling COINCXY. This is just for a bit of extra speed.
CODE overhead with BL would be low enough to allow calling COINCXY IMHO.
: COINCXY ( dx dy sp# tol -- ? )
    >R
    SP.Y V@ SPLIT   ( -- col row col row )
    ROT - ABS R@ <
    -ROT - ABS R> <  AND ;

: COINC ( spr#1 spr#2 tol -- ? ) \ 1.4 mS, 1.1 mS optimized
    >R
    SP.Y V@ SPLIT ROT
    SP.Y V@ SPLIT   ( -- col row col row )
    ROT - ABS R@ <
    -ROT - ABS R> <  AND ;
Just my 2 cents on the matter.
-
The other thing that workspaces are very good for is context switching.
If you initialize a group of workspaces as if they were called by BLWP, in a circle (A calls B, B calls C, C calls A), you can change tasks with just RTWP. That's pretty cool!
-
10 hours ago, GDMike said:
1. two different SAMS "windows" in your RAM space, switch in source and destination SAMS pages and copy from one to the other.
yeah, that's what I'm doing, I'm using unpaged >E000->EFFF to write temporary data to and read from.
I think this is my option 2, because you are copying SAMS data from a window in CPU RAM (?) to "unpaged" >E000..>EFFF.
Am I understanding what you are doing correctly?
Option 1 means you have two 4K windows, say at >3000 and >E000. You set the source SAMS bank at, say, >3000, the destination bank at >E000, and copy 4K bytes from >3000 to >E000.
That is a SAMS-to-SAMS transfer.
-
I find it's really hard to make big performance differences with the 9900 in the nestable subroutine area.
If you build a little stack it takes ~28 clocks to push R11 onto a stack and 12 to BL (no wait-state comparisons here)
So that's 30, and another 12 to return, so the total overhead is 42.
BLWP/RTWP is 26+14= 40
If you have to pass any data back and forth between different workspaces you lose more time, whereas pushing R11 lets you share registers.
Of course if you need to push a few registers with a stack model the 9900 will kill you.
You really have to work it through for every situation or just bite the bullet and take the penalty in exchange for a consistent calling convention.
It reminds me of a song my grandfather sang after a suitable number of drinks. "Gone are the days when free lunches came with beer..."
-
2 hours ago, GDMike said:
TheBF,
Talking about VMBR and VMBW.
Could rewriting and using a different VMBR and BW code work better than those built in referenced code? I thought maybe I should use something that wasnt a built in reference too.
For sure but they work fine until you need to go faster.
-
2 hours ago, GDMike said:
Ahh, lucky for me, my user data is all in ram to start with, so my mov's are pretty fast. But I had a feeling, well I read somewhere that it rolls over like KSARUL said previously, but the article I found, and sorry but I can't find it now, but it talked regarding the older AMS? Not sure it was the same for my 1MB card, but KSARUL put it in perspective for me.
It's my first time working with larger data all at once with this card and I'm enjoying the heck out of It. I'm actually at a point where I'm importing just short of 8K of user data from what they create in the 8K of the supercart, but in order to import, I have to push everything in SAMs,(>3000->3FFF) lower banks to the right or up, to higher banks by 9 banks so that pages 1-9 of my SNP pages are actually SNE pages imported/merged into existing SAMs at end up at the lower part of SAMs bank because I want them to show up as SNP pages 1-9 and what was SNP pages 1-9 are now pages 10-19 if that makes sense...
And I've got that done, but I wanted to run a test while I was here and I bumped my loop up and outside the limit of banks available and saw no issues like a crash, so it led me to look into this.
Thx everyone for chiming in.
I appreciate that.
Doesn't matter if all the data is in RAM. To copy from SAMS page to SAMS page you need one of:
1. two different SAMS "windows" in your RAM space, switch in source and destination SAMS pages and copy from one to the other.
-or-
2. use one SAMS window, copy SAMS to a RAM buffer, switch the page in the window, and copy the buffer back to SAMS
-or-
3. as I tried, use one SAMS window, copy to VDP, switch the page in the window, and copy VDP back to SAMS.
That's all the options I can think of. (well I suppose you could write to file and then copy back but that's not practical)
-
Thank you. That makes perfect sense. It would be an interesting, albeit significant, project to build the NCG, I am sure.
I found this in Wikipedia:
Niklaus Wirth specified a simple p-code machine in the 1976 book Algorithms + Data Structures = Programs.
The machine had 3 registers - a program counter p, a base register b, and a top-of-stack register t. There were 8 instructions:
- lit 0, a : load constant a
- opr 0, a : execute operation a (13 operations: RETURN, 5 math functions, and 7 comparison functions)
- lod l, a : load variable l,a
- sto l, a : store variable l,a
- cal l, a : call procedure a at level l
- int 0, a : increment t-register by a
- jmp 0, a : jump to a
- jpc 0, a : conditional jump to a
These look very familiar to Forth people.
Wirth seemed so ahead of the pack back then.
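Out of curiosity, here is a toy interpreter for a fragment of that machine (lit, a couple of opr arithmetic codes, jmp, jpc), just to show how little it takes. The opcode encoding here is made up for illustration, not Wirth's:

```python
def run(code):
    # p = program counter, as in Wirth's machine; stack is the value stack
    p, stack = 0, []
    opr = {2: lambda a, b: a + b,      # a made-up subset of the
           3: lambda a, b: a - b,      # 13 'opr' operations
           4: lambda a, b: a * b}
    while p < len(code):
        op, a = code[p]
        p += 1
        if op == "lit":                # lit 0,a : load constant a
            stack.append(a)
        elif op == "opr":              # opr 0,a : execute operation a
            b = stack.pop()
            stack.append(opr[a](stack.pop(), b))
        elif op == "jmp":              # jmp 0,a : jump to a
            p = a
        elif op == "jpc":              # jpc 0,a : jump to a if top is zero
            if stack.pop() == 0:
                p = a
    return stack

print(run([("lit", 6), ("lit", 7), ("opr", 4)]))  # [42]
```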
-
8 hours ago, GDMike said:
I did a test of copying data from a 1 mb Sam's card bank to the next higher bank. My test kept copying past bank 245, well because I wanted to see how the console would handle my loop. And my tests kept performing as if the higher banks existed.
Do these banks roll over, as in start writing to bank 1 if it can't find bank 247,248,250 or something similar?
That's strange. I was doing the same thing.
I was testing how fast I could copy pages using only one 4K buffer in CPU RAM. I was blitting into VDP RAM, switching pages and blitting back to SAMS.
It took about 3 seconds to copy 64K that way using Assembler VMBW,VMBR inside a Forth loop.
If I used a 4K buffer in CPU RAM I could get it to 2 seconds for 64K by using a custom copy routine that moved 16 bit cells at a time instead of bytes. By extension then I should be able to speed that up by moving 2 or maybe even 4 cells inside the assembler loop.
The VDP method is not bad for performance and means I don't need to play with CPU RAM for the copy buffer. Still deciding which way I want to go.
-
"It's such a good feeling..." ? (Fred Rogers)
-
My confusion was more around how you modify the existing compiler to understand 9900 Assembler opcodes or convincing the compiler to convert Pascal to native code.
I assumed that changing the compiler is not possible at least for TI-99, so then I wondered about writing the code manually and putting the binary in system friendly form.
If my assumptions about the flexibility of the compiler are wrong then problem solved.
SAMS usage in Assembly
in TI-99/4A Development
Posted
Using the register at >4006 you can safely select any SAMS page from >10 .. >FF.
So that means you can have 240 different 4K blocks living in your memory at >3000, but only one at a time.
To select SAMS block >10 put the number into >4006 with the bytes reversed, so >10 must be stored as >1000
>11 must be stored as >1100
The code would look something like this:
(Maybe you have not turned on the mapper?)
NOT TESTED!!!
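The assembly itself seems to have been lost from the post, but the byte-swap is easy to pin down: the page number lands in the most significant byte of the 16-bit word stored at >4006. A quick Python check of the values above (function name is mine):

```python
def sams_mapper_word(page):
    # A page number >10..>FF must land in the MSB of the
    # word written to >4006, i.e. "bytes reversed".
    assert 0x10 <= page <= 0xFF
    return page << 8

print(hex(sams_mapper_word(0x10)))  # 0x1000
print(hex(sams_mapper_word(0x11)))  # 0x1100
```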