


Everything posted by TheBF

  1. Thanks Lee. That means a lot to me coming from you. If you want this kind of option in FB Forth, it's not hard to add. I created a weird way to make user variables: I extend the workspace after the registers. But with the TI Forth architecture you have a USER pointer, or no? Anyway, I have this RTWP version, which I published here earlier, and I also started with this alternative version for a more conventional machine that you could use. Happy to answer any questions should you want to do a port of it. I find it a lot of fun to make the little TI look like a big machine. If I ever get my hardware back up and running, I would want to have a terminal task running on RS232 and the graphics stuff/games working as its own job. Utopia...
  2. Ya, that's the only real downside. And if I REALLY needed a HEX dump in the middle of a video game, I literally just reduce the priority by inserting the word PAUSE into the loops that are printing out the text and numbers. For example, below is the offending code. Notice the routines called .CELLS, .ASCII and DUMP. Each routine is a DO/LOOP. To reduce their GOL DARN CPU stealing ways, all you do with this system is put the word 'PAUSE' after the word DO in each loop. I put it in brackets here, but I should add it to the actual code anyway.

     PAUSE is the actual task switching routine, so you can see how other tasks will get serviced before doing the work inside the loop. And if you really want to reduce their priority, you add more PAUSEs, or you can use MS, which will delay but continuously give the CPU to the other tasks while it is waiting for the timer to expire.

     I didn't invent this stuff; it was perfected in the 1970s by Chuck Moore and Forth Inc. It's pretty cool for lower power computing because it's lower overhead. The context switch on the 9900 would take 20uS if it ran from proper full speed memory. On the 8 bit bus it's what? Double or so. But still ridiculously fast compared to conventional switchers.

     By the way, the word TYPE is also a DO/LOOP that prints out 'n' characters at any address. It was made properly for multi-tasking, which is why the sprites didn't stop dead when the DUMP ran on the screen. TYPE was giving away the CPU after each character was put on the screen.

         8 CONSTANT 8      \ no. of bytes to dump for each line

         : BOUNDS ( adr len -- end-adr start-adr ) OVER + SWAP ;
         : .####  ( n --)   S>D <# # # # # #> TYPE ;
         : .ADR   ( ADR --) .#### [CHAR] : EMIT ;

         : .CELLS ( ADR N --)
             BOUNDS
             DO ( PAUSE)
                SPACE I @ .####
             2 +LOOP ;

         : .ASCII ( adr n --)          \ print ascii values or '.' for non-printable chars
             BOUNDS
             DO ( PAUSE)
                I C@ DUP
                BL 1- T[CHAR] ~ WITHIN \ check for printable char (from SPACE to ASCII '~')
                0= IF DROP T[CHAR] .   \ replace unprintable chars with '.'
                   THEN EMIT
             LOOP ;

         : DUMP ( offset n -- )
             BOUNDS
             DO ( PAUSE )              \ 'I' is the address pointer
                CR I .ADR              \ print the adr
                I 8 .CELLS SPACE       \ print 8 bytes of memory
                I 8 .ASCII             \ print the same 8 bytes in ASCII format
                KEY? IF LEAVE THEN
             8 +LOOP                   \ increment the address by 8
             CR ;
  3. Here is a little video of sprites under cooperative control. I am pretty happy with this version. I need to make a Pong game or something simple like that to see how it all works in the heat of battle. CAMEL99 Sprites.mov
  4. Yes, that is a cool enhancement. I saw a post about that by someone here, maybe you. I put it in my VMBW and measured the speed increase at +12.9%. I am crammed for space right now because my homemade cross-compiler can only make one 8K program image (gotta fix that one day), so I can't un-roll the loop in this case.
  5. I don't know MikeOS, but from what you describe I think you need to consider the following.

     1. You can create the table shown above as a finite size table with fields laid out in memory:

         *      flags  start  end  I.D.  name
         * --------------------------------------------------------
                DATA 0,0,0,0,'name#1'
                 "   " " " "   "
                 "   " " " "   "
         * etc....

     -OR- you create the table as a linked list so you can add a new entry any time. For that you will need a link field in the record:

         *      Link  flags  start  end  I.D.  name
         * --------------------------------------------------------------------------
         RECRD1 DATA 0,xxxx,yyyy,zzzz,qqqq,'name#1'        * first entry links to nothing (0)
                <memory>
                <memory>
                <memory>
         RECRD2 DATA RECRD1,xxxx,yyyy,zzzz,qqqq,'name#1'   * next record links to the previous record

     Also, the pointer that is used to find a table RECORD contains the address of the first number of the table record. In the linked-list case it would be the LINK field. In the first case it would be the FLAGS field. But in both cases you could remove the I.D. field, because each record has a unique address and that can be the ID # if you want to use it for that purpose.

     My 2 cents

     BF
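The linked-list variant can be sketched in Python to make the pointer chasing concrete. This is only an illustration, not the assembler layout from the post: the `Record` class, the `add_record` and `find` helpers, and the start/end addresses are all made up for the example. Each new record links back to the previous one, and the "head" pointer plays the role of the address of the most recent LINK field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    link: Optional["Record"]  # previous record, None for the first entry
    flags: int
    start: int
    end: int
    name: str


def add_record(head, flags, start, end, name):
    """Create a record that links to the previous head and return the new head."""
    return Record(head, flags, start, end, name)


def find(head, name):
    """Walk the links from the newest record back to the first one."""
    rec = head
    while rec is not None:
        if rec.name == name:
            return rec
        rec = rec.link
    return None


# Hypothetical entries; the addresses are invented for the example.
head = None
head = add_record(head, 0, 0x2000, 0x20FF, "name#1")
head = add_record(head, 0, 0x2100, 0x21FF, "name#2")
```

Because every record object is unique, the object itself can serve as the identifier, which is the same argument the post makes for dropping the I.D. field.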
  6. Yes, I think it is a typo. It should be "write to the VDP is preferred".

     So I measured what I had been doing with each sprite motion, 1 at a time, in Forth, with some ALC assistance to add x,y to the motion number and combine to 16 bits. This took 66 ticks on the 9901 timer (1.3mS) for each computation and update. To write an entire sprite table (128 bytes) with VMBW takes 88 ticks or 1.8mS! :-) So Master Lee is right again.

     I changed everything. I still update each sprite motion in a Forth loop, but then I write it all to VDP at once. My SPRITE command keeps track of how many sprites are created, so I only update as many as needed. Then I wrapped the whole thing up in a background task and it works pretty much like XB now.

     Here is what the "automotion" task looks like in CAMEL99 Multi-Forth. (I am going to steal that name for the task, Lee. It's a good one.) Compared to the TI ISR that RXB posted, it's pretty simple.

         CREATE MOVER  USIZE ALLOT    \ create mem block for a task
         MOVER FORK                   \ init the mem block to be a task

         : SPMOVER                    \ make a Forth word
              BEGIN                   \ begin a loop
                SPRITE# @ 1+ 0        \ FOR I=0 TO SPRITE#
                DO                    \
                   I SP.MOVE PAUSE    \ calc next position of SPRITE(I)
                LOOP                  \ & give some time to other tasks
                SP.UPDATE 1 MS        \ write to VDP, wait 10 ms (I will fix this)
              AGAIN ;                 \ GOTO BEGIN

         ' SPMOVER MOVER ASSIGN       \ ASSIGN the address (') of SPMOVER to MOVER

     To get it going you just say MOVER WAKE. I will get a demo video next week and post the code on GITHUB. Got a wedding to go to.
  7. All good reasons to roll your own. The one I found was this: if I want to add the X and Y in the sprite table (VDP RAM) to 2 other numbers in the motion table (also VDP RAM), I have to read them both out of VDP RAM, then add them, then put the new x,y location back into VDP RAM. That was pretty slow. So I created the motion table in memory expansion so I can get at the motion vectors faster. Then I made a couple of Assembler routines that can read and write 2 bytes to/from the Sprite table, and another one that can add the location and motion vectors together and combine them back into one 16 bit number. Using those little routines, it goes pretty fast, and I only update the sprites that I want to update, when I want to update them. I have found the sprites work very well when I use my multitasker to do the updating as a separate task. And since it is not on an interrupt, my program gets full control when I need to test for coincidence. Now I need to create something with it. Actually, I think these kinds of very small assembler routines could be used nicely from XB as well.
  8. So in the journey to have some fun with the old TI, I finally got to the place with my system where I had to deal with Sprites. TI Extended Basic does an excellent job of hiding all the details of how these things work, making it easy to use. How it really works was news to me, so I had to do some sleuthing around, and peeking inside the code of Willsy's TurboForth provided excellent insights as always. For ease of communication about what I learned I will use XT-BASIC terminology unless there is no alternative. My hope is this will be interesting to those who like to use sprites in their games and want to "peek behind the curtain".

     * For those who really know about Sprites, please jump in and correct the stuff I have misunderstood *

     -------- How it works in BASIC

     The Sprite control in the VDP chip memory is contained in something called the Sprite Attribute List (SAL). In BASIC it could be understood like this:

         100 DIM SAL$(32)

     (The Sprites are indexed from 0 to 31. #1 is actually 0, but we will use option base 1 terminology to avoid confusion.)

     There is a requirement that cannot be defined in BASIC: the SAL$() array MUST begin at a specific address in the VDP memory. So let's pretend our imaginary BASIC looked after that for us and started the array at HEX 300.

     Each element in the SAL$() can only be 4 bytes long, and each byte has a specific function in controlling a SPRITE:

         byte1: Y location (row)
         byte2: X location (column)
         byte3: ASCII char to use as a pattern descriptor
         byte4: Split into 2 "nibbles": left half early clock bit, right half color

     The only way to express it in BASIC would be to do something like this:

         300 SAL$(1)=CHR$(Y) & CHR$(X) & CHR$(PAT) & CHR$(CLR)

     So in our imaginary BASIC, if you ran this code:

         300 SAL$(1)=CHR$(10) & CHR$(10) & CHR$(65) & CHR$(3)

     a green letter "A" sprite would appear at pixels 10,10. Ya. It's that simple. You put the correct bytes into the right spot in VDP memory and the TMS9918 chip does all the hard work.

     Now in fact these data are not strings. They are 8 bit numbers. So another way to think of this in BASIC is a "byte" array (not really possible, but you get what I mean):

         310 DIM SAL(32,4)

     So in this case our fancy BASIC would let us put a number in each element this way:

         319 REM create Sprite #1 in VDP memory
         320 SAL(1,1)=10   ( Y coordinate pixel)
         330 SAL(1,2)=10   ( X coordinate pixel)
         340 SAL(1,3)=65   ( ASCII char)
         350 SAL(1,4)=3    ( foreground color)

     Let's stick with this byte array example for the next step.

     How do Sprites move by themselves?

     There is another array that holds the information for MOTION. It can be understood as a 2 dimensional array:

         400 DIM MOTION(32,2)

     So when you write CALL MOTION(#1,10,-4) what really happens is:

         LET MOTION(1,1)= 10
         LET MOTION(1,2)= -4

     That's it. CALL MOTION stores the numbers in the MOTION() array. The part that makes the sprites move is a special fast sub-routine that does something like this:

         490 REM MOVE ALL SPRITES
         500 FOR SPRITE=1 TO 32
         510   SAL(SPRITE,1)=SAL(SPRITE,1) + MOTION(SPRITE,1)
         520   SAL(SPRITE,2)=SAL(SPRITE,2) + MOTION(SPRITE,2)
         530 NEXT SPRITE

     So as that loop runs, each sprite has its motion values added to the location values for the sprite in VDP memory. So the sprite moves to a new location. Simple.

     ISR is the Magic

     The only thing missing is automatic motion. That is done with something called an interrupt, and our little program becomes what is called an Interrupt Service Routine (ISR). This is a fancy name for a program that stops YOUR program every now and then and does something else for a while. So in the TI-99 there is a timer running at 60 times a second (or 50 in European models), and when it "interrupts" your BASIC program, it runs the "MOVE ALL SPRITES" program and then returns to run BASIC.

     So that's the normal way it works (as far as I can tell). I wondered, could it be done differently? Do I need an ISR? I have a multi-tasker in my system. More to come...
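For anyone who wants to poke at the idea outside of BASIC, here is a minimal Python sketch of the same model. It is not how the console ROM does it: a `bytearray` stands in for VDP RAM, the names `set_sprite` and `move_all_sprites` are invented, and the entry order (row, column, pattern, color) is an assumption of the sketch. The real ISR also does more bookkeeping than a plain add.

```python
SAL_BASE = 0x300          # Sprite Attribute List address used in the post
NUM_SPRITES = 32
vdp = bytearray(16 * 1024)  # stand-in for 16K of VDP RAM


def set_sprite(n, row, col, pattern, color):
    """Write the 4-byte SAL entry for sprite n (0-based)."""
    addr = SAL_BASE + 4 * n
    vdp[addr:addr + 4] = bytes([row & 0xFF, col & 0xFF, pattern & 0xFF, color & 0x0F])


# motion[n] = [row velocity, column velocity], kept in ordinary CPU memory
motion = [[0, 0] for _ in range(NUM_SPRITES)]


def move_all_sprites():
    """One pass of the 'MOVE ALL SPRITES' loop: add each vector to its position."""
    for n in range(NUM_SPRITES):
        addr = SAL_BASE + 4 * n
        d_row, d_col = motion[n]
        vdp[addr] = (vdp[addr] + d_row) & 0xFF        # wrap like an 8 bit byte
        vdp[addr + 1] = (vdp[addr + 1] + d_col) & 0xFF


set_sprite(0, 10, 10, 65, 3)   # green "A" at 10,10
motion[0] = [10, -4]           # like CALL MOTION(#1,10,-4)
move_all_sprites()
print(list(vdp[SAL_BASE:SAL_BASE + 4]))
```

Calling `move_all_sprites()` from a timer or from a cooperative task is then a scheduling decision, separate from the arithmetic itself.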
  9. Thanks for sharing this. It looks to me like the scheduler is doing something like BLWP, but manually. Any reason why you chose to do it this way?
  10. Would this be easier if you implemented the -HOLD and HOLDA on your CPU? (I am guessing here, but it looks like it would help with a single step implementation)
  11. Yes, having a hardware timer running is critical even in this cooperative version. It means that even with 5 tasks running, my BEEP and HONK sounds still sound correct. :-) Without the 9901 in my delay mechanism, the sound duration really varied with the amount of load I put on the computer.

      Can you show the code for the interrupt handler that does the task switch? I would love to see it.

      Here is a conventional (non workspace changing) Forth task switcher that I wrote first:

          \ Conventional Forth Pause
          CODE: PAUSE  ( -- )        \ this is the context switcher
                SP RPUSH,            \ 28
                IP RPUSH,            \ 28
                RP 4 (UP) MOV,       \ 22 save my return stack pointer in RSAVE user-var
                BEGIN,
                   2 (UP) UP MOV,    \ 22 load the next task's UP into CPU UP (context switch)
                   *UP R0 MOV,       \ 18 test the flag for zero
                NE UNTIL,            \ 10 loop until it's not zero
                4 (UP) RP MOV,       \ 22 restore local Return stack pointer so I can retrieve IP and SP
                IP RPOP,             \ 22 load this task's IP
                SP RPOP,             \ 22 load this task's SP
                NEXT,                \ = 194 * .333 = 64.6uS context switch
          END-CODE

      The really fascinating thing about Chuck Moore's method is that there is no separate scheduler, which means the machine has less work to do right out of the gate. It sounds like it can't possibly work when you are used to preemptive systems, and yet it does (as long as you create a little routine that incorporates a timer, if you need real time). The amazing thing about it is that the granularity of the task switch is very small, and even in conventional processors the task switch only needs to save 3 registers. (That's not unlike the TMS9900.)

      How it works:

          Emit a character, switch
          Check for a key press, switch
          Read a key, switch
          Waiting for a timer to expire, switch
          Write a block of memory, switch.

      The only thing you have to be aware of as a designer is that your code does not try to do too much at one time. But if you are importing code from other environments, then of course you would need a pre-emptive tasker. Then you have all the problems of multiple tasks calling shared resources with pre-emption. That goes away with Chuck's system, and if you have a task that needs to steal the CPU for something really important, you have it. No record locking needed. Everything in the system runs to some form of completion by design.
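The "do a little work, then switch" pattern can be tried out in Python with generators, where `yield` plays the part of PAUSE. This is a loose analogy under stated assumptions, not a port: Python needs a small driver loop (`run` below, a name invented for this sketch) to resume each generator, whereas in the Forth system PAUSE itself jumps straight to the next task and there is no separate scheduler.

```python
from collections import deque


def task(name, steps, log):
    """A task does one small piece of work, then yields (the PAUSE)."""
    for i in range(steps):
        log.append((name, i))
        yield  # give the CPU to the next task


def run(tasks):
    """Round-robin: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)  # still alive, goes to the back of the queue
        except StopIteration:
            pass             # task ran to completion, drop it


log = []
run([task("A", 2, log), task("B", 3, log)])
print(log)
```

Since a task only gives up control at a `yield`, everything between two yields runs without interruption, which is why no locking is needed around shared data in this style.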
  12. I really need to get a life :-) But I had to see what coding the word in Assembler would do, and the results are amazing.

          CODE: DXY  ( x2 y2 x1 y1 --- x^2 y^2 )
                *SP+ R0 MOV,     \ x1->R0
                *SP+ TOS SUB,    \ y1-y2->tos
                *SP  R0 SUB,     \ x1-x2->R0
                TOS R3 MOV,      \ dup tos
                TOS R3 MPY,      \ tos^2, result->R4 (tos)  ( edit: removed an instruction here)
                R0 R2 MOV,       \ dup R0
                R2 R0 MPY,       \ R0^2
                R1 *SP MOV,      \ result to stack
                NEXT,            \ 16 bytes
          END-CODE

      The code word is 2 bytes smaller (after editing) than Lee's Forth version, I believe: 16 bytes vs 18. But it is 3.375 times faster than Lee's version and 5.5 times faster than my factored version. Just adding ROT to a word slows it down a lot, it seems. Makes me wish for register based locals for things like this.
  13. I corrected my earlier post. I should not do this so late at night clearly.
  14. LOL. Very cool Lee. Of course the ABS is redundant. The master speaks again.

      So I thought, since I was still up, I would measure these beasts. I did a version of mine with DELTA removed as well. In my current kernel -ROT is

          : -ROT ROT ROT ;

      so no speed advantage there. But... there was no real penalty for factoring out ^2 on my system. Yes, there was a penalty, but it was masked by the very slow ROT and -ROT words written in Forth. This black screen version has code words.

      Here is the code. (Apologies for the timer name. I ran out of space in 8K so I squeaked it in.) :-) I fixed the copy and paste errors in the original version. This one seems correct now.

          \ factoring dxy tests
          HEX
          \ TMR! sets the 9901 timer to >3FFF
          \ TMR@ reads the 9901 timer

          : TIDXY ( X2 Y2 X1 Y1 --- X^2 Y^2 )
                  ROT - ABS ROT ROT - ABS DUP * SWAP DUP * ;

          : ^2    ( n -- n^2) DUP * ;
          : DELTA ( n n -- n') - ABS ;

          : BFDXY  ( x2 y2 x1 y1 -- x^2 y^2 )
                   ROT DELTA -ROT DELTA ^2 SWAP ^2 ;

          : BFDXY2 ( x2 y2 x1 y1 -- x^2 y^2 )
                   ROT - -ROT - ^2 SWAP ^2 ;

          : WILLXY >R SWAP >R - ABS DUP * ( x^2 )
                   R> R> - ABS DUP * ;    ( x^2 y^2 )

          : LEEXY ( x2 y2 x1 y1 --- x^2 y^2 )
                  ROT - DUP * >R    \ S:X2 X1   R:dy^2
                  - DUP *           \ S:dx^2    R:dy^2
                  R>                \ S:dx^2 dy^2
          ;

          : INPUTS   10 20 30 40 ;
          : .OUTPUT  ( n n t -- ) >R . . 3FFF R> - . ;

          : TESTTI   INPUTS TMR! TIDXY  TMR@ .OUTPUT ;
          : TESTBF   INPUTS TMR! BFDXY  TMR@ .OUTPUT ;
          : TESTBF2  INPUTS TMR! BFDXY2 TMR@ .OUTPUT ;
          : TESTWILL INPUTS TMR! WILLXY TMR@ .OUTPUT ;
          : TESTLEE  INPUTS TMR! LEEXY  TMR@ .OUTPUT ;
  15. As I think about it a little bit more, a register based VM could take direct advantage of the 9900's memory to memory architecture. Your byte codes can add, subtract, mpy etc. directly to memory and use hardware registers only when needed. Just a thought.

      And taking a page from Parrot... create some virtual registers in memory:

          myregs   BSS 32*2    * now you have 32 16 bit registers.
          myfloats BSS 8*8     * and 8 floating point registers

      ... how many more and what kind would you like? :-)
  16. Excellent question! There are divergent opinions on the matter. The rumors say a stack based VM can create smaller code. The Java VM is one of the best modern ones. Once you add all the classes and methods, the code is probably not any smaller when you write in Java, but you should be able to do any language on top of the JVM. The Parrot VM, which is used by a bunch of languages now, is a register VM, and I have read it outperforms stack VMs for speed. https://en.wikipedia.org/wiki/Parrot_virtual_machine You should probably get familiar with all three (byte code Forth VM, JVM and Parrot VM) and then make a decision. :-) B
  17. While looking at how to implement sprites, I studied the old TI-Forth code that Lee has published for some clues. I found this piece of code that seemed hard to understand:

          : DXY ( X2 Y2 X1 Y1 --- X^2 Y^2 )
                ROT - ABS ROT ROT - ABS DUP * SWAP DUP * ;

      I am sure the author was very proud of this tight statement, with all the stack juggling working out just perfectly. But I find it hard to understand the intentions of the code. So I took it apart by removing the code that repeats and giving it a name. This is how Forth code is commonly "re-factored".

      The first thing I see is DUP *. Take a number on the stack, make a copy, then multiply them together. That is like saying X^2 in BASIC. So I made this word:

          : ^2 ( n -- n') DUP * ;

      Now at the Forth console I can type 2 ^2 . and it prints 4. Perfect.

      I also noticed the phrase " - ABS" is used twice. If you subtract 2 numbers and take the absolute value, you have a calculation for the delta between them. So how about this definition:

          : DELTA ( n n -- n') - ABS ;

      I also noticed the "ROT ROT". ROT takes 3 numbers on the stack and "rotates" the 3rd number to the top position, so 1 2 3 ROT will give you 2 3 1 on the stack. Doing 1 2 3 ROT ROT will give you 3 1 2, so it is like a backwards rotate. There is a Forth word to do that called "-ROT", so I replaced those 2 words with 1 word.

      With those factors removed, the DXY word became:

          : DXY ( x2 y2 x1 y1 -- x^2 y^2 )
                ROT DELTA -ROT DELTA ^2 SWAP ^2 ;

      It still has some stack juggling noise in it, but by choosing meaningful names I can see that it takes 2 sets of x/y coordinates, gets the delta between the x and y parts, and computes the square of each delta. IMHO, that is how you make Forth easier to read for the next poor guy who has to read your code. Create small, well named definitions. Soon the code starts to read clearly. B
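For readers who don't think in stacks, the same factoring can be written out in Python. This is just a paraphrase for comparison, with made-up function names (`squared`, `delta`, `dxy`); named arguments take the place of the ROT / -ROT / SWAP juggling.

```python
def squared(n):
    """Same job as the Forth word ^2 ( n -- n^2 )."""
    return n * n


def delta(a, b):
    """Same job as DELTA ( n n -- n' ): absolute difference."""
    return abs(a - b)


def dxy(x2, y2, x1, y1):
    """Squared x and y distances between two points, like DXY."""
    return squared(delta(x2, x1)), squared(delta(y2, y1))


print(dxy(1, 2, 4, 6))
```

With the two small helpers named, the body of `dxy` reads almost like the sentence that describes it, which is the point of factoring.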
  18. - is a stack based VM
          2 stack VM, but who's counting.
      - interprets byte code fast
          The implementations you see here are running "address" lists rather than byte codes, with some Assembler at the lowest level. There are byte-code Forths. Open Firmware, which boots Sun machines, is one such system.
      - is memory efficient
          Evidence points to smaller final program sizes, yes. Some of that can be due to small routines, easily re-used and re-combined to make higher level functions, rather than the system itself. But that is disputed sometimes.
      - supports the TI hardware well
          Sure. You write assembler code to get the real work done and then point to it on demand.
  19. I was thinking about the bare console, and how the engineers had to shoe-horn the language to use the PAD memory.
  20. Redid the DENILE program with little tasks running and posted it over there. http://atariage.com/forums/topic/265231-in-denile/page-2
  21. Denile multi.mov I want to thank retrospect for this nice little scene. It gave me something to work on to beat up my graphics and multi-tasker. I might have gone a little overboard... But anyway, it works!! ... And isn't that all we care about? Here is the reworked code in Forth with 5 tasks running independently, including the Console task. Check back in a few minutes for a movie of the "animation". (No sprites yet... this is not done yet.)
  22. After the success with the ISR timer, which is 16.6mS, I tried the 9901 timer with a 10mS countdown and it worked even better. The key is to have something counting for the system while the programs are doing their own thing cooperatively. So TMR@ reads the hardware in the 9901 instead of reading the memory address. I get finer resolution without stopping everything just to read the TIMER.
  23. I have found a much better timer method on the TI-99 for a cooperative multi-tasker. I was using the 9901 timer before and waiting for it to count down for 1 milli-second. That is not ideal because the entire system comes to a stop while waiting. The ideal is that you run the switcher routine whenever you have to wait, but the TMS9900 is too slow to jump to another task and get back in 1mS. So.... since the interrupts are spinning a number at >8379 all the time, I can get an accurate 1/60 of a second measurement any time I want. And in 1/60 of a second I can service some other tasks. It's faster in a 16 bit Forth to read an integer, so we will read >8378, which gives us the value at >8379 without playing with the bytes. Much faster.

      So the method is:

          1. Read the value at >8378 and keep a copy on the stack
          2. Then switch to another task
          3. When you come back, read >8378 again
          4. Go to step 2, until it's not equal to the copy

      Here is how it looks in Forth:

          : [email protected] ( -- n ) 8378 @ ;   \ read ISR counter

          : 1/60 ( -- )
                [email protected]             \ read the timer onto stack
                BEGIN
                   PAUSE                    \ service other tasks.
                   DUP [email protected]     \ copy 1st reading & get a new one
                   <> UNTIL                 \ loop until not equal
                DROP ;                      \ DROP initial timer reading

      The overall effect is a much snappier system, because it's not wasting time while doing timing delays.
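The wait loop above can be mimicked in Python with a generator, again with `yield` standing in for PAUSE. Everything here is a stand-in: `wait_one_tick` is an invented name, and the `counter` dictionary simulates the number the console ISR keeps bumping, since there is no real interrupt in this sketch.

```python
def wait_one_tick(read_ticks):
    """Yield to other tasks until the tick counter differs from the first reading."""
    start = read_ticks()          # keep a copy of the first reading
    while True:
        yield                     # PAUSE: let other tasks run
        if read_ticks() != start:
            return                # the counter moved on, the wait is over


# Simulated ISR counter (hypothetical stand-in for the value read at >8378)
counter = {"ticks": 0}


def read_ticks():
    return counter["ticks"]


passes = 0
for _ in wait_one_tick(read_ticks):
    passes += 1                   # "other tasks" get serviced here
    if passes == 3:
        counter["ticks"] += 1     # pretend the interrupt fired
print(passes)
```

The delay costs nothing but a compare on each pass, so the time spent waiting is handed to whatever else is ready to run.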
  24. For those running on Windows, Context is a nice editor written in Delphi. (I don't believe source code is available) It has a simple way to create highlighter files for any language, with the common languages already provided which you can use as examples. I use it for Forth on Windows 10 64 and it's pretty nice. http://www.contexteditor.org/index.php Oh and it's freeware
  25. I thought it would be fun to try to run 6 tasks on the poor old TI-99. It actually worked. I created some tools to monitor the tasks. The .TASKS command shows the Process ID, the Parameter stack, the return stack, the instruction pointer of the task and the awake/asleep status. I am closer to improving KSCAN for multi-tasking, but the code took me past the 8K file size limit of my cross-compiler, so that's the next improvement. The MULTICAM EA5 program is on GITHUB, and so is the MTASKDEM.FTH file in the \LIB folder. https://github.com/bfox9900/CAMEL99 So you can play with this if you are interested. I will get around to doing a demo version of the DENILE program one day. It really changes the way you do games when you can assign moving pieces to a separate task. A movie of the 6 tasks test is attached. TI99 6TASKS.mov