TheBF

+AtariAge Subscriber
  • Content Count

    2,738
Everything posted by TheBF

  1. Just curious. Are you using a stack in those programs now? I find the old 9900 not too bad at dealing with them even though they were not baked into the hardware. Having a stack gives you options for those times when all you need is a bit of space rather than setting up a new workspace and vector.
  2. "Hmmm strong the Forth in this one is"
  3. Ah so not the complete absence of ELSE but rather XB enhancements. Thanks.
  4. Not too bad then. I didn't know about the loss of ELSE. Wonder why.
  5. Compiling a language designed to be interpreted always requires adapting the language in some way. XB is so slow, however, that it has driven people to go with compiling and to tolerate the extra work. "There's no such thing as a free lunch" is true with computers for sure. As my grandfather used to sing after a few bottles of the amber liquid: 😢 "Gone are the days when free lunches went with beer. These memories still fill my eyes with tears..."
  6. Not sure if this is of any help, but this seems to work here. TOS is just R4. Looks like I used bit 21 to test if a character is ready to read, and bit 18 to reset. (This is not the entire file.)

         [CC] DECIMAL
         [TC]
         CODE CKEY? ( -- n )       \ "com-key"
              0 LIMI,
              R12 RPUSH,           \ save R12 on return stack *Needed?*
              CARD @@ R12 MOV,     \ set base address of CARD
              TOS PUSH,            \ give us a new TOS register (R4)
              TOS CLR,             \ erase it
         \ *** handshake hardware ON ***
              5 SBZ,               \ CARD CTS line LOW. You are clear to send
              UART @@ R12 ADD,     \ add UART, >1300+40 = CRU address
              21 TB,               \ test if char ready
              EQ IF,
                 TOS 8 STCR,       \ read the char
                 18 SBZ,           \ reset 9902 rcv buffer
                 TOS SWPB,         \ shift char to other byte
              ENDIF,
         \ *** handshake hardware off ***
              CARD @@ R12 MOV,     \ select card
              5 SBO,               \ CTS line HIGH. I am busy!
         \ ******************************
              R12 RPOP,            \ restore old R12 *Needed?*
              2 LIMI,
              NEXT,
         ENDCODE
  7. Are we sure it's not the code? Perhaps you could give us all a peek.
  8. So you had a huge amount of code inside a 1000-iteration loop. I think that was the problem. I am also wondering if the amount of code in the comparison would be less by just checking for non-zero in CLS(7) ... CLS(15) as in the BASIC code above. I just tested BASIC with this, so I know the IF statement works this way in the interpreter. I don't know about the compiler.

         100 X=0
         110 IF X THEN 120 ELSE 130
         120 PRINT "TRUE"
         130 PRINT "FALSE"
  9. You would have to look at the code the compiler generates. Is that easy to see?
  10. Replacing the Assembler with Machine Forth

     I am finally doing some experiments to see how this will work. My conclusion so far is that it is best to allow naming the registers to fit with the architecture of the 9900. However, the register names will be Forth names. In the case of CAMEL99 they would be:

         \ R0   temp
         \ R1   temp
         \ R2   temp
         \ R3   AREG  \ address register
         \ TOS  R4 is top of stack cache (you need to manage it) :-)
         \ SP   data stack pointer
         \ RP   return stack pointer
         : NOS  *SP ;      \ Next on Stack
         : 3RD  2 (SP) ;   \ offsets are in bytes, 2 bytes per cell
         : 4TH  4 (SP) ;
         : 5TH  6 (SP) ;

     TOS (top of stack) and NOS (next on stack) are register names from the F21 Forth CPU, so this is consistent with Chuck Moore's work.

     I have not decided if some machine Forth instructions should use the TOS/NOS pair implicitly sometimes, so it is more like Forth, or mandate that registers be explicitly used to make it more Assembler-like. I am starting with explicit registers because that is the best fit to the 9900 architecture.

     I renamed the Assembler jump tokens to make the code look more Forth-like. (Not sure they are all correct.)

         HEX                  \ Action if TRUE
         01 CONSTANT >        \ JLT to ENDIF, *signed
         02 CONSTANT U>       \ JLE to ENDIF,
         03 CONSTANT 0<>      \ JEQ to ENDIF,
         04 CONSTANT U<       \ JHE to ENDIF,
         05 CONSTANT <=       \ JGT to ENDIF, *signed
         06 CONSTANT 0=       \ JNE to ENDIF,
         07 CONSTANT OC       \ JNC to ENDIF,
         08 CONSTANT NC       \ JOC to ENDIF,
         09 CONSTANT OO       \ JNO to ENDIF,
         0A CONSTANT U<       \ JLO to ENDIF,
         0B CONSTANT U>=      \ JH to ENDIF,
         0C CONSTANT NP       \ JOP to ENDIF,

     The concept works, at least at the simple level. Here are two programs that generate the same machine code:

         HEX
         \ Code in Forth Assembler
         ASSEMBLER
         CODE ASM1
               TOS FFFF LI,
               BEGIN,
                  TOS DEC,
               EQ UNTIL,
               NEXT,
         ENDCODE

         \ Same code in MForth Assembler.
         \ NOTE: Registers must be explicitly referenced
         MFORTH
         CODE MFORTH1
               FFFF TOS !#
               BEGIN
                  TOS 1-
               0= UNTIL
               NEXT,
         ENDCODE

     I have added the '->' operator to compile memory-to-memory MOV instructions. (I suppose I will need C-> or something like that for byte moves.) Here is a test program that works:

         \ using variables/addresses
         MFORTH
         CODE MFORTH2
               FFFF TOS !#
               BEGIN
                  TOS 1-
                  TOS X !
                  X -> Y      \ mem2mem X->Y assignment
                  Y -> Z      \ Y -> Z
               0= UNTIL
               NEXT,
         ENDCODE

     So yes, it is a new notation to learn, but it does make something of a universal Assembler that could in theory allow machine code to be generated on other machines quite easily.

     Here is the code generated by MFORTH2:

         DADE  0204  li   R4,>ffff
         DAE2  0604  dec  R4
         DAE4  C804  mov  R4,@>da8e
         DAE8  C820  mov  @>da8e,@>da98
         DAEE  C820  mov  @>da98,@>daa2
         DAF4  16F6  jne  >dae2
         DAF6  045A  b    *R10
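For anyone who wants to check the loop branch in that listing by hand, here is a small Python sketch (not part of the original post). It assumes the usual TMS9900 jump format: the low byte of the instruction word is a signed displacement counted in 16-bit words from the address of the next instruction. The function name `jump_target` is made up for illustration.

```python
def jump_target(address: int, word: int) -> int:
    """Return the destination of a TMS9900 jump instruction.

    The low byte of the instruction word is treated as a signed
    displacement in 16-bit words, relative to the next instruction.
    """
    disp = word & 0xFF
    if disp >= 0x80:              # sign-extend the 8-bit displacement
        disp -= 0x100
    return (address + 2 + 2 * disp) & 0xFFFF


# The "jne" at >DAF4 in the listing is encoded as >16F6.
target = jump_target(0xDAF4, 0x16F6)
print(f">{target:04X}")           # prints ">DAE2", the "dec R4" line
```

The result lands on the `dec R4` at the top of the BEGIN ... UNTIL loop, which is what the disassembler shows.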
  11. OK. This makes me wonder if the compiled code would be smaller (and therefore faster) if you just summed the contents of the CLS() array in a FOR/NEXT loop. That would be like an AND on each value, but I don't know if the compiler will make faster code than that inline statement. The loop overhead would not be bad in machine code. Might be worth a try. You might have to change the initial value in the array. (?)
  12. Just curious. Is most of the Geneve code written in Assembler?
  13. When I look at all those logical conditions I find myself wondering if one of them is critical. Maybe there are two that are show-stoppers. If so, those could be part of an IF statement so that you shortcut testing everything if you don't need to. Using logical flags with AND and OR can save time for two conditions vs IF under the right conditions, but I think you are forcing the machine to compute all of the logical operations, even if you fail the first one. That could be a bottleneck maybe. (?) But maybe it has to be that way for your program. Just a thought. The compiler makes fast code, but it's not a very speedy processor under the hood.
  14. To CREATE/DOES> or not to CREATE/DOES>, that might be the question

     After embedding myself in Machine Forth I started wondering about an idea I had some time back, which was: can I replace the 9900 Assembler mnemonics with equivalent Forth names and make a "machine Forth" Assembler? The answer so far is "I think so". (Whether anybody cares is another matter.)

     However, in the course of looking at the existing Assembler that we inherited from TI-Forth I see some needless complexity. It might just be young engineers saying "Look how cool my code is!" The CREATE DOES> or <BUILDS DOES> idea is really neat, but it adds run-time overhead and takes up extra space in the system, so its use should be reserved for those times when it really solves the problem.

     Compare these two ways to make the simpler instructions for the TMS9900.

     Original:

         : 0OP CREATE , DOES> @ , ;
         0340 0OP IDLE,
         0360 0OP RSET,
         03C0 0OP CKOF,
         03A0 0OP CKON,
         03E0 0OP LREX,
         0380 0OP RTWP,

         : ROP CREATE , DOES> @ + , ;
         02C0 ROP STST,
         02A0 ROP STWP,

         : IOP CREATE , DOES> @ , , ;
         02E0 IOP LWPI,
         0300 IOP LIMI,

     Versus:

         HEX
         \ : IDLE, ( -- ) 0340 , ;   \ "Should not be used on the HOME Computer"
         \ : RSET, ( -- ) 0360 , ;
         \ : CKOF, ( -- ) 03C0 , ;
         \ : CKON, ( -- ) 03A0 , ;
         \ : LREX, ( -- ) 03E0 , ;
         : RTWP ( -- )      0380 , ;
         : STST ( reg -- )  02C0 + , ;
         : STWP ( reg -- )  02A0 + , ;
         : LWPI ( addr -- ) 02E0 , , ;
         : LIMI ( n -- )    0300 , , ;

     Which one is easier to understand? Which one compiles to fewer bytes? Which one will assemble code faster?

     Edit: Corrected the CKOF instruction mistake. IDLE, RSET, CKON and CKOF "...should not be used on the Home Computer..." (E/A Manual), so they will be removed from the Camel99 Assembler to save space.
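The trade-off between the two styles can be sketched outside Forth. The Python below is only an analogy, not the Camel99 code: `rop` plays the role of a CREATE ... DOES> defining word (it stores the opcode and returns a closure that fetches it at run time), while `stwp_plain` writes the opcode inline like the colon definitions. All names here are invented for illustration; only the opcode >02A0 for STWP comes from the post.

```python
code = []  # stand-in for the target memory image, one 16-bit word per entry


def comma(word):
    """Append one 16-bit word, like Forth's `,`."""
    code.append(word & 0xFFFF)


# Style 1: a defining word. The factory stores the opcode and returns a
# closure that adds the register number to it when called.
def rop(opcode):
    def emit(reg):
        comma(opcode + reg)
    return emit


stwp_defined = rop(0x02A0)


# Style 2: a plain definition with the opcode written inline.
def stwp_plain(reg):
    comma(0x02A0 + reg)


stwp_defined(5)
stwp_plain(5)
print([f"{w:04X}" for w in code])   # prints ['02A5', '02A5']
```

Both emit the same word. The difference is in the extra machinery behind the defining-word version, which is the overhead the post is questioning.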
  15. So you want convenient syntax AND optimization too? Just kidding. I think it can be hard to do both in a single pass. (?) I have newfound respect for code-generating programs after my recent research project.
  16. My pop/push optimizer problem seems to have been my logic on when to invoke it. It seems to work reliably now. In this little program it was used 8 times, which saved 48 bytes!

     To be clear, "Forth" is not in the program. It's just native code glued together by Forth. So here is a little video of how it works. There is still lots of work to do to make it something someone else could use, but I have always wanted to know more about Forth generating native code, so this is a bit of a personal victory. It pales in comparison to XB256, but it is a compiler that can generate fast code, so it could be "library enabled".

     Here is the entire benchmark program, using some tricks so that it runs as fast as I can make it go. The video shows it built and run. You could run it from within Forth and return to Forth, but I wanted to show the EA5 creation function. I gotta go make pizza. Happy weekend.

     MachineForthTest.mp4
  17. It turns out it is hard to make a compiler that fits in 18.3K beat GCC performance. I thought I would try Tursi's Sprite benchmark with this new compiler. GCC did this benchmark in 5 seconds.

     This version in generic Forth ran in 27 seconds:

         DECIMAL
         ( more direct translation of Tursi ASM code to Forth)
         : TURSI.OPT
            100 0 DO
               239 0 DO  I $301 VC!  LOOP
               175 0 DO  I $300 VC!  LOOP
               0 239 DO  I $301 VC!  -1 +LOOP
               0 175 DO  I $300 VC!  -1 +LOOP
            LOOP ;

     This version uses the Camel99 inline optimizer and ran in 20 seconds:

         ( optimize inner loop code)
         : TURSI.INLINE
            100 0 DO
               INLINE[ 239 0 ] DO  INLINE[ I $301 VC! ]  LOOP
               INLINE[ 175 0 ] DO  INLINE[ I $300 VC! ]  LOOP
               INLINE[ 0 239 ] DO  INLINE[ I $301 VC! -1 ]  +LOOP
               INLINE[ 0 175 ] DO  INLINE[ I $300 VC! -1 ]  +LOOP
            LOOP ;

     This version in Machine Forth ran in 15 seconds. It uses the A register on two loops because the FOR NEXT loop as envisioned by Chuck Moore is a down-counter.

         100 #
         BEGIN
         \ using register A for up counting
            0 #A!  239 # FOR  $301 VDPA!  [email protected] VDPWD #C!  A1+!  NEXT
            0 #A!  175 # FOR  $300 VDPA!  [email protected] VDPWD #C!  A1+!  NEXT
         \ for/next index is a down-counter
            239 # FOR  $301 VDPA!  [email protected] VDPWD #C!  NEXT
            175 # FOR  $300 VDPA!  [email protected] VDPWD #C!  NEXT
            1-
         -UNTIL

     When I made a macro for VDPA! (VDP address store) it ran in 10 seconds. That was the best I could do so far.

     I have also added some incrementors/decrementors for the A register because they are native instructions on the 9900: A1+! A1-! A2+! A2-!

     My push/pop optimizer failed in this test as well, so more sleuthing is required.

     Edit: Got the optimizer working on [email protected] and [email protected]. That got it down to 8 seconds with VDPA! as a macro.
  18. Thanks Tursi. I will give it a whirl.
  19. Well, I have had trouble following it myself. Some of the advanced stuff started to fall into place in the last week or so. Here is a summary:

     Normal Forth has a bunch of Assembly language words that do stuff ( DUP SWAP OVER + - * / etc). These things are always "called", so there is some overhead to make everything go, but they only take 2 bytes in your program every time you use a word.

     Machine Forth does the opposite. It uses these same short pieces of Assembly code, but instead of calling them, it copies them into RAM one after another. No calling unless you want that.

     The magic is that the Forth colon definition lets you record Forth Assembler code as a Forth word. When you run that word it will run the Assembler code, which writes the code into memory. This would be called a macro in a modern "macro-assembler" language. So when I want machine Forth to do addition I make this:

         : + ( n n -- n) *SP+ TOS ADD, ;

     It does not RUN the code when you type + in your machine Forth program. When you use + in a machine Forth program it is like you typed in the assembly language, so the code gets written into RAM. In this case it goes into a separate memory block, not part of Camel Forth.

     Make a bit more sense? The rest is the details of getting the @#$!# thing to make an actual EA5 program image.
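The call-versus-copy difference can be shown with a toy model. The Python below is a sketch under stated assumptions, not how Camel99 is built: the byte strings stand in for the machine code of two primitives (hand-encoded for illustration, not taken from the post), and the addresses in `ADDRESSES` are hypothetical.

```python
# Placeholder machine code for two primitives (hand-encoded, illustrative).
PRIMITIVES = {
    "DUP": bytes.fromhex("0646c584"),
    "+":   bytes.fromhex("a136"),
}

# Hypothetical addresses where those primitives live in a normal Forth.
ADDRESSES = {"DUP": 0xA000, "+": 0xA010}


def compile_threaded(words):
    """Normal Forth: each word costs 2 bytes, the address to call."""
    out = bytearray()
    for w in words:
        out += ADDRESSES[w].to_bytes(2, "big")
    return bytes(out)


def compile_inline(words):
    """Machine Forth style: copy each primitive's code into the image."""
    out = bytearray()
    for w in words:
        out += PRIMITIVES[w]
    return bytes(out)


program = ["DUP", "+"]
print(compile_threaded(program).hex())   # a000a010
print(compile_inline(program).hex())     # 0646c584a136
```

The threaded image is smaller but needs an interpreter to chase the addresses. The inline image is a little bigger and runs straight through, which is the trade the post describes.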
  20. Wait. Were you just making a joke?
  21. From the human perspective that makes perfect sense. You can see all the data at a glance. From the TI-99 perspective our old computer doesn't really care. It's a wild memory model, however, with a lot of different types of memory in the system. Many modern machines force a separation of code and data memory, so there's that.
  22. I may be getting the hang of this.

     I am going to quote an article from ForthWrite Magazine, from the Forth Interest Group UK, June 2000, Special Issue. I used it for reference, and the author, John Tasgal, explains this better than I could.

     -----------
     "Tail-Recursion Optimisation

     In any definition the return action of the word before a semicolon, and of the semicolon itself, can always be compiled into a single return.

         word1 ..... lastword ;

     As nothing happens between lastword returning and ';' returning, the lastword return is superfluous. A more elaborate example is the recursive call at the end of a WHILE loop. If we have a series of nested calls then the last instruction is in each case a return. At runtime this produces '; ; ; ; ;' viz. a sequence of returns. The point is that when these calls unwind all that happens is that a sequence of returns are executed, one after the other. Nothing is done between them. The only necessary return is the first one pushed onto the return stack (and so the last to be executed). Removing these superfluous returns is known as tail-recursion optimisation. Most Machine Forth compilers (and also Color Forth) contain a 'tail-recursion optimiser'."
     -----------

     Machine Forth has a special semi-colon for this purpose called -; and, like most things Forth, it is up to you to use it where you want to. This would be whenever the last word in a definition is a COLON definition, i.e. a sub-routine. It won't work if the last item is a constant, a variable or an inline primitive word, for example.

     Here is how I implemented -; and it seems to work. (H: and ;H are aliases for Camel99's (the Host) colon/semi-colon so I can keep my head on straight.)

         \ tail call removal semi-colon
         H: -;  ( -- )
             LOOKBACK ( addr ) >R   \ fetch & save sub-routine address
             -8 TALLOT              \ remove the call sequence (go back 8 bytes)
             R> @@ B,               \ compile a branch to the sub-routine
         ;H

     Here is the test program that showed it working. It saves 32 bytes using tail-call optimization, which is a welcome bonus, and on the TI-99 that's 16 instructions of speed improvement too!

         \ tail-call optimization test program Sept 8 2021 Fox
         COMPILER
            NEW.
            HEX 2000 ORIGIN.
            OPT-ON

         TARGET   \ code for the target binary program
         HEX 8C02 EQU VDPWA   \ Write Address port
         HEX 8C00 EQU VDPWD   \ Write Data port

         CREATE TXT  S" Hello World!" S,

         : HI
             0 # VDPWA #C!     \ character store VDP address LSB
             40 # VDPWA #C!    \ character store VDP address MSB + "write" bit
             TXT [email protected]
             BEGIN
                *AREG+ VDPWD @@ MOVB,
                1-
             -UNTIL
             DROP ;

         : LEVEL4  HI -;
         : LEVEL3  LEVEL4 -;
         : LEVEL2  LEVEL3 -;
         : LEVEL1  LEVEL2 -;

         HEX
         PROG: MAIN
         \ setup Forth machine
              0 LIMI,
              3F00 WORKSPACE
              3D00 RSTACK
              3E00 DSTACK
              LEVEL1
              8300 WORKSPACE
              NEXT,
         END.
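To see what the nested LEVEL1 ... LEVEL4 test is exercising, here is a toy Python model (a sketch, not the Camel99 compiler). It assumes a made-up three-instruction machine with CALL, JUMP and RET, rewrites a trailing CALL/RET pair into a JUMP, and measures how deep the return stack gets. The instruction names and the helper functions are invented for illustration.

```python
def tail_call_optimise(body):
    """Replace a trailing CALL/RET pair with a single JUMP."""
    if len(body) >= 2 and body[-1] == ("RET",) and body[-2][0] == "CALL":
        return body[:-2] + [("JUMP", body[-2][1])]
    return body


def max_return_depth(defs, entry):
    """Run `entry` and report the deepest return stack seen."""
    stack, deepest = [], 0
    name, pc = entry, 0
    while True:
        op = defs[name][pc]
        if op[0] == "CALL":
            stack.append((name, pc + 1))      # remember where to come back
            deepest = max(deepest, len(stack))
            name, pc = op[1], 0
        elif op[0] == "JUMP":
            name, pc = op[1], 0               # no return address saved
        else:                                 # RET
            if not stack:
                return deepest
            name, pc = stack.pop()


plain = {
    "HI":     [("RET",)],
    "LEVEL4": [("CALL", "HI"), ("RET",)],
    "LEVEL3": [("CALL", "LEVEL4"), ("RET",)],
    "LEVEL2": [("CALL", "LEVEL3"), ("RET",)],
    "LEVEL1": [("CALL", "LEVEL2"), ("RET",)],
}
optimised = {name: tail_call_optimise(body) for name, body in plain.items()}

print(max_return_depth(plain, "LEVEL1"))       # 4
print(max_return_depth(optimised, "LEVEL1"))   # 0
```

With the rewrite, the four nested calls and their four returns collapse into a chain of jumps, which is the chain of superfluous returns the quoted article talks about.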
  23. If you've optimized one thing...

     Working at compile time is really interesting. If you understand how to detect a situation that you don't like, it's easy to remove it and replace it with different code. This is new to me.

     So I have a smart DUP that detects if there was a DROP in the previous instruction. This can save 6 bytes whenever two Forth primitives are connected together where the first one ends with DROP and the second one starts with DUP (to make room in R4). Here is how the optimizer looks:

         \ pop/push optimizer
         HEX
         C136 CONSTANT 'DROP'     \ machine code for DROP
         : DUP,      ( n -- n n)   TOS DPUSH, ;     \ normal dup
         : LOOKBACK  ( -- u)       THERE 2- @ ;     \ fetch previous instruction code
         : OPT-DUP,  ( n -- n ?n)                   \ SMART dup
              LOOKBACK 'DROP' =   \ look back for DROP
              IF   -2 TALLOT      \ move target dictionary back 1 cell
              ELSE DUP,
              THEN ;

     * TALLOT is like ALLOT but operates on the target memory image.

     I wanted to see how I could remove the assembly language in the Hello program print loop but continue to use the A register. I have landed on using 9900-type syntax, so the A register looks like a 9900 register in the Forth code but with extra characters that are from Forth.

     A@ fetches register A to the top of the data stack.
     A! stores the top of the data stack into the A register.
     *A@ means fetch A, indirect address, to top of data stack.
     *A@+ means fetch A, indirect with auto-incrementing.

     This is different than Chuck Moore's CPU, but in order to get the performance out of the CPU we have to use its features.

         \ A register Machine Operators for TMS9900
         : A@    ( -- n)     ?DPUSH  AREG TOS MOV, ;     \ Dpush(T) T=A
         : *A@   ( -- n)     ?DPUSH  *AREG TOS MOV, ;    \ Dpush(T) T=*A
         : *A@+  ( -- n)     ?DPUSH  *AREG+ TOS MOV, ;   \ Dpush(T) T=*A A=A+cell
         : (A)@  ( u --)     ?DPUSH  (AREG) TOS MOV, ;   \ Dpush(T) [email protected](A)
         : #A!   ( addr --)  AREG SWAP LI, ;             \ load A with literal number BF addition
         : A!    ( addr -- ) TOS AREG MOV, DROP ;        \ A! A=T Dpop(T)
         : *A!   ( addr)     TOS *AREG MOV, DROP ;       \ !A [A]=T Dpop(T)
         : *A!+  ( n --)     TOS *AREG+ MOV, DROP ;      \ !A+ [A]=T A=A+cell Dpop(T)
         : (A)!  ( n --)     TOS SWAP (AREG) MOV, DROP ;
         \ addr A-plus-store for versatility.
         : A+!   ( n -- )    TOS AREG ADD, DROP ;

     Chuck's machine did not have byte access, so he did it in his code as needed. That's not right for the 9900, so I have these byte-wise operators, again with the 9900 addressing modes.

         \ added byte operations. BFox
         : *AC@   ( -- 0c00)  ?DPUSH *AREG TOS MOVB,  TOS 8 SRL, ;
         : *AC@+  ( -- 0c00)  ?DPUSH *AREG+ TOS MOVB, TOS 8 SRL, ;
         : *AC!   ( 0c00 --)  1 (TOS) *AREG MOVB,  DROP ;
         : *AC!+  ( 0c00 --)  1 (TOS) *AREG+ MOVB, DROP ;

     A problem arises when you do this:

         *AC@+ VDPWD #C!

     As seen above, *AC@+ ends with the SRL instruction to swap the byte in TOS (i.e. R4). But the #C! operator is this:

         : #C! ( c addr --) TOS SWPB, TOS SWAP @@ MOVB, DROP ;

     So we swap the byte to one side only to swap it back to the other side. So I replaced TOS SWPB, with ?SWPB,

         \ swap byte optimizer
         : ?SWPB, ( n -- n)
              LOOKBACK 0984 =    \ look back for "SRL R4,8"
              IF   -2 TALLOT     \ remove SRL
              ELSE TOS SWPB,     \ we need SWPB
              THEN ;

     Seems to work, and the program is still pretty efficient. It wastes a move into R4 versus using the Assembly language single instruction. So the actual program looks like this with only machine Forth:

         PROG: MAIN
         \ setup Forth machine
              0 LIMI,
              3F00 WORKSPACE
              3D00 RSTACK
              3E00 DSTACK
              0 # VDPWA #C!     \ character store VDP address LSB
              40 # VDPWA #C!    \ character store VDP address MSB + "write" bit
              TXT [email protected]
              BEGIN
                 *AC@+ VDPWD #C!
                 1-
              -UNTIL
              DROP
              8300 WORKSPACE
              NEXT,
         END.

     It's not normal Forth, but I think you can write some pretty fast programs with it. The next thing to tackle is tail-call optimization.
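The look-back trick in OPT-DUP, can be modelled in a few lines of Python. This is only a sketch of the idea, not the Camel99 source: the `Target` class and its method names are invented, >C136 for DROP is the value given in the post, and the two-word encoding used for DUP is an assumption made for illustration.

```python
DROP = 0xC136              # machine code for DROP, as given in the post
DUP = [0x0646, 0xC584]     # assumed two-word encoding of DUP (illustrative)


class Target:
    """A toy target memory image made of 16-bit cells."""

    def __init__(self):
        self.cells = []

    def comma(self, word):
        self.cells.append(word)

    def lookback(self):
        """Return the previously compiled cell, or None if empty."""
        return self.cells[-1] if self.cells else None

    def drop(self):
        self.comma(DROP)

    def dup(self):
        for word in DUP:
            self.comma(word)

    def opt_dup(self):
        """Smart DUP: cancel against a DROP compiled just before it."""
        if self.lookback() == DROP:
            self.cells.pop()          # plays the role of -2 TALLOT
        else:
            self.dup()


plain, smart = Target(), Target()
plain.drop()
plain.dup()
smart.drop()
smart.opt_dup()
print(2 * len(plain.cells), 2 * len(smart.cells))   # 6 0  (bytes)
```

The 6-byte difference matches the saving quoted in the post. One caveat with this kind of single-cell look-back is that it cannot tell an instruction from data that happens to hold the same value, so it is presumably only safe where the previous cell is known to be code.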
  24. Thanks for that info. I was thinking similarly. I used a DOS Forth system in the 90s that took the .OBJ and the .MAP file and allowed linking external language code into the Forth environment. Forth traditionally didn't play nice in the sandbox with other languages in the old days. Not true today with commercial systems. I am just keeping my old brain running, so making something like this work on the 99 is a good challenge.