TheBF (posted May 15, 2019)

For a little more clarity, here is the actual code I used as my example.

    EVENT1 CLR  R12          CRU base of the TMS9901
           SBO  0            Enter timer mode
           LI   R1,>3FFF     Maximum value
           INCT R12          Address of bit 1
           LDCR R1,14        Load value
           DECT R12          There is a faster way (see below)
           SBZ  0            Exit clock mode, start decrementer

    EVENT2 CLR  R12
           SBO  0            Enter timer mode
           STCR R2,15        Read current value (plus mode bit)
           SRL  R2,1         Get rid of mode bit
           LDCR R12,15       Clear Clock register, and exit timer mode
           S    R2,R1        How many cycles were done?

My apparently unique use is to let the timer run continuously. When I want to time something, I read the timer with code like EVENT2, but I don't subtract anything; I simply store the timer value. Then some time later (less than 349 ms) I read the timer again, subtract the two readings and take the ABS value. Simple, and faster than reloading each time.

Here is the actual code in RPN assembler. Notice I had to disable interrupts.

    CODE: TMR!   ( -- )     \ load TMS9901 timer to max value 3FFF
          W 3FFF LI,        \ load scratch register W with MAXIMUM timer value
          R12 CLR,          \ CRU addr of TMS9901 = 0
          0 LIMI,
          0 SBO,            \ SET bit 0 to 1, Enter timer mode
          R12 INCT,         \ CRU Address of bit 1 = 2 , I'm not kidding
          W 0E LDCR,        \ Load 14 BITs from W into timer
          R12 DECT,         \ go back to address 0
          0 SBZ,            \ reset bit 0, Exits clock mode, starts decrementer
          2 LIMI,
          NEXT,             \ 16 bytes
          END-CODE

    CODE: TMR@   ( -- n)    \ read the TMS9901 timer
          TOS PUSH,
          R12 CLR,
          0 LIMI,
          0 SBO,            \ SET bit 0 TO 1, ie: Enter timer mode
          TOS 0F STCR,      \ READ TIMER (14 bits plus mode bit) into TOS
          TOS 1 SRL,        \ Get rid of mode bit
          0 SBZ,            \ SET bit 0 to zero, exit timer mode
          2 LIMI,
          NEXT,
          END-CODE
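The "read twice and subtract" idea can be modelled off the target in a few lines of Python. This is only a sketch under stated assumptions, not the code from the post: it assumes a 3 MHz clock divided by 64 (which matches the 349 ms figure above), and it assumes the decrementer wraps with a period equal to the reload value `>3FFF`. Instead of ABS it uses modular subtraction, which also gives a sensible answer when the counter reloads between the two readings. The names `elapsed_ticks` and `ticks_to_ms` are made up for the illustration.

```python
RELOAD = 0x3FFF            # value loaded into the TMS9901 clock register
CLOCK_HZ = 3_000_000       # assumption: 3 MHz system clock
PRESCALE = 64              # assumption: decrementer ticks once every 64 clocks


def elapsed_ticks(first: int, second: int, reload: int = RELOAD) -> int:
    """Ticks between two readings of a free-running DOWN counter.

    The counter counts down, so the earlier reading is normally the larger
    one. Taking the difference modulo the reload period also covers the case
    where the counter reloaded once between the readings (assumption: the
    period is exactly `reload` ticks; real hardware may differ by one tick).
    """
    return (first - second) % reload


def ticks_to_ms(ticks: int) -> float:
    """Convert decrementer ticks to milliseconds under the assumed clock."""
    return ticks * PRESCALE * 1000.0 / CLOCK_HZ
```

As with the ABS method, the result is only meaningful when less than one full timer period (about 349 ms) separates the two readings.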
mizapf (posted May 15, 2019)

Quoting an earlier post: BTW my source for the code was here: http://www.unige.ch/medecine/nouspikel/ti99/tms9901.htm#Timer "The decrementer can be stopped by simply writing a zero to the clock register, and leaving timer mode."

I'm afraid Thierry is wrong here. I used Tursi's program (slightly edited):

           DEF  START
    START  LIMI 0
           CLR  R6
           LI   R7,>2000
           LI   R0,>3000        * Start value
           LI   R12,2
           SBO  -1              * Enter clock mode
           LDCR R0,14           * Load clock register
    LP     SBO  -1              * Enter clock mode
           STCR R5,14           * Read register
           SBZ  -1              * Leave clock mode
           C    R5,R6           * Keep highest value
           JL   J1
           MOV  R5,R6
    J1     CLR  R0
           MOVB R0,@>8C02
           MOVB R0,@>8C02
           MOV  R5,R0
           BL   @DIG
           MOVB R7,@>8C00
           MOV  R6,R0
           BL   @DIG
           TB   27              * Mouse button (R12=2) on the Geneve
           JEQ  LP
           BLWP @0
    * PRINT A HEX VALUE FROM R0
    DIG    MOV  R0,R1
           LI   R3,4
    DIGL   SRC  R1,12
           MOV  R1,R4
           ANDI R4,>000F
           MOVB @HEX(R4),@>8C00
           DEC  R3
           JNE  DIGL
           RT
    HEX    TEXT '0123456789ABCDEF'
           END

When you set the start value to 0000, the clock is still counting down. You can try this program for yourself; just leave out the TB 27 check when you run it on a TI-99/4A (it is the left mouse button on the Geneve; when I press it, the program exits).

So I turn on the clock mode, load the register with 0, and leave clock mode, and it still counts.
TheBF (posted May 16, 2019)

Quoting mizapf: "I'm afraid Thierry is wrong here. I used Tursi's program (slightly edited): When you set the start value to 0000, the clock is still counting down. You can try this program for yourself, just leave away the TB 27 check when you run it on a TI-99/4A (it is the left mouse button on the Geneve; when I press it, the program exits). So I turn on the clock mode, load the register with 0, and leave clock mode, and it still counts."

OK, thanks. I have not actually tried loading the timer with 0 using my code. All I knew was that it worked as expected on real iron. I will write a version that lets me load the initial value interactively in Forth so I can play with it.
TheBF (posted May 16, 2019)

So I rewrote my code and made it more like Tursi's in terms of setting up the CRU address. (It took one instruction fewer.) I rebuilt Forth and put it on the old machine, and sure enough the timer keeps running even when I load it with 0, as you can see in the screen shot.

    CODE: TMR!   ( n -- )   \ load TMS9901 timer from stack
          0 LIMI,
          R12 CLR,          \ CRU addr of TMS9901 = 0
          0 SBO,            \ SET bit 0 to 1, Enter timer mode
          R12 INCT,         \ CRU Address of bit 1 = 2 , I'm not kidding
          TOS 0E LDCR,      \ Load 14 BITs from TOS into timer
          -1 SBZ,           \ reset bit 0, Exits clock mode, starts decrementer
          2 LIMI,
          TOS POP,
          NEXT,
          END-CODE

    CODE: TMR@   ( -- n)    \ read the TMS9901 timer
          TOS PUSH,
          0 LIMI,
          R12 2 LI,         \ cru = 1 (honest, 2=1)
          -1 SBO,           \ SET bit 0 TO 1, Enter timer mode
          TOS 0E STCR,      \ READ TIMER (14 bits)
          -1 SBZ,           \ RESET bit 0, exit timer mode
          2 LIMI,
          NEXT,
          END-CODE
mizapf (posted May 16, 2019)

Usually, TI's specification documents are very precise, but this one is ambiguous at best. It has led to several misunderstandings of the same kind. Saying that a clock is "enabled" or "disabled" is normally understood as running or stopped. In fact, the wording problems already start with the name "clock mode", which could make you think you have to turn on this mode to run the clock. Instead, this mode is used to read or write the clock register, while the clock itself runs in interrupt mode.

There are some more open questions that I will need to check to make sure:

- A soft reset (SBZ 15 in clock mode) resets all I/O ports to input. Does it also reset the interrupt mask? (not explicitly stated)
- If you set a port to output mode (e.g. P15), can it trigger the interrupt line with which it shares the pin (/INT7)? (not explicitly stated)

Raphael Nabet (the original author of the TI emulation in MESS) assumed that the latter is not possible, but I'll try to test it on my Geneve. The point is that I can rewrite the interrupt handler on the Geneve, as it resides in RAM. On the TI, if the interrupt source is not the VDP, the handler searches the DSRs.
TheBF (posted May 17, 2019)

Step 1 to generating native 9900 code from Forth.

I have ripped up parts of the XFCC99 Forth cross-compiler, kept other parts, and created NATIVE99, the beginning of a Forth cross-compiler that generates native code. It's still very manual. I can compile colon definitions, but they don't know how to call themselves yet. There is no Forth dictionary on the target; that is all kept in the PC, more like a C compiler would do. Most of the Forth primitives compile inline at the moment, but nothing is compiled into the binary unless you use it in a program, so the programs can be very small.

The spoiler has the first program, which displays all the dirty details (but it works as expected).

Here is the compiler summary:

    Program Summary:
    Hex    Dec     Item
    A000   40960   Load address
      90     144   Code size
    A074   41076   boot address
     116     278   Image size
       4       4   Code words
       0       0   Forth words

    \ Native99 test program 1
    TARGET-COMPILING
       START.                \ sets a timer
       NEW.                  \ init target memory to FFFF
       ABSOLUTE A000 ORIGIN.
       TI-99.EA5

    [CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
         INCLUDE CC9900\NATIVE\NCCOLON.FTH

    [TC]
    VARIABLE X
    VARIABLE Y
    VARIABLE Z

    \ nested sub-routines
    : SUB3   Y @ Z ! ;
    : SUB2   X @ Y !  SUB3 CALL, ;
    : SUB1   X 1+!    SUB2 CALL, ;

    PROGRAM: RUN             \ sets the entry address
       8300 WORKSPACE
       FF00 RSTACK
       FEA0 DSTACK
       BEGIN
          SUB1 CALL,
       AGAIN
       BYE
    END.

    [CC] FILENAME: NCPROG1
    FILENAME$ $SAVE-EA5.     \ FILENAME$ was set by FILENAME:
    // copy NCPROG1 cc9900\clssic99\dsk1\
    CR ." === COMPILE ENDED PROPERLY ==="
TheBF (posted May 17, 2019)

Step 2: Colon definitions call themselves

In Step 1 we used a sub-routine definer aliased as colon to create Forth words that had to be explicitly called. In Step 2 we define a proper <DOCOL> routine with CREATE DOES>. CREATE lets us define the compile-time activity. DOES> lets us define what happens when we run the WORD that CREATE created, if that makes sense. It's like a simple object constructor.

    \ Native code COLON COMPILER
    : <DOCOL> ( n -- )
       CROSS-ASSEMBLING
       CREATE              \ create the word in compiler dictionary
          THERE ,          \ remember my address in compiler Forth
          R11 RPUSH,       \ compile TI-99 entry code into target program
       \ runtime:
       DOES> @ @@ BL,      \ fetch my address, branch&link indirect
    ;

    \ define the cross-compiler's TI-99 colon and semi-colon
    CROSS-ASSEMBLING
    HOST: :   TFORTHWORDS [ FORTH ] 1+!   \ count the word for reporting
              <DOCOL> ;HOST
    HOST: ;   RET, ENDSUB ;HOST

The spoiler has the new sub-program that no longer needs CALL.

    \ Native99 test program 2
    TARGET-COMPILING
       START.                \ sets a timer
       NEW.                  \ init target memory to FFFF
       ABSOLUTE A000 ORIGIN.
       TI-99.EA5

    [CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
         INCLUDE CC9900\NATIVE\NCCOLON.FTH

    [TC]
    VARIABLE X
    VARIABLE Y
    VARIABLE Z

    \ nested sub-routines
    : SUB3   Y @ Z ! ;
    : SUB2   X @ Y !  SUB3 ;
    : SUB1   X 1+!    SUB2 ;

    PROGRAM: RUN             \ sets the entry address
       8300 WORKSPACE
       FF00 RSTACK
       FEA0 DSTACK
       BEGIN
          SUB1
       AGAIN
       BYE
    END.
TheBF (posted May 18, 2019; edited May 18, 2019)

Native Code Compiler (NCC) Preliminary Performance Comparison

It is clear to me now that I will need to make a much more sophisticated compiler to come close to Assembler performance. Here is a comparison of three Forth compilers and the equivalent written in Forth assembler.

    : TEST   FFFF BEGIN 1- DUP 0= UNTIL BYE ;

Forth Assembler that does the same thing:

    TOS PUSH,
    TOS 0FFFF LI,
    BEGIN,
       TOS DEC,
    EQ UNTIL,
    BYE

    ITC   10.9 sec
    DTC    8.6 sec
    NCC    4.4 sec  (all inline primitives)
    ASM   < 1 sec

Some obvious points to optimize:

- 1- DUP should be smarter, removing the TOS POP from 1- so the DUP is not needed. This would become TOS DEC,
- 0= UNTIL should be optimized to a JNE @BEGIN

This is called "peephole" optimization in Forth compilers, so that is where the focus has to be to bring this closer to ASM code.

Edit: I should add that this is the worst-case example. In real programs ITC Forth tends to be 3 to 4 times slower than optimized compilers, as seen in the Benchmarking languages thread.
Lee Stewart (posted May 18, 2019)

I cannot speak to the NCC for fbForth, but, substituting DROP for BYE so I do not need to reload everything while testing, TEST becomes in fbForth (12.7 seconds)

    HEX
    : TEST   FFFF BEGIN 1- DUP 0= UNTIL DROP ;

which is ~1.5 seconds slower than (11.2 seconds)

    HEX
    : TEST2   FFFF DUP BEGIN WHILE 1- DUP REPEAT DROP ;

which, not needing the stack in fbForth ALC, becomes (~1 second)

    HEX
    ASM: TEST3
       R0 FFFF LI,
       BEGIN,
          R0 DEC,
       NE WHILE,
       REPEAT,
    ;ASM

Using BEGIN, ... UNTIL, (as in your code) saves one ALC instruction and is marginally faster (10%? by stopwatch timing):

    HEX
    ASM: TEST4
       R0 FFFF LI,
       BEGIN,
          R0 DEC,
       EQ UNTIL,
    ;ASM

...lee
TheBF (posted May 18, 2019)

Thanks Lee, that confirms my results. Yes, now I remember, you have the BYE that writes to low RAM. Mine is just 2 instructions.

The problem with the NCC concept at the moment is that computing 0= as a primitive takes a TOS TOS MOV, and then sets or resets the TOS register appropriately. That is followed by UNTIL, which is another TOS TOS MOV, and then drops the TOS and jumps back or jumps forward. So it is a pile of instructions on the Forth VM. I am going to see if I can make something smarter for 0=, as is done in the Forth Assembler, and potentially use that idea for branching as well. It's a challenge to do it really efficiently without leveraging the machine's native branching, but Forth uses the TOS as the status flag...

It might be better to just create a bunch of machine macros like @, !, etc. and use ALC. :-)

Without the Forth headers in the code, however, you save about 25% of the space, so that leaves room for code bloat due to the VM concept.

I am going to take a page from TI-Forth and put the return macro in a register, so RET, will be as below. This will save me 2 bytes per colon definition. Sub-routines (colon defs.) are just called with BL because each colon definition begins by pushing R11 onto the rstack.

    CODE RET,   *R10 B,     \ R10 will contain this code located in scratchpad RAM
                *RP+ R11 MOV,
                *R11 B,

Ultimately, if I can get it into reasonable shape, it would be fun to generate a working Forth kernel that is all native code. It will be bigger than 8K, I am sure, but maybe only 50% bigger if I make it smart enough. In the shorter term I want to make it complete enough to compile some real program and see how it all works and how big it is. I should be able to do the Sieve benchmark in a little while. (He said over-optimistically.)

Anyway, it's a good education. Thanks for following the progress. How's that knee?
TheBF (posted May 18, 2019)

Re: Native Code Compiler (NCC) Preliminary Performance Comparison

    SP DECT,  R4 *SP MOV,    ( PUSH, macro)
    R4 0FFFF LI,
    BEGIN,
       R4 DEC,
    EQ UNTIL,
    BYE

BTW, for clarity, my assembler code is actually this, because TOS is just an alias for R4 in my Forth assembler and PUSH, is a 2-instruction macro. So our ALC tests are functionally the same. When I first started this project I was so confused that I used every trick I could to simplify the code.
Lee Stewart (posted May 18, 2019)

Quoting TheBF: ". . . Anyway its a good education. Thanks for following the progress. How's that knee?"

The knee (1 year post-surgery) is doing well, as is the other one (2 years post-surgery)! Thanks for asking.

...lee
TheBF (posted May 19, 2019; edited May 19, 2019)

NCC Program 3

Here is something that I didn't expect. In native code the Forth colon is just an Assembly language sub-routine. I was looking at the docs for MeCrisp Forth, which is a native code Forth for a number of processors. I was shocked to see the colon used with Assembly language inside. Guess what? It works in my crude compiler as well.

This program compiles to 112 bytes.

    \ Native99 test program 3
    CROSS-COMPILING
    \ Compiler pre-amble
       START.              \ sets a timer
       NEW.                \ init target memory to FFFF
       A000 ORIGIN.
       TI-99.EA5

    [CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
    [CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
    [CC] INCLUDE CC9900\NATIVE\NCCOLON.FTH

    [CC] HEX
    CROSS-ASSEMBLING
    FFFF CONSTANT >FFFF

    : FTEST   >FFFF BEGIN 1- DUP 0= UNTIL ;

    : ATEST
       TOS PUSH,
       TOS FFFF LI,
       BEGIN,
          TOS DEC,
       EQ UNTIL, ;

    PROGRAM: RUN           \ sets the entry address
       8300 WORKSPACE
       FF00 RSTACK
       FEA0 DSTACK
       FTEST
       ATEST
       BYE
    END.
TheBF (posted May 19, 2019; edited May 19, 2019)

NCC Program 4: Keyboard

This is becoming very interesting. I stole code from CAMEL99 Forth and created Forth KEY very quickly.

    \ Native99 test program 4  Keyboard interface
    CROSS-COMPILING
    \ Compiler pre-amble
       START.              \ sets a timer
       NEW.                \ init target memory to FFFF
       A000 ORIGIN.
       TI-99.EA5

    \ first build the compiler
    [CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
    [CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
    [cc] INCLUDE CC9900\NATIVE\NCCOLON.FTH

    CROSS-ASSEMBLING
    : KEY? ( -- char | 0)
       TOS PUSH,
       TOS CLR,              \ TOS is output
       0 LIMI,
       83E0 LWPI,            \ switch to GPL workspace
       000E @@ BL,           \ call ROM keyboard scanning routine
       8300 LWPI,            \ return to Forth's workspace , interrupts are restored
       2 LIMI,
       837C @@ R0 MOVB,      \ read GPL status byte (=2000 if key pressed)
       NE IF,
          8374 @@ TOS MOV,   \ read the key into TOS (R4)
       ENDIF,
    ;

    : KEY ( -- char)  BEGIN KEY? ?DUP UNTIL ;

    PROGRAM: RUN             \ set the entry address
       8300 WORKSPACE
       FF00 RSTACK
       FEA0 DSTACK
       KEY DROP              \ wait for key & drop it
       BYE                   \ return to boot screen
    END.

And here is the compiled code from CLASSIC99 with my comments.

EDIT: Found a bug with ?DUP

    A05C 02E0 lwpi >8300        * workspace
    A060 0207 li   R7,>ff00     * return stack pointer
    A064 0206 li   R6,>fea0     * data stack pointer
    A068 06A0 bl   @>a040       * call RUN

    A040 0647 dect R7           * enter sub-routine
    A042 C5CB mov  R11,*R7
    A044 06A0 bl   @>a014       * call "key?"
    A048 C104 mov  R4,R4        * ?DUP (inline)
    A04A 1601 jeq  >a050        *** FIXED THIS JUMP
    A04C 0646 dect R6           * DUP (inline)
    A04E C584 mov  R4,*R6       *
    A050 C104 mov  R4,R4        * UNTIL (inline)
    A052 1602 jne  >a058
    A054 C136 mov  *R6+,R4
    A056 10F6 jmp  >a044
    A058 C2F7 mov  *R7+,R11     * ';' (inline)
    A05A 045B b    *R11
Lee Stewart (posted May 19, 2019)

I do not suppose there is any easy way to remove superfluous line A050, perhaps through “peephole” optimizing?

...lee
TheBF (posted May 19, 2019)

Quoting Lee Stewart: "I do not suppose there is any easy way to remove superfluous line A050, perhaps through “peephole” optimizing? ...lee"

That's the challenge for sure. At the moment this is a very naive compiler. My strategy is to get a lot of it working before attempting to get clever. I am working at my personal "bleeding edge" as it is. :-)
TheBF (posted May 20, 2019; edited May 20, 2019)

Quoting Lee Stewart: "I do not suppose there is any easy way to remove superfluous line A050, perhaps through “peephole” optimizing? ...lee"

I just tried this and it failed, but... I removed MOV R4,R4 from the 0= code and it works. This is because the DUP is operating on the EQ flag. It also goes faster: 3.99 seconds by the armstrong method. So your idea was a good one. Thanks!

    HOST: 0= ( n -- ? )
       \ TOS TOS MOV,
       3 JEQ,
       TOS CLR,
       2 JMP,
       TOS SETO,
    ;HOST IMMEDIATE

    HOST: UNTIL ( n --)
       TOS TOS MOV,    \ test tos=0
       3 JNE,          \ if tos=0
       TOS POP,        \ drop n
       BACK JMP,       \ loop
       TOS POP,        \ elseif tos<>0, drop
    ;HOST IMMEDIATE

I just stole some stuff from the TI-FORTH assembler that we all use to create BEGIN WHILE REPEAT, and it seems to work. I removed compile-time error detection for now to keep the code easier to understand. I might have to add a DROP in here somewhere.

EDIT: Yes, IF needed a DROP (TOS POP,) and so did THEN, because you need to drop the TOS if you take the jump AND if you don't take the jump, just like UNTIL above.

HOWEVER, if you make a word called DUPWHILE, the decrementing WHILE loop executes in 1 second vs 4 seconds. Worth adding that to the peephole optimizer.

    \ Branch calculators taken from TI-FORTH Assembler
    HOST: AHEAD   ( -- addr)  THERE 2- ;HOST
    HOST: RESOLVE ( addr -- ) THERE OVER - 2- 2/ SWAP 1+ TC! ;HOST

    \ here we use parts of the Assembler directly
    HOST: IF     ( n --)     NE CJMP AHEAD TOS POP, ;HOST IMMEDIATE
    HOST: ELSE   ( -- )      0 JMP, RESOLVE ;HOST IMMEDIATE
    HOST: THEN   ( addr -- ) RESOLVE TOS POP, ;HOST IMMEDIATE
    HOST: WHILE  ( )         POSTPONE IF 2+ ;HOST IMMEDIATE
    HOST: REPEAT             >R POSTPONE AGAIN R> 2- POSTPONE THEN ;HOST IMMEDIATE
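The arithmetic inside RESOLVE (`THERE OVER - 2- 2/ SWAP 1+ TC!`) can be checked off the target with a small Python model. This is a sketch, not the cross-compiler's code: `jump_displacement` and `resolve` are hypothetical names, and the target image is modelled as a plain `bytearray`. It relies on the TMS9900 jump format, where the low byte of the instruction holds a signed displacement counted in words from the address following the jump.

```python
def jump_displacement(instr_addr: int, target_addr: int) -> int:
    """Signed word displacement for a TMS9900 jump located at instr_addr.

    Same arithmetic as RESOLVE: (target - instr - 2) / 2.
    """
    disp = (target_addr - instr_addr - 2) // 2
    if not -128 <= disp <= 127:
        raise ValueError("jump target out of range for an 8-bit displacement")
    return disp


def resolve(image: bytearray, origin: int, instr_addr: int, target_addr: int) -> None:
    """Patch the displacement byte (low byte) of a jump already in the image.

    `image` models target memory starting at address `origin`
    (both are assumptions made for this illustration).
    """
    disp = jump_displacement(instr_addr, target_addr)
    image[instr_addr - origin + 1] = disp & 0xFF
```

Fed with the addresses from the KEY listing earlier in the thread, the forward jump at A052 to A058 gets displacement 02 and the backward jump at A056 to A044 gets F6, matching the opcodes 1602 and 10F6 shown there.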
TheBF (posted May 20, 2019; edited May 20, 2019)

I am having so much fun with this that I woke up early.

Once I figured out how to do WHILE loops, it was simple to make a Chuck Moore style FOR NEXT loop. When Chuck started building Forth architecture CPUs, he found that it was always faster to just run a down counter for a loop. He stopped using DO/LOOP. He made FOR NEXT, which in its simplest form loads a number somewhere, and NEXT decrements the number and loops WHILE the number<>0. We do this all the time in Assembly Language. This version uses the return stack to hold the number; it could be made a little faster by using a spare register.

So with the WHILE structure as an example, here is all it took to add FOR/NEXT.

EDIT: The empty FOR/NEXT loop with 64K iterations runs in 1 second. Less if we put the loop counter in a register.

    HOST: FOR
       TOS RPUSH,
       TOS POP,              \ refill data stack cache register
       THERE                 \ leave compiler's current working address on PC Forth stack
    ;HOST IMMEDIATE

    HOST: NEXT
       *RP DEC,
       NE CJMP AHEAD 2+      \ same as WHILE but no need to DROP data stack
       >R BACK JMP,          \ loop not finished, jump back to THERE
       R> 2- RESOLVE         \ compute the address needed by AHEAD and put it in the code
       RP INCT,              \ drop the index from the return stack
    ;HOST IMMEDIATE

Here is how it is used (compiler pre-amble removed):

    CROSS-ASSEMBLING
    0 CONSTANT 0
    1 CONSTANT 1
    2 CONSTANT 2
    FFFF CONSTANT >FFFF

    VARIABLE X

    PROGRAM: RUN             \ set the entry address
       8300 WORKSPACE
       FFFF RSTACK
       FF00 DSTACK
       FFFF FOR  X 1+!  NEXT
       BYE
    END.
TheBF (posted May 20, 2019)

It was trivial to use R9 as the loop counter, so I just did it. R9 is the IP register in threaded Camel99 Forth, but we don't need that anymore. :-) By pushing R9 onto the return stack when we enter FOR and RPOPing it when we leave, FOR/NEXT is nestable.

A 64K loop nested 10 times takes about 9 seconds, so each 64K loop is 900 ms. Not too shabby!

    HOST: FOR
       R9 RPUSH,             \ R9 will be the loop counter
       TOS R9 MOV,
       TOS POP,
       THERE
    ;HOST IMMEDIATE

    HOST: NEXT
       R9 DEC,
       NE CJMP AHEAD 2+      \ while R9<>0
       >R BACK JMP,
       R> 2- RESOLVE
       R9 RPOP,
    ;HOST IMMEDIATE
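To see why saving R9 on the return stack makes the loop nestable, here is a small Python model of the run-time behaviour that the FOR and NEXT macros above appear to generate. It is a sketch under assumptions: `run_for_next`, the `regs` dictionary and the Python list used as a return stack are all made up for the illustration, and the body is a Python callable standing in for the compiled loop body.

```python
def run_for_next(count: int, body, rstack: list, regs: dict) -> None:
    """Model of FOR ... NEXT with the counter kept in R9.

    FOR : push the old R9 on the return stack, load R9 with the count.
    NEXT: decrement R9 (16-bit), loop while it is non-zero, then restore R9.
    """
    rstack.append(regs["R9"])            # R9 RPUSH,
    regs["R9"] = count & 0xFFFF          # TOS R9 MOV,
    while True:
        body()                           # loop body runs before the test
        regs["R9"] = (regs["R9"] - 1) & 0xFFFF   # R9 DEC,
        if regs["R9"] == 0:              # falls through when the counter hits 0
            break
    regs["R9"] = rstack.pop()            # R9 RPOP,
```

One consequence visible in the model: because the body runs before the decrement and the register is 16 bits wide, a count of 0 would wrap and run the body 65536 times, while FFFF runs it 65535 times.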
TheBF (posted May 22, 2019; edited May 22, 2019)

NATIVE99 CROSS-COMPILER is starting to work like Forth

After I got the VDP code moved over from Camel99 Forth, I put a cheap and dirty set of screen routines together, plus a string compiler word. Armed with that and the FOR/NEXT loop, I made a little demo. I have a bug in my foreground color setting which I need to find. But this shows me how much faster the native code is than threaded Forth. Wow!

Also, I have had to relearn how to code, because now I can intermix ALC and Forth. If you make a colon definition it looks like Forth, but it's really a machine code sub-routine that calls itself when invoked. (And it is nestable.) So if you use it in a bunch of Assembly language, you don't have to BL to it! It does it by itself.

Example:

    : RMODE
       W 8C02 LI,        \ VDP port address into working register
       0 LIMI,           \ enter a critical section
       R0 SWPB,          \ R0= VDP-adr we are using. Set up 1st byte to send
       R0 *W MOVB,       \ send low byte of vdp ram write address
       R0 SWPB,
       R0 *W MOVB,       \ send high byte of vdp ram write address
       2 LIMI,           \ leave the critical section
    ;

    : WMODE
       R0 4000 ORI,      \ set control bits to write mode
       RMODE             \ we can mix Forth and Assembler :-)
    ;

The spoiler shows the code for this text demo and the MP4 shows it running. I also compiled the same code (using the same simple VDP driver) on CAMEL99 Forth for a comparison.

The native99 version + libraries compiles to 950 bytes. (Because the libraries are source, I could remove some things if I really wanted a smaller final program.) The threaded Forth version compiles to 252 bytes, but there is an 8K byte compiler/interpreter underneath it.

I have not yet made the COLON compiler handle literal numbers properly, meaning it needs to compile literals into the TOS register with LI when they are encountered in a program. That's why all the constants are defined. I wanted to have as many of the core routines running properly as possible before I mess with that.

    \ Native99 test program  A VDP Text speed
    CROSS-COMPILING
    \ Compiler pre-amble
       START.              \ sets a timer
       NEW.                \ init target memory to FFFF
       A000 ORIGIN.
       TI-99.EA5

    \ first build the Native code compiler words
    [CC] INCLUDE CC9900\NATIVE\NCFORTH.FTH
    \ add inline primitives to the compiler
    [CC] INCLUDE CC9900\NATIVE\NCPRIMS.FTH
    \ create the colon and semi-colon words
    [cc] INCLUDE CC9900\NATIVE\NCCOLON.FTH
    \ Now we have forth, we can pull in some libraries
    [cc] INCLUDE CC9900\NATIVE\RUNTIME.FTH
    [CC] INCLUDE CC9900\NATIVE\LIB.NCC\KEY.FTH
    [CC] INCLUDE CC9900\NATIVE\LIB.NCC\VDP9918.FTH

    [CC] HEX
    CROSS-ASSEMBLING
       0 CONSTANT 0
       1 CONSTANT 1
       2 CONSTANT 2
       3 CONSTANT 3
       4 CONSTANT 4
       5 CONSTANT 5
       7 CONSTANT #7
      0E CONSTANT >1E
      20 CONSTANT BL
      41 CONSTANT 'A'
      28 CONSTANT #40
     3C0 CONSTANT C/SCR
    FFFF CONSTANT >FFFF

    CREATE A$ ," Forth Native Code "
    CREATE B$ ," is pretty fast... "

    VARIABLE VROW
    VARIABLE VCOL

    : VPOS   ( -- vaddr)    VROW @ #40 * VCOL @ + ;
    : EMIT   ( char --)     VPOS VC!  VCOL 1+! ;
    : TYPE   ( addr len --) FOR DUP C@ EMIT 1+ NEXT DROP ;
    : CR     ( -- )         VROW 1+!  VCOL OFF ;
    : PRINT  ( $addr -- )   COUNT TYPE ;
    : PAGE   ( -- )         VCOL OFF VROW OFF  0 C/SCR BL VFILL ;
    : DELAY                 >FFFF FOR NEXT ;
    : TIMES                 FOR DUP PRINT NEXT DROP ;

    PROGRAM: RUN
       8300 WORKSPACE      \ setup Forth VM
       FFFF RSTACK
       FF00 DSTACK
       >1E #7 VWTR
       BEGIN
          PAGE A$ #40 TIMES DELAY
          PAGE B$ #40 TIMES DELAY
          KEY?
       UNTIL
       BYE
    END.

Attachments: Native99 Forth.mp4, Threaded Forth.mp4
TheBF (posted May 24, 2019; edited May 25, 2019)

STOP THE PRESSES

I have been struggling with how to make a native code Forth compiler take advantage of the memory-to-memory architecture of the 9900. Typically, to get the value of a variable, for example, Forth will move the address from the stack to a register, then "fetch" the value from that address and put the value onto the stack. To assign a value to a variable, the address must be moved into a register and the value on the stack moved to that address.

However, the 9900 can do things like:

    X    DATA 17
    Y    DATA 0
         MOV  @X,@Y

I had played with a native code Forth compiler for DOS 30 years ago, by a guy named Tom Almy, and it was screaming fast. I found a paper by Tom from 1986 and my eyes have been opened. It turns out that if you make all data declarations put their data (addresses or literal numbers) onto a "LITERAL stack", you have access to these numbers as literal numbers or addresses at compile time, and therefore you can compile code that takes advantage of the CPU's best features.

So the example above would look like this in Forth. (I have created a new operator called := as in the Pascal 'assignment' operator for the example.)

    VARIABLE X
    VARIABLE Y
    X Y :=

The VARIABLE declarations do the same as a label and DATA directive in Assembler. Nothing more. When we invoke the variables in our code, they don't compile any code either. They just push their addresses onto the literal stack (LSTACK). When we invoke the := operator, it grabs those addresses from the LSTACK and compiles them in the optimal way to move data from X into Y. In the case of the 9900 we use a memory-to-memory move. The Forth data stack is not required!

How cool is that?

The resultant Assembly code looks a little odd (OK, a lot), but it is much more efficient than all the stack juggling. I have a lot more testing to do, but this method has made my day.

Edit: Replaced experimental code with final code

    HOST: !    ( n -- )    ( l: addr -- )   TOS POPARG @@ MOV,  TOS POP, ;HOST
    HOST: @    ( -- n)     ( l: addr -- )   ?TOSPUSH,  POPARG @@ TOS MOV, ;HOST
    HOST: C@   ( -- char)  ( l:addr --)     ?TOSPUSH,  POPARG @@ TOS MOVB,  TOS 8 SRL, ;HOST
    HOST: +!   ( l:n addr --)               TOS POPARG @@ ADD,  TOS POP, ;HOST
    HOST: 1+!  ( l: addr -- )               POPARG @@ INC, ;HOST
    HOST: 2+!  ( l: addr -- )               POPARG @@ INCT, ;HOST
    HOST: 1-!  ( l: addr -- )               POPARG @@ DEC, ;HOST
    HOST: 2-!  ( l: addr -- )               POPARG @@ DECT, ;HOST
    HOST: ON   ( L:adr -- )                 POPARG @@ SETO, ;HOST
    HOST: OFF  ( L:adr -- )                 POPARG @@ CLR, ;HOST
    HOST: :=   ( L: src dst -- )            POPARG @@ POPARG @@ 2SWAP MOV, ;HOST
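The literal-stack idea can be tried out in a few lines of Python. This is a toy model under stated assumptions, not the NATIVE99 implementation: `ToyCompiler` and its method names are hypothetical, variables are assumed to occupy one 2-byte cell each, and the "compiled code" is just a list of assembly text lines. Variable references emit no code and only push an address on the compile-time stack; the operators pop those addresses and emit a single memory-addressed instruction.

```python
class ToyCompiler:
    """Toy model of a compile-time literal stack (all names are illustrative)."""

    def __init__(self, origin: int) -> None:
        self.there = origin      # next free target address
        self.vars = {}           # variable name -> target address
        self.lstack = []         # compile-time literal stack
        self.code = []           # emitted assembly text

    def variable(self, name: str) -> None:
        """VARIABLE: reserve one cell, like a label plus DATA directive."""
        self.vars[name] = self.there
        self.there += 2

    def ref(self, name: str) -> None:
        """Mentioning a variable compiles nothing; it pushes its address."""
        self.lstack.append(self.vars[name])

    def unary(self, mnemonic: str) -> None:
        """Words like 1+! or OFF: one instruction on a symbolic address."""
        addr = self.lstack.pop()
        self.code.append(f"{mnemonic} @>{addr:04X}")

    def assign(self) -> None:
        """The := operator: memory-to-memory move, no data stack involved."""
        dst = self.lstack.pop()
        src = self.lstack.pop()
        self.code.append(f"MOV @>{src:04X},@>{dst:04X}")
```

With `X` at A004 and `Y` at A006, the sequence `ref("X"); ref("Y"); assign()` emits the single line `MOV @>A004,@>A006`, and `ref("X"); unary("INC")` emits `INC @>A004`.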
TheBF (posted May 25, 2019)

After a lot of fiddling, I think I have it working. One bug left with initializing the Literal stack...

Here is the Forth program. (I have added a report control, and PROGRAM: now sets the TI-99 filename in the program image.)

    CROSS-ASSEMBLING
       START.              \ sets a timer
       NEW.                \ init target memory to FFFF
       A000 ORIGIN.
       TI-99.EA5
       REPORT ON

    VARIABLE X

    PROGRAM: NCPROG1       \ we are interpreting now
       8300 WORKSPACE      \ make the Forth VM
       FF00 RSTACK
       FFA0 DSTACK
       X 1+!
       X 1-!
       X ON
       X OFF
       BYE
    END.

Here is the output code:

      A006 02E0 lwpi >8300    (18)  8300
      A00A 0207 li   R7,>ff00 (20)  FF00
      A00E 0206 li   R6,>ffa0 (20)  FFA0
    > A012 05A0 inc  @>a004         A004
      A016 0620 dec  @>a004         A004
      A01A 0720 seto @>a004         A004
      A01E 04E0 clr  @>a004         A004
      A022 0300 limi >0000          0000
      A026 0420 blwp @>0000
TheBF (posted May 25, 2019; edited May 26, 2019)

PUSH/POP Optimization

Tom Almy's paper mentioned looking backwards in the compiled program to find optimizations. The 9900 is kind of "verbose" when it comes to generating stack code, since we have to make the stacks with normal registers. Since I cache the top of stack (TOS) in a register for code efficiency, it can happen that at the end of a routine, like a store to a variable, I have to "refill" the TOS register (POP it). That is one instruction. If the next thing that happens is that I need to load the TOS register with a new value, like a literal number for example, then I immediately have to PUSH the TOS back onto the stack. That is two instructions! So that is 6 bytes and three needless instructions.

I use a MACRO called PUSH, because so many routines have a line "TOS PUSH," in them. I added this code to the compiler and replaced all the TOS PUSH lines with a smart ?TOSPUSH and voila! Push/pop optimization appears when you add OPTIMIZE ON to the program.

Edit: Replaced with final code

    VARIABLE OPTIMIZE    \ control switch variable

    \ =======================================================================
    \ ?TOSPUSH gives us simple push/pop optimization for the TOS register R4
    \ It saves 3 instructions when it detects the condition so very valuable
    HEX
    C136 CONSTANT TOSPOP   \ machine code for *SP+ R4 MOV,

    \ look back in the compiled code 1 cell and get the data
    : PREVINSTR ( -- n)  THERE CELL- T@ ;

    : ?TOSPUSH,
       OPTIMIZE @
       IF
          PREVINSTR TOSPOP =
          IF    1 CELLS NEGATE TALLOT   \ erase the TOS POP, ie: de-allocate it
          ELSE  TOS PUSH,               \ too bad. We gotta do it
          THEN
       ELSE
          TOS PUSH,                     \ normal operation
       THEN ;
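The look-back trick in ?TOSPUSH, can be modelled in Python to check the logic away from the cross-compiler. This is a sketch with assumptions: the target image is a plain list of 16-bit words, `tos_push` is a hypothetical stand-in for ?TOSPUSH, and the two PUSH opcodes (0646, C584) are taken from the DUP sequence in the KEY disassembly earlier in the thread; C136 is the POP opcode quoted in the post.

```python
TOS_POP = 0xC136                  # mov *R6+,R4   (TOS POP,)
TOS_PUSH = (0x0646, 0xC584)       # dect R6 / mov R4,*R6   (TOS PUSH,)


def tos_push(code: list, optimize: bool = True) -> None:
    """Emit a TOS push, or cancel it against an immediately preceding pop.

    A pop followed by a push leaves the stack memory and pointer unchanged,
    so when the next instruction overwrites TOS anyway, both can be dropped.
    """
    if optimize and code and code[-1] == TOS_POP:
        code.pop()                # erase the POP instead of adding a PUSH
    else:
        code.extend(TOS_PUSH)     # normal case: compile the two-word PUSH
```

One caveat with any single-word look-back of this kind: the previous cell might be an operand (for example an address or literal equal to C136) rather than an opcode, so a more careful version would track the start of the last emitted instruction.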
TheBF (posted May 27, 2019; edited May 28, 2019)

My Brain is on Fire

Making a native code compiler is hard but fun. After taking the idea of a "literal stack" from Thomas Almy's paper * "Compiling Forth for Performance", I built up a handy dandy set of routines to manage this new thing. It has a word LDEPTH that lets me know how "deep" the literal stack is, in other words how many parameters are sitting on it at the moment. Tom was very light on how exactly to use this stack, but so far I can know about my input parameters at compile time (while the program is being read). This adds some surprising features and also some complexity.

I have been playing with the Forth word store (!). It needs a value and an address to put it in. Typically these two arguments are sitting on the Forth data stack for store to grab, and it puts the value into the address. I may be overcomplicating things, but I think I need to handle 3 possible cases now with this "literal stack" while compiling:

- There is nothing on the literal stack because some other code left my arguments on the data stack, ie: normal Forth.
- There is a value on the data stack, but some code has put the address on the literal stack.
- Both arguments are on the literal stack.

So with that assumption, here is what it takes to compile the Forth ! operator.

Edit: Updated code. I quickly realized there could be other stuff on the literal stack from unresolved operations, so now if there is more than 1 item, we go to the default method 3.

    HOST: ! ( n -- ) ( l: n addr -- )
       LDEPTH
       CASE
          \ ** Here we have no args at compile time. It's like regular Forth
          0 OF  *SP+ *TOS MOV,
                TOS POP,             \ refill TOS
          ENDOF

          \ ** Here we know the address at compile time, value is on Forth stack
          1 OF  TOS POPARG @@ MOV,
                TOS POP,             \ refill TOS
          ENDOF

          \ DEFAULT: ** here we know both arguments at compile time.
          \ No need to bother the stack at all
          POPARG >R                  \ address -> Rstack
          R0 POPARG LI,              \ load value into R0
          R0 R> @@ MOV,              \ store R0 at address
       ENDCASE
    ;HOST

The very cool thing is in case #3. Here we know everything at compile time and therefore do not need to play with the stack. I used R0 to do the job. I suspect there will be other cases like this. Tom mentions that array address calculations can be done at compile time, for example.

This means the compiler will use the CASE statement heavily, slowing down the compilation speed, but at the moment it's happening in DOS box on the PC.

* Journal of Forth Application and Research, Volume 4, Number 3
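The three-way dispatch on literal-stack depth can be sketched in Python to make the cases concrete. This is an illustration under assumptions, not the compiler's code: `compile_store` is a hypothetical name, the literal stack is a Python list, the output is assembly text, and the register roles (R6 as data stack pointer, R4 as TOS, R0 as scratch) are taken from the listings earlier in the thread.

```python
def compile_store(lstack: list, code: list) -> None:
    """Emit code for Forth ! depending on what is known at compile time."""
    depth = len(lstack)
    if depth == 0:
        # Nothing known: address is in TOS, value is next on the data stack.
        code.append("MOV *R6+,*R4")
        code.append("MOV *R6+,R4")           # refill TOS
    elif depth == 1:
        # Address known: value is in TOS.
        addr = lstack.pop()
        code.append(f"MOV R4,@>{addr:04X}")
        code.append("MOV *R6+,R4")           # refill TOS
    else:
        # Both known (default case): the data stack is not touched at all.
        addr = lstack.pop()
        value = lstack.pop()
        code.append(f"LI R0,>{value:04X}")
        code.append(f"MOV R0,@>{addr:04X}")
```

As in the post, any depth greater than 1 falls through to the last case, which consumes only the top two literal-stack items and leaves the rest for later words.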