TheBF Posted November 16, 2022

I have been noodling on how best to add conventional comparison operations to native code Forth, efficiently, for way too many days now. (I was thinking that dementia was starting to set in...) The mismatch is between the CPU's status register and the top-of-stack flags that Forth uses.

In frustration I cheated and took a look at Mecrisp Forth for the TI MSP430. There I found that even Mathias has to use extra instructions. The MSP430 has a SUBC instruction which he uses a fair bit, but we don't have that. I did see that he defined the masks for the status bits. He also SETs the bits in the status register. We can't do that either.

I think I finally have a use for the STST instruction. Here is where I am going for these comparisons. These two seem to work in my test environment. (I am using ASMFORTH for testing because it is interactive, and I have added conventional IF and UNTIL that drop their argument.) I can't think of a way to make these smaller/faster.

    \ status register masks
    HEX
    8000 CONSTANT L>
    4000 CONSTANT A>
    2000 CONSTANT EQ>
    1000 CONSTANT C>

    : 0= ( n -- ?)
        TOS 0 CI,
        TOS STST,
        TOS EQ> ANDI,    \ keep only the EQ> flag
    ;

    : = ( n n -- ?)
        *SP+ TOS CMP,    \ sets the EQ> flag if equal
        TOS STST,
        TOS EQ> ANDI,    \ keep only the EQ> flag
    ;

Where the test code is:

    CODE MFTEST2B ( 3.8 seconds)
        FFFF #
        BEGIN
            1- DUP 0=
        UNTIL
        DROP
    ;CODE

Where UNTIL is defined as:

    : UNTIL
        TOS DEC,
        TOS POP,                 \ refill TOS register
        CS> HERE 0 JNC, <BACK    \ CS> pops address of BEGIN from control stack
                                 \ <BACK computes difference and modifies the JNC instruction
    ;

Way too long but I think I can work with this. Open to comments from the Assembly language experts. Here is the output code from the loop.
    DE14 0646  dect R6
    DE16 C584  mov  R4,*R6
    DE18 0204  li   R4,>ffff     \ push FFFF onto the stack
                                 \ BEGIN
    DE1C 0604  dec  R4           \ 1-
    DE1E 0646  dect R6
    DE20 C584  mov  R4,*R6
    DE22 0284  ci   R4,>0000     \ 0=
    DE26 02C4  stst R4
    DE28 0244  andi R4,>2000
    DE2C 0604  dec  R4           \ UNTIL
    DE2E C136  mov  *R6+,R4      \ drop
    DE30 17F5  jnc  >de1c        \ jump back
    DE32 C136  mov  *R6+,R4      \ drop
    DE34 045A  b    *R10         \ return to Forth
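The flag handling in the listing above can be modelled in a few lines of Python. This is only a sketch: it models the EQ bit alone, and it assumes that DEC behaves like adding >FFFF, so that carry is set unless the operand was zero. The function names are made up for illustration.

```python
EQ_MASK = 0x2000  # EQ> mask from the post (andi R4,>2000 in the listing)


def compare_to_status(a, b):
    """Status word after CI/CMP, with only the EQ bit modelled."""
    return EQ_MASK if (a & 0xFFFF) == (b & 0xFFFF) else 0


def zero_equals(tos):
    """0= : CI TOS,0 then STST TOS then ANDI TOS,>2000."""
    status = compare_to_status(tos, 0)   # CI sets the status bits
    return status & EQ_MASK              # STST copies them, ANDI isolates EQ


def dec_with_carry(value):
    """DEC modelled as value + >FFFF (assumption): carry unless value was 0."""
    total = (value & 0xFFFF) + 0xFFFF
    return total & 0xFFFF, total > 0xFFFF


def until_jumps_back(flag):
    """UNTIL: DEC the flag, then JNC back to BEGIN while carry is clear."""
    _, carry = dec_with_carry(flag)
    return not carry


def count_down_loop(start):
    """Model of: start BEGIN 1- DUP 0= UNTIL DROP. Returns the pass count."""
    n, passes = start & 0xFFFF, 0
    while True:
        n = (n - 1) & 0xFFFF
        passes += 1
        if not until_jumps_back(zero_equals(n)):
            return passes
```

Under these assumptions a false flag (0) decrements to >FFFF with no carry, so JNC loops, while a true flag (>2000) decrements with carry and falls through.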
Lee Stewart Posted November 16, 2022

2 hours ago, TheBF said:
> He also SETs the bits in the status register. We can't do that either.

Actually, you can, though what you might need to do is likely slower than your solution. You can BLWP to a small routine in scratchpad RAM with a new workspace also in scratchpad RAM, passing a byte in R0 that contains the status bits to change. Change those bits in R15 and return with RTWP. R15 will be copied to the status register upon return.

With judicious use of the workspace, you could make it a pretty small routine. By "judicious use of the workspace" I mean that for one register, use R12; for 2, R11 and R12, etc. That way you will only use the end of the workspace and can overlay working code with the unused registers without fear. Obviously, 3 registers (R13, R14, R15) are used by BLWP/RTWP, so you must work outside of those, but you get the picture.

...lee
apersson850 Posted November 17, 2022

Another way is to store the address of the currently used workspace in R13, the desired content of the status register in R15 and the address after the next instruction in R14, then execute RTWP. This will be an "inline" return, since the workspace will not change, the branch will be to the instruction you would execute anyway and the status register will change.

In a standard assembly program it works fine. It may be a bit tougher to include in Forth. For efficiency you let R13 be what it is the whole time and only change R14 if you really have to execute this in more than one place. That's the price to pay for this way of creating an LST instruction. The LWP instruction can be created in the same way. Both LWP and LST are present in the TMS 9995, so it's obvious that at TI, they also realized that these instructions could be handy to have.
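The "inline RTWP" idea can be illustrated with a small Python model. It is a sketch under simple assumptions: memory is a dict of word addresses, the workspace registers live at WP + 2*n, and the `Cpu` class and `load_status` helper are hypothetical names, not anything from the thread.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    wp: int   # workspace pointer
    pc: int   # program counter
    st: int   # status register


def rtwp(cpu, memory):
    """RTWP: reload WP, PC and ST from R13, R14, R15 of the current workspace."""
    base = cpu.wp
    r13, r14, r15 = (memory[base + 2 * n] for n in (13, 14, 15))
    cpu.wp, cpu.pc, cpu.st = r13, r14, r15


def load_status(cpu, memory, new_status, next_pc):
    """Emulated LST: keep the workspace, continue at next_pc, replace ST."""
    memory[cpu.wp + 26] = cpu.wp       # R13 = current workspace
    memory[cpu.wp + 28] = next_pc      # R14 = address after the RTWP
    memory[cpu.wp + 30] = new_status   # R15 = desired status word
    rtwp(cpu, memory)
```

Note that the whole status word is replaced in this model, so on real hardware the interrupt mask bits in R15 would need to be set to sensible values as well.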
TheBF Posted November 17, 2022

2 hours ago, apersson850 said:
> Another way is to store the address of the currently used workspace in R13, the desired content of the status register in R15 and the address after the next instruction in R14, then execute RTWP. [...]

I completely forgot about that way of getting at the status register. Thanks to both of you.

By the way, I am working in Assembler here. It's just RPN Assembler, so there are no limitations that way. The challenge is how best to translate Forth source code to native 9900 code, which is what all modern Forth systems do these days. I am working way above my pay grade but it's a good challenge.

If you don't mind reading German code comments, here is a link to Mecrisp Forth MSP430, which is about as close as you get to 9900 these days. Mecrisp generates native code interactively, with constant folding and other optimizations. Mathias Koch is a wizard to a guy like me.

Oops: Wrong link was posted. Fixed.

mecrisp/mecrisp-source/common at master · kevinfish/mecrisp · GitHub
apersson850 Posted November 17, 2022

Also remember that when you do a BLWP, you can let the branch vector contain the same workspace as you are coming from. It will then be like BL: you can only do it once (not nested). The only penalty otherwise is that R13-R15 of the calling workspace will be used to contain return data, not just the called workspace. And the reason is of course that they are effectively the same.
TheBF Posted November 20, 2022

So after all this time playing with the status register, and with coaching from @Lee Stewart and @apersson850, I went back to the Machine Forth compiler. As I understand it, Chuck's Machine Forth only branched on the carry flag of his CPU. (Or carry was set on the stack data. Not sure of the hardware details.) Now in MachForth I am back to branching on true or false with these macros.

    : IF   ( -- $$)   THERE 0 JNC, ;
    : -IF  ( -- $$)   THERE 0 JOC, ;

Then for consistency, I just re-use the branching words with <BACK to make loops. <BACK knows how to resolve a backwards branch and modifies the jump instruction's offset field (the low byte of the instruction word).

    : BEGIN    THERE ; IMMEDIATE
    : WHILE    IF SWAP ;
    : -WHILE   -IF SWAP ;
    : UNTIL    ( addr --)   IF <BACK ;
    : -UNTIL   ( addr --)   -IF <BACK ;
    : AGAIN    ( addr --)   THERE 0 JMP, <BACK ;
    : REPEAT   ( addr -- )  AGAIN THEN ;

I also now understand why Chuck abandoned making IF and WHILE and UNTIL consume their arguments when he started coding for his own hardware. The extra trouble to do that slows the empty loops down by ~3X.

The Machine Forth while loop runs in .9 seconds:

    COMPILER          \ name space that has compiler directives
    NEW.
    HEX 2000 ORIGIN.

    TARGET
    PROG: DEMO1       \ .9 seconds
        FFFF
        BEGIN
            1-
        WHILE
        REPEAT
        DROP
        NEXT,         \ Return to Forth console
    END.

    \ Usage from Forth command line:
    \ DEMO1 RUN

By comparison, the equivalent loop in ASMForth, which uses the normal Forth convention of consuming arguments but compiles to native code, runs in 2.8 seconds:

    HEX
    CODE DOWHILE
        FFFF #
        BEGIN
            1- DUP
        WHILE
        REPEAT
        DROP
    ;CODE
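What a word like <BACK has to compute can be sketched in Python. The 9900 jump instructions carry a signed displacement, counted in words from the address after the jump, in the low byte of the instruction word. The `resolve_back` helper is hypothetical; the JNC opcode and the addresses in the check come from the disassembly earlier in the thread (DE30 17F5 jnc >de1c).

```python
JNC = 0x1700  # opcode byte seen in the thread's listing (17F5)


def resolve_back(jump_addr, target_addr, opcode):
    """Return the finished jump word for a branch from jump_addr to target_addr."""
    # Displacement is in words, relative to the instruction after the jump.
    disp = (target_addr - (jump_addr + 2)) // 2
    if not -128 <= disp <= 127:
        raise ValueError("jump target out of range for a one-word jump")
    return (opcode & 0xFF00) | (disp & 0xFF)


def jump_target(jump_addr, word):
    """Inverse operation: where does an assembled jump word go?"""
    disp = word & 0xFF
    if disp >= 0x80:
        disp -= 0x100          # sign-extend the 8-bit displacement
    return jump_addr + 2 + 2 * disp
```

The range check matters for structured loops: a BEGIN ... UNTIL body longer than 128 words cannot be closed with a single jump instruction.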
TheBF Posted November 20, 2022

One big breakthrough on MachForth comes from the creation of SUPERSAVE. The MachForth compiler is now a 20K dictionary because I include the Assembler and dev tools like DUMP and ELAPSE (for timing code). It also uses WORDLISTS to manage Forth words that have the same names but do VERY different things to compile native code versus threaded code. It also reserves the entire 8K of low RAM for the program image, so our little machine is getting pretty full.

I recompiled a version of MachForth under the SuperCart version of Camel99 and now there is room to write a serious program in MachForth.
apersson850 Posted November 21, 2022

Which means that you kind of accomplish the same thing as the p-system, which couldn't have existed properly in a low-memory device like the TI 99/4A unless there was the p-code card, which acts like a 48 Kbyte ROM-disk with the operating system plus a 12 Kbyte code storage for the p-machine interpreter and BIOS. With later RAM solutions similar things can be accomplished with "standard" hardware.
TheBF Posted November 21, 2022

3 hours ago, apersson850 said:
> Which means that you kind of accomplish the same thing as the p-system [...]

I forgot that the P-code card had 48K of storage. That is nice. I am pretty much walking that path but using the Cartridge memory space and the SAMS card. P-system was there first.
apersson850 Posted November 22, 2022

The only difference is that the p-code card uses DSR memory space (4000H-5FFFH). Because it's a PEB card, of course. 12 K ROM squeezed into 8 K address space, by bank-switching the upper 4 K. The 48 K ROM-disk is really a GROM-disk, so it's accessed via the normal system of address and data ports. They just reside in DSR space instead in this case, and only if the p-code card is enabled.

It's probably because TI chose to use GROM chips for this code repository that some people have got the idea that GPL is involved in the p-system in some way.

Anyway, this means that the p-system doesn't use the cartridge space at all. If you have memory products that reside there, they can be used by your program without messing up anything else in the p-system.
TheBF Posted March 3, 2023

I have been "pushing a rope" for two years with Machine Forth. Here is part of the intro paragraph for the manual I am writing for my latest flavour of machine Forth, called ASMForthII.

> The author has spent a considerable amount of time adapting Chuck Moore's Machine Forth concept to the TMS9900 and the results are good but not great. The hypothesis is that the lacklustre performance of Machine Forth on the 9900 is due to the hardware mismatch between the 9900 and the F21 Forth CPU. Machine Forth is actually the Assembly language for Chuck's F21. The conclusion is that like the original machine Forth, any machine Forth must leverage underlying hardware to be efficient.

So all that to say that a "machine Forth" for TI-99 needs to use the registers and other features of the machine. The end result is a very low level compiler with a near one-to-one correspondence with the machine instructions, plus hi-level branching and looping and the convenience of the HOST Forth system for console I/O, debugging and disk.

It was @Reciprocating Bill and @lucien2, who showed the sieve benchmark in Assembler and with GCC, that showed me what was really possible.

Current Feature List

- Forth-like syntax (fetch/store architecture) (OK, not a feature for everybody)
- 8 free registers, but you can also use the DATA stack for extra parameters or to push registers as needed
- Seamlessly interleaves with Forth RPN Assembly language mnemonics
- DATA stack with PUSH POP pseudo-instructions
- Return stack with RPUSH RPOP pseudo-instructions
- Structured branching and looping
- Fast nest-able FOR NEXT down-counting loop
- Nest-able sub-routines with : ;
- Tail-call optimization under programmer control with -; operator
- Standard Forth CODE ;CODE words are directly callable from Forth. They can call ASMFORTH subroutines.
- Compiler is an E/A5 program

Future

- Compile stand alone E/A5 programs
- Create a STDIO library for VDP and keyboard
- Create a DISKIO library

Here is the Sieve program using Forth for data and console I/O and ASMForth for the computation, as the language looks now. The ASMForth section is a translation of the business end of @Reciprocating Bill's program. It compiles in 3.8 seconds and runs in 10 seconds.

Edit: Removed one instruction and used #CMP. 9.8 seconds

    \ SIEVE in ASMFORTH for Camel99 Forth  Feb 2023 Brian Fox
    \ based on code by @Reciprocating Bill atariage.com

    \ Original notes by Bill.
    \ * SIEVE OF ERATOSTHENES ------------------------------------------
    \ * WSM 4/2022
    \ * TMS9900 assembly adapted from BYTE magazine 9/81 and 1/83 issues
    \ * 10 iterations 6.4 seconds on 16-bit console
    \ * ~10 seconds on stock console

    \ * ASMForth version runs in 10 seconds

    HOST
    DECIMAL 8190 CONSTANT SIZE
    HEX     2000 CONSTANT FLAGS    \ array in Low RAM

    ASMFORTH
    : FILLW ( addr size cell --)   \ nestable sub-routine
        R0 POP                     \ size
        R1 POP                     \ base of array
        BEGIN
            TOS R1 @+ !            \ write ones to FLAGS
            R0 2-
        NC UNTIL
        DROP
    ;

    HEX
    CODE DO-PRIME ( -- n)
        FLAGS # SIZE # 0101 # FILLW    \ inits
        R0 OFF                     \ clear loop index
        R3 OFF                     \ 0 constant
        FLAGS R5 #!                \ array base address
        0 #                        \ counter on top of Forth stack
        SIZE # FOR
            R5 @+ R3 CMPB          \ FLAGS C@+ byte-compared to R3 (ie: 0)
            <> IF                  \ not equal to zero ?
                R0 R1 !            \ I -> R1
                R1 2* R1 3 #+      \ R1 2* 3+
                R0 R2 !            \ I -> R2 ( R2 is K index)
                R1 R2 +            \ PRIME K +!
                BEGIN
                    R2 SIZE #CMP   \ K SIZE compare
                < WHILE
                    R3 FLAGS (R2) C!   \ reset byte FLAGS(R2)
                    R1 R2 +        \ PRIME K +!
                REPEAT
                TOS 1+             \ increment count of primes
            THEN
            R0 1+                  \ bump index register
        NEXT
    ;CODE

    HOST ( Switch back to Host Forth )
    DECIMAL
    : PRIMES ( -- )
        PAGE ." 10 Iterations"
        10 0 DO
            DO-PRIME CR . ." primes"
        LOOP
        CR ." Done!"
    ;
TheBF Posted March 3, 2023

Effect of Tail Call Optimization in ASMForth

It was pretty simple to port this nesting demo that I found on the internet. The results are in the source code for other computers and for TI-99 with Camel99, TurboForth and ASMForth. I figured out how to do tail-call optimization in threaded Forth and it has a similar improvement to native code (BL) subroutines. ASMForth is pushing R11 onto the return stack on entry and popping it back before doing RT.

    \ For ASMForth II
    \ Amstrad 6128+   Z80A 4Mhz   Uniforth          Nesting 1Mil   3:26
    \ ZX Spectrum 2+              FIG-Forth 1.1a    Nesting 1Mil   3:15
    \ C64 (normal)                Forth64           Nesting 1Mil   6:20
    \ PDP11                       FIG-Forth 1.3     Nesting 1Mil   0:49
    \ TI99                        Camel99 Forth     Nesting 1Mil   2:30.7
    \                             w/tail-call optimization         1:54.6
    \                             TurboForth 1.21   Nesting 1Mil   2:29
    \                             ASMForth II       Nesting 1Mil   1:28.7
    \                             W/tail-call optimization         0:54.23

    HOST INCLUDE DSK1.ELAPSE

    ASMFORTH
    : BOTTOM ;
    : 1st  BOTTOM BOTTOM ;
    : 2nd  1st 1st ;
    : 3rd  2nd 2nd ;
    : 4th  3rd 3rd ;
    : 5th  4th 4th ;
    : 6th  5th 5th ;
    : 7th  6th 6th ;
    : 8th  7th 7th ;
    : 9th  8th 8th ;
    : 10th 9th 9th ;
    : 11th 10th 10th ;
    : 12th 11th 11th ;
    : 13th 12th 12th ;
    : 14th 13th 13th ;
    : 15th 14th 14th ;
    : 16th 15th 15th ;
    : 17th 16th 16th ;
    : 18th 17th 17th ;
    : 19th 18th 18th ;
    : 20th 19th 19th ;

    CODE RUN  20th ;CODE

    HOST
    : 1MILLION  CR ." 1 million nest/unnest operations" RUN ;

    CR .( enter 1million or 32million )
    \ ELAPSE 1MILLION

    \ recompile with tailcall optimization operator ( -; )
    ASMFORTH
    : BOTTOM ;   \ can't optimize this one because there is no function in it.
    : 1ST  BOTTOM BOTTOM -;
    : 2ND  1ST 1ST -;
    : 3RD  2ND 2ND -;
    : 4TH  3RD 3RD -;
    : 5TH  4TH 4TH -;
    : 6TH  5TH 5TH -;
    : 7TH  6TH 6TH -;
    : 8TH  7TH 7TH -;
    : 9TH  8TH 8TH -;
    : 10TH 9TH 9TH -;
    : 11TH 10TH 10TH -;
    : 12TH 11TH 11TH -;
    : 13TH 12TH 12TH -;
    : 14TH 13TH 13TH -;
    : 15TH 14TH 14TH -;
    : 16TH 15TH 15TH -;
    : 17TH 16TH 16TH -;
    : 18TH 17TH 17TH -;
    : 19TH 18TH 18TH -;
    : 20TH 19TH 19TH -;

    CODE RUN  20TH ;CODE

    HOST
    : 1MILLION2  CR ." Optimized 1M nest/unnest operations" RUN ;
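A back-of-the-envelope count shows why the tail-call version does so much less work. The Python sketch below is a model, not a measurement: it assumes that -; turns the last call in each word into a plain branch, so that word never executes its own return, and it ignores the initial call from RUN. The function name and dictionary keys are made up.

```python
def nesting_counts(levels=20, tail_call=False):
    """Count calls, tail jumps and returns for the nesting benchmark."""
    calls = 0        # BL-style calls that save a return address
    jumps = 0        # tail calls compiled as plain branches
    returns = 0      # RT-style returns actually executed
    invocations = 1  # the top-level word runs once
    for _ in range(levels):
        # Every invocation at this level invokes the level below twice.
        if tail_call:
            calls += invocations    # first call is a real call
            jumps += invocations    # second call becomes a branch
        else:
            calls += 2 * invocations
            returns += invocations  # the word returns to its caller
        invocations *= 2
    returns += invocations          # BOTTOM always returns
    return {"bottom": invocations, "calls": calls,
            "jumps": jumps, "returns": returns}
```

In this model the number of BOTTOM executions stays at 2^20 either way, while saved return addresses and executed returns are roughly halved, which is in line with the timings in the table above.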
TheBF Posted March 7, 2023

I am writing a manual for ASMForth II and using code examples. I am borrowing ideas from Chuck Moore's CPU, which had hardware registers called TOS (top of stack) and NOS (next on stack). In the case of the 9900, NOS is actually *SP. To indicate automatic popping of NOS, which is *SP+, I now use NOS^. So this little example shows ASMForth versus 9900 Assembly Language.

AsmForth Addition Example:

    45 # 7 #      \ push 2 numbers onto the DATA stack. NOS=45 TOS=7
    NOS^ TOS +    \ Add NOS to TOS. NOS^ pops the stack automatically

This compiles the following TI Assembler code:

    DECT SP
    MOV  R4,*SP
    LI   R4,45
    DECT SP
    MOV  R4,*SP
    LI   R4,7
    A    *SP+,R4

Writing the doc is helping me refine the "language". I should have something for public consumption by the weekend. I think it is a viable contender to replace a Forth Assembler since that's really all it is, with some macros added.

If you don't want to use the data stack you could use registers like this:

    45 R0 #!
    7  R1 #!
    R0 R1 +
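The seven instructions above can be traced with a tiny Python model of a stack whose top item is cached in a register. It is a sketch only: the class name, the starting stack address and the initial TOS value are made up, and numbers are treated as decimal.

```python
class CachedStack:
    """Data stack in memory growing downward, with TOS held in 'R4'."""

    def __init__(self, sp=0x8000, tos=0x1234):
        self.mem = {}      # word-addressed stack memory
        self.sp = sp       # stack pointer register (SP)
        self.tos = tos     # cached top of stack (R4)

    def push_literal(self, n):
        self.sp -= 2                 # DECT SP
        self.mem[self.sp] = self.tos # MOV  R4,*SP  (this pair is DUP)
        self.tos = n & 0xFFFF        # LI   R4,n

    def add_nos_pop(self):
        # A *SP+,R4 : add NOS to TOS and pop NOS in the same instruction
        self.tos = (self.tos + self.mem[self.sp]) & 0xFFFF
        self.sp += 2


def addition_example():
    s = CachedStack()
    s.push_literal(45)
    s.push_literal(7)
    s.add_nos_pop()
    return s
```

After the run, TOS holds the sum, and the value that was in TOS before the example is the new NOS, one cell below the starting stack pointer.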
TheBF Posted March 8, 2023

For anyone brave enough, here is the repository for ASMForth II.

GitHub - bfox9900/ASMFORTH: Experimental Assembler using Forth like syntax

The manual is rough, but if you have more questions you can read the source code or just ask me. The demo programs give enough examples for you to play around. Please tell me about bugs that you find and ANY suggestions you think would improve it.

The bin folder has the two files that are needed to be loaded from E/A Option 5. It is just Camel99 kernel + Tools + Assembler + ASMFORTH II rolled up into an E/A5 program.

I will be making a front-end on the Assembler to allow vectoring ASMForth code to different memory, which will allow making stand alone binaries as I was doing with Machine Forth, which will be pretty neat. I have to do a VDP/Console library, which I will write in ASMForth rather than Assembler just because I now can. (I think)

Of course you are free to make your own. There are macros for the string words COUNT and /STRING in ASMForth. Make yourself EMIT and TYPE to hit the screen.

I gotta stop for a while.
TheBF Posted March 8, 2023

I am adding a few extra Forth primitives to ASMForth and updating the manual. I am beginning to like the clarity of this notation to write stack operations. Here is 2DUP in ASMForth:

    : 2DUP ( a b -- a b a b )
        SP -4 #+       \ make room for 2 cells
        4TH NOS !
        TOS 3RD !
    ;
TheBF Posted March 8, 2023

One minor annoyance with the 9900 is dealing with bytes and swapping them in a register to do math operations. I have ignored that up until now, thinking I would leave it to the programmer to do the swapping as necessary. I think I made ASMForth just a bit smarter with this code. It could also be going too far beyond "Assembler".

The C! compiler tests for symbolic addressing in the destination argument. If it is symbolic, then we take the args and compile the MOVB instruction. If we have a register destination, then it compiles the MOVB and follows it with an 8 bit SRL on that same register. This way, if you are using bytes in the TOS register or on the Forth stack, they work as expected. The Assembler MOVB, word is always there if you don't want this.

    \ Add some smarts to C!
    : C! ( c dst -- )
        DUP SYMBOL? IF MOVB, EXIT THEN    \ do MOVB and get out
        \ dst must be a register
        DUP>R          \ save copy of DST register
        MOVB,          \ compile instruction
        R> 8 SRL,      \ perform swap byte on that register
    ;

Here is a little test. I think the logic is correct. Not good for every circumstance. I think I will go back to MOVB = C! for now until I think of something better.

    \ BYTEOPS.FTH demo code
    \ The 9900 handles bytes in a weird way.
    \ C! will shift the bytes if the destination argument is a register.
    HOST HEX
    CREATE X  AABB ,
    CREATE Y  0000 ,

    ASMFORTH
    CODE TEST-C! ( -- n n n)
        DUP                \ free up TOS
        TOS OFF            \ clr tos
        X @@ TOS MOVB,     \ assembler version
        DUP
        X @@ TOS C!        \ C! has bit shifting
        X @@ Y @@ C!       \ C! has no bit shifting mem 2 mem
        DUP
        Y @@ TOS C!        \ with bit shifting
    ;CODE

    TEST-C! .S   Y ?
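The byte behaviour that the smarter C! works around can be shown with 16-bit integer arithmetic in Python. This is a sketch of the data movement only; the helper names are made up, and the value AABB is the test value from the post.

```python
def movb_to_register(src_word, dst_reg):
    """MOVB mem,reg: the byte at the (even) source address lands in the
    HIGH byte of the destination register; the low byte is untouched."""
    return (src_word & 0xFF00) | (dst_reg & 0x00FF)


def srl(reg, count):
    """SRL: logical shift right of a 16-bit register."""
    return (reg & 0xFFFF) >> count


def smart_c_store(src_word, dst_reg):
    """The 'smart' C! for a register destination: MOVB followed by SRL 8."""
    return srl(movb_to_register(src_word, dst_reg), 8)
```

So with TOS cleared first, plain MOVB from X leaves AA00 in the register, while the MOVB plus SRL pair leaves 00AA, which is the value a Forth programmer expects from a byte fetch.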
Lee Stewart Posted March 9, 2023

3 hours ago, TheBF said:
> One minor annoyance with the 9900 is dealing with bytes and swapping them in register to do math operations. [...]

I know I haven't been paying enough attention to your ASMForth posts, but is this C! in the ASSEMBLER (or maybe ASMFORTH) vocabulary (word list) and different from the C! in the FORTH vocabulary?

...lee
TheBF Posted March 9, 2023

Just now, Lee Stewart said:
> I know I haven't been paying enough attention to your ASMForth posts, but is this C! in the ASSEMBLER (or maybe ASMFORTH) vocabulary (word list) and different from the C! in the FORTH vocabulary? ...lee

Yes. At the moment, after my moment of "cleverness" failed, it is just a synonym for MOVB again. I am finding that getting too cute with syntax just adds needless complication.

Basically ASMForth is just changing the nomenclature from TI mnemonics to Forth-like mnemonics and adding some handy "pseudo-instructions", ie: macros. I will push it and see how far it goes. But I think it has some merit.

I am just making a post on making the FOR NEXT loop more versatile.
TheBF Posted March 9, 2023

Freeing FOR NEXT

I started on a VDP driver and learned that it was a mistake to force FOR NEXT to take its argument from TOS. I kept the one I had but renamed it #FOR so it takes a literal number, like the other instructions with a # sign.

The magic of the Forth Assembler is that it uses the DATA stack to pass the pieces of an assembly language instruction as arguments. The instruction mnemonic is the actual "Assembler". It picks up the pieces on the DATA stack and combines them correctly to make an instruction. It's magic.

So... Why couldn't FOR take any valid argument (Register/Addressing mode) for its loop limit? It can! Here is the new code for the FOR NEXT loop.

    : FOR  ( arg --)  RPUSH  BEGIN ;
    : NEXT ( -- )     RP @ 1-  NC UNTIL  RDROP ;

The code for RPUSH is:

    : RPUSH, ( src -- )  RP DECT,  ( src) *RP MOV, ;

You can see that it is an incomplete MOV instruction. The source argument is passed to it on the Forth data stack.

So now this little FOR NEXT loop can do this:

    CODE FASTER          \ 14.18 seconds
        100 R0 #!        \ loop limit in a register
        R0 FOR           \ pass Register to FOR :-)
            R0 FOR
                R0 FOR
                NEXT
            NEXT
        NEXT
    ;CODE

AND this:

    HOST
    VARIABLE X   100 X !

    ASMFORTH
    CODE TESTARG
        X @@ FOR         \ pass a variable to FOR :-)
            X @@ FOR
                X @@ FOR
                NEXT
            NEXT
        NEXT
    ;CODE
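The "operands wait on the data stack, the mnemonic assembles them" idea can be sketched in Python. This toy assembler only knows MOV and DECT with register, indirect and auto-increment operands (indexed and symbolic operands need an extra word and are not modelled). The class and method names are hypothetical, and using R7 as the return stack pointer is an assumption; the MOV and DECT encodings can be checked against the disassembly earlier in the thread (C584 mov R4,*R6 and 0646 dect R6).

```python
# Addressing modes, numbered like the 9900 T fields
REG, INDIRECT, INDEXED, AUTOINC = 0, 1, 2, 3


class RpnAssembler:
    def __init__(self):
        self.operands = []   # plays the role of the Forth DATA stack
        self.code = []       # emitted instruction words

    def arg(self, mode, reg):
        """Push one operand description, e.g. arg(INDIRECT, 6) for *R6."""
        self.operands.append((mode, reg))
        return self

    def mov(self):
        """MOV, : pop destination, then source, and emit the opcode word."""
        td, d = self.operands.pop()
        ts, s = self.operands.pop()
        self.code.append(0xC000 | td << 10 | d << 6 | ts << 4 | s)

    def dect(self):
        """DECT, : single-operand instruction."""
        ts, s = self.operands.pop()
        self.code.append(0x0640 | ts << 4 | s)

    def rpush(self, rp=7):
        """RPUSH, : the caller already pushed the source operand.
        rp=7 is an assumed return stack register."""
        self.arg(REG, rp).dect()        # RP DECT,
        self.arg(INDIRECT, rp).mov()    # ( src) *RP MOV,
```

Because `rpush` never looks at the source operand itself, anything the MOV encoder accepts (a register here, a variable in the real assembler) can be handed to FOR, which is the point of the post.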
TheBF Posted March 9, 2023

Holy Crap. It works. VDP Driver in ASMForth and a test program that reads the screen, clears the screen and puts it back.

Edit: Had to add a couple of DROP instructions that I forgot I needed at the end of DELAY and VC@

    \ VDPLIB.FTH library for ASMForth II  2023 Mar Brian Fox
    HOST HEX
    8800 EQU VDPRD
    8802 EQU VDPSTS
    8C00 EQU VDPWD
    8C02 EQU VDPWA

    ASMFORTH
    \ VDPA! takes arg from TOS but leaves it on the stack
    : VDPA! ( Vaddr -- Vaddr)    \ set vdp address (read mode)
        R1 STWP,
        0 LIMI,
        9 (R1) VDPWA @@ C!       \ write odd byte from TOS (ie: R4)
        TOS VDPWA @@ C!          \ MOV writes the even byte to the port address
    ;

    : VC@ ( addr -- c)
        VDPA!
        TOS OFF
        VDPRD @@ 9 (R1) C!       \ read data into odd byte of R4
        DROP
    ;

    : VC! ( c Vaddr -- )
        TOS 4000 #OR VDPA!
        9 (R1) VDPWD @@ C!       \ Odd byte R4, write to screen
        DROP                     \ refill TOS
    ;

    HEX
    \ * VDP write to register. Kept the TI name
    : VWTR ( c reg -- )          \ Usage: 5 7 VWTR
        TOS ><
        NOS^ TOS +               \ combine 2 bytes to one cell
        TOS 8000 #OR VDPA!
        DROP
    ;

    : VFILL ( Vaddr cnt char -- )
        TOS R5 !                 \ R5 = CHAR
        R5 ><
        R0 POP                   \ cnt to R0
        TOS POP                  \ Vaddr to TOS
        TOS 4000 #OR VDPA!
        VDPWD R3 #!
        R0 FOR
            R5 *R3 C!
        NEXT
        DROP
    ;

    : VREAD ( Vaddr addr n --)
        TOS R0 !
        R5 POP
        TOS POP
        VDPA!
        VDPRD R3 #!
        R0 FOR
            *R3 *R5+ C!
        NEXT
        DROP
    ;

    : VWRITE ( addr Vaddr len -- )
        TOS R0 !
        TOS POP                  \ Vaddr in TOS
        TOS 4000 #OR VDPA!       \ set write address
        TOS POP                  \ pop RAM addr into TOS
        VDPWD R3 #!
        R0 FOR
            *TOS+ *R3 C!
        NEXT
        DROP
    ;

    \ test code
    HEX
    \ high level words
    : READSCR   0 # 2000 # 3C0 # VREAD ;
    : WRITESCR  2000 # 0 # 3C0 # VWRITE ;
    : CLS       0 # 3C0 # BL # VFILL ;
    : DELAY     TOS FOR NEXT DROP ;

    CODE TEST-R/W
        READSCR
        CLS
        FFFF # DELAY
        WRITESCR
    ;CODE

Video: ASMForth-VDP-drvr.mp4
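The byte sequences that VDPA!, the 4000 #OR variants and VWTR send to the VDP address port can be worked out in Python. This sketch assumes the usual TMS9918A convention that the low byte is written first, that bit >4000 selects write mode and that bit >8000 selects a register write; the function names are made up.

```python
def vdp_address_bytes(vaddr, write=False):
    """Bytes sent to the address port, in order, to set a VRAM address."""
    word = vaddr & 0x3FFF
    if write:
        word |= 0x4000                  # TOS 4000 #OR in the post
    return [word & 0xFF, word >> 8]     # odd (low) byte, then even (high) byte


def vdp_register_bytes(value, reg):
    """Bytes sent to the address port for VWTR ( c reg -- )."""
    word = 0x8000 | (reg & 0x07) << 8 | (value & 0xFF)
    return [word & 0xFF, word >> 8]
```

For the usage example in the source, 5 7 VWTR, this gives the value byte 05 followed by 87, and CLS would start by sending 00 then 40 to select write mode at VDP address 0.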
TheBF Posted March 9, 2023

I want the ability to create standalone E/A5 programs with this ASMForth tool. In the past I modified the entire Assembler program, which is dumb. I didn't trust myself to get it correct any other way. So I thought it was time to make something smarter.

This code is a preamble to control the actions of these dictionary management words: HERE C, ALLOT CREATE

I discovered I had to fix CODE as well.

With this little preamble and the magic of DEFER words we can compile the stock, unaltered Assembler in the dictionary with the >RAM directive. Then with >HEAP we alter the action of critical parts of the Assembler so code goes into the dictionary or the HEAP memory. Much clearer and more versatile.

Edit: added the use of SYNONYM

    \ MEMVECTOR.FTH   Brian Fox Mar 9 2023
    \ redirect Assembler code to different memory locations
    INCLUDE DSK1.TOOLS    \ debugging only
    INCLUDE DSK1.DEFER
    INCLUDE DSK1.SYNONYM

    \ HEAP memory managers
    : HEAP    ( -- addr)  H @ ;
    : HALLOT  ( n -- )    H +! ;
    : HC,     ( c -- )    HEAP C!  1 HALLOT ;   \ byte store, so C! rather than !
    : H,      ( n -- )    HEAP !   2 HALLOT ;
    : HCREATE             HEAP CONSTANT ;
    : HCODE               HEADER HEAP , !CSP ;

    \ alias the RAM managers to avoid name conflicts
    SYNONYM <HERE>    HERE
    SYNONYM <ALLOT>   ALLOT
    SYNONYM <C,>      C,
    SYNONYM <,>       ,
    SYNONYM <CREATE>  CREATE
    SYNONYM <CODE>    CODE

    DEFER HERE
    DEFER ALLOT
    DEFER C,
    DEFER ,
    DEFER CREATE
    DEFER CODE

    : >RAM
        ['] <HERE>    IS HERE
        ['] <ALLOT>   IS ALLOT
        ['] <C,>      IS C,
        ['] <,>       IS ,
        ['] <CREATE>  IS CREATE
        ['] <CODE>    IS CODE
    ; IMMEDIATE

    : >HEAP
        ['] HEAP      IS HERE
        ['] HALLOT    IS ALLOT
        ['] HC,       IS C,
        ['] H,        IS ,
        ['] HCREATE   IS CREATE
        ['] HCODE     IS CODE
    ; IMMEDIATE

    \ Test
    >RAM  INCLUDE DSK1.ASM9900

    >HEAP
    CODE HEAPTEST   TOS INCT,  NEXT,  ENDCODE

    >RAM
    CODE RAMTEST    TOS INCT,  NEXT,  ENDCODE
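The vectoring trick is easy to see in a language-neutral form. In the Python sketch below, the compiler's `here` and `comma` always go through a switchable target, which is roughly the role DEFER and IS play in the Forth code. All class names and start addresses are hypothetical.

```python
class Region:
    """One memory area with its own allocation pointer."""

    def __init__(self, name, start):
        self.name = name
        self.pointer = start
        self.cells = {}

    def here(self):
        return self.pointer

    def allot(self, n):
        self.pointer += n

    def comma(self, value):
        self.cells[self.pointer] = value & 0xFFFF
        self.allot(2)


class Compiler:
    """Code generator whose dictionary words are 'deferred'."""

    def __init__(self):
        self.ram = Region("RAM", 0xA000)    # assumed dictionary start
        self.heap = Region("HEAP", 0x2000)  # assumed heap start
        self.to_ram()

    def to_ram(self):        # like >RAM
        self._target = self.ram

    def to_heap(self):       # like >HEAP
        self._target = self.heap

    # The code generator only ever calls these two.
    def here(self):
        return self._target.here()

    def comma(self, value):
        self._target.comma(value)
```

The design benefit is the same as in the post: the assembler itself is compiled once, unmodified, and only the small set of memory words it depends on is redirected.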
TheBF Posted October 3, 2023

2nd Pass Optimizer for MachForth

(I jumped back to my first native code compiler)

One of the performance-enhancing tricks in making a Forth system is to cache the top of the DATA stack (TOS) in a machine register. This creates an "accumulator" register. Some operations are slower, but on balance this can improve system performance by about 10% over using a memory-only data stack. Math operations seem to benefit.

When compiling Forth code to inline machine code for such a system, one of the things that happens is that there can be repeating patterns of DROP and DUP in sequence in the code. This sequence serves no purpose, but it is there because various primitive Forth instructions need to refill the top of stack register from the memory stack (DROP operation) or need to make room in the TOS register for a new value with the DUP operation.

Let's use the example below. Store a number in a variable, followed by a literal number going onto the stack to be used by the next code in the sequence.

    BEEF X !
    DEAD Y !

The Store (!) operator generates this code:

    : ! ( n TOSaddr --)
        *SP+ *TOS MOV,
        *SP+ TOS MOV,    ( DROP operation)
    ;

Next the literal DEAD must go onto the stack. The code to do that must perform a DUP, meaning it saves the value in the TOS register onto the memory stack.

    : DOLITERAL
        SP DECT,         ( these two instructions are DUP )
        TOS *SP MOV,
        TOS SWAP LI,     ( this loads the TOS register with the literal in the source code)
    ;

The end result is that we refill the TOS register from memory and then immediately put it back.

I tried a number of ways to detect this situation while compiling code, but there seemed to be a SNAFU in looping code where the DROP and the DUP were separated at opposite ends of the loop. The logic to detect that was more than my brain could handle. Suffice to say, I think doing a 2nd pass is a more reliable way to get the job done and simpler to understand.

The problem code is a simple set of 3 memory words. The solution is to find them, remove them and move on. And the benefits really add up because you are removing 3 instructions per occurrence that you find.

Here is what I came up with.

    \ optimizer.fth for MachForth
    \ Stack machine primitives can create overhead by needlessly
    \ DROPing then DUPing the top of stack.
    \ This program scans for the troublesome code sequence
    \ and removes the 6 byte sequence wherever it is found.
    HOST
    HEX
    2000 CONSTANT CODEIMAGE

    \ search for u in memory block (addr,len)
    \ return the new address and len or 0 if not found.
    : SCANW ( addr len u -- addr' len'|0)
        >R                   \ remember char
        BEGIN
            DUP
        WHILE ( len<>0)
            OVER @ R@ <>
        WHILE ( R@ <> u)
            2 /STRING        \ advance to next cell address
        REPEAT
        THEN
        R> DROP              \ 32 bytes
    ;

    : D= ( d d -- ?)   ROT = -ROT = AND ;
    : 2CONSTANT   CREATE SWAP , ,  DOES> 2@ ;

    HEX
    C136      CONSTANT  'DROP'
    0646 C584 2CONSTANT 'DUP'

    : FINDDROP ( addr len -- addr' len' ?)  'DROP' SCANW DUP 0> ;

    : DROP/DUP? ( addr len -- addr' len' ?)
        FINDDROP >R
        OVER CELL+ 2@ 'DUP' D=
        R> AND
    ;

    \ EXTRACT moves the binary program in memory
    \ to remove the DROP/DUP sequence
    : EXTRACT ( addr size -- )
        >R                   \ save the size
        DUP 3 CELLS +        \ compute new src address
        SWAP                 \ addr is the dst address
        R>                   \ restore the size
        ( src dst size ) MOVE     \ move the code
        3 CELLS NEGATE TDP +!     \ adjust target program end pointer
    ;

    : PROGRAM ( -- addr len )  CODEIMAGE TDP @ OVER - ;

    : OPTIMIZER
        BEGIN
            PROGRAM DROP/DUP?
        WHILE
            EXTRACT
        REPEAT
    ;

Now after compiling a machine Forth program you just invoke the optimizer before saving the program image. Seems to work as planned. I will need to do more testing.
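The second-pass idea can be tried out on a list of instruction words in Python. This is a sketch rather than a port of the Forth code above: it works on a list instead of moving memory in place, and it keeps scanning past a DROP that is not followed by DUP. The function name is made up; the three opcode words are the ones used in the post.

```python
DROP = (0xC136,)            # mov *R6+,R4
DUP = (0x0646, 0xC584)      # dect R6 / mov R4,*R6
PATTERN = DROP + DUP


def remove_drop_dup(words):
    """Return (new_words, removed_count) with every DROP+DUP run deleted."""
    out, removed, i = [], 0, 0
    n = len(PATTERN)
    while i < len(words):
        if tuple(words[i:i + n]) == PATTERN:
            i += n              # skip the three redundant words
            removed += 1
        else:
            out.append(words[i])
            i += 1
    return out, removed
```

Two caveats apply to any scan of this kind, and they are worth testing for. A raw word scan cannot tell an instruction from an operand word that happens to hold the same value, and deleting words shifts everything after them, so a jump whose span crosses a removed sequence would need its displacement adjusted.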
+TheBF Posted January 9

I don't know where else to put this so it's going here. I am watching the guys configure GCC for our favorite machine and it has me thinking again about generating efficient 9900 code from Forth(-like) syntax. I heard a talk by Chuck Moore on how he is experimenting with using more registers in his current Forth systems. I started down that path with ASMForth because registers are how you get performance from a register machine.

In the process I noticed that when you use the raw instructions as Forth primitives you have so much freedom. Assuming the top of stack is cached in a register, we can do this:

    !   becomes MOV
    +!  becomes Add
    etc.

Something cool then happens: you can convert some stack operations into simpler machine code. So where DUP in this architecture is:

    SP INCT,
    TOS *SP MOV,

DUP can be replaced with

    TOS TOS

SWAP means we reverse the order of the arguments that we feed to the instruction. (I think this will work. It might need a flag somewhere to signal that we are swapped.)

    TOS *SP

Where OVER is something like this (notice we DUP first):

    SP INCT,
    TOS *SP MOV,
    2 (SP) TOS MOV,

OVER can be replaced by

    *SP TOS

since we just want to use the 2nd stack item but not destroy it. The phrase "OVER =" becomes:

    *SP TOS CMP,

This is as far as I got in this line of thought. I think ROT might not be so easy to convert, but I will explore this as time permits. All of this raises the question: can we just allocate 3 registers for the top of the stack and use logic to remember which item is in which register? That makes my head hurt at this stage.
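To make the "stack word becomes an operand pair" idea concrete, here is a small Python sketch of how a compiler front end might fuse a stack word with the instruction word that follows it. This is only a toy under assumptions: the tables `OPERAND_FORMS` and `INSTRUCTION_FORMS` and the function `fuse` are hypothetical names, and only the pairings mentioned in the post (DUP as `TOS TOS`, OVER as `*SP TOS`, `=` as `CMP,`) are included. A stack word that is not followed by a known instruction word is passed through untouched, on the assumption that it would then be compiled the ordinary way.

```python
# Hypothetical table: stack words that only select operands for the
# instruction that follows them (pairings taken from the post).
OPERAND_FORMS = {
    "DUP": "TOS TOS",     # use TOS as both source and destination
    "OVER": "*SP TOS",    # read the 2nd stack item in place
}

# Hypothetical table: Forth words that map onto one two-operand instruction.
INSTRUCTION_FORMS = {
    "=": "CMP,",
}


def fuse(tokens):
    """Replace '<stack word> <instruction word>' pairs with one assembler phrase."""
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in OPERAND_FORMS and nxt in INSTRUCTION_FORMS:
            out.append(f"{OPERAND_FORMS[word]} {INSTRUCTION_FORMS[nxt]}")
            i += 2          # both tokens were consumed by the fused phrase
        else:
            out.append(word)
            i += 1
    return out


if __name__ == "__main__":
    print(fuse("X OVER = IF".split()))
    print(fuse("OVER SWAP".split()))
```

The design choice here is a one-token lookahead. Anything like SWAP or ROT, where the post suspects a flag or more bookkeeping is needed, is left out on purpose.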
+FarmerPotato Posted January 10

This merits another OMG. Or three! I'm trying to understand. You are converting OVER to a setup for the next word, which must be an instruction. I've got to get my brain around how it's all just assembler directives, not threaded tokens. Then is the word OVER not available for immediate use? What then is the convenient way to get the runtime effect of OVER:

    TOS *SP+ MOV,
    -4 (SP) MOV,    ( assuming you have the stack growing upwards in MachForth?)

Machine Forth is still very exciting, and hard to understand.
+TheBF Posted January 10

Correct. Machine Forth is a compiler like C or Pascal. However, since it is compiling from the Forth console, you can run the code afterwards like a Forth word. All you need is a Forth way to jump into the code (BL *TOS) and a word to return to Forth.

    CODE RUN ( entry_address -- )
        *TOS BL,
        NEXT,
    ENDCODE

After seeing Bill Sullivan's result coding the sieve in Assembler, I realized you are never going to get full performance from the 9900 with stack operations. (My ASMForth sieve, based on his code with a few tweaks, is actually a hair faster.)

ASMForth is me taking the Machine Forth idea that was for Chuck's CPUs and mutating it into something with a closer fit to the 9900 while still using Forth-like syntax. ASMForth II uses registers like Assembler, but you also have PUSH, POP, RPUSH and RPOP macros, so you have the stack. Colon definitions automatically push R11 on entry and POP R11 before RT.

The docs are here: https://docs.google.com/document/d/1h-qVQeD6_b58DywrzGZphmSSxUNMKBbY7QCByMFKFOM/edit?usp=sharing
Not the best read.

Here is an ugly Hello World. ASMForth is a work in progress. I want to get back closer to normal Forth syntax in future if possible.

    \ tiny hello world in ASMForth II
    \ Translated from hello.c by Tursi for comparison
    ASMFORTH

    HEX
    8C02 CONSTANT VDPWA     \ Write Address port
    8C00 CONSTANT VDPWD     \ Write Data port

    \ define the string
    CREATE TXT  S" Hello World!" S,

    : VDPADDR
        TOS ><              \ swap bytes
        TOS VDPWA C!        \ VDP address LSB character store
        TOS ><              \ swap bytes
        TOS VDPWA C!        \ VDP address MSB + "write" bit character store
        DROP
    ;

    MACRO: EMIT+ ( addr -- addr++)  VDPWD C! ;MACRO

    CODE MAIN
        0 LIMI,             \ disable interrupts
    \ set the VDP address to >0000 with write bit set
        4000 # VDPADDR
        TXT #
        *TOS R0 C!          \ byte count -> R0
        R0 8 RSHIFT
        R0 1-               \ for loop needs 1 less
        TOS 1+              \ skip past byte count
        R0 FOR              \ get argument from R0
          TOS @+ EMIT+      \ @+ is indirect auto-inc.
        NEXT
        DROP
        NEXT,
    ENDCODE

    \ usage: PAGE MAIN CR

Here is the code generated:

    DF00  0300  limi >0000          (24)
    DF04  0646  dect R6             (14)
    DF06  C584  mov  R4,*R6         (30)
    DF08  0204  li   R4,>4000       (20)
    DF0C  06A0  bl   @>dece         (32)
      DECE  0647  dect R7           (14)
      DED0  C5CB  mov  R11,*R7      RPush R11
      DED2  06C4  swpb R4           (14)
      DED4  D804  movb R4,@>8c02    (38)
      DED8  06C4  swpb R4           (14)
      DEDA  D804  movb R4,@>8c02    (38)
      DEDE  C136  mov  *R6+,R4      (30)
    > DEE0  C2F7  mov  *R7+,R11
      DEE2  045B  b    *R11
    DF10  0646  dect R6             (14)
    DF12  C584  mov  R4,*R6         (30)
    DF14  0204  li   R4,>deb2       (20)
    DF18  D014  movb *R4,R0         (26)
    DF1A  0880  sra  R0,8           (32)
    DF1C  0600  dec  R0             (14)
    DF1E  0584  inc  R4             (14)
    DF20  0647  dect R7
    > DF22  C5CB  mov  R11,*R7      Rpush loop counter R11
    DF24  C2C0  mov  R0,R11         Set R11 loop counter for FOR NEXT
    DF26  D834  movb *R4+,@>8c00
    DF2A  060B  dec  R11            R11 is the return stack CACHE
    DF2C  18FC  joc  >df26
    DF2E  C2F7  mov  *R7+,R11
    DF30  C136  mov  *R6+,R4        DROP refills TOS
    DF32  045A  b    *R10           return to Forth

ASMForth_helloworld.mp4
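The byte order that VDPADDR sends to the address port can be checked with a few lines of Python. This is only a sketch: the function name `vdp_write_address_bytes` is hypothetical, and it assumes a 14-bit VDP address with the write bit at >4000, as in the `4000 # VDPADDR` line of the listing. The low byte goes out first, then the high byte carrying the write bit.

```python
WRITE_BIT = 0x4000   # "write" flag, as in `4000 # VDPADDR` in the listing


def vdp_write_address_bytes(addr):
    """Return the two bytes sent to the VDP address port, in the order sent.

    addr is assumed to be a 14-bit VDP RAM address (0..0x3FFF).
    """
    if not 0 <= addr <= 0x3FFF:
        raise ValueError("VDP address must fit in 14 bits")
    word = WRITE_BIT | addr
    low = word & 0xFF        # sent first (the code swaps bytes to reach it)
    high = word >> 8         # sent second, includes the write bit
    return [low, high]


if __name__ == "__main__":
    print([f"{b:02X}" for b in vdp_write_address_bytes(0x0000)])
```

The two `swpb` instructions in the generated code exist because `movb` always moves the high byte of the register, so the low byte has to be swapped into place first and then swapped back.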