TheBF

+AtariAge Subscriber
  • Content Count

    2,240

Community Reputation

2,128 Excellent

About TheBF

  • Rank
    River Patroller

Profile Information

  • Gender
    Male
  • Location
    The Great White North
  • Interests
    Music/music production, The Forth programming language and implementations.
  • Currently Playing
    Guitar
  • Playing Next
    Trumpet ;)

Recent Profile Visitors

6,124 profile views
  1. I didn't address this. This inliner actually creates headless "code" words in heap memory, so no memory is used for the name, count byte, link and precedence field, which can really add up on small systems. The code created is like :NONAME words in that there is just an XT. That XT points to the next address, which contains the beginning of the machine code.

     At the top of the loop we grab the next free address in the heap to hold the XT (the code field address, as it is traditionally called) and point it to the next cell, which is where the code will be compiled. At the bottom of the loop, after terminating the new code with NEXT, we just compile that XT into the word we are compiling with nothing more than comma ",". Pretty neat if I do say so myself.
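To show the headless-word idea without the Camel99 internals, here is a minimal sketch in standard Forth. It is not the inliner itself: it uses :NONAME and ordinary dictionary space rather than heap memory and machine code, and the names SQ+1-XT, INLINE-SQ+1 and USE-IT are made up for illustration.

```forth
\ Sketch only: standard Forth, dictionary space instead of the heap.
\ :NONAME builds a definition with no name, link or precedence field
\ and leaves just its execution token (XT) on the stack.
:NONAME ( n -- n*n+1 )  DUP * 1+ ;
CONSTANT SQ+1-XT          \ hypothetical name: remember the XT

\ An immediate helper that compiles the saved XT into the word
\ that is currently being defined.
: INLINE-SQ+1 ( -- )  SQ+1-XT COMPILE, ; IMMEDIATE

: USE-IT ( n -- n' )  INLINE-SQ+1 ;

3 USE-IT .    \ prints 10
```

COMPILE, is the portable spelling; the post describes doing the same job with a plain comma.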
  2. You are correct. This exercise began as a way to remove NEXT from in between CODE words. Then I realized data words could be sped up, and now I realize that since I had compiled native code, loops could also be changed to run native code.

     It is possible to take a Forth word apart and, if you find another Forth word, recursively drill down by calling "optimize" until you find the first occurrence of a code word and compile that code. Then return back up one level and keep doing that until you are back at the top level and reach the ';' of the top-level Forth word. (Makes my head hurt.) In the process you would recompile, end to end, ALL of the native code that runs when that word executes. It would be a huge piece of code.

     VFX Forth is a native code compiler and it has software control to tell it how many levels to drill down when it optimizes, i.e. expands the code into a long routine with no sub-routine calls. The control names are:

          no inlining
          normal inlining
          aggressive inlining
          absurd inlining

     Absurd inlining unravels everything right down to the most primitive piece of code. It's not recommended. You can learn more here: https://www.mpeforth.com/vfxcom.htm
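The drill-down with a level limit described above can be modelled in a few lines of standard Forth. This is a toy under stated assumptions, not VFX's or Camel99's mechanism: "words" are just tables, nothing is really compiled, and every name here (PRIM, EXPAND, TOP and so on) is hypothetical. It only counts how many primitives would be copied inline and how many calls would remain at a given depth.

```forth
\ Toy model only: "words" are tables, nothing is really compiled.
VARIABLE INLINED    \ primitives copied inline
VARIABLE CALLS      \ words left as calls at the depth limit
: RESET-COUNTS ( -- )  0 INLINED !  0 CALLS ! ;

\ A primitive is an entry with zero components.
: PRIM ( "name" -- )  CREATE 0 , ;
PRIM P-DUP   PRIM P-*   PRIM P-@

\ A colon word is a count followed by the addresses of its components.
CREATE SQUARE  2 ,  P-DUP ,  P-* ,
CREATE QUAD    2 ,  SQUARE ,  SQUARE ,
CREATE TOP     3 ,  QUAD ,  P-@ ,  SQUARE ,

: EXPAND ( depth entry -- )
   DUP @ 0= IF  2DROP  1 INLINED +!  EXIT THEN   \ primitive: inline it
   OVER  0= IF  2DROP  1 CALLS   +!  EXIT THEN   \ limit hit: keep the call
   DUP @ 0 DO                                    \ visit each component
      OVER 1-  OVER I 1+ CELLS + @  RECURSE
   LOOP 2DROP ;

: REPORT ( depth -- )
   RESET-COUNTS  TOP EXPAND
   INLINED @ .  CALLS @ . ;

1 REPORT   \ prints 1 2
3 REPORT   \ prints 7 0
```

With a depth of 3 every call in this small example disappears, which is the "unravel everything" end of the scale; lower depths trade code size for remaining calls.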
  3. Thanks. It's been a long time coming as I get my head back into these realms. Optimizing this way starts to give diminishing returns if I try to go much farther. Technically I am getting close to the realm of GForth, where the compiler does this kind of thing by default. I am tempted to try making that work, but it gets pretty complex for our little 99.

     I remember years back getting a copy of GForth and compiling it for DOS. At the time I used a commercial system called HsForth, which I paid $450 US for in 1991, and I purchased updates over the decade. It was pretty fast to me. I was shocked to discover that GForth-fast.exe ran significantly faster than HsForth. I never looked into it because my career was moving out of technology at that time. Now I know why it was faster. I guess that's progress.

     To his credit, however, Jim Kalihan provided an optimizer with HsForth that worked like INLINE[ ] but it was OPT" ". It was a huge addition to the compiler and it was a bit buggy on some code, but it gave about a 2X speedup on the code that I tested. He went down the rabbit-hole of trying to make it optimize any Forth code.

     I have a native code Machine Forth cross-compiler that worked before I got distracted by all this VDP stuff, and I must get back to it. While trying to fit the 9900 to Chuck's Machine Forth ideas I suddenly realized that I could make a "Forth Assembly language" for the 9900. A large number of instructions overlap really well: 1+, 2+, 1-, 2-, "@" (indirect addressing) for "fetch", etc. I think MOV would do double duty as store and fetch, but that's not clear yet. This would allow writing Forth syntax while the code would be real close to the metal 9900 code.

     The other thing I am now acutely aware of is how bloated sub-routines are on this machine. Unless your sub-routine is 4 instructions or more, the compiler should just inline that darn code. So that means most of the Forth primitives become inline code. Fast but not very Forthy. Enough of my dreaming for today.
  4. I took a look at my inline optimizer to see if it was possible to optimize Forth loop structures as code. While I was at it, things were getting a little complicated, so I reduced the business end of the process, copying kernel code snippets, into one nice word called CODE, . I can now optimize DO LOOP, BEGIN UNTIL and BEGIN AGAIN with this version. I don't think I will go any further. In theory one could build a recursive descent compiler over the Forth code, but I think that's above my pay grade.

     In order to keep the optimized loop info from getting mixed up with the Forth DATA stack, I made a little secondary LIFO called a control stack. This made it much simpler, and I can do nested loops without losing my mind managing mixed data on the Forth data stack.

     The video shows the difference in speed for these test 64K iteration loops:

          : COUNTDN     FFFF BEGIN 1- DUP 0= UNTIL DROP ;
          : OPTCOUNTDN  FFFF INLINE[ BEGIN 1- DUP 0= UNTIL DROP ] ;
          : FORTHLOOP   FFFF 0 DO LOOP ;
          : OPTLOOP     INLINE[ FFFF 0 DO LOOP ] ;

     optimized-loops.mp4
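A secondary LIFO of the kind mentioned above can be very small. The following is a sketch in standard Forth, not the optimizer's actual control stack: the size, the names CSTACK, CSP, >CS and CS>, and the lack of overflow checking are all assumptions made for illustration.

```forth
\ Sketch only: a tiny LIFO kept apart from the data stack.
\ Size and names are assumptions, and there is no overflow checking.
DECIMAL
CREATE CSTACK  8 CELLS ALLOT     \ room for 8 entries
VARIABLE CSP   0 CSP !           \ index of the next free slot

: >CS ( x -- )  CSTACK CSP @ CELLS + !   1 CSP +! ;
: CS> ( -- x )  -1 CSP +!   CSTACK CSP @ CELLS + @ ;

\ Nested loops would push their branch targets and pop them in reverse.
100 >CS  200 >CS
CS> .  CS> .      \ prints 200 100
```

Because loop bookkeeping lives here, whatever the code under optimization leaves on the data stack never has to be juggled around it.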
  5. My tax dollars at work. Courtesy of the National Flem Film Board and the CBC (Canadian Broadcorping Castration)
  6. I understand that vitamin D seems to help with this disease and it's important for us people living in the North with little sunshine. Be well.
  7. Wow, that is amazing. Is the extra 1/3 in the BASIC program mostly the labels and comments that come along for the ride, or is the byte code implementation not very efficient? I have noticed this in my Forth compilers. The logic for building interpreters in the old days was to sacrifice speed for reduced code size. But the 9900 has so many higher level functions that you don't get this advantage until you make some significantly complex functions in your program and then re-use them a great deal. For the low-level functions the 9900 is almost (not quite) 1:1 with the Forth VM.
  8. How much bigger is compiled BASIC than byte coded BASIC?
  9. For what it's worth... Camel99 Forth V2.66 has SAMS support including SAM pages as virtual memory residing in Low RAM. ED99 is a SAMS based editor.
  10. OK, I wondered how much scrolling was involved. I saw some code that did 2000+ digits, so then it would become a problem on the TI-99. Forth has had an assembler since its inception. It typically takes 150 to 200 lines of Forth to write an RPN assembler. And Forth's compiling means macros come along for free. It's pretty cool to be able to test code snippets interactively at the command line before committing
  11. Update on the scroll. I re-wrote your code in Forth Assembler and compared it to my scroll, which uses Forth for looping (quite slow) and ASM routines to r/w VDP RAM, but I use a 2 line buffer. The video shows the results on Classic99 using a coarse elapsed timer. Screen capture shows the results. SCROLL-SLOW is almost 2X slower than mine, which uses a lot of Forth.

          NEEDS MOV FROM DSK1.ASM9900
          \ NEEDS .S FROM DSK1.TOOLS

          DECIMAL
          24 CONSTANT ROWS
          40 CONSTANT COLS     \ text mode

          HEX
          8800 CONSTANT VDPRD
          8802 CONSTANT VDPSTS
          8C00 CONSTANT VDPWD
          8C02 CONSTANT VDPWA

          CODE SCROLL-SLOW ( -- )
                R5 4000 LI,
                R3 COLS LI,
                BEGIN,
                   R3 SWPB,              \ !loop:
                   0 LIMI,
                   R3 VDPWA @@ MOVB,
                   R3 SWPB,
                   R3 VDPWA @@ MOVB,
                   VDPRD @@ R0 MOVB,
                   R5 SWPB,
                   R5 VDPWA @@ MOVB,
                   R5 SWPB,
                   R5 VDPWA @@ MOVB,
                   R0 VDPWD @@ MOVB,
                   2 LIMI,
                   R3 INC,
                   R5 INC,
                   R3 COLS ROWS * CI,
                EQ UNTIL,                \ jne -!loop
                NEXT,
          ENDCODE
  12. I am not the world's expert on Assembly Language, but I think the scroll routine is sub-optimal, and in my experience this really affects long benchmark timings on the TI-99 because the scroll is a significant part of the time. Looks like you are moving one byte at a time?

     Typically it is better to read at least one line into a RAM buffer, because the VDP can auto-increment the address for us. Then write the RAM buffer back to VDP, again leveraging the VDP auto-increment feature. In the extreme, the optimal approach is to read lines 2 to 24 into one big RAM buffer and then write that buffer back to lines 1 to 23.

     One can also improve the address selection code, as was taught to me by others here, by using the fact that registers are in memory. This lets you use the odd address of a register as a source rather than using SWPB twice.

                 li   5,>4000        *scroll
                 li   3,cols
          !loop: swpb 3
                 limi 0
                 movb 3,@vdpwa
                 swpb 3
                 movb 3,@vdpwa
                 movb @vdprd,0
                 swpb 5
                 movb 5,@vdpwa
                 swpb 5
                 movb 5,@vdpwa
                 movb 0,@vdpwd
                 limi 2
                 inc  3
                 inc  5
                 ci   3,cols*rows
                 jne  -!loop
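The line-buffer approach suggested above can be sketched in standard Forth. This is a model only, not TI-99 code: SCREEN is ordinary RAM standing in for the VDP screen image, MOVE stands in for a block VDP read or write, and the names #ROWS, #COLS, SCREEN, LINEBUF, ROW and SCROLL-UP are hypothetical. Blanking the bottom row is an addition of this sketch.

```forth
\ Model only: SCREEN is ordinary RAM standing in for the VDP screen
\ image, so MOVE plays the role of a block VDP read or write.
DECIMAL
24 CONSTANT #ROWS
40 CONSTANT #COLS
CREATE SCREEN   #ROWS #COLS * ALLOT
CREATE LINEBUF  #COLS ALLOT

: ROW ( n -- addr )  #COLS * SCREEN + ;

: SCROLL-UP ( -- )
   #ROWS 1 DO
      I ROW     LINEBUF   #COLS MOVE    \ one address setup, read a row
      LINEBUF   I 1- ROW  #COLS MOVE    \ one address setup, write it back
   LOOP
   #ROWS 1- ROW  #COLS BL FILL ;        \ blank the bottom row
```

On the real machine each MOVE would correspond to one address setup followed by 40 auto-incremented byte transfers, instead of four address bytes sent for every character moved.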
  13. IMHO the Classic99 emulator has faster file i/o but seems very close to real hardware in instruction execution time. When do we get to see the code? :)
  14. This will be the last one. I promise. I woke up this morning realizing (maybe just a little obsessive...) that erasing the character patterns for characters A..G was just as easy with the VFILL operation. Then you can change the patterns of those letters and just print out the letters. DUH!

          HEX
          : RUN
              FF A08 40 2DUP 0 VFILL
              BOUNDS DO  DUP I VC!  9 +LOOP
              BEGIN ." ABCDEFEDCBA" AGAIN ;
  15. Over in another topic Mr @vol was talking about making the interrupt poll the 9901 timer. I thought that was just a splendid idea, so here is a version in Forth. Since Camel99 starts the timer when it boots, I just needed to write the interrupt handler to read the timer. I think this is correct, but man, counting at 21.3 µs per tick is REALLY fast!

          \ Interrupt polled 9901 timer
          NEEDS MOV,    FROM DSK1.LOWTOOLS
          NEEDS INSTALL FROM DSK1.ISRSUPPORT

          DECIMAL
          \ ISR workspace registers
          \ R0,R1  32 bit timer variable
          \ R2,    difference register
          \ R3     temp
          \ R4     previous time reading
          CREATE IWKSP   16 CELLS ALLOT
          IWKSP 16 CELLS 0 FILL

          CODE READ9901
               0 LIMI,
               IWKSP LWPI,
               R2 CLR,
               R12 2 LI,        \ load 9901 Timer CRU address
               -1 SBO,          \ SET bit 0 TO 1, Enter timer mode
               R2 14 STCR,      \ READ TIMER (14 bits)
               -1 SBZ,          \ RESET bit 1, exit timer mode
               2 LIMI,
               R4 R3 MOV,       \ old reading -> temp
               R2 R4 MOV,       \ save this read for next time
               R3 R2 SUB,       \ compute ticks since last read
               R2 ABS,
               R2 R1 ADD,       \ add ticks to timer registers
               OC IF,
                  R0 INC,       \ deal with overflow to make 32bit value
               ENDIF,
               HEX 83E0 LWPI,   \ return to GPL workspace
               RT,
          ENDCODE

          REMOVE-TOOLS

          : T ( -- ) IWKSP 2@ ;     \ read the workspace as memory
          : COLD 0 INSTALL COLD ;   \ disable interrupt before leaving Forth

          ISR' READ9901 INSTALL

          : TEST PAGE BEGIN 10 10 AT-XY T DU. ?TERMINAL UNTIL ;

     9901 ISR TIMER.mp4
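The tick arithmetic behind the 21.3 µs figure can be checked with a short sketch in standard Forth. The assumptions are a 3 MHz clock, one timer tick per 64 clock cycles, and a free-running 14-bit down-counter; the names TICKS>US and ELAPSED are hypothetical, and the masked subtraction is just one possible way to stay correct across a counter wrap, not what the ISR above does.

```forth
\ Sketch only: assumes a 3 MHz clock, one tick per 64 clock cycles,
\ and a free-running 14-bit down-counter. Names are hypothetical.
DECIMAL
: TICKS>US ( ticks -- microseconds )  64 3 */ ;

\ Ticks elapsed between two readings of a down-counter, masked to
\ 14 bits so the result stays correct when the counter wraps.
: ELAPSED ( old new -- ticks )  -  16383 AND ;

3 TICKS>US .        \ prints 64
100 90 ELAPSED .    \ prints 10
5 16380 ELAPSED .   \ prints 9
```

Three ticks come to 64 µs, or about 21.33 µs each. If the timer reloads from a value other than the full 14-bit range, the mask would need to be replaced by arithmetic on the actual period.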