Posts posted by TheBF

  1. Just wondering what people do in other parts of AtariAge and I saw your post.

     

    It could be that the TRS80 expects the sending side to respect the hardware handshake, so that transmission stops while the TRS80 is putting data into memory.

     

    You might make some progress by setting up an RS232 port on a PC and sending text files to the TRS80 with TeraTerm, changing the handshake options to see what your Tandy software responds to.

    It sounds like the software is not using an interrupt on the receive side but rather is polling RS232 with software.

    That is always a problem for high-speed receive. 

     

    I spent way too many hours debugging RS232 30 years ago. :) 

     

    Not very helpful but it's all I got. 

    Best of luck. 

     

     

  2. 6 hours ago, Vorticon said:

    Dang it! That @ symbol always gets me.

    <sidebar>

       @Vorticon I don't know if this helps, but since you have written a considerable number of lines of Forth...

       I think of @ in Assembler like the fetch (@) operator in Forth when it's applied to the <src> operand.

       The analogy doesn't quite work as well when it's the <dst> operand but it might give you a memory aid.

    </sidebar> 

     

    • Like 2
  3. 1 hour ago, Lee Stewart said:

    Forgive me for piling on, but because we ALC programmers know we need to preserve R11 when we start/continue a BL-RT cascade, we usually put the “save return” at the routine’s beginning with the “restore return” at the end just before RT. I would push the link (PUSHL) at the beginning and return after popping the link (RTPL) at the end. Here are the macros for xas99.py:

       .defm PUSHL
           DECT RSP
           MOV  R11,*RSP
       .endm
    
       .defm RTPL
           MOV  *RSP+,R11
           B    *R11
       .endm

     

     

     

    ...lee

     

    A macro that I find handy lets me PUSH or POP any register to/from the stack.

    This can be useful when you just need a few more free registers temporarily and don't want to use a separate workspace. 

    It was a more common style of programming on Intel boxes where we only had AX BX CX DX as general purpose registers

    ( and even CX and DX had special purposes) but it's still a handy tool even on 9900 once you have implemented a stack IMHO.  

     

    So would this be correct to make generic PUSH and POP macros in xas99?
     

    .defm PUSH
           DECT RSP
           MOV #1,*RSP
    .endm
    
    
    .defm POP
           MOV  *RSP+,#1
    .endm

     

  4. 2 minutes ago, Lee Stewart said:

     

    Of course, all of the Forths do this, including yours (CAMEL99 Forth) and mine (fbForth). I do it locally (a second return stack) in the Floating Point Library because I am using a different workspace.

     

    ...lee

    Indeed. Stacks are amazingly handy once you have them. (Case in point: fbForth now has three!)

    I just noticed there still seems to be R11 -> label stuff out there. 

    Perhaps because the E/A manual does it that way? 

     

    Maybe seeing how simple it is will allow new entrants to try a stack. 

    And with the fancy new tools we have today of course a PUSH and POP macro for any register are just a few lines away. :)

     

    Perhaps someone could demonstrate such macros with Ralph's Assembler? 

     

  5. 39 minutes ago, apersson850 said:

    A quickie if you only need to go two levels down is to save the return address in another register. Then you wouldn't save R11 in SUB1, but save it in R12 in SUB2. Then SUB2 can end just by B *R12.

    With my brain this would have to go under the "Don't optimize too soon" file.

    • Like 1
  6. I know this is old hat for the pros out there but for people new to Assembler this might be new.

     

    An alternative is to make a return stack something like this.

    Each subroutine is 2 bytes smaller than the save-to-a-label version, and they all share the same memory locations for saving R11.

    And any subroutine can call any subroutine.  With 16 bytes you can BL 8 levels deep. 

     

    Edit: Corrected per @apersson850  catch. 

     

    * return stack for TI99 Assembly language
              
              DEF START 
    
    RP        EQU  7
    
    RSTACK    BES  16
    
    SUB1    DECT RP 
            MOV R11,*RP 
    *          ....     code goes here 
            MOV *RP+,R11
            B *R11  
    
    SUB2    DECT RP 
            MOV R11,*RP 
    
            BL @SUB1           subroutine calling a subroutine 
    
            MOV *RP+,R11
            B *R11  
    
    START   LI RP,RSTACK        RP is R7; run this in your startup code  
          
            BL @SUB1        
            BL @SUB2            it just works  
            END 
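
    For anyone who wants to watch the mechanism without firing up an emulator, here is a rough Python model of the same idea. It is a sketch only, not 9900 code. The names Cpu, bl and nest are made up for illustration, the stack is counted in 16-bit cells, and the overflow check is an assumption of the model: the assembly above has no such check and would simply write below the buffer.

```python
class Cpu:
    """Toy model: R11 as the link register, RP as the return stack pointer."""

    def __init__(self, stack_bytes=16):
        self.cells = stack_bytes // 2        # 16 bytes = 8 sixteen-bit cells
        self.stack = [0] * self.cells
        self.rp = self.cells                 # like LI RP,RSTACK with BES: just past the buffer
        self.r11 = 0

    def push_r11(self):                      # DECT RP / MOV R11,*RP
        if self.rp == 0:
            raise OverflowError("return stack full (model-only check)")
        self.rp -= 1
        self.stack[self.rp] = self.r11

    def pop_r11(self):                       # MOV *RP+,R11
        self.r11 = self.stack[self.rp]
        self.rp += 1


def bl(cpu, return_addr, sub, *args):
    """BL @sub: put the return address in R11, then run the subroutine."""
    cpu.r11 = return_addr
    sub(cpu, *args)
    return cpu.r11                           # where B *R11 would land


def nest(cpu, levels):
    """A subroutine that calls itself until 'levels' calls are active."""
    cpu.push_r11()                           # save our caller's return address
    if levels > 1:
        bl(cpu, 0x1000 + levels, nest, levels - 1)   # this BL overwrites R11
    cpu.pop_r11()                            # restore it before B *R11


cpu = Cpu(stack_bytes=16)
returned_to = bl(cpu, 0xA000, nest, 8)       # 8 nested levels fit in 16 bytes
print(hex(returned_to), cpu.rp)
```

    Each level overwrites R11 with its own BL, yet the outermost call still comes back to the address it was given, and RP ends up where it started. Asking the model for a ninth level trips the overflow check.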
             

     

    • Like 1
    • Thanks 1
  7. Question: How many Assembly Language coders are using a return stack for nesting sub-routines?

     

    I have been looking at some code and it uses a bunch of memory locations to save R11 when entering  a sub-routine and to restore R11 on exit.

    This uses 4 bytes on entry and 6 bytes on return 

    SAVER1    DATA  0
    SAVER2    DATA  0
    SAVER3    DATA  0 
    
    MYCODE    MOV R11,@SAVER1      4 BYTES
    *             ...
      
              MOV @SAVER1,R11      4 Bytes 
              B *R11               2 bytes 
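
    The byte counts can be tallied with a few lines of Python. This is back-of-envelope arithmetic, not a measurement. It assumes one DATA word per subroutine, as SAVER1 to SAVER3 suggest, and for comparison it uses the return-stack version posted elsewhere in this thread (DECT/MOV on entry, MOV/B on exit, one LI at startup and a 16-byte stack).

```python
# Instruction sizes in bytes on the TMS9900: register and register-indirect
# operands add nothing, a symbolic (@LABEL) or immediate operand adds 2.
LABEL_CODE = 4 + 4 + 2       # MOV R11,@SAVE / MOV @SAVE,R11 / B *R11
LABEL_DATA = 2               # one DATA word per subroutine (assumed)
STACK_CODE = 2 + 2 + 2 + 2   # DECT RP / MOV R11,*RP / MOV *RP+,R11 / B *R11
STACK_SETUP = 4 + 16         # LI RP,RSTACK once, plus a 16-byte stack


def label_total(subs):
    """Bytes spent saving and restoring R11 with one label per subroutine."""
    return subs * (LABEL_CODE + LABEL_DATA)


def stack_total(subs):
    """Bytes spent saving and restoring R11 with a shared return stack."""
    return subs * STACK_CODE + STACK_SETUP


for subs in (1, 3, 5, 10):
    print(subs, label_total(subs), stack_total(subs))
```

    Under those assumptions the two approaches break even at five subroutines (60 bytes each), and the stack version is smaller from there on.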

     

  8. What the heck. On the non-zero probability that someone cares here's my channel :)

     

    This video shows how to link Assembler object files into memory and run them from Camel99 Forth. 

    (Now that I see it after all this time, I should do a follow up on saving Forth executable programs that call object code)

     

     

     

    • Like 3
  9. 3 hours ago, Vorticon said:

    So here's my issue. If I am to LWPI >83E0, I need to save the pointer to my current workspace, something like STWP R1. All good so far. Now how do I get back to the current workspace after the scan routine is run? LWPI requires an immediate address which is in R1, but it won't accept indirect addressing like LWPI *R1 or if I store R1 say at SAVWP, LWPI @SAVWP does not work either...

    Is the PASCAL workspace in a fixed location?

    If so, you can just restore it manually with a follow-up LWPI after you call the ROM routine.

     

    KSCAN  LWPI >83E0           can't change WS with BLWP as R13-R15 are in use
           MOV  R11,@OLDR11     save GPL R11
           BL   @>000E          call keyboard scanning routine
           MOV  @OLDR11,R11     restore GPL R11
           LWPI <PASCWKSP>

     

  10. Heresy of Heresies :) 

     

    Over on Reddit Forth there was a discussion on local variables.

    Stephen Pelc of VFX Forth said that their experiments show as much as 50% slowdown using locals instead of the data stack.

    One of the posts was made by a Forth implementer (Zeptoforth) who said that he has switched to using locals a lot and it has NO effect on the speed of his programs. 

    His thinking is that locals would only be much slower on a compiler that converts stack data into register assignments but on regular Forth compilers it is neutral.

     

    That made me wonder...

    I had one version of "cheap" locals that I knew was not optimal and one that used 9900 index addressing into the return stack. 

    The downside of the indexing version was that you needed to create two names: one to fetch and one to store.

     

    In my past tests the Forth version of BENCHIE using a fast VALUE ran in 24.3 seconds. 

    The non-optimal locals version ran in 48 seconds... but I had never done the test with the better version. 

     

    Turns out that guy on Reddit was correct. In fact the locals version ran a bit faster on Camel99 Forth. :( 

    I think this is because of the 9900 property that fewer instructions is almost always faster.

     

    In this case local fetch is:

    MOV n(RP),TOS 

     

    Store to local is 

    MOV TOS,n(RP)

    That's about as good as it can get.

     

    So it makes me think I could make some kind of defining word that creates a double CFA word.

    By default the local does a fetch from return stack to the data stack.

    Then make a word like TO for values that compiles the store code address when we assign to a local.

     

    Here is the experiment code. 

     

    Spoiler
    \ Benchie.fth from the internet
    
    \ tForth (20 MHz T8): 196 bytes 0.198 sec
    \ iForth (33 MHz '386): 175 bytes 0.115 sec
    \ iForth (40 MHz '486DLC): 172 bytes 0.0588 sec
    \ iForth (66 MHz '486): 172 bytes 0.0323 sec
    \ RTX2000: 89 bytes 0.098 sec (no Headers)
    \ HSF2000 (1.6GHz AMD Sempron) ?? bytes  0.22 secs
    \ 8051 ANS Forth (12 MHz 80C535): 126 bytes 15,8 sec (met uservariabelen)
    \ HSF2000 2014 with a 2.1 Ghz Intel  0.05 seconds.
    \         increased loop size X10   0.16
    
    \ CAMEL99 v2.7    
    \ W/FAST VALUES      24.21
    \ W/locals           24.08 
    \ TurboForth V1.2.1  24.6  (for reference)
    
    NEEDS ELAPSE FROM DSK1.ELAPSE
    NEEDS DUMP   FROM DSK1.TOOLS
    NEEDS VALUE  FROM DSK1.VALUES 
    
    
    HERE
    HEX
    CODE LOCALS ( n --) \ build a stack frame n cells deep
    \ *pushes the original RP onto top of rstack for fast collapse
    \ RP R0 MOV, TOS 1 SLA, TOS RP SUB,   R0 RPUSH,     TOS POP,
      C007 ,    0A14 ,   61C4 ,    0647 , C5C0 ,  C136 ,  NEXT,  
    ENDCODE
    
    CODE /LOCALS  ( -- ) \ collapse stack frame
        C1D7 , NEXT, \ *RP RP MOV, NEXT,
    ENDCODE
    
    \ Local variable compilers make named code words
    : GETTER  ( n --) \ create name that returns a contents of a local
    \           TOS PUSH,  ( n) 2* (RP) TOS MOV,  NEXT,  ;
      CODE     0646 , C584 , C127 , CELLS ,       NEXT,  ;
    
    : SETTER ( n --) \ create name that sets contents of a local
    \      TOS SWAP CELLS (RP) MOV, TOS POP, 
      CODE    C9C4 ,   CELLS ,    C136 ,  NEXT,  ;
    
    : ADDER  ( n -- ) \ defines a local for +! operation
    \      TOS SWAP CELLS (RP) ADD, TOS POP, 
      CODE    A9C4 ,   CELLS ,    C136 ,   NEXT,  ;
    
    \ defines a "setter" and a "getter"   
    : LOCAL:  ( n ) DUP GETTER  SETTER  ;
    
    
    \ conventional BENCHIE 
    HEX
    100 CONSTANT MASK
      5 CONSTANT FIVE
        VALUE BVAR
    : BENCHIE
            MASK 0
            DO       \ locals work inside do loop   
                1
                BEGIN
                  DUP SWAP DUP ROT DROP 1 AND
                  IF FIVE +
                  ELSE 1-
                  THEN TO BVAR
                  BVAR DUP MASK AND
                UNTIL
                DROP
            LOOP
    ;  \ 24.21 seconds 
    
    \ BENCHIE with locals 
    \ create two names. one to fetch, one to store 
    \        fetch   store  
    1 LOCAL: BVAR    BVAR! 
    2 LOCAL: NDX     NDX! 
    
    : BENCHIE2 
            1 LOCALS \ define outside do loop  
            MASK 0
            DO       \ locals work inside do loop   
                1
                BEGIN
                  DUP SWAP DUP ROT DROP 1 AND
                  IF FIVE +
                  ELSE 1-
                  THEN BVAR!
                  BVAR DUP MASK AND
                UNTIL
                DROP
            LOOP
            /LOCALS
    ;  \ 24.08 seconds 
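
    The stack-frame trick and the "fetch by default, TO for store" idea can be modeled in a few lines of Python. This is a sketch under assumptions, not Camel99 code: the return stack is a Python list counted in cells rather than bytes, and the names ReturnStack, Local and to are invented for the example.

```python
class ReturnStack:
    """Toy return stack that grows downward; rp indexes the top cell."""

    def __init__(self, cells=32):
        self.mem = [0] * cells
        self.rp = cells

    def rpush(self, value):
        self.rp -= 1
        self.mem[self.rp] = value

    def enter_locals(self, n):
        """Like LOCALS: reserve n cells, then push the old RP on top."""
        old_rp = self.rp
        self.rp -= n
        self.rpush(old_rp)

    def leave_locals(self):
        """Like /LOCALS: one move restores RP from the saved copy."""
        self.rp = self.mem[self.rp]


class Local:
    """One name with two actions, standing in for the double-CFA idea."""

    def __init__(self, rstack, n):
        self.rstack = rstack
        self.n = n                       # offset in cells from RP

    def __call__(self):                  # default action, like MOV n(RP),TOS
        return self.rstack.mem[self.rstack.rp + self.n]

    def store(self, value):              # store action, like MOV TOS,n(RP)
        self.rstack.mem[self.rstack.rp + self.n] = value


def to(value, local):
    """Stand-in for TO: picks the store action instead of the fetch."""
    local.store(value)


rs = ReturnStack()
rs.rpush(0xBEEF)                         # pretend return address already on the stack
rs.enter_locals(2)
bvar, ndx = Local(rs, 1), Local(rs, 2)
to(5, bvar)
to(9, ndx)
total = bvar() + ndx()
rs.leave_locals()
print(total, rs.rp, hex(rs.mem[rs.rp]))
```

    Both actions are a single indexed access relative to RP, and collapsing the frame is one assignment, which mirrors why the indexed version costs so little on the 9900.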
    

     

     

     

    • Like 2
    • Thanks 1
  11. 19 hours ago, TheMole said:

    Might the idea come from the common assertion that reading from the VDP is incredibly slow (compared to writing to VRAM)? Is that not true either then?

    Unfortunately VDP RAM speed is not that different than using expansion RAM.

    I wrote some tests for my own understanding where I did string manipulation in VDP vs RAM using the same kind of Forth code for both.

    VDP RAM was only about 12% slower than expansion RAM, if I recall correctly.

    The difference would be bigger using only Assembler code. 

  12. To give you some code-size perspective, Mike, on what Lee described: there is an "entry" routine and an "EXIT" in every Forth colon definition.

     

    In memory it looks like this:

     

    <enter> <cfa> <cfa> .... <cfa>  <exit> 

     

    It's not important here to understand the dirty details but you can see how much code runs for each Forth word. 

    (The ALC is my dressed up Forth Assembler to help my feeble brain so it has some "pseudo-instructions" like POP etc)

     

    The <enter> above is the address of a short piece of code but it still takes some time

    l: _enter   IP RPUSH,        \ push IP register onto the return stack
                W IP MOV,        \ move PFA into Forth IP register
               _next JMP,
    

     

    Then Forth's RETURN looks like this. 

    l: _exit        IP RPOP,      \ pop a new IP address off return stack 
    l: _next   *IP+ W  MOV,       \ move CFA into Working register & incr IP
                  *W+  R5 MOV,    \ move contents of CFA to R5 & INCR W
                  *R5  B,         \ branch to the address in R5

     

    CODE words have overhead.  They look like this. 

    <addr_of next_cell> <instruction> ... <instruction> <NEXT>  

    At the end they run NEXT like you would use RT in native ALC, but NEXT is 3 instructions. 

     

    So you can see that if you write a code word with one instruction, like the Forth word +, it still has to run those three instructions in NEXT every time it's finished.

    So that's why indirect threaded Forth, which is what this is called, can be 4 to 10 times slower than pure ALC on short routines. 

    However in a big application it is usually closer to 2 to 3 times slower.
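
    To make the <enter> ... <exit> picture concrete, here is a small Python model of an indirect-threaded inner interpreter. It is only a sketch: memory is a Python list, an address is a list index, and the "machine code" held in a code field is a Python function. None of that comes from Camel99 or fbForth, and the class and method names are invented for the example.

```python
class ThreadedVM:
    def __init__(self):
        self.mem = []            # stands in for RAM; index = address
        self.ds = []             # data stack
        self.rs = []             # return stack
        self.ip = 0              # Forth interpreter pointer
        self.w = 0               # working register
        self.running = False
        self.next_count = 0      # how many times NEXT has run
        self.exit_cfa = self.code(self._exit)
        self.lit_cfa = self.code(self._lit)
        self.bye_cfa = self.code(self._bye)

    def code(self, fn):
        """Define a CODE word: its code field holds the 'machine code'."""
        self.mem.append(fn)
        return len(self.mem) - 1

    def colon(self, *body):
        """Define a colon word: <enter> <cfa> ... <cfa> <exit>."""
        cfa = len(self.mem)
        self.mem.append(self._enter)
        self.mem.extend(body)
        self.mem.append(self.exit_cfa)
        return cfa

    def _enter(self):
        self.rs.append(self.ip)  # push IP onto the return stack
        self.ip = self.w         # W already points at the word's body

    def _exit(self):
        self.ip = self.rs.pop()  # pop the caller's IP

    def _lit(self):
        self.ds.append(self.mem[self.ip])   # next cell is data, not a CFA
        self.ip += 1

    def _bye(self):
        self.running = False

    def next(self):
        self.next_count += 1
        self.w = self.mem[self.ip]   # fetch the CFA and advance IP
        self.ip += 1
        routine = self.mem[self.w]   # fetch the contents of the code field
        self.w += 1                  # W now points past the code field
        routine()                    # branch to it

    def execute(self, cfa):
        start = len(self.mem)
        self.mem.extend([cfa, self.bye_cfa])   # tiny thread: the word, then stop
        self.ip = start
        self.running = True
        while self.running:
            self.next()


vm = ThreadedVM()
dup = vm.code(lambda: vm.ds.append(vm.ds[-1]))
star = vm.code(lambda: vm.ds.append(vm.ds.pop() * vm.ds.pop()))
square = vm.colon(dup, star)                 # : SQUARE  DUP * ;
main = vm.colon(vm.lit_cfa, 7, square)       # : MAIN  7 SQUARE ;
vm.execute(main)
print(vm.ds, vm.next_count)
```

    In this model it takes eight trips through next() to perform one multiply, which is the kind of overhead described above for short routines.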

     

    "Thus endeth the lesson" as the Episcopalians say. :)

     

     (I didn't get this stuff for a long time so that's why I wrote this up for you) 

    • Like 1
  13. Correct. Machine Forth is a compiler like C or Pascal. However since it is compiling from the Forth console you can run the code afterwards like a Forth word.

    All you need is a Forth way to jump into the code (BL *TOS) :) and a word to return to Forth. 

    CODE RUN   ( entry_address -- )  *TOS  BL,  NEXT,  ENDCODE 

     

    After seeing Bill Sullivan's result coding the sieve in Assembler I realized you are never going to get full performance from the 9900 with stack operations.

    ( My ASMForth sieve based on his code with a few tweaks is actually a hair faster. :) ) 

     

    ASMForth is me taking the Machine Forth idea that was for Chuck's CPUs and mutating it into something with a closer fit to 9900 but also using Forth-like syntax.

    ASMForthII uses registers like Assembler but you also have PUSH, POP, RPUSH and RPOP macros so you have the stack. 

     

    Colon definitions automatically push R11 on entry and POP R11 before RT.

     

    The docs are here https://docs.google.com/document/d/1h-qVQeD6_b58DywrzGZphmSSxUNMKBbY7QCByMFKFOM/edit?usp=sharing

    Not the best read. :) 

     

    Here is an ugly HELLO World. ASMForth is a work in progress.  I want to get back closer to normal Forth syntax in future if possible.

    \ tiny hello world in ASMForth II
    \ Translated from hello.c by Tursi for comparison
    
    ASMFORTH
    HEX
    8C02 CONSTANT VDPWA   \ Write Address port 
    8C00 CONSTANT VDPWD   \ Write Data port
    
    \ define the string
    CREATE TXT  S" Hello World!" S,
    
    : VDPADDR 
        TOS ><        \ swap bytes 
        TOS VDPWA C!  \ VDP address LSB character store 
        TOS ><        \ swap bytes 
        TOS VDPWA C!  \ VDP address MSB + "write" bit character store 
        DROP 
    ;
    
    MACRO: EMIT+  ( addr -- addr++)  VDPWD C!  ;MACRO
    
    CODE MAIN 
        0 LIMI,        \ disable interrupts 
    \ set the VDP address to >0000 with write bit set
        4000 # VDPADDR
        TXT # 
        *TOS R0 C!     \ byte count -> R0
        R0 8 RSHIFT 
        R0 1-          \ for loop needs 1 less
        TOS 1+         \ skip past byte count 
        R0 FOR         \ get argument from R0 
           TOS @+ EMIT+  \ @+ is indirect auto-inc.
        NEXT
        DROP
    
        NEXT,
    ENDCODE 
    
    \ usage: PAGE MAIN CR
    
    

     

    Here is the code generated 

       DF00  0300  limi >0000                  (24)
       DF04  0646  dect R6                     (14)
       DF06  C584  mov  R4,*R6                 (30)
       DF08  0204  li   R4,>4000               (20)
       DF0C  06A0  bl   @>dece                 (32)
       DECE  0647  dect R7                     (14)
       DED0  C5CB  mov  R11,*R7                RPush R11
       DED2  06C4  swpb R4                     (14)
       DED4  D804  movb R4,@>8c02              (38)
       DED8  06C4  swpb R4                     (14)
       DEDA  D804  movb R4,@>8c02              (38)
       DEDE  C136  mov  *R6+,R4                (30)
    >  DEE0  C2F7  mov  *R7+,R11
       DEE2  045B  b    *R11
    
       DF10  0646  dect R6                     (14)
       DF12  C584  mov  R4,*R6                 (30)
       DF14  0204  li   R4,>deb2               (20)
       DF18  D014  movb *R4,R0                 (26)
       DF1A  0880  sra  R0,8                   (32)
       DF1C  0600  dec  R0                     (14)
       DF1E  0584  inc  R4                     (14)
    
       DF20  0647  dect R7
    >  DF22  C5CB  mov  R11,*R7        Rpush loop counter R11
       DF24  C2C0  mov  R0,R11         Set R11 loop counter for FOR NEXT
       DF26  D834  movb *R4+,@>8c00
       DF2A  060B  dec  R11            R11 is the return stack CACHE
       DF2C  18FC  joc  >df26
       DF2E  C2F7  mov  *R7+,R11
       DF30  C136  mov  *R6+,R4        DROP refills TOS
       DF32  045A  b    *R10           return to Forth
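
    The two swpb/movb pairs in the listing send the VDP address low byte first, then the high byte with the write bit. A short Python sketch of that byte order follows. It assumes the usual 9918A convention of a 14-bit address with bit 6 of the second byte selecting write mode. The function name is made up, and unlike VDPADDR (which is handed >4000 with the bit already set) it adds the write bit itself.

```python
VDPWA = 0x8C02   # write-address port, as in the listing


def vdp_write_address(addr):
    """Return the two (port, byte) writes that set a VDP write address."""
    if not 0 <= addr <= 0x3FFF:
        raise ValueError("VDP addresses are 14 bits")
    low = addr & 0xFF                 # first byte: low 8 bits of the address
    high = (addr >> 8) | 0x40         # second byte: high 6 bits plus write bit
    return [(VDPWA, low), (VDPWA, high)]


for port, byte in vdp_write_address(0x0000):
    print(hex(port), hex(byte))
```

    For address >0000 this yields >00 followed by >40, the same two bytes the generated code sends from R4.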

     

    • Like 1
  14. I don't know where else to put this so it's going here. 

    I am watching the guys configure GCC for our favorite machine and it has me thinking again about generating efficient 9900 code from Forth (like) syntax.

     

    I heard a talk by Chuck Moore on how he is experimenting using more registers in his current Forth systems.

    I started down that path with ASMForth because registers are how you get performance from a register machine.

     

    In the process I noticed that when you use the raw instructions as Forth primitives you have so much freedom.

    Assuming the top of stack is cached in a register we can do this: 

     

      !   becomes MOV

     +!   becomes ADD

    etc.

     

    Something cool then happens in that you can convert some stack operations into simpler machine code. 

     

    So where DUP in this architecture is: 

        SP DECT,
    TOS *SP MOV, 

     

    DUP  can be replaced with 

    TOS TOS  

     

    SWAP means we reverse the order of the arguments that we feed to the instruction. 

    (I think this will work. It might need a flag somewhere to signal we are swapped)

    TOS *SP 

     

    Where OVER is something like: (notice we DUP first)

                 SP DECT,
            TOS *SP MOV,
         2 (SP) TOS MOV, 

     

    OVER can be replaced by     

    *SP TOS 

     

    since we just want to use the 2nd stack item but not destroy it. 

     

    The phrase "OVER ="  becomes:

     

         *SP TOS CMP, 

     

    This is as far as I got in this line of thought.

    I think ROT might not be so easy to convert but I will explore this as time permits. 

     

    And all of this raises the question: can we just allocate 3 registers for the top of the stack and use logic to remember which item is in which register?

    That makes my head hurt at this stage.
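
    The folding idea can be tried out on paper with a small Python sketch. It is a toy peephole pass over a list of Forth tokens, not part of ASMForth. It covers only DUP and OVER, the two cases stated outright above; SWAP and ROT are left out because they are still open questions. The table and function names are invented.

```python
# Operand pair a stack word contributes when it is folded into the
# instruction that follows it, instead of being compiled on its own.
FOLDED_OPERANDS = {
    "DUP":  ("TOS", "TOS"),   # both operands are the cached top of stack
    "OVER": ("*SP", "TOS"),   # second item is read in place, not copied
}

# Forth operators mapped to Forth-assembler mnemonics.
OPCODES = {
    "+": "ADD,",
    "=": "CMP,",
}


def fold(tokens):
    """Merge 'stack word + operator' pairs into single instructions."""
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in FOLDED_OPERANDS and following in OPCODES:
            src, dst = FOLDED_OPERANDS[word]
            out.append(f"{src} {dst} {OPCODES[following]}")
            i += 2                    # both tokens were consumed
        else:
            out.append(word)          # anything else passes through unchanged
            i += 1
    return out


print(fold(["OVER", "="]))
print(fold(["DUP", "+"]))
print(fold(["DUP", "DROP"]))
```

    One thing the sketch glosses over: a compare sets status bits rather than leaving a flag on the stack, so whatever follows "OVER =" would have to take its answer from the status register.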

     

     

     

    • Like 3
  15. 1 minute ago, khanivore said:

    Yeah, these are RTL printouts, pretty cryptic alright.  I think RTL is based on lisp.  GCC matches the RTL against "predicates" in the machine description file which then emit the opcodes.

    Interesting, LISP. Yes now I see it. 

    Ok so all I need to do is make an RPN version of RTL! 🤣

    • Like 1
    • Haha 1
  16. ; movhi-451
    ; OP0 : (mem/c:HI (plus:HI (reg/f:HI 10 r10)
    ;         (const_int 2 [0x2])) [4 %sfp+2 S2 A16])code=[mem:HI]
    ; OP1 : (reg:HI 2 r2)code=[reg:HI]
    

    As an outside observer and layperson, I am guessing that this is the language used to program what code GCC emits?

    (I am enjoying watching the show. Thanks)

     

    (PS don't tell me Forth is cryptic anymore) :)

     

     

    • Like 1
  17. 14 minutes ago, Lee Stewart said:

    ...which seems an exercise in futility in fbForth because they are already part of the language.

     

    ...lee

    Yes I was puzzling on that myself since you have VMBW, VMBR and the lot of them in the dictionary.

     

    The one thing I toyed with was giving all those VDP words dual access.  The primary version would be a native sub-routine and the Forth words would BL the sub-routines.

    I do this now with a sub-routine to set the VDP address in read or write mode.  I don't expose the name, but you can get the address with "carnal knowledge" of VC! and VC@.

     

    ' VC@ 2 CELLS + CONSTANT RMODE
    ' VC! 2 CELLS + CONSTANT WMODE

     

    I didn't have room in 8K  a few years ago, but I have learned how to be more efficient so it's possible now if I don't expose the sub-routine names for all of them.

    Might give it a go.

     

     

    • Like 3
  18. 3 hours ago, Tursi said:

    I am a big fan of the dimension hopping theory, though. It's so much easier to assume that's what it was, and not a brain storage failure. ;)

    <rant>

    I know you are joking but holy crap is this attitude ever present on Twitter!

    Any excuse at "theories" (sic) to avoid learning about reality.

    </rant>

    Ok I'm good now. :)

     

    • Like 1
    • Haha 1
  19. 2 hours ago, Lee Stewart said:

     

     

    fbForth has its own version of all of the E/A utilities (except for the loader and linker, of course), which are copied to low RAM. If you load the E/A utilities into low RAM (which you can certainly do), you will overwrite those utilities, as well as the Forth block buffers and a host of other low-level stuff fbForth requires to function at all.

     

    ...lee

    This might be a good use for a SAMS page or two, so that you could select the block buffers or the utilities under program control?

    (As long as you don't have to use the utilities on the block buffers) 

  20. 12 hours ago, dhe said:

    DHE's testing service is always open!

    LOL. Thank you. It's good to have someone else to look over my shoulder. 

    I have been on vacation but I am back at it.

    I have a sticky bit.  The 99 text files are limited to 80 characters but if we insert text at the cursor rather than line by line I have to start chopping lines.

    Not sure how best to do that right now but I am noodling on it.

     

     

    • Like 1