TheBF

January 16

Just wondering what people do in other parts of AtariAge, and I saw your post.

It could be that the TRS-80 expects the sending side to respect hardware handshaking, so that transmission stops while the TRS-80 is putting data into memory.

You might make some progress by setting up an RS232 port on a PC, sending text files to the TRS-80 with TeraTerm and changing the handshake options to see what your Tandy software responds to. It sounds like the software is not using an interrupt on the receive side but is instead polling the RS232 port in software.

That is always a problem for high-speed receive.
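A toy model shows why polling without handshaking drops characters. This Python sketch is not specific to the TRS-80 or any real UART: it assumes a one-byte receive register, a receiver that stops polling for `busy_time` after every `busy_every` bytes it stores, and made-up timing numbers. With handshaking on, the sender simply waits until the receiver is ready again.

```python
def simulate(num_bytes, byte_time, busy_every, busy_time, handshake):
    """Return (received, lost) for a polled receiver with a 1-byte receive register."""
    received = lost = 0
    stored_since_busy = 0
    t = 0.0              # arrival time of the next byte
    busy_until = 0.0     # receiver is storing data (not polling) until this time
    pending = False      # a byte is waiting in the receive register
    for _ in range(num_bytes):
        if handshake:
            t = max(t, busy_until)   # sender holds off while the receiver is busy
        if t >= busy_until:
            # Polling again: read any byte left over, then the one arriving now.
            n = 2 if pending else 1
            received += n
            stored_since_busy += n
            pending = False
            if stored_since_busy >= busy_every:
                busy_until = t + busy_time
                stored_since_busy = 0
        else:
            if pending:
                lost += 1            # overrun: the new byte overwrites the old one
            pending = True
        t += byte_time
    if pending:
        received += 1                # picked up at the next poll
    return received, lost


print(simulate(64, 1.0, 8, 3.5, handshake=False))
print(simulate(64, 1.0, 8, 3.5, handshake=True))    # (64, 0)
```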

I spent way too many hours debugging RS232 30 years ago.

Not very helpful but it's all I got.

Best of luck.

January 15

6 hours ago, Vorticon said:

Dang it! That @ symbol always gets me.

@Vorticon I don't know if this helps, but since you have written a considerable number of lines of Forth...

I think of @ in Assembler like the fetch (@) operator in Forth when it's applied to the <src> operand.

The analogy doesn't quite work as well when it's the <dst> operand but it might give you a memory aid.

</sidebar>

January 15

1 hour ago, Lee Stewart said:
Forgive me for piling on, but because we ALC programmers know we need to preserve R11 when we start/continue a BL-RT cascade, we usually put the “save return” at the routine’s beginning with the “restore return” at the end just before RT, I would push the link (PUSHL) at the beginning and return after popping the link (RTPL) at the end. Here are the macros for xas99.py:
   .defm PUSHL
       DECT RSP
       MOV  R11,*RSP
   .endm

   .defm RTPL
       MOV  *RSP+,R11
       B    *R11
   .endm
 
...lee

A macro that I find handy lets me PUSH or POP any register to/from the stack.

This can be useful when you just need a few more free registers temporarily and don't want to use a separate workspace.

It was a more common style of programming on Intel boxes, where we only had AX, BX, CX and DX as general purpose registers (and even CX and DX had special purposes), but it's still a handy tool on the 9900 once you have implemented a stack, IMHO.

So would this be the correct way to make generic PUSH and POP macros in xas99?

   .defm PUSH
       DECT RSP
       MOV  #1,*RSP
   .endm

   .defm POP
       MOV  *RSP+,#1
   .endm
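To check the macros' behaviour, here is a small Python model of the two instruction sequences on a 9900-style, big-endian, byte-addressed memory. Using R7 as RSP and >2100 as the empty-stack address are assumptions for the example only. The point it shows is that PUSH pre-decrements (DECT, then MOV to *RSP) and POP post-increments (MOV *RSP+), so registers must be popped in the reverse order they were pushed.

```python
class Mini9900:
    """Just enough of a TMS9900 to model the PUSH/POP macros."""

    def __init__(self, rsp):
        self.mem = bytearray(0x10000)   # 64K bytes, words stored big-endian
        self.r = [0] * 16               # R0..R15
        self.rsp = rsp                  # register number used as RSP (assumed)

    def read_word(self, addr):
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def write_word(self, addr, value):
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def push(self, reg):
        """PUSH macro: DECT RSP / MOV #1,*RSP"""
        self.r[self.rsp] = (self.r[self.rsp] - 2) & 0xFFFF
        self.write_word(self.r[self.rsp], self.r[reg])

    def pop(self, reg):
        """POP macro: MOV *RSP+,#1"""
        self.r[reg] = self.read_word(self.r[self.rsp])
        self.r[self.rsp] = (self.r[self.rsp] + 2) & 0xFFFF


cpu = Mini9900(rsp=7)
cpu.r[7] = 0x2100                   # hypothetical empty-stack address
cpu.r[0], cpu.r[1] = 0x1234, 0xBEEF
cpu.push(0)
cpu.push(1)
cpu.r[0] = cpu.r[1] = 0             # registers are now free for other work
cpu.pop(1)                          # pop in reverse order
cpu.pop(0)
print(hex(cpu.r[0]), hex(cpu.r[1]), hex(cpu.r[7]))   # 0x1234 0xbeef 0x2100
```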

January 14

2 minutes ago, Lee Stewart said:

Of course, all of the Forths do this, including yours (CAMEL99 Forth) and mine (fbForth). I do it locally (a second return stack) in the Floating Point Library because I am using a different workspace.

...lee

Indeed. Stacks are amazingly handy once you have them (case in point: fbForth now has three!).

I just noticed there still seems to be a lot of "save R11 to a label" code out there.

Perhaps because the E/A manual does it that way?

Maybe seeing how simple a stack is will encourage newcomers to try one.

And with the fancy new tools we have today, PUSH and POP macros for any register are of course just a few lines away.

Perhaps someone could demonstrate such macros with Ralph's Assembler?

January 14

39 minutes ago, apersson850 said:

A quickie if you only need to go two levels down is to save the return address in another register. Then you wouldn't save R11 in SUB1, but save it in R12 in SUB2. Then SUB2 can end just by B *R12.

With my brain this would have to go under the "Don't optimize too soon" file.

January 14

Thank you @apersson850.

I almost never work on conventional Assembler as you can see.

Replaced the BSS statement.

January 14

I know this is old hat for the pros out there but for people new to Assembler this might be new.

An alternative is to make a return stack, something like this.

The code is 2 bytes smaller in each subroutine, and all the subroutines share the same memory for saving R11.

And any subroutine can call any other subroutine. With 16 bytes you can nest BL calls 8 levels deep.

Edit: Corrected per @apersson850 catch.

* return stack for TI99 Assembly language

          DEF  START

RP        EQU  7               return stack pointer register

RSTACK    BES  16              RSTACK labels the END of the 16 bytes

SUB1      DECT RP              push R11
          MOV  R11,*RP
*         ....                 code goes here
          MOV  *RP+,R11        pop R11
          B    *R11

SUB2      DECT RP
          MOV  R11,*RP

          BL   @SUB1           subroutine calling a subroutine

          MOV  *RP+,R11
          B    *R11

START     LI   RP,RSTACK       run this in your startup code

          BL   @SUB1
          BL   @SUB2           it just works
          JMP  $               park here (replace with your own exit)
          END
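Here is a Python model of the same return stack, assuming (hypothetically) that `RSTACK` ends up at >A010. `BES` gives the label the address just past the reserved block, so the stack grows downward from there. The real 9900 code has no overflow check; the model raises an error only to show that a ninth nested `BL` would run past the 16 bytes.

```python
RSTACK_BYTES = 16                   # RSTACK BES 16
RSTACK = 0xA010                     # hypothetical: the address BES gives the label
RSTACK_LOW = RSTACK - RSTACK_BYTES  # first byte of the reserved block


class ReturnStack:
    def __init__(self):
        self.mem = {}
        self.rp = RSTACK                # LI RP,RSTACK

    def save_r11(self, r11):
        """DECT RP / MOV R11,*RP"""
        self.rp -= 2
        if self.rp < RSTACK_LOW:        # the real code has no such check
            raise OverflowError("return stack overflow")
        self.mem[self.rp] = r11

    def restore_r11(self):
        """MOV *RP+,R11 (B *R11 then returns to this address)"""
        r11 = self.mem[self.rp]
        self.rp += 2
        return r11


def nest(rs, depth):
    """BL `depth` levels deep; return the R11 values recovered on the way out."""
    if depth == 0:
        return []
    rs.save_r11(0x6000 + 2 * depth)     # made-up return address for this level
    inner = nest(rs, depth - 1)
    return inner + [rs.restore_r11()]


rs = ReturnStack()
print([hex(a) for a in nest(rs, 8)])    # 8 levels fit in 16 bytes
print(rs.rp == RSTACK)                  # True: the stack is balanced again
```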

January 14

Question: How many Assembly Language coders are using a return stack for nesting sub-routines?

I have been looking at some code that uses a bunch of memory locations to save R11 when entering a subroutine and to restore R11 on exit.

This uses 4 bytes on entry and 6 bytes on return, plus a 2-byte DATA word for every subroutine.

SAVER1    DATA 0               2 bytes each
SAVER2    DATA 0
SAVER3    DATA 0

MYCODE    MOV  R11,@SAVER1     4 bytes
*         ...

          MOV  @SAVER1,R11     4 bytes
          B    *R11            2 bytes
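The byte counts above can be tallied against the return stack version in a few lines of Python. This is a rough accounting that counts only the R11 save/restore code and the `DATA` words and, for the stack, the 16-byte `RSTACK` plus one 4-byte `LI RP,RSTACK`. Under those assumptions the two approaches break even at five subroutines.

```python
# Bytes per subroutine spent on preserving R11 (TMS9900 instruction sizes).
SAVE_TO_LABEL = {
    "SAVERn DATA 0": 2,      # one save word per subroutine
    "MOV R11,@SAVERn": 4,    # opcode word + address word
    "MOV @SAVERn,R11": 4,
    "B *R11": 2,
}
RETURN_STACK = {
    "DECT RP": 2,
    "MOV R11,*RP": 2,
    "MOV *RP+,R11": 2,
    "B *R11": 2,
}
STACK_SETUP = 16 + 4         # RSTACK BES 16, plus LI RP,RSTACK once


def total_bytes(subroutines):
    """Return (label method, return stack method) totals in bytes."""
    labels = subroutines * sum(SAVE_TO_LABEL.values())
    stack = subroutines * sum(RETURN_STACK.values()) + STACK_SETUP
    return labels, stack


for n in (1, 5, 10, 20):
    print(n, total_bytes(n))
```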

January 14

What the heck. On the non-zero chance that someone cares, here's my channel.

This video shows how to link Assembler object files into memory and run them from Camel99 Forth.

(Now that I see it after all this time, I should do a follow up on saving Forth executable programs that call object code)

January 13

3 hours ago, Vorticon said:

So here's my issue. If I am to LWPI >83E0, I need to save the pointer to my current workspace, something like STWP R1. All good so far. Now how do I get back to the current workspace after the scan routine is run? LWPI requires an immediate address which is in R1, but it won't accept indirect addressing like LWPI *R1 or if I store R1 say at SAVWP, LWPI @SAVWP does not work either...

Is the PASCAL workspace in a fixed location?

If so, you can just restore it manually with a follow-up LWPI after you call the ROM routine.

KSCAN  LWPI >83E0           can't change WS with BLWP as R13-R15 are in use
       MOV  R11,@OLDR11     save GPL R11
       BL   @>000E          call keyboard scanning routine
       MOV  @OLDR11,R11     restore GPL R11
       LWPI <PASCWKSP>

January 13

1 hour ago, Stuart said:

Need a BL to @000EH, not a BLWP. Need to LWPI the GPL workspace at @83E0H before you make the call I believe. Not sure about the rest of the code ...!

What he said, like the example I gave you.

January 12

Heresy of Heresies

Over on the Forth subreddit there was a discussion about local variables.

Stephen Pelc of VFX Forth said that their experiments show as much as 50% slowdown using locals instead of the data stack.

One of the posts was made by a Forth implementer (Zeptoforth) who said that he has switched to using locals a lot and it has NO effect on the speed of his programs.

His thinking is that locals would only be much slower on a compiler that converts stack data into register assignments but on regular Forth compilers it is neutral.

That made me wonder...

I had one version of "cheap" locals that I knew was not optimal and one that used 9900 index addressing into the return stack.

The downside of the indexing version was that you needed to create two names: one to fetch and one to store.

In my past tests the Forth version of BENCHIE using a fast VALUE ran in 24.3 seconds.

The non-optimal locals version ran in 48 seconds... but I had never done the test with the better version.

Turns out the guy on Reddit was correct. In fact, the locals version ran a bit faster on Camel99 Forth.

I think this is because of the 9900 property that fewer instructions is almost always faster.

In this case local fetch is:

MOV n(RP),TOS

Store to local is

MOV TOS,n(RP)

That's about as good as it can get.

So it makes me think I could make some kind of defining word that creates a double CFA word.

By default the local does a fetch from return stack to the data stack.

Then make a word like TO (as used for VALUEs) that compiles the store code address when we assign to a local.
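One way to picture the "double CFA" idea is a local with two actions, where `TO` picks the second one at compile time so nothing has to be tested at run time. This Python sketch only models that dispatch; `Local`, `compile_words` and `run` are hypothetical names, not Camel99 words, and a Python list stands in for the stack frame.

```python
class Local:
    """A local with two actions: fetch (the default) and store (selected by TO)."""

    def __init__(self, index):
        self.index = index

    def fetch(self, frame, stack):
        stack.append(frame[self.index])

    def store(self, frame, stack):
        frame[self.index] = stack.pop()


def compile_words(source, locals_):
    """Turn a space-separated phrase into a list of actions."""
    code = []
    tokens = iter(source.split())
    for tok in tokens:
        if tok == "TO":
            code.append(locals_[next(tokens)].store)   # compile the "store CFA"
        elif tok in locals_:
            code.append(locals_[tok].fetch)           # compile the default "fetch CFA"
        else:
            n = int(tok)
            code.append(lambda frame, stack, n=n: stack.append(n))
    return code


def run(code, frame, stack):
    for action in code:
        action(frame, stack)
    return stack


locals_ = {"X": Local(0), "Y": Local(1)}
code = compile_words("5 TO X 7 TO Y X Y X", locals_)
print(run(code, [0, 0], []))   # [5, 7, 5]
```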

Here is the experiment code.


\ Benchie.fth from the internet

\ tForth (20 MHz T8): 196 bytes 0.198 sec
\ iForth (33 MHz '386): 175 bytes 0.115 sec
\ iForth (40 MHz '486DLC): 172 bytes 0.0588 sec
\ iForth (66 MHz '486): 172 bytes 0.0323 sec
\ RTX2000: 89 bytes 0.098 sec (no Headers)
\ HSF2000 (1.6GHz AMD Sempron) ?? bytes  0.22 secs
\ 8051 ANS Forth (12 MHz 80C535): 126 bytes 15,8 sec (met uservariabelen)
\ HSF2000 2014 with a 2.1 Ghz Intel  0.05 seconds.
\         increased loop size X10   0.16

\ CAMEL99 v2.7    
\ W/FAST VALUES      24.21
\ W/locals           24.08 
\ TurboForth V1.2.1  24.6  (for reference)

NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS DUMP   FROM DSK1.TOOLS
NEEDS VALUE  FROM DSK1.VALUES 


HERE
HEX
CODE LOCALS ( n --) \ build a stack frame n cells deep
\ *pushes the original RP onto top of rstack for fast collapse
\ RP R0 MOV, TOS 1 SLA, TOS RP SUB,   R0 RPUSH,     TOS POP,
  C007 ,    0A14 ,   61C4 ,    0647 , C5C0 ,  C136 ,  NEXT,  
ENDCODE

CODE /LOCALS  ( -- ) \ collapse stack frame
    C1D7 , NEXT, \ *RP RP MOV, NEXT,
ENDCODE

\ Local variable compilers make named code words
: GETTER  ( n --) \ create name that returns the contents of a local
\           TOS PUSH,  ( n) 2* (RP) TOS MOV,  NEXT,  ;
  CODE     0646 , C584 , C127 , CELLS ,       NEXT,  ;

: SETTER ( n --) \ create name that sets contents of a local
\      TOS SWAP CELLS (RP) MOV, TOS POP, 
  CODE    C9C4 ,   CELLS ,    C136 ,  NEXT,  ;

: ADDER  ( n -- ) \ defines a local for +! operation
\      TOS SWAP CELLS (RP) ADD, TOS POP, 
  CODE    A9C4 ,   CELLS ,    C136 ,   NEXT,  ;

\ defines a "setter" and a "getter"   
: LOCAL:  ( n ) DUP GETTER  SETTER  ;


\ conventional BENCHIE 
HEX
100 CONSTANT MASK
  5 CONSTANT FIVE
  0 VALUE BVAR
: BENCHIE
        MASK 0
        DO       \ locals work inside do loop   
            1
            BEGIN
              DUP SWAP DUP ROT DROP 1 AND
              IF FIVE +
              ELSE 1-
              THEN TO BVAR
              BVAR DUP MASK AND
            UNTIL
            DROP
        LOOP
;  \ 24.21 seconds 

\ BENCHIE with locals 
\ create two names. one to fetch, one to store 
\        fetch   store  
1 LOCAL: BVAR    BVAR! 
2 LOCAL: NDX     NDX! 

: BENCHIE2 
        1 LOCALS \ define outside do loop  
        MASK 0
        DO       \ locals work inside do loop   
            1
            BEGIN
              DUP SWAP DUP ROT DROP 1 AND
              IF FIVE +
              ELSE 1-
              THEN BVAR!
              BVAR DUP MASK AND
            UNTIL
            DROP
        LOOP
        /LOCALS
;  \ 24.08 seconds
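Reading the hex opcodes back (C007 = `MOV R7,R0`, 61C4 = `S R4,R7`, C5C0 = `MOV R0,*R7`, C127 = `MOV @n(R7),R4`, C9C4 = `MOV R4,@n(R7)`), the frame that `LOCALS` builds can be modelled in Python as below. The starting RP value is an assumption, and the model ignores anything else on the return stack, such as DO loop parameters. Local 1 lands at 2(RP), just above the saved RP, which is why the locals are numbered from 1.

```python
CELL = 2


class RStack:
    """Byte-addressed model of the return stack as LOCALS uses it."""

    def __init__(self, rp=0x3FFE):      # hypothetical starting RP
        self.mem = {}
        self.rp = rp

    def locals_frame(self, n):          # n LOCALS
        r0 = self.rp                    # RP R0 MOV,
        self.rp -= n * CELL             # TOS 1 SLA,  TOS RP SUB,
        self.rp -= CELL                 # R0 RPUSH,  (DECT RP / MOV R0,*RP)
        self.mem[self.rp] = r0

    def get(self, n):                   # getter: n CELLS (RP) TOS MOV,
        return self.mem.get(self.rp + n * CELL, 0)

    def set(self, n, value):            # setter: TOS n CELLS (RP) MOV,
        self.mem[self.rp + n * CELL] = value & 0xFFFF

    def add(self, n, value):            # adder: TOS n CELLS (RP) ADD,
        self.set(n, self.get(n) + value)

    def end_locals(self):               # /LOCALS: *RP RP MOV,
        self.rp = self.mem[self.rp]


rs = RStack()
rs.locals_frame(2)
rs.set(1, 10)
rs.set(2, 20)
rs.add(1, 5)
print(rs.get(1), rs.get(2))     # 15 20
rs.end_locals()
print(hex(rs.rp))               # 0x3ffe
```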

January 10

26 minutes ago, SteveB said:

Why didn't we get 64k words = 128 kbyte by design? The A0 line looks useless this way...

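The CPU can only address 32K words: the address bus is A0..A14 (15 bits). The lowest bit of a 16-bit byte address only selects the even or odd byte inside the CPU; it never appears on the bus.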
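The arithmetic, as a quick Python check: 15 address lines give 2^15 words of 2 bytes each, and a 16-bit byte address maps onto the bus by dropping its lowest bit. The function name is just for illustration.

```python
ADDRESS_LINES = 15                # A0..A14
WORDS = 2 ** ADDRESS_LINES        # 32768 words
BYTES = 2 * WORDS                 # 65536 bytes


def bus_word_address(byte_address):
    """Value on A0..A14 for a 16-bit byte address (the lowest bit stays inside the CPU)."""
    return (byte_address & 0xFFFF) >> 1


print(WORDS, BYTES)                                                   # 32768 65536
print(hex(bus_word_address(0x8C00)), hex(bus_word_address(0x8C01)))   # 0x4600 0x4600
```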

January 10

19 hours ago, TheMole said:

Might the idea come from the common assertion that reading from the VDP is incredibly slow (compared to writing to VRAM)? Is that not true either then?

In practice, VDP RAM is not that much slower than expansion RAM.

I wrote some tests for my own understanding where I did string manipulation in VDP RAM vs expansion RAM, using the same kind of Forth code for both.

If I recall correctly, VDP RAM was only about 12% slower than expansion RAM.

The difference would be bigger with pure Assembler code.

January 10

And to be fair, on a sizable piece of code like VMBW, those three instructions in the NEXT routine at the end don't mean very much.

The loop takes way more time. So Lee's utilities run at machine speed.

January 10

To give you some perspective on code size, Mike: as Lee described, there is an "enter" routine and an "exit" in every Forth colon definition.

In memory it looks like this:

<enter> <cfa> <cfa> .... <cfa>  <exit>

It's not important here to understand the dirty details but you can see how much code runs for each Forth word.

(The ALC is my dressed up Forth Assembler to help my feeble brain so it has some "pseudo-instructions" like POP etc)

The <enter> above is the address of a short piece of code, but it still takes some time to run:

l: _enter   IP RPUSH,        \ push IP register onto the return stack
            W IP MOV,        \ move PFA into Forth IP register
           _next JMP,

Then Forth's RETURN looks like this.

l: _exit        IP RPOP,      \ pop a new IP address off the return stack 
l: _next   *IP+ W  MOV,       \ move CFA into Working register & incr IP
              *W+  R5 MOV,    \ move contents of CFA to R5 & INCR W
              *R5  B,         \ branch to the address in R5

CODE words have overhead too. They look like this:

<addr_of next_cell> <instruction> ... <instruction> <NEXT>

At the end they run NEXT, the way you would use RT in native ALC, but NEXT is 3 instructions.

So if you write a code word with one instruction, like the Forth word +, it still has to run those three instructions in NEXT every time it finishes.

So that's why indirect threaded Forth, which is what this is called, can be 4 to 10 times slower than pure ALC on short routines.

However in a big application it is usually closer to 2 to 3 times slower.
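To make the ENTER/EXIT/NEXT flow concrete, here is a toy indirect-threaded inner interpreter in Python. It is a model, not Camel99's code: Python callables stand in for machine code addresses, and memory is a list of cells. Running `QUAD` (two nested `DOUBLE`s, each `DUP +`) executes only four primitives but runs NEXT nine times, which is the kind of overhead described above.

```python
class ITC:
    """A toy indirect-threaded Forth inner interpreter."""

    def __init__(self):
        self.mem = []          # cell-addressed dictionary
        self.stack = []        # data stack
        self.rstack = []       # return stack
        self.ip = None         # Forth instruction pointer
        self.w = None          # working register: CFA being executed
        self.nexts = 0         # how many times NEXT ran
        self.EXIT = self.code(self._exit)

    def code(self, routine):
        """CODE word: its CFA cell points straight at 'machine code'."""
        self.mem.append(routine)
        return len(self.mem) - 1

    def colon(self, *cfas):
        """Colon word: <enter> <cfa> ... <cfa> <exit>"""
        cfa = len(self.mem)
        self.mem.append(self._enter)
        self.mem.extend(cfas)
        self.mem.append(self.EXIT)
        return cfa

    def _enter(self):
        self.rstack.append(self.ip)    # IP RPUSH,
        self.ip = self.w + 1           # W IP MOV,  (the PFA follows the CFA cell)

    def _exit(self):
        self.ip = self.rstack.pop()    # IP RPOP,

    def execute(self, cfa):
        self.w = cfa
        self.mem[cfa]()                # run the word's code
        while self.ip is not None:     # NEXT until the outermost EXIT
            self.nexts += 1
            self.w = self.mem[self.ip]  # *IP+ W MOV,
            self.ip += 1
            self.mem[self.w]()          # *W+ R5 MOV,  *R5 B,


f = ITC()
DUP = f.code(lambda: f.stack.append(f.stack[-1]))
PLUS = f.code(lambda: f.stack.append(f.stack.pop() + f.stack.pop()))
DOUBLE = f.colon(DUP, PLUS)
QUAD = f.colon(DOUBLE, DOUBLE)
f.stack.append(3)
f.execute(QUAD)
print(f.stack, f.nexts)   # [12] 9
```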

"Thus endeth the lesson" as the Episcopalians say.

(I didn't get this stuff for a long time so that's why I wrote this up for you)

January 10

Correct. Machine Forth is a compiler like C or Pascal. However since it is compiling from the Forth console you can run the code afterwards like a Forth word.

All you need is a Forth way to jump into the code (BL *TOS) and a word to return to Forth.

CODE RUN   ( entry_address -- )  *TOS  BL,  NEXT,  ENDCODE

After seeing Bill Sullivan's result coding the sieve in Assembler I realized you are never going to get full performance from the 9900 with stack operations.

( My ASMForth sieve based on his code with a few tweaks is actually a hair faster. )

ASMForth is me taking the Machine Forth idea that was made for Chuck's CPUs and mutating it into something that fits the 9900 more closely while still using Forth-like syntax.

ASMForth II uses registers like Assembler, but you also have PUSH, POP, RPUSH and RPOP macros so you have the stack.

Colon definitions automatically push R11 on entry and pop R11 before RT.

The docs are here https://docs.google.com/document/d/1h-qVQeD6_b58DywrzGZphmSSxUNMKBbY7QCByMFKFOM/edit?usp=sharing

Not the best read.

Here is an ugly HELLO World. ASMForth is a work in progress. I want to get back closer to normal Forth syntax in future if possible.

\ tiny hello world in ASMForth II
\ Translated from hello.c by Tursi for comparison

ASMFORTH
HEX
8C02 CONSTANT VDPWA   \ Write Address port 
8C00 CONSTANT VDPWD   \ Write Data port

\ define the string
CREATE TXT  S" Hello World!" S,

: VDPADDR 
    TOS ><        \ swap bytes 
    TOS VDPWA C!  \ VDP address LSB character store 
    TOS ><        \ swap bytes 
    TOS VDPWA C!  \ VDP address MSB + "write" bit character store 
    DROP 
;

MACRO: EMIT+  ( addr -- addr++)  VDPWD C!  ;MACRO

CODE MAIN 
    0 LIMI,        \ disable interrupts 
\ set the VDP address to >0000 with write bit set
    4000 # VDPADDR
    TXT # 
    *TOS R0 C!     \ byte count -> R0
    R0 8 RSHIFT 
    R0 1-          \ for loop needs 1 less
    TOS 1+         \ skip past byte count 
    R0 FOR         \ get argument from R0 
       TOS @+ EMIT+  \ @+ is indirect auto-inc.
    NEXT
    DROP

    NEXT,
ENDCODE 

\ usage: PAGE MAIN CR

Here is the code generated

   DF00  0300  limi >0000                  (24)
   DF04  0646  dect R6                     (14)
   DF06  C584  mov  R4,*R6                 (30)
   DF08  0204  li   R4,>4000               (20)
   DF0C  06A0  bl   @>dece                 (32)
   DECE  0647  dect R7                     (14)
   DED0  C5CB  mov  R11,*R7                RPush R11
   DED2  06C4  swpb R4                     (14)
   DED4  D804  movb R4,@>8c02              (38)
   DED8  06C4  swpb R4                     (14)
   DEDA  D804  movb R4,@>8c02              (38)
   DEDE  C136  mov  *R6+,R4                (30)
>  DEE0  C2F7  mov  *R7+,R11
   DEE2  045B  b    *R11

   DF10  0646  dect R6                     (14)
   DF12  C584  mov  R4,*R6                 (30)
   DF14  0204  li   R4,>deb2               (20)
   DF18  D014  movb *R4,R0                 (26)
   DF1A  0880  sra  R0,8                   (32)
   DF1C  0600  dec  R0                     (14)
   DF1E  0584  inc  R4                     (14)

   DF20  0647  dect R7
>  DF22  C5CB  mov  R11,*R7        Rpush loop counter R11
   DF24  C2C0  mov  R0,R11         Set R11 loop counter for FOR NEXT
   DF26  D834  movb *R4+,@>8c00
   DF2A  060B  dec  R11            R11 is the return stack CACHE
   DF2C  18FC  joc  >df26
   DF2E  C2F7  mov  *R7+,R11
   DF30  C136  mov  *R6+,R4        DROP refills TOS
   DF32  045A  b    *R10           return to Forth
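The generated code can be cross-checked with a small Python model of the `SWPB`/`MOVB` address write and the `DEC`/`JOC` loop. It assumes that `MOVB` from a register sends the register's high byte, and that `DEC` sets the carry bit whenever the register was non-zero before decrementing (no borrow), which is why the loop count is set one less. The function names exist only in this model.

```python
def swpb(x):
    return ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)


def movb_out(x):
    """MOVB Rn,@port sends the register's high byte."""
    return (x >> 8) & 0xFF


def vdpaddr(tos):
    """Bytes written to VDPWA by VDPADDR: SWPB / MOVB / SWPB / MOVB."""
    tos = swpb(tos)
    first = movb_out(tos)       # low byte of the address
    tos = swpb(tos)
    second = movb_out(tos)      # high byte, including the >40 write bit
    return [first, second]


def for_next(r0, body):
    """R0 -> R11, then body / DEC R11 / JOC: runs r0 + 1 times."""
    r11 = r0 & 0xFFFF
    while True:
        body()
        carry = r11 != 0            # DEC sets carry unless it borrows past zero
        r11 = (r11 - 1) & 0xFFFF
        if not carry:               # JOC falls through when carry is clear
            break


TXT = bytes([12]) + b"Hello World!"    # S" Hello World!" S, (counted string)


def main():
    vdpwa = vdpaddr(0x4000)
    vdpwd = []
    r0 = TXT[0] << 8                    # MOVB *R4,R0 puts the count in the high byte
    r0 >>= 8                            # SRA R0,8 (count < 128, so no sign bits)
    r0 -= 1                             # DEC R0: FOR needs one less
    chars = iter(TXT[1:])               # INC R4 skips the count byte
    for_next(r0, lambda: vdpwd.append(next(chars)))   # MOVB *R4+,@VDPWD
    return vdpwa, bytes(vdpwd)


print(main())   # ([0, 64], b'Hello World!')
```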

January 9

I don't know where else to put this so it's going here.

I am watching the guys adapt GCC to our favorite machine, and it has me thinking again about generating efficient 9900 code from Forth(-like) syntax.

I heard a talk by Chuck Moore on how he is experimenting using more registers in his current Forth systems.

I started down that path with ASMForth because registers are how you get performance from a register machine.

In the process I noticed that when you use the raw instructions as Forth primitives you have so much freedom.

Assuming the top of stack is cached in a register we can do this:

! becomes MOV

+! becomes A (add)

etc.

Something cool then happens in that you can convert some stack operations into simpler machine code.

So where DUP in this architecture is:

    SP DECT,
TOS *SP MOV,

DUP can be replaced with

TOS TOS

SWAP means we reverse the order of the arguments that we feed to the instruction.

(I think this will work. It might need a flag somewhere to signal we are swapped)

TOS *SP

Where OVER is something like: (notice we DUP first)

             SP DECT,
        TOS *SP MOV,
     2 (SP) TOS MOV,

OVER can be replaced by

*SP TOS

since we just want to use the 2nd stack item but not destroy it.

The phrase "OVER =" becomes:

     *SP TOS CMP,

This is as far as I got in this line of thought.

I think ROT might not be so easy to convert but I will explore this as time permits.

And all of this raises the question: can we just allocate 3 registers for the top of the stack and use logic to remember which item is in which register?

That makes my head hurt at this stage.
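A table-driven Python sketch of the idea for the cases above: two-word phrases that can use the second stack item in place become a single instruction, and everything else falls back to the conventional code. It assumes R4 is TOS and R6 is the data stack pointer (as in the ASMForth listing), handles only the handful of phrases shown, and leaves out flag generation for `=`.

```python
# Two-word phrases that collapse into one instruction: the second stack item
# is used as an operand (*R6) instead of being copied into a register first.
FUSED = {
    ("DUP", "+"): ["A R4,R4"],        # TOS TOS ADD,
    ("OVER", "+"): ["A *R6,R4"],      # *SP TOS ADD,  ( a b -- a a+b )
    ("OVER", "="): ["C *R6,R4"],      # *SP TOS CMP,  status bits only
}

# Stand-alone words, compiled the conventional way.
PLAIN = {
    "DUP": ["DECT R6", "MOV R4,*R6"],
    "OVER": ["DECT R6", "MOV R4,*R6", "MOV @2(R6),R4"],
    "+": ["A *R6+,R4"],               # add NOS to TOS and pop NOS
}


def translate(words):
    """Translate a list of Forth words into 9900 assembler lines."""
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in FUSED:
            out += FUSED[pair]
            i += 2
        else:
            out += PLAIN[words[i]]
            i += 1
    return out


print(translate(["OVER", "+", "+"]))   # ['A *R6,R4', 'A *R6+,R4']
```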

January 9

1 minute ago, khanivore said:

Yeah, these are RTL printouts, pretty cryptic alright. I think RTL is based on lisp. GCC matches the RTL against "predicates" in the machine description file which then emit the opcodes.

Interesting, Lisp. Yes, now I see it.

Ok so all I need to do is make an RPN version of RTL! 🤣

January 9

; movhi-451
; OP0 : (mem/c:HI (plus:HI (reg/f:HI 10 r10)
;         (const_int 2 [0x2])) [4 %sfp+2 S2 A16])code=[mem:HI]
; OP1 : (reg:HI 2 r2)code=[reg:HI]

As an outside observer and lay person, I am guessing this is the language used to describe what code GCC emits?

(I am enjoying watching the show. Thanks)

(PS don't tell me Forth is cryptic anymore)

January 9

14 minutes ago, Lee Stewart said:

...which seems an exercise in futility in fbForth because they are already part of the language.

...lee

Yes, I was puzzling over that myself, since you have VMBW, VMBR and the lot of them in the dictionary.

The one thing I toyed with was giving all those VDP words dual access. The primary version would be a native sub-routine and the Forth words would BL the sub-routines.

I do this now with a sub-routine to set the VDP address in read or write mode. I don't expose the name, but you can get the address with "carnal knowledge" of VC! and VC@.

' VC@ 2 CELLS + CONSTANT RMODE
' VC! 2 CELLS + CONSTANT WMODE

I didn't have room in 8K a few years ago, but I have learned how to be more efficient so it's possible now if I don't expose the sub-routine names for all of them.

Might give it a go.

January 9

3 hours ago, Tursi said:

I am a big fan of the dimension hopping theory, though. It's so much easier to assume that's what it was, and not a brain storage failure.

<rant>

I know you are joking, but holy crap is this attitude ever present on Twitter!

Any excuse for "theories" (sic) to avoid learning about reality.

</rant>

Ok I'm good now.

January 9

2 hours ago, Lee Stewart said:

fbForth has its own version of all of the E/A utilities (except for the loader and linker, of course), which are copied to low RAM. If you load the E/A utilities into low RAM (which you can certainly do), you will overwrite those utilities, as well as the Forth block buffers and a host of other low-level stuff fbForth requires to function at all.

...lee

This might be a good use for a SAMS page or two, so that you could select the block buffers or the utilities under program control?

(As long as you don't have to use the utilities on the block buffers)

January 8

Not sure of the best way to do this in fbForth, but the short answer is yes.

You can save chunks of RAM of up to 8K as a "program" file and then load them back into memory.

I have some utilities for that.

fbForth has BSAVE and BLOAD, which might do the job.

Consult Lee's excellent docs.

January 8

12 hours ago, dhe said:

DHE's testing service is always open!

LOL. Thank you. It's good to have someone else to look over my shoulder.

I have been on vacation but I am back at it.

I have a sticky bit: 99 text files are limited to 80 characters per line, but if we insert text at the cursor rather than line by line, I have to start chopping lines.

Not sure how best to do that right now but I am noodling on it.
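One naive approach, sketched in Python: splice the text in at the cursor, then chop the resulting line into 80-character records. `insert_at_cursor` is a hypothetical helper; it does no word wrapping, so a real editor would probably want to break at a space instead.

```python
def insert_at_cursor(lines, row, col, text, width=80):
    """Insert text into lines[row] at col, then chop that line into records of at most width chars."""
    merged = lines[row][:col] + text + lines[row][col:]
    pieces = [merged[i:i + width] for i in range(0, len(merged), width)] or [""]
    return lines[:row] + pieces + lines[row + 1:]


doc = ["A" * 70, "next line"]
new_doc = insert_at_cursor(doc, 0, 10, "B" * 20)
print([len(line) for line in new_doc])   # [80, 10, 9]
```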


Posts posted by TheBF