Posts posted by TheBF
-
-
6 hours ago, Vorticon said:
Dang it! That @ symbol always gets me.
<sidebar>
@Vorticon I don't know if it helps, but since you have written a considerable number of lines of Forth...
I think of @ in Assembler like the fetch (@) operator in Forth when it's applied to the <src> operand.
The analogy doesn't quite work as well when it's the <dst> operand but it might give you a memory aid.
</sidebar>
-
1 hour ago, Lee Stewart said:
Forgive me for piling on, but because we ALC programmers know we need to preserve R11 when we start/continue a BL-RT cascade, we usually put the “save return” at the routine’s beginning with the “restore return” at the end just before RT. I would push the link (PUSHL) at the beginning and return after popping the link (RTPL) at the end. Here are the macros for xas99.py:
.defm PUSHL
       DECT RSP
       MOV  R11,*RSP
.endm

.defm RTPL
       MOV  *RSP+,R11
       B    *R11
.endm
...lee
A macro that I find handy lets me PUSH or POP any register to/from the stack.
This can be useful when you just need a few more free registers temporarily and don't want to use a separate workspace.
It was a more common style of programming on Intel boxes, where we only had AX, BX, CX and DX as general-purpose registers
(and even CX and DX had special purposes), but it's still a handy tool on the 9900 once you have implemented a stack, IMHO.
So would this be correct to make generic PUSH and POP macros in xas99?
.defm PUSH
       DECT RSP
       MOV  #1,*RSP
.endm

.defm POP
       MOV  *RSP+,#1
.endm
-
2 minutes ago, Lee Stewart said:
Of course, all of the Forths do this, including yours (CAMEL99 Forth) and mine (fbForth). I do it locally (a second return stack) in the Floating Point Library because I am using a different workspace.
...lee
Indeed. Stacks are amazingly handy once you have them. (Case in point: fbForth now has three!)
I just noticed there still seems to be save-R11-to-a-labelled-location code out there.
Perhaps because the E/A manual does it that way?
Maybe seeing how simple it is will allow new entrants to try a stack.
And with the fancy new tools we have today, PUSH and POP macros for any register are of course just a few lines away.
Perhaps someone could demonstrate such macros with Ralph's Assembler?
-
39 minutes ago, apersson850 said:
A quickie if you only need to go two levels down is to save the return address in another register. Then you wouldn't save R11 in SUB1, but save it in R12 in SUB2. Then SUB2 can end just by B *R12.
With my brain this would have to go under the "Don't optimize too soon" file.
-
Thank you @apersson850.
I almost never work in conventional Assembler, as you can see.
Replaced the BSS statement.
-
I know this is old hat for the pros out there but for people new to Assembler this might be new.
An alternative is to make a return stack, something like this.
Each subroutine is 2 bytes smaller, and they all share the same memory locations for saving R11.
And any subroutine can call any subroutine. With 16 bytes you can BL 8 levels deep.
Edit: Corrected per @apersson850 catch.
* return stack for TI99 Assembly language
       DEF  START
RP     EQU  7
RSTACK BES  16

SUB1   DECT RP
       MOV  R11,*RP
* .... code goes here
       MOV  *RP+,R11
       B    *R11

SUB2   DECT RP
       MOV  R11,*RP
       BL   @SUB1          subroutine calling a subroutine
       MOV  *RP+,R11
       B    *R11

START  LI   R7,RSTACK      run this in your startup code
       BL   @SUB1
       BL   @SUB2          it just works
       END
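For anyone more comfortable reading it in a high-level language, here is a small Python model of the same return-stack idea. The names (ReturnStack, sub1, sub2) are my own, purely illustrative: each routine saves its caller's R11 on a shared stack on entry and restores it on exit, so any subroutine can safely call any other.

```python
# Illustrative Python model of the shared return stack above.
# Names (ReturnStack, sub1, sub2) are hypothetical, not from the post.

class ReturnStack:
    def __init__(self, cells=8):
        self.cells = cells        # BES 16 = room for 8 two-byte cells
        self.stack = []

    def push(self, r11):          # DECT RP / MOV R11,*RP
        if len(self.stack) >= self.cells:
            raise OverflowError("return stack overflow")
        self.stack.append(r11)

    def pop(self):                # MOV *RP+,R11
        return self.stack.pop()

def sub1(rs):
    rs.push("sub1-return")        # save caller's R11 on entry
    # ... body of SUB1 ...
    return rs.pop()               # restore R11 and "B *R11"

def sub2(rs):
    rs.push("sub2-return")        # save caller's R11 on entry
    sub1(rs)                      # BL @SUB1: nesting is now safe
    return rs.pop()               # each level gets its own saved R11
```

With 8 cells the model overflows on the ninth nested push, matching the "16 bytes, 8 levels deep" arithmetic above.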
-
Question: How many Assembly Language coders are using a return stack for nesting sub-routines?
I have been looking at some code and it uses a bunch of memory locations to save R11 when entering a sub-routine and to restore R11 on exit.
This uses 4 bytes on entry and 6 bytes on return:
SAVER1 DATA 0
SAVER2 DATA 0
SAVER3 DATA 0

MYCODE MOV  R11,@SAVER1    4 BYTES
* ...
       MOV  @SAVER1,R11    4 bytes
       B    *R11           2 bytes
-
What the heck. On the non-zero probability that someone cares, here's my channel.
This video shows how to link Assembler object files into memory and run them from Camel99 Forth.
(Now that I see it after all this time, I should do a follow up on saving Forth executable programs that call object code)
-
3 hours ago, Vorticon said:
So here's my issue. If I am to LWPI >83E0, I need to save the pointer to my current workspace, something like STWP R1. All good so far. Now how do I get back to the current workspace after the scan routine is run? LWPI requires an immediate address which is in R1, but it won't accept indirect addressing like LWPI *R1 or if I store R1 say at SAVWP, LWPI @SAVWP does not work either...
Is the PASCAL workspace in a fixed location?
If so, you can just restore it manually with a follow-up LWPI after you call the ROM routine.
KSCAN  LWPI >83E0          can't change WS with BLWP as R13-R15 are in use
       MOV  R11,@OLDR11    save GPL R11
       BL   @>000E         call keyboard scanning routine
       MOV  @OLDR11,R11    restore GPL R11
       LWPI <PASCWKSP>
-
1 hour ago, Stuart said:
Need a BL to @000EH, not a BLWP. Need to LWPI the GPL workspace at @83E0H before you make the call I believe. Not sure about the rest of the code ...!
What he said, like the example I gave you.
-
Heresy of Heresies
Over on Reddit Forth there was a discussion on local variables.
Stephen Pelc of VFX Forth said that their experiments show as much as 50% slowdown using locals instead of the data stack.
One of the posts was made by a Forth implementer (Zeptoforth) who said that he has switched to using locals a lot and it has NO effect on the speed of his programs.
His thinking is that locals would only be much slower on a compiler that converts stack data into register assignments but on regular Forth compilers it is neutral.
That made me wonder...
I had one version of "cheap" locals that I knew was not optimal and one that used 9900 index addressing into the return stack.
The downside of the indexing version was that you needed to create two names: one to fetch and one to store.
In my past tests the Forth version of BENCHIE using a fast VALUE ran in 24.3 seconds.
The non-optimal locals version ran in 48 seconds... but I had never done the test with the better version.
Turns out that guy on Reddit was correct. In fact the locals version ran a bit faster on Camel99 Forth.
I think this is because of the 9900 property that fewer instructions are almost always faster.
In this case local fetch is:
MOV n(RP),TOS
Store to local is:
MOV TOS,n(RP)
That's about as good as it can get.
So it makes me think I could make some kind of defining word that creates a double CFA word.
By default the local does a fetch from return stack to the data stack.
Then make a word like TO for values that compiles the store code address when we assign to a local.
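A sketch of that double-action idea in Python (my own model, not Camel99 code): one defining step produces a fetch action and a store action for the same local slot, mirroring MOV n(RP),TOS and MOV TOS,n(RP).

```python
# Hypothetical Python model of index-addressed locals: a frame of n
# cells stands in for the return-stack frame, the "getter" copies
# slot n to the data stack (MOV n(RP),TOS) and the "setter" stores
# the top of the data stack into slot n (MOV TOS,n(RP)).

class Frame:
    def __init__(self, n):
        self.slots = [0] * n        # LOCALS builds an n-cell frame

def make_local(frame, n):
    """Return a (fetch, store) pair for local slot n."""
    def fetch(stack):               # MOV n(RP),TOS
        stack.append(frame.slots[n])
    def store(stack):               # MOV TOS,n(RP)
        frame.slots[n] = stack.pop()
    return fetch, store
```

The point of the model: fetch and store are two separate entry points over the same slot, which is exactly what a double-CFA word would give you with one name.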
Here is the experiment code.
\ Benchie.fth from the internet
\ tForth (20 MHz T8):             196 bytes  0.198 sec
\ iForth (33 MHz '386):           175 bytes  0.115 sec
\ iForth (40 MHz '486DLC):        172 bytes  0.0588 sec
\ iForth (66 MHz '486):           172 bytes  0.0323 sec
\ RTX2000:                         89 bytes  0.098 sec (no headers)
\ HSF2000 (1.6 GHz AMD Sempron):  ?? bytes  0.22 sec
\ 8051 ANS Forth (12 MHz 80C535): 126 bytes  15.8 sec (with user variables)
\ HSF2000 2014 with a 2.1 GHz Intel: 0.05 seconds.
\ increased loop size X10:           0.16
\ CAMEL99 v2.7
\   W/FAST VALUES  24.21
\   W/locals       24.08
\ TurboForth V1.2.1  24.6  (for reference)

NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS DUMP   FROM DSK1.TOOLS
NEEDS VALUE  FROM DSK1.VALUES

HERE
HEX
CODE LOCALS ( n --)  \ build a stack frame n cells deep
\ *pushes the original RP onto top of rstack for fast collapse
\   RP R0 MOV,  TOS 1 SLA,  TOS RP SUB,  R0 RPUSH,  TOS POP,
    C007 , 0A14 , 61C4 , 0647 , C5C0 , C136 ,
    NEXT,
ENDCODE

CODE /LOCALS ( -- )  \ collapse stack frame
\   *RP RP MOV,
    C1D7 ,
    NEXT,
ENDCODE

\ Local variable compilers make named code words
: GETTER ( n --)  \ create name that returns the contents of a local
\   TOS PUSH,  ( n) 2* (RP) TOS MOV,  NEXT,
    CODE 0646 , C584 , C127 , CELLS , NEXT, ;

: SETTER ( n --)  \ create name that sets the contents of a local
\   TOS SWAP CELLS (RP) MOV,  TOS POP,
    CODE C9C4 , CELLS , C136 , NEXT, ;

: ADDER ( n -- )  \ defines a local for +! operation
\   TOS SWAP CELLS (RP) ADD,  TOS POP,
    CODE A9C4 , CELLS , C136 , NEXT, ;

\ defines a "setter" and a "getter"
: LOCAL: ( n ) DUP GETTER SETTER ;

\ conventional BENCHIE
HEX
100 CONSTANT MASK
  5 CONSTANT FIVE
VALUE BVAR

: BENCHIE
    MASK 0
    DO  \ locals work inside do loop
        1
        BEGIN
            DUP SWAP DUP ROT DROP 1 AND
            IF  FIVE +  ELSE  1-  THEN
            TO BVAR  BVAR
            DUP MASK AND
        UNTIL
        DROP
    LOOP ;  \ 24.21 seconds

\ BENCHIE with locals
\ create two names. one to fetch, one to store
\        fetch  store
1 LOCAL: BVAR   BVAR!
2 LOCAL: NDX    NDX!
: BENCHIE2
    1 LOCALS  \ define outside do loop
    MASK 0
    DO  \ locals work inside do loop
        1
        BEGIN
            DUP SWAP DUP ROT DROP 1 AND
            IF  FIVE +  ELSE  1-  THEN
            BVAR!  BVAR
            DUP MASK AND
        UNTIL
        DROP
    LOOP
    /LOCALS ;  \ 24.08 seconds
-
-
19 hours ago, TheMole said:
Might the idea come from the common assertion that reading from the VDP is incredibly slow (compared to writing to VRAM)? Is that not true either then?
Unfortunately, VDP RAM speed is not that different from expansion RAM.
I wrote some tests for my own understanding where I did string manipulation in VDP vs RAM using the same kind of Forth code for both.
If I recall correctly, VDP RAM was only about 12% slower than expansion RAM.
The difference would be bigger using only Assembler code.
-
And to be fair on a sizable piece of code like VMBW those three instructions at the end in the NEXT routine don't mean very much.
The loop takes way more time. So Lee's utilities run at machine speed.
-
To give you some code-size perspective, Mike, on what Lee described: there is an "entry" routine and an "exit" in every Forth colon definition.
In memory it looks like this:
<enter> <cfa> <cfa> .... <cfa> <exit>
It's not important here to understand the dirty details but you can see how much code runs for each Forth word.
(The ALC is my dressed-up Forth Assembler to help my feeble brain, so it has some "pseudo-instructions" like POP etc.)
The <enter> above is the address of a short piece of code, but it still takes some time:
l: _enter  IP RPUSH,   \ push IP register onto the return stack
           W IP MOV,   \ move PFA into Forth IP register
           _next JMP,
Then Forth's RETURN looks like this.
l: _exit   IP RPOP,    \ pop a new IP address off return stack
l: _next   *IP+ W MOV, \ move CFA into working register & incr IP
           *W+ R5 MOV, \ move contents of CFA to R5 & incr W
           *R5 B,      \ branch to the address in R5
CODE words have overhead. They look like this.
<addr_of next_cell> <instruction> ... <instruction> <NEXT>
At the end they run NEXT, like you would use RT in native ALC, but NEXT is 3 instructions.
So you can see that even if you write a code word with one instruction, like the Forth word +, it still has to run those three instructions in NEXT every time it finishes.
So that's why indirect threaded Forth, which is what this is called, can be 4 to 10 times slower than pure ALC on short routines.
However in a big application it is usually closer to 2 to 3 times slower.
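Here is a toy Python model of that dispatch cost (an assumed simplification, not the actual inner interpreter): a colon definition is a list of executable words, and the NEXT loop runs between every one of them, which is why a one-instruction code word still pays the three-instruction overhead.

```python
# Toy model of indirect-threaded dispatch. The loop body stands in
# for the 3-instruction NEXT routine that runs after every word.

def run(thread, stack):
    ip = 0                       # Forth IP register
    while ip < len(thread):      # NEXT: fetch the next "CFA"...
        word = thread[ip]
        ip += 1
        word(stack)              # ...and dispatch to it

def plus(stack):                 # one real operation of work...
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)          # ...but NEXT still runs afterwards
```

run([plus, plus], [1, 2, 3]) leaves [6] on the stack; the point is that the dispatch loop ran twice for two one-operation words.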
"Thus endeth the lesson" as the Episcopalians say.
(I didn't get this stuff for a long time so that's why I wrote this up for you)
-
Correct. Machine Forth is a compiler like C or Pascal. However, since it compiles from the Forth console, you can run the code afterwards like a Forth word.
All you need is a Forth way to jump into the code (BL *TOS) and a word to return to Forth.
CODE RUN ( entry_address -- )
    *TOS BL,
    NEXT,
ENDCODE
After seeing Bill Sullivan's result coding the sieve in Assembler I realized you are never going to get full performance from the 9900 with stack operations.
( My ASMForth sieve based on his code with a few tweaks is actually a hair faster. )
ASMForth is me taking the Machine Forth idea, which was made for Chuck's CPUs, and mutating it into something with a closer fit to the 9900 while still using Forth-like syntax.
ASMForthII uses registers like Assembler but you also have PUSH, POP, RPUSH and RPOP macros so you have the stack.
Colon definitions automatically push R11 on entry and POP R11 before RT.
The docs are here https://docs.google.com/document/d/1h-qVQeD6_b58DywrzGZphmSSxUNMKBbY7QCByMFKFOM/edit?usp=sharing
Not the best read.
Here is an ugly HELLO World. ASMForth is a work in progress. I want to get back closer to normal Forth syntax in the future if possible.
\ tiny hello world in ASMForth II
\ Translated from hello.c by Tursi for comparison
ASMFORTH
HEX
8C02 CONSTANT VDPWA   \ Write Address port
8C00 CONSTANT VDPWD   \ Write Data port

\ define the string
CREATE TXT   S" Hello World!" S,

: VDPADDR
    TOS ><            \ swap bytes
    TOS VDPWA C!      \ VDP address LSB character store
    TOS ><            \ swap bytes
    TOS VDPWA C!      \ VDP address MSB + "write" bit character store
    DROP ;

MACRO: EMIT+ ( addr -- addr++)  VDPWD C!  ;MACRO

CODE MAIN
    0 LIMI,           \ disable interrupts
\ set the VDP address to >0000 with write bit set
    4000 # VDPADDR
    TXT #
    *TOS R0 C!        \ byte count -> R0
    R0 8 RSHIFT
    R0 1-             \ for loop needs 1 less
    TOS 1+            \ skip past byte count
    R0 FOR            \ get argument from R0
        TOS @+ EMIT+  \ @+ is indirect auto-inc.
    NEXT
    DROP
    NEXT,
ENDCODE

\ usage:  PAGE MAIN CR
Here is the code generated
DF00 0300  limi >0000       (24)
DF04 0646  dect R6          (14)
DF06 C584  mov  R4,*R6      (30)
DF08 0204  li   R4,>4000    (20)
DF0C 06A0  bl   @>dece      (32)
DECE 0647  dect R7          (14)
DED0 C5CB  mov  R11,*R7          RPUSH R11
DED2 06C4  swpb R4          (14)
DED4 D804  movb R4,@>8c02   (38)
DED8 06C4  swpb R4          (14)
DEDA D804  movb R4,@>8c02   (38)
DEDE C136  mov  *R6+,R4     (30)
> DEE0 C2F7  mov  *R7+,R11
DEE2 045B  b    *R11
DF10 0646  dect R6          (14)
DF12 C584  mov  R4,*R6      (30)
DF14 0204  li   R4,>deb2    (20)
DF18 D014  movb *R4,R0      (26)
DF1A 0880  sra  R0,8        (32)
DF1C 0600  dec  R0          (14)
DF1E 0584  inc  R4          (14)
DF20 0647  dect R7
> DF22 C5CB  mov  R11,*R7        RPUSH loop counter R11
DF24 C2C0  mov  R0,R11           set R11 loop counter for FOR NEXT
DF26 D834  movb *R4+,@>8c00
DF2A 060B  dec  R11              R11 is the return stack CACHE
DF2C 18FC  joc  >df26
DF2E C2F7  mov  *R7+,R11
DF30 C136  mov  *R6+,R4          DROP refills TOS
DF32 045A  b    *R10             return to Forth
-
I don't know where else to put this so it's going here.
I am watching the guys configure GCC for our favorite machine, and it has me thinking again about generating efficient 9900 code from Forth-like syntax.
I heard a talk by Chuck Moore on how he is experimenting using more registers in his current Forth systems.
I started down that path with ASMForth because registers are how you get performance from a register machine.
In the process I noticed that when you use the raw instructions as Forth primitives you have so much freedom.
Assuming the top of stack is cached in a register we can do this:
! becomes MOV
+! becomes ADD
etc.
Something cool then happens in that you can convert some stack operations into simpler machine code.
So where DUP in this architecture is:
SP INCT, TOS *SP MOV,
DUP can be replaced with
TOS TOS
SWAP means we reverse the order of the arguments that we feed to the instruction.
(I think this will work. It might need a flag somewhere to signal we are swapped)
TOS *SP
Where OVER is something like: (notice we DUP first)
SP INCT, TOS *SP MOV, 2 (SP) TOS MOV,
OVER can be replaced by
*SP TOS
since we just want to use the 2nd stack item but not destroy it.
The phrase "OVER =" becomes:
*SP TOS CMP,
This is as far as I got in this line of thought.
I think ROT might not be so easy to convert but I will explore this as time permits.
And all of this raises the question: can we just allocate 3 registers for the top of the stack and use logic to remember which item is in which register?
That makes my head hurt at this stage.
-
1 minute ago, khanivore said:
Yeah, these are RTL printouts, pretty cryptic alright. I think RTL is based on lisp. GCC matches the RTL against "predicates" in the machine description file which then emit the opcodes.
Interesting, LISP. Yes now I see it.
Ok so all I need to do is make an RPN version of RTL! 🤣
-
; movhi-451
; OP0 : (mem/c:HI (plus:HI (reg/f:HI 10 r10)
;                          (const_int 2 [0x2])) [4 %sfp+2 S2 A16])  code=[mem:HI]
; OP1 : (reg:HI 2 r2)  code=[reg:HI]
As an outside observer and layperson, I am guessing that this is the language used to describe what code GCC emits?
(I am enjoying watching the show. Thanks)
(PS don't tell me Forth is cryptic anymore)
-
14 minutes ago, Lee Stewart said:
...which seems an exercise in futility in fbForth because they are already part of the language.
...lee
Yes, I was puzzling over that myself, since you have VMBW, VMBR and the lot of them in the dictionary.
The one thing I toyed with was giving all those VDP words dual access. The primary version would be a native sub-routine and the Forth words would BL the sub-routines.
I do this now with a sub-routine to set the VDP address in read or write mode. I don't expose the name, but you can get the address with "carnal knowledge" of VC! and VC@.
' VC@ 2 CELLS + CONSTANT RMODE
' VC! 2 CELLS + CONSTANT WMODE
I didn't have room in 8K a few years ago, but I have learned how to be more efficient so it's possible now if I don't expose the sub-routine names for all of them.
Might give it a go.
-
3 hours ago, Tursi said:
I am a big fan of the dimension hopping theory, though. It's so much easier to assume that's what it was, and not a brain storage failure.
<rant>
I know you are joking but holy crap is this attitude ever present on Twitter!
Any excuse for "theories" (sic) to avoid learning about reality.
</rant>
Ok I'm good now.
-
2 hours ago, Lee Stewart said:
fbForth has its own version of all of the E/A utilities (except for the loader and linker, of course), which are copied to low RAM. If you load the E/A utilities into low RAM (which you can certainly do), you will overwrite those utilities, as well as the Forth block buffers and a host of other low-level stuff fbForth requires to function at all.
...lee
This might be a good use for a SAMS page or two, so that you could select the block buffers or the utilities under program control?
(As long as you don't have to use the utilities on the block buffers)
-
Not sure of the best way to do this in fbForth, but the short answer is yes.
You can save up to 8K chunks of RAM as a "program" file and then drop them into memory.
I have some utilities for that.
fbForth has BSAVE and BLOAD, which might do the job.
Consult Lee's excellent docs.
-
12 hours ago, dhe said:
DHE's testing service is always open!
LOL. Thank you. It's good to have someone else to look over my shoulder.
I have been on vacation but I am back at it.
I have a sticky bit: TI-99 text files are limited to 80 characters per line, so if we insert text at the cursor rather than line by line, I have to start chopping lines.
Not sure how best to do that right now, but I am noodling on it.
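For what it's worth, one way to sketch the re-wrap step in Python (a hypothetical helper, not the editor's actual code): insert at the cursor, then chop the affected line back under the 80-character limit.

```python
# Sketch of insert-at-cursor followed by re-wrapping to 80 columns.
# insert_and_rewrap is a made-up name; the 80 here models the
# 80-character record limit of TI-99 text files.

import textwrap

def insert_and_rewrap(lines, row, col, text, width=80):
    """Insert `text` into lines[row] at `col`, then chop that line."""
    joined = lines[row][:col] + text + lines[row][col:]
    wrapped = textwrap.wrap(joined, width=width) or [""]
    return lines[:row] + wrapped + lines[row + 1:]
```

Caveat: textwrap normalizes whitespace at break points, so a real editor would want a stricter splitter; this just shows the shape of the operation.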
TRS-80 Model 1 RS-232 data rates
in Tandy Computers
I was just wondering what people do in other parts of AtariAge, and I saw your post.
It could be that the TRS-80 expects hardware handshaking to be respected by the sending side, so transmission stops while the TRS-80 is putting data into memory.
You might make some progress by setting up RS-232 on a PC, sending text files to the TRS-80 with TeraTerm, and changing the handshake options
to see what your Tandy software responds to. It sounds like the software is not using an interrupt on the receive side, but rather is polling the RS-232 with software.
That is always a problem for high-speed receive.
I spent way too many hours debugging RS-232 thirty years ago.
Not very helpful but it's all I got.
Best of luck.