fbForth fbForth—TI Forth with File-based Block I/O [Post #1 UPDATED: 04/13/2021]

Recommended Posts

I believe the DSRLNK in question starts the scan at 0x1200 [R12 is first set to 0x1100 and 0x0100 is added to R12 before the test starts]  then loops back around to 0x1000 [R12 set to 0x0f00 then 0x0100 is added to R12] , thus skipping the floppy controller in favor of higher CRU bases. This was also done to hit the HFDC and other devices at 0x1000 before the floppy controller.  There are programs where changing the first scanned address was beneficial long ago.  If this is indeed the DSRLNK version I shared long ago, I probably had forgotten about this modification.  Good eagle eye.

Thanks. Nice to know some history.  I have another question.

I see things being saved but then not restored or used from the saved locations.

Any ideas on the thinking behind that?

Share on other sites
10 hours ago, Lee Stewart said:

Spoiler
```
HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
: SQRTUD ( ud -- n )
DUP >R         \ MSW to return stack
IF
2 SRL       \ /4 if MSW > 0
4000 +      \ correct for missing MSW
THEN
>R             \ loop limit to return stack
-1 -1          \ index and root starts
BEGIN
2+ DUP      \ add 2 to root and dup
ROT + DUP   \ root+index and dup
R U<        \ index < limit?
WHILE
SWAP        \ yes..reorder index and root
REPEAT         \ next round
R> DROP DROP   \ drop limit and index
1 SRL          \ correct root (small now..no need for 32-bit arithmetic)
R> IF
1 SLA       \ *2 to correct root for initial /4
THEN
;
DECIMAL```

OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD :

```HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
ASM: SQRTUD   ( ud -- n )
2 @(SP) R1 MOV,   \ LSW to R1 as loop limit
*SP+ R0 MOV,      \ pop MSW to R0
NE IF,            \ MSW not 0?
R1 2 SRL,      \ loop limit /4
R1 4000 AI,    \ add shifted bit from MSW
THEN,
R2 SETO,          \ set running root to -1
R3 SETO,          \ set loop index to -1
BEGIN,            \ root-building loop
R2 INCT,       \ running root+2
R2 R3 A,       \ add running root to index
R3 R1 C,       \ compare index to limit
HE UNTIL,         \ leave loop when logical higher or equal
R2 1 SRL,         \ /2 to correct root
R0 0 CI,          \ MSW 0?
NE IF,            \ no
R2 1 SLA,      \ *2 to correct root for /4
THEN,
R2 *SP MOV,       \ root to stack
;ASM
DECIMAL```

I will post the machine code version later.

...lee

Share on other sites

And—the machine code version of SQRTUD :

```HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
CODE: SQRTUD   ( ud -- n )
C069 0002 C039 1303 0921 0221 4000 0702 0703 05C2
A0C2 8043 1AFC 0912 0280 0000 1301 0A12 C642
;CODE
DECIMAL```

I probably should call this word by a different name because of its specialized nature. It will only work with 1 in the MSW of the double number input.

...lee

Share on other sites

For those wanting to test the beta version of build 13, here is fbForth 2.0:A13

Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual bank (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right?

...lee

Share on other sites
7 hours ago, Lee Stewart said:

For those wanting to test the beta version of build 13, here is fbForth 2.0:A13

Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right?

...lee

Not a clue.

Share on other sites

Oooo. K. Haha. Great work

Share on other sites

I've been waiting for "that shoe to drop". Great stuff!

• 1
• 1

Share on other sites
On 9/22/2020 at 10:01 PM, Lee Stewart said:

OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD :

Spoiler
```
HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
ASM: SQRTUD   ( ud -- n )
2 @(SP) R1 MOV,   \ LSW to R1 as loop limit
*SP+ R0 MOV,      \ pop MSW to R0
NE IF,            \ MSW not 0?
R1 2 SRL,      \ loop limit /4
R1 4000 AI,    \ add shifted bit from MSW
THEN,
R2 SETO,          \ set running root to -1
R3 SETO,          \ set loop index to -1
BEGIN,            \ root-building loop
R2 INCT,       \ running root+2
R2 R3 A,       \ add running root to index
R3 R1 C,       \ compare index to limit
HE UNTIL,         \ leave loop when logical higher or equal
R2 1 SRL,         \ /2 to correct root
R0 0 CI,          \ MSW 0?
NE IF,            \ no
R2 1 SLA,      \ *2 to correct root for /4
THEN,
R2 *SP MOV,       \ root to stack
;ASM
DECIMAL```

I just could not let it go, so here is a full (well, almost full) 32-bit version of SQRTUD in Forth Assembler:

Spoiler
```HEX
\ 0 <= ud <= 0xFFFE0001 (4294836225 = 65535^2)
\ Registers:   R0,R1 = udh,udl (loop limit)
\              R2,R3 = idxh,idxl
\              R4,R5 = rooth,rootl (& opening 0 test)
\              R6    = loop flag
ASM: SQRTUD   ( ud -- n )

*SP+ R0 MOV,      \ pop udh to R0
*SP R1 MOV,       \ udl to R1
EQ IF,            \ udl = 0?
R0 R5 MOV,     \ udh = 0, also?
THEN,
\ this will work if at least one of R0, R1 <> 0
NE IF,
R0 FFFE CI,       \ check for max
HE IF,
R0 FFFE LI,    \ too high or at max..
R1 0001 LI,    \ ..force max
THEN,
R2 CLR,           \ set loop..
R3 CLR,           \ ..index idx to 0
R4 CLR,           \ set running..
R5 0001 LI,       \ ..root to 1
BEGIN,            \ root-building loop
R5 INCT,       \ root+2
OC IF,
R4 INC,     \ correct rooth
THEN,
R5 R3 A,       \ add rootl to idxl
OC IF,
R2 INC,     \ correct idxh
THEN,
R4 R2 A,       \ add rooth to idxh
\ compare index to limit
R6 CLR,        \ assume false
R2 R0 C,       \ compare idxh & udh
H IF,
R6 SETO,    \ idxh > udh. set true
ELSE,
EQ IF,
\ equal - check low words
R3 R1 C,      \ compare idxl & udl
HE IF,
R6 SETO,   \ idxl >= udl. set true
THEN,
THEN,
THEN,
R6 INC,        \ if 0, was true, else, false
EQ UNTIL,         \ leave loop when logical higher or equal
R5 1 SRL,         \ /2 to correct rootl
R4 1 SRL,         \ /2 to correct rooth
OC IF,
R5 8000 AI,    \ add LSb shifted out of rooth to MSb of rootl
THEN,
THEN,
R5 *SP MOV,       \ root to stack
;ASM
DECIMAL```

I said, “almost full 32-bit,” because to get back a 16-bit answer, the maximum square is limited to 4,294,836,225 (0xFFFE0001). It is very fast except when the MSW is very large. The maximum square takes 4 seconds! If anyone cares, I will post a machine code version tomorrow—might, anyway.

...lee

Share on other sites

I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole.

Nice work by you.

• 1
• 1

Share on other sites

Lee,

Here is something you could try to compare your new and old versions.

Spoiler
```\ Sieve of Erathosenes for FbForth

1 CONSTANT 1
2 CONSTANT 2

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < IF ." Out of memory"  ABORT THEN ;

: SEEPRIMES ( -- )
CR ." Primes: "
2 DO
I HERE + [email protected] 0= IF I . THEN
?TERMINAL IF ." Primes halted" ABORT THEN
LOOP ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
?MEM
CR ." Running..."
HERE OVER ERASE
1 0 HERE + C!       \ mark as prime like 'C' version
1 1 HERE + C!
2                  \ start at 2
BEGIN
OVER OVER DUP * >
WHILE
DUP HERE + [email protected]  0=
IF  OVER OVER DUP *
DO
1 I HERE + C!
DUP +LOOP
THEN
1+
REPEAT
CR ." Complete."
CR
DROP
CR ." Press ENTER to see primes:" KEY 13 =
IF   SEEPRIMES   THEN
;
```

Share on other sites
2 hours ago, TheBF said:

I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole.

Nice work by you.

’Twas a bit of a rabbit hole, indeed. The final penny to drop was needing to accommodate the “on carry” test for the first pass through the loop. I could not see anything wrong with my code, but it was going off into the weeds. It finally dawned on me that, in the very first pass for both the running root and the loop index, the carry bit was getting set (-1 to +1 and -1 to 0, respectively).

...lee

Share on other sites
1 hour ago, TheBF said:

Lee,  Here is something you could try to compare your new and old versions.

Reveal hidden contents
```
\ Sieve of Erathosenes for FbForth

1 CONSTANT 1
2 CONSTANT 2

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < IF ." Out of memory"  ABORT THEN ;

: SEEPRIMES ( -- )
CR ." Primes: "
2 DO
I HERE + [email protected] 0= IF I . THEN
?TERMINAL IF ." Primes halted" ABORT THEN
LOOP ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
?MEM
CR ." Running..."
HERE OVER ERASE
1 0 HERE + C!       \ mark as prime like 'C' version
1 1 HERE + C!
2                  \ start at 2
BEGIN
OVER OVER DUP * >
WHILE
DUP HERE + [email protected]  0=
IF  OVER OVER DUP *
DO
1 I HERE + C!
DUP +LOOP
THEN
1+
REPEAT
CR ." Complete."
CR
DROP
CR ." Press ENTER to see primes:" KEY 13 =
IF   SEEPRIMES   THEN
;
```

Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects.

...lee

Share on other sites
13 hours ago, Lee Stewart said:

The maximum square takes 4 seconds!

This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much.

Share on other sites
21 minutes ago, Tursi said:

This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much.

Like this:

Share on other sites
1 hour ago, Lee Stewart said:

Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects.

...lee

I wondered if there is any performance difference between the old version and the new version.

Share on other sites
1 minute ago, TheBF said:

I wondered if there is any performance difference between the old version and the new version.

There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it.

...lee

Share on other sites
1 minute ago, Lee Stewart said:

There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it.

...lee

If you mean OVER OVER, maybe you have 2DUP.  ?

Share on other sites
1 minute ago, TheBF said:

If you mean OVER OVER, maybe you have 2DUP.  ?

No, I mean double-word math, testing for carry, etc.

...lee

Share on other sites
Just now, Lee Stewart said:

No, I mean double-word math, testing for carry, etc.

...lee

Sorry we are cross paths.  I was referring to the sieve benchmark.  Didn't make it clear.

Share on other sites
4 minutes ago, TheBF said:

Sorry we are cross paths.  I was referring to the sieve benchmark.  Didn't make it clear.

Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost.

...lee

Share on other sites
1 minute ago, Lee Stewart said:

Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost.

...lee

Yes. Running the sieve benchmark on both versions of FbForth.

Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth.

Share on other sites
1 minute ago, TheBF said:

Yes. Running the sieve benchmark on both versions of FbForth.

Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth.

Yeah, I will try it with each version, with and without operation of the Forth ISR.

...lee

Share on other sites

Per @atrax27407’s request, here is the RPK file for the beta A13 build:

...lee

Share on other sites

I have been AWOL for a while working on my version of DSR Link code.  In the process I found two small things that can be "improved" in insanemultitasker version that was

my starting place.

1.  Removing a MOV and a SRL  by using the empty R6 register to read and copy into the NAMBUF

Oops.  This was an mistake.

2.  Moving the init R2 to >4000 outside of the scanning loop. It is invariant in the loop so no need to set it every time.

This is still valid

I put the changes into the TI ASM code along with comments of how I understood this thing.

This code is not tested and as noted I don't use the conventional Assembler much anymore so treat this as faulty syntax but the changes have been shown to work in my Forth Assembler version.

This stuff is probably common knowledge for many but perhaps the comments will be helpful for another newbie somewhere down the road.

🙁

Share on other sites

Lee we had discussed creating faster dictionary searches using a hashing method like PolyForth.  I found this book by Dr. Ting called inside Forth83, which is a fantastic little Forth for DOS by Laxen & Perry.

It used a 4 way hash on the dictionary which is described herein.

Here is the link to the book: http://www.forth.org/OffeteStore/1003_InsideF83.pdf

Page 73 shows real code on how they Laxen and Perry did it.

It is quite amazing what happens when this is applied.  HsForth for DOS which I use to for my cross-compiler had an add-on that hashed the dictionary and compile times were almost 10X faster.

I don't use it these days because it takes extra dictionary memory but it was great for 4.7MHz PCs.

In this case we could expect something on the order or 3..4 times I suspect.

I am working on how I could make this work with Camel Forth. Not sure I can fit it all in an 8K kernel, but I am playing with concepts in situ.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.