Jump to content
Lee Stewart

fbForth—TI Forth with File-based Block I/O [Post #1 UPDATED: 11/16/2019]

Recommended Posts

6 hours ago, InsaneMultitasker said:

I believe the DSRLNK in question starts the scan at 0x1200 [R12 is first set to 0x1100 and 0x0100 is added to R12 before the test starts]  then loops back around to 0x1000 [R12 set to 0x0f00 then 0x0100 is added to R12] , thus skipping the floppy controller in favor of higher CRU bases. This was also done to hit the HFDC and other devices at 0x1000 before the floppy controller.  There are programs where changing the first scanned address was beneficial long ago.  If this is indeed the DSRLNK version I shared long ago, I probably had forgotten about this modification.  Good eagle eye. 

 

Thanks. Nice to know some history.  I have another question.

I see things being saved but then not restored or used from the saved locations.

Any ideas on the thinking behind that?

 

 

Share this post


Link to post
Share on other sites
10 hours ago, Lee Stewart said:

 

Spoiler

HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
: SQRTUD ( ud -- n ) 
   DUP >R         \ MSW to return stack
   IF
      2 SRL       \ /4 if MSW > 0
      4000 +      \ correct for missing MSW
   THEN
   >R             \ loop limit to return stack
   -1 -1          \ index and root starts
   BEGIN        
      2+ DUP      \ add 2 to root and dup
      ROT + DUP   \ root+index and dup
      R U<        \ index < limit?
   WHILE        
      SWAP        \ yes..reorder index and root
   REPEAT         \ next round
   R> DROP DROP   \ drop limit and index
   1 SRL          \ correct root (small now..no need for 32-bit arithmetic)
   R> IF
      1 SLA       \ *2 to correct root for initial /4
   THEN
;
DECIMAL

 

 

 

OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD :

HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
ASM: SQRTUD   ( ud -- n )
   2 @(SP) R1 MOV,   \ LSW to R1 as loop limit
   *SP+ R0 MOV,      \ pop MSW to R0
   NE IF,            \ MSW not 0?
      R1 2 SRL,      \ loop limit /4
      R1 4000 AI,    \ add shifted bit from MSW
   THEN,
   R2 SETO,          \ set running root to -1
   R3 SETO,          \ set loop index to -1
   BEGIN,            \ root-building loop
      R2 INCT,       \ running root+2
      R2 R3 A,       \ add running root to index
      R3 R1 C,       \ compare index to limit
   HE UNTIL,         \ leave loop when logical higher or equal
   R2 1 SRL,         \ /2 to correct root
   R0 0 CI,          \ MSW 0?
   NE IF,            \ no
      R2 1 SLA,      \ *2 to correct root for /4
   THEN,
   R2 *SP MOV,       \ root to stack
;ASM
DECIMAL

I will post the machine code version later.

 

...lee

Share this post


Link to post
Share on other sites

And—the machine code version of SQRTUD :

HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
CODE: SQRTUD   ( ud -- n )
   C069 0002 C039 1303 0921 0221 4000 0702 0703 05C2
   A0C2 8043 1AFC 0912 0280 0000 1301 0A12 C642 
;CODE   
DECIMAL

 

I probably should call this word by a different name because of its specialized nature. It will only work with 1 in the MSW of the double number input.

 

...lee

Share this post


Link to post
Share on other sites

For those wanting to test the beta version of build 13, here is fbForth 2.0:A13

 

fbForth200_A13_20200922.zip

 

Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual bank (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right? :ponder:

 

...lee

Share this post


Link to post
Share on other sites
7 hours ago, Lee Stewart said:

For those wanting to test the beta version of build 13, here is fbForth 2.0:A13

 

fbForth200_A13_20200922.zip 85.74 kB · 0 downloads

 

Included are inverted (ends in “9.bin”), non-inverted (ends in “8.bin”) and individual (end in “b0.bin” – “b3.bin”) binaries, as well as the current FBLOCKS file. The individual binaries are bank-switched in inverted order, so, if you need to assemble your own composite binary, the programmed (inverted) order is b0, b1, b2, b3. Reversing this order to b3, b2, b1, b0, makes the composite binary, effectively, non-inverted—clear as mud, right? :ponder:

 

...lee

Not a clue. :) 

Share this post


Link to post
Share on other sites
On 9/22/2020 at 10:01 PM, Lee Stewart said:

OK—here is the Forth Assembler version of the above unsigned integer square root word, SQRTUD :

Spoiler

HEX
\ 0 <= ud <= 0x1FFFF (13171)
\ d = sqrt(d^2) = sqrt(4*d'^2) = 2*d'
ASM: SQRTUD   ( ud -- n )
   2 @(SP) R1 MOV,   \ LSW to R1 as loop limit
   *SP+ R0 MOV,      \ pop MSW to R0
   NE IF,            \ MSW not 0?
      R1 2 SRL,      \ loop limit /4
      R1 4000 AI,    \ add shifted bit from MSW
   THEN,
   R2 SETO,          \ set running root to -1
   R3 SETO,          \ set loop index to -1
   BEGIN,            \ root-building loop
      R2 INCT,       \ running root+2
      R2 R3 A,       \ add running root to index
      R3 R1 C,       \ compare index to limit
   HE UNTIL,         \ leave loop when logical higher or equal
   R2 1 SRL,         \ /2 to correct root
   R0 0 CI,          \ MSW 0?
   NE IF,            \ no
      R2 1 SLA,      \ *2 to correct root for /4
   THEN,
   R2 *SP MOV,       \ root to stack
;ASM
DECIMAL

 

 

 

I just could not let it go, so here is a full (well, almost full) 32-bit version of SQRTUD in Forth Assembler:

Spoiler
HEX
\ 0 <= ud <= 0xFFFE0001 (4294836225 = 65535^2)
\ Registers:   R0,R1 = udh,udl (loop limit)
\              R2,R3 = idxh,idxl
\              R4,R5 = rooth,rootl (& opening 0 test)
\              R6    = loop flag
ASM: SQRTUD   ( ud -- n )
   
   *SP+ R0 MOV,      \ pop udh to R0
   *SP R1 MOV,       \ udl to R1
   EQ IF,            \ udl = 0?
      R0 R5 MOV,     \ udh = 0, also?
   THEN,
   \ this will work if at least one of R0, R1 <> 0
   NE IF,
      R0 FFFE CI,       \ check for max
      HE IF,            
         R0 FFFE LI,    \ too high or at max..
         R1 0001 LI,    \ ..force max
      THEN,
      R2 CLR,           \ set loop..
      R3 CLR,           \ ..index idx to 0
      R4 CLR,           \ set running..
      R5 0001 LI,       \ ..root to 1
      BEGIN,            \ root-building loop
         R5 INCT,       \ root+2
         OC IF,
            R4 INC,     \ correct rooth
         THEN,
         R5 R3 A,       \ add rootl to idxl
         OC IF,
            R2 INC,     \ correct idxh
         THEN,
         R4 R2 A,       \ add rooth to idxh
         \ compare index to limit
         R6 CLR,        \ assume false
         R2 R0 C,       \ compare idxh & udh
         H IF,
            R6 SETO,    \ idxh > udh. set true
         ELSE,
            EQ IF,
               \ equal - check low words
               R3 R1 C,      \ compare idxl & udl
               HE IF,
                  R6 SETO,   \ idxl >= udl. set true
               THEN,
            THEN,
         THEN,
         R6 INC,        \ if 0, was true, else, false
      EQ UNTIL,         \ leave loop when logical higher or equal
      R5 1 SRL,         \ /2 to correct rootl
      R4 1 SRL,         \ /2 to correct rooth
      OC IF,
         R5 8000 AI,    \ add LSb shifted out of rooth to MSb of rootl
      THEN,
   THEN,
   R5 *SP MOV,       \ root to stack
;ASM
DECIMAL

 

 

I said, “almost full 32-bit,” because to get back a 16-bit answer, the maximum square is limited to 4,294,836,225 (0xFFFE0001). It is very fast except when the MSW is very large. The maximum square takes 4 seconds! If anyone cares, I will post a machine code version tomorrow—might, anyway. |:)

 

...lee

  • Like 2

Share this post


Link to post
Share on other sites

I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole. :)  

Nice work by you.

  • Like 1
  • Thanks 1

Share this post


Link to post
Share on other sites

Lee,

 

Here is something you could try to compare your new and old versions.

 

 

Spoiler
\ Sieve of Erathosenes for FbForth

1 CONSTANT 1
2 CONSTANT 2

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < IF ." Out of memory"  ABORT THEN ;

: SEEPRIMES ( -- )
        CR ." Primes: "
        2 DO
            I HERE + [email protected] 0= IF I . THEN
            ?TERMINAL IF ." Primes halted" ABORT THEN
        LOOP ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
        ?MEM
        CR ." Running..."
        HERE OVER ERASE
        1 0 HERE + C!       \ mark as prime like 'C' version
        1 1 HERE + C!
        2                  \ start at 2
        BEGIN
           OVER OVER DUP * >
        WHILE
           DUP HERE + [email protected]  0=
           IF  OVER OVER DUP *
               DO
                  1 I HERE + C!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        CR ." Press ENTER to see primes:" KEY 13 =
        IF   SEEPRIMES   THEN
;

 

 

Share this post


Link to post
Share on other sites
2 hours ago, TheBF said:

I was going to modify the one I had to use 32 bit operators but knew it was a rabbit hole. :)  

Nice work by you.

 

’Twas a bit of a rabbit hole, indeed. The final penny to drop was needing to accommodate the “on carry” test for the first pass through the loop. I could not see anything wrong with my code, but it was going off into the weeds. It finally dawned on me that, in the very first pass for both the running root and the loop index, the carry bit was getting set (-1 to +1 and -1 to 0, respectively).

 

...lee

Share this post


Link to post
Share on other sites
1 hour ago, TheBF said:

Lee,  Here is something you could try to compare your new and old versions.

  Reveal hidden contents

\ Sieve of Erathosenes for FbForth

1 CONSTANT 1
2 CONSTANT 2

HEX
: FREEMEM  ( -- n) FF00 HERE - ;
: ?MEM     ( n -- )  FREEMEM OVER < IF ." Out of memory"  ABORT THEN ;

: SEEPRIMES ( -- )
        CR ." Primes: "
        2 DO
            I HERE + [email protected] 0= IF I . THEN
            ?TERMINAL IF ." Primes halted" ABORT THEN
        LOOP ;

\ byte array uses unallocated memory at HERE
DECIMAL
: PRIMES ( n -- )
        ?MEM
        CR ." Running..."
        HERE OVER ERASE
        1 0 HERE + C!       \ mark as prime like 'C' version
        1 1 HERE + C!
        2                  \ start at 2
        BEGIN
           OVER OVER DUP * >
        WHILE
           DUP HERE + [email protected]  0=
           IF  OVER OVER DUP *
               DO
                  1 I HERE + C!
               DUP +LOOP
           THEN
           1+
        REPEAT
        CR ." Complete."
        CR
        DROP
        CR ." Press ENTER to see primes:" KEY 13 =
        IF   SEEPRIMES   THEN
;

 

 

 

Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects.

 

...lee

  • Like 1

Share this post


Link to post
Share on other sites
13 hours ago, Lee Stewart said:

The maximum square takes 4 seconds!

This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much. ;)

 

  • Like 1

Share this post


Link to post
Share on other sites
21 minutes ago, Tursi said:

This kind of got me wondering... if doing it mathematically takes 4 seconds, would a dumb binary search be faster? In theory it should be able to get down to the answer within 33 iterations. It'd be straight-forward... square the current number, and depending on whether the result is bigger or smaller than the target, step the appropriate direction. I might be missing something - code I get, math theory not so much. ;)

 

Like this:

https://codebase64.org/doku.php?id=base:fast_sqrt

  • Like 2

Share this post


Link to post
Share on other sites
1 hour ago, Lee Stewart said:

 

Not sure what you are suggesting I test here. A couple of notes, though: (1) fbForth and TI Forth both have 0, 1, 2, 3 defined as constants in the kernel. (2) Minor nit— SEEPRIMES is missing an input ‘n’ in the stack effects.

 

...lee

I wondered if there is any performance difference between the old version and the new version.

 

Share this post


Link to post
Share on other sites
1 minute ago, TheBF said:

I wondered if there is any performance difference between the old version and the new version.

 

There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it.

 

...lee

Share this post


Link to post
Share on other sites
1 minute ago, Lee Stewart said:

There surely is. The two-word manipulations alone would insure that. But, you are right, I should test it.

 

...lee

If you mean OVER OVER, maybe you have 2DUP.  ?

Share this post


Link to post
Share on other sites
1 minute ago, TheBF said:

If you mean OVER OVER, maybe you have 2DUP.  ?

No, I mean double-word math, testing for carry, etc.

 

...lee

Share this post


Link to post
Share on other sites
Just now, Lee Stewart said:

No, I mean double-word math, testing for carry, etc.

 

...lee

Sorry we are cross paths.  I was referring to the sieve benchmark.  Didn't make it clear.

Share this post


Link to post
Share on other sites
4 minutes ago, TheBF said:

Sorry we are cross paths.  I was referring to the sieve benchmark.  Didn't make it clear.

 

Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost.

 

...lee

Share this post


Link to post
Share on other sites
1 minute ago, Lee Stewart said:

 

Are you referring to testing fbForth 2.0:A13 against 2.0:12? Otherwise, I really am lost.

 

...lee

Yes. Running the sieve benchmark on both versions of FbForth.

Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth.

 

Share this post


Link to post
Share on other sites
1 minute ago, TheBF said:

Yes. Running the sieve benchmark on both versions of FbForth.

Perhaps nothing in that area has changed, but I recall 2.0:12 being significantly slower than my system and Turbo Forth.

 

Yeah, I will try it with each version, with and without operation of the Forth ISR.

 

...lee

  • Like 1

Share this post


Link to post
Share on other sites

I have been AWOL for a while working on my version of DSR Link code.  In the process I found two small things that can be "improved" in insanemultitasker version that was 

my starting place.

 

1.  Removing a MOV and a SRL  by using the empty R6 register to read and copy into the NAMBUF

Oops.  This was an mistake.

 

2.  Moving the init R2 to >4000 outside of the scanning loop. It is invariant in the loop so no need to set it every time.

This is still valid

 

I put the changes into the TI ASM code along with comments of how I understood this thing.

This code is not tested and as noted I don't use the conventional Assembler much anymore so treat this as faulty syntax but the changes have been shown to work in my Forth Assembler version.

 

This stuff is probably common knowledge for many but perhaps the comments will be helpful for another newbie somewhere down the road. 

 

🙁

  • Like 1

Share this post


Link to post
Share on other sites

Lee we had discussed creating faster dictionary searches using a hashing method like PolyForth.  I found this book by Dr. Ting called inside Forth83, which is a fantastic little Forth for DOS by Laxen & Perry.

It used a 4 way hash on the dictionary which is described herein.

 

Here is the link to the book: http://www.forth.org/OffeteStore/1003_InsideF83.pdf

 

Page 73 shows real code on how they Laxen and Perry did it.

 

It is quite amazing what happens when this is applied.  HsForth for DOS which I use to for my cross-compiler had an add-on that hashed the dictionary and compile times were almost 10X faster.

I don't use it these days because it takes extra dictionary memory but it was great for 4.7MHz PCs. :) 

In this case we could expect something on the order or 3..4 times I suspect.

 

I am working on how I could make this work with Camel Forth. Not sure I can fit it all in an 8K kernel, but I am playing with concepts in situ. 

  • Thanks 1

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...