Lee Stewart

fbForth—TI Forth with File-based Block I/O [Post #1 UPDATED: 04/13/2021]

I said I would add the above CF utilities to FBLOCKS ere long; but, perhaps it would be better to put them in their own blocks file—say, CFUTILS. What do you think?

 

...lee


I am back to revising the fbForth 2.0 Manual and am going around in circles trying to explain a set of words for the Forth Assembler that contains one repurposed word, ;CODE , and several new words. I do not apologize for the complexity of many of the glossary entries because I want the programmer to have as much information as possible in one place. I am not getting any younger! However, in this particular case, my pile of discarded revisions keeps getting bigger. It finally occurred to me that I am approaching this situation in the wrong way. My quandary results from my including the TI Forth format for backward compatibility while also explaining the new, fbForth 2.0 format for cleaner code when, perhaps, I should be deprecating the TI Forth format, explaining (maybe in footnotes) that the TI Forth format is still supported, and referring the programmer to the TI Forth Manual for the details.

 

Here are the relevant constructs in fbForth 2.0 followed by their equivalents (roughly) in TI Forth, where alc is Assembly Language Code and mc is Machine Code:

 

fbForth 2.0                                           TI Forth
----------------------------------------------------  ---------------------------------------------------
ASM: <newword> [<alc> ...] ;ASM                       CODE <newword> [<alc> ...] NEXT,
CODE: <newword> [<mc> ...] ;CODE                      CODE <newword> [[<mc> , ] ...] NEXT,
: <newword> <BUILDS ... DOES>ASM: [<alc> ...] ;ASM    : <newword> <BUILDS ... ;CODE [<alc> ...] NEXT,
: <newword> <BUILDS ... DOES>CODE: [<mc> ...] ;CODE   : <newword> <BUILDS ... ;CODE [[<mc> , ] ...] NEXT,

 

Any suggestions?

 

...lee


Referencing the old manual instead of duplicating it seems completely reasonable. Especially if there is a better copy of this ( http://ftp.whtech.com/programming/Forth/TexSoft-TIForth.pdf) around.

 

It would be easier for someone to pick up if the deprecated ( supported but not recommended ) words were maybe only referenced in a list/addendum, but not covered in the primary text that explains the features of fbForth. Backward compatibility doesn't include forward compatibility. I have to argue that at work sometimes... This is your mission, but my perception is: fbForth supports TI Forth constructs for those historic pieces of code, but it supports newer and better features for new development.

 

[email protected]


Look in the Development Resources pinned thread. My edition of that manual is there. That was what I did before embarking on my fbForth journey.

 

...lee

While reacting to another thread about @TheBF’s CAMEL99 Forth development (Defining CHARs with binary numbers), I realized that fbForth 2.0’s VCHAR was not what I intended. I decided to also look at HCHAR and did not like what I saw. What I originally intended was to duplicate TI Forth’s implementation of both. I succeeded with HCHAR but not with VCHAR . See the table below for comparisons of their implementation in several TI-99/4A languages (EOS = End Of Screen):



                   <--------------------------EOS Result----------------------------->
Language           VCHAR                              HCHAR
-----------------  ---------------------------------  -------------------------------
TI Basic           Continues to EOS, wraps,           Continues to EOS, wraps,
                   increments column and fills        increments row and fills
                   screen                             screen

TI Extended Basic  Continues to EOS, wraps,           Continues to EOS, wraps,
                   increments column and fills        increments row and fills
                   screen                             screen

TI Forth           Writes char once, corrects to      Fills VRAM with n copies of
                   screen at >0000 for next write,    char without regard to EOS
                   continues to EOS, wraps,
                   increments column and fills
                   screen

fbForth 2.0        Wraps to top and fills column      Fills VRAM with n copies of
                                                      char without regard to EOS

TurboForth         Writes char once, corrects by one  Fills VRAM with n copies of
                   screen for successive writes       char without regard to EOS
                   until within screen space,
                   continues to EOS, wraps,
                   increments column and fills
                   screen

CAMEL99 Forth      Stops at bottom of screen column   Not yet seen code--probably
                                                      stops at right of screen row


This is not the end of the story, however. The TI authors of TI Forth probably intended HCHAR to also duplicate TIB and XB’s implementation, but obviously failed. Though we Forth developers tend to trust that the Forth programmer is aware of the limitations of each Forth word, that is a bit much to expect with these TI-99/4A graphics words, I think. The expectation that the behavior of the earlier TI languages (TIB and XB) would be followed or, at the very least, the EOS would be respected, seems reasonable to me. So, rather than duplicate TI Forth, I intend to correct in the next build of fbForth 2.0 not only the behavior of VCHAR but also to change the behavior of HCHAR to conform to that of TIB and XB.


...lee



HCHAR and VCHAR behaviour in TF conform to TIB/XB. I seem to remember VCHAR, whilst being much faster than GPL was disappointingly slow and the code not pretty. If you have ideas for improving I'd be interested.

 

http://turboforth.net/source/Bank1/1-03-Graphics.html

 

Not entirely true. (BTW, a portion of that HTML page is impossible to read. I think your text-to-html conversion program needs a tweak. Fortunately, I have your source code.)

 

If the computed screen position of the first character is outside screen space, VCHAR happily writes the character at screen-size intervals until it gets within screen space, then wrapping until the character count is satisfied. TIB and XB will throw errors if the first character position is outside screen space.

 

HCHAR does not check at all for where in VRAM it is writing. TIB and XB throw errors outside screen space. They also wrap the output to the top of screen space. TI Forth does the same as TurboForth, as does fbForth 2.0 currently.

 

Though the screen usually starts at VRAM = >0000, it can start at any 1KiB boundary. The TI Forth developers at TI were pretty careful about using SCRN_START , SCRN_WIDTH and SCRN_END to find and manipulate screen space. They were careful in this regard in VCHAR until the code that checks and corrects for writing to non-screen space. Just as with TurboForth and fbForth 2.0, the first character instance is written regardless of whether it happens to land within screen space. Unlike TurboForth, TI Forth does “ char_pos scrn_size /MOD ” for the next character position, which immediately brings the character position within screen space, but only if the screen starts at VRAM = >0000, rendering useless the developers’ care in calculating the screen location. They might as well have ignored SCRN_START and saved some code. [Edit: I was wrong about the /MOD operation. It works just fine because it is working on a relative screen position. It is corrected by adding SCRN_START before writing the byte.]
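
The corrected reading of the /MOD step can be illustrated with a tiny model. This is a sketch in Python rather than Forth so that it runs anywhere; `wrap_to_screen` and its parameter names are hypothetical, and only the remainder of the division is modelled (what TI Forth does with the quotient is not shown).

```python
def wrap_to_screen(rel_pos, scrn_start, scrn_size):
    """Map a screen-relative position to a VRAM address inside the screen image."""
    # Forth's /MOD leaves a remainder and a quotient; for non-negative
    # operands Python's divmod yields the same two numbers.
    _quotient, remainder = divmod(rel_pos, scrn_size)
    # The remainder is still a relative position, so SCRN_START is added last.
    return scrn_start + remainder


# A position 32 cells past the end of a 768-byte screen image:
print(hex(wrap_to_screen(800, 0x0000, 768)))  # 0x20   (screen at >0000)
print(hex(wrap_to_screen(800, 0x1000, 768)))  # 0x1020 (screen moved to >1000)
```

Because the modulo is taken before the screen base is added, the result lands inside the screen image wherever the image starts.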

 

Re improving the ALC for VCHAR , I am not sure anything short of inlining the code for _vsbw will be any faster than what you have.

 

...lee


The only thing I can think of is to pre-determine if a line will span more than one column. If not, call a fast tight-loop character plot routine, if yes, call a slower routine which checks for end of screen. Is it worth it though...?
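
A rough model of that fast-path/slow-path split, written in Python for illustration only. The function name, the default 32x24 screen and the modulo used to wrap from the last column back to the first are assumptions of this sketch, not TurboForth code.

```python
def vchar_positions(x, y, count, width=32, rows=24):
    """Return the screen-relative positions a vertical run would touch."""
    start = y * width + x
    if y + count <= rows:
        # Fast path: the run stays inside one column, so no end-of-screen
        # test is needed inside the loop.
        return [start + i * width for i in range(count)]

    # Slow path: test for end of screen after every character.
    last = width * rows - 1
    positions = []
    pos = start
    for _ in range(count):
        positions.append(pos)
        pos += width
        if pos > last:
            # Top of the next column; the last column wraps to column 0.
            pos = (pos - last) % width
    return positions


print(vchar_positions(5, 0, 3))    # [5, 37, 69]    fast path
print(vchar_positions(0, 22, 3))   # [704, 736, 1]  slow path
```

The decision costs one addition and one comparison per call, so the open question is only whether short, non-wrapping runs are common enough to repay the extra code.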


I am not sure it is. I will definitely cogitate on it. I probably should get back to revising the fbForth 2.0 manual, however. I am getting sidetracked as usual!

 

...lee


Three posts back, I was wrong about the /MOD correction for the screen position only working for SCRN_START = 0. See that post for my comment and strikeouts.

 

...lee


When I looked at the TurboForth VCHAR code, I thought it was pretty optimal as well.

 

Here are some timings that I did today to get a Forth version to run a little faster.

First a reference in BASIC

10 CALL CLEAR
20 FOR C=33 TO 83
30 CALL VCHAR(1,1,C,768)	
40 NEXT C
50 REM runtime: 63 secs

Forth Version of the 50 iteration test ran on Turbo Forth in 5.5 secs.

: TEST
         PAGE
         83 33 DO
         0 0 I 768 VCHAR
         LOOP ;

My first Forth version of VCHAR, based on the Turbo Forth algorithm, is below. It ran "TEST" in 23 seconds: 2.5x BASIC speed but 1/4 of TF speed.

VARIABLE VLIM
: VCHAR  ( x y char cnt -- ) ( parameter order not ideal so we shuffle)
         C/SCR @ 1-  VLIM !  ( chars / screen less 1 goes to VLIM)
         >R >R               ( -- x y ) ( push char & cnt to rstack)
         >VPOS               ( -- vdpaddr)  ( calc the Video position in memory)
         R> SWAP             ( -- char vadr) ( get the char and reverse order)
         R> 0                ( -- char vadr cnt index) ( all that crap to get this)
         ?DO                 ( -- char vadr) ( let 'er rip)
            2DUP VC!         ( write char to video memory)
            C/[email protected] +
            DUP VLIM @ >     ( past the last screen position?)
            IF  VLIM @ - THEN ( vertical wrap)
         LOOP
         2DROP ;

Then I wrote VWRAP to do the increment and the compare against the variable in ASM, so the final version looks like the one further down. It runs in 13 seconds, so 2.4x slower than TF but 4.8 times faster than BASIC.

 

I also like the idea of telling the programmer they have put in too many bytes.

I cannot think of a reason why VCHAR needs to accept 32K bytes for a little screen.

I suppose if it wrote pixels it would make sense, but it's way too slow for that.

: VCHAR  ( x y char cnt -- ) ( parameter order not ideal so we shuffle)
         DUP ?EOSCR
         C/SCR @ 1-  VLIM !  ( chars / screen less 1 goes to VLIM)
         >R >R               ( -- x y ) ( push char & cnt to rstack)
         >VPOS               ( -- vdpaddr)  ( calc the Video position in memory)
         R> SWAP             ( -- char vadr) ( get the char and reverse order)
         R> 0                ( -- char vadr cnt index) ( all that crap to get this)
         ?DO                 ( -- char vadr) ( let 'er rip)
            2DUP VC!         ( write char to video memory)
            VWRAP            ( This 5 instruction word doubles the speed)
         LOOP
         2DROP ;

*EDIT* It sucks when your humanity shows.

Apr 13

 

I had an error in all these VCHARs that caused them to write one line less than they should have so these times are light by 4%.

But the magnitudes are in the right order.


After sleeping on it, an alternative algorithm might be to read the entire screen into a buffer, VCHAR into the buffer, and write the buffer back to VDP RAM.

 

This might be faster for big values and slower for small ones. (?)

Needs to be tested

 

BF


The following code shows that, in high-level fbForth, one cycle of reading the text screen (960 bytes) to a RAM buffer and writing it back to VRAM takes ~60 ms:

 

0 VARIABLE SBUF 958 ALLOT
: SCRWRT ( n --- )
0 DO
0 SBUF 960 VMBR
SBUF 0 960 VMBW
LOOP ;
1000 SCRWRT ok:0

A crossover character count could be determined and that number could be used in VCHAR to decide which way to go. Though, I suppose it might not be worth the effort, considering that there are different screen image sizes to manage: 768, 960 and 1920 for Graphics, 40-column Text and 80-Column Text, respectively.

...lee
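
As a back-of-the-envelope illustration of such a crossover count, here is a small Python sketch. The 60 ms figure and the three screen sizes come from the post above; the per-character cost of 0.5 ms, the assumption that the buffer round trip scales linearly with screen size, and all names are hypothetical. The cost of filling the RAM buffer itself is ignored.

```python
import math

BUFFER_CYCLE_MS = 60.0     # measured above: 960-byte screen, read + write back
TEXT_SCREEN_BYTES = 960

# Screen image sizes named in the post.
SCREEN_BYTES = {"Graphics": 768, "Text 40": 960, "Text 80": 1920}


def crossover_count(screen_bytes, per_char_ms):
    """Character count above which the buffered approach should win."""
    # Assumption: the VRAM round trip scales linearly with screen size.
    cycle_ms = BUFFER_CYCLE_MS * screen_bytes / TEXT_SCREEN_BYTES
    return math.ceil(cycle_ms / per_char_ms)


for mode, size in SCREEN_BYTES.items():
    # 0.5 ms per direct VRAM write is a made-up figure for illustration.
    print(mode, crossover_count(size, 0.5))
```

With a measured per-character time in place of the made-up one, the same arithmetic gives one threshold per graphics mode, which is the bookkeeping the post is wary of.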


OK—Here, in the following spoiler, are my current versions of VCHAR and HCHAR . They conform completely with the TIB and XB versions in that each throws an error if the starting screen position is outside screen space and each one wraps to the beginning of the screen if the end of screen is reached before the character count is satisfied. Please note that “S:” and “R:” in the comments precede the contents of the parameter stack and return stack, respectively:

 

 

 

: VCHAR ( x y cnt ch --- )
   SWAP >R >R                     \ S:x y R:cnt ch
   SCRN_WIDTH @ * +               \ spos = y * scrn_width + x S:spos R:cnt ch
   SCRN_END @ SCRN_START @ - 1-   \ max_spos = scrn_end - scrn_start - 1
   OVER OVER                      \ S:spos max_spos spos max_spos R:cnt ch
   > 0 ?ERROR                     \ abort if spos not within screen
   R> ROT ROT R>                  \ S:ch spos max_spos cnt
   0 DO ( cnt 0 DO)               \ S:ch spos max_spos
      >R                          \ S:ch spos R:max_spos
      OVER OVER                   \ S:ch spos ch spos R:max_spos
      SCRN_START @ +              \ vaddr = scrn_start + spos S:ch spos ch vaddr R:max_spos
      VSBW                        \ put ch onscreen S:ch spos R:max_spos
      SCRN_WIDTH @ +              \ inc spos to next row S:ch spos R:max_spos
      DUP R >                     \ spos > max_spos?
      IF                          \ S:ch spos
         R -                      \ adjust spos back by max_spos
      THEN                        \ S:ch spos
      R>                          \ clean up return stack for loop S:ch spos max_spos
   LOOP
   DROP DROP DROP ;
: HCHAR ( x y cnt ch --- )
   >R >R                          \ S:x y R:ch cnt
   SCRN_WIDTH @ * +               \ spos = y * scrn_width + x S:spos R:ch cnt
   SCRN_START @ +                 \ S:spos1 R:ch cnt
   DUP SCRN_END @ - 1+            \ S:spos1 spos1-scrn_end+1 R:ch cnt
   0> 0 ?ERROR                    \ abort if spos1 not within screen
   SCRN_END @ OVER - R -          \ S:spos1 scrn_end-spos1-cnt R:ch cnt
   0<                             \ S:spos1 flag R:ch cnt
   IF                             \ run passes end of screen: split it in two
      SCRN_END @ OVER -           \ S:spos1 cnt1 R:ch cnt
      R> OVER -                   \ S:spos1 cnt1 cnt2 R:ch
      SCRN_START @ SWAP           \ S:spos1 cnt1 spos2 cnt2 R:ch
      R VFILL                     \ fill wrapped part at top of screen S:spos1 cnt1 R:ch
      R> VFILL                    \ fill from spos1 to end of screen
   ELSE
      R> R>                       \ S:spos1 cnt ch
      VFILL
   THEN ;

 

 

 

The “stackrobatics” in the above words will not be part of the ALC versions, thankfully!
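
For readers who want to poke at the wrap arithmetic without a TI-99/4A, here is a Python model of the two words as posted. It is a sketch: VRAM is a plain bytearray, the Graphics-mode values for the three screen variables are assumed, `vfill` stands in for VFILL, and the lower-bound checks are additions of the model.

```python
SCRN_START = 0x0000   # assumed Graphics-mode values
SCRN_WIDTH = 32
SCRN_END = 0x0300     # one past the last screen byte


def vfill(vram, vaddr, cnt, ch):
    """Write cnt copies of ch starting at vaddr (stand-in for VFILL)."""
    for i in range(cnt):
        vram[vaddr + i] = ch


def vchar(vram, x, y, cnt, ch):
    max_spos = SCRN_END - SCRN_START - 1
    spos = y * SCRN_WIDTH + x
    # The posted word tests only the upper bound; 0 <= is added here.
    if not 0 <= spos <= max_spos:
        raise ValueError("start position outside screen")
    for _ in range(cnt):
        vram[SCRN_START + spos] = ch
        spos += SCRN_WIDTH            # next row, same column
        if spos > max_spos:
            spos -= max_spos          # same adjustment as "R -" in the word


def hchar(vram, x, y, cnt, ch):
    start = SCRN_START + y * SCRN_WIDTH + x
    if not SCRN_START <= start < SCRN_END:
        raise ValueError("start position outside screen")
    room = SCRN_END - start           # cells left before end of screen
    if cnt > room:
        vfill(vram, SCRN_START, cnt - room, ch)   # wrapped part first
        vfill(vram, start, room, ch)
    else:
        vfill(vram, start, cnt, ch)


vram = bytearray(16 * 1024)
vchar(vram, 0, 22, 3, ord("*"))
print([i for i, b in enumerate(vram) if b])   # [1, 704, 736]
```

One thing the model makes easy to check: with the "subtract max_spos" adjustment, a run that overflows the bottom of the last column resumes at relative position 32 (row 1, column 0) rather than at position 0.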

 

...lee


I am truly a heretic. I calculate if the results will be off screen before doing anything to the screen.

I know that's not like BASIC but it is way faster.

 

?EOSCR ABORTs with a message if the TOS is greater than the last VDP address for the mode.

 

 

: HCHAR ( col row char cnt -- ) ( parameter order not ideal so we shuffle)
SWAP >R >R ( swap char & cnt, push to return stack)
>VPOS ( -- vdpaddr )
R> 2DUP + ?EOSCR ( bring back count add to Vadr and see if it's too many)
R> VFILL ; ( bring back char and FILL Video memory)

 


Lee I notice that TI Forth uses the SCRN_START variable.

I have one I called VTOP but it seems a waste of space.

Is there ever a case where the screen has to be moved from starting at VDP address >0?

 

BF


It is different for Bitmap, Split1 and Split2 modes. Otherwise, it is at the whim of the programmer.

 

Attached is a PDF of the spreadsheet I composed to help me keep track of things back when I converted the graphics primitives from high-level Forth to ALC. I believe everything is correct:

 

GraphicsModes.pdf

 

...lee


So my algorithm is the same as yours, only you are using the return stack and I created a variable.

Since I am going to write this up with some training wheels for TI-BASIC programmers who want to try Forth, I am going to keep it in hi-level Forth so people can see how it works. Like TI-BASIC, it will only work in Graphics mode, and also in TEXT mode.

What really shocked me was the result when I timed TI-BASIC in a loop doing CALL VCHAR(1,1,42,768), which would mean most of the time was spent in GPL: the Forth-only version was still about 2 times faster (4x using VWRAP in ASM).

It makes me wonder what somebody at TI was thinking when they wrote GPL. Error checking perhaps?

VARIABLE VLIM
: VCHAR  ( x y char cnt -- ) ( parameter order not ideal so we shuffle)
         C/SCR @ 1- VLIM !   
         >R >R               ( -- x y ) ( push char & cnt to rstack)
         >VPOS               ( -- vdpaddr)  ( calc the Video position in memory)
         R> SWAP             ( -- char vadr) ( get the char and reverse order)
         R> 0                ( -- char vadr cnt index) ( all that crap to get this)
         ?DO                 ( -- char vadr) ( let 'er rip)
            2DUP VC!         ( write char to video memory)
            C/[email protected] +
            DUP VLIM @ >     ( vertical wrap)
            IF  VLIM @ -
            THEN
         LOOP
         2DROP ;

By the way, do you use OVER OVER very much? CAMEL Forth uses 2DUP and 2DROP in the

number conversion routines so I wrote them in assembler rather than Forth as the released versions do.

 

Trying to keep up to Willsy's speed without coding everything in ASM. :)

CODE: 2DUP   ( n1 n2 -- n1 n2 n1' n2' )
             *SP W MOV,            \ 18 copy n1
              SP -4 ADDI,          \ 14
              TOS 2 (SP) MOV,      \ 22 copy n2 onto stack
              W *SP MOV,           \ 18 push onto new stack
              NEXT,                \
              END-CODE             \ 71

CODE: 2DROP   ( n n -- )
              SP INCT,             \ 10
              TOS POP,             \ 22
              NEXT,
              END-CODE

How does this compare with your 2DROP ?:

 

ASM: 2DROP ( n n -- )
   *SP+ *SP+ C,   \ pop 2 cells from stack
;ASM

 

...lee

 

H-m-m-m...I just looked at the instruction timings and this uses 30 clock cycles and 7 memory accesses. Of course, it is even worse because the indirection accesses 8-bit memory twice! It does save 2 instruction bytes over the following, faster code, which uses only 20 clock cycles and 6 memory accesses and all on 16-bit RAM:

 

ASM: 2DROP ( n n -- )
   SP INCT,   \ pop 1 cell from stack
   SP INCT,   \ pop 1 cell from stack
;ASM

 

Why do you need to end with “ TOS POP, ”? Is there other housekeeping to be done? All I need to do in fbForth 2.0 is modify the address in the SP register. That is all TurboForth needs to do, as well.

 

...lee

 


Because I am caching the top of stack in R4.

It gives you a good speedup on many Forth primitives, especially words that don't change the stack.

The downside is, as you see, you have to clean up after yourself and refill the cache register sometimes.
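
A toy model of that trade-off, in Python for illustration only. The class and method names are hypothetical; `tos` plays the role of the cached register and `mem` the part of the stack that lives in RAM.

```python
class TosCachedStack:
    """Data stack whose top item is held in a 'register' instead of memory."""

    def __init__(self):
        self.mem = []      # items below the top, as they would sit in RAM
        self.tos = None    # cached top of stack
        self.depth = 0

    def push(self, n):
        if self.depth > 0:
            self.mem.append(self.tos)   # spill the old top to memory
        self.tos = n
        self.depth += 1

    def one_plus(self):
        self.tos += 1                   # touches the register only

    def two_drop(self):
        self.mem.pop()                  # like SP INCT, : skip the second item
        self.depth -= 2
        # Like TOS POP, : the cache must be refilled from memory.
        self.tos = self.mem.pop() if self.depth > 0 else None


s = TosCachedStack()
for n in (1, 2, 3):
    s.push(n)
s.one_plus()       # 3 becomes 4 without touching mem
s.two_drop()       # discards 4 and 2, then reloads 1 into the cache
print(s.tos, s.mem, s.depth)   # 1 [] 1
```

Words like `one_plus` never touch memory, which is where the speedup comes from; words that shrink the stack pay for it with the refill.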

 

B


You can see the primitives I am using here Lee:

 

https://github.com/bfox9900/CAMEL99/blob/master/9900FAST.HSF

 

B


And if you are taking extra space for 2 INCT instructions, ADDI is faster no?
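
Using only the base cycle counts quoted earlier in this thread (10 for INCT, 14 for ADDI), the comparison works out as below. This is a Python scratch calculation, not a measurement: memory wait states are left out, and the 4-byte size of ADDI (opcode word plus immediate word) is the one figure not taken from the thread.

```python
# Base cycle counts as quoted in the thread; wait states not included.
CYCLES = {"INCT": 10, "ADDI": 14}
# Instruction sizes in bytes; ADDI carries a 16-bit immediate word.
BYTES = {"INCT": 2, "ADDI": 4}

two_inct = {"cycles": 2 * CYCLES["INCT"], "bytes": 2 * BYTES["INCT"]}
one_addi = {"cycles": CYCLES["ADDI"], "bytes": BYTES["ADDI"]}

print("2 x INCT:", two_inct)
print("1 x ADDI:", one_addi)
print("cycles saved by ADDI:", two_inct["cycles"] - one_addi["cycles"])
```

On these figures the two forms take the same space and ADDI comes out ahead on base cycles.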

 

B
