
BASIC speed


TheBF


I have never written a BASIC compiler or even an interpreter but I can understand how it would have lots of overhead to make it safe in the way that BASIC is designed to be. I still have grudges with TI over how they implemented the TI-BASIC and XB languages in GPL. Soooo slow. I spent so many hours in the '80s trying to make it go faster.


I found this video on YouTube demonstrating the speed-up you get using a compiler for XB, and it would have made me happy many years ago.





I am in the middle of writing a version of CAMEL Forth for the TI-99 and I wondered how it would compare on this simple test.

Here is the video I made using version 0.5 of the system.


It shows a bit more of what this ancient platform could have done if the engineers had been free to do it properly.

Camel Forth V.5 Demo.mov


Cool. I am new here but I thought you might be on this site.

I used to love XB. I thought it was just great until I wrote something one day and showed my sister-in-law.

She said "Why is it so slow?" She was comparing it to the Commodore 64. I was P.O.'ed. :-)

 

Nice to make your acquaintance. I was looking for stuff on YouTube when I found your video.

Made me curious about how my code compared... of course.

 

It's got me thinking about putting a BASIC wrapper on top of Forth to make something more palatable for people. It's been done in the past on other machines and it should go pretty fast.

 

So much code, so little time.

 

BF


FYI, translated to fbForth 2.0, the first takes 5.5 seconds and the second ~0.4 second.

 

...lee

 

Hi Lee,

 

That's interesting. Is fbForth based on TI-Forth? If so, then I believe the difference is mostly in the EMIT implementation.

If I recall, the TI EMIT called the system for some stuff and also provided a proper control-key interpreter as well.

My version of EMIT is very sparse. I tried something unusual to avoid multiplication when calculating the cursor position.

I keep track of the ROW as the VDP address and the column as an offset.

That way I only have to add them together in the word VPOS below so it's pretty quick.

However, I do use multiplication for manually positioning the cursor with AT-XY.

 

I intend to use this implementation for a cross-compiler tutorial, so I am trying to keep a lot of high-level code with simple support words in Assembler.

: EMIT ( char -- )
    VPOS C/SCR @ = IF SCROLL THEN  \ if we are at last character in the display, scroll
    (EMIT)                          \ put the character on the screen & inc. the column
    VCOL @ C/L @ =                  \ are we at end of line?
    IF (CR) THEN  ;                 \ do carriage return math
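To illustrate the row-address-plus-column idea, here is a small Python model. It is a sketch, not the CAMEL99 code: the `Cursor` class, its `at_xy` method and the 32-column screen starting at VDP address 0 are assumptions for illustration, and scrolling is left out. EMIT only needs one addition to find the VDP address; the multiply appears only when the cursor is positioned explicitly.

```python
# Sketch: cursor kept as (row start address, column offset) so EMIT
# never multiplies. Assumes a 32-column screen at VDP address 0.
C_L = 32                # characters per line (C/L)

class Cursor:
    def __init__(self, screen_base=0):
        self.base = screen_base
        self.vrow = screen_base   # VDP address where the current row starts
        self.vcol = 0             # column offset within that row

    def vpos(self):
        # VPOS: one addition gives the VDP address of the cursor
        return self.vrow + self.vcol

    def advance(self):
        # (EMIT) bumps the column; at end of line do the (CR) math:
        # column back to 0, row address moves down by one line width
        self.vcol += 1
        if self.vcol == C_L:
            self.vcol = 0
            self.vrow += C_L

    def at_xy(self, col, row):
        # explicit positioning is the only place a multiply is needed
        self.vrow = self.base + row * C_L
        self.vcol = col

c = Cursor()
c.at_xy(5, 2)
print(c.vpos())   # 69  (2*32 + 5)
```

The trade-off is one extra variable to maintain in exchange for keeping the per-character path to an add.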

BF


... Is fbForth based on TI-Forth?

 

fbForth is, indeed, based on TI Forth.

 

I believe the difference is mostly in the EMIT implementation.

 

Nope. My implementation of your routines is practically identical to yours. VC! becomes VSBW and [CHAR] becomes ASCII (same as TurboForth). The time difference may be that only the inner interpreter is in scratchpad RAM on the 16-bit bus. If you are running some routines in scratchpad RAM as does TurboForth, your CAMEL99 Forth will be faster.

 

If I recall TI EMIT called the system for some stuff and also provided a proper control key interpreter as well. ...

 

Indeed, EMIT calls the system ALC (in the spoiler below), which handles BEL, BS, LF and CR. Everything else is presumed to be a printable character. As noted above, nothing in my implementation of your code uses EMIT.

 

 

 

;[*== EMIT routine CODE = -4 =================
*
EMT EQU $LO+$-LLVSPT
MOV R2,R1 copy char to R1 for VSBW
MOV @$ALTO(U),R0 alternate output device?
JEQ EMIT0 jump to video display output if not
*
* R0 now points to PAB for alternate output device, the one-byte buffer
* for which must immediately precede its PAB. PAB must have been set up
* to write one byte.
*
CLR R7 ALTOUT active
MOVB R7,@KYSTAT zero status byte
DEC R0 point to one-byte VRAM buffer in front of PAB
SWPB R1 char to MSB
BLWP @VSBW write char to buffer
INCT R0 point to Flag/Status byte
BLWP @VSBR read it
ANDI R1,>1F00 clear error bits without disturbing flag bits
BLWP @VSBW write it back to PAB
AI R0,8 Set up pointer to namelength byte of PAB
MOV R0,@SUBPTR copy to DSR subroutine name-length pointer
BLWP @DSRLNK put 1 byte to device
DATA >8
B @BKLINK return to caller
*
* Output is going to the video display
*
EMIT0 CI R1,7 Is it a bell?
JNE NOTBEL
CLR R2
MOVB R2,@KYSTAT
BLWP @GPLLNK
DATA >0036 Emit error tone
JMP EMEXIT
*
NOTBEL CI R1,8 Is it a backspace?
JNE NOTBS
LI R1,>2000
MOV @CURPO$(U),R0
BLWP @VSBW
JGT DECCUR
JMP EMEXIT
DECCUR DEC @CURPO$(U)
JMP EMEXIT
*
NOTBS CI R1,>A Is it a line feed?
JNE NOTLF
MOV @$SEND(U),R7
S @$SWDTH(U),R7
C @CURPO$(U),R7
JHE SCRLL
A @$SWDTH(U),@CURPO$(u)
JMP EMEXIT
SCRLL MOV LINK,R7
BL @SCROLL
MOV R7,LINK
JMP EMEXIT
*
*** SCROLLING ROUTINE
*
SCROLL EQU $LO+$-LLVSPT
MOV @$SSTRT(U),R0 VRAM addr
LI R1,LINBUF Line buffer
MOV @$SWDTH(U),R2 Count
A R2,R0 Start at line 2
SCROL1 BLWP @VMBR
S R2,R0 One line back to write
BLWP @VMBW
A R2,R0 Two lines ahead for next read
A R2,R0
C R0,@$SEND(U) End of screen?
JL SCROL1
MOV R2,R1 Blank bottom row of screen
LI R0,>2000 Blank
S @$SEND(U),R2
NEG R2 Now contains address of start of last line
MOV LINK,R6
BL @FILL1 Write the blanks
B *R6
*
NOTLF CI R1,>D Is it a carriage return?
JNE NOTCR
CLR R0
MOV @CURPO$(U),R1
MOV R1,R3
S @$SSTRT(U),R1 Adjusted for screen not at 0
MOV @$SWDTH(U),R2
DIV R2,R0
S R1,R3
MOV R3,@CURPO$(U)
JMP EMEXIT
*
NOTCR SWPB R1 Assume it is a printable character
MOV @CURPO$(U),R0
BLWP @VSBW
MOV @$SEND(U),R2
DEC R2
C R0,R2
JNE NOTCR1
MOV @$SEND(U),R0
S @$SWDTH(U),R0 Was last char on screen. Scroll
MOV R0,@CURPO$(U)
JMP SCRLL
NOTCR1 INC R0 No scroll necessary
MOV R0,@CURPO$(U)
*
EMEXIT B @BKLINK
;]*
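For readers who don't read 9900 assembly, the video-output path of the routine above can be paraphrased as the Python model below. It is a sketch under stated assumptions: the `Screen` class and its fields are stand-ins for the `$SWDTH`, `$SSTRT`, `$SEND` and `CURPO$` user variables, the bell is only counted instead of calling GPLLNK, and the alternate-output (PAB/DSRLNK) path is not modelled. The `if/elif` chain mirrors the `CI`/`JNE` chain. Note the carriage return: `DIV R2,R0` leaves the column in the remainder (R1), and subtracting it moves the cursor to column 0 without tracking rows.

```python
# Model of the video-output path of the EMIT routine above.
# Screen fields stand in for $SWDTH, $SSTRT, $SEND and CURPO$.
BEL, BS, LF, CR = 0x07, 0x08, 0x0A, 0x0D
BLANK = 0x20

class Screen:
    def __init__(self, width=32, height=24, start=0):
        self.swdth = width                  # $SWDTH: characters per line
        self.sstrt = start                  # $SSTRT: VRAM address of screen
        self.send = start + width * height  # $SEND: one past the last cell
        self.vram = [BLANK] * self.send     # VRAM image, blank-filled
        self.curpo = start                  # CURPO$: cursor VRAM address
        self.beeps = 0                      # stands in for the GPL error tone

    def scroll(self):
        # SCROLL: copy each line up one line, then blank the bottom line
        w = self.swdth
        for addr in range(self.sstrt + w, self.send, w):
            self.vram[addr - w:addr] = self.vram[addr:addr + w]
        self.vram[self.send - w:self.send] = [BLANK] * w

    def emit(self, ch):
        if ch == BEL:
            self.beeps += 1
        elif ch == BS:
            self.vram[self.curpo] = BLANK   # blank the current cell
            if self.curpo > 0:              # JGT: back up only if cursor > 0
                self.curpo -= 1
        elif ch == LF:
            if self.curpo >= self.send - self.swdth:
                self.scroll()               # on last line: scroll, stay put
            else:
                self.curpo += self.swdth    # otherwise move down one line
        elif ch == CR:
            # DIV leaves the column in the remainder; subtract it
            self.curpo -= (self.curpo - self.sstrt) % self.swdth
        else:
            self.vram[self.curpo] = ch      # printable: store at cursor
            if self.curpo == self.send - 1:
                self.curpo = self.send - self.swdth  # start of last line
                self.scroll()
            else:
                self.curpo += 1

s = Screen()
for ch in b"OK\r\n":
    s.emit(ch)
print(s.curpo)   # 32: CR went to column 0, LF moved down one line
```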

 

 

 

...lee


 

We all know the only thing people used C64 BASIC for was poking in assembly programs, so it was not a fair comparison. :-) Rewrite it in 9900 assembly and see how it compares then.

I have no experience with C64 BASIC, but VIC-20 BASIC was much faster than TI BASIC. The kid who lived next door had a VIC-20, and when he saw the TI-99 his first comment was "why is it so slow?"


 


 

You are correct Lee. I have NEXT in PAD RAM along with EXIT, DOCOL, ?BRANCH and BRANCH.

When I tested different benchmarks on CAMEL99 without using that speedup, things were 20% slower, so pretty much exactly right with your timing.

 

You are the man.

 

BF


 

That might do it. We'll have to compare code sometime. The fbForth inner interpreter includes NEXT (actually, its ALC label is $NEXT) and the code fields of : [code field label = DOCOL] and EXIT ( ;S in fbForth) [code field label = $SEMIS], which are all in scratchpad RAM, as are fbForth's workspace registers.

 

?BRANCH and BRANCH are not in scratchpad RAM in fbForth, and I suppose those two words could be making the difference because they are certainly used extensively in the loops in your code, especially in the first example, which is almost twice as fast in your CAMEL99 Forth.

 

The second example is almost a dead heat because the loop branch only operates ten times instead of the 7680 times in the first example. I wish I could put more code in scratchpad RAM, but that would be a pretty big rewrite. I know Mark put quite a few of the oft-used words there and had to always be aware of the need to save/restore scratchpad space that conflicted with other functions.
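To make the roles of NEXT, DOCOL and EXIT concrete, here is a toy indirect-threaded interpreter in Python. It is only a sketch of the general ITC technique, not fbForth's or CAMEL99's code: memory is a Python list of cells, each code field holds a Python function standing in for machine code, and all word names (`COUNTDOWN`, `MAIN`, etc.) are hypothetical. It also counts how often each primitive runs, which shows why ?BRANCH matters in a loop: a BEGIN ... UNTIL loop executes it once per iteration, so a 7680-iteration loop pays for it 7680 times, while NEXT runs once per word executed.

```python
# Toy indirect-threaded code (ITC) interpreter: every primitive is reached
# through NEXT, and colon definitions nest via DOCOL and un-nest via EXIT.

class VM:
    def __init__(self):
        self.mem = []       # cells: code fields and threaded code
        self.ip = 0         # Forth instruction pointer (IP)
        self.rstack = []    # return stack
        self.stack = []     # data stack
        self.counts = {}    # how many times each code field ran

    def compile(self, *cells):
        # append cells to memory; return the address of the first one
        addr = len(self.mem)
        self.mem.extend(cells)
        return addr

    def next(self):
        # NEXT: fetch the CFA that IP points at, advance IP, and jump
        # through that code field
        cfa = self.mem[self.ip]
        self.ip += 1
        code = self.mem[cfa]
        self.counts[code.__name__] = self.counts.get(code.__name__, 0) + 1
        code(self, cfa)

    def run(self, colon_cfa):
        # enter a colon word with a sentinel return address of -1;
        # its final EXIT pops the sentinel and the loop stops
        self.rstack.append(-1)
        self.ip = colon_cfa + 1
        while self.ip != -1:
            self.next()

def docol(vm, cfa):
    vm.rstack.append(vm.ip)   # remember where the caller's thread resumes
    vm.ip = cfa + 1           # the thread starts right after the CFA

def exit_(vm, cfa):           # EXIT (;S in fbForth)
    vm.ip = vm.rstack.pop()

def one_minus(vm, cfa):
    vm.stack[-1] -= 1

def dup(vm, cfa):
    vm.stack.append(vm.stack[-1])

def zero_equal(vm, cfa):
    vm.stack.append(-1 if vm.stack.pop() == 0 else 0)

def qbranch(vm, cfa):
    # ?BRANCH: if the flag is false, add the in-line offset to IP
    # (like "*IP IP ADD,"); otherwise step over the offset cell
    if vm.stack.pop() == 0:
        vm.ip += vm.mem[vm.ip]
    else:
        vm.ip += 1

vm = VM()
ONE_MINUS = vm.compile(one_minus)
DUP = vm.compile(dup)
ZERO_EQUAL = vm.compile(zero_equal)
QBRANCH = vm.compile(qbranch)
EXIT = vm.compile(exit_)

# : COUNTDOWN ( n -- 0 )  BEGIN 1- DUP 0= UNTIL ;
COUNTDOWN = vm.compile(docol)
top = vm.compile(ONE_MINUS, DUP, ZERO_EQUAL, QBRANCH)
offset_cell = vm.compile(0, EXIT)
vm.mem[offset_cell] = top - offset_cell      # backward branch offset

# : MAIN ( n -- 0 )  COUNTDOWN ;
MAIN = vm.compile(docol, COUNTDOWN, EXIT)

vm.stack.append(10)
vm.run(MAIN)
print(vm.stack, vm.counts["qbranch"])   # [0] 10
```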

 

...lee


 

Hmm, RXB does something even more impressive than this video using XB; it's hard to beat the speed of this:

 

 

Or if that is not evidence enough here you go:

 

 

Or lastly try this in RXB:

100 CALL CLEAR
110 FOR L=49 TO 57
120 CALL HCHAR(1,1,L)
130 CALL MOVES("VV",767,0,1)
140 NEXT L
150 ! Test the speed is pretty fast.

 

 

OK, this is interesting. So with DOCOL and EXIT in scratchpad RAM we are the same.

 

My DO/LOOP primitives are actually not in scratchpad RAM because they didn't work when I tried it quickly, so only BEGIN/UNTIL etc. and IF/ELSE/THEN are getting help from ?BRANCH and BRANCH.

 

My DO/LOOP code borrows from Laxen and Perry via CAMEL Forth and is shown below.

I originally implemented it with the loop index and limit in R13 and R14, but I want to keep those registers free for my multi-tasker.

 

To get the fastest Forth loops, I find a FOR/NEXT implementation like eForth's, with a simple down counter to zero, is best.

It goes like crazy compared to DO/LOOP. Even Chuck Moore stopped using DO LOOP, but the legacy is too big for the language to remove it completely.

 

Can you see fewer cycles in this code compared to yours?

(BTW the macros POP, PUSH, RPOP and RPUSH work exactly as expected. I wrote this for Intel first so tried to make the ASM a little bit Forth VM "universal".)

\ Adapted from CAMEL Forth MSP430
\ ; '83 and ANSI standard loops terminate when the boundary of
\ ; limit-1 and limit is crossed, in either direction.  This can
\ ; be conveniently implemented by making the limit 8000h, so that
\ ; arithmetic overflow logic can detect crossing.  I learned this
\ ; trick from Laxen & Perry F83.


\ CAMEL Forth tries to put loop index and limit in registers.
\ We have elected not to do this so we have free registers for
\ a TMS9900 specific, very fast cooperative TASK switcher.

\ Not keeping the DO/LOOP index and limit in registers costs us about 8% in loop speed
\ ====================================================================
CODE: <?DO> ( limit ndx -- )
             *SP TOS CMP,        \ compare 2 #s
              @@1 JNE,           \ if they are not the same jump to regular 'do.'  (BELOW)
              IP RPOP,           \ otherwise do a forth 'exit'
              SP INCT,           \ drop the limit (second item)...
              TOS POP,           \ ...and refill TOS, so both args are consumed
              NEXT,

+CODE: <DO> ( limit indx -- )
@@1:          R0  8000 LI,      \ R0 = 8000h
             *SP+  R0 SUB,      \ pop LIMIT: R0 = 8000h-limit, the "fudge factor"
              R0  TOS ADD,      \ loop ctr = index+fudge
              R0  RPUSH,        \ rpush fudge (8000h-limit)
              TOS RPUSH,        \ rpush loop ctr (index+fudge)
              TOS POP,          \ refill TOS
              NEXT,
              END-CODE

CODE: <LOOP>
             *RP INC,           \ increment loop
@@2:          @@1 JNO,          \ if no overflow then loop again
              IP INCT,          \ move past (LOOP)'s in-line parameter
              *RP+ *RP+ CMP,    \ drop loop frame: RP += 4 in one 1-cell instruction (30 clocks); doesn't make much difference to loop speed
              NEXT,
@@1:         *IP IP ADD,        \ jump back to top of loop  (branch)
              NEXT,
              END-CODE

+CODE: <+LOOP>
              TOS *RP ADD,      \ saving space by jumping into <loop>
              TOS POP,          \ refill TOS, (does not change overflow flag)
              @@2 JMP,
              END-CODE

CODE: I       ( -- n)
              TOS PUSH,            \ making space in TOS slows this down
              *RP TOS MOV,
              2 (RP) TOS SUB,      \ index = loopindex - fudge
              NEXT,
              END-CODE

CODE: J       ( -- n)
              TOS PUSH,
              4 (RP) TOS MOV,       \ outer loop index is on the rstack
              6 (RP) TOS SUB,       \ index = loopindex - fudge
              NEXT,
              END-CODE

CODE: LEAVE
              *RP+ *RP+ CMP,        \ collapse rstack frame in 1 CELL (TMS9900 trick)
               IP RPOP,             \ pop something else to do from the return stack
               NEXT,
               END-CODE
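The overflow trick is easier to see with the 16-bit arithmetic spelled out. The Python sketch below is a model, not the CAMEL99 code: `add16` is a hypothetical helper emulating the 9900 add with its overflow (OV) status bit, and a plain Python loop stands in for the return-stack frame. `<DO>` stores 8000h-limit as the fudge and index+fudge as the counter, so the counter crosses 7FFFh/8000h (a signed overflow) exactly when the index crosses the limit-1/limit boundary, in either direction; `I` recovers the index as counter minus fudge. `<LOOP>` therefore needs no compare instruction: the add itself sets OV.

```python
# Model of the 8000h "fudge factor" loop trick (Laxen & Perry F83 via
# CAMEL Forth). 16-bit two's-complement arithmetic is emulated with masks.
MASK = 0xFFFF

def to_signed(x):
    x &= MASK
    return x - 0x10000 if x & 0x8000 else x

def add16(a, b):
    # 16-bit add returning (sum, OV). Like the 9900 status bit, OV is set
    # when both operands have the same sign and the sum's sign differs.
    r = (a + b) & MASK
    sa, sb, sr = a & 0x8000, b & 0x8000, r & 0x8000
    return r, (sa == sb and sr != sa)

def do_loop(limit, index, step=1):
    # Return the successive values of I for "limit index DO ... step +LOOP".
    fudge = (0x8000 - limit) & MASK     # <DO>: 8000h - limit
    counter = (index + fudge) & MASK    # <DO>: loop ctr = index + fudge
    seen = []
    while True:
        seen.append(to_signed(counter - fudge))            # I
        counter, overflow = add16(counter, step & MASK)   # <LOOP>/<+LOOP>
        if overflow:                    # JNO not taken: leave the loop
            return seen

print(do_loop(10, 0))       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(do_loop(0, 10, -1))   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

The second call shows the '83/ANS crossing rule from the comments above for a negative step: the loop runs through I = 0 and stops only when the index crosses from 0 to -1.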


I STAND CORRECTED.

 

I just wrote a FOR NEXT loop and the speed difference between the above DO LOOP and FOR NEXT is almost non-existent.

 

On an Intel ITC Forth I see a 30% improvement. I am really surprised. But on the 9900 there is only 1 less instruction!

 

Benchmarks are always right.


My head hurts! :-o

 

I can probably figure out how your macros work, but on the face of it, things look pretty similar.

 

For another thing, I have never used JMP, in the TMS9900 Forth Assembler. Perhaps I will give it a try sometime. The TI Forth Manual never explained how to use any of the jump codes. I have always used the structured assembler constructs from the TI Forth Assembler, which is certainly encouraged in the manual. However, the similar loop code in fbForth 2.0 is all written in straight-up TMS9900 Assembler, so it should not be very difficult to compare. It is in fbForth002_ResidentDictionary.a99, if you want to check on it before I try anything.

 

One note (you may already know this): in fbForth 2.0 (as in TI Forth), the loop crossover in the positive and negative directions is different from Forth-83.

 

...lee


 

Now my head will have to hurt while I wrap it around the loop crossover implications.

 

My macros are in the code window.

\ PUSH & POP on both stacks
: PUSH,         ( src -- )  SP DECT,  *SP   MOV, ;    \ 10+18 = 28  cycles
: POP,          ( dst -- )  *SP+      SWAP  MOV, ;    \ 22 cycles

: RPUSH,        ( src -- ) RP DECT,  *RP   MOV,  ;
: RPOP,         ( dst -- ) *RP+      SWAP  MOV,  ;

\ this one allows nested subroutine calls. Never really needed it
: CALL,         ( dst -- )   \ total cycles 44 to call,  34 to return
                R11 RPUSH,       \ save R11 on forth return stack
               ( addr) BL,       \ branch & link saves the PC in R11
                R11 RPOP, ;      \ R11 RPOP, is laid down by CALL, in the caller
                                 \ We have to lay it in the code after BL so
                                 \ when we return from the Branch&link, R11 is
                                 \ restored to the original value from the rstack
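The cost of keeping the top of stack in a register is easiest to see in a model. This Python sketch uses hypothetical class and method names, with byte-addressed memory and 2-byte cells growing toward lower addresses (as `SP DECT,` implies), and mirrors `PUSH,` and `POP,` on a cached TOS. The `plus` method is an illustrative stand-in, not code from the thread: it shows how a binary word can consume the second item with a single `*SP+` operand (the same pattern `<DO>` uses with `*SP+ R0 SUB,`), while a word that produces a value, like `I`, must first spill TOS with `PUSH,`.

```python
# Model of PUSH,/POP, with the top of stack cached in a register (TOS).
# Byte-addressed memory, 2-byte cells, stack grows toward lower addresses.

class DataStack:
    def __init__(self, base=0xFF00):
        self.mem = {}       # byte address -> 16-bit cell
        self.sp = base      # SP points at the second stack item
        self.tos = 0        # TOS register; holds junk while "empty"

    def push_tos(self):
        # TOS PUSH,  ->  SP DECT,  TOS *SP MOV,
        self.sp -= 2
        self.mem[self.sp] = self.tos

    def pop_tos(self):
        # TOS POP,   ->  *SP+ TOS MOV,
        self.tos = self.mem[self.sp]
        self.sp += 2

    def lit(self, n):
        # a word that produces a value must spill TOS first (like I)
        self.push_tos()
        self.tos = n & 0xFFFF

    def plus(self):
        # hypothetical "+": *SP+ TOS ADD, consumes the second item in
        # one instruction and leaves the sum in TOS
        self.tos = (self.tos + self.mem[self.sp]) & 0xFFFF
        self.sp += 2

s = DataStack()
s.lit(2)
s.lit(3)
s.plus()
print(s.tos)   # 5
```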

I copied the jump mechanism for my assembler from Win32Forth. Going the other way, from conventional Assembler to Forth assembler, was always confusing for me, so that was my solution.

Sorry it's confusing going the other way; I understand.

 

<time goes by...>

 

So I just peeked into your code, and I think the difference in loop speed is the classic space-vs-speed trade-off. Yours saves lots of space by JMPing to BRANCH every chance you get, which is smart on a TI-99.

My BRANCH is far away in scratchpad RAM, so I just bit the bullet and took the space.

If I can shoehorn (LOOP) into scratchpad RAM, I may have something a little faster again.

 

BF
