
BASIC speed


15 replies to this topic

#1 TheBF OFFLINE  

TheBF

    Moonsweeper

  • 281 posts
  • Location:The Great White North

Posted Tue Feb 21, 2017 10:35 AM

I have never written a BASIC compiler or even an interpreter but I can understand how it would have lots of overhead to make it safe in the way that BASIC is designed to be. I still have grudges with TI over how they implemented the TI-BASIC and XB languages in GPL.  Soooo slow. I spent so many hours in the '80s trying to make it go faster.
 
I found this video on youtube demonstrating the speed-up you get using a compiler for XB and it would have made me happy many years ago. 
 
 
 
I am in the middle of writing a version of CAMEL Forth for the TI-99 and I wondered how it would compare on this simple test.
Here is the video I made using version .5 of the system.
 
It shows a bit more of what this ancient platform could have done if the engineers had been free to build it properly.




#2 Opry99er OFFLINE  

Opry99er

    Quadrunner

  • 8,246 posts
  • Location:Cookeville, TN

Posted Tue Feb 21, 2017 11:41 AM

That is a video I made to showcase senior_falcon's compiler several years ago. :)

It is a brilliant utility, and allows for super fast games to be produced in XB.


I look forward to your Forth implementation!!

#3 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,262 posts
  • Location:Silver Run, Maryland

Posted Tue Feb 21, 2017 2:49 PM

FYI, translated to fbForth 2.0, the first takes 5.5 seconds and the second ~0.4 second.

 

...lee



#4 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Tue Feb 21, 2017 3:29 PM

Cool.  I am new here but I thought you might be on this site.

I used to love XB.  I thought it was just great until I wrote something one day and showed my sister-in-law.

She said "Why is it so slow?"  She was comparing it to the Commodore 64.  I was P.O.'ed. :-)

 

Nice to make your acquaintance.  I was looking for stuff on youtube when I found your video.

Made me curious about how my code compared... of course.

 

It's got me thinking about putting a BASIC wrapper on top of Forth to make something more palatable for people.  It's been done in the past on other machines and it should go pretty fast.

 

So much code, so little time.

 

BF



#5 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Tue Feb 21, 2017 3:50 PM


 

Hi Lee,

 

That's interesting. Is fbForth based on TI Forth? If so, I believe the difference is mostly in the EMIT implementation.

If I recall, TI's EMIT called the system for some things and also provided a proper control-key interpreter.

My version of EMIT is very sparse.  I tried something unusual to avoid multiplication when calculating the cursor position: I keep track of the row as a VDP address and the column as an offset. That way I only have to add them together in the word VPOS below, so it's pretty quick. I do use multiplication for manually positioning the cursor with AT-XY, however.

I intend to use this implementation for a cross-compiler tutorial, so I am trying to keep a lot of high-level code with simple support words in Assembler.

: EMIT ( char -- )
    VPOS C/SCR @ = IF SCROLL THEN  \ if we are at last character in the display, scroll
    (EMIT)                          \ put the character on the screen & inc. the column
    VCOL @ C/L @ =                  \ are we at end of line?
    IF (CR) THEN  ;                 \ do carriage return math
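
For anyone who wants to try the row/column bookkeeping off the machine, here is a rough Python model of the idea, not the actual CAMEL99 source. The 32x24 screen size, the `Cursor` class with its method names, and where the cursor lands after a scroll are assumptions made for illustration. Only the add-instead-of-multiply trick and the two tests inside EMIT come from the Forth word above.

```python
C_L = 32             # characters per line (assumed 32-column screen)
C_SCR = C_L * 24     # characters per screen (assumed 24 rows)


class Cursor:
    """Hypothetical model: row kept as a screen address, column as an offset."""

    def __init__(self):
        self.vrow = 0                           # address of the start of the current row
        self.vcol = 0                           # column offset inside that row
        self.screen = bytearray(b" " * C_SCR)   # stand-in for VDP screen memory
        self.scrolls = 0

    def vpos(self):
        # One addition, no multiplication.
        return self.vrow + self.vcol

    def scroll(self):
        # Assumed behaviour: move every row up one, blank the last row,
        # and park the cursor at the start of the last row.
        self.screen[:-C_L] = self.screen[C_L:]
        self.screen[-C_L:] = b" " * C_L
        self.vrow = C_SCR - C_L
        self.vcol = 0
        self.scrolls += 1

    def cr(self):
        # "Carriage return math": step the row address by one line.
        self.vrow += C_L
        self.vcol = 0

    def emit(self, ch):
        if self.vpos() == C_SCR:    # past the last character: scroll first
            self.scroll()
        self.screen[self.vpos()] = ch
        self.vcol += 1
        if self.vcol == C_L:        # end of line reached
            self.cr()

    def at_xy(self, col, row):
        # Manual positioning is the one place that multiplies.
        self.vrow = row * C_L
        self.vcol = col
```

Emitting 768 characters fills the modelled screen without scrolling, and the 769th triggers exactly one scroll. That matches the order of the tests in EMIT, which checks for the end of the screen before writing rather than after.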

BF



#6 matthew180 OFFLINE  

matthew180

    River Patroller

  • 2,382 posts
  • Location:Castaic, California

Posted Tue Feb 21, 2017 4:53 PM

...

She said "Why is it so slow?"  She was comparing it to the Commodore 64.  I was P.O. ed. :-)

...

 

 

We all know the only thing people used C64 BASIC for was poking in assembly programs, so it was not a fair comparison.  :-)  Rewrite it in 9900 assembly and see how it compares then.



#7 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,262 posts
  • Location:Silver Run, Maryland

Posted Tue Feb 21, 2017 6:44 PM


...  Is fbForth based on TI-Forth?

 

fbForth is, indeed, based on TI Forth.

 

I believe the difference is mostly in the EMIT implementation.

 

Nope.  My implementation of your routines is practically identical to yours.  VC! becomes VSBW and [CHAR] becomes ASCII (same as TurboForth).  The time difference may be that only the inner interpreter is in scratchpad RAM on the 16-bit bus.  If you are running some routines in scratchpad RAM as does TurboForth, your CAMEL99 Forth will be faster.

 

If I recall TI EMIT called the system for some stuff and also provided a proper control key interpreter as well.  ...

 

Indeed, EMIT calls the system ALC (in the spoiler below), which handles BEL, BS, LF, CR.  Everything else is presumed to be printable code.  As noted above, nothing in my implementation of your code uses EMIT .

 

Spoiler

 

...lee



#8 senior_falcon OFFLINE  

senior_falcon

    Dragonstomper

  • 885 posts
  • Location:Lansing, NY, USA

Posted Tue Feb 21, 2017 7:20 PM

 

We all know the only thing people used C64 BASIC for was poking in assembly programs, so it was not a fair comparison.  :-)  Rewrite it in 9900 assembly and see how it compares then.

I have no experience with  C64 BASIC, but the VIC20 BASIC was much faster than TI BASIC.  The kid who lived next door had a VIC 20 and when he saw the TI99 his first comment was "why is it so slow?"



#9 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Tue Feb 21, 2017 8:49 PM



 


 

You are correct Lee.  I have  NEXT in PAD RAM along with EXIT, DOCOL, ?BRANCH and BRANCH.  

When I tested different benchmarks on CAMEL99 without using that speedup, things were 20% slower, so pretty much exactly right with your timing.

 

You are the man.

 

BF



#10 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,262 posts
  • Location:Silver Run, Maryland

Posted Tue Feb 21, 2017 10:23 PM




 

That might do it.  We'll have to compare code sometime.  The fbForth inner interpreter includes NEXT (actually, its ALC label is $NEXT) and the code fields of : [code field label = DOCOL] and EXIT ( ;S in fbForth) [code field label =  $SEMIS], which are all in scratchpad RAM as are fbForth's workspace registers.

 

?BRANCH and BRANCH are not in scratchpad RAM in fbForth—and, I suppose those two words could be making the difference because they are certainly used extensively in the loops in your code—especially, the first example, which is almost twice as fast in your CAMEL99 Forth.

 

The second example is almost a dead heat because the loop branch only operates ten times instead of the 7680 times in the first example.  I wish I could put more code in scratchpad RAM, but that would be a pretty big rewrite.  I know Mark put quite a few of the oft-used words there and had to always be aware of the need to save/restore scratchpad space that conflicted with other functions.

 

...lee



#11 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,002 posts
  • Location:Uzbekistan (no, really!)

Posted Wed Feb 22, 2017 1:53 AM

Yes. File routines were the main offender. Calling disk I/O routines clobbers pad RAM in some locations, and so does the floating point. The pad RAM layout for TF is here:
http://turboforth.ne...es/pad_ram.html

#12 RXB OFFLINE  

RXB

    River Patroller

  • 2,684 posts
  • Location:Vancouver, Washington, USA

Posted Wed Feb 22, 2017 2:12 AM

 


 

Hmm, RXB does something even more impressive than this video using XB. It is hard to beat the speed of this:

 

 

Or if that is not evidence enough here you go:

 

 

Or lastly try this in RXB:

100 CALL CLEAR
110 FOR L=49 TO 57
120 CALL HCHAR(1,1,L)
130 CALL MOVES("VV",767,0,1)
140 NEXT L
150 ! Test the speed is pretty fast.

Edited by RXB, Wed Feb 22, 2017 2:26 AM.


#13 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Wed Feb 22, 2017 12:17 PM



 

That might do it.  We'll have to compare code sometime.  The fbForth inner interpreter includes NEXT (actually, its ALC label is $NEXT) and the code fields of : [code field label = DOCOL] and EXIT ( ;S in fbForth) [code field label =  $SEMIS], which are all in scratchpad RAM as are fbForth's workspace registers.

 

?BRANCH and BRANCH are not in scratchpad RAM in fbForth—and, I suppose those two words could be making the difference because they are certainly used extensively in the loops in your code—especially, the first example, which is almost twice as fast in your CAMEL99 Forth.

 

<snip>

 

 

Ok this is interesting.  So with DOCOL and EXIT in scratchpad RAM we are the same. 

 

My DO/LOOP primitive code is actually not in scratchpad RAM because it didn't work when I tried it quickly, so only BEGIN/UNTIL etc. and IF/ELSE/THEN are getting help from ?BRANCH and BRANCH.

 

My DO/LOOP code borrows from Laxen and Perry via CAMEL Forth and is shown below. 

I originally implemented it with the loop index and limit in R13 and R14 but I want to keep them chaste for my multi-tasker.

 

To really get the fastest Forth loops, I find a FOR/NEXT implementation like eForth's is best, with a simple down counter to zero. It goes like crazy compared to DO/LOOP.  Even Chuck Moore stopped using DO/LOOP, but the legacy is too big for the language to remove it completely.

 

Can you see fewer cycles in this code compared to yours?

(BTW the macros POP, PUSH, RPOP and RPUSH work exactly as expected.  I wrote this for Intel first so tried to make the ASM a little bit Forth VM "universal".)

\ Adapted from CAMEL Forth MSP430
\ ; '83 and ANSI standard loops terminate when the boundary of
\ ; limit-1 and limit is crossed, in either direction.  This can
\ ; be conveniently implemented by making the limit 8000h, so that
\ ; arithmetic overflow logic can detect crossing.  I learned this
\ ; trick from Laxen & Perry F83.


\ CAMEL Forth tries to put loop index and limit in registers.
\ We have elected not to do this so we have free registers for
\ a TMS9900 specific, very fast cooperative TASK switcher.

\ NOT using do/loop in registers costs us about 8% slower looping
\ ====================================================================
CODE: <?DO> ( limit ndx -- )
             *SP TOS CMP,        \ compare 2 #s
              @@1 JNE,           \ if they are not the same jump to regular 'do.'  (BELOW)
              IP RPOP,           \ otherwise do a forth 'exit'
              TOS POP,           \ clean the parameter stack
              NEXT,

+CODE: <DO> ( limit indx -- )
@@1:          R0  8000 LI,      \ load "fudge factor" to LIMIT
             *SP+  R0 SUB,      \ LIMIT, compute 8000h-limit "fudge factor"
              R0  TOS ADD,      \ loop ctr = index+fudge
              R0  RPUSH,        \ rpush limit
              TOS RPUSH,        \ rpush index
              TOS POP,          \ refill TOS
              NEXT,
              END-CODE

CODE: <LOOP>
             *RP INC,           \ increment loop
@@2:          @@1 JNO,          \ if no overflow then loop again
              IP INCT,          \ move past (LOOP)'s in-line parameter
              *RP+ *RP+ CMP,    \ RP INC by 4 (1 cell, 30 clocks) Doesn't make much difference to loop speed.
              NEXT,
@@1:         *IP IP ADD,        \ jump back to top of loop  (branch)
              NEXT,
              END-CODE

+CODE: <+LOOP>
              TOS *RP ADD,      \ saving space by jumping into <loop>
              TOS POP,          \ refill TOS, (does not change overflow flag)
              @@2 JMP,
              END-CODE

CODE: I       ( -- n)
              TOS PUSH,            \ making space in TOS slows this down
              *RP TOS MOV,
              2 (RP) TOS SUB,      \ index = loopindex - fudge
              NEXT,
              END-CODE

CODE: J       ( -- n)
              TOS PUSH,
              4 (RP) TOS MOV,       \ outer loop index is on the rstack
              6 (RP) TOS SUB,       \ index = loopindex - fudge
              NEXT,
              END-CODE

CODE: LEAVE
              *RP+ *RP+ CMP,        \ collapse rstack frame in 1 CELL (TMS9900 trick)
               IP RPOP,             \ pop something else to do from the return stack
               NEXT,
               END-CODE
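
If the 8000h trick in the comments above is hard to picture, this small Python model of the 16-bit arithmetic may help. It is only a sketch of the idea, not a translation of the assembler: `add16`, `to_signed` and `do_loop` are names made up for illustration, and the model assumes the overflow flag behaves like ordinary two's-complement signed overflow on a 16-bit add.

```python
MASK = 0xFFFF


def to_signed(x):
    """Interpret a 16-bit value as a signed number."""
    x &= MASK
    return x - 0x10000 if x & 0x8000 else x


def add16(a, b):
    """16-bit add; returns (result, signed_overflow_flag)."""
    r = (a + b) & MASK
    # Overflow: both operands have the same sign and the result's sign differs.
    overflow = ((a ^ r) & (b ^ r) & 0x8000) != 0
    return r, overflow


def do_loop(limit, index, step=1):
    """Return the values I would yield for: limit index DO ... step +LOOP."""
    fudge = (0x8000 - limit) & MASK      # what <DO> computes into R0
    biased = (index + fudge) & MASK      # loop counter kept on the return stack
    seen = []
    while True:
        seen.append(to_signed(biased - fudge))    # I = counter - fudge
        biased, overflow = add16(biased, step & MASK)
        if overflow:                     # crossed the limit-1 / limit boundary
            return seen


print(do_loop(10, 0))        # counting up
print(do_loop(0, 10, -2))    # counting down with a negative step
```

Because the limit is mapped onto 8000h, the boundary between limit-1 and limit becomes the boundary between 7FFFh and 8000h, which is exactly where a signed add overflows, in either direction. The model also shows why a separate ?DO is useful: with index equal to limit at entry, a plain DO goes all the way around, 65536 times.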


Edited by TheBF, Wed Feb 22, 2017 12:22 PM.


#14 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Wed Feb 22, 2017 2:37 PM

I STAND CORRECTED.

 

I just wrote a FOR NEXT loop and the speed difference between the above DO LOOP and FOR NEXT is almost non-existent.

 

On an Intel ITC Forth I see a 30% improvement.  I am really surprised.  But on the 9900 there is only 1 less instruction!

 

Benchmarks are always right.



#15 Lee Stewart ONLINE  

Lee Stewart

    River Patroller

  • 3,262 posts
  • Location:Silver Run, Maryland

Posted Wed Feb 22, 2017 3:30 PM

My head hurts!   :-o  

 

I can probably figure out how your macros work, but on the face of it, things look pretty similar.

 

For another thing, I have never used JMP, in TMS9900 Forth Assembler.  Perhaps, I will give it a try sometime.  The TI Forth Manual never explained how to use any of the jump codes.  I have always used the structured assembler constructs from the TI Forth Assembler, which is certainly encouraged in the manual.  However, the similar loop code in fbForth 2.0 is all written in straight-up TMS9900 Assembler, so it should not be very difficult to compare.  It is in fbForth002_ResidentDictionary.a99, if you want to check on it before I try anything.

 

One note (you may already know this), in fbForth 2.0 (as in TI Forth) the loop crossover is different in the positive and negative directions from what they are in Forth 83.

 

...lee



#16 TheBF OFFLINE  

TheBF

    Moonsweeper

  • Topic Starter
  • 281 posts
  • Location:The Great White North

Posted Wed Feb 22, 2017 5:41 PM


 

Now my head will have to hurt while I wrap it around the loop crossover implications.

 

My macros are in the code window.

\ PUSH & POP on both stacks
: PUSH,         ( src -- )  SP DECT,  *SP   MOV, ;    \ 10+18 = 28  cycles
: POP,          ( dst -- )  *SP+      SWAP  MOV, ;    \ 22 cycles

: RPUSH,        ( src -- ) RP DECT,  *RP   MOV,  ;
: RPOP,         ( dst -- ) *RP+      SWAP  MOV,  ;

\ this one allows nested subroutine calls. Never really needed it
: CALL,         ( dst -- )   \ total cycles 44 to call,  34 to return
                R11 RPUSH,       \ save R11 on forth return stack
               ( addr) BL,       \ branch & link saves the PC in R11
                R11 RPOP, ;      \ R11 RPOP, is laid down by CALL, in the caller
                                 \ We have to lay it in the code after BL so
                                 \ when we return from the Branch&link, R11 is
                                 \ restored to the original value from the rstack
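
For readers who do not speak TMS9900, here is a rough Python picture of what the PUSH, and POP, macros above lay down, plus the `*RP+ *RP+ CMP,` trick used in LEAVE. The `Stack` class, the dictionary used as memory, and the starting address are all hypothetical. The model only assumes a stack that grows downward in 2-byte cells, as the DECT and `*SP+` forms imply.

```python
class Stack:
    """Hypothetical model of a downward-growing stack of 16-bit cells."""

    def __init__(self, top, mem):
        self.ptr = top      # stack pointer register (SP or RP)
        self.mem = mem      # byte-addressed memory, modelled as a dict

    def push(self, value):
        self.ptr -= 2                        # SP DECT,
        self.mem[self.ptr] = value & 0xFFFF  # src *SP MOV,

    def pop(self):
        value = self.mem[self.ptr]           # *SP+ dst MOV,  (read...)
        self.ptr += 2                        # (...then auto-increment)
        return value

    def drop2(self):
        # *RP+ *RP+ CMP, reads two cells only for the side effect:
        # each auto-increment bumps the pointer by 2, so 4 in total.
        self.ptr += 4


mem = {}
rstack = Stack(0xFF00, mem)   # arbitrary, made-up starting address
rstack.push(0x1234)           # e.g. a saved return address
rstack.push(0x7FF6)           # e.g. the loop "fudge" value
rstack.push(0x7FF6)           # e.g. the biased loop index
rstack.drop2()                # discard the loop frame in one instruction
print(hex(rstack.pop()))      # the saved return address is on top again
```

The compare result is thrown away. The instruction is used purely because its two auto-increments remove both loop cells at once.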

I copied the jump mechanism for my assembler from Win32Forth.  It was always confusing going the other way, from conventional Assembler to Forth assembler so that was my solution.

Sorry it's confusing going the other way. I understand.

 

<time goes by...>

 

So I just peeked into your code and I think the difference in loop speed is the classic space vs. speed trade-off. Yours saves lots of space by JMPing to BRANCH every chance you get, which is smart on a TI-99.

My BRANCH is far away in scratch pad RAM so I just bit the bullet and took the space.

If I can shoehorn (LOOP) into scratch pad RAM I may have something a little faster again.

 

BF


Edited by TheBF, Wed Feb 22, 2017 6:02 PM.




