TheBF, posted February 21, 2017

I have never written a BASIC compiler or even an interpreter, but I can understand how it would carry a lot of overhead to make it safe in the way that BASIC is designed to be. I still hold a grudge against TI over how they implemented the TI-BASIC and XB languages in GPL. Soooo slow. I spent so many hours in the '80s trying to make it go faster.

I found this video on YouTube demonstrating the speed-up you get using a compiler for XB, and it would have made me happy many years ago.

I am in the middle of writing a version of CAMEL Forth for the TI-99 and I wondered how it would compare on this simple test. Here is the video I made using version .5 of the system. It shows a bit more of what this ancient platform could have done if the engineers had been free to make it properly.

Attachment: Camel Forth V.5 Demo.mov
Opry99er, posted February 21, 2017

That is a video I made to showcase seniorfalcon's compiler several years ago. It is a brilliant utility, and allows for super fast games to be produced in XB.

I look forward to your Forth implementation!!
Lee Stewart, posted February 21, 2017

FYI, translated to fbForth 2.0, the first takes 5.5 seconds and the second ~0.4 second.

...lee
TheBF, posted February 21, 2017

Cool. I am new here but I thought you might be on this site.

I used to love XB. I thought it was just great until I wrote something one day and showed my sister-in-law. She said "Why is it so slow?" She was comparing it to the Commodore 64. I was P.O.'d. :-)

Nice to make your acquaintance. I was looking for stuff on YouTube when I found your video. It made me curious about how my code compared... of course.

It's got me thinking about putting a BASIC wrapper on top of Forth to make something more palatable for people. It's been done in the past on other machines and it should go pretty fast.

So much code, so little time.

BF
TheBF, posted February 21, 2017

Lee Stewart wrote: "FYI, translated to fbForth 2.0, the first takes 5.5 seconds and the second ~0.4 second. ...lee"

Hi Lee,

That's interesting. Is FBForth based on TI-Forth? If so, then I believe the difference is mostly in the EMIT implementation. If I recall, TI EMIT called the system for some stuff and also provided a proper control key interpreter as well.

My version of EMIT is very sparse. I tried something weird to avoid multiplication in calculating the cursor position: I keep track of the ROW as the VDP address and the column as an offset. That way I only have to add them together in the word VPOS below, so it's pretty quick. I use multiplication for manually positioning the cursor with AT-XY. However, I intend to use this implementation for a cross-compiler tutorial, so I am trying to keep a lot of high-level code with simple support words in Assembler.

    : EMIT ( char -- )
        VPOS C/SCR @ = IF SCROLL THEN   \ if we are at last character in the display, scroll
        (EMIT)                          \ put the character on the screen & inc. the column
        VCOL @ C/L @ =                  \ are we at end of line?
        IF (CR) THEN ;                  \ do carriage return math

BF
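The cursor bookkeeping described in this post can be modeled roughly as follows. This is only a Python sketch of the idea, not the CAMEL99 source: the 32x24 screen size and the names Cursor, vrow, vcol, at_xy, carriage_return and scroll are assumptions made for illustration. It shows that the hot path (vpos) needs one addition, while the multiply is confined to manual positioning.

```python
COLS = 32                    # assumed: 32 characters per line
ROWS = 24                    # assumed: 24 lines per screen
SCREEN_SIZE = COLS * ROWS    # plays the role of C/SCR in the post


class Cursor:
    """Hypothetical model: cursor = row base address + column offset."""

    def __init__(self):
        self.vrow = 0   # screen address of the start of the current row
        self.vcol = 0   # column offset within that row

    def vpos(self):
        # Hot path: a single addition, no multiplication.
        return self.vrow + self.vcol

    def at_xy(self, col, row):
        # Manual positioning is the only place a multiply is needed.
        self.vrow = row * COLS
        self.vcol = col

    def carriage_return(self):
        # "Carriage return math": back to column 0, down one row.
        self.vcol = 0
        self.vrow += COLS

    def scroll(self, screen):
        # Move every row up by one, blank the last row,
        # and park the cursor at the start of the last row.
        screen[:-COLS] = screen[COLS:]
        screen[-COLS:] = [" "] * COLS
        self.vrow = SCREEN_SIZE - COLS
        self.vcol = 0

    def emit(self, screen, char):
        if self.vpos() == SCREEN_SIZE:   # ran past the last character
            self.scroll(screen)
        screen[self.vpos()] = char       # put the character on screen
        self.vcol += 1                   # and advance the column
        if self.vcol == COLS:            # end of line?
            self.carriage_return()
```

Because the row is stored as an address rather than a row number, stepping to the next line is also just an addition of the line width.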
matthew180, posted February 21, 2017

TheBF wrote: "...She said "Why is it so slow?" She was comparing it to the Commodore 64. I was P.O. ed. :-)..."

We all know the only thing people used C64 BASIC for was poking in assembly programs, so it was not a fair comparison. :-) Rewrite it in 9900 assembly and see how it compares then.
Lee Stewart, posted February 22, 2017

TheBF wrote: "... Is fbForth based on TI-Forth?"

fbForth is, indeed, based on TI Forth.

TheBF wrote: "I believe the difference is mostly in the EMIT implementation."

Nope. My implementation of your routines is practically identical to yours. VC! becomes VSBW and [CHAR] becomes ASCII (same as TurboForth). The time difference may be that only the inner interpreter is in scratchpad RAM on the 16-bit bus. If you are running some routines in scratchpad RAM as does TurboForth, your CAMEL99 Forth will be faster.

TheBF wrote: "If I recall TI EMIT called the system for some stuff and also provided a proper control key interpreter as well. ..."

Indeed, EMIT calls the system ALC (in the spoiler below), which handles BEL, BS, LF, CR. Everything else is presumed to be printable code. As noted above, nothing in my implementation of your code uses EMIT.

    ;[*== EMIT routine   CODE = -4   =================
    *
    EMT    EQU  $LO+$-LLVSPT
           MOV  R2,R1            copy char to R1 for VSBW
           MOV  @$ALTO(U),R0     alternate output device?
           JEQ  EMIT0            jump to video display output if not
    *
    * R0 now points to PAB for alternate output device, the one-byte buffer
    * for which must immediately precede its PAB. PAB must have been set up
    * to write one byte.
    *
           CLR  R7               ALTOUT active
           MOVB R7,@KYSTAT       zero status byte
           DEC  R0               point to one-byte VRAM buffer in front of PAB
           SWPB R1               char to MSB
           BLWP @VSBW            write char to buffer
           INCT R0               point to Flag/Status byte
           BLWP @VSBR            read it
           ANDI R1,>1F00         clear error bits without disturbing flag bits
           BLWP @VSBW            write it back to PAB
           AI   R0,8             Set up pointer to namelength byte of PAB
           MOV  R0,@SUBPTR       copy to DSR subroutine name-length pointer
           BLWP @DSRLNK          put 1 byte to device
           DATA >8
           B    @BKLINK          return to caller
    *
    * Output is going to the video display
    *
    EMIT0  CI   R1,7             Is it a bell?
           JNE  NOTBEL
           CLR  R2
           MOVB R2,@KYSTAT
           BLWP @GPLLNK
           DATA >0036            Emit error tone
           JMP  EMEXIT
    *
    NOTBEL CI   R1,8             Is it a backspace?
           JNE  NOTBS
           LI   R1,>2000
           MOV  @CURPO$(U),R0
           BLWP @VSBW
           JGT  DECCUR
           JMP  EMEXIT
    DECCUR DEC  @CURPO$(U)
           JMP  EMEXIT
    *
    NOTBS  CI   R1,>A            Is it a line feed?
           JNE  NOTLF
           MOV  @$SEND(U),R7
           S    @$SWDTH(U),R7
           C    @CURPO$(U),R7
           JHE  SCRLL
           A    @$SWDTH(U),@CURPO$(u)
           JMP  EMEXIT
    SCRLL  MOV  LINK,R7
           BL   @SCROLL
           MOV  R7,LINK
           JMP  EMEXIT
    *
    *** SCROLLING ROUTINE
    *
    SCROLL EQU  $LO+$-LLVSPT
           MOV  @$SSTRT(U),R0    VRAM addr
           LI   R1,LINBUF        Line buffer
           MOV  @$SWDTH(U),R2    Count
           A    R2,R0            Start at line 2
    SCROL1 BLWP @VMBR
           S    R2,R0            One line back to write
           BLWP @VMBW
           A    R2,R0            Two lines ahead for next read
           A    R2,R0
           C    R0,@$SEND(U)     End of screen?
           JL   SCROL1
           MOV  R2,R1            Blank bottom row of screen
           LI   R0,>2000         Blank
           S    @$SEND(U),R2
           NEG  R2               Now contains address of start of last line
           MOV  LINK,R6
           BL   @FILL1           Write the blanks
           B    *R6
    *
    NOTLF  CI   R1,>D            Is it a carriage return?
           JNE  NOTCR
           CLR  R0
           MOV  @CURPO$(U),R1
           MOV  R1,R3
           S    @$SSTRT(U),R1    Adjusted for screen not at 0
           MOV  @$SWDTH(U),R2
           DIV  R2,R0
           S    R1,R3
           MOV  R3,@CURPO$(U)
           JMP  EMEXIT
    *
    NOTCR  SWPB R1               Assume it is a printable character
           MOV  @CURPO$(U),R0
           BLWP @VSBW
           MOV  @$SEND(U),R2
           DEC  R2
           C    R0,R2
           JNE  NOTCR1
           MOV  @$SEND(U),R0
           S    @$SWDTH(U),R0    Was last char on screen. Scroll
           MOV  R0,@CURPO$(U)
           JMP  SCRLL
    NOTCR1 INC  R0               No scroll necessary
           MOV  R0,@CURPO$(U)
    *
    EMEXIT B    @BKLINK
    ;]*

...lee
senior_falcon, posted February 22, 2017

matthew180 wrote: "We all know the only thing people used C64 BASIC for was poking in assembly programs, so it was not a fair comparison. :-) Rewrite it in 9900 assembly and see how it compares then."

I have no experience with C64 BASIC, but the VIC20 BASIC was much faster than TI BASIC. The kid who lived next door had a VIC 20 and when he saw the TI99 his first comment was "why is it so slow?"
TheBF, posted February 22, 2017

Lee Stewart wrote: "fbForth is, indeed, based on TI Forth. Nope. My implementation of your routines is practically identical to yours. VC! becomes VSBW and [CHAR] becomes ASCII (same as TurboForth). The time difference may be that only the inner interpreter is in scratchpad RAM on the 16-bit bus. If you are running some routines in scratchpad RAM as does TurboForth, your CAMEL99 Forth will be faster. Indeed, EMIT calls the system ALC (in the spoiler below), which handles BEL, BS, LF, CR. Everything else is presumed to be printable code. As noted above, nothing in my implementation of your code uses EMIT. <snip> ...lee"

You are correct, Lee. I have NEXT in PAD RAM along with EXIT, DOCOL, ?BRANCH and BRANCH.

When I tested different benchmarks on CAMEL99 without using that speedup, things were 20% slower, so pretty much exactly right with your timing.

You are the man.

BF
Lee Stewart, posted February 22, 2017

TheBF wrote: "You are correct Lee. I have NEXT in PAD RAM along with EXIT, DOCOL, ?BRANCH and BRANCH. When I tested different benchmarks on CAMEL99 without using that speedup, things were 20% slower, so pretty much exactly right with your timing. You are the man. BF"

That might do it. We'll have to compare code sometime.

The fbForth inner interpreter includes NEXT (actually, its ALC label is $NEXT) and the code fields of : [code field label = DOCOL] and EXIT ( ;S in fbForth), which are all in scratchpad RAM, as are fbForth's workspace registers. ?BRANCH and BRANCH are not in scratchpad RAM in fbForth, and I suppose those two words could be making the difference, because they are certainly used extensively in the loops in your code. That is especially true of the first example, which is almost twice as fast in your CAMEL99 Forth. The second example is almost a dead heat because the loop branch only operates ten times instead of the 7680 times in the first example.

I wish I could put more code in scratchpad RAM, but that would be a pretty big rewrite. I know Mark put quite a few of the oft-used words there and had to always be aware of the need to save/restore scratchpad space that conflicted with other functions.

...lee
Willsy, posted February 22, 2017

Yes. File routines were the main offender. Calling disk I/O routines clobbers PAD RAM in some locations, and so does the floating point.

PAD RAM layout for TF is here: http://turboforth.net/resources/pad_ram.html
RXB, posted February 22, 2017 (edited)

TheBF wrote: "<snip> I found this video on youtube demonstrating the speed-up you get using a compiler for XB and it would have made me happy many years ago. <snip>"

Hmm, RXB is doing something even more impressive than this video using XB. Hard to beat the speed of this:

Or if that is not evidence enough, here you go:

Or lastly, try this in RXB:

    100 CALL CLEAR
    110 FOR L=49 TO 57
    120 CALL HCHAR(1,1,L)
    130 CALL MOVES("VV",767,0,1)
    140 NEXT L
    150 ! Test the speed is pretty fast.

Edited February 22, 2017 by RXB
TheBF, posted February 22, 2017 (edited)

Lee Stewart wrote: "That might do it. We'll have to compare code sometime. The fbForth inner interpreter includes NEXT (actually, its ALC label is $NEXT) and the code fields of : and EXIT ( ;S in fbForth) [code field label = $SEMIS], which are all in scratchpad RAM as are fbForth's workspace registers. ?BRANCH and BRANCH are not in scratchpad RAM in fbForth, and I suppose those two words could be making the difference because they are certainly used extensively in the loops in your code, especially the first example, which is almost twice as fast in your CAMEL99 Forth. <snip>"

OK, this is interesting. So with DOCOL and EXIT in scratchpad RAM we are the same. My DO/LOOP primitive code is actually not in scratchpad, because it didn't work when I tried it quickly, so only BEGIN/UNTIL etc. and IF/ELSE/THEN are getting help from ?BRANCH and BRANCH.

My DO/LOOP code borrows from Laxen and Perry via CAMEL Forth and is shown below. I originally implemented it with the loop index and limit in R13 and R14, but I want to keep them chaste for my multi-tasker.

To really get the fastest Forth loops, I find a FOR/NEXT implementation like E-forth is best, with a simple down counter to zero. It goes like crazy compared to DO/LOOPs. Even Chuck Moore stopped using DO LOOP, but the legacy is too big for the language to remove it completely.

Can you see fewer cycles in this code compared to yours? (BTW the macros POP, PUSH, RPOP and RPUSH work exactly as expected. I wrote this for Intel first, so I tried to make the ASM a little bit Forth VM "universal".)

    \ Adapted from CAMEL Forth MSP430
    \ ; '83 and ANSI standard loops terminate when the boundary of
    \ ; limit-1 and limit is crossed, in either direction. This can
    \ ; be conveniently implemented by making the limit 8000h, so that
    \ ; arithmetic overflow logic can detect crossing. I learned this
    \ ; trick from Laxen & Perry F83.

    \ CAMEL Forth tries to put loop index and limit in registers.
    \ We have elected not to do this so we have free registers for
    \ a TMS9900 specific, very fast cooperative TASK switcher.
    \ NOT using do/loop in registers costs us about 8% slower looping
    \ ====================================================================
    CODE: <?DO> ( limit ndx -- )
            *SP TOS CMP,        \ compare 2 #s
            @@1 JNE,            \ if they are not the same jump to regular 'do.' (BELOW)
            IP RPOP,            \ otherwise do a forth 'exit'
            TOS POP,            \ clean the parameter stack
            NEXT,

    +CODE: <DO> ( limit indx -- )
    @@1:    R0 8000 LI,         \ load "fudge factor" to LIMIT
            *SP+ R0 SUB,        \ LIMIT, compute 8000h-limit "fudge factor"
            R0 TOS ADD,         \ loop ctr = index+fudge
            R0 RPUSH,           \ rpush limit
            TOS RPUSH,          \ rpush index
            TOS POP,            \ refill TOS
            NEXT,
            END-CODE

    CODE: <LOOP>
            *RP INC,            \ increment loop
    @@2:    @@1 JNO,            \ if no overflow then loop again
            IP INCT,            \ move past (LOOP)'s in-line parameter
            *RP+ *RP+ CMP,      \ RP INC by 4 (1 cell, 30 clocks) Doesn't make much difference to loop speed.
            NEXT,
    @@1:    *IP IP ADD,         \ jump back to top of loop (branch)
            NEXT,
            END-CODE

    +CODE: <+LOOP>
            TOS *RP ADD,        \ saving space by jumping into <loop>
            TOS POP,            \ refill TOS, (does not change overflow flag)
            @@2 JMP,
            END-CODE

    CODE: I ( -- n)
            TOS PUSH,           \ making space in TOS slows this down
            *RP TOS MOV,
            2 (RP) TOS SUB,     \ index = loopindex - fudge
            NEXT,
            END-CODE

    CODE: J ( -- n)
            TOS PUSH,
            4 (RP) TOS MOV,     \ outer loop index is on the rstack
            6 (RP) TOS SUB,     \ index = loopindex - fudge
            NEXT,
            END-CODE

    CODE: LEAVE
            *RP+ *RP+ CMP,      \ collapse rstack frame in 1 CELL (TMS9900 trick)
            IP RPOP,            \ pop something else to do from the return stack
            NEXT,
            END-CODE

Edited February 22, 2017 by TheBF
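The 8000h "fudge factor" trick in the listing may be easier to follow in a small model. The following is a Python sketch under stated assumptions, not a translation of the CAMEL99 code: add16 and do_loop are hypothetical names, and 16-bit registers are imitated by masking. It shows that biasing the counter so the limit lands on 8000h turns "did we cross the boundary between limit-1 and limit?" into a plain signed-overflow test, for positive and negative steps alike.

```python
MASK = 0xFFFF   # 16-bit cell
SIGN = 0x8000   # sign bit, also the biased position of the loop limit


def add16(a, b):
    """Add two 16-bit values; return (result, signed_overflow)."""
    result = (a + b) & MASK
    # Signed overflow: both operands have a sign that differs from the result's.
    overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    return result, overflow


def do_loop(start, limit, step=1):
    """Yield the indices a DO ... +LOOP would see in this model.

    Indices come back as unsigned 16-bit values.
    """
    fudge = (SIGN - limit) & MASK        # 8000h - limit
    counter = (start + fudge) & MASK     # index + fudge, as kept on the return stack
    while True:
        yield (counter - fudge) & MASK   # what the word I computes
        counter, overflow = add16(counter, step & MASK)
        if overflow:                     # crossed between limit-1 and limit
            break
```

For example, list(do_loop(0, 5)) walks the counter from 7FFBh up to 7FFFh, and the next increment to 8000h overflows and ends the loop. With a step of -1 from 5 down to a limit of 0, the counter runs from 8005h down to 8000h and overflows on the step to 7FFFh, so index 0 is still visited.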
TheBF, posted February 22, 2017

I STAND CORRECTED.

I just wrote a FOR NEXT loop, and the speed difference between the above DO LOOP and FOR NEXT is almost non-existent.

On an Intel ITC Forth I see a 30% improvement. I am really surprised. But on the 9900 there is only one fewer instruction! Benchmarks are always right.
Lee Stewart, posted February 22, 2017

My head hurts! I can probably figure out how your macros work, but on the face of it, things look pretty similar.

For another thing, I have never used JMP, in TMS9900 Forth Assembler. Perhaps I will give it a try sometime. The TI Forth Manual never explained how to use any of the jump codes. I have always used the structured assembler constructs from the TI Forth Assembler, which is certainly encouraged in the manual. However, the similar loop code in fbForth 2.0 is all written in straight-up TMS9900 Assembler, so it should not be very difficult to compare. It is in fbForth002_ResidentDictionary.a99, if you want to check on it before I try anything.

One note (you may already know this): in fbForth 2.0 (as in TI Forth) the loop crossover is different in the positive and negative directions from what it is in Forth 83.

...lee
TheBF, posted February 22, 2017 (edited)

Lee Stewart wrote: "My head hurts! I can probably figure out how your macros work, but on the face of it, things look pretty similar. For another thing, I have never used JMP, in TMS9900 Forth Assembler. Perhaps, I will give it a try sometime. The TI Forth Manual never explained how to use any of the jump codes. I have always used the structured assembler constructs from the TI Forth Assembler, which is certainly encouraged in the manual. However, the similar loop code in fbForth 2.0 is all written in straight-up TMS9900 Assembler, so it should not be very difficult to compare. It is in fbForth002_ResidentDictionary.a99, if you want to check on it before I try anything. One note (you may already know this), in fbForth 2.0 (as in TI Forth) the loop crossover is different in the positive and negative directions from what they are in Forth 83. ...lee"

Now my head will have to hurt while I wrap it around the loop crossover implications.

My macros are in the code window.

    \ PUSH & POP on both stacks
    : PUSH,  ( src -- )  SP DECT,  *SP MOV, ;    \ 10+18 = 28 cycles
    : POP,   ( dst -- )  *SP+ SWAP MOV, ;        \ 22 cycles
    : RPUSH, ( src -- )  RP DECT,  *RP MOV, ;
    : RPOP,  ( dst -- )  *RP+ SWAP MOV, ;

    \ this one allows nested subroutine calls. Never really needed it
    : CALL,  ( dst -- )     \ total cycles 44 to call, 34 to return
            R11 RPUSH,      \ save R11 on forth return stack
            ( addr) BL,     \ branch & link saves the PC in R11
            R11 RPOP, ;     \ R11 RPOP, is laid down by CALL, in the caller
                            \ We have to lay it in the code after BL so
                            \ when we return from the Branch&link, R11 is
                            \ restored to the original value from the rstack

I copied the jump mechanism for my assembler from Win32Forth. It was always confusing going the other way, from conventional Assembler to Forth assembler, so that was my solution. Sorry it's confusing going the other way. I understand.

<time goes by...>

So I just peeked into your code and I think the difference in loop speed is the classic space vs. speed trade-off. Yours is saving lots of space by JMPing to BRANCH every chance you get, which is smart on a TI-99. My BRANCH is far away in scratchpad RAM, so I just bit the bullet and took the space. If I can shoehorn (LOOP) into scratchpad RAM I may have something a little faster again.

BF

Edited February 23, 2017 by TheBF