JamesD Posted August 28, 2017 Share Posted August 28, 2017 It makes me wonder how big a "Boolean" type is in the P-Code system. Any idea? Oops re-read your earlier post completely. 16 bits for a Boolean. How big can you go before it hits the limit? Could you try ARRAY of CHAR ? That's still only 20K. Seems a bit wasteful for what could be a single bit. I think the array of char sounds like a reasonable idea, but if the P-Machine doesn't handle arrays across boundaries, it could still be a problem. It sounds odd that it wouldn't support that though. Quote Link to comment Share on other sites More sharing options...
apersson850 Posted August 28, 2017 Share Posted August 28, 2017 (edited) array [0..999] of char, just like array[0..999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array. Now if you declare a packed array[0..999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time. I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case. Edited December 5, 2020 by apersson850 Quote Link to comment Share on other sites More sharing options...
apersson850 Posted August 28, 2017 Share Posted August 28, 2017 I realized I made a mistake in declaring the array of booleans. Corrected that and updated the list. 1000 primes X BASIC 14.3 secs TI BASIC 11.9 Pascal 2.786 s 10,000 primes TF 8.0 CAMEL99 ITC 10.28 CAMEL99 DTC 7.25 When the program is running, function memavail reports 5697 words free. My unit realtime consumes some memory too, especially as it has a segment with assembly support that's not dynamically relocatable. Quote Link to comment Share on other sites More sharing options...
JamesD Posted August 28, 2017 Share Posted August 28, 2017 array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array. Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time. I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case. Isn't the memory expansion paged? Quote Link to comment Share on other sites More sharing options...
apersson850 Posted August 28, 2017 Share Posted August 28, 2017 Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes. Quote Link to comment Share on other sites More sharing options...
JamesD Posted August 28, 2017 Share Posted August 28, 2017 Well, it's split in one 8 K and one 24 K section. But they are all on the same page, at least as I define a page in this context. That is, in the same address space, which for the TMS 9900 comprises 64 Kbytes. Page as in turning a page, changing a page, having to change with RAM appears at an address to use it page Quote Link to comment Share on other sites More sharing options...
apersson850 Posted August 28, 2017 Share Posted August 28, 2017 (edited) Well, that's the same definition, then. And in that case it's not paged. It's the same memory expansion as we always have had. Edited August 29, 2017 by apersson850 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted August 30, 2017 Share Posted August 30, 2017 (edited) array [0..999] of char, just like array[0.999] of boolean, still occupies 1000 words, as each character/boolean is stored in one word (16 bits). The system can't allocate a 10000 word array. Now if you declare a packed array[0.999] of char, then that's 500 words, and a packed array[0..999] of boolean is only 63 words. But then you have to add the overhead for packing and unpacking when accessing the array. So it works, but will take longer time. I'm not sure which boundaries you are referring to, when asking if the p-system can handle arrays across boundaries? There are no boundaries to cross in this case. I always like the way Wirth's languages have these different data types. Forth makes you work hard for that nice stuff. So in my Pascal Envy fever I thought I would see what it would take to make a "packed array of bits" and it is more code than I thought. The overhead just to set 1 bit, according to the CLASSIC99 9901 timer is 1.8 milli-seconds in my Forth system. EDIT: New times 1.5mS to set a bit. 1.1mS to read a bit (BIT@) This could be really sped up with assembly language code words but it's still a lot of work to strip out bits. EDIT: Here is the revised code with fixed RSHIFT and uses CELL+ and CELLS to be less platform dependant. \ BOOLEAN array experiment \ BOOLEAN data is one CELL (16 bits on TMS9900) HEX \ create & erase memory area for 'n' bits : BITS: ( n -- ) CREATE 8 / HERE OVER 0 FILL CELL+ ALLOT ; \ added 2 bytes extra \ compute bit# in a cell & cell address : BITFLD ( bit# addr[] -- bit#' addr) SWAP 10 /MOD CELLS ROT + ; : BIT@ ( bit# addr -- ? ) BITFLD @ \ compute bit# & fetch bits in cell SWAP RSHIFT \ if bit#<>0 RSHIFT, 0001 AND ; \ mask 1 bit : BIT#>MASK ( bit# -- n ) 0001 SWAP LSHIFT ; : BSET ( bit# addr[] -- ) BITFLD ( -- bit# addr) SWAP BIT#>MASK >R \ save the mask DUP @ \ -- addr bits R> OR SWAP ! ; \ or mask into bit, store in addr : BRST ( bit# addr[] -- ) BITFLD ( -- bit# addr) SWAP BIT#>MASK INVERT >R \ invert and save mask DUP @ \ -- addr bits R> AND SWAP ! ; \ mask out bits, store back in addr \ test code DECIMAL 300 BITS: ]X \ make array X of 1000 bits : FILLBITS 300 0 DO I ]X BSET LOOP ; : CLRBITS 300 0 DO I ]X BRST LOOP ; : EVENBITS ." Erasing..." CLRBITS 300 0 DO I ]X BSET 2 +LOOP ; : SHOWBITS 300 0 DO I ]X BIT@ . LOOP ; Edited September 14, 2017 by TheBF Quote Link to comment Share on other sites More sharing options...
apersson850 Posted August 30, 2017 Share Posted August 30, 2017 I've not checked how the PME routines for packing and unpacking works. But obviously they take more time than not having to run them. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 14, 2017 Share Posted September 14, 2017 (edited) So I ran some of the *HAYES tester on my Forth system and guess what? I coded my bit shifting words in a way that made them FAIL. No shock because I didn't read the spec. (DUH) In the WORD BIT@ in my earlier post, I added an IF statement to cope with a 0 bit shift. Well the ANS/ISO Forth standard already anticipated that. So I recoded my shift words (LSHIFT & RSHIFT) to pass the test and now BIT@ is much faster and much simpler. I have edited the previous POST to reflect the correction. : BIT@ ( bit# addr -- ? ) BITFLD @ \ compute bit# & fetch bits in cell SWAP RSHIFT \ if bit#<>0 RSHIFT, 0001 AND ; \ mask 1 bit * The Hayes tester is set of test words and test suites for every standard Forth word that let's you confirm that a Forth WORD is doing what it's supposed to do. It looks like this for a simple word like SWAP: T{ 1 2 3 SWAP -> 1 3 2 }T You get OK if the output swap matches the '1 3 2' after the arrow and an error message if not. To read more: http://forth-standard.org/standard/testsuite Edited September 14, 2017 by TheBF 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 26, 2019 Share Posted April 26, 2019 I had always wanted to come back to this little demo to properly do this in Forth in an idiomatic way and for the best performance.This of course means using the Forth assembler. There is no other way to match compilers that generate native code with a Threaded code Forth. So I took a look at Tursi's code. I made a some slight improvements by keeping the loop limits in a register rather than re-loading the register every time through the loop. I used the Forth compiler to make byte directive so signed bytes can be used for the counters. I liked the way that made the code look. This happens at compile time so it costs nothing in run-time. Then I chopped up the big program into 4 routines which would be the Forth approach. This meant I could test each movement independantly.I used Forth for all the setup stuff, clearing the screen and creating the sprite. After that it's just 4 words in a 100 iteration loop. The spoiler has the new code and the video shows the timing using the VDP interrupt on the PC. It seems to agree with my iPhone stopwatch. \ * assumes startup from CAMEL99 Forth \ * Translated from ASM code by Tursi \ stripped out graphics set up and used Forth. \ Uses Forth Workspace at >8300 \ NEEDS DUMP FROM DSK1.TOOLS NEEDS MOV, FROM DSK1.ASM9900 NEEDS CLEAR FROM DSK1.GRAFIX NEEDS SPRITE FROM DSK1.DIRSPRIT NEEDS ELAPSE FROM DSK1.ELAPSE HEX 8C00 CONSTANT VDPWD \ vdp ram write data 8C02 CONSTANT VDPWA \ vdp ram read/write address \ MACRO to setup VDP write address from a register argument : VDPWA, ( reg -- ) DUP VDPWA @@ MOVB, \ write 1st byte of address to VDP chip DUP SWPB, VDPWA @@ MOVB, \ write 2nd byte of address to VDP chip NOP, ; \ Macro to convert integer to signed byte at compile time : byte ( n -- c ) 00FF AND >< ; CODE MOVERIGHT ( -- ) R3 CLR, \ for x=0 DECIMAL R8 239 byte LI, \ to 239 R13 01 byte LI, \ step 1 BEGIN, HEX R0 4301 >< LI, ( >< swaps bytes at compile time) R0 VDPWA, R3 VDPWD @@ MOVB, R13 R3 ADD, \ next x R3 R8 CMP, EQ UNTIL, NEXT, ENDCODE CODE MOVEDOWN R4 CLR, \ for y=0 DECIMAL R8 175 byte LI, \ to 175 BEGIN, HEX R0 4300 >< LI, R0 VDPWA, R4 VDPWD @@ MOVB, R13 R4 ADD, \ next y R4 R8 CMP, EQ UNTIL, NEXT, ENDCODE CODE MOVELEFT DECIMAL R3 239 byte LI, \ for x=239 to 0 HEX R13 -1 byte LI, \ step -1 BEGIN, R0 4301 >< LI, R0 VDPWA, R3 VDPWD @@ MOVB, R13 R3 ADD, \ next x EQ UNTIL, NEXT, ENDCODE CODE MOVEUP DECIMAL R4 175 byte LI, \ * for y=175 to 0 R13 -1 byte LI, \ step -1 BEGIN, HEX R0 4300 >< LI, R0 VDPWA, R4 VDPWD @@ MOVB, R13 R4 ADD, \ * next y EQ UNTIL, NEXT, ENDCODE DECIMAL : START CLEAR \ Ascii color y x spr# [CHAR] * 2 1 1 0 SPRITE 1 MAGNIFY 100 0 DO MOVERIGHT MOVEDOWN MOVELEFT MOVEUP LOOP ; Tursi's Benchmark.mp4 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted April 26, 2019 Share Posted April 26, 2019 By the way, for reference, when I did a straight translation to Forth Assembler of Tursi's original code it was about 6.2 seconds on my Classic99/PC combo. With the loop limit in a register it came down to 5.91 The setup Forth code takes about 22mS measured with the 9901 timer so it's insignificant. I never saw anything close to the published 5 seconds. Is this a hardware issue on my end? Quote Link to comment Share on other sites More sharing options...
+TheBF Posted May 3, 2019 Share Posted May 3, 2019 (edited) Since I found a way to take some instructions out of CAMEL99 Forth direct threaded version I re-did some times on the Sieve of Eratosthenes. Each Forth system ran the same code in the spoiler. Again we can see TurboForth's strategy of installing a maximum number of Forth routines in 16bit RAM as the one to beat. But using the BL instruction to assist in making direct threaded code shaved the time down from 7.25 to 7.1 seconds.And the newest version of Camel Forth, where more of the primitive routines are written in Assembler improves the time versus regular Camel Forth by 20%. I believe the FBFORTH performance is being held back by the way FIG Forth implements calling Forth's "inner interpreters" per this discussion: http://http://www.bradrodriguez.com/papers/moving3.htm Two approaches have been used in the past:1. The fig-Forth solution. Fig-Forth reserved the first cell ofthe Parameter Field to hold the address of the high-level code.The DODOES routine then obtained the Parameter Field address,pushed the address of the actual data (typically PFA+2) ontothe stack, fetched the address of the high-level routine,and EXECUTEd. Manual timings 1000 primes X BASIC 14.3 secs TI BASIC 11.9 10000 Primes ---------------------- TurboForth 8.0 CAMEL99 10.28 CAMEL99G 8.3 CAMEL99 DTC 7.1 (new version) FBFORTH 15.0 \ Sieve of Erathosenes in Forth \ Tested with TurboForth FBFORTH and CAMEL99 Forth \ array calculators : []@ ( n addr -- c ) + C@ ; : []! ( n ndx addr -- ) + C! ; : ERASE ( addr n -- ) 0 FILL ; HEX : FREEMEM ( -- n) FF00 HERE - ; : ?MEM ( n -- ) FREEMEM OVER < IF ." Out of memory" ABORT THEN ; : SEEPRIMES ( -- ) CR ." Primes: " 2 DO I HERE []@ 0= IF I . THEN ?TERMINAL IF ." Primes halted" ABORT THEN LOOP ; \ byte array uses un-allocated memory at HERE DECIMAL : PRIMES ( n -- ) ?MEM CR ." Running..." HERE OVER ERASE 1 0 HERE []! \ mark as prime like 'C' version 1 1 HERE []! 2 \ start at 2 BEGIN OVER OVER DUP * > WHILE DUP HERE []@ 0= IF OVER OVER DUP * DO 1 I HERE []! DUP +LOOP THEN 1+ REPEAT CR ." Complete." CR DROP CR ." Press ENTER to see primes:" KEY 13 = IF SEEPRIMES THEN ; Edited May 3, 2019 by TheBF 3 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted May 3, 2019 Share Posted May 3, 2019 The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4: HEX 0 83C4 ! ...lee 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted May 3, 2019 Share Posted May 3, 2019 I will do that and edit the result Lee when I get back to my desk. Quote Link to comment Share on other sites More sharing options...
Willsy Posted May 3, 2019 Share Posted May 3, 2019 Nice! Quote Link to comment Share on other sites More sharing options...
RXB Posted May 3, 2019 Share Posted May 3, 2019 Youtube deleted most of my videos off my channel and no reasons why or what the problem is.....? Youtube management are IDIOTS at best and the software they use to find Porn or Hate Speech is pure crap. Example is Youtube deleted a PBS channel as Porn and it was Sesame Street? Could Youtube write a worse junk program? Quote Link to comment Share on other sites More sharing options...
+TheBF Posted May 4, 2019 Share Posted May 4, 2019 The fbForth time will likely improve if you first remove the hook to the fbForth ISR by storing 0 at >83C4: HEX 0 83C4 ! ...lee Lee, I tried this twice just now. After FBForth started I got 15 seconds. Then again after 0 83C4 ! I got 14.86. Your interrupts are not taking very much time away from the CPU when they are idling it seems. When I look at your inner interpreter I cannot see where the overhead is coming from? 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted December 4, 2020 Share Posted December 4, 2020 One More Time! I have been working on this Machine Forth concept that uses Forth like language to generate native inline code and/or sub-routines. I had to put this new compiler on the board. VDPLIB is Forth assembler but the VDP byte write is not fully optimized and uses an internal BL as well. I don't think I can get it down to 5 seconds but given that the compiler is only 274 lines of Forth + 170 lines for the cross-assembler vs GCC with ???? Megabytes it's not to bad. I will try changing the VDP byte write routine to a fully INLINE version and see what happens. Language First Pass Optimized GCC 15 sec 5 sec Assembly 17 sec 5 sec Machine Forth 17 sec TBD TurboForth 48 sec 29 sec Compiled XB 51 sec none yet FbForth 70 sec 26 sec GPL 80 sec none yet ABASIC 490 sec none yet XB 2000 sec none yet UCSD Pascal 7300 sec 780 sec \ Tursi's Sprite demo for Machine Forth B Fox Dec 3 2020 INCLUDE DSK1.ELAPSE NEEDS MFORTH FROM DSK2.MFORTH COMPILER: HEX NEW. 2000 ORIGIN. INCLUDE DSK2.VDPLIB INCLUDE DSK2.SCREENIO INCLUDE DSK2.SPRITES HOST: 300 CONSTANT $300 301 CONSTANT $301 [CC] DECIMAL TARGET: PROG: SPRITEDEMO GRAPHICS DELALL 1 # MAGNIFY ( char colr x y ) 42 # 1 # 0 # 0 # SPRITE0 100 # FOR 0 # 240 # FOR DUP $301 # VC! 1+ NEXT DROP 0 # 176 # FOR DUP $300 # VC! 1+ NEXT DROP 239 # FOR I $301 # VC! NEXT 175 # FOR I $300 # VC! NEXT NEXT TEXT NEXT, END. Mforth sprites.mp4 3 Quote Link to comment Share on other sites More sharing options...
GDMike Posted December 4, 2020 Share Posted December 4, 2020 I think I've got one of them asterisks on my floor.... But good numbers ? to see Quote Link to comment Share on other sites More sharing options...
+TheBF Posted December 4, 2020 Share Posted December 4, 2020 1 hour ago, GDMike said: I think I've got one of them asterisks on my floor.... But good numbers ? to see I don't quite understand. You mean sprite is moving around on your floor? Do you a have prescription for that stuff yer takin' ? 1 2 Quote Link to comment Share on other sites More sharing options...
GDMike Posted December 4, 2020 Share Posted December 4, 2020 No, I'm like, gees I wasn't expecting all those sprite's all of a sudden! On a big screen TV it's like jumping at ya.. funny 1 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted December 4, 2020 Share Posted December 4, 2020 On 5/4/2019 at 3:03 PM, TheBF said: Lee, I tried this twice just now. After fbForth started I got 15 seconds. Then again after 0 83C4 ! I got 14.86. Your interrupts are not taking very much time away from the CPU when they are idling it seems. When I look at your inner interpreter I cannot see where the overhead is coming from? From the fbForth 2.0 ISR in the spoiler below, Spoiler * _____ ____ __ __ ___________ * / _/ / / __/__ ____/ /_/ / / _/ __/ _ \ * / _/ _ \/ _// _ \/ __/ __/ _ \ _/ /_\ \/ , _/ * /_//_.__/_/ \___/_/ \__/_//_/ /___/___/_/|_| * ;[*** Interrupt Service ======================================================= * This routine is executed for every interrupt. It processes any pending * speech and souind. It then looks to see whether a user ISR is installed in * ISR. If so, it sets up NEXT for execution of the user ISR. This will work * only if the user has installed an ISR using the following steps in the fol- * lowing order: * * (1) Write an ISR with entry point, say MYISR. * (2) Determine code field address of MYISR with this high-level Forth: * ' MYISR CFA * <<< Maybe need a word to do #3 >>> * (3) Write CFA of MYISR into user variable ISR. * * Steps (2)-(3) in high-level Forth are shown below: * ' MYISR CFA * ISR ! * * <<< Perhaps last step above should be by a word that disables interrupts >>> * * The console ISR branches to the contents of >83C4 because it is non-zero, * with the address, INT1, of the fbForth ISR entry point below (also, the * contents of INTLNK). This means that the console ISR will branch to INT1 * with BL *R12 from WP = GPLWS (>83E0), R12 containing INT1 below to first * process any pending speech and sound. * * If the user's ISR is properly installed, the code that processes the user * ISR modifies NEXT so that the very next time B *NEXT or B *R15 is executed * from Forth's workspace (MAINWS), the code at INT2 will process the user's * ISR just before branching to the normal NEXT entry ($NEXT) in fbForth's * inner interpreter. *** ========================================================================== * ¡¡¡ MUST REMEMBER THAT WE ARE IN GPL WORKSPACE UPON ENTRY. !!! INT1 LI R0,BRSTK load address of top of Branch Address Stack * * Set up for pending speech * MOV @SPCSVC,*R0 save Speech service address onto Branch Stack JEQ SNDCH1 jump to sound-check if no speech INCT R0 increment Branch Stack * * Set up for pending sound table #1 (ST#1) * SNDCH1 MOV @SND1ST,R2 sound table ST#1 to service? JEQ SNDCH2 process speech and sound if needed LI R1,PLAYT1 load PLAYT1 address and... MOV R1,*R0+ ...push it onto Branch Stack * * Set up for pending sound table #2 (ST#2) * SNDCH2 MOV @SND2ST,R3 sound table ST#2 to service? JEQ PRCSPS process speech and sound if needed LI R1,PLAYT2 load PLAYT2 address and... MOV R1,*R0+ ...push it onto Branch Stack * * Process sound stack if both sound tables idle * PRCSPS SOC R2,R3 OR R2 and R3..both sound tables idle? JNE PRSPS2 nope..skip sound stack processing LWPI SND1WS switch to ST#1 WS CI R4,SNDST0 anything on sound stack? JEQ PRSPS1 no..exit sound stack processing DECT R4 pop sound stack position MOV *R4,R2 get sound table address from sound stack INC R0 kick off sound processing of ST#1 (R0=1) PRSPS1 LWPI GPLWS switch back to GPL WS * * Check for any pending speech and sound * PRSPS2 CI R0,BRSTK any speech or sound to process? JEQ USRISR if not, jump to user ISR processing LI R1,BNKRST yup..load return address MOV R1,*R0 push return address onto Branch Stack * * Process pending speech and sound * MOV @MYBANK,@BANKSV save bank at interrupt CLR @>6002 switch to bank 2 for speech & sound services LI R7,BRSTK load top of Branch Stack MOV *R7+,R8 pop speech/sound ISR B *R8 service speech/sound * * Restore interrupted bank * BNKRST ; return point for speech and sound ISRs MOV @BANKSV,R0 restore bank at interrupt SRL R0,13 get the bank# to correct position AI R0,>6000 make it a real bank-switch address CLR *R0 switch to the bank at interrupt * * Process User ISR if defined * USRISR MOV @$ISR+$UVAR,R0 User ISR installed? JEQ INTEX * * Fix NEXT so that the user's ISR is processed the next time B *NEXT (B *R15) * is executed from Forth's WS (MAINWS = >8300), which it does at the end of * every CODE word, keyboard scan and one or two other places. * LI R1,INT2 Load entry point, INT2 MOV R1,@2*NEXT+MAINWS Copy it to Forth's NEXT (R15) * * The following 2 instructions are copies of the remainder of the console ROM's * ISR (except that 'CLR R8' was removed because it is only needed by TI Basic) * because we're not going back there! * INTEX LWPI >83C0 Change to console's ISR WS RTWP Return to caller of console ISR * * Branch through above-modified NEXT (R15) gets us here. NEXT will be restored * before executing user's ISR. INT3 (cleanup routine below) will be inserted * in address list to get us back here for cleanup after user's ISR has finished. * User's ISR is executed at the end of this section just before INT3. * INT2 LIMI 0 Disable interrupts MOVB @>83D4,R0 Get copy of VR01 SRL R0,8 ...to LSB ORI R0,>100 Set up for VR01 ANDI R0,>FFDF Clear VDP-interrupt-enable bit BLWP @VWTR Turn off VDP interrupt LI NEXT,$NEXT Restore NEXT SETO @INTACT Set Forth "pending interrupt" flag DECT R Set up return linkage by pushing MOV IP,*R ...IP (R13, next Forth CFA) to return stack and LI IP,INT3 ...setting IP to INT3 (below) for cleanup MOV @$ISR(U),W Do the user's Forth ISR by executing B @DOEXEC ...it through Forth's inner interpreter * * Clean up and re-enable interrupts. * INT3 DATA INT3+2 $NEXT (or $SEMIS) puts INT3+2 in W (R10) DATA INT3+4 DOEXEC (or $SEMIS) will branch to *W = INT3+4 (next instr) MOV *R+,IP Start cleanup: pop IP from before call to user's ISR CLR @INTACT Clear Forth "pending interrupt" flag MOVB @>83D4,R0 Prepare to restore VR01 by... SRL R0,8 ...moving payload to LSB (enabling VDP interrupt) and AI R0,>100 ...VR # (01) to MSB MOVB @VDPSTA,R1 Remove pending VDP interrupt by reading VDP status BLWP @VWTR Write VR01 LIMI 2 Re-enable interrupts B *NEXT Continue normal task ;]* you can see that, with no speech, sound or user ISR to service, only the following code executes at every interrupt—and, as you probably know, fbForth’s normal state has interrupts on: INT1 LI R0,BRSTK load address of top of Branch Address Stack MOV @SPCSVC,*R0 save Speech service address onto Branch Stack JEQ SNDCH1 jump to sound-check if no speech * ... SNDCH1 MOV @SND1ST,R2 sound table ST#1 to service? JEQ SNDCH2 process speech and sound if needed * ... SNDCH2 MOV @SND2ST,R3 sound table ST#2 to service? JEQ PRCSPS process speech and sound if needed * ... PRCSPS SOC R2,R3 OR R2 and R3..both sound tables idle? JNE PRSPS2 nope..skip sound stack processing * ... PRSPS2 CI R0,BRSTK any speech or sound to process? JEQ USRISR if not, jump to user ISR processing * ... USRISR MOV @$ISR+$UVAR,R0 User ISR installed? JEQ INTEX * ... INTEX LWPI >83C0 Change to console's ISR WS RTWP Return to caller of console ISR ...lee 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted December 4, 2020 Share Posted December 4, 2020 3 minutes ago, Lee Stewart said: From the fbForth 2.0 ISR in the spoiler below, you can see that, with no speech, sound or user ISR to service, only the following code executes at every interrupt—and, as you probably know, fbForth’s normal state has interrupts on: INT1 LI R0,BRSTK load address of top of Branch Address Stack MOV @SPCSVC,*R0 save Speech service address onto Branch Stack JEQ SNDCH1 jump to sound-check if no speech * ... SNDCH1 MOV @SND1ST,R2 sound table ST#1 to service? JEQ SNDCH2 process speech and sound if needed * ... SNDCH2 MOV @SND2ST,R3 sound table ST#2 to service? JEQ PRCSPS process speech and sound if needed * ... PRCSPS SOC R2,R3 OR R2 and R3..both sound tables idle? JNE PRSPS2 nope..skip sound stack processing * ... PRSPS2 CI R0,BRSTK any speech or sound to process? JEQ USRISR if not, jump to user ISR processing * ... USRISR MOV @$ISR+$UVAR,R0 User ISR installed? JEQ INTEX * ... INTEX LWPI >83C0 Change to console's ISR WS RTWP Return to caller of console ISR ...lee Wow. FbForth is like a duck. There is a LOT of activity beneath the water. I had know idea. 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted December 4, 2020 Share Posted December 4, 2020 Well I can't match GCC but it's close. What was changed: Wrote a macro to inline the VDP byte write. Used Forth to add the write bit to the VDP address and swap the bytes at compile time for the 1st byte write Changed the compiler's FOR/NEXT loop to use R15 for counting. Previously it was in memory on the return stack in 8 bit space. Added a LIMI instruction to the compiler so I don't have to deal interrupts in the VSBW Added a Register load command to the compiler. Used it to pre-load R3 with the VDPWA port before the loop. There are some clear optimizations I can make in future where Forth is refilling the top of stack cache register (DROP) and the following it up right after with a push the TOS register onto the stack. (DUP) ( push/pop optimization) I have the beginnings of that working but some conditions break it so it's not reliable. Language First Pass Optimized GCC 15 sec 5 sec Assembly 17 sec 5 sec Machine Forth 17 sec 7 sec TurboForth 48 sec 29 sec Compiled XB 51 sec none yet FbForth 70 sec 26 sec GPL 80 sec none yet ABASIC 490 sec none yet XB 2000 sec none yet UCSD Pascal 7300 sec 780 sec Spoiler \ Tursi's Sprite demo for Machine Forth Optimized B Fox Dec 4 2020 NEEDS MFORTH FROM DSK2.MFORTH COMPILER: HEX NEW. 2000 ORIGIN. INCLUDE DSK2.VDPLIB INCLUDE DSK2.SCREENIO INCLUDE DSK2.SPRITES COMPILER: \ extend the compiler commands HEX H: VSBW ( n --) \ MACRO, compiles inline code \ Trick: Use Forth to OR the write bit \ and swap bytes the first time at compile time :-) R0 SWAP 4000 OR >< LI, R0 R3 ** MOVB, R0 SWPB, R0 R3 ** MOVB, TOS SWPB, TOS VDPWD @@ MOVB, TOS POP, ;H \ macros that use a macro H: SP0.X! ( n --) 301 VSBW ;H H: SP0.Y! ( n --) 300 VSBW ;H : LIMI ( n -- ) LIMI, ; : REG! ( n -- ) TOS SWAP MOV, TOS POP, ; DECIMAL TARGET: PROG: SPRITEDEMO GRAPHICS DELALL 1 # MAGNIFY 42 # 1 # 0 # 0 # SPRITE0 0 LIMI VDPWA # 3 REG! 100 # FOR 0 # 240 # FOR DUP SP0.X! 1+ NEXT DROP 0 # 176 # FOR DUP SP0.Y! 1+ NEXT DROP 239 # FOR I SP0.X! NEXT 175 # FOR I SP0.Y! NEXT NEXT TEXT NEXT, END. HOST: 1 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.