senior_falcon Posted May 12, 2021

1 hour ago, RXB said: Good idea though the registers are in Scratch Pad RAM and putting the program into Scratch Pad RAM would benefit some speed.

Don't overlook the cost of moving the 30 bytes. I assume you are doing it with GPL, and that would take hundreds of instructions, maybe as many as a thousand, to get it moved over. Plus you'll get back a little room in the GROM for other things.
RXB Posted May 12, 2021

15 minutes ago, senior_falcon said: Don't overlook the cost of moving the 30 bytes. I assume you are doing it with GPL, and that would take hundreds of instructions, and maybe as much as a thousand, to get it moved over. Plus you'll get back a little room in the grom for other things.

LOL, this is a demo of RXB using

CALL MOVES("RV",2079,8192,0)

That is 2079 bytes moved with the GPL MOVE command, from address 8192 to address 0, which is the VDP screen. If you go to 7 minutes in the video you see SAMS and MOVES used together; tell me that is slow. It is loading an entire set of graphics into VDP memory (screen, color, sprites and characters), all in two commands.

RXBBLOADBSAVEAMS.avi - YouTube
senior_falcon Posted May 12, 2021

Glad I could be amusing.
+Ksarul Posted May 12, 2021

48 minutes ago, RXB said: LOL this is a demo of RXB using CALL MOVES("RV",2079,8192,0) ! 2079 bytes to move using GPL MOVE command and from 8192 and 0 is VDP screen. If you go to 7 minutes on video you see SAMS and MOVES used together and tell me that is slow. It is loading entire graphics into VDP memory, screen, color, sprites and characters all in two commands. RXBBLOADBSAVEAMS.avi - YouTube

I realize it is a fast thing in RXB, but that call does have a bit of underlying GPL overhead, doesn't it? I think @senior_falcon may have been referring to the entire low-level GPL sequence that sets up a move, not just the move itself (which would be fast once set up, as you have demonstrated).

The question comes down to: how many machine cycles does it take to set up and move 30 bytes vs. running those same 30 bytes from ROM3? You're not moving the registers, so then the question becomes: how many cycles am I saving by running the 30 byte program in Scratch Pad vs. running it out of ROM? If you are using the routine a lot, the move may definitely squeeze some additional performance out of the system, but if not, the cost of making the move may be higher than the number of machine cycles you save.

Each of you is looking at the problem from a different perspective, and that difference may help find an elegant solution better than either one of you would come up with on your own.
Asmusr Posted May 12, 2021

My quick analysis, probably all wrong.

If you're copying 96 patterns, that's 768 bytes. So if moving each byte takes 3 assembly instructions, you could save 3 * 4 cycles/memory access * 768 memory accesses = 9,216 clock cycles by running the code from scratch pad.

This means that moving each of the 30 bytes to scratch pad must take less than 9,216 cycles / 30 = 307 cycles to get any benefit (assuming you have to do this every time). If you do it in assembly then you could get a small benefit, but from GPL probably not.

The maximum possible benefit is 9,216 cycles, which is less than 1/360 second.
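The arithmetic in this post can be restated in a few lines of Python. This is only a sketch of the figures quoted above (96 patterns, 3 instructions per byte, 4 cycles saved per memory access, 30 bytes of code). The 3 MHz clock used to turn cycles into time is an assumption about the console and is not stated in the thread.

```python
# Figures from the post above; CLOCK_HZ is an assumption, not from the thread.
PATTERNS = 96
BYTES_PER_PATTERN = 8
INSTR_PER_BYTE = 3
CYCLES_SAVED_PER_ACCESS = 4
CODE_BYTES = 30
CLOCK_HZ = 3_000_000  # assumed CPU clock

bytes_moved = PATTERNS * BYTES_PER_PATTERN
cycles_saved = INSTR_PER_BYTE * CYCLES_SAVED_PER_ACCESS * bytes_moved

# Moving the routine only pays off if each of its bytes costs less than this.
break_even_cycles_per_byte = cycles_saved / CODE_BYTES

# Upper bound on the time saved per call, under the assumed clock.
seconds_saved = cycles_saved / CLOCK_HZ

print(bytes_moved, cycles_saved, int(break_even_cycles_per_byte))
print(f"{seconds_saved * 1000:.2f} ms")
```

Under that clock assumption the upper bound works out to roughly 3 ms per call, the same order of magnitude as the figure quoted in the post.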
RXB Posted May 12, 2021

3 hours ago, senior_falcon said: Glad I could be amusing.

Top left is RXB 2020, top right is RXB 2021 and bottom is XB. So changing the old XB / RXB 2020 routines from GROM to ROM and using a different MOVE routine, I get this:
RXB Posted May 12, 2021

2 hours ago, Asmusr said: My quick analysis - probably all wrong. If you're copying 96 patterns, that's 768 bytes. So if moving each byte takes 3 assembly instructions, you could save 3 * 4 cycles/memory access * 768 memory accesses = 9,216 clock cycles by running the code from scratch pad. This means that moving each of the 30 bytes to scratch pad must take less than 9,216 cycles / 30 = 307 cycles to get any benefit (assuming you have to do this every time). If you do it in assembly then you could get a small benefit, but from GPL probably not. The maximum possible benefit is 9,216 cycles, which is less than 1/360 second.

Oddly, you have to use GPL to move the source, destination and number of bytes into the assembly routine and then call that routine, so even using the ROMs built into XB is only faster for larger chunks; small chunks are actually slower than the GPL MOVE routine. After all, with GPL MOVE there is no need to copy the values to where assembly needs them. All the variables are predefined in XB using GPL, not in assembly.
RXB Posted May 12, 2021

I went shopping, and when I came back XB and RXB 2020 were finally done, with about the same time; RXB 2021 is almost twice as fast.

Slowest to fastest: RXB 2020 at 46 minutes 33 seconds, XB at 46 minutes 11 seconds, and RXB 2021 at 24 minutes 11 seconds.
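For reference, the three timings can be turned into speed-up ratios with a short Python sketch. It uses only the times quoted in the post and says nothing about what the benchmark program itself does.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds timing to seconds."""
    return minutes * 60 + seconds

# Timings exactly as quoted in the post above.
timings = {
    "RXB 2020": to_seconds(46, 33),
    "XB": to_seconds(46, 11),
    "RXB 2021": to_seconds(24, 11),
}

fastest = timings["RXB 2021"]
# How many times longer each version takes than RXB 2021.
speedup = {name: t / fastest for name, t in timings.items()}

for name, ratio in speedup.items():
    print(f"{name}: {timings[name]} s, {ratio:.2f}x the RXB 2021 time")
```

Both older versions take a little over 1.9 times as long as RXB 2021, which matches the "almost twice as fast" description.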
senior_falcon Posted May 13, 2021 (Edited May 14, 2021 by senior_falcon)

The GPL line of code in the console GROM at >125A is:

MOVE >0040 TO >8300 FROM GROM @>1263

I assume that you are moving the 30 bytes of code to the scratchpad using an instruction like this, but of course only moving >1E bytes.

This takes about 356 assembly language instructions to move >1E bytes, and this doesn't include the return, which would be a few more instructions. About 56 instructions are needed to set it up, and then, once in the actual loop, it takes 10 instructions to move each byte. So clearly the overhead involved in moving those 30 bytes is greater than any time savings you would get from running them without wait states from scratchpad memory.

(Edit 5/14) I am no longer convinced that it will be slower. In the GVZ1 loop, it takes 2304 assembly instructions to move 768 bytes from GROM to VDP. The speed increase from running that on the 16 bit bus might outweigh the speed penalty incurred in moving the program to the scratchpad. I will leave it to someone more knowledgeable than I am to determine that. It will be pretty close either way, but for the reasons stated in the two posts below, it seems better to leave the program in cartridge ROM and run it from there.
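The instruction counts in this post follow a simple "setup plus per-byte" model, sketched below in Python. The function names are hypothetical, and the final comparison assumes every instruction costs about the same, which is a simplification made only for illustration.

```python
SETUP_INSTR = 56          # instructions to set up the GPL MOVE (from the post)
MOVE_INSTR_PER_BYTE = 10  # instructions per byte inside the MOVE loop (from the post)
LOOP_INSTR_PER_BYTE = 3   # GVZ1 loop: 2304 instructions / 768 bytes

def gpl_move_instructions(n_bytes):
    """Instructions spent copying n_bytes of code to scratchpad with GPL MOVE."""
    return SETUP_INSTR + MOVE_INSTR_PER_BYTE * n_bytes

def loop_instructions(n_bytes):
    """Instructions executed by the copy loop itself for n_bytes of data."""
    return LOOP_INSTR_PER_BYTE * n_bytes

move_cost = gpl_move_instructions(0x1E)
loop_cost = loop_instructions(768)

# Simplifying assumption: all instructions cost the same. Under it, the loop
# must run at least this fraction faster in scratchpad to repay the move.
break_even_fraction = move_cost / loop_cost

print(move_cost, loop_cost, round(break_even_fraction, 3))
```

With these numbers the loop would need to gain roughly 15% from running in scratchpad to break even, which is consistent with the post's conclusion that the result will be close either way.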
senior_falcon Posted May 13, 2021

There is one possible advantage in moving short assembly routines to the scratchpad as Rich has done. Only 48 XML vectors are available in cartridge ROM, with 16 each located at >6010, >6030, and >7000. If you had a need for more than 48, one possibility would be to move the short routine to >8300 and use XML >F0 to run it. Or you could move B @ADDRESS to >8300 and then XML >F0, which goes to >8300 and then branches to the routine in cartridge ROM.

One concern I would have about moving 30 bytes of code to scratchpad is that you are getting into the XB permanent storage area starting at >8318, which could cause problems later on in the program.
RXB Posted May 13, 2021

4 hours ago, senior_falcon said: There is one possible advantage in moving short assembly routines to the scratchpad as Rich has done. Only 48 XML vectors are available in cartridge rom, with 16 each located at >6010, >6030, and >7000. If you had a need for more than 48, one possibility would be to move the short routine to >8300 and use XML >F0 to run it. Or you could move B @ADDRESS to >8300 and then XML >F0 which goes to >8300, then branches to the routine in cartridge ROM. One concern I would have about moving 30 bytes of code to scratchpad is that you are getting into the XB permanent storage area starting at >8318 which could cause problems later on in the program.

Smart guy, yes. The 24 bytes from >8300 to >8317 are only temporary, and above that are things like the string space start and end. So I can get away with 24 bytes in most cases, but at 32 I have to stash some of those values in another place or restore them afterwards, as they all have backups. The same problem applies to using the FAC and ARG area, as it is less than 32 bytes. So running from ROM is sometimes the only option.
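The sizes being discussed can be checked with a small Python sketch. The addresses come from the two posts above (temporaries at >8300 to >8317, permanent XB storage from >8318). The helper names are hypothetical.

```python
TEMP_START = 0x8300       # start of the temporary area (from the thread)
TEMP_END = 0x8317         # last temporary byte (from the thread)
PERMANENT_START = 0x8318  # XB permanent storage begins here (from the thread)

def region_size(start, end):
    """Number of bytes in the inclusive address range start..end."""
    return end - start + 1

def overlap_bytes(load_addr, length, protected_start):
    """How many bytes of a routine loaded at load_addr reach into the protected area."""
    last = load_addr + length - 1
    return max(0, last - protected_start + 1)

print(region_size(TEMP_START, TEMP_END))                # size of the temporary area
print(overlap_bytes(TEMP_START, 30, PERMANENT_START))   # 30-byte routine
print(overlap_bytes(TEMP_START, 24, PERMANENT_START))   # 24-byte routine
```

A 30-byte routine loaded at >8300 therefore overwrites the first 6 bytes of the permanent area, while a 24-byte routine fits exactly.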
RXB Posted June 7, 2021

Well, now it is apparent that I need the lower 4K XB ROM in order to add assembly routines to XB. It turns out the routines I am creating to add to XB need the GVWITES, VGWITES, MVDN and MVUP routines in the lower 4K of XB ROM. So that would be 3 banks with the lower 4K bank of XB ROMs. Meanwhile I can use the upper 4K for the additional assembly routines.

Example of commands I am adding:

********************************************************************************
* direction  = U (up), D (Down), L (Left), R (Right)
* repetition = repeat number times
* string     = string or string variable to display just like PRINT
********************************************************************************
* CALL CLEAR(direction,...)
* Assembly ROM CALL CLEAR but all 4 direction clears screen
********************************************************************************
* CALL ROLL(direction,...)
* CALL ROLL(direction,repetition,...)
* Assembly roll the screen like a drum with number of times also
********************************************************************************
* CALL SCROLL(direction,...)
* CALL SCROLL(direction,repetition,...)
* CALL SCROLL(direction,repetition,string...)
* Assembly scroll screen like PRINT but all 4 directions
* Also adds in number of times along with the last line displayed

I will also replace with assembly: CHARPAT, COINC, COLOR, DISTANCE, GCHAR, GMOTION, HPUT, VPUT, HGET, VGET, HCHAR, VCHAR, LOCATE, MOTION, PATTERN, POSITION, RMOTION, SPRITE

Some of these will be markedly faster; others may be about the same speed. (I am excited about the future here....)
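To make the intended behaviour of ROLL concrete, here is a small Python model of the four directions. It assumes "roll like a drum" means that whatever leaves one edge re-enters at the opposite edge; that reading, and the function name `roll`, are assumptions for illustration and not RXB's actual implementation.

```python
def roll(screen, direction):
    """Return a new screen (list of equal-length row strings) rolled one step.

    Assumed semantics: content leaving one edge wraps to the opposite edge.
    """
    if direction == "U":
        return screen[1:] + screen[:1]
    if direction == "D":
        return screen[-1:] + screen[:-1]
    if direction == "L":
        return [row[1:] + row[:1] for row in screen]
    if direction == "R":
        return [row[-1:] + row[:-1] for row in screen]
    raise ValueError("direction must be U, D, L or R")

demo = ["ABC", "DEF", "GHI"]
for d in "UDLR":
    print(d, roll(demo, d))
```

Under this model, rolling in one direction and then the opposite direction restores the original screen, which is a handy check for any assembly version.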
RXB Posted June 9, 2021

Well, anyone with assembly talent, take a look at this source and see what you think.

First, this is the section of XB ROM that I am using to make the new commands:

************************************************************
7F7E              AORG >7F7E

* (VDP to VDP) or (RAM to RAM)
* WITHOUT ERAM : Move the contents in VDP RAM from a lower
*                address to a higher address avoiding a
*                possible over-write of data
*                >835C ARG   : byte count
*                >8300 VAR0  : source address
*                >8306 VARY2 : destination address
* WITH ERAM      Same as above except moves ERAM to ERAM

7F7E C060  MVDN   MOV  @ARG,R1        Get byte count
7F80 835C
7F82 C160         MOV  @VARY2,R5      Get destination
7F84 8306
7F86 C0E0         MOV  @VAR0,R3       Get source
7F88 8300
7F8A C1E0  MVDN2  MOV  @RAMTOP,R7     ERAM or VDP?
7F8C 8384
7F8E 1612         JNE  MV01           ERAM, so handle it
7F90 1002         JMP  MV05           VDP, so jump into loop
7F92 0605  MVDN1  DEC  R5
7F94 0603         DEC  R3
7F96       MV05   EQU  $
7F96 D7E0         MOVB @R3LB,*R15     Write out read address
7F98 83E7
7F9A D7C3         MOVB R3,*R15
7F9C D1E0         MOVB @XVDPRD,R7     Read a byte
7F9E 8800
7FA0 D7E0         MOVB @R5LB,*R15     Write out write address
7FA2 83EB
7FA4 0265         ORI  R5,WRVDP       Enable VDP write
7FA6 4000
7FA8 D7C5         MOVB R5,*R15
7FAA D807         MOVB R7,@XVDPWD     Write the byte
7FAC 8C00
7FAE 0601         DEC  R1             One less byte to move
7FB0 16F0         JNE  MVDN1          Loop if more to move
7FB2 045B         RT
7FB4       MV01   EQU  $
7FB4 D553  MVDNZ1 MOVB *R3,*R5        Move a byte
7FB6 0603         DEC  R3             Decrement source
7FB8 0605         DEC  R5             Decrement destination
7FBA 0601         DEC  R1             One less byte to move
7FBC 16FB         JNE  MVDNZ1         Loop if more to move
7FBE 045B         RT
************************************************************

And here are the new commands. GPL will do the conversion from the command, fetching the values used and passing them to assembly.

****************************************************************
* CALL CLEAR(direction,...)
* Assembly ROM CALL CLEAR but all 4 direction clears screen
****************************************************************
* CALL ROLL(direction,...)
* CALL ROLL(direction,repetition,...)
* Assembly roll the screen like a drum with number of times also
****************************************************************
* XML MVDN (MV05 for VDP) (VDP to VDP)                         *
****************************************************************
* CALL RROLL(direction,repetition,...)                         *
****************************************************************
RROLL  LI   R2,24          * ROW counter
       LI   R3,31          * 31 right edge Screen Address
       LI   R5,>03C0       * Buffer for 24 characters
RRL1   LI   R1,1           * 1 Byte
       BL   @MV05          * Byte from screen to buffer
       INC  R5             * Buffer+1
       AI   R3,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL1           * 0? No loop
* Buffer has 24 far right characters
       LI   R2,24          * ROW counter
       CLR  R3             * Screen address
       LI   R5,1           * Destination
RRL2   LI   R1,31          * Number of Bytes
       BL   @MV05          * Move screen line over
       AI   R3,32          * VDP address+32
       AI   R5,32          * VDP destination address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL2           * 0? No loop
* Moved all on screen 1 right
       LI   R2,24          * ROW counter
       LI   R3,>03C0       * Buffer for 1 character
       LI   R5,31          * 31 right edge Screen Address
RRL3   LI   R1,1           * Get 1 Byte
       BL   @MV05          * Byte from buffer to screen
       INC  R3             * Buffer+1
       AI   R5,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL3           * 0? No loop
       RT
****************************************************************
* CALL LROLL(direction,repetition,...)                         *
****************************************************************
LROLL  LI   R2,24          * ROW counter
       CLR  R3             * 0 left edge Screen Address
       LI   R5,>03C0       * Buffer for 24 characters
LRL1   LI   R1,1           * 1 Byte
       BL   @MV05          * Byte from screen to buffer
       INC  R5             * Buffer+1
       AI   R3,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  LRL1           * 0? No loop
* Buffer has 24 far left characters
       LI   R2,24          * ROW counter
       LI   R3,1           * Screen address
       CLR  R5             * Destination
LRL2   LI   R1,31          * Number of Bytes
       BL   @MV05          * Move screen line over
       AI   R3,32          * VDP address+32
       AI   R5,32          * VDP destination address+32
       DEC  R2             * ROW COUNTER-1
       JNE  LRL2           * 0? No loop
* Moved all on screen 1 left
       LI   R2,24          * ROW counter
       LI   R3,>03C0       * Buffer for 1 character
       CLR  R5             * 0 left edge Screen Address
RRL3   LI   R1,1           * Get 1 Byte
       BL   @MV05          * Byte from buffer to screen
       INC  R3             * Buffer+1
       AI   R5,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL3           * 0? No loop
       RT
****************************************************************
* CALL UROLL(direction,repetition,...)                         *
****************************************************************
UROLL  CLR  R3             * 0 top Screen Address
       LI   R5,>03C0       * Buffer for 32 characters
       LI   R1,32          * 32 bytes length
       BL   @MV05          * Bytes from screen to buffer
* Buffer has 32 top line characters
       LI   R3,32          * 2nd line Screen address
       CLR  R5             * 0 screen Destination
       LI   R1,736         * Number of Bytes
       BL   @MV05          * Move screen line 1 up
* Moved all on screen up 1
       LI   R3,>03C0       * Buffer for 32 characters
       LI   R5,736         * Bottom left edge Screen Address
       LI   R1,32          * Get 32 Bytes length
       BL   @MV05          * Byte from buffer to screen
       RT
****************************************************************
* CALL DROLL(direction,repetition,...)                         *
****************************************************************
DROLL  LI   R3,736         * Bottom of Screen Address
       LI   R5,>03C0       * Buffer for 32 characters
       LI   R1,32          * 32 bytes length
       BL   @MV05          * Bytes from screen to buffer
* Buffer has 32 top line characters
       LI   R3,32          * 2nd line Screen address
       CLR  R5             * 0 screen Destination
       LI   R1,736         * Number of Bytes
       BL   @MV05          * Move screen line 1 down
* Moved all on screen up 1
       LI   R3,>03C0       * Buffer for 32 characters
       CLR  R5             * Top left edge Screen Address
       LI   R1,32          * Get 32 Bytes
       BL   @MV05          * Byte from buffer to screen
       RT
****************************************************************
       END

If you have any suggestions, please pass them along to me.
+Lee Stewart Posted June 9, 2021

1 hour ago, RXB said: First this is section of XB ROM that I am using to make the new commands:

A comment and a question about the XB snippet.

First, the comment: unless it is needed as a time delay to avoid VDP overrun, and/or the MV05 loop is BLed from somewhere else in XB, the ORI instruction can be hoisted from the loop, as indicated in the listing below, to save a little time.

************************************************************
       AORG >7F7E

* (VDP to VDP) or (RAM to RAM)
* WITHOUT ERAM : Move the contents in VDP RAM from a lower
*                address to a higher address avoiding a
*                possible over-write of data
*                >835C ARG   : byte count
*                >8300 VAR0  : source address
*                >8306 VARY2 : destination address
* WITH ERAM      Same as above except moves ERAM to ERAM

MVDN   MOV  @ARG,R1        Get byte count
       MOV  @VARY2,R5      Get destination
*                                              <<-------+
       MOV  @VAR0,R3       Get source                   |
MVDN2  MOV  @RAMTOP,R7     ERAM or VDP?                 |
       JNE  MV01           ERAM, so handle it           |
       JMP  MV05           VDP, so jump into loop       |
MVDN1  DEC  R5                                          |
       DEC  R3                                          |
MV05   EQU  $                                           |
       MOVB @R3LB,*R15     Write out read address       |
       MOVB R3,*R15                                     |
       MOVB @XVDPRD,R7     Read a byte                  |
       MOVB @R5LB,*R15     Write out write address      |
       ORI  R5,WRVDP       Enable VDP write    >>-------+
       MOVB R5,*R15
       MOVB R7,@XVDPWD     Write the byte
       DEC  R1             One less byte to move
       JNE  MVDN1          Loop if more to move
       RT
MV01   EQU  $
MVDNZ1 MOVB *R3,*R5        Move a byte
       DEC  R3             Decrement source
       DEC  R5             Decrement destination
       DEC  R1             One less byte to move
       JNE  MVDNZ1         Loop if more to move
       RT
************************************************************

Second, the question: does RXB require expansion RAM? If it does, you could dispense with the VRAM-to-VRAM-copy code to save ROM space.

...lee
RXB Posted June 9, 2021

15 minutes ago, Lee Stewart said: A comment and a question about the XB snippet: First, the comment: unless it is needed for time delay to avoid VDP overrun and/or the MV05 loop is BLed from somewhere else in XB, the ORI instruction can be hoisted from the loop, to save a little time. Second, the question: does RXB require expansion RAM? If it does, you could dispense with the VRAM-to-VRAM-copy code to save ROM space. ...lee

Thanks Lee, I can recreate the XB ROMs to include your mod as indicated!

RXB can run on the console alone, with expansion RAM, and/or with SAMS too. This particular set of commands is strictly for VDP: RightRoll, LeftRoll, UpRoll, DownRoll, and the same for SCROLL in all directions as the next addition.
+Lee Stewart Posted June 9, 2021

12 minutes ago, RXB said: Thanks Lee I can recreate XB ROMs to include your mod as indicated!

If you do make the ORI change, you will need to add that line before each of your MV05 calls (which I had not yet noticed when I made the suggestion), so maybe it is not worth it.

...lee
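The ORI under discussion sets the >4000 bit in the destination address before its high byte is sent to the VDP. The Python sketch below models that, following the byte order used in the listing (low byte first, then high byte). The function name is hypothetical, and the last check illustrates why hoisting the ORI is safe only in the sense that the bit survives the DEC of the address register. It does not model VDP timing.

```python
WRITE_FLAG = 0x4000  # value of WRVDP in the listing

def vdp_address_bytes(addr, write):
    """Return the two bytes sent to the VDP address port, in the order sent."""
    word = (addr & 0x3FFF) | (WRITE_FLAG if write else 0)
    return (word & 0xFF, word >> 8)  # low byte first, then high byte

print([hex(b) for b in vdp_address_bytes(0x03C0, write=False)])
print([hex(b) for b in vdp_address_bytes(0x03C0, write=True)])

# If the flag is ORed in once before the loop, decrementing the register
# keeps it set for every address from >3FFF down to >0001.
flag_survives_dec = all(
    ((a | WRITE_FLAG) - 1) == ((a - 1) | WRITE_FLAG) for a in range(1, 0x4000)
)
print(flag_survives_dec)
```

This also shows the catch Lee mentions: a caller that enters at MV05 directly would have to set the flag itself before the call.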
+Lee Stewart Posted June 9, 2021

2 hours ago, RXB said: And here are the new commands

At first blush:

You cannot use MV05 for overlapping copies to lower destination addresses because it will destroy the overlap region. This means you cannot use it for LROLL. UROLL is fine because there is no overlap, but there is a more efficient way that I will work on.

To use MV05 to copy more than one byte, you must pass the end source and destination addresses, not the beginning. You need to add one less than the byte count to each address.

You are using R3 and R5 without realizing that MV05 is corrupting them.

In RROLL, you are adding the saved column to the end (where you got it) rather than the beginning (where it belongs).

I will work on this some more, but first this question: What is the largest block of free RAM in scratchpad for RXB? Can we use the FAC – ARG area (>834A – >836D)? If so, we could buffer a row there so we could use VDP multibyte copies for UROLL and DROLL.

...lee
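The overlap problem Lee describes can be reproduced with a short Python model of a copy loop that, like MV05, starts at the end addresses and steps both pointers downward. The function name and the 8-byte "row" are illustrative only.

```python
def copy_descending(mem, src_end, dst_end, count):
    """Copy count bytes, starting at the END addresses and stepping downward."""
    for _ in range(count):
        mem[dst_end] = mem[src_end]
        src_end -= 1
        dst_end -= 1

row = bytearray(b"ABCDEFGH")

# Shift right by one (destination higher than source): the overlap is safe.
shift_right = bytearray(row)
copy_descending(shift_right, src_end=6, dst_end=7, count=7)

# Shift left by one (destination lower than source): each byte is overwritten
# before it is read, so the last byte smears across the whole row.
shift_left = bytearray(row)
copy_descending(shift_left, src_end=7, dst_end=6, count=7)

print(shift_right.decode(), shift_left.decode())
```

The right shift produces the expected result, while the left shift destroys the row, which is why a descending copy cannot be used as-is for LROLL.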
Asmusr Posted June 9, 2021

8 hours ago, Lee Stewart said: I will work on this some more, but first this question: What is the largest block of free RAM in scratchpad for RXB? Can we use the FAC – ARG area (>834A – >836D)? If so, we could buffer a row there so we could use VDP multibyte copies for UROLL and DROLL.

That would be so much faster. Why not also for RROLL and LROLL?
+Lee Stewart Posted June 9, 2021

34 minutes ago, Asmusr said: That would be so much faster. Why not also for RROLL and LROLL?

Indeed. I kind of thought of that after I posted (I think). I was a little punchy and needed to get back to bed.

...lee
RXB Posted June 9, 2021

4 hours ago, Lee Stewart said: Indeed. I kind of thought of that after I posted (I think). I was a little punchy and needed to get back to bed. ...lee

Sloppy, but this should work:

       AORG >7000
* Windy XB routines
****************************************************************
* direction  = U (up), D (Down), L (Left), R (Right)
* repetition = repeat number times
* string     = string or string variable to display just like PRINT
****************************************************************
* CALL CLEAR(direction,...)
* Assembly ROM CALL CLEAR but all 4 direction clears screen
****************************************************************
* CALL ROLL(direction,...)
* CALL ROLL(direction,repetition,...)
* Assembly roll the screen like a drum with number of times also
****************************************************************
* XML MVDN (MV05 for VDP) (VDP to VDP)                         *
****************************************************************
* CALL RROLL(direction,repetition,...)                         *
****************************************************************
RROLL  LI   R2,24          * ROW counter
       LI   R3,31          * 31 right edge Screen Address
       LI   R5,>03C0       * Buffer for 24 characters
       LI   R1,1           * 1 Byte
RRL1   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R5             * Buffer+1
       AI   R3,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL1           * 0? No loop
* Buffer has 24 far right characters
       LI   R2,24          * ROW counter
       CLR  R3             * Screen address
       LI   R5,1           * Destination
RRL2   LI   R1,31          * Number of Bytes
RRL2L  MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       AI   R3,32          * VDP address+32
       AI   R5,32          * VDP destination address+32
       INC  R1             * Column+1
       JNE  RRL2L          * 0? No loop
       DEC  R2             * ROW COUNTER-1
       JNE  RRL2           * 0? No loop
* Moved all on screen 1 right
       LI   R2,24          * ROW counter
       LI   R3,>03C0       * Buffer for 1 character
       LI   R5,1           * 1 left edge Screen Address
RRL3   LI   R1,1           * Get 1 Byte
       MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * Buffer+1
       AI   R5,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL3           * 0? No loop
       RT
****************************************************************
* CALL LROLL(direction,repetition,...)                         *
****************************************************************
LROLL  LI   R2,24          * ROW counter
       CLR  R3             * 0 left edge Screen Address
       LI   R5,>03C0       * Buffer for 24 characters
       LI   R1,1           * 1 Byte
LRL1   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R5             * Buffer+1
       AI   R3,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  LRL1           * 0? No loop
* Buffer has 24 far left characters
       LI   R2,24          * ROW counter
       LI   R3,1           * Screen address
       CLR  R5             * Destination
LRL2   LI   R1,31          * Number of Bytes
LRL2L  MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       AI   R3,32          * VDP address+32
       AI   R5,32          * VDP destination address+32
       INC  R1             * COLUMN COUNTER+1
       JNE  LRL2L          * 0? No loop
       DEC  R2             * ROW COUNTER-1
       JNE  LRL2           * 0? No loop
* Moved all on screen 1 left
       LI   R2,24          * ROW counter
       LI   R3,>03C0       * Buffer for 1 character
       LI   R5,31          * 31 right edge Screen Address
       LI   R1,1           * Get 1 Byte
LRL3   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * Buffer+1
       AI   R5,32          * Screen Address+32
       DEC  R2             * ROW COUNTER-1
       JNE  RRL3           * 0? No loop
       RT
****************************************************************
* CALL UROLL(direction,repetition,...)                         *
****************************************************************
UROLL  CLR  R3             * 0 top Screen Address
       LI   R5,>03C0       * Buffer for 32 characters
       LI   R1,32          * 32 bytes length
URL1   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * COLUMN+1
       INC  R4             * BUFFER+1
       DEC  R1             * COUNTER-1
       JNE  URL1           * 0? No loop
* Buffer has 32 top line characters
       LI   R3,32          * 2nd line Screen address
       CLR  R5             * 0 screen Destination
       LI   R1,736         * Number of Bytes
URL2   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * lower screen line
       INC  R5             * upper screen line
       DEC  R1             * COUNTER-1
       JNE  URL2           * 0? No loop
* Moved all on screen up 1
       LI   R3,>03C0       * Buffer for 32 characters
       LI   R5,736         * Bottom left edge Screen Address
       LI   R1,32          * Get 32 Bytes length
URL3   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * BUFFER+1
       INC  R5             * SCREEN+1
       DEC  R1             * COUNTER-1
       JNE  URL3           * 0? No loop
       RT
****************************************************************
* CALL DROLL(direction,repetition,...)                         *
****************************************************************
DROLL  LI   R3,736         * Bottom of Screen Address
       LI   R5,>03C0       * Buffer for 32 characters
       LI   R1,32          * 32 bytes length
DRL1   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * SCREEN+1
       INC  R5             * BUFFER+1
       DEC  R1             * COUNTER-1
       JNE  DRL1           * 0? No loop
* Buffer has 32 top line characters
       LI   R3,32          * 2nd line Screen address
       CLR  R5             * 0 screen Destination
       LI   R1,736         * Number of Bytes
DRL2   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       INC  R3             * lower screen address
       INC  R5             * upper screen address
       DEC  R1             * COUNTER-1
       JNE  DRL2           * 0? No loop
* Moved all on screen up 1
       LI   R3,>03C0       * Buffer for 32 characters
       CLR  R5             * Top left edge Screen Address
       LI   R1,32          * Get 32 Bytes
DRL3   MOVB @>83E7,*R15    * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7      * Read a byte
       MOVB @>83EB,*R15    * Write out write address
       ORI  R5,>4000       * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00      * Write the byte
       DEC  R3             * COUNTER-1
       JNE  DRL3           * 0? No loop
       RT
****************************************************************
       END
+Lee Stewart Posted June 10, 2021

Here is my first pass at your roll routines. The following listing has copy routines for VRAM to RAM and RAM to VRAM as well as the first roll routine, RROLL, that uses them:

FAC    EQU  >834A          * RAM line buffer (start of the FAC – ARG area)
SAVRTN EQU  >836C          * free space after RAM line buffer
VBUFF  EQU  >03C0          * line buffer in VRAM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Write VRAM address
*   Expects address in R0
*
* BL here for writing data
*
VWADDW ORI  R0,>4000       * set to write VRAM data
*
* BL here for reading data
*
VWADD  MOVB @>83E1,*R15    * write LSB of R0 to VDPWA
       MOVB R0,*R15        * write MSB of R0 to VDPWA
       ANDI R0,>3FFF       * clear write bit so a later read setup is not affected
       RT
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* The following utilities expect
*   R0 = VRAM address
*   R1 = RAM result for 1 byte or RAM address for multiple bytes
*   R2 = count for multi-byte copies
*
* R10 will be destroyed
*
* Copy one byte from VRAM (R0) to R1
*
VSR    MOV  R11,R10        * save return
       BL   @VWADD         * write out VRAM read address
       MOVB @>8800,R1      * read VRAM byte
       B    *R10           * return to caller
*
* Copy one byte from R1 to VRAM (R0)
*
VSW    MOV  R11,R10        * save return
       BL   @VWADDW        * write out VRAM write address
       MOVB R1,@>8C00      * write VRAM byte
       B    *R10           * return to caller
*
* Copy R2 bytes from VRAM (R0) to RAM (R1)
*
VMR    MOV  R11,R10        * save return
       BL   @VWADD         * write out VRAM read address
VMRLP  MOVB @>8800,*R1+    * read next VRAM byte to RAM
       DEC  R2             * dec count
       JNE  VMRLP          * repeat if not done
       B    *R10           * return to caller
*
* Copy R2 bytes from RAM (R1) to VRAM (R0)
*
VMW    MOV  R11,R10        * save return
       BL   @VWADDW        * write out VRAM write address
VMWLP  MOVB *R1+,@>8C00    * write next VRAM byte from RAM
       DEC  R2             * dec count
       JNE  VMWLP          * repeat if not done
       B    *R10           * return to caller
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* RROLL
*
RROLL  MOV  R11,@SAVRTN    * save return address
       CLR  R0             * set to screen start
       LI   R3,24          * rows to roll
* Write row to RAM buffer
RROLLP LI   R1,FAC         * set RAM buffer
       LI   R2,32          * count
       BL   @VMR           * copy row to RAM buffer
* copy last column to first
       MOVB @FAC+31,R1     * get RAM byte to copy
       BL   @VSW           * write end byte to row start
* copy 1st 31 columns one column right
       INC  R0             * start at 2nd column
       LI   R1,FAC         * reset RAM buffer pointer
       LI   R2,31          * count
       BL   @VMW           * copy rest of line
* Process next row
       AI   R0,31          * next row
       DEC  R3             * dec row count
       JNE  RROLLP         * roll next row if not done
       MOV  @SAVRTN,R11    * restore return address
       RT

...lee
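The row-buffer approach in RROLL can be modelled in Python to check the address bookkeeping. This is a sketch of the three steps per row (read the row into a RAM buffer, write the last byte to the first column, write the first 31 bytes one column to the right). It treats VRAM as a plain bytearray and does not model VDP port behaviour or timing. The function name `rroll` and the test pattern are illustrative.

```python
ROWS, COLS = 24, 32

def rroll(vram):
    """Roll a 24x32 screen image one column right, one buffered row at a time."""
    addr = 0
    for _ in range(ROWS):
        buf = bytes(vram[addr:addr + COLS])           # like VMR: row -> RAM buffer
        vram[addr] = buf[COLS - 1]                    # like VSW: last column -> first
        addr += 1                                     # INC R0: start at 2nd column
        vram[addr:addr + COLS - 1] = buf[:COLS - 1]   # like VMW: first 31 bytes
        addr += COLS - 1                              # AI R0,31: next row

# Test pattern: 768 bytes with a distinct value in each column of a row.
screen = bytearray(range(256)) * 3
original = bytes(screen)
rroll(screen)

# Reference result: every row rotated right by one, computed independently.
expected = b"".join(
    original[r * COLS:(r + 1) * COLS][-1:] + original[r * COLS:(r + 1) * COLS][:-1]
    for r in range(ROWS)
)
print(bytes(screen) == expected)
```

Because each row is fully buffered before anything is written back, the direction of the write no longer matters, which is what makes the same pattern usable for LROLL as well.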
+Lee Stewart Posted June 10, 2021

OK, Rich (@RXB), here is the complete suite:

Spoiler

SAVRTN EQU  >836C         * free space after RAM line buffer
VBUFF  EQU  >03C0         * line buffer in VRAM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Write VRAM address
* Expects address in R0
*
* BL here for writing data
*
VWADDW ORI  R0,>4000      * set to write VRAM data
*
* BL here for reading data
*
VWADD  MOVB @>83E1,*R15   * write LSB of R0 to VDPWA
       MOVB R0,*R15       * write MSB of R0 to VDPWA
       RT
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* The following utilities expect
* R0 = VRAM address
* R1 = RAM result for 1 byte or RAM address for multiple bytes
* R2 = count for multi-byte copies
*
* R10 will be destroyed
*
* Copy one byte from VRAM (R0) to R1
*
VSR    MOV  R11,R10       * save return
       BL   @VWADD        * write out VRAM read address
       MOVB @>8800,R1     * read VRAM byte
       B    *R10          * return to caller
*
* Copy one byte from R1 to VRAM (R0)
*
VSW    MOV  R11,R10       * save return
       BL   @VWADDW       * write out VRAM write address
       MOVB R1,@>8C00     * write VRAM byte
       B    *R10          * return to caller
*
* Copy R2 bytes from VRAM (R0) to RAM (R1)
*
VMR    MOV  R11,R10       * save return
       BL   @VWADD        * write out VRAM read address
VMRLP  MOVB @>8800,*R1+   * read next VRAM byte to RAM
       DEC  R2            * dec count
       JNE  VMRLP         * repeat if not done
       B    *R10          * return to caller
*
* Copy R2 bytes from RAM (R1) to VRAM (R0)
*
VMW    MOV  R11,R10       * save return
       BL   @VWADDW       * write out VRAM write address
VMWLP  MOVB *R1+,@>8C00   * write next VRAM byte from RAM
       DEC  R2            * dec count
       JNE  VMWLP         * repeat if not done
       B    *R10          * return to caller
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* RROLL
*
RROLL  MOV  R11,@SAVRTN   * save return address
       CLR  R0            * set to screen start
       LI   R3,24         * rows to roll
* Write row to RAM buffer
RROLLP LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* copy last column to first
       MOVB @FAC+31,R1    * get RAM byte to copy
       BL   @VSW          * write end byte to row start
* copy 1st 31 columns one column right
       INC  R0            * start at 2nd column
       LI   R1,FAC        * reset RAM buffer pointer
       LI   R2,31         * count
       BL   @VMW          * copy rest of line
* Process next row
       AI   R0,31         * next row
       DEC  R3            * dec row count
       JNE  RROLLP        * roll next row if not done
       MOV  @SAVRTN,R11   * restore return address
       RT
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* LROLL
*
LROLL  MOV  R11,@SAVRTN   * save return address
       CLR  R0            * set to screen start
       LI   R3,24         * rows to roll
* Write row to RAM buffer
LROLLP LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* copy 1st 31 columns one column left
       LI   R1,FAC+1      * set RAM buffer pointer to 2nd char
       LI   R2,31         * count
       BL   @VMW          * copy rest of line
* copy first column to last
       AI   R0,31         * set VRAM dest to last column
       MOVB @FAC,R1       * get RAM byte to copy
       BL   @VSW          * write start byte to row end
* Process next row
       INC  R0            * next row
       DEC  R3            * dec row count
       JNE  LROLLP        * roll next row if not done
       MOV  @SAVRTN,R11   * restore return address
       RT
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* UROLL
*
UROLL  MOV  R11,@SAVRTN   * save return address
       CLR  R0            * set to screen start
       LI   R3,23         * rows to roll (all but 1st)
* Write first row to RAM buffer
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy RAM buffer to VRAM buffer
       LI   R0,VBUFF      * set VRAM dest to VBUFF
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMW          * copy row to VBUFF (VRAM buffer)
* Start copy loop at 2nd row
       LI   R0,32         * point to 2nd row
* Write row to RAM buffer
UROLLP LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy to previous row
       AI   R0,-32        * back up 1 row
       LI   R1,FAC        * reset RAM buffer pointer
       LI   R2,32         * count
       BL   @VMW          * copy to previous row
* Process next row
       AI   R0,64         * next row
       DEC  R3            * dec row count
       JNE  UROLLP        * roll next row if not done
* Copy saved row to RAM
       LI   R0,VBUFF      * set VRAM source
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy saved row to last row
       LI   R0,736        * point to last row
       LI   R1,FAC        * reset RAM buffer pointer
       LI   R2,32         * count
       BL   @VMW          * copy to last row
       MOV  @SAVRTN,R11   * restore return address
       RT
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* DROLL
*
DROLL  MOV  R11,@SAVRTN   * save return address
       LI   R0,736        * set to last row
       LI   R3,23         * rows to roll (all but last)
* Write last row to RAM buffer
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy RAM buffer to VRAM buffer
       LI   R0,VBUFF      * set VRAM dest to VBUFF
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMW          * copy row to VBUFF (VRAM buffer)
* Start copy loop at 2nd-to-last row
       LI   R0,704        * point to row 22
* Write row to RAM buffer
DROLLP LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy to next row
       AI   R0,32         * down 1 row
       LI   R1,FAC        * reset RAM buffer pointer
       LI   R2,32         * count
       BL   @VMW          * copy to next row
* Process next row
       AI   R0,-64        * back up 2 rows
       DEC  R3            * dec row count
       JNE  DROLLP        * roll next row if not done
* Copy saved row to RAM
       LI   R0,VBUFF      * set VRAM source
       LI   R1,FAC        * set RAM buffer
       LI   R2,32         * count
       BL   @VMR          * copy row to RAM buffer
* Copy saved row to first row
       CLR  R0            * point to first row
       LI   R1,FAC        * reset RAM buffer pointer
       LI   R2,32         * count
       BL   @VMW          * copy to first row
       MOV  @SAVRTN,R11   * restore return address
       RT

I have not had time to test it. The roll routines can probably be tightened up because there is quite a bit of redundancy, especially in UROLL and DROLL.

...lee
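For checking the four routines once they are on the machine, it can help to have a reference for what the screen should look like after each roll. Here is a minimal sketch in Python (not TI code; the roll function and the 768-byte test image are assumptions made for this example) that treats the screen image table as 24 rows of 32 bytes:

```python
# Reference model of a one-cell screen roll with wrap-around.
# This is an off-hardware sketch, not a translation of the assembly.
COLS, ROWS = 32, 24


def roll(screen, direction):
    """Return a new 768-byte screen image rolled one cell in the given direction."""
    assert len(screen) == COLS * ROWS
    rows = [screen[r * COLS:(r + 1) * COLS] for r in range(ROWS)]
    if direction == "R":        # last column wraps around to column 0
        rows = [row[-1:] + row[:-1] for row in rows]
    elif direction == "L":      # first column wraps around to column 31
        rows = [row[1:] + row[:1] for row in rows]
    elif direction == "U":      # top row wraps around to the bottom
        rows = rows[1:] + rows[:1]
    elif direction == "D":      # bottom row wraps around to the top
        rows = rows[-1:] + rows[:-1]
    else:
        raise ValueError(direction)
    return b"".join(rows)


# Test image: every cell holds its own address modulo 256.
screen = bytes(i % 256 for i in range(COLS * ROWS))

right = roll(screen, "R")
up = roll(screen, "U")
```

Seen this way, UROLL and DROLL are the same rotation in opposite directions, which fits the remark about redundancy: a single routine taking a start row and a step could probably serve both.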
RXB Posted June 10, 2021

OMG, thanks Lee, I learned much from this. GPL is going to call the ROLL routines as part of the SCROLL routines and then, if needed, print the last line from a string, replacing the rolled line. Your use of FAC instead of VDP >03C0 for only 32 characters is a pretty minimal increase in speed overall. Unlike everyone else, I am using the XML registers instead of the BLWP used by XB CALL INIT or EA CALL INIT, so I can get away with only BL.

Here is the latest from what I learned from you:

       AORG >7000
* Windy XB routines
****************************************************************
* direction = U (up), D (Down), L (Left), R (Right)
* repetition = repeat number times
* string = string or string variable to display just like PRINT
****************************************************************
* CALL CLEAR(direction,...)
* Assembly ROM CALL CLEAR but all 4 direction clears screen
****************************************************************
* CALL ROLL(direction,...)
* CALL ROLL(direction,repetition,...)
* Assembly roll the screen like a drum with number of times also
****************************************************************
* XML MVDN (MV05 for VDP) (VDP to VDP)                         *
****************************************************************
* CALL RROLL(direction,repetition,...)                         *
****************************************************************
RROLL  MOV  R11,R8        * save return address
       LI   R2,24         * ROW counter
       LI   R3,31         * 31 right edge Screen Address
       LI   R5,>03C0      * Buffer for 24 characters
       LI   R1,1          * 1 Byte
       BL   @LRLP1        * Move IT
* Buffer has 24 far right characters
       LI   R2,24         * ROW counter
       CLR  R3            * Screen address
       LI   R5,1          * Destination
       BL   @LRLP2        * Move IT
* Moved all on screen 1 right
       LI   R2,24         * ROW counter
       LI   R3,>03C0      * Buffer for 1 character
       LI   R5,1          * 1 left edge Screen Address
       LI   R1,1          * Get 1 Byte
       BL   @LRLP3        * Move IT
       B    *R8           * RETURN
****************************************************************
* CALL LROLL(direction,repetition,...)                         *
****************************************************************
LROLL  MOV  R11,R8        * save return address
       LI   R2,24         * ROW counter
       CLR  R3            * 0 left edge Screen Address
       LI   R5,>03C0      * Buffer for 24 characters
       LI   R1,1          * 1 Byte
       BL   @LRLP1        * Move IT
* Buffer has 24 far left characters
       LI   R2,24         * ROW counter
       LI   R3,1          * Screen address
       CLR  R5            * Destination
       BL   @LRLP2        * Move IT
* Moved all on screen 1 left
       LI   R2,24         * ROW counter
       LI   R3,>03C0      * Buffer for 1 character
       LI   R5,31         * 31 right edge Screen Address
       LI   R1,1          * Get 1 Byte
       BL   @LRLP3        * Move IT
       B    *R8           * RETURN
****************************************************************
LRLP1  MOV  R11,R9        * save return address
LRR1   BL   @MIT          * Move IT
       INC  R5            * Buffer+1
       AI   R3,32         * Screen Address+32
       DEC  R2            * ROW COUNTER-1
       JNE  LRR1          * 0? No loop
       B    *R9           * RETURN
****************************************************************
LRLP2  MOV  R11,R9        * save return address
LRLPM  LI   R1,31         * Number of Bytes
LRLPC  BL   @MIT          * Move IT
       AI   R3,32         * VDP address+32
       AI   R5,32         * VDP destination address+32
       INC  R1            * COLUMN COUNTER+1
       JNE  LRLPC         * 0? No loop
       DEC  R2            * ROW COUNTER-1
       JNE  LRLPM         * 0? No loop
       B    *R9           * RETURN
****************************************************************
LRLP3  MOV  R11,R9        * save return address
LRLPN  BL   @MIT          * Move IT
       INC  R3            * Buffer+1
       AI   R5,32         * Screen Address+32
       DEC  R2            * ROW COUNTER-1
       JNE  LRLPN         * 0? No loop
       B    *R9           * RETURN
****************************************************************
* CALL UROLL(direction,repetition,...)                         *
****************************************************************
UROLL  MOV  R11,R8        * save return address
       CLR  R3            * 0 top Screen Address
       LI   R5,>03C0      * Buffer for 32 characters
       LI   R1,32         * 32 bytes length
       BL   @UDLP1        * Move IT
* Buffer has 32 top line characters
       LI   R3,32         * 2nd line Screen address
       CLR  R5            * 0 screen Destination
       LI   R1,736        * Number of Bytes
       BL   @UDLP2        * Move IT
* Moved all on screen up 1
       LI   R3,>03C0      * Buffer for 32 characters
       LI   R5,736        * Bottom left edge Screen Address
       LI   R1,32         * Get 32 Bytes length
       BL   @UDLP3        * Move IT
       B    *R8           * RETURN
****************************************************************
* CALL DROLL(direction,repetition,...)                         *
****************************************************************
DROLL  MOV  R11,R8        * save return address
       LI   R3,736        * Bottom of Screen Address
       LI   R5,>03C0      * Buffer for 32 characters
       LI   R1,32         * 32 bytes length
       BL   @UDLP1        * Move IT
* Buffer has 32 bottom line characters
       CLR  R3            * 0 Screen address
       LI   R5,32         * 2nd line Destination
       LI   R1,736        * Number of Bytes
       BL   @UDLP2        * Move IT
* Moved all on screen down 1
       LI   R3,>03C0      * Buffer for 32 characters
       CLR  R5            * Top left edge Screen Address
       LI   R1,32         * Get 32 Bytes
       BL   @UDLP3        * Move IT
       B    *R8           * RETURN
****************************************************************
UDLP1  MOV  R11,R9        * save return address
UDR1   BL   @MIT          * Move IT
       INC  R3            * BUFFER+1
       INC  R5            * SCREEN+1
       DEC  R1            * COUNTER-1
       JNE  UDR1          * 0? No loop
       B    *R9           * RETURN
****************************************************************
UDLP2  MOV  R11,R9        * save return address
UDR2   BL   @MIT          * Move IT
       INC  R3            * lower screen line
       INC  R5            * upper screen line
       DEC  R1            * COUNTER-1
       JNE  UDR2          * 0? No loop
       B    *R9           * RETURN
****************************************************************
UDLP3  MOV  R11,R9        * save return address
UDR3   BL   @MIT          * Move IT
       INC  R3            * BUFFER+1
       INC  R5            * SCREEN+1
       DEC  R1            * COUNTER-1
       JNE  UDR3          * 0? No loop
       B    *R9           * RETURN
****************************************************************
MIT    MOV  R11,R10       * save return address
       MOVB @>83E7,*R15   * Write out read address
       MOVB R3,*R15
       MOVB @>8800,R7     * Read a byte
       MOVB @>83EB,*R15   * Write out write address
       ORI  R5,>4000      * Enable VDP write
       MOVB R5,*R15
       MOVB R7,@>8C00     * Write the byte
       B    *R10          * RETURN
****************************************************************
       END
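On the speed question, a rough way to compare the two approaches is to count VDP port accesses, leaving CPU cycles aside. The Python sketch below assumes each address setup costs two byte writes to the address port, that MIT does a read setup and a write setup for every byte, and that a VMR/VMW pair does one setup each way per 32-byte row; the function names are hypothetical.

```python
# Count VDP port accesses only; CPU instruction time is not modeled.
ADDR_SETUP = 2  # two byte writes to the VDP address port per setup


def per_byte_moves(n_bytes):
    """MIT-style: read setup + data read + write setup + data write, per byte."""
    return n_bytes * (ADDR_SETUP + 1 + ADDR_SETUP + 1)


def buffered_moves(n_bytes, chunk=32):
    """VMR/VMW-style: one setup each way per chunk, auto-increment in between."""
    chunks = -(-n_bytes // chunk)  # ceiling division
    return chunks * 2 * ADDR_SETUP + 2 * n_bytes


# An up or down roll moves 32 + 736 + 32 bytes in total.
total = 32 + 736 + 32
print(per_byte_moves(total), buffered_moves(total))  # 4800 1700
```

Under those assumptions the byte-at-a-time VDP-to-VDP move touches the VDP ports close to three times as often as the row-buffered version, before counting the extra BL and return per byte. Actual timings on hardware may differ.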
GDMike Posted June 10, 2021 (edited)

Lee, are the routines in the examples above what you recommend as general replacements for the Editor/Assembler VMBW, VMBR, VSBW and VSBR routines for reading and writing VDP RAM? Or do you use something different for general use? And what sort of savings would I get by using them instead of the E/A versions?

Edited June 10, 2021 by GDMike
RXB Posted June 10, 2021

1 hour ago, GDMike said: Lee, are the routines in the examples above what you recommend as general replacements for the Editor/Assembler VMBW, VMBR, VSBW and VSBR routines for reading and writing VDP RAM? Or do you use something different for general use? And what sort of savings would I get by using them instead of the E/A versions?

Oddly, all of these are built into the XB ROMs, just under different names, and it is strange that no one has used them before to save space. I mean, look at how much space you could save in the lower 8K using them!