JamesD Posted September 30, 2015 (edited)

This is my code to print a character to the screen in my 64 column text code for the Atom. It will eventually be 80 columns for other machines, but this is my current target.

The first line of the font is always blank, so it writes a zero to the screen for it without masking.

The font is broken up into tables: all of the first bytes in one table, all of the 2nd bytes in another, etc. This eliminates having to perform a multiply in the code leading up to this (as discussed in the topic I started on multiplying by 7). I also don't have to reuse indirect indexed addressing, so I don't have to manipulate Y as much.

The left and right nibble in the font are the same to keep from having to bit shift.

The code does work but I'm open to suggestions for more speed. This is just the code to write the left nibbles; there is separate code for right nibbles.

    lda #0
    sta (fscreen),y     ; write to the screen

    lda FCol1,X         ; get the 2nd byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #32             ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol2,X         ; get the 3rd byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #64             ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol3,X         ; get the 4th byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #96             ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol4,X         ; get the 5th byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #128            ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol5,X         ; get the 6th byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #160            ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol6,X         ; get the 7th byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #192            ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen

    lda FCol7,X         ; get the 8th byte of the font
    and #%11110000      ; mask area used by new character
    sta temp0           ; save it
    ldy #224            ; point to next screen byte
    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora temp0           ; combine background with font data
    sta (fscreen),y     ; write back to the screen
    rts

Edited September 30, 2015 by JamesD
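The table-per-row font layout described in this post can be modelled in a few lines of Python. This is only a sketch: the glyph bytes are dummy data, and `ROWS`, `NUM_CHARS`, `packed`, `fcol` and the helper functions are illustrative names, not part of the actual font or code.

```python
# Model of splitting a font into one table per glyph row, so that the
# character offset alone indexes every table (no multiply by 7 needed).
# The glyph bytes are dummy data, not the real font.

ROWS = 7          # assumed rows stored per glyph (blank top row omitted)
NUM_CHARS = 96    # assumed printable range starting at space

# Conventional layout: the bytes of each glyph stored back to back.
packed = [(c * ROWS + r) & 0xFF for c in range(NUM_CHARS) for r in range(ROWS)]


def split_by_row(font, rows, num_chars):
    """Return `rows` tables; table r holds byte r of every glyph."""
    return [[font[c * rows + r] for c in range(num_chars)] for r in range(rows)]


fcol = split_by_row(packed, ROWS, NUM_CHARS)   # fcol[0] plays the role of FCol1


def glyph_row_packed(char_index, row):
    return packed[char_index * ROWS + row]     # lookup needs a multiply


def glyph_row_split(char_index, row):
    return fcol[row][char_index]               # plain indexed lookup, like FColN,X
```

Both lookups return the same byte; the split layout simply moves the multiply out of the print routine and into how the data is stored.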
mariuszw Posted September 30, 2015

Use eor to write data to the screen, like this:

    lda data
    eor (screen),y
    and mask
    eor (screen),y
    sta (screen),y
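As a sanity check on the EOR sequence in this post, here is a small Python model (function names are made up for illustration). It shows that EOR / AND / EOR gives the same byte as masking both operands and ORing them: bits set in `mask` come from `data`, the rest are kept from the screen.

```python
def merge_eor(data, screen, mask):
    """lda data / eor screen / and mask / eor screen."""
    return (((data ^ screen) & mask) ^ screen) & 0xFF


def merge_and_or(data, screen, mask):
    """Mask both operands separately, then OR them together."""
    return ((data & mask) | (screen & ~mask)) & 0xFF


def identity_holds(mask):
    """Exhaustively compare the two forms for every data/screen pair."""
    return all(merge_eor(d, s, mask) == merge_and_or(d, s, mask)
               for d in range(256) for s in range(256))
```

For example, `merge_eor(0xAB, 0xCD, 0xF0)` takes the high nibble from the data and the low nibble from the screen, giving `0xAD`.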
+JAC! Posted September 30, 2015

Pre-mask FCOLxx per nibble and then:

    lda (fscreen),y     ; read byte at destination
    and #%00001111      ; mask off unused half
    ora FCOL1,x         ; combine background with font data
    sta (fscreen),y

Also, depending on the environment, you may be able to get rid of the

    ldy #32
    ldy #64
    ldy #96

by having 8 screen pointers instead of one, each offset from the previous one by 32 bytes. But there is a tradeoff regarding how and when to update 8 pointers instead of 1.

Additionally, if the output is line-wise, it makes sense to first clear a full line and then skip the "and #%00001111 ; mask off unused half".
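A rough Python sketch of the pre-masking idea, assuming the existing font carries the same glyph bits in both nibbles as described at the start of the thread. The names `premask` and `plot_premasked` are hypothetical.

```python
def premask(font_row_table):
    """Split a doubled-nibble row table into left-only and right-only copies."""
    left = [b & 0xF0 for b in font_row_table]    # glyph in the high nibble only
    right = [b & 0x0F for b in font_row_table]   # glyph in the low nibble only
    return left, right


def plot_premasked(screen_byte, font_byte, keep_mask):
    """lda (fscreen),y / and #keep_mask / ora FCOLn,x."""
    return (screen_byte & keep_mask) | font_byte
```

The cost is a second copy of the font; the gain is that the font byte no longer needs masking at print time.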
JamesD (Author) Posted September 30, 2015

> Pre-mask FCOLxx per nibble and then:
>
>     lda (fscreen),y     ; read byte at destination
>     and #%00001111      ; mask off unused half
>     ora FCOL1,x         ; combine background with font data
>     sta (fscreen),y
>
> Also depending on the environment you may get rid of the ldy #32, ldy #64, ldy #96 by having 8 screen pointers instead of one, all different by the relevant #32 bytes. But there is the tradeoff regarding how & when to update 8 pointers instead of 1.
>
> Additionally, if the output is line-wise, it makes sense to first clear a full line and then spare the "and #%00001111 ; mask off unused half"

I think updating 8 pointers is going to be worse than the LDY, since these change with every character.
JamesD (Author) Posted September 30, 2015 (edited)

> Use eor to write data to screen, like this:
>
>     lda data
>     eor (screen),y
>     and mask
>     eor (screen),y
>     sta (screen),y

Actually, I think it would have to be

    lda FCol6,X
    ldy #32
    eor (fscreen),y
    and #%00001111
    eor (fscreen),y
    sta (fscreen),y

It's 2 instructions shorter, but I'll have to count cycles to see how it compares, since it uses one more instruction with a complex addressing mode rather than simple 2 cycle instructions.

That is what I was looking for though! Thanks!

*edit* Well, it produced interesting results but not the desired result, so it's not quite right.

*edit* I copied over the data source. It works now. I'll just benchmark it.

*edit* It saves 4 clock cycles per line of the character. So 28 per character for a 7 pixel high font like I'm using, or 32 per character for an 8 pixel font. For a full text screen of 64x24 that's 6,144 clock cycles for the 7 pixel high font. That'll do!

Edited September 30, 2015 by JamesD
mariuszw Posted September 30, 2015

    ldy #32
    lda (fscreen),y
    eor FCol6,x
    and #%11110000
    eor FCol6,x
    sta (fscreen),y

should do as well, saving an additional cycle per byte written to screen memory.
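To compare the three per-row sequences posted so far, here is a small Python tally using the standard documented base cycle counts for the 6502 addressing modes involved. It assumes `fscreen` and `temp0` live in zero page and ignores page-crossing penalties, so the totals need not match figures counted or measured by the posters.

```python
# Base 6502 cycle counts (no page-crossing penalty) per addressing mode.
CYCLES = {
    "lda abs,x": 4, "eor abs,x": 4, "ora abs,x": 4,
    "lda (zp),y": 5, "eor (zp),y": 5, "sta (zp),y": 6,
    "and #": 2, "ldy #": 2, "sta zp": 3, "ora zp": 3,
}

# One glyph row of the original routine (mask font, mask screen, OR).
ORIGINAL = ["lda abs,x", "and #", "sta zp", "ldy #",
            "lda (zp),y", "and #", "ora zp", "sta (zp),y"]

# EOR against the screen twice (font byte loaded first).
EOR_SCREEN = ["lda abs,x", "ldy #", "eor (zp),y", "and #",
              "eor (zp),y", "sta (zp),y"]

# EOR against the font table twice (screen byte loaded first).
EOR_FONT = ["ldy #", "lda (zp),y", "eor abs,x", "and #",
            "eor abs,x", "sta (zp),y"]


def cost(seq):
    """Sum the base cycle counts of a sequence of instructions."""
    return sum(CYCLES[op] for op in seq)
```

With these base counts the sequences come to 27, 24 and 23 cycles per row. Note also that the form in this post keeps the screen bits where the mask is 1 and takes the font bits where it is 0, the opposite sense from the `lda data / eor (screen),y` form.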
JamesD (Author) Posted October 1, 2015

FWIW the first version saved 1 instruction on the 6803 and 3 on the Z80. It wasn't a big deal for the 6803, but that makes a big difference on the Z80 with its larger number of cycles per instruction. I couldn't see a lot of difference visually on the 6502 code, but every cycle helps.

The bottleneck on the 6502 code is probably the scroll at this point. I'm not sure what I can do with an unrolled loop of this.

    lda (add1),y
    sta (add2),y
    iny

> ldy #32
> lda (fscreen),y
> eor FCol6,x
> and #%11110000
> eor FCol6,x
> sta (fscreen),y
>
> should do as well, saving additional cycle per byte written to screen memory.

The only problem with that is that it doubles the number of lines of code I have to modify to switch fonts. The 6803 and Z80 require changing 1 line. This requires changing 28. Oh well, it was 14 already, what's a few more?
JamesD (Author) Posted October 1, 2015

> The bottleneck on the 6502 code is probably the scroll at this point. I'm not sure what I can do with an unrolled loop of this.
>
>     lda (add1),y
>     sta (add2),y
>     iny

This is the only thing I've come up with to speed this up.

    start:
    S1: LDA Source1,X
    D1: STA Dest1,X
        LDA Source2,X
        STA Dest2,X
        LDA Source3,X
        STA Dest3,X
        LDA Source4,X
        STA Dest4,X
        LDA Source5,X
        STA Dest5,X
        LDA Source6,X
        STA Dest6,X
        LDA Source7,X
        STA Dest7,X
        LDA Source8,X
        STA Dest8,X
        INX
        BNE start
        ; test for end of screen here
        B?? exit
        INC S1+2
        INC D1+2
        etc...
JamesD (Author) Posted October 1, 2015

> This is the only thing I've come up with to speed this up.
>
>     start:
>     S1: LDA Source1,X
>     D1: STA Dest1,X
>         LDA Source2,X
>         STA Dest2,X
>         LDA Source3,X
>         STA Dest3,X
>         LDA Source4,X
>         STA Dest4,X
>         LDA Source5,X
>         STA Dest5,X
>         LDA Source6,X
>         STA Dest6,X
>         LDA Source7,X
>         STA Dest7,X
>         LDA Source8,X
>         STA Dest8,X
>         INX
>         BNE start
>         ; test for end of screen here
>         B?? exit
>         INC S1+2
>         INC D1+2
>         etc...

Actually, I think I can do something like this and scroll several lines of data in the loop:

    start:
        LDA PTR2,X
        STA PTR1,X
        LDA PTR3,X
        STA PTR2,X
        LDA PTR4,X
        STA PTR3,X
        LDA PTR5,X
        STA PTR4,X
        LDA PTR6,X
        STA PTR5,X
        LDA PTR7,X
        STA PTR6,X
        LDA PTR8,X
        STA PTR7,X
        INX
        BNE start
JamesD (Author) Posted October 1, 2015

This made a huge difference on my text scroll. 80 columns instead of 64 would be more difficult.

    scroll2:
        ldx #0
    @sloop:
        lda $8100,x
        sta $8000,x
        lda $8200,x
        sta $8100,x
        lda $8300,x
        sta $8200,x
        lda $8400,x
        sta $8300,x
        lda $8500,x
        sta $8400,x
        lda $8600,x
        sta $8500,x
        lda $8700,x
        sta $8600,x
        lda $8800,x
        sta $8700,x
        lda $8900,x
        sta $8800,x
        lda $8A00,x
        sta $8900,x
        lda $8B00,x
        sta $8A00,x
        lda $8C00,x
        sta $8B00,x
        lda $8D00,x
        sta $8C00,x
        lda $8E00,x
        sta $8D00,x
        lda $8F00,x
        sta $8E00,x
        lda $9000,x
        sta $8F00,x
        lda $9100,x
        sta $9000,x
        lda $9200,x
        sta $9100,x
        lda $9300,x
        sta $9200,x
        lda $9400,x
        sta $9300,x
        lda $9500,x
        sta $9400,x
        lda $9600,x
        sta $9500,x
        lda $9700,x
        sta $9600,x
        inx
        beq @nxt
        jmp @sloop
    @nxt:
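The unrolled scroll can be checked with a short Python model. It assumes what the addresses in the listing imply: screen memory starts at $8000 and each of the 24 text rows occupies exactly one 256-byte page (8 pixel rows of 32 bytes). The function names and the test pattern are made up for illustration.

```python
SCREEN_BASE = 0x8000
ROW_BYTES = 256      # assumed: 8 pixel rows * 32 bytes = one page per text row
TEXT_ROWS = 24


def scroll_up(mem):
    """Mimic the unrolled loop: for each X, copy page n+1 to page n."""
    for x in range(256):                       # ldx #0 ... inx / beq @nxt
        for row in range(TEXT_ROWS - 1):       # the 23 lda/sta pairs
            src = SCREEN_BASE + (row + 1) * ROW_BYTES + x
            dst = SCREEN_BASE + row * ROW_BYTES + x
            mem[dst] = mem[src]


def make_memory():
    """Build a 64 KB memory image with an arbitrary pattern in screen RAM."""
    mem = bytearray(0x10000)
    for i in range(TEXT_ROWS * ROW_BYTES):
        mem[SCREEN_BASE + i] = (i * 7 + i // 256) & 0xFF
    return mem
```

Because each source byte is read before the following pair overwrites it, interleaving the pages inside one X loop gives the same result as moving the whole block up by one text row. The last row is left untouched and still has to be cleared separately. The one-page-per-row property is presumably what makes 64 columns so convenient here.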
FifthPlayer Posted October 1, 2015

Is this for the A8? If so, you can use display list manipulation to scroll the data, rather than copying data around in the framebuffer. Keep a ring buffer of lines for the display and use LMS instructions to point to lines in the ring buffer.
JamesD (Author) Posted October 1, 2015

> Is this for the A8? If so, you can use display list manipulation to scroll the data, rather than copying data around in the framebuffer. Keep a ringbuffer of lines for the display and use LMS instructions to point to lines in the ringbuffer.

Sadly it is not. It's for the Acorn Atom. I'm making a little library of routines for use with 6847 based machines. Then I want to create a few programs for that in C. Nothing spectacular, just something to fill some spare time.
JamesD (Author) Posted October 1, 2015

The text rendering code now looks like this. Thanks for your help!

    leftnibble:
        lda #0
        sta (fscreen),y     ; write to the screen

        ldy #32             ; point to next screen byte
        lda (fscreen),y
        eor FCol1,X         ; EOR with the next byte of the font
        and #%00001111      ; keep only the screen's right nibble
        eor FCol1,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #64             ; point to next screen byte
        lda (fscreen),y
        eor FCol2,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol2,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #96             ; point to next screen byte
        lda (fscreen),y
        eor FCol3,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol3,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #128            ; point to next screen byte
        lda (fscreen),y
        eor FCol4,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol4,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #160            ; point to next screen byte
        lda (fscreen),y
        eor FCol5,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol5,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #192            ; point to next screen byte
        lda (fscreen),y
        eor FCol6,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol6,X         ; EOR with the same font byte again
        sta (fscreen),y

        ldy #224            ; point to next screen byte
        lda (fscreen),y
        eor FCol7,X         ; EOR with the next byte of the font
        and #%00001111
        eor FCol7,X         ; EOR with the same font byte again
        sta (fscreen),y
        rts
MaPa Posted October 1, 2015 (edited)

> It saves 4 clock cycles per line of the character. So 28 per character for a 7 pixel high font like I'm using or 32 per character for an 8 pixel font. For a full text screen of 64x24 that's 6,144 clock cycles for the 7 pixel high font.

Actually it's 43k cycles (64*24 chars * 28 cycles per char).

And as JAC! already suggested, you can have two fonts with unused nibbles cleared and save another 4 cycles per byte, 28 per char, so another 43k cycles.

Edited October 1, 2015 by MaPa
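The arithmetic behind the correction, spelled out in Python. The 4 cycles per row is the figure quoted in the thread and is taken as given here; the variable names are illustrative.

```python
COLUMNS, TEXT_ROWS = 64, 24
CYCLES_SAVED_PER_ROW = 4     # figure quoted in the thread
FONT_ROWS = 7                # masked rows per character

chars = COLUMNS * TEXT_ROWS                       # characters on a full screen
per_char = CYCLES_SAVED_PER_ROW * FONT_ROWS       # cycles saved per character
full_screen = chars * per_char                    # cycles saved per full screen
missing_factor = chars * CYCLES_SAVED_PER_ROW     # the earlier figure, without * 7
```

This gives 1,536 characters, 28 cycles per character and 43,008 cycles per screen; leaving out the factor of 7 produces the earlier 6,144.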
JamesD (Author) Posted October 1, 2015 (edited)

> Actualy it's 43k cycles (64*24 chars * 28 cycles per char)

Yup, I had that and re-added on the calculator and missed the * 7.

http://www.stardot.org.uk/forums/viewtopic.php?f=44&t=10149

> And as JAC! already suggested, you can have two fonts with unused nibbles cleared and save another 4 cycles per byte, 28 per char so another 43k cycles.

If I have 672 bytes to spare for the doubled font then I might do that, but I'm not going to for the moment. The code is down to 6 instructions per byte already, but that would get it to 4. I think that actually saves 7 cycles per byte over my current code. I clear the screen first, and the last line when I scroll. If I have to print over something I'll need an AND though.

    ldy #224            ; point to next screen byte
    lda (fscreen),y
    ora FCol7,X         ; OR with the next byte of the font
    sta (fscreen),y

I just created a test version based on that using a clone of the existing font, and I'd need to put them side by side to tell the difference. Most of the clock cycles are in the scroll.

Edited October 1, 2015 by JamesD
JamesD (Author) Posted October 1, 2015

This does work with the extra font data, but the speed difference amounts to about 1 line extra for an entire screen full once it's scrolling. During the screen drawing before the scroll it is much faster.

    print_64:
        ; register a contains character
        sec
        sbc #' '            ; printable character set data starts at space, ASCII 32
        tax                 ; save as character table offset

        ; point screen to $8000 + row (base screen address + row)
        lda #$80
        clc
        adc row             ; adding row to MSB = 256 * row
        sta fscreen+1

        ldy #0              ; top line is always black (1st byte of the font)
                            ; start at zero offset from screen address
        ; add the column
        lda col             ; 2 columns / byte
        lsr
        sta fscreen         ; save it
        bcc leftnibble      ; even column: left nibble

        clc                 ; odd column: offset into the second copy of the font
        txa
        adc #96
        tax

    ;**************************************************
    ;* left nibble
    ;**************************************************
    leftnibble:
        lda #0
        sta (fscreen),y     ; write to the screen

        ldy #32             ; point to next screen byte
        lda (fscreen),y
        ora FCol1,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #64             ; point to next screen byte
        lda (fscreen),y
        ora FCol2,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #96             ; point to next screen byte
        lda (fscreen),y
        ora FCol3,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #128            ; point to next screen byte
        lda (fscreen),y
        ora FCol4,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #160            ; point to next screen byte
        lda (fscreen),y
        ora FCol5,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #192            ; point to next screen byte
        lda (fscreen),y
        ora FCol6,X         ; OR with the next byte of the font
        sta (fscreen),y

        ldy #224            ; point to next screen byte
        lda (fscreen),y
        ora FCol7,X         ; OR with the next byte of the font
        sta (fscreen),y
        rts
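The address setup at the top of `print_64` can be followed in this Python sketch. It mirrors the listing: the row goes into the high byte of the pointer, the column is halved into the low byte, and the bit shifted out selects the second 96-entry copy of the font. `locate` and `row_addresses` are hypothetical helper names.

```python
def locate(row, col):
    """Return (screen address of the top byte, offset added to the font index)."""
    high = (0x80 + row) & 0xFF          # lda #$80 / adc row: row * 256
    low = col >> 1                      # lsr: two columns per byte
    right = col & 1                     # carry out of lsr
    font_offset = 96 if right else 0    # adc #96 for the right-nibble copy
    return (high << 8) | low, font_offset


def row_addresses(row, col):
    """Addresses touched by ldy #0, #32, ... #224 for one character cell."""
    base, _ = locate(row, col)
    return [base + y for y in range(0, 256, 32)]
```

For instance, row 2, column 5 lands at $8202 with the right-nibble font, and the bottom pixel row of row 1, column 63 is at $81FF, the last byte of that text row's page.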
xxl Posted October 3, 2015

> print_64:
>     ; register a contains character
>     sec
>     sbc #' '            ; printable character set data starts at space, ASCII 32
>     tax                 ; save as character table offset

    tax
    sbx #' '

> clc
> txa
> adc #96
> tax

    txa
    sbx #$100-96
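For readers unfamiliar with the undocumented SBX opcode (also known as AXS), here is a Python model of the two replacements in this post. It assumes the commonly documented behaviour, X = (A AND X) - immediate with no borrow taken from carry, and ignores the flags. The function names are made up.

```python
def sbx(a, x, imm):
    """Undocumented SBX/AXS: X = (A AND X) - imm, carry input not used."""
    return ((a & x) - imm) & 0xFF


def char_offset(a):
    """tax / sbx #' '  replaces  sec / sbc #' ' / tax."""
    x = a                        # tax
    return sbx(a, x, 0x20)       # X = A - 32


def add_96(x):
    """txa / sbx #$100-96  replaces  clc / txa / adc #96 / tax."""
    a = x                        # txa
    return sbx(a, x, (0x100 - 96) & 0xFF)   # subtracting 160 adds 96 mod 256
```

Since A and X are equal before each SBX, the AND leaves the value unchanged and only the subtraction matters.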
JamesD (Author) Posted October 3, 2015

I'm not big on illegal opcodes, but if I need a couple of extra clock cycles I might use that.

I'm sure there are a few clock cycles to be had here or there in the rest of the code, but I think the only thing I can do to cut any significant number of clock cycles is to unroll the scroll code further, and that means doubling the size of that code or worse. I figure that would save around 1278 cycles. That adds up, but I think I'd only use that if I have to.
JamesD (Author) Posted October 6, 2015

When two characters line up as left nibble and right nibble of the same byte, you can print them at the same time like this:

    ;**************************************************
    ; write two characters at once
    ;**************************************************
    print_642:
        ; on entry Y indexes the first of the two characters in (string)
        lda (string),y
        sec
        sbc #' '            ; printable character set data starts at space, ASCII 32
        sta firstchar       ; save as character table offset
        iny
        lda (string),y
        sec
        sbc #' '
        sta secondchar

        ; point screen to $8000 + row (base screen address + row)
        lda #$80
        clc
        adc row             ; adding row to MSB = 256 * row
        sta fscreen+1

        ldy #0              ; top line is always black (1st byte of the font)
                            ; start at zero offset from screen address
        ; add the column
        lda col             ; 2 columns / byte
        lsr
        sta fscreen         ; save it

    twochar:
        lda BGColor
        sta (fscreen),y     ; write to the screen

        ldy #32             ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol1,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol21,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #64             ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol2,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol22,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #96             ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol3,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol23,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #128            ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol4,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol24,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #160            ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol5,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol25,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #192            ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol6,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol26,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen

        ldy #224            ; point to next screen byte
        ldx firstchar       ; offset to 1st character
        lda FCol7,X         ; load the next byte of the 1st character
        ldx secondchar      ; offset to 2nd character
        eor FCol27,X        ; add the next byte of the 2nd character
        sta (fscreen),y     ; write it to the screen
        rts
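A small Python check of why a single EOR is enough to combine the two characters. It assumes, as the earlier posts suggest but this one does not state, that `FCol1`..`FCol7` hold left-nibble data with the low nibble clear and `FCol21`..`FCol27` hold right-nibble data with the high nibble clear. The function names are illustrative.

```python
def combine(left_byte, right_byte):
    """lda FColN,X / eor FCol2N,X with pre-masked tables (assumed layout)."""
    assert left_byte & 0x0F == 0 and right_byte & 0xF0 == 0
    return left_byte ^ right_byte


def eor_equals_or():
    """With no overlapping bits, EOR and ORA produce the same byte."""
    return all((l ^ r) == (l | r)
               for l in range(0, 256, 16)   # high-nibble-only values
               for r in range(16))          # low-nibble-only values
```

Because both halves of the byte are written, the screen byte never has to be read, which is where the saving over printing the characters one at a time comes from.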
xxl Posted October 6, 2015

Maybe this way:

    ldy #32             ; point to next screen byte
    ldx secondchar      ; offset to 2nd character
    lda FCol21,X        ; load the next byte of the 2nd character
    ldx firstchar       ; offset to 1st character
    eor FCol1,X         ; add the next byte of the 1st character
    sta (fscreen),y     ; write it to the screen

    ldy #64             ; point to next screen byte
    ldx firstchar       ; offset to 1st character (X already holds it here, so this can go)
    lda FCol2,X         ; load the next byte of the 1st character
    ldx secondchar      ; offset to 2nd character
    eor FCol22,X        ; add the next byte of the 2nd character
    sta (fscreen),y     ; write it to the screen

and so on. And of course:

    lax (string),y
    sbx #' '            ; printable character set data starts at space, ASCII 32
    stx firstchar       ; save as character table offset

and

    lax (string),y
    sbx #' '
    stx secondchar
JamesD (Author) Posted October 6, 2015

> maybe this way:
>
>     ldy #32             ; point to next screen byte
>     ldx secondchar      ; offset to 2nd character
>     lda FCol21,X        ; load the next byte of the 2nd character
>     ldx firstchar       ; offset to 1st character
>     eor FCol1,X         ; add the next byte of the 1st character
>     sta (fscreen),y     ; write it to the screen
>
>     ldy #64             ; point to next screen byte
>     ldx firstchar       ; offset to 1st character
>     lda FCol2,X         ; load the next byte of the 1st character
>     ldx secondchar      ; offset to 2nd character
>     eor FCol22,X        ; add the next byte of the 2nd character
>     sta (fscreen),y     ; write it to the screen
>
> and so.

That will work nicely.

> and of course:
>
>     lax (string),y
>     sbx #' '            ; printable character set data starts at space, ASCII 32
>     stx firstchar       ; save as character table offset
>
> and
>
>     lax (string),y
>     sbx #' '
>     stx secondchar

I'd still like to avoid illegal/undocumented opcodes unless I absolutely have to use them.
JamesD (Author) Posted October 7, 2015 (edited)

Before I started optimizing on the left, after on the right. The most noticeable optimizations were the revised scroll and printing two characters at a time, but the latter did depend a bit on some of the others.

Sorry for the jerky video; I'm doing this on my laptop and sometimes it doesn't keep up.

Edited October 7, 2015 by JamesD