Preppie Posted July 31, 2020

I've written a soft sprite routine and think I've got it nice and fast, but I'm trying to squeeze every last bit out of it. Here's the bit of code I'm trying to squeeze:

    lda character      ; load the character value (0-255)
    tax                ; save character in X-reg
    and #%01100000     ; isolate the hi bits
    asl                ; shift left, ensuring carry is clear
    rol
    rol                ; rotate the bits to be in bits 0 & 1
    rol
    adc change_font+2  ; add bits to hi-byte (MSB)
    sta change_bck+2   ; store the resulting value
    txa                ; get character value back from X into A
    asl
    asl                ; 3 asl = multiply by 8
    asl
    sta change_bck+1   ; store the result in lo-byte (LSB)

Note: it's in ANTIC 4, so although I'm not interested in bit 7 of character, it may be set. Basically it's adding 0-3 to the hi-byte depending on bits 5 and 6 (the ones the mask keeps), and then multiplying the lower 5 bits by 8 to get the place in the font set.

I've wracked my brains and come up with nothing better. I even tried a little table in ZP to turn 0,32,64,96 into 0,1,2,3, but it was no faster the way I coded it. Maybe there's some bit wizardry that could do it.

I work this out to 36 cycles, and it needs to run 9 times per sprite for a total of 324 cycles. I know it's not a lot, but every cycle saved helps.
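The shift-and-rotate sequence in the post above can be checked with a small model. This is only a Python sketch of the arithmetic, not 6502 code: the helper names `asl`, `rol` and `font_offset` are made up for illustration, and the base address held in change_font+2 is left out so that only the offset is examined. It confirms that the carry is clear when the `adc` is reached and that the two stored bytes together equal the low 7 bits of the character times 8.

```python
def asl(a):
    """Model of 6502 ASL A: returns (result, carry_out)."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def rol(a, carry):
    """Model of 6502 ROL A: returns (result, carry_out)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def font_offset(character):
    """Model of the posted routine without the base add.

    Returns (hi_offset, lo_byte, carry_before_adc).
    """
    a = character & 0b01100000      # and #%01100000
    a, c = asl(a)                   # asl (bit 7 was masked off, so carry = 0)
    for _ in range(3):              # rol, rol, rol
        a, c = rol(a, c)
    hi, carry = a, c
    lo = (character << 3) & 0xFF    # txa, asl, asl, asl
    return hi, lo, carry


# Exhaustive check over every possible character value.
for ch in range(256):
    hi, lo, carry = font_offset(ch)
    assert carry == 0                          # adc needs no clc
    assert hi == (ch >> 5) & 3                 # bits 5 and 6 end up in bits 0 and 1
    assert (hi << 8) | lo == (ch & 0x7F) * 8   # bit 7 is ignored
```

Because the result depends only on the low 7 bits of the character, the same values can be precomputed once, which is what the table-based replies rely on.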
popmilo Posted July 31, 2020

I'm not exactly sure what "change_font" is doing, but I'll assume it's your current charset address high byte. For the direct "calc" method (shorter code) I would do:

    lda character
    sta change_bck+1
    lda #0
    asl change_bck+1
    rol
    asl change_bck+1
    rol
    asl change_bck+1
    rol
    ; now we have character*8 in hi-lo form in accumulator and change_bck+1
    ; (bit 7 of character is not masked here; add "and #$7f" after the lda if it can be set)
    adc change_font+2  ; no need for clc as carry is zero for sure
    sta change_bck+2

It's a little shorter than your version, but could take even more cycles because of the extra asl operations on a memory location (is change_bck on ZP?).

But... the best way is a table for sure:

    ldx character
    lda tab_lo,x
    sta change_bck+1
    lda tab_hi,x
    clc
    adc change_font+2
    sta change_bck+2

Where:

    tab_lo: .fill 128, <(i*8)
    tab_hi: .fill 128, >(i*8)

Even better... have the table values with the charset base address hard-coded:

    ldx character
    lda tab_lo,x
    sta change_bck+1
    lda tab_hi,x
    sta change_bck+2

    tab_lo: .fill 128, <(base+i*8)
    tab_hi: .fill 128, >(base+i*8)
MaPa Posted July 31, 2020

Use tables... something like:

    ldx character            ; load the character value (0-255)
    lda char_addr_tab_hi,x   ; get hi byte of char def
    adc change_font+2        ; add bits to hi-byte (MSB); carry must be clear here
    sta change_bck+2         ; store the resulting value
    lda char_addr_tab_low,x  ; get low byte of char def
    sta change_bck+1         ; store the result in lo-byte (LSB)
    ..
    ..
    char_addr_tab_low
        :128 dta l(font+#*8)
        :128 dta l(font+#*8) ; inverse chars have the same data offset
    char_addr_tab_hi
        :128 dta h(font+#*8)
        :128 dta h(font+#*8) ; inverse chars have the same data offset

If change_font+2 is constant, you can add that value into the table already and delete the adc change_font+2 instruction. This uses 512 bytes for tables... or 256 if you add and #$7f after loading the char.
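The 512-byte versus 256-byte trade-off mentioned above can be illustrated with a short Python sketch. It is a model of the table contents only: `FONT_BASE` and `build_tables` are hypothetical names, and the base address $4000 is an arbitrary assumption rather than anything taken from the thread. It builds the lo/hi tables both ways and checks that a 256-entry lookup and a 128-entry lookup with the index masked by $7F give the same address for every character.

```python
FONT_BASE = 0x4000  # assumed charset address, chosen only for illustration


def build_tables(base, entries):
    """Return (lo, hi) byte tables for base + (index & $7F) * 8."""
    addresses = [base + (i & 0x7F) * 8 for i in range(entries)]
    lo = [addr & 0xFF for addr in addresses]
    hi = [addr >> 8 for addr in addresses]
    return lo, hi


# Two pages of tables: indexed directly by the full 0-255 character value.
lo256, hi256 = build_tables(FONT_BASE, 256)
# One page of tables: the character must be masked with $7F before indexing.
lo128, hi128 = build_tables(FONT_BASE, 128)

for ch in range(256):
    full = (hi256[ch] << 8) | lo256[ch]
    masked = (hi128[ch & 0x7F] << 8) | lo128[ch & 0x7F]
    assert full == masked == FONT_BASE + (ch & 0x7F) * 8

print(len(lo256) + len(hi256), "bytes vs", len(lo128) + len(hi128), "bytes")
```

The smaller tables cost one extra masking instruction per lookup, so the choice comes down to whether a spare page of RAM or a couple of cycles matters more.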
Wrathchild Posted July 31, 2020 (edited)

Perhaps use a page of memory and store the pointers there?

    table_lo = $600
    table_hi = $680

    maketable:
        ldy #0             ; loop through the character values (0-127)
    loop:
        tya                ; current character value into A
        and #%01100000     ; isolate the hi bits
        asl                ; shift left, ensuring carry is clear
        rol
        rol                ; rotate the bits to be in bits 0 & 1
        rol
        adc change_font+2  ; add bits to hi-byte (MSB)
        sta table_hi,y     ; store the resulting value
        tya                ; get character value back into A
        asl
        asl                ; 3 asl = multiply by 8
        asl
        sta table_lo,y     ; store the result in lo-byte (LSB)
        iny
        bpl loop           ; exits once Y reaches 128
        rts

Then use this as follows:

    lda character
    and #127
    tax
    lda table_lo,x
    sta change_bck+1
    lda table_hi,x
    sta change_bck+2

[EDIT] Ah, I was too slow, others have done similar.
Preppie (Author) Posted July 31, 2020 (edited)

Just to clear up a little confusion: the screen uses 4 character sets interleaved, and the code self-modifies so I can use absolute addressing in the central loop instead of indirect indexed. Although it's a little heavy in the outer loops, those 1-cycle savings pay off due to the number of times the inner loop runs. Therefore change_font & change_bck point to the absolute addresses that I'm modifying here.

    ldx character
    lda tab_lo,x
    sta change_bck+1
    lda tab_hi,x
    clc
    adc change_font+2
    sta change_bck+2

I think this should do the job if I don't mind throwing away some more RAM (isn't that always the case). Although won't I need 2 pages for the tables, as each character could be 0-255 due to the fifth color? I could add an AND #%01111111 at the start and just use 1 page for the table, I suppose (edit: MaPa pointed this out). I think I can do away with the CLC too, as long as carry is clear going into the main routine.

Thanks everyone, especially popmilo who got in first with that solution.
Preppie (Author) Posted July 31, 2020 (edited)

edit: nah, that doesn't work lol
Preppie (Author) Posted July 31, 2020

OK, that did the job. I'm now matching Jankovic's (popmilo, by the look of the photo, I guess) 14 sprites at 25fps (or 7 @ 50fps ofc) - the video is his, not mine. I guess I could have asked for the code, but where's the fun in that? I don't think I can push it any further, but I'll have another look later.
MaPa Posted July 31, 2020

1 hour ago, Preppie said:
    Although it's a little heavy in the outer loops, those 1-cycle savings pay off due to the number of times the inner loop runs. Therefore change_font & change_bck point to the absolute addresses that I'm modifying here.

I didn't count the cycles, nor have I seen your code... but with absolute addressing you have to loop over a one-byte (lda, and, ora, sta) cycle, or not? With indirect indexed addressing you can easily unroll the loop, saving 3 cycles per byte, which almost cancels the 4 extra cycles of indirect indexed and leaves you with a "simple" loop setup.
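The cycle trade-off described above can be tallied with a rough Python sketch. The inner loop was never posted, so the instruction mix here is an assumption: one lda/and/ora/sta group per byte, with a dex plus a taken branch in the looped absolute,X version and an iny between bytes in the unrolled (zp),Y version. The cycle counts are the standard documented 6502 timings with page-crossing penalties ignored, and the names `CYCLES`, `looped_absolute` and `unrolled_indirect` are hypothetical.

```python
# Standard 6502 cycle counts, ignoring page-crossing penalties.
CYCLES = {
    "lda abs,x": 4, "and abs,x": 4, "ora abs,x": 4, "sta abs,x": 5,
    "lda (zp),y": 5, "and (zp),y": 5, "ora (zp),y": 5, "sta (zp),y": 6,
    "dex": 2, "iny": 2, "branch taken": 3,
}


def total(instructions):
    """Sum the cycle counts for a list of instruction names."""
    return sum(CYCLES[name] for name in instructions)


# Assumed looped version: absolute,X operands plus loop overhead per byte.
looped_absolute = total(
    ["lda abs,x", "and abs,x", "ora abs,x", "sta abs,x", "dex", "branch taken"]
)

# Assumed unrolled version: (zp),Y operands, only an iny between bytes.
unrolled_indirect = total(
    ["lda (zp),y", "and (zp),y", "ora (zp),y", "sta (zp),y", "iny"]
)

print("looped absolute:  ", looped_absolute, "cycles per byte")
print("unrolled indirect:", unrolled_indirect, "cycles per byte")
```

Under these assumptions the indirect indexed operands cost 4 cycles more per byte and dropping the branch saves 3, so the unrolled version comes out about one cycle per byte slower while avoiding the self-modifying setup in the outer loop.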
Preppie (Author) Posted July 31, 2020

I was just thinking 'why the hell don't I unroll that inner loop?' My answer was 'because I can't adjust all those absolutes', to which my reply was 'you should have used indirect indexed from the start, you moron'.

You live and learn. I'm just an assembler noob, so I've learned a lot doing that routine as it is, and now I get to rewrite it - woohooo!
Yaron Nir Posted August 2, 2020

@Preppie, just out of curiosity, are you working on a specific game?
Preppie (Author) Posted August 2, 2020

2 hours ago, Yaron Nir said:
    @Preppie, just out of curiosity, are working on a specific game ?

I was supposed to be writing a quick little game for ABBUC and I got sidetracked - I need to put this on the back burner.