
Can this bit of Assembler be speeded up?



I've written a soft sprite routine and think I've got it nice and fast, but I'm trying to squeeze every last bit out of it.

 

Here's the bit of code I'm trying to squeeze:

 

lda character				;load the character value (0-255)         
tax					;save character in X-reg                  
and #%01100000				;isolate the hi bits                      
asl					;shift left ensuring carry is clear       
rol	                                          
rol					;rotate the bits to be in bits 0 & 1      
rol                                                           
adc change_font+2			;add bits to hi-byte (MSB)                
sta change_bck+2			;store the resulting value                
txa					;Get character value back from X into A                                 
asl                                                           
asl					;3 asl = multiply by 8                    
asl                                                           
sta change_bck+1			;store the result in lo-byte (LSB)   

 

 

Note: it's in ANTIC mode 4, so although I'm not interested in bit 7 of the character, it may be set.

 

Basically it's adding 0-3 to the hi-byte depending on bits 5 and 6 (the ones kept by the and #%01100000 mask), and then multiplying the lower 5 bits by 8 to get the place in the font set.
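To make the description above concrete, here is a rough Python model of the posted routine, tracking the carry flag through each shift and rotate. The helper names and the FONT_HI value are made up for illustration, and decimal mode is assumed to be off. It checks that the result equals the font base plus (character AND $7F) * 8 for every possible character value.

```python
def asl(a):
    """6502 ASL on the accumulator: returns (result, carry_out)."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def rol(a, carry):
    """6502 ROL on the accumulator: returns (result, carry_out)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def calc_address(character, font_hi):
    """Mirror the posted routine; returns (hi, lo) as stored at change_bck+2 / +1."""
    a = character & 0b01100000        # and #%01100000
    a, c = asl(a)                     # asl  (bit 7 was masked off, so carry = 0)
    a, c = rol(a, c)                  # rol
    a, c = rol(a, c)                  # rol
    a, c = rol(a, c)                  # rol  (bits 5,6 are now bits 0,1; carry = 0)
    hi = (a + font_hi + c) & 0xFF     # adc change_font+2
    lo = (character << 3) & 0xFF      # txa / asl / asl / asl
    return hi, lo


FONT_HI = 0x40  # hypothetical high byte of the font address, not from the post

for ch in range(256):
    hi, lo = calc_address(ch, FONT_HI)
    assert hi * 256 + lo == FONT_HI * 256 + (ch & 0x7F) * 8
```

Because the carry is always clear when the adc executes, the routine needs no clc, as its comments suggest.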

 

I've racked my brains and come up with nothing better. I even tried a little table in ZP to turn 0,32,64,96 into 0,1,2,3, but it was no faster the way I coded it.  Maybe there's some bit wizardry that could do it.

 

I work this out to 36 cycles, it needs to run 9 times per sprite for a total of 324 cycles.  I know it's not a lot but every cycle saved helps :)
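The 36-cycle figure can be reproduced with a simple tally. This is only a sketch: it assumes every memory operand uses absolute addressing (4 cycles for lda/adc/sta), which is what makes the total come to 36. If character lived in zero page, the lda would take 3 cycles instead.

```python
# (instruction, cycles) for the routine as posted; absolute addressing assumed
ROUTINE = [
    ("lda character",     4),
    ("tax",               2),
    ("and #%01100000",    2),
    ("asl",               2),
    ("rol",               2),
    ("rol",               2),
    ("rol",               2),
    ("adc change_font+2", 4),
    ("sta change_bck+2",  4),
    ("txa",               2),
    ("asl",               2),
    ("asl",               2),
    ("asl",               2),
    ("sta change_bck+1",  4),
]

CALLS_PER_SPRITE = 9  # from the post: the routine runs 9 times per sprite

per_call = sum(cycles for _, cycles in ROUTINE)
per_sprite = per_call * CALLS_PER_SPRITE
print(per_call, per_sprite)
```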

 

 

 


I'm not exactly sure what "change_font" is doing, but I'll assume it's your current charset address high byte.

For direct "calc" method (shorter code) I would do:

lda character
sta change_bck+1
lda #0
asl change_bck+1
rol
asl change_bck+1
rol
asl change_bck+1
rol              ; now we have character*8 in hi-lo form in accumulator and change_bck+1
adc change_font+2   ; no need for clc as carry is zero for sure
sta change_bck+2

It's a little shorter than your version, but it could take even more cycles because of the extra asl operations on memory (or is change_bck in zero page?).
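One thing worth checking in the direct method is what happens when bit 7 of the character is set, which the first post says can happen. A rough Python model of the three asl/rol pairs (the function name is made up for illustration) suggests that bit 7 ends up in the value added to the high byte, giving an offset of 4-7 instead of 0-3 for those characters.

```python
def direct_method(character):
    """Model three 'asl mem / rol a' pairs, starting with A = 0.

    Returns (a, mem): the value added to the hi byte and the resulting lo byte.
    """
    mem, a = character & 0xFF, 0
    for _ in range(3):
        carry = (mem >> 7) & 1            # asl change_bck+1: bit 7 goes to carry
        mem = (mem << 1) & 0xFF
        a = ((a << 1) | carry) & 0xFF     # rol: carry enters bit 0 of A
    return a, mem


# Characters 0-127 give the expected 0-3 page offset...
assert all(direct_method(ch)[0] == ch >> 5 for ch in range(128))
# ...but with bit 7 set the offset lands 4 pages too high.
assert all(direct_method(ch)[0] == (ch & 0x7F) // 32 + 4 for ch in range(128, 256))
```

If that matters, an and #$7f before the first store would restore the 0-3 range at a cost of 2 cycles. Under the same absolute-addressing assumption as the original count, the routine as posted comes to 42 cycles (asl on absolute memory takes 6 cycles), so it has fewer instructions but is not faster.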

 

But... :)
The best way is a table, for sure:

ldx character
lda tab_lo,x
sta change_bck+1
lda tab_hi,x
clc
adc change_font+2
sta change_bck+2

Where:

tab_lo:   .fill 128, <(i*8)
tab_hi:   .fill 128, >(i*8)
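If your assembler lacks a fill directive like the one above, the same tables can be generated offline. The following is a minimal sketch in Python. The function names and the .byte output syntax are assumptions, so adjust them to whatever your assembler expects.

```python
def make_offset_tables(count=128):
    """Return (tab_lo, tab_hi): low and high bytes of i*8 for i in 0..count-1."""
    offsets = [i * 8 for i in range(count)]
    tab_lo = [offset & 0xFF for offset in offsets]
    tab_hi = [offset >> 8 for offset in offsets]
    return tab_lo, tab_hi


def as_byte_lines(label, values, per_line=16):
    """Format values as '.byte' lines (directive syntax is an assumption)."""
    lines = [f"{label}:"]
    for start in range(0, len(values), per_line):
        chunk = ", ".join(f"${v:02x}" for v in values[start:start + per_line])
        lines.append(f"    .byte {chunk}")
    return "\n".join(lines)


tab_lo, tab_hi = make_offset_tables()
print(as_byte_lines("tab_lo", tab_lo))
print(as_byte_lines("tab_hi", tab_hi))
```

The high-byte table only ever holds 0-3, since 127 * 8 = 1016 fits in four pages.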

 

Even better...

Bake the charset base address into the table values.

ldx character
lda tab_lo,x
sta change_bck+1
lda tab_hi,x
sta change_bck+2

tab_lo:   .fill 128, <(base+i*8)
tab_hi:   .fill 128, >(base+i*8)
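For a rough idea of what the two table variants save, here is a cycle tally in Python. It assumes absolute addressing throughout and that the indexed table reads never cross a page boundary (which would add a cycle each). The 36-cycle baseline and the 9 calls per sprite are taken from the first post.

```python
# Table lookup with the base high byte added at run time
TABLE_PLUS_ADC = [
    ("ldx character",     4),
    ("lda tab_lo,x",      4),
    ("sta change_bck+1",  4),
    ("lda tab_hi,x",      4),
    ("clc",               2),
    ("adc change_font+2", 4),
    ("sta change_bck+2",  4),
]

# Table lookup with the base address baked into the tables
TABLE_BAKED = [
    ("ldx character",     4),
    ("lda tab_lo,x",      4),
    ("sta change_bck+1",  4),
    ("lda tab_hi,x",      4),
    ("sta change_bck+2",  4),
]

ORIGINAL_CYCLES = 36   # figure quoted in the first post
CALLS_PER_SPRITE = 9   # also from the first post


def total(routine):
    """Sum the cycle counts of a list of (instruction, cycles) pairs."""
    return sum(cycles for _, cycles in routine)


for name, routine in (("table + adc", TABLE_PLUS_ADC), ("baked base", TABLE_BAKED)):
    cycles = total(routine)
    saved = (ORIGINAL_CYCLES - cycles) * CALLS_PER_SPRITE
    print(f"{name}: {cycles} cycles per call, {saved} saved per sprite")
```

Under these assumptions the first variant takes 26 cycles and the second 20, against 36 for the shift-and-rotate routine.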

 


Use tables.. something like:

ldx character				;load the character value (0-255)         
lda char_addr_tab_hi,x		;get hi byte of char def
adc change_font+2			;add bits to hi-byte (MSB)                
sta change_bck+2			;store the resulting value                
lda char_addr_tab_low,x     ;get low byte of char def
sta change_bck+1			;store the result in lo-byte (LSB)   
..
..

char_addr_tab_low  :128 dta l(font+#*8)
                   :128 dta l(font+#*8)		; inverse chars have the same data offset
char_addr_tab_hi   :128 dta h(font+#*8)
                   :128 dta h(font+#*8)		; inverse chars have the same data offset

If change_font+2 is constant, you can add that value into the table already and delete adc change_font+2 instruction. This uses 512 bytes for tables... or 256 if you add and #$7f after loading char.
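The memory trade-off can be checked with a small Python model. The FONT address below is hypothetical, and the two lookup functions are illustrative names rather than anything from the thread. The point is that the 256-entry layout simply repeats its first half, so masking bit 7 lets a 128-entry table return the same addresses.

```python
FONT = 0x4000  # hypothetical font base address, not from the thread

# 256-entry layout: 128 normal characters followed by 128 inverse characters
full_table = [FONT + i * 8 for i in range(128)] * 2

# 128-entry layout, indexed by the character with bit 7 masked off
half_table = [FONT + i * 8 for i in range(128)]


def lookup_full(character):
    """Index the 256-entry table directly with the raw character."""
    return full_table[character]


def lookup_half(character):
    """Index the 128-entry table after masking bit 7 (models: and #$7f)."""
    return half_table[character & 0x7F]


assert all(lookup_full(c) == lookup_half(c) for c in range(256))

# Each address needs a low-byte and a high-byte table entry
bytes_full = 2 * len(full_table)
bytes_half = 2 * len(half_table)
print(bytes_full, bytes_half)
```

The smaller tables are not free: the mask means loading into A, masking, and transferring to X (lda / and #$7f / tax) instead of a single ldx, which is about 4 extra cycles per lookup assuming absolute addressing.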


Perhaps use a page of memory and store the pointers there?

 

table_lo = $600
table_hi = $680

maketable:
 ldy #0                 ;loop through the character values (0-127)
loop:
 tya                    ;get the current character value into A
 and #%01100000         ;isolate the hi bits
 asl                    ;shift left ensuring carry is clear
 rol
 rol                    ;rotate the bits to be in bits 0 & 1
 rol
 adc change_font+2      ;add bits to hi-byte (MSB)
 sta table_hi,y         ;store the resulting value
 tya                    ;get the character value back into A
 asl
 asl                    ;3 asl = multiply by 8
 asl
 sta table_lo,y         ;store the result in lo-byte (LSB)
 iny
 bpl loop
 rts

Then use this as follows:

lda character
and #127
tax
lda table_lo,x
sta change_bck+1
lda table_hi,x
sta change_bck+2

 

[EDIT] ah, I was too slow, others have done similar

Edited by Wrathchild

Just to clear up a little confusion, the screen uses 4 character sets interleaved and the code self-modifies so I can use absolute addressing in the central loop instead of Indirect Indexed.

 

Although it's a little heavy in the outer loops those 1 cycle savings payoff due to the number of times the inner loop runs.  Therefore change_font & change_bck point to the absolute addresses that I'm modifying here.

 

ldx character
lda tab_lo,x
sta change_bck+1
lda tab_hi,x
clc
adc change_font+2
sta change_bck+2

 

I think this should do the job if I don't mind throwing away some more RAM (isn't that always the case?).  Won't I need 2 pages for the tables, though, as each character could be 0-255 due to the fifth color?  I suppose I could add an AND #%01111111 at the start and just use 1 page for the tables (edit: MaPa pointed this out).

 

I think I can do away with the CLC too :)  as long as it's clear going into the main routine.

 

Thanks everyone, especially popmilo who got in first with that solution.

 

 

 

Edited by Preppie

Ok, that did the job :)  I'm now matching Jankovic's (popmilo, by the look of the photo, I guess :)) 14 sprites at 25fps (or 7 @ 50fps ofc) - the video is his, not mine.

 

 

 

I guess I could have asked for the code but where's the fun in that?

 

Don't think I can push it any further but I'll have another look later.  

 


1 hour ago, Preppie said:

Although it's a little heavy in the outer loops those 1 cycle savings payoff due to the number of times the inner loop runs.  Therefore change_font & change_bck point to the absolute addresses that I'm modifying here.

 

I didn't count the cycles and haven't seen your code, but with absolute addressing don't you have to loop over each byte (lda, and, ora, sta)? With indirect indexed addressing you can easily unroll the loop, saving 3 cycles per byte, which almost cancels out the 4 extra cycles that indirect indexed costs, and leaves you with a "simple" loop setup.


I was just thinking 'why the hell don't I unroll that inner loop' :)   My answer was 'because I can't adjust all those absolutes', to which my reply was 'you should have used indirect indexed from the start you moron'

 

You live and learn, I'm just an assembler noob so I've learned a lot doing that routine as it is and now I get to rewrite it - woohooo!

