Assembly Question...

damosan · January 14, 2020

I'm continuing to play around with A8 graphics. I'm working on a sprite plotter (as have many before) and I have the following snippet of assembly:

	lda	#$28
	clc
	adc	destination
	sta	destination
	bcc	*+6
	inc	destination+1

This code runs after I plot two precalc'd sprite bytes and I want to increment the destination by 40. Then I plot the next two bytes...then increment destination by 40.... Destination is, of course, a 16 bit value.

What I'm seeing on the screen is the sprite spread out over a single line, so this isn't doing what I think it's supposed to.

Edited January 14, 2020 by damosan

MaPa · January 14, 2020

It looks OK to me, but IMHO bcc *+6 lands one byte past inc destination+1, so you either skip a one-byte instruction after it or jump into the middle of an instruction, and who knows what that does.

damosan · January 14, 2020

2 hours ago, MaPa said:

It looks OK to me, but IMHO bcc *+6 lands one byte past inc destination+1, so you either skip a one-byte instruction after it or jump into the middle of an instruction, and who knows what that does.

inc destination+1 is a 3 byte instruction right? Then wouldn't it be bcc *+3? I saw an example online where they state to jump over a 2 byte instruction you need to do *+4... Is that correct?

Learning, learning, learning...

rensoup · January 14, 2020

9 minutes ago, damosan said:

inc destination+1 is a 3 byte instruction right? Then wouldn't it be bcc *+3? I saw an example online where they state to jump over a 2 byte instruction you need to do *+4... Is that correct?

Why not just use a label ? Then you can check the result in the debugger if you still want to use that syntax.

shanti77 · January 14, 2020

Count from the bcc itself (the bcc is at offset 0):

	bcc *+x			; 2 bytes
	inc destination+1	; 3 bytes (if not page zero)

So you must write:

	bcc *+5
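To make the counting concrete, here is a small Python sketch (not assembler code) that works out the operand from the instruction sizes. The helper name `star_plus` is made up for illustration, and it assumes the usual convention that `*` is the address of the branch instruction itself.

```python
# Sizes in bytes of the 6502 instructions involved.
BCC_SIZE = 2          # opcode + signed 8-bit offset
INC_ABSOLUTE = 3      # opcode + 16-bit address
INC_ZERO_PAGE = 2     # opcode + 8-bit address


def star_plus(skipped_sizes, branch_size=BCC_SIZE):
    """Return n for 'bcc *+n' so the branch lands just past the skipped code.

    Assumes '*' is the address of the branch instruction itself.
    """
    return branch_size + sum(skipped_sizes)


print(star_plus([INC_ABSOLUTE]))   # 5
print(star_plus([INC_ZERO_PAGE]))  # 4
```

So `*+6` lands one byte past a 3-byte `inc`, while the `*+4` from the online example fits a 2-byte instruction such as a zero-page `inc`.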

E474 · January 14, 2020

Hi,

You can use a local label, in ATASM it would be prefixed with a '?' character, so your code would be:

	BCC ?L1
	INC DEST+1
?L1
	RTS

Sometimes it's difficult to think of a meaningful label, but I think it's better to use a label than *+n, as I think it makes for more readable code.

xxl · January 14, 2020

lax destination              ; illegal
sbx #$100-$28      ; +$28    ; illegal
stx destination
bcc @+
inc destination+1
@

This saves 2 cycles.
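For anyone who wants to convince themselves the SBX trick gives the same result as the plain add, here is a rough Python model. It assumes the commonly described behaviour of the undocumented opcodes: LAX loads A and X with the same byte, and SBX computes X = (A AND X) - immediate with the carry set when no borrow occurs. The function names are invented for this sketch.

```python
def add_with_adc(lo, value):
    """clc / adc #value on the low byte: returns (new_lo, carry_out)."""
    total = lo + value
    return total & 0xFF, total > 0xFF


def add_with_sbx(lo, value):
    """lax lo / sbx #$100-value: returns (new X, carry_out)."""
    a = x = lo                      # LAX loads A and X with the same byte
    imm = (0x100 - value) & 0xFF    # subtracting 256-value adds value mod 256
    masked = a & x
    return (masked - imm) & 0xFF, masked >= imm


# Both routes should agree for every possible low byte when adding $28.
mismatches = [lo for lo in range(256)
              if add_with_adc(lo, 0x28) != add_with_sbx(lo, 0x28)]
print(mismatches)  # []
```

The carry comes out set exactly when the low byte wraps, which is why the following `bcc` / `inc destination+1` pair still works unchanged.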

ivop · January 14, 2020

Yes, always use labels. You might reorder instructions and the instruction after the branch could be shorter or longer after that and you might forget to update *+x.

@xxl Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'? Some people on AA keep complaining about "illegal" instructions, but all computers sold by Atari can execute those instructions, unless you replaced the original CPU. They were a side effect of the 120 column PLA (programmable logic array) used on the NMOS 6502 silicon. Not being documented does not mean they are forbidden. Even Bill Mensch said so in the (Antic?) podcast. To paraphrase: there is no such thing as an illegal instruction.

And there's the interlaced graphic modes, which are not interlaced at all.... Guess I have the same thing @Mathy has with SIO2USB and SIO2PC-over-USB

flashjazzcat · January 15, 2020

2 hours ago, ivop said:

Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'?

I think you're preaching to the converted here. XXL would probably call them 'mandatory', although I question the sense of introducing undocumented opcodes when the objective is to clarify the fundamentals of assembly language coding.

damosan · January 15, 2020

This is what I ended up with...certainly not the best but it works and smokes what cc65 was doing.

The idea here is that I have a graphics buffer I'm writing to (destination) and a sprite that I'm reading from (source, 16 bytes). So I basically loop 8 times writing two bytes to the graphics buffer boosting the pointers at the end of the loop.

Thanks for the comments above btw.

	.export _plot_bitmap_fast

	destination = $d4	; address of gfx buffer
	source      = $d6	; address of 16 byte array
	
.proc	_plot_bitmap_fast
	ldx	#8		; 8 rows
loop:
	ldy	#$00		; two updates - index 0 then 1
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny			; point to second byte
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	inc	source		; increment source by one
	bne	nextincrement
	inc	source+1	; and source high byte if required
nextincrement:	
	inc	source		; increment source address by one
	bne	boostpointer
	inc	source+1	; and source high byte if required
boostpointer:	
	lda	#$28		; increment destination address by 40
	clc
	adc	destination
	sta	destination
	bcc	nextloop
	inc     destination+1
nextloop:	
	dex			; modify loop counter
	bne	loop		; and write the next row of bytes
	
exit_fast_plot:	
	rts

.endproc

NRV · January 15, 2020

If you don't need "source" updated at the end, you could just use Y:

.export _plot_bitmap_fast

	destination = $d4	; address of gfx buffer
	source      = $d6	; address of 16 byte array
	
.proc	_plot_bitmap_fast
	ldy	#0

loop:
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny			; point to second byte
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny

	lda	destination
	clc
	adc	#40-2
	bcc	no_inc
	inc	destination+1
no_inc:	
	sta	destination

	cpy	#8*2
	bne	loop		; and write the next row of bytes
	
exit_fast_plot:	
	rts

.endproc
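A quick way to see why adding 40-2 is enough once Y keeps counting: the Python sketch below lists the (screen address, sprite byte index) pairs each loop touches. It is only a model of the addressing, the start address used to try it out is arbitrary, and the function names are made up.

```python
ROW_STRIDE = 40   # bytes per screen row, as in the thread
ROWS = 8


def writes_resetting_y(dest):
    """First version: Y restarts at 0 each row, both pointers advance."""
    out, src = [], 0
    for _ in range(ROWS):
        for y in (0, 1):
            out.append((dest + y, src + y))
        src += 2
        dest += ROW_STRIDE
    return out


def writes_running_y(dest):
    """Second version: Y counts 0..15, so dest only advances by 40-2."""
    out, y = [], 0
    while y != ROWS * 2:
        out.append((dest + y, y))
        y += 1
        out.append((dest + y, y))
        y += 1
        dest += ROW_STRIDE - 2
    return out


print(writes_resetting_y(0x4000) == writes_running_y(0x4000))  # True
```

Because Y has already moved on by 2 at the end of each row, the pointer only has to make up the remaining 38 bytes.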

Edited January 15, 2020 by NRV

damosan · January 15, 2020

8 hours ago, NRV said:

If you don't need "source" updated at the end, you could just use Y:

Nice. Thank you. I have lots to learn...

Your version is twice as fast as the original (10 objects over 300 screens in just under 300 jiffies vs. 597 jiffies for the first cut).

Edited January 15, 2020 by damosan

RevEng · January 15, 2020

The cpy will clear the carry flag for any values that will continue your loop, so you can ditch the clc within the loop. You'll need a clc prior to the loop, though.
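The carry rule behind this is simple enough to model in a couple of lines of Python: after a compare, carry is set when the register is greater than or equal to the operand (unsigned). The helper name is made up for the sketch.

```python
def cpy_carry(y, operand):
    """Carry flag after 'cpy #operand': set when Y >= operand (unsigned)."""
    return (y & 0xFF) >= (operand & 0xFF)


# Y is 2, 4, ... 14 at the cpy on every pass that branches back to the loop.
looping = [cpy_carry(y, 8 * 2) for y in range(2, 16, 2)]
print(any(looping))          # False: carry is clear whenever the loop continues
print(cpy_carry(16, 8 * 2))  # True: carry is set only on the final pass
```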

StickJock · January 15, 2020

Since the order you do the memory modification doesn't matter, you can also use the classic optimization of making it a down counting loop and get rid of the cpy at the end of the loop. This will save 2 bytes & 2 cycles from the loop.

Load Y with #(8*2)-1, and change the INYs to DEYs (moving the last one to the end of the loop, changing the "bne loop" to "bpl loop"). Change the adc/inc to sbc/dec, along with changing the initial value of destination to point to the last byte - ((8*2)-1) instead of the first byte.
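Here is a hedged Python model of that down-counting variant, just to check the pointer arithmetic: it compares the (screen address, sprite byte index) pairs against the straightforward row-by-row order. The start address is arbitrary and the function names are invented.

```python
ROW_STRIDE = 40
ROWS = 8
LAST_Y = ROWS * 2 - 1          # 15


def writes_counting_down(first_byte):
    """Y runs 15..0 and the pointer steps back by 40-2 after each row."""
    last_byte = first_byte + (ROWS - 1) * ROW_STRIDE + 1
    dest = last_byte - LAST_Y  # adjusted initial pointer value
    y, out = LAST_Y, []
    while y >= 0:              # stands in for 'bpl loop'
        out.append((dest + y, y))
        y -= 1
        out.append((dest + y, y))
        dest -= ROW_STRIDE - 2
        y -= 1                 # the dey moved to the end of the loop
    return out


def writes_counting_up(first_byte):
    """Reference order: row by row, two bytes per row."""
    return [(first_byte + row * ROW_STRIDE + col, row * 2 + col)
            for row in range(ROWS) for col in (0, 1)]


same = sorted(writes_counting_down(0x4000)) == sorted(writes_counting_up(0x4000))
print(same)  # True
```

The writes happen in reverse order, but since each byte is an independent EOR into the screen, the end result is the same.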

damosan · January 16, 2020

As an aside, I find that as a long-time C programmer I tend to introduce C-isms into code.

Thanks all.

E474 · January 16, 2020

Hi,

I was thinking about the original code you posted, as I never use BCC label type code for this situation. I always write:

	clc
	lda #val
	adc location
	sta location
	lda #0
	adc location+1
	sta location+1

I thought the BCC check was a good idea, but when I wondered about why I had been writing code this way, I realised it was because I had been basing it on code that added two 16 bit numbers, not an 8 bit number and a 16 bit number. What I usually write is:

	clc
	lda #<constant
	adc location
	sta location
	lda #>constant
	adc location+1
	sta location+1

Although this is slower code than the code you posted, it does have the advantage of not needing to be changed if you want to use a #constant value higher than 255.
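To illustrate the trade-off, here is a small Python model of both approaches. It only mimics binary-mode ADC on bytes; the function names and the sample address $9CF0 are made up for the example.

```python
def adc(a, operand, carry):
    """8-bit add with carry, as the 6502 ADC does in binary mode."""
    total = a + operand + carry
    return total & 0xFF, total >> 8   # (result byte, carry out)


def add16_constant(lo, hi, constant):
    """clc / lda #<c / adc lo / sta lo / lda #>c / adc hi / sta hi."""
    lo, carry = adc(constant & 0xFF, lo, 0)    # clc, then the low bytes
    hi, _ = adc(constant >> 8, hi, carry)      # high bytes plus the carry
    return lo, hi


def add8_with_branch(lo, hi, value):
    """lda #v / clc / adc lo / sta lo / bcc skip / inc hi."""
    lo, carry = adc(value, lo, 0)
    if carry:
        hi = (hi + 1) & 0xFF
    return lo, hi


print(add16_constant(0xF0, 0x9C, 40))    # (24, 157), i.e. $9CF0 + 40 = $9D18
print(add16_constant(0xF0, 0x9C, 320))   # (48, 158), i.e. $9CF0 + 320 = $9E30
```

For constants up to 255 the two routines agree; only the full 16-bit form keeps working once the constant grows past one byte.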

xxl · January 16, 2020

@damosan:

you can use the X register to store the .LO dest address:

	lda	destination
	clc
	adc	#40-2
	bcc	no_inc
	inc	destination+1
no_inc:
	sta	destination

with "SBX" you can both add and subtract, so the piece with the counting direction does not affect the method

        txa              ; 2 instead of 3
        sbx #$100-(value); 2 instead of 2+2
        bcc @+
        inc dest+1
@       stx dest

damosan · January 16, 2020

Side question: I need a fast way to determine which bitmask to use on a Graphics 8 screen. I'm shifting X right to get the byte in question (my next move there is a table-driven lookup for the X byte offset, but that's easy); I just need a fast way to determine the bitmask based on pixel location.

flashjazzcat · January 16, 2020

40 minutes ago, damosan said:

just need fast way to determine the bitmask based on pixel location

	lda xcoord
	and #$07
	tax
	lda bitmasktab,x
	
...

bitmasktab
	.byte %10000000
	.byte %01000000
	.byte %00100000
	.byte %00010000

	...etc

I usually go with something like that. Obviously you can get the background mask with EOR #$ff, or have a separate table.
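As a rough illustration in Python of what the lookup does: the sketch assumes the table simply continues the visible pattern, one bit per entry from bit 7 down to bit 0. The list and function names are made up to mirror the assembly.

```python
# Assumption: the table continues the visible pattern, one bit per entry,
# moving right from bit 7 (leftmost pixel in the byte) to bit 0.
BITMASKTAB = [0x80 >> i for i in range(8)]


def pixel_mask(xcoord):
    """lda xcoord / and #$07 / tax / lda bitmasktab,x"""
    return BITMASKTAB[xcoord & 0x07]


def background_mask(xcoord):
    """The same mask after EOR #$ff: every bit except the pixel's."""
    return pixel_mask(xcoord) ^ 0xFF


print([f"{m:08b}" for m in BITMASKTAB[:4]])
# ['10000000', '01000000', '00100000', '00010000']
print(f"{background_mask(1):08b}")  # 10111111
```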

Edited January 16, 2020 by flashjazzcat

damosan · January 16, 2020

Cool. I'm just playing with a black background at the moment, so I don't have any sort of background mask to worry about.

For a 16 bit quantity you need to do something with the high byte as well?

...or...I could just do another lookup table for the masks...hmmm....

Edit: *time passes* Nope. It works ANDing just the low byte.
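The reason the low byte is enough: 256 is a multiple of 8, so the high byte never changes the bottom three bits. A quick Python check over the 320 pixel columns of a Graphics 8 line (a sanity check only, not part of the routine):

```python
# X needs 9 bits on a 320-pixel-wide screen, but the bit position within
# a byte depends only on X modulo 8, which the low byte alone determines.
same = all((x & 0x07) == ((x & 0xFF) & 0x07) for x in range(320))
print(same)  # True
```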

Edited January 17, 2020 by damosan
But wait...there's more...

damosan · January 17, 2020

For those following at home.

Nothing too fancy. It appears to work fine.

I'm getting about 149 pixels per jiffy with this (compared to 65 / second with cc65). I suspect this could be made to go a bit faster if I used a lookup for the byte offset vs. a string of LSRs as well as added the gfx buffer address to the Y offsets at init time.

.proc   _plot_pixel_fast
	ldy	yy

	lda	yindexhi,y	; load Y index into argument (zp)
	sta	argument+1
	lda	yindexlo,y
	sta	argument
	
	clc
	adc	basegfx		; argument low byte is already in A
	sta	argument	; add Y offset to argument
	lda	argument+1	; add Y (high byte) to argument
	adc	basegfx+1
	sta	argument+1
	
	lda	xx		; which bitmask to use?
	and	#$07		; and low byte with $07
	tax			; transfer bitmask offset to X
	
	clc
	lsr	xx+1		; divide by two...full LSR/ROR for the first shift
	ror	xx
	lda	xx
	lsr	a
	lsr	a
	tay

	lda	(argument),y	; load whatever byte is on the screen
	eor	bitmasks,x	; eor with our computed mask
	sta	(argument),y	; write new value back to the screen
	
	rts
.endproc
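For anyone following along without an Atari handy, here is a Python sketch of the same steps. Several things are assumptions: the screen address `BASEGFX` is hypothetical, the row tables are assumed to hold y*40 split into low and high bytes, and the mask table is assumed to run from bit 7 down to bit 0.

```python
BYTES_PER_ROW = 40      # Graphics 8: 320 pixels / 8 pixels per byte
ROWS = 192
BASEGFX = 0x2000        # hypothetical screen address ('basegfx' in the routine)

# Row offset table, split into low/high bytes like yindexlo / yindexhi.
YINDEXLO = [(row * BYTES_PER_ROW) & 0xFF for row in range(ROWS)]
YINDEXHI = [(row * BYTES_PER_ROW) >> 8 for row in range(ROWS)]
BITMASKS = [0x80 >> bit for bit in range(8)]


def plot_pixel(memory, xx, yy):
    """XOR one pixel into 'memory', following the steps of _plot_pixel_fast."""
    argument = BASEGFX + ((YINDEXHI[yy] << 8) | YINDEXLO[yy])  # row start
    mask_index = xx & 0x07        # which bit within the byte
    byte_offset = xx >> 3         # what the lsr/ror/lsr/lsr sequence computes
    memory[argument + byte_offset] ^= BITMASKS[mask_index]


memory = bytearray(0x10000)
plot_pixel(memory, 9, 2)
print(hex(memory[BASEGFX + 2 * 40 + 1]))   # 0x40
```

Because the plot is an EOR, calling it twice with the same coordinates puts the byte back the way it was.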

Edited January 17, 2020 by damosan
Small edit.

mono · January 17, 2020

29 minutes ago, damosan said:

	clc
	lsr xx+1 ; divide by two...full LSR/ROR for the first shift
	ror xx
	lda xx
	lsr a
	lsr a
	tay

It could be a bit faster:

lsr xx+1
lda xx
ror
lsr
lsr
tay

but look at the second routine in http://atariki.krap.pl/index.php/Programowanie:_Rysowanie_punktu and replace the and+ora with eor bytepxl,x at the end.
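A small Python model, for what it's worth, showing that both shift sequences give X divided by 8 for every column of a 320-pixel line. It only models the bits and the carry; the function names are made up.

```python
def byte_offset_original(xx):
    """lsr xx+1 / ror xx / lda xx / lsr a / lsr a"""
    hi, lo = xx >> 8, xx & 0xFF
    carry = hi & 1                    # lsr xx+1 pushes bit 8 into carry
    lo = (carry << 7) | (lo >> 1)     # ror xx pulls it into bit 7 (in memory)
    return lo >> 2                    # lsr a, lsr a


def byte_offset_faster(xx):
    """lsr xx+1 / lda xx / ror / lsr / lsr"""
    hi, lo = xx >> 8, xx & 0xFF
    carry = hi & 1                    # lsr xx+1 leaves bit 8 in carry
    a = (carry << 7) | (lo >> 1)      # ror on the accumulator, not on memory
    return a >> 2                     # lsr, lsr


agree = all(byte_offset_original(x) == byte_offset_faster(x) == x >> 3
            for x in range(320))
print(agree)  # True
```

The saving comes from rotating the accumulator instead of the memory location; `lda` does not touch the carry, so bit 8 survives until the `ror`.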

Edited January 17, 2020 by mono

damosan · January 17, 2020

Yeah that code creates tables for everything (which is good). I think my next cut will do that ... this weekend. I suspect I'll see quite a speed increase.

Thanks for the link.

+Stephen · January 17, 2020

You'll never approach the speed that hand-tuned 6502 will give you on these machines. That was true all the way up through the Jaguar. I love these threads: everybody finds a way to shrink code by 2 bytes, or make it a few cycles faster. It's a definite art.
