
Assembly Question...



I'm continuing to play around with A8 graphics.  I'm working on a sprite plotter (as have many before) and I have the following snippet of assembly:

 

	lda	#$28
	clc
	adc	destination
	sta	destination
	bcc	*+6
	inc	destination+1

This code runs after I plot two precalc'd sprite bytes and I want to increment the destination by 40.  Then I plot the next two bytes...then increment destination by 40....  Destination is, of course, a 16 bit value.

What I'm seeing on the screen is the sprite spread out over a single line so this isn't doing what I think it's supposed to.  ?

 

 

 

Edited by damosan

2 hours ago, MaPa said:

It looks OK to me, but IMHO bcc *+6 skips one byte after inc destination+1, so you either skip a one-byte instruction after it or jump "into" the middle of an instruction, so who knows what it does.

inc destination+1 is a 3 byte instruction right?  Then wouldn't it be bcc *+3?  I saw an example online where they state to jump over a 2 byte instruction you need to do *+4...  Is that correct?

 

Learning, learning, learning...

 


9 minutes ago, damosan said:

inc destination+1 is a 3 byte instruction right?  Then wouldn't it be bcc *+3?  I saw an example online where they state to jump over a 2 byte instruction you need to do *+4...  Is that correct?

 

Why not just use a label ? Then you can check the result in the debugger if you still want to use that syntax.
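For reference, here is a sketch of how the `*+n` arithmetic works out. It assumes ca65-style semantics, where `*` is the address of the branch instruction itself, and it assumes `destination` lives in zero page (as it does at $d4 later in the thread). The absolute address `$4001` is made up purely for illustration.

```asm
	bcc	*+4		; bcc is 2 bytes, a zero-page inc is 2 bytes: 2+2 = 4
	inc	destination+1	; 2 bytes when destination+1 is a zero-page address
				; execution continues here when carry is clear

	bcc	*+5		; an absolute inc is 3 bytes, so the offset is 2+3 = 5
	inc	$4001		; hypothetical absolute address, illustration only
				; execution continues here when carry is clear
```

In both cases the offset counts from the start of the branch, so it is the size of the branch plus the size of everything being skipped. A label avoids having to know which addressing mode the assembler picked.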

 


Hi,

 

   You can use a local label, in ATASM it would be prefixed with a '?' character, so your code would be:

 

   BCC ?L1
   INC DEST+1
?L1
   RTS

 

Sometimes it's difficult to think of a meaningful label, but I think it's better to use a label than *+n, as I think it makes for more readable code. 

 

 

   


Yes, always use labels. You might reorder instructions and the instruction after the branch could be shorter or longer after that and you might forget to update *+x.

 

@xxl Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'? :) Some people on AA keep complaining about "illegal" instructions, but all computers sold by Atari can execute those instructions, unless you replaced the original CPU. They were a side effect of the 120 column PLA (programmable logic array) used on the NMOS 6502 silicon. Not being documented does not mean they are forbidden. Even Bill Mensch said so in the (Antic?) podcast. There is no such thing as an illegal instruction, paraphrased.

 

And there's the interlaced graphic modes, which are not interlaced at all....  Guess I have the same thing @Mathy has with SIO2USB and SIO2PC-over-USB :D

 


2 hours ago, ivop said:

Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'?

I think you're preaching to the converted here. XXL would probably call them 'mandatory', although I question the sense of introducing undocumented opcodes when the objective is to clarify the fundamentals of assembly language coding. :)


This is what I ended up with...certainly not the best but it works and smokes what cc65 was doing.  :)

The idea here is that I have a graphics buffer I'm writing to (destination) and a sprite that I'm reading from (source, 16 bytes).  So I basically loop 8 times writing two bytes to the graphics buffer boosting the pointers at the end of the loop.

 

Thanks for the comments above btw.

 

	.export _plot_bitmap_fast

	destination = $d4	; address of gfx buffer
	source      = $d6	; address of 16 byte array
	
.proc	_plot_bitmap_fast
	ldx	#8		; 8 rows
loop:
	ldy	#$00		; two updates - index 0 then 1
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny			; point to second byte
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	inc	source		; increment source by one
	bne	nextincrement
	inc	source+1	; and source high byte if required
nextincrement:	
	inc	source		; increment source address by one
	bne	boostpointer
	inc	source+1	; and source high byte if required
boostpointer:	
	lda	#$28		; increment destination address by 40
	clc
	adc	destination
	sta	destination
	bcc	nextloop
	inc     destination+1
nextloop:	
	dex			; modify loop counter
	bne	loop		; and write the next row of bytes
	
exit_fast_plot:	
	rts

.endproc

 


If you don't need "source" updated at the end, you could just use Y:

 

.export _plot_bitmap_fast

	destination = $d4	; address of gfx buffer
	source      = $d6	; address of 16 byte array
	
.proc	_plot_bitmap_fast
	ldy	#0

loop:
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny			; point to second byte
	lda	(source), y
	eor	(destination),y
	sta	(destination),y
	iny

	lda	destination
	clc
	adc	#40-2
	bcc	no_inc
	inc	destination+1
no_inc:	
	sta	destination

	cpy	#8*2
	bne	loop		; and write the next row of bytes
	
exit_fast_plot:	
	rts

.endproc

 

Edited by NRV

8 hours ago, NRV said:

If you don't need "source" updated at the end, you could just use Y:

Nice.  Thank you.  I  have lots to learn...

 

Your version is twice as fast as the original (10 objects over 300 screens in just under 300 jiffies vs. 597 jiffies for the first cut).

Edited by damosan

Since the order you do the memory modification doesn't matter, you can also use the classic optimization of making it a down counting loop and get rid of the cpy at the end of the loop. This will save 2 bytes & 2 cycles from the loop.

 

Load Y with #(8*2)-1, and change the INYs to DEYs (moving the last one to the end of the loop, changing the "bne loop" to "bpl loop").  Change the adc/inc to sbc/dec, along with changing the initial value of destination to point to the last byte - ((8*2)-1) instead of the first byte.
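The change described above might look roughly like this when applied to NRV's routine. This is an untested sketch, not anyone's posted code: it assumes the caller now sets `destination` to the screen address of the sprite's top-left byte plus 7*(40-2) = 266 (that is, the last byte minus 15), and the label `no_dec` is my own name.

```asm
	destination = $d4	; ASSUMED preset by caller to top-left address + 266
	source      = $d6	; address of 16 byte array

.proc	_plot_bitmap_fast
	ldy	#(8*2)-1	; start at the last sprite byte (row 7, second byte)
loop:
	lda	(source),y
	eor	(destination),y
	sta	(destination),y
	dey			; point to the first byte of the same row
	lda	(source),y
	eor	(destination),y
	sta	(destination),y

	lda	destination	; move destination up one row: subtract 40-2
	sec
	sbc	#40-2
	bcs	no_dec		; carry still set means no borrow
	dec	destination+1
no_dec:
	sta	destination

	dey			; last flag-setting instruction before the branch
	bpl	loop		; Y goes 13, 11, ... 1, then $ff ends the loop
	rts
.endproc
```

The final `dey` has to sit directly before `bpl` because `lda` and `sbc` also change the N flag. The loop performs one extra subtraction after the top row, which is harmless as long as `destination` is reloaded before the next call.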


Hi,

 

   I was thinking about the original code you posted, as I never use BCC label type code for this situation. I always write:

 

    clc
    lda #val
    adc location
    sta location
    lda #0
    adc location+1
    sta location+1

 

I thought the BCC check was a good idea, but when I wondered about why I had been writing code this way, I realised it was because I had been basing it on code that added two 16 bit numbers, not an 8 bit number and a 16 bit number. What I usually write is:

 

    clc
    lda #<constant
    adc location
    sta location
    lda #>constant
    adc location+1
    sta location+1

 

Although this is slower code than the code you posted, it does have the advantage of not needing to be changed if you want to use a #constant value higher than 255.

 

 


@damosan:

 

You can keep the low byte of the destination address in the X register. Instead of:
 

	lda	destination
	clc
	adc	#40-2
	bcc	no_inc
	inc	destination+1
no_inc:
	sta	destination

With the undocumented "SBX" opcode you can both add and subtract, so the counting direction does not affect the method:

 

        txa              ; 2 cycles instead of 3 (lda destination)
        sbx #$100-(value); 2 cycles instead of 2+2 (clc + adc)
        bcc @+
        inc dest+1
@       stx dest

 


Side question - I need a fast way to determine which bitmask to use for a graphics 8 screen.  I'm shifting the X left to get the byte in question (next move but that's easy...table driven lookup for X byte offset)...just need fast way to determine the bitmask based on pixel location.

 


40 minutes ago, damosan said:

just need fast way to determine the bitmask based on pixel location

	lda xcoord
	and #$07
	tax
	lda bitmasktab,x
	
...

bitmasktab
	.byte %10000000
	.byte %01000000
	.byte %00100000
	.byte %00010000

	...etc

I usually go with something like that. Obviously you can get the background mask with EOR #$ff, or have a separate table.
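Written out for all eight pixel positions, the lookup might look like the sketch below. The last four table entries are my assumption, continuing the pattern from the four shown above, and the `eor #$ff` line is only needed when you want the background mask.

```asm
	lda	xcoord		; low byte of the X coordinate
	and	#$07		; pixel position within the screen byte, 0-7
	tax
	lda	bitmasktab,x	; mask with only that pixel's bit set
	eor	#$ff		; optional: invert to get the background mask
	; ... rest of the plot routine goes here ...

bitmasktab:
	.byte	%10000000	; pixel 0 is the leftmost (most significant) bit
	.byte	%01000000
	.byte	%00100000
	.byte	%00010000
	.byte	%00001000	; assumed continuation of the pattern
	.byte	%00000100
	.byte	%00000010
	.byte	%00000001
```

A second table of inverted masks trades 8 bytes of memory for the 2 cycles the `eor #$ff` would cost.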

Edited by flashjazzcat

Cool.  I'm just playing with a black background at the moment so don't have any sort of background mask to worry about at the moment.

 

For a 16 bit quantity you need to do something with the high byte as well?

 

...or...I could just do another lookup table for the masks...hmmm....

 

Edit: *time passes*  Nope.  It works ANDing just the low byte, since the high byte only contributes multiples of 256 and those never change the low three bits.

Edited by damosan
But wait...there's more...

For those following at home.

 

Nothing too fancy.  It appears to work fine.

 

I'm getting about 149 pixels per jiffy with this (compared to 65 / second with cc65).  I suspect this could be made to go a bit faster if I used a lookup for the byte offset vs. a string of LSRs as well as added the gfx buffer address to the Y offsets at init time.

.proc   _plot_pixel_fast
	ldy	yy

	lda	yindexhi,y	; load Y index into argument (zp)
	sta	argument+1
	lda	yindexlo,y
	sta	argument
	
	clc
	adc	basegfx		; argument low byte is already in A
	sta	argument	; add Y offset to argument
	lda	argument+1	; add Y (high byte) to argument
	adc	basegfx+1
	sta	argument+1
	
	lda	xx		; which bitmask to use?
	and	#$07		; AND low byte with $07 to get the bit position
	tax			; transfer bitmask index to X
	
	clc
	lsr	xx+1		; divide by two...full LSR/ROR for the first shift
	ror	xx
	lda	xx
	lsr	a
	lsr	a
	tay

	lda	(argument),y	; load whatever byte is on the screen
	eor	bitmasks,x	; eor with our computed mask
	sta	(argument),y	; write new value back to the screen
	
	rts
.endproc

 

Edited by damosan
Small edit.

29 minutes ago, damosan said:

clc
lsr xx+1 ; divide by two...full LSR/ROR for the first shift
ror xx
lda xx
lsr a
lsr a
tay

It could be a bit faster:

lsr xx+1
lda xx
ror
lsr
lsr
tay

but look at the second routine in http://atariki.krap.pl/index.php/Programowanie:_Rysowanie_punktu and replace the and+ora at the end with eor bytepxl,x.

Edited by mono

You'll never approach the speed that hand tuned 6502 will give you on these machines.  That was true all the way up through the Jaguar.  I love these threads, everybody finds a way to shrink code by 2 bytes, or make it a few cycles faster.  It's a definite art.

