damosan Posted January 14, 2020

I'm continuing to play around with A8 graphics. I'm working on a sprite plotter (as have many before) and I have the following snippet of assembly:

        lda #$28
        clc
        adc destination
        sta destination
        bcc *+6
        inc destination+1

This code runs after I plot two precalc'd sprite bytes and I want to increment the destination by 40. Then I plot the next two bytes...then increment destination by 40.... Destination is, of course, a 16 bit value. What I'm seeing on the screen is the sprite spread out over a single line so this isn't doing what I think it's supposed to.
MaPa Posted January 14, 2020

It looks OK to me, but IMHO bcc *+6 lands one byte past inc destination+1, so you either skip a one-byte instruction after it or jump "into" the middle of an instruction, so who knows what it does.
damosan Posted January 14, 2020

> 2 hours ago, MaPa said:
> It looks OK to me, but IMHO bcc *+6 skips one byte after inc destination+1, so you skip some one byte instruction after it or jump "into" instruction so who know what it does.

inc destination+1 is a 3 byte instruction, right? Then wouldn't it be bcc *+3? I saw an example online where they state that to jump over a 2 byte instruction you need to do *+4... Is that correct? Learning, learning, learning...
rensoup Posted January 14, 2020

> 9 minutes ago, damosan said:
> inc destination+1 is a 3 byte instruction right? Then wouldn't it be bcc *+3? I saw an example online where they state to jump over a 2 byte instruction you need to do *+4... Is that correct?

Why not just use a label? Then you can check the result in the debugger if you still want to use that syntax.
shanti77 Posted January 14, 2020

        bcc=0
        bcc *+x             ;2 bytes
        inc destination+1   ;3 bytes (if not page zero)

So you must write:

        bcc *+5
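The byte counting above can be checked with a small Python sketch. It is only an illustration: the function name is made up, and it assumes the assembler evaluates `*` as the address of the branch instruction itself, which is what the byte counts in this post imply.

```python
def star_plus_operand(skipped_sizes, branch_size=2):
    """Return n for `bcc *+n` so the branch lands just past the skipped code.

    `*` is assumed to be the address of the branch itself, so n is the
    branch's own size plus the size of every instruction jumped over.
    """
    return branch_size + sum(skipped_sizes)


# inc with an absolute operand is 3 bytes; with a zero-page operand it is 2.
print(star_plus_operand([3]))  # absolute inc
print(star_plus_operand([2]))  # zero-page inc
```

If destination lives in page zero and the assembler picks the 2-byte zero-page form of inc, the operand shrinks by one, which is one more reason to prefer a label.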
E474 Posted January 14, 2020

Hi,

You can use a local label. In ATASM it would be prefixed with a '?' character, so your code would be:

        BCC ?L1
        INC DEST+1
?L1     RTS

Sometimes it's difficult to think of a meaningful label, but I think it's better to use a label than *+n, as I think it makes for more readable code.
xxl Posted January 14, 2020

        lax destination     ; illegal
        sbx #$100-$28       ; +$28 ; illegal
        stx destination
        bcc @+
        inc destination+1
@

save 2 cycles ?
ivop Posted January 14, 2020

Yes, always use labels. You might reorder instructions, the instruction after the branch could end up shorter or longer, and you might forget to update *+x.

@xxl Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'? Some people on AA keep complaining about "illegal" instructions, but all computers sold by Atari can execute those instructions, unless you replaced the original CPU. They were a side effect of the 120 column PLA (programmable logic array) used on the NMOS 6502 silicon. Not being documented does not mean they are forbidden. Even Bill Mensch said so in the (Antic?) podcast. There is no such thing as an illegal instruction, paraphrased.

And there's the interlaced graphic modes, which are not interlaced at all.... Guess I have the same thing @Mathy has with SIO2USB and SIO2PC-over-USB
flashjazzcat Posted January 15, 2020

> 2 hours ago, ivop said:
> Could I persuade you to not use the term 'illegal' anymore and call them 'undocumented'?

I think you're preaching to the converted here. XXL would probably call them 'mandatory', although I question the sense of introducing undocumented opcodes when the objective is to clarify the fundamentals of assembly language coding.
damosan Posted January 15, 2020

This is what I ended up with...certainly not the best but it works and smokes what cc65 was doing. The idea here is that I have a graphics buffer I'm writing to (destination) and a sprite that I'm reading from (source, 16 bytes). So I basically loop 8 times writing two bytes to the graphics buffer, boosting the pointers at the end of the loop. Thanks for the comments above btw.

        .export _plot_bitmap_fast

destination = $d4               ; address of gfx buffer
source      = $d6               ; address of 16 byte array

.proc _plot_bitmap_fast
        ldx #8                  ; 8 rows
loop:
        ldy #$00                ; two updates - index 0 then 1
        lda (source),y
        eor (destination),y
        sta (destination),y
        iny                     ; point to second byte
        lda (source),y
        eor (destination),y
        sta (destination),y
        inc source              ; increment source by one
        bne nextincrement
        inc source+1            ; and source high byte if required
nextincrement:
        inc source              ; increment source address by one
        bne boostpointer
        inc source+1            ; and source high byte if required
boostpointer:
        lda #$28                ; increment destination address by 40
        clc
        adc destination
        sta destination
        bcc nextloop
        inc destination+1
nextloop:
        dex                     ; modify loop counter
        bne loop                ; and write the next row of bytes
exit_fast_plot:
        rts
.endproc
NRV Posted January 15, 2020

If you don't need "source" updated at the end, you could just use Y:

        .export _plot_bitmap_fast

destination = $d4               ; address of gfx buffer
source      = $d6               ; address of 16 byte array

.proc _plot_bitmap_fast
        ldy #0
loop:
        lda (source),y
        eor (destination),y
        sta (destination),y
        iny                     ; point to second byte
        lda (source),y
        eor (destination),y
        sta (destination),y
        iny
        lda destination
        clc
        adc #40-2
        bcc no_inc
        inc destination+1
no_inc:
        sta destination
        cpy #8*2
        bne loop                ; and write the next row of bytes
exit_fast_plot:
        rts
.endproc
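Why the routine adds 40-2 rather than 40 is easy to miss: Y keeps climbing by two per row, so the pointer only has to make up the remaining 38 bytes. The Python model below is a rough sketch of that addressing, not part of the thread; the constant and function names are invented for illustration.

```python
ROW_BYTES = 40       # bytes per screen row, as stated in the thread
SPRITE_ROWS = 8      # rows in the sprite
BYTES_PER_ROW = 2    # sprite bytes plotted per row


def touched_offsets(base=0):
    """Screen addresses written by the Y-indexed loop, in plotting order."""
    destination = base
    y = 0
    written = []
    while True:
        written.append(destination + y)   # first sta (destination),y
        y += 1                            # iny
        written.append(destination + y)   # second sta (destination),y
        y += 1                            # iny
        destination += ROW_BYTES - BYTES_PER_ROW   # adc #40-2
        if y == SPRITE_ROWS * BYTES_PER_ROW:       # cpy #8*2 / bne loop
            break
    return written


print(touched_offsets())
```

Each pass moves the effective address (destination + Y) forward by 38 + 2 = 40 bytes, so every pair of writes lands at the start of the next screen row.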
damosan Posted January 15, 2020

> 8 hours ago, NRV said:
> If you don't need "source" updated at the end, you could just use Y:

Nice. Thank you. I have lots to learn... Your version is twice as fast as the original (10 objects over 300 screens in just under 300 jiffies vs. 597 jiffies for the first cut).
RevEng Posted January 15, 2020

The cpy will clear the carry flag for any values that will continue your loop, so you can ditch the clc within the loop. You'll need a clc prior to the loop, though.
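The carry behaviour described here follows from how compare instructions set the flag: carry ends up set when the register is greater than or equal to the operand (unsigned) and clear otherwise. A tiny Python model, with a made-up function name, shows what `cpy #8*2` leaves behind on each pass of NRV's loop:

```python
def cpy_carry(y, operand):
    """Carry flag after CPY #operand: 1 when Y >= operand (unsigned), else 0."""
    return 1 if (y & 0xFF) >= (operand & 0xFF) else 0


# Y values seen by `cpy #8*2` at the bottom of each pass: 2, 4, ..., 16
carries = {y: cpy_carry(y, 16) for y in range(2, 17, 2)}
print(carries)
```

Carry is clear on every pass that branches back to the top, and the instructions between the loop top and the adc (lda, eor, sta, iny) do not touch it, so the adc always starts from a clear carry even if the previous adc overflowed.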
StickJock Posted January 15, 2020

Since the order you do the memory modification doesn't matter, you can also use the classic optimization of making it a down counting loop and get rid of the cpy at the end of the loop. This will save 2 bytes & 2 cycles from the loop. Load Y with #(8*2)-1, and change the INYs to DEYs (moving the last one to the end of the loop, changing the "bne loop" to "bpl loop"). Change the adc/inc to sbc/dec, along with changing the initial value of destination to point to the last byte - ((8*2)-1) instead of the first byte.
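A rough Python model of the down-counting variant, written only to check the starting pointer arithmetic described above. It is a sketch with invented names, not tested 6502 code, and it assumes the same 8-row, 2-byte, 40-bytes-per-row layout used earlier in the thread.

```python
def touched_offsets_down(base=0, rows=8, width=2, row_bytes=40):
    """Screen addresses written by a loop that counts Y down from 15 to 0."""
    last_byte = base + (rows - 1) * row_bytes + (width - 1)
    destination = last_byte - (rows * width - 1)   # "last byte - ((8*2)-1)"
    y = rows * width - 1                           # ldy #(8*2)-1
    written = []
    while y >= 0:                                  # bpl loop
        written.append(destination + y)            # second byte of the row
        y -= 1                                     # dey
        written.append(destination + y)            # first byte of the row
        destination -= row_bytes - width           # sbc #40-2
        y -= 1                                     # dey, moved to the loop end
    return written


print(touched_offsets_down())
```

The loop visits the same 16 addresses as the up-counting version, just from the bottom row upward, and it exits when Y wraps below zero, which is what lets bpl replace the cpy/bne pair.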
damosan Posted January 16, 2020

As an aside, I find that as a long time C programmer I tend to introduce C-isms into code. Thanks all.
E474 Posted January 16, 2020

Hi,

I was thinking about the original code you posted, as I never use BCC label type code for this situation. I always write:

        clc
        lda #val
        adc location
        sta location
        lda #0
        adc location+1
        sta location+1

I thought the BCC check was a good idea, but when I wondered about why I had been writing code this way, I realised it was because I had been basing it on code that added two 16 bit numbers, not an 8 bit number and a 16 bit number. What I usually write is:

        clc
        lda #<constant
        adc location
        sta location
        lda #>constant
        adc location+1
        sta location+1

Although this is slower code than the code you posted, it does have the advantage of not needing to be changed if you want to use a #constant value higher than 255.
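The two styles can be compared with a short Python model of the byte arithmetic. This is a sketch for illustration only; both function names are invented, and each function mirrors one of the instruction sequences discussed in the thread.

```python
def add8_to_16_branch(lo, hi, value):
    """adc on the low byte, then bcc/inc on the high byte."""
    total = lo + value
    lo = total & 0xFF
    if total > 0xFF:             # carry set, so the bcc is not taken
        hi = (hi + 1) & 0xFF     # inc location+1
    return lo, hi


def add16(lo, hi, constant):
    """adc #<constant on the low byte, adc #>constant on the high byte."""
    total = lo + (constant & 0xFF)
    carry = total >> 8           # carry out of the low-byte add
    lo = total & 0xFF
    hi = (hi + (constant >> 8) + carry) & 0xFF
    return lo, hi


print(add8_to_16_branch(0xF0, 0x9C, 40))
print(add16(0xF0, 0x9C, 40))
print(add16(0x00, 0x10, 320))
```

For any constant below 256 the two produce the same result; only the second form keeps working when the constant grows past one byte, such as a 320-byte stride.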
xxl Posted January 16, 2020

@damosan: you can use the X register to hold the low byte of the dest address. Instead of:

        lda destination
        clc
        adc #40-2
        bcc no_inc
        inc destination+1
no_inc:
        sta destination

With "SBX" you can both add and subtract, so the counting direction does not affect the method:

        txa                 ; 2 instead of 3
        sbx #$100-(value)   ; 2 instead of 2+2
        bcc @+
        inc dest+1
@       stx dest
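For readers who have not met SBX before, here is a hedged Python model of the trick. It assumes the commonly described behaviour of the undocumented opcode on NMOS parts: X becomes (A AND X) minus the immediate, with carry set when no borrow occurs, as with a compare. The function names are invented for this sketch.

```python
def sbx(a, x, imm):
    """Model of undocumented SBX #imm: X = (A & X) - imm, carry = no borrow."""
    value = a & x & 0xFF
    imm &= 0xFF
    carry = 1 if value >= imm else 0
    return (value - imm) & 0xFF, carry


def add_via_sbx(lo, hi, value):
    """Add an 8-bit value (1..255) to a 16-bit pointer held as lo/hi bytes."""
    # txa makes A equal to X, so A & X is simply the low byte
    new_lo, carry = sbx(lo, lo, 0x100 - value)
    if carry:                    # bcc @+ is not taken
        hi = (hi + 1) & 0xFF     # inc dest+1
    return new_lo, hi


print(add_via_sbx(0xF0, 0x9C, 40))
```

Subtracting $100-value is the same as adding value modulo 256, and the "no borrow" carry happens exactly when the low byte wraps, so the bcc/inc pair still works unchanged.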
damosan Posted January 16, 2020

Side question - I need a fast way to determine which bitmask to use for a graphics 8 screen. I'm shifting the X coordinate right to get the byte in question (next move, but that's easy...table driven lookup for X byte offset)...just need a fast way to determine the bitmask based on pixel location.
flashjazzcat Posted January 16, 2020

> 40 minutes ago, damosan said:
> just need fast way to determine the bitmask based on pixel location

        lda xcoord
        and #$07
        tax
        lda bitmasktab,x
        ...

bitmasktab
        .byte %10000000
        .byte %01000000
        .byte %00100000
        .byte %00010000
        ...etc

I usually go with something like that. Obviously you can get the background mask with EOR #$ff, or have a separate table.
damosan Posted January 16, 2020

Cool. I'm just playing with a black background at the moment so don't have any sort of background mask to worry about at the moment. For a 16 bit quantity you need to do something with the high byte as well? ...or...I could just do another lookup table for the masks...hmmm....

Edit: *time passes* Nope. It works ANDing just the low byte.
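A short Python sketch of the mask lookup, including a check of why the high byte of a 16-bit X coordinate can be ignored. The names are made up for illustration; the table contents follow the .byte pattern flashjazzcat posted, continued down to bit 0.

```python
# One mask per pixel position within a byte, leftmost pixel first.
BITMASKS = [0x80 >> bit for bit in range(8)]


def pixel_mask(x):
    """Mask for pixel x on a 1-bit-per-pixel row (and #$07 / table lookup)."""
    return BITMASKS[x & 0x07]


def pixel_mask_low_byte_only(x):
    """Same lookup using only the low byte of a 16-bit x coordinate."""
    return BITMASKS[(x & 0xFF) & 0x07]


print([hex(m) for m in BITMASKS])
print(hex(pixel_mask(319)))
```

Bits 0 to 2 of the coordinate all live in the low byte, so masking the low byte with $07 gives the same index no matter what the high byte holds.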
damosan Posted January 17, 2020

For those following at home. Nothing too fancy. It appears to work fine. I'm getting about 149 pixels per jiffy with this (compared to 65 / second with cc65). I suspect this could be made to go a bit faster if I used a lookup for the byte offset vs. a string of LSRs, as well as added the gfx buffer address to the Y offsets at init time.

.proc _plot_pixel_fast
        ldy yy
        lda yindexhi,y          ; load Y index into argument (zp)
        sta argument+1
        lda yindexlo,y
        sta argument
        clc
        adc basegfx             ; argument low byte is already in A
        sta argument            ; add Y offset to argument
        lda argument+1          ; add Y (high byte) to argument
        adc basegfx+1
        sta argument+1
        lda xx                  ; which bitmask to use?
        and #$07                ; AND low byte with $07
        tax                     ; transfer bitmask offset to X
        clc
        lsr xx+1                ; divide by two...full LSR/ROR for the first shift
        ror xx
        lda xx
        lsr a
        lsr a
        tay
        lda (argument),y        ; load whatever byte is on the screen
        eor bitmasks,x          ; eor with our computed mask
        sta (argument),y        ; write new value back to the screen
        rts
.endproc
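The init-time idea mentioned above, folding the buffer address into the row tables so the plot routine no longer adds basegfx on every call, might look like this as a Python sketch. It assumes a linear 40-bytes-per-row buffer with 192 rows; the function name and the example base address $8000 are hypothetical.

```python
def build_row_tables(base, rows=192, row_bytes=40):
    """Return (lo, hi) byte tables holding the start address of every row."""
    addresses = [base + row * row_bytes for row in range(rows)]
    lo = [addr & 0xFF for addr in addresses]
    hi = [(addr >> 8) & 0xFF for addr in addresses]
    return lo, hi


# $8000 is an arbitrary example base, not an address from the thread.
yindexlo, yindexhi = build_row_tables(0x8000)
print(hex(yindexhi[1]), hex(yindexlo[1]))
```

With tables built this way, the routine would only need to copy two table bytes into the zero-page pointer, and the clc/adc basegfx sequence could go away.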
mono Posted January 17, 2020

> 29 minutes ago, damosan said:
>         clc
>         lsr xx+1    ; divide by two...full LSR/ROR for the first shift
>         ror xx
>         lda xx
>         lsr a
>         lsr a
>         tay

It could be a bit faster:

        lsr xx+1
        lda xx
        ror
        lsr
        lsr
        tay

but look at the second routine in http://atariki.krap.pl/index.php/Programowanie:_Rysowanie_punktu and replace the and+ora with eor bytepxl,x at the end.
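A quick Python model of the shorter shift sequence, to confirm that it still divides a 9-bit X coordinate by eight. The function name is invented and the model simply follows the carry through each instruction.

```python
def byte_offset_shift(x):
    """Model lsr xx+1 / lda xx / ror / lsr / lsr for x in 0..511."""
    hi, lo = (x >> 8) & 0xFF, x & 0xFF
    carry = hi & 1                 # lsr xx+1: bit 0 of the high byte -> carry
    a = lo                         # lda xx (does not touch carry)
    a = (carry << 7) | (a >> 1)    # ror: carry enters bit 7
    a >>= 1                        # lsr
    a >>= 1                        # lsr
    return a


print(byte_offset_shift(319))
```

The rotate happens in the accumulator instead of in memory, so the low byte of xx is no longer modified by the routine, and one read-modify-write on memory is saved.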
damosan Posted January 17, 2020

Yeah that code creates tables for everything (which is good). I think my next cut will do that ... this weekend. I suspect I'll see quite a speed increase. Thanks for the link.
+Stephen Posted January 17, 2020

You'll never approach the speed that hand tuned 6502 will give you on these machines. That was true all the way up through the Jaguar. I love these threads, everybody finds a way to shrink code by 2 bytes, or make it a few cycles faster. It's a definite art.