ScumSoft, posted March 2, 2011 (edited)

Hello again everyone, I've been busy since last week programming up my game's framework (see blog for details). It's great to be back here and to start coding again.

I have come up with a neat interlaced routine (not sure if something similar has been posted), and I need to flip a single bit on and off every frame, which sets the interlace routine to even/odd frames.

So what happens is that during the kernal I manage it all like this:

            ldy Scanlines   ;[]+3    ;Our Scanline count for our interlaced loops
                                     ;-Loaded from VBLANK to save precious playfield cycles
    PF_LOOP:
            tya             ;[3]+2   ;Interlace frame check
            eor #1          ;[5]+2   ;Toggle bit, this switches between Draw & Logic scanlines
            and #$01        ;[7]+2   ;Mask all but bit 0
            beq PF_LOGIC    ;[9]+2/3 ;Branch to Logic scanline
    PF_DRAW:
    ;********************************
    ; [DRAW SCANLINE]
    ; * 96 Scanlines of visible graphics
    ; * Scanline [??], S.cyc [11]
    ; * PixelPos [-35], Color clock [33]
    ;********************************
    ;***DO STUFF like draw graphics
            jmp PF_Return   ;[]+3    ;Jump out of Draw
    PF_LOGIC:
    ;********************************
    ; [LOGIC]
    ; * 96 scanlines for logic
    ; * Scanline [39 to ??], S.cyc [12]
    ; * PixelPos [-32], Color clock [36]
    ;********************************
    ;***DO STUFF LIKE
    ;***Blank out graphics so they are not visible on this scanline
    ;***Process updates etc
    ;-Fall through to PF_Return
    PF_Return:
            dey             ;Decrement scanline
            sta WSYNC       ;-
            bne PF_LOOP     ;If more scanlines left, loop
                            ;All done? Fall through to overscan

And in the Overscan I do this:

    ;***Interlaced display settings***
            lda Interlace   ;This will interlace the display
            eor #1          ;-toggle bit between 0 and 1
            sta Interlace   ;-store value for next pass
            bne .ODD        ;-if it's a 1 then setup for ODD frames
            lda #191        ;Set to 192 scanlines total (191 to 0)
            sta Scanlines   ;-used during PF_LOOP
            lda #33         ;Compensate 1 more scanline for interlace
            sta TIM64T      ;Will total 262 scanlines
            jmp .OS_LOGIC   ;Proceed to Overscan logic
    .ODD:
            lda #192        ;We want 193 scanlines (192 to 0)
            sta Scanlines   ;-store value for next pass
            lda #34         ;Compensate 1 less scanline for interlace
            sta TIM64T      ;Will total 262 scanlines
                            ;-proceed to Overscan logic

Is this a typical way of doing an interlaced kernal? Every frame alternates, starting at Logic->Draw, then Draw->Logic, and so forth. I find this gives minimal flicker and allows an entire frame's worth of time for logic and drawing collectively.

What I would like to know is whether someone has invented a better way to do interlaced kernals. This seems to work fine right now for my game, but I am very curious as to what others have done before me.

Okay, sleep time, cya all tomorrow.

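One possible tightening of the parity test at the top of PF_LOOP, offered only as a sketch and not as what ScumSoft ended up using: shifting bit 0 of the scanline counter into the carry gives the same even/odd decision as the eor #1 / and #$01 / beq sequence and should save two cycles per scanline. The variable address, the WSYNC equate and the surrounding Kernel/rts framing are assumptions added so the fragment assembles on its own.

```asm
        processor 6502

WSYNC   = $02           ; TIA wait-for-horizontal-sync strobe

        seg.u Variables
        org $80
Scanlines ds 1          ; same name as in the post; the address is an assumption

        seg Code
        org $F000
Kernel:
        ldy Scanlines   ; 3   scanline counter, set up outside the loop
PF_LOOP:
        tya             ; 2   copy the counter so Y survives
        lsr             ; 2   bit 0 of the counter -> carry
        bcs PF_LOGIC    ; 2/3 odd count: logic scanline, same polarity as the original test
PF_DRAW:
        ; ... draw graphics here ...
        jmp PF_Return   ; 3
PF_LOGIC:
        ; ... blank graphics, run logic here ...
PF_Return:
        dey             ; next scanline
        sta WSYNC       ; STA leaves the flags from DEY intact
        bne PF_LOOP
        rts             ; stand-in for "fall through to overscan"
```

Unlike the original sequence, A no longer holds 0 or 1 after the test, so this only fits if nothing later in the scanline relies on that value.
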
ScumSoft, posted March 3, 2011

Alright, I give. How does one implement a jump table in 6507 code? I'd like to hold an offset to a table of jumps, like so:

    Offset = 0 through 3

    JumpTable:
            jmp doThis
            jmp doThat
            jmp doSomething
            jmp DONTDOTHAT

And I'd like to call it via jmp JumpTable,Offset to land on the right bounce.

+SpiceWare, posted March 3, 2011

This is how I did it in Stay Frosty to run the appropriate level-specific routines (move fireballs, elevators, etc.). I've been using this method since the 80s, when I was coding on the Vic 20, C= 64 and C= 128.

    ;*****************************
    ;* S T A R T                 *
    ;* Level Specific Processing *
    ;*****************************
            lda CurrentLevel
            and #LEVEL_MASK ; 32 levels
            asl
            tax
            lda LPjumpTable+1,x
            pha
            lda LPjumpTable,x
            pha
            rts

    LPjumpTable:
            .word Level1Processing-1  ;
            .word Level2Processing-1  ;
            .word Level3Processing-1  ;
            .word Level4Processing-1  ;
            .word Level5Processing-1  ;
            .word Level6Processing-1  ;
            .word Level7Processing-1  ;
            .word Level8Processing-1  ;
            .word Level9Processing-1  ;
            .word Level10Processing-1 ;
            .word Level11Processing-1 ;
            .word Level12Processing-1 ;
            .word Level13Processing-1 ;
            .word Level14Processing-1 ;
            .word Level15Processing-1 ;
            .word Level16Processing-1 ;
            .word Level17Processing-1 ;
            .word Level18Processing-1 ;
            .word Level19Processing-1 ;
            .word Level20Processing-1 ;
            .word Level21Processing-1 ;
            .word Level22Processing-1 ;
            .word Level23Processing-1 ;
            .word Level24Processing-1 ;
            .word Level25Processing-1 ;
            .word Level26Processing-1 ;
            .word Level27Processing-1 ;
            .word Level28Processing-1 ;
            .word Level29Processing-1 ;
            .word Level30Processing-1 ;
            .word Level31Processing-1 ;
            .word Level32Processing-1 ;

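Scaled down to the four-entry table from the question, the same RTS trick might look like the sketch below. The routine names come from ScumSoft's post; the Offset variable, its address and the empty routine bodies are placeholders. The table holds each address minus one because RTS pulls an address from the stack and resumes at that address plus one.

```asm
        processor 6502

        seg.u Variables
        org $80
Offset  ds 1            ; 0..3, hypothetical variable named after the question

        seg Code
        org $F000
Dispatch:
        lda Offset
        asl                 ; two bytes per table entry
        tax
        lda JumpTable+1,x   ; push the high byte first...
        pha
        lda JumpTable,x     ; ...then the low byte, so RTS pulls low then high
        pha
        rts                 ; pulls the address, adds 1, continues at the routine

JumpTable:
        .word doThis-1
        .word doThat-1
        .word doSomething-1
        .word DONTDOTHAT-1

doThis:
        rts             ; placeholder bodies
doThat:
        rts
doSomething:
        rts
DONTDOTHAT:
        rts
```

If Dispatch is entered with jsr Dispatch, the rts at the end of each routine returns to that caller, so the table behaves like a computed subroutine call.
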
ScumSoft, posted March 4, 2011

Oh, thank you so much SpiceWare, it's working great! I had the addressing wrong: I was placing commands in the table and not offsets to the routines.

Once I finish my game, I'll have a huge list of problems I've encountered, and I should compile them together with their appropriate solutions for others to learn from. Beginner tutorials are great for getting started, but actual game design problems and their solutions would be a valuable resource, don't you think?

Now I can finally proceed. Thanks again.

ScumSoft, posted March 5, 2011

Well, never mind the first post here, I found a much better way to interlace the frames. Much of that code was not really needed at all. Hurray for optimizations!

ScumSoft, posted March 22, 2011 (edited)

    ; Unused/undefined registers ($285-$294)
            ds 1    ; $286
            ds 1    ; $287
            ds 1    ; $288
    TEMP0   ds 1    ; $289 Writeable and readable
            ds 1    ; $28A
    TEMP1   ds 1    ; $28B Writeable and readable
            ds 1    ; $28C
            ds 1    ; $28D
            ds 1    ; $28E
            ds 1    ; $28F
            ds 1    ; $290
            ds 1    ; $291 Mirror of TEMP0 data
            ds 1    ; $292
            ds 1    ; $293 Mirror of TEMP1 data

I modified VCS.h like this. I noticed there were a few undefined bytes of memory space, but I am not sure where this space is located. So I assigned a name to each and tried storing and loading data from each of them, and this is what I got.

Stella can read and write to the above addresses, but I wasn't sure if a real 2600 could, so I loaded GFX data into TEMP0 and TEMP1 each scanline, slapped it into my Harmony cart, and behold, they work on a real 2600.

Where in the 2600 are these bytes located? The header defines these in the RIOT chip, but what parts of this chip are actually unused? The other bytes aren't writable, but those 2 are for some reason.

Found a RIOT.txt that explains them like this:

    $0286 = (RIOT $06) - Write edge detect control - negative edge, enable int (1)
    $0287 = (RIOT $07) - Write edge detect control - positive edge, enable int (1)
    $0288 = (RIOT $08) - Write DRA
    $0289 = (RIOT $09) - Write DDRA
    $028A = (RIOT $0A) - Write DRB
    $028B = (RIOT $0B) - Write DDRB
    $028C = (RIOT $0C) - Write edge detect control - negative edge, disable int (1)
    $028D = (RIOT $0D) - Write edge detect control - positive edge, disable int (1)
    $028E = (RIOT $0E) - Write edge detect control - negative edge, enable int (1)
    $028F = (RIOT $0F) - Write edge detect control - positive edge, enable int (1)
    $0290 = (RIOT $10) - Write DRA
    $0291 = (RIOT $11) - Write DDRA
    $0292 = (RIOT $12) - Write DRB
    $0293 = (RIOT $13) - Write DDRB

I'm not sure what DRA/DDRA/DRB/DDRB pertain to.

Nukey Shay, posted March 22, 2011

$0289 and $028B are mirror addresses of SWACNT and SWBCNT (used to define the "data direction" of the bits of SWCHA and SWCHB). In short, you can redefine which bits of the 2 registers you want to use as ram instead of their original configuration, namely reading the controller ports and console switches.

ScumSoft, posted March 22, 2011

Thanks nukey! That makes much more sense. I was looking through the header file, wondered why those bytes were undefined, and decided to just check 'em out.

Okay, my sidetracked mission is done. Back to coding I go.

ScumSoft, posted March 22, 2011 (edited)

I've implemented a software 96x192 pixel display output. However, I am having some issues coming up with a fast method to do mid-byte positioning.

I have a 12-byte display output buffer, and brute forcing a sprite's bits into position would be done like this:

    GeneratePixelOffset:
            lax PlayerX     ;[]+3 Load Players X position 0-95
            ldy XPosTable,X ;[]+4 Get amount to shift
            sty ShiftAMT    ;[]+3 Store for later
            lsr             ;[]+2 Divide by 8
            lsr             ;[]+2
            lsr             ;[]+2
            tax             ;[]+2 Use as offset
            lda GFXtable,X  ;[]+4 Get GFXbuffer slot number
            sta P0GFXslot   ;[]+3 Save for later
            rts             ;[]+6

    GFXtable:
            .byte $00,$06,$01,$07,$02,$08,$03,$09,$04,$0A,$05,$0B ;GFXbuffer 0-11
    XPosTable:
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 00-07
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 08-15
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 16-23
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 24-31
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 32-39
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 40-47
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 48-55
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 56-63
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 64-71
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 72-79
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 80-87
            .byte $00,$01,$02,$03,$04,$05,$06,$07 ;Pixels 88-95

    [...]

    ***THEN IN THE SCANLINE KERNAL***
    ;Pretend I want to draw the player right now
            ldy GFXindex        ;Which sprite to draw?
            lda (GFXplayer),Y   ;fetch GFX data to be drawn
            ldx P0GFXslot       ;calculated outside kernal, 0 to 11
    ;Position the sprite roughly into the appropriate GFXbuffer byte
            sta GFXbuffer,X
    ;Now rotate into position determined by ShiftAMT
            lda ShiftAMT
            beq .noShift
            lsr
            beq Shift1
            lsr
            beq Shift2
    [etc...]

This would take too many cycles just checking to see how many bits to shift. Then if, say, I am shifting over 3 bits:

    shift3:
            lsr GFXbuffer,X
            ror GFXbuffer+1,X
            lsr GFXbuffer,X
            ror GFXbuffer+1,X
            lsr GFXbuffer,X
            ror GFXbuffer+1,X

Way too many cycles over budget. So I am currently looking into doing some smart masking and bit flipping to avoid this much overhead.

Is there a simpler, already known method that I could learn from?

Attachment: 96PixelTest.bin

RevEng, posted March 22, 2011

If you replace your "ldy XPosTable,X" with "and #7" you'll get the same results without the lookup table.

The fastest method to shift your sprites is to store copies of all of your sprites pre-shifted, trading off rom for cpu time. Then use EOR to place the software sprites into your line ram instead of STA.

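Since "and #7" leaves the shift amount in A rather than Y, the routine needs a small reshuffle. One way it might look is sketched below; variable addresses are assumptions and GFXtable is copied from the earlier post. By the cycle counts in the comments it comes out about one cycle longer than the table version, in exchange for dropping the 96-byte XPosTable and the undocumented lax opcode.

```asm
        processor 6502

        seg.u Variables
        org $80
PlayerX   ds 1          ; 0-95
ShiftAMT  ds 1          ; 0-7, pixel offset inside the byte
P0GFXslot ds 1          ; 0-11, GFXbuffer slot

        seg Code
        org $F000
GeneratePixelOffset:
        lda PlayerX     ; 3
        and #7          ; 2  low three bits = amount to shift
        sta ShiftAMT    ; 3
        lda PlayerX     ; 3  reload, the AND discarded the upper bits
        lsr             ; 2  divide by 8
        lsr             ; 2
        lsr             ; 2
        tax             ; 2  use as offset
        lda GFXtable,x  ; 4  get GFXbuffer slot number
        sta P0GFXslot   ; 3
        rts             ; 6

GFXtable:
        .byte $00,$06,$01,$07,$02,$08,$03,$09,$04,$0A,$05,$0B ; GFXbuffer 0-11
```
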
ScumSoft, posted March 22, 2011 (edited)

Brilliant! I really appreciate the help.

[edit] Well, eor only works on the acc, so to place the result into ram I have to use a sta regardless, right?

RevEng, posted March 22, 2011

> Brilliant! I really appreciate the help.
> [edit] Well eor only works on the acc, so to place the result into ram I have to use a sta regardless right?

No problem. You'd want to load the value from your sprite table into the accumulator, and then eor it into your ram line-buffer...

            lda (GFXplayer),y
            eor GFXBuffer,x
            sta GFXBuffer,x

...not sure if it was clear in my last post, but using EOR instead of STA has the benefit that if 2 sprites fall into the same GFXBuffer byte, you won't disturb the second one when placing the first.

You could also use ORA instead of EOR, the difference being the effect when software sprites overlap each other.

ScumSoft, posted March 22, 2011

Right, that was already completely understood. Having eor $ram modify the ram contents directly would be a nice addressing mode, though it would take the same number of cycles to perform. That is what macros are for, right?

Okay then. Let's calculate real quick the ram requirements of storing 7x the amount of data for one 8x8 sprite = 448 bytes. However, being clever, I could store half the data by sharing bytes, so half that is 224 bytes per sprite. My game has 3 different 8x24 player sprites, so with byte sharing that would be 2016 bytes for just the player graphics. Combine that with the object and monster sprites and I am well over 16k just for graphic data. I think I'll need to simplify them a bit and cut back on the detail to make this approach work.

I had hoped there existed some mathematical tricks for dividing a sprite by 4, 6 and 8 without all the shifts. I'm looking into some other methods of mid-byte positioning, and we'll see which one would be most ideal.

Thanks for the continued support.

RevEng, posted March 23, 2011

Glad to add where I can.

> okay then. Lets calculate real quick the ram requirements of storing 7x the amount of data for one 8x8 sprite = 448 bytes.

You mean rom requirement, not ram, right?

Storing a pre-shifted sprite should take (#_bytes_width+1)*height*7 bytes. For an 8x8 sprite that would be 2*8*7=112 bytes. Still a lot of rom, though.

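To make the 112-byte figure concrete, here is a rough sketch of how pre-shifted copies might be laid out and selected. Everything below is illustrative: the Shift1..Shift7 blocks are empty placeholders for real graphics, and SelectShiftedCopy, ShiftedLo and ShiftedHi are hypothetical names, not something from the thread. The equates use DASM's square-bracket grouping to reproduce the (width+1)*height*7 formula.

```asm
        processor 6502

SPRITE_H    = 8                     ; rows in the sprite
SPRITE_W    = 1                     ; width in bytes before shifting
SHIFTS      = 7                     ; shifted copies, 1..7 pixels
COPY_BYTES  = [SPRITE_W+1]*SPRITE_H ; 16: every copy is one byte wider
TOTAL_BYTES = COPY_BYTES*SHIFTS     ; 112 for an 8x8 sprite

        seg.u Variables
        org $80
ShiftAMT  ds 1          ; 1..7 here; 0 would use the unshifted sprite
GFXplayer ds 2          ; pointer read with lda (GFXplayer),y

        seg Code
        org $F000
SelectShiftedCopy:
        ldx ShiftAMT        ; assumed to be 1..7
        lda ShiftedLo-1,x   ; table entry 0 belongs to shift 1
        sta GFXplayer
        lda ShiftedHi-1,x
        sta GFXplayer+1
        rts

ShiftedLo:
        .byte <Shift1, <Shift2, <Shift3, <Shift4, <Shift5, <Shift6, <Shift7
ShiftedHi:
        .byte >Shift1, >Shift2, >Shift3, >Shift4, >Shift5, >Shift6, >Shift7

Shift1  ds COPY_BYTES   ; placeholders: real data would be the sprite
Shift2  ds COPY_BYTES   ; shifted right by 1..7 pixels, two bytes per row
Shift3  ds COPY_BYTES
Shift4  ds COPY_BYTES
Shift5  ds COPY_BYTES
Shift6  ds COPY_BYTES
Shift7  ds COPY_BYTES
```

With the pointer set once per frame, the kernel can fetch with lda (GFXplayer),y and merge into the line buffer without any per-scanline shifting. By the same formula an 8x24 sprite would need 2*24*7 = 336 bytes.
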
ScumSoft, posted March 23, 2011

Oh yes I did, simple typo on my part.

I've realized that there isn't a need to do any of this at all. I simply don't have to do the shifting during the draw phase; I can preshift the data during Vsync and Overscan time, then store the result in an 8-byte buffer for each object to be displayed on screen.

My game is using less than 1/3 of the 128 bytes of ram, so this isn't a problem, as there are only ever 4 objects on the screen at most. I'll look into using the DPC+ for future improvements to this kernal, which provides much more ram space to work with.

ScumSoft, posted April 6, 2011 (edited)

Is it possible to force a VSYNC instead of performing the Overscan? It seems to work in Stella, but on the real console the TV seems to always perform the Overscan.

I would very much like to tell the TV to skip the Overscan and reset the frame, to perform a fast redraw of the screen that would minimize or eliminate flicker on a 4-frame kernal. The shorter turnaround would reduce the time between phosphor hits, offering brighter colors and less visible flicker.

After 4 frames have been drawn, we go to the Overscan and process the needed logic. Lather, rinse, repeat.

eshu, posted April 6, 2011

In short: no.

In long: on the VCS you need to build up all the parts of the TV signal on the fly. You could slightly reduce the overscan period to produce a non-standard frame rate (above 60 Hz) that may work on some TVs, but the more you reduce it, the fewer TVs it will work on. You're best off working as close to the standard as possible (60 Hz for NTSC, 50 Hz for PAL). What you most definitely cannot do is remove the overscan on some frames and not others, as then you won't even have a static frame size and frame rate. I'd be surprised if any TV would display that.

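For reference, a constant frame of the kind eshu describes can be sketched with nothing but WSYNC counting. This is a bare skeleton using the common NTSC split of 3 VSYNC + 37 VBLANK + 192 visible + 30 overscan scanlines; that split and the label names are assumptions rather than anything posted in this thread, and a real game would replace the counting loops with timer waits and actual work.

```asm
        processor 6502

VSYNC   = $00           ; TIA register addresses
VBLANK  = $01
WSYNC   = $02

        seg Code
        org $F000
Start:
        sei
        cld
        ldx #$FF
        txs             ; set up the stack
Frame:
        lda #2
        sta VBLANK      ; beam off
        sta VSYNC       ; vertical sync on
        sta WSYNC
        sta WSYNC
        sta WSYNC       ; 3 scanlines of VSYNC
        lda #0
        sta VSYNC       ; vertical sync off

        ldx #37         ; 37 scanlines of vertical blank
VBlankLoop:
        sta WSYNC
        dex
        bne VBlankLoop
        stx VBLANK      ; X is 0 here: beam on

        ldx #192        ; 192 visible scanlines
KernelLoop:
        sta WSYNC
        dex
        bne KernelLoop

        lda #2
        sta VBLANK      ; beam off again
        ldx #30         ; 30 scanlines of overscan
OverscanLoop:
        sta WSYNC
        dex
        bne OverscanLoop

        jmp Frame       ; 3 + 37 + 192 + 30 = 262 WSYNCs every frame

        org $FFFC
        .word Start     ; reset vector
        .word Start     ; IRQ/BRK vector
```

Because every pass through Frame strobes WSYNC exactly 262 times, the frame size stays the same from one frame to the next, which is the property eshu says a TV needs.
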
ScumSoft, posted April 6, 2011 (edited)

Yeah, I figured as much: the 2600 isn't controlling the TV so much as it is walking along with it. I became curious whether sending a VSYNC signal would tell the TV to stop what it was doing and return the beam to the VBLANK period prematurely. If this were possible, you could refresh the screen a second time much more quickly, thus hitting the phosphors again in a shorter period. Even if the refresh was off, you wouldn't have any roll, since you'd be controlling the beam the entire time.

Now then, you can draw however many scanlines you wish, so long as the TV is still sent the VBLANK signal once you're done. Case in point: the following demos, none of which have an Overscan period. As soon as the desired number of scanlines is drawn, we can hop back to the VBLANK and start drawing again. To avoid the roll, though, the required number of scanlines has to be drawn, as seen in the 170-scanline demo: even though the scanline count is consistent, it still rolls. (I believe this can be avoided with careful timing.)

I was hoping we could force a non-standard refresh rate on the TV, as this would open up some really nice tricks. I've been plugging along with my game and would like to use a 4-frame flickerblind kernal without any flicker. If I could skip the overscan and refresh the screen once every other frame, this would eliminate the flicker.

It works well in Stella, so I might make a version that uses this trick just for emulation play and use my other kernal for the real units. Well, it was worth a shot.

[edit] Whoops, forgot to add the 170scanline.bin

Attachments: 262Interlaced.bin, 262Scanlines.bin, 170Scanlines.bin

ScumSoft, posted April 7, 2011 (edited)

I need to optimize what this routine does. Even though I wrote it myself, I can't seem to find a way to make it faster, or to substitute a different method in its place. I do NOT want to preshift every sprite, sacrificing rom space for cycles; it's a much better learning experience to think of a way to do this faster.

    ;********************************
    ; [ROTATE ROUTINES]
    ;********************************
    ;I test RotateAMT before entering the routine, skipping when not needed.
    RotateBytes:
            lda #15         ;[]+2   Work on 16 bytes of data 0-15
            sta Counter     ;[]+3   Set counter
            lda RotateAMT   ;[]+3   Fetch rotate amount
            cmp #5          ;[]+2   Faster to do ROL?
            bpl .doROL      ;[]+2/3 Branch if yes
    ;Rotate values 1 through 4
    .RORa
            ldx RotateAMT   ;[]+3   Load amount to rotate
            ldy Counter     ;[]+2   Use counter as byte index, Y is trashed so reload counter
            lda SPRITE,Y    ;[]+4   Load Sprite byte to be rotated
    .RORloop                ;8-bit rotate
            tay             ;[]+2   Save byte being worked on
            ror             ;[]+2   Rotate data into carry
            tya             ;[]+2   Restore byte being worked on
            ror             ;[]+2   Shift carry into byte
            dex             ;[]+2   Countdown rotate amount, worst case cycles for single byte = 52, best = 12
            bne .RORloop    ;[]+2/3 Loop if more to rotate
            ldy Counter     ;[]+3   General counter
            sta SPRITE,Y    ;[]+4   Store Result into 16-byte SPRITE buffer
            lda #$FF        ;[]+2   Compare value
            dcp Counter     ;[]+5   Decrement counter, compare with #$FF
            bne .RORa       ;[]+2/3 Branch if more bytes to rotate
            rts
    .doROL
    ;Rotate values 5 through 7
    ;A holds RotateAMT right now
            eor #6          ;[]+2   Set value to 1,0, or 3
            bne .ROLa       ;[]+2/3 Does rotate value = 0
            lda #2          ;[]+2   Yes then load corrected value
    .ROLa
            sta RotateAMT   ;[]+3   Save new value
    .ROLb
            ldx RotateAMT   ;[]+3   Load amount to rotate
            ldy Counter     ;[]+2   Use counter as byte index, Y is trashed so reload counter
            lda SPRITE,Y    ;[]+4   Load Sprite byte to be rotated
    .ROLloop                ;8-bit rotate
            tay             ;[]+2   Save byte being worked on
            rol             ;[]+2   Rotate data into carry
            tya             ;[]+2   Restore byte being worked on
            rol             ;[]+2   Shift carry into byte
            dex             ;[]+2   Countdown rotate amount, worst case cycles for single byte = 52, best = 12
            bne .ROLloop    ;[]+2/3 Loop if more to rotate
            ldy Counter     ;[]+3   General counter
            sta SPRITE,Y    ;[]+4   Store Result into 16-byte SPRITE buffer
            lda #$FF        ;[]+2   Compare value
            dcp Counter     ;[]+5   Decrement counter, compare with #$FF
            bne .ROLb       ;[]+2/3 Branch if more bytes to rotate
            rts

Thomas Jentzsch, posted April 8, 2011

What is most important here? Reducing average time or maximum time?

Simple improvement: for the ROL loop you'd better do

            cmp #$80
            rol

Other ideas are:
- store every sprite twice (normal and shifted 4 bits), so that you have to shift less
- unroll the loops and use a jump table to select the correct starting point
- rol/ror with 9 bits (incl. the carry) and fix the extra bit after the last shift.

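Dropped into the left-rotate path, the cmp #$80 / rol pair might look like the sketch below. It also swaps the index registers so that X walks the 16 bytes and Y counts rotations, which removes the Counter variable; that rearrangement and the zero-page placement of SPRITE are assumptions, not part of Thomas's suggestion. cmp #$80 sets the carry exactly when bit 7 of A is set, so the following rol wraps bit 7 around into bit 0.

```asm
        processor 6502

        seg.u Variables
        org $80
RotateAMT ds 1          ; 1..3 left rotations, as in the .doROL path (must not be 0)
SPRITE    ds 16         ; 16-byte sprite buffer; zero-page placement is an assumption

        seg Code
        org $F000
RotateLeft:
        ldx #15         ; byte index, 15 down to 0
ByteLoop:
        ldy RotateAMT   ; rotations to apply to this byte
        lda SPRITE,x
BitLoop:
        cmp #$80        ; 2  carry = bit 7 of A
        rol             ; 2  true 8-bit rotate left
        dey             ; 2
        bne BitLoop     ; 3  (2 on the last pass)
        sta SPRITE,x
        dex
        bpl ByteLoop
        rts
```

By these counts the inner loop is 9 cycles per bit instead of the 13 of the tay / rol / tya / rol version.
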
Thomas Jentzsch, posted April 8, 2011 (edited)

Here is some untested(!) code.

Your code:
inner loop: 13 cycles; average (2.5 times): 31.5
outer loop: 26 cycles + 31.5; total (16 times): 919 cycles

            lda #>Ror1          ;[]+2
            sta .vec+1          ;[]+3 high byte of the jump vector
            ...
            ldy RotateAMT       ;[]+3 Load amount to rotate
            lda RorJmpTbl,y     ;[]+4
            sta .vec            ;[]+3
            ldx #15             ;[]+2 Use counter as byte index
    .RORa
            lda SPRITE,x        ;[]+4 Load Sprite byte to be rotated
            and RorAndTbl,y     ;[]+4
            ror                 ;[]+2
            eor SPRITE,x        ;[]+4
            and RorAndTbl,y     ;[]+4
            eor SPRITE,x        ;[]+4 = 18 extra cycles to preserve X and make ror faster
            jmp (.vec)          ;[]+5
    Ror4:
            ror                 ;[]+2
    Ror3:
            ror                 ;[]+2
    Ror2:
            ror                 ;[]+2
    Ror1:
            ror                 ;[]+2
            sta SPRITE,x        ;[]+4 Store Result into 16-byte SPRITE buffer
            dex                 ;[]+2 Decrement counter
            bpl .RORa           ;[]+2/3 Branch if more bytes to rotate
            ...
    RorJmpTbl: ; make sure >Ror1 and >Ror4 are in the same page!
            .byte <Ror1, <Ror2, <Ror3, <Ror4
    RorAndTbl:
            .byte %1, %11, %111, %1111

setup code: 17 cycles
shifts (average 2.5 shifts): 5 cycles
loop: 36 + 5 = 41 cycles; total (16 times): 655 cycles

ScumSoft, posted April 8, 2011

I do appreciate the help, I'll see how much of an impact this new setup makes. What is most important is to get the entire routine down to the fewest cycles, so I really like the hybrid shift + preshifted sprite idea, one thing I definitely didn't think of using before. I am over budget by 456 cycles in the overscan because I moved from 8x8 sprites to 8x16. 8x8 just didn't have the space to make the player look right, but the change also added quite a bit of per-sprite overhead to the routines.

I need preshifted data due to the way my software output buffers align to the 96x96 screen space I have set up right now. The player position registers aren't moving along with the sprite like they do in your typical game; instead they are stationary and split into 6 sections per frame, then aligned to form a 12x24 software-settable block of pixels.

Ok, off to test the code. Be back later on.

Thomas Jentzsch, posted April 8, 2011

> I do appreciate the help, I'll see how much of an impact this new setup makes. What is most important is to get the entire routine to minimal cycles used, So I really like the hybrid shift + preshifted sprite idea, one thing I definitely didn't think of using before. I am over budget by 456 cycles in the overscan

Do you have free cycles in VBlank?

ScumSoft, posted April 9, 2011

VBlank's time is entirely used for working on the remaining sprites, level construction, sounds and game logic, and masking the buffers to get them ready for drawing. Only the player and monsters are 8x16 sprites and therefore take the most time to work on, so I do them in the overscan first. Everything else is an 8x8 sprite and not an issue.

I'll probably rearrange the workload as need be later on; I just need the entire game's functionality in place first. But I wanted to see if there was a way to optimize this routine now, as it dictates how large a sprite I can toss in my game and still have time left over for other things.

If 8x8 is the largest feasible sprite to work on, then I have to design the game around this accordingly, see? But I am sure I can get some 8x16 ones in here.

bogax, posted April 9, 2011

> ...
> .RORa
>         lda SPRITE,x        ;[]+4 Load Sprite byte to be rotated
>         and RorAndTbl,y     ;[]+4
>         ror                 ;[]+2
>         eor SPRITE,x        ;[]+4
>         and RorAndTbl,y     ;[]+4
>         eor SPRITE,x        ;[]+4 = 18 extra cycles to preserve X and make ror faster
>         jmp (.vec)          ;[]+5
> ...

I don't think you need that first and, e.g.:

    .RORa
            lda SPRITE,x        ; ? abcdefgh
            ror                 ; h ?abcdefg
            eor SPRITE,x        ; h xxxxxxxx
            and RorAndTbl,y     ; h 0000xxxx
            eor SPRITE,x        ; h abcddefg
            ror                 ; g habcddef
            ror                 ; f ghabcdde
            ror                 ; e fghabcdd
            ror                 ; d efghabcd

An alternative:

            lda SPRITE,x        ; ? abcdefgh
            and RorAndTbl,y     ; ? abcd0000
            clc
            adc SPRITE,x        ; a bcd0efgh
            ror                 ; h abcd0efg
            ror                 ; g habcd0ef
            ror                 ; f ghabcd0e
            ror                 ; e fghabcd0
            ror                 ; 0 efghabcd

Of course you have to invert the mask(s), and if you know the carry will be clear you could leave out the clc and possibly gain a couple of cycles.

For rol:

            lda SPRITE,x        ; ? abcdefgh
            asl                 ; a bcdefgh0
            adc #$80            ; b ?cdefgha
            rol                 ; ? cdefghab

i.e. three cycles per bit if you do them in pairs, but I think you'd need a separate routine for an odd number of bits.