TROGDOR Posted August 3, 2008 Share Posted August 3, 2008 (edited) 8k ROMs are a thing of beauty. Rather then spending all my time optimizing a 4k ROM, I can spend my time writing features, and not worrying as much about it being super optimized. It also allows me to try some ROM-intensive techniques to speed up my kernel. In a recent post, vdub_bobby mentioned a nice masking technique to quickly draw a player sprite: lda (GfxPtr),Y and (MaskPtr),Y sta GRP0;+13 cycles Pros: Still pretty fast. Only need to pad your Mask with zeroes, so doesn't use as much space. Cons: Even padding one table with zeroes is a lot of wasted ROM This code can even be used with animated sprites. If you set GfxPtr correctly, it will load garbage while the mask is on, and then load the proper graphics when the mask is off. So you can point GfxPtr to different animation cells on different frames, as long as their size matches the mask size. And you don't even have to worry about zero padding. The problem I'm seeing with this implementation is timing. If either of the indirection calls crosses a page boundary, the instruction will take 6 cycles instead of 5 cycles. This has two negative implications. The algorithm will potentially take 15 cycles instead of 13, and you'll have to use a WSYNC because the time can vary. The only way I see this as avoidable is if you use half vertical resolution. For example, you could use 88 zeros, then 8 player graphics, then another 88 zeros. Same with the mask. In my case, I'm using full vertical resolution. That's 184 zeros, 8 mask bytes, and another 184 zeros, which will cross a page boundary at some point. The only way to avoid the page boundary is to use 3 pages: 184 zeros 8 mask 64 zeros 64 zeros 8 mask 184 zeros 124 zeros 8 mask 124 zeros That eats 768 bytes of ROM, but it will allow you to pick a mask that spans 192 bytes and never crosses a page boundary. (Correct me if I'm wrong on this, or if there is an easier way to do it.) The other problem is the animated sprites. There's no way I'm going to try to zero pad all my animations, so I'm going to live with the fact that a page boundary may be crossed when loading the player graphics. This will force me to use a WSYNC. Since I've already taken that hit, I may as well allow the boundary crossing on the mask as well. So my mask becomes: 184 zeros 8 mask 184 zeros That's 376 bytes wasted. I can live with that in an 8k ROM. So the end result is I have a variable time graphics algorithm that will take at most 15 cycles, require a WSYNC, and will burn 376 bytes of ROM on a mask. Can anyone offer any optimization on this, or is this as good as it gets? Edited August 8, 2008 by TROGDOR Quote Link to comment Share on other sites More sharing options...
+batari Posted August 3, 2008 Share Posted August 3, 2008 If you are resorting to a WSYNC to compensate for page crossing, you are at best using 18 cycles for your routine now. I start to wonder if the masking method is worth it now, as there are non-masking, constant cycle drawing methods that may fit in 18 cycles, without wasting vast areas of ROM. Quote Link to comment Share on other sites More sharing options...
Cybergoth Posted August 3, 2008 Share Posted August 3, 2008 Well, it depends on how cycle exact the rest of the timing has to be and lots of other factors. Of course no matter how many masked objects you use, you only need to WSYNC once per line. And a lot of my kernels could afford that (or needed syncing for other reasons anyway ) Also, even when doing a full resolution screen you can possibly still split it into a top and a bottom half or even more zones. Quote Link to comment Share on other sites More sharing options...
TROGDOR Posted August 3, 2008 Author Share Posted August 3, 2008 (edited) I've considered the alternative, but it eats even more cycles: LDA #SPRITEHEIGHT; 2 DCP SpriteTemp; 5 BCC SkipDraw; 2 LDA (GfxPtr),Y; 5 STA GRP0; 3 ReturnFromSkipDraw;+17 cycles This will work if you already have 0 stored in GRP0, and if you pad an extra 0 at the end of the player graphics. But, this won't work using VDEL. I need to write to GRP0 even when it's zero, to allow VDEL to work as expected. But, if I write GRP0, I'll have to make the SkipDraw section longer. SkipDraw ; 3 cycles from the previous branch. Total must be 12 to sync with kernel code. LDA #0; 2 STA GRP0; force 4 cycle absolute. JMP ReturnFromSkipDraw; 3 LDA #SPRITEHEIGHT; 2 DCP SpriteTemp; 5 BCC SkipDraw; 2 LDA (GfxPtr),Y; 5 STA GRP0; 3 NOP; 2 ReturnFromSkipDraw;+19 cycles So I'm looking at 19+19 = 38 cycles for 2 sprites with skipdraw, or 15+15+3 = 33 cycles for 2 sprites with masking. Are there any other possibilities? Ah, I've thought of one improvement. I only need VDEL on one of the two sprites. So I can squeeze it down to 19+17 = 36 cycles for two skipdraw sprites. If I understand VDEL correctly, I can set VDEL on P0, store the P0 graphic in the VDEL buffer, then write P1 (in non-VDEL mode), and that will write out P0 and P1 simultaneously, during horizontal blank. Edit: I've shaved off another cycle: SkipDraw ; 3 cycles from the previous branch. Total must be 11 to sync with kernel code. LDA #0; 2 STA GRP0; 3 JMP ReturnFromSkipDraw; 3 LDA #SPRITEHEIGHT; 2 DCP SpriteTemp; 5 BCC SkipDraw; 2 LDA (GfxPtr),Y; 5 STA GRP0; 4 (force absolute mode) ReturnFromSkipDraw;+18 cycles So now I'm down to 17+18 = 35 cycles for 2 sprite skipdraw with VDEL. Edited August 3, 2008 by TROGDOR Quote Link to comment Share on other sites More sharing options...
+batari Posted August 3, 2008 Share Posted August 3, 2008 If you are using a constant sprite height of 8, and you can be sure that bit 3 of the X register is always 0, you can use SAX to put a zero in GRP0 without needing the load. Or if you post your whole kernel, some programmers here may be able to suggest something else Quote Link to comment Share on other sites More sharing options...
supercat Posted August 3, 2008 Share Posted August 3, 2008 The only way I see this as avoidable is if you use half vertical resolution. There are two ways to get around that: -1- Subdivide the screen into 'zones', and use different pointers for each zones. One advantage of this approach is that if you use two zones of 96 pixels each you can make objects wrap cleanly from the top of the screen to the bottom and be partially-visible in both places. I know of no other practical means to do that. -2- Use a two-line kernel that loads alternating sets of sprite data every line. Simplest style: ldy #95 lp: sta WSYNC; As though you're not going to be doing other stuff in kernel... lda (sprite1a),y and (mask1),y sta GRP0 lda (color1a),y sta COLUP0 sta WSYNC; As though you're not going to be doing other stuff in kernel... lda (sprite1b),y and (mask1),y sta GRP0 lda (color1b),y sta COLUP0 dey bpl lp The interleaving (not sure about the masking) is the approach used in E.T. and it works very well. You'll probably have to write some utilities to separate out the even and odd lines of sprite data, but you get full-resolution color sprites movable in single-line increments with one zero byte between sprites. If you have some sprites that only need half the resolution, you can use those too and still move them in single-line increments (again, with one zero between sprites). Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted August 5, 2008 Share Posted August 5, 2008 (edited) Edit:I've shaved off another cycle: SkipDraw ; 3 cycles from the previous branch. Total must be 11 to sync with kernel code. LDA #0; 2 STA GRP0; 3 JMP ReturnFromSkipDraw; 3 LDA #SPRITEHEIGHT; 2 DCP SpriteTemp; 5 BCC SkipDraw; 2 LDA (GfxPtr),Y; 5 STA GRP0; 4 (force absolute mode) ReturnFromSkipDraw;+18 cycles So now I'm down to 17+18 = 35 cycles for 2 sprite skipdraw with VDEL. A (possibly) better way is to use the variation I call DoDraw: lda #SPRITEHEIGHT dcp SpriteTemp bcs DoDraw lda #0 .byte $2C DoDraw lda (GfxPtr),Y sta GRP0 Won't work in some of the more exotic bankswitch schemes (including the SuperCharger!) but it gets you a constant 18 cycles and writes to the graphics registers every time. Also doesn't require any outside branches, which can become a HUGE pain to manage in a complicated kernel. Edited August 5, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
TROGDOR Posted August 8, 2008 Author Share Posted August 8, 2008 (edited) I've spent the last couple days working on the kernel for The Battle of Midway. Here's the evolution. All three of the following kernels do the same thing. They display both planes, both missiles, and the bomb on every scanline, with full vertical resolution for both display and movement. Kernel 1 The original kernel was crap, which I wrote 3 years ago. To its credit, it does actually work, but in a horribly inefficient way. When you see the code, you'll understand why I was itching to rewrite it. NormalKernel LDX #ENABL TXS LDX #184 LDY #0 LDA #0 STA Temp4 STA Temp5 ;------------------------------------------------------------ JMP KernelLoop align 256 SkipPlane0Draw LDA #0 ;2 (+1 from previous branch.) STA Temp4 ;3 Update the buffer. NOP ;2 NOP ;2 NOP ;2 NOP ;2 BEQ ContinueKernel0 ;3 Block = 17 cycles. SkipPlane1Draw LDA #0 ;2 (+1 from previous branch.) STA Temp5 ;3 Update the buffer. NOP ;2 NOP ;2 NOP ;2 NOP ;2 BEQ ContinueKernel1 ;3 Block = 17 cycles. KernelLoop CPX BombYHi ;3 Preload the BombY comparison. STA WSYNC ;3 ; New scanline starts here. PHP ;3 sets ENABL STA GRP1 ;3 STY GRP0 ;3 CPX Bullet1Y ;3 PHP ;3 sets ENAM1 CPX Bullet0Y ;3 PHP ;3 sets ENAM0, Total 21 ; Reset the stack TXA ;2 LDX #ENABL ;2 TXS ;2 TAX ;2 Block = 8 cycles. ; Determine P0 graphics for next line. SEC ;2 ; TXA ;2 A now holds scanline. Careful! This is implied from above. SBC Plane0PosYHi ;3 ADC #9 ;2 BCC SkipPlane0Draw ;2 or 3, Block = 9 TAY ;2 LDA (Plane0GfxLo),Y ;5 STA Temp4 ;3 This is the buffer for the next player graphics. INY ;2 LDA (Plane0GfxLo),Y ;5, Block = 17 ContinueKernel0 LDY Temp5 ;3 Pull the graphics for P1 from the buffer. DEX ;2 CPX BombYHi ;3 Preload the BombY comparison. STA WSYNC ;3 Block = 11, Total = 66 ;---------------------------------------- ; Next scanline starts here. PHP ;3 sets ENABL STA GRP0 ;3 STY GRP1 ;3 CPX Bullet1Y ;3 PHP ;3 sets ENAM1 CPX Bullet0Y ;3 PHP ;3 sets ENAM0, Total 21 ; Reset the stack TXA ;2 LDX #ENABL ;2 TXS ;2 TAX ;2 Block = 8 cycles. ; Determine P1 graphics for next line. SEC ;2 ; TXA ;2 A now holds scanline. Careful! Implied from stack hack. SBC Plane1Y ;3 ADC #9 ;2 BCC SkipPlane1Draw ;2 or 3, Block = 9 TAY ;2 LDA (Plane1GfxLo),Y ;5 STA Temp5 ;3 This is the buffer for the next player graphics. INY ;2 LDA (Plane1GfxLo),Y ;5, Block = 17 ContinueKernel1 LDY Temp4 ;3 Pull the graphics for P1 from the buffer. DEX ;2 TitleKernelEntry CPX #ShipZone ;3 ; CPX BombYHi ;3 Implied above. ; STA WSYNC ;3 Impiled above. BNE KernelLoop ;2 or 3, Block = 13, Total = 71 ;------------------------------------------------------------ This beast would load up P0 graphics for the current line and the next line, storing the next line in a temp var. On the next line, it would do the same thing with P1. In this manner, it could write both P0 and P1 on every line. It barely fit in the time constraints. Kernel 2 It only took me 4 years, but I finally fully understand how the VDEL buffers work and are meant to be used. Kernel 2 is a rewrite of the first kernel, but making proper use of VDEL for P0. This removes the need for the temp variables, and saves enough time that I was able to squeeze it down to a 1LK. It could be optimized further using the DCP skipdraw, which would shave off 6 more cycles. It currently uses all 76 cycles, although 3 of those cycles are a NOP. Is there a cleaner way to declare a 3 cycle NOP in dasm? NormalKernel LDX #ENABL TXS LDA #1 STA VDELP0 ;Enable VDEL for P0, so it can write early in the line. LDA #0 STA VDELP1 ;Enable VDEL for P0, so it can write early in the line. LDX #183 LDY #0 LDA #0 STA Temp4 STA Temp5 ;------------------------------------------------------------ JMP KernelLoopEntry align 256 SkipPlane0Draw ;3 from branch LDA Temp4 ;3 cycle hack to load 0. JMP ContinueKernel0 ;3 Block = 9 cycles. SkipPlane1Draw ;3 from branch LDA Temp4 ;3 cycle hack to load 0. JMP ContinueKernel1 ;3 Block = 9 cycles. ;------------------------------------------------------------ KernelLoopEntry STA WSYNC KernelLoop ; New scanline starts here. CPX BombYHi ;3 PHP ;3 sets ENABL STA GRP1 ;3 CPX Bullet1Y ;3 PHP ;3 sets ENAM1 CPX Bullet0Y ;3 PHP ;3 sets ENAM0, Total 21 ; Reset the stack TXA ;2 LDX #ENABL ;2 TXS ;2 TAX ;2 Block = 8 cycles. (29) ; Determine P0 graphics for next line. SEC ;2 ; TXA ;2 A now holds scanline. Careful! This is implied from above. SBC Plane0PosYHi ;3 ADC #9 ;2 BCC SkipPlane0Draw ;2 or 3, TAY ;2 LDA (Plane0GfxLo),Y ;5 ContinueKernel0 STA GRP0 ;3 Block = 19 (48) SEC ;2 TXA ;2 A now holds scanline. SBC Plane1Y ;3 ADC #9 ;2 BCC SkipPlane1Draw ;2 or 3, Block = 9 TAY ;2 LDA (Plane1GfxLo),Y ;5 Block = 18 (66) ContinueKernel1 DEX ;2 TitleKernelEntry .byte #$04 ;3 cycle NOP .byte #$00 CPX #ShipZone ;2 BNE KernelLoop ;2 or 3, Block = 10, Total = 76 Kernel 3 This kernel uses quick load masking. It does the same thing in only 60 cycles. NormalKernel LDX #ENABL ;2 TXS ;2 LDA #1 ;2 STA VDELP0 ;3 Enable VDEL for P0, so it can write early in the line. LDA #0 ;2 STA VDELP1 ;3 Disable VDEL for P1. LDX #ENABL ;2 LDY #184 ;2 LDA #0 ;2 ;------------------------------------------------------------ JMP KernelLoop ;3 This jump is necessary because of the align. align 256 KernelLoop CPY BombYHi ;3 STA WSYNC ;3 ; New scanline starts here. PHP ;3 sets ENABL STA GRP1 ;3 CPY Bullet1Y ;3 PHP ;3 sets ENAM1 CPY Bullet0Y ;3 PHP ;3 sets ENAM0, Total 18 ; Reset the stack TXS ;2 (20) LDA (Plane0GfxLo),Y ;~6 AND (MaskGfx0Lo),Y ;~6 STA GRP0 ;3 Block = 15 (35) LDA (Plane1GfxLo),Y ;~6 AND (MaskGfx1Lo),Y ;~6 Block = 12 (47) DEY ;2 TitleKernelEntry CPY #ShipZone ;2 BNE KernelLoop ;2 or 3, Block = 7 (54) ;+6 from above (60) One interesting thing about this kernel is it uses the Y index for everything. This means I can preload X with #ENABL, and it only takes me one instruction to reset the stack hack. Even with the WSYNC I've got 16 more cycles to burn, which I could use to add multiline color or a variable height ball or missile. The trickiest part about using the mask is correctly preloading the mask and graphics vectors. Here's the support code to set up the vectors: UpdateSpriteDirs LDX Plane0Dir LDA SpriteLookupTable,X SEC SBC Plane0Y STA Plane0GfxLo LDX Plane1Dir LDA SpriteLookupTable,X SEC SBC Plane1Y STA Plane1GfxLo LDY #>Plane00 STY Plane0GfxHi LDY #>Plane00 STY Plane1GfxHi ... LDA #>KernelMask ; Set up the mask for the normal kernel. STA MaskGfx0Hi STA MaskGfx1Hi LDA #<KernelMask SEC SBC Plane0Y STA MaskGfx0Lo LDA #<KernelMask SEC SBC Plane1Y STA MaskGfx1Lo ... SpriteLookupTable .byte #(<Plane00 + .byte #(<Plane01 + .byte #(<Plane01 + .byte #(<Plane02 + .byte #(<Plane02 + .byte #(<Plane02 + .byte #(<Plane03 + .byte #(<Plane03 + .byte #(<Plane04 + .byte #(<Plane05 + .byte #(<Plane05 + .byte #(<Plane06 + .byte #(<Plane06 + .byte #(<Plane06 + .byte #(<Plane07 + .byte #(<Plane07 + ... align 256 (184 bytes of whatever data you want to put here.) Plane00 dc.b #%00111000 dc.b #%00010000 dc.b #%00111000 dc.b #%01111100 dc.b #%11111110 dc.b #%00010000 dc.b #%00010000 dc.b #%00010000 Plane01 etc... The +8 in the LookupTable sets the origin at the bottom of the target graphics data. For the plane graphics data, you can put anything in the first 184 bytes of the page, but Plane00 has to be at least 184 bytes away from the start of the page. Also, be careful that you don't have so many animation cells that you go into the next page. For this game, I had 8 lines x 8 graphics = 64 bytes. 184 + 64 = 248, so I'm safe. When setting the mask and graphics vectors, the vector high bytes are static. The vector low bytes are calculated by subtracting their vertical location from the mask origin label. And the mask: ;----------------------------------------------------- align 256 .byte #0 .byte #0 ... repeat for a total of 184 zeros. .byte #0 .byte #0 .byte #$FF .byte #$FF .byte #$FF .byte #$FF .byte #$FF .byte #$FF .byte #$FF .byte #$FF KernelMask .byte #0 .byte #0 ... repeat for another 184 zeros. .byte #0 .byte #0 The key here is that the mask label goes right after the $FF "enable" bytes. The align is necessary to make sure the high byte of your mask vector points to the correct page. Here's the latest Battle of Midway zip with the source, which now uses the masking kernel. It's almost a playable game now. My next blog update should have a playable version. tbom19b.zip Edited August 8, 2008 by TROGDOR Quote Link to comment Share on other sites More sharing options...
Cybergoth Posted August 8, 2008 Share Posted August 8, 2008 Is there a cleaner way to declare a 3 cycle NOP in dasm? Using the common macro.h it's as simple as: SLEEP 3 Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted August 12, 2008 Share Posted August 12, 2008 (edited) And doesn't DASM support NOP 0 ? Or something similar? Edited August 12, 2008 by vdub_bobby Quote Link to comment Share on other sites More sharing options...
Cybergoth Posted August 12, 2008 Share Posted August 12, 2008 I think that works. Maybe it understands even DOP, though I think that particular mnemonic isn't as popular as for example LAX. Quote Link to comment Share on other sites More sharing options...
Robert M Posted October 3, 2008 Share Posted October 3, 2008 -2- Use a two-line kernel that loads alternating sets of sprite data every line. Simplest style: ldy #95 lp: sta WSYNC; As though you're not going to be doing other stuff in kernel... lda (sprite1a),y and (mask1),y sta GRP0 lda (color1a),y sta COLUP0 sta WSYNC; As though you're not going to be doing other stuff in kernel... lda (sprite1b),y and (mask1),y sta GRP0 lda (color1b),y sta COLUP0 dey bpl lp The interleaving (not sure about the masking) is the approach used in E.T. and it works very well. You'll probably have to write some utilities to separate out the even and odd lines of sprite data, but you get full-resolution color sprites movable in single-line increments with one zero byte between sprites. If you have some sprites that only need half the resolution, you can use those too and still move them in single-line increments (again, with one zero between sprites). That's the approach I am using in the Littleman kernel. It cuts the masking table size in half, but it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool. Cheers! Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted October 3, 2008 Share Posted October 3, 2008 MaskedDraw should be still faster, but maybe have a look at SwitchDraw, which is a bit faster than SkipDraw. Quote Link to comment Share on other sites More sharing options...
Cybergoth Posted October 3, 2008 Share Posted October 3, 2008 it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool. I indeed hated doing this for my crazy balloon Quote Link to comment Share on other sites More sharing options...
vdub_bobby Posted October 6, 2008 Share Posted October 6, 2008 it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool. I indeed hated doing this for my crazy balloon Yeah, I hate it too but it is so useful I find it hard to avoid - it's almost a necessity if you use SwitchDraw. I used it in M-4, Squish 'Em, and Elevators Amiss. Didn't use it in Reindeer Rescue, BLiP Football, and Go Fish! Quote Link to comment Share on other sites More sharing options...
zilog_z80a Posted March 24, 2018 Share Posted March 24, 2018 (edited) hi, it's my imagination or when i put just one time in a scanline this... each instruction takes 5 cycles lda (Ggraphic0),y ;5 cycles lda (Graphic0Mask) ;5 cycles . . but.... in a single scanline IF I PUT THE SAME TWICE... THEN EVERY INSTRUCTION (or three) TAKES 6 CYCLES, WHY IS IT? lda (Ggraphic0),y ;6 cycles lda (Graphic0Mask) ;6 cycles lda (Ggraphic1),y ;6 cycles lda (Graphic1Mask) ;6 cycles . . cheers. ps: there is no crossing pages in my graphics all of them use ALIGN 256 and have less than 256 lines. Edited March 25, 2018 by zilog_z80a Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted March 24, 2018 Share Posted March 24, 2018 Assuming you have ",y" after each instruction only a page crossing can lead to 6 cycles. Are your pointers pointing a the graphics page or is adding Y to the pointers crossing a page? 1 Quote Link to comment Share on other sites More sharing options...
zilog_z80a Posted March 25, 2018 Share Posted March 25, 2018 (edited) Assuming you have ",y" after each instruction only a page crossing can lead to 6 cycles. Are your pointers pointing a the graphics page or is adding Y to the pointers crossing a page? . . Hi Thomas, this is the code . . processor 6502 include "vcs.h" include "macro.h" JOY0_RIGHT equ 128 JOY0_LEFT equ 64 JOY0_DOWN equ 32 JOY0_UP equ 16 seg.u variables org $80 mask0 ds 2 mask1 ds 2 Player0PTR ds 2 Magnet_PTR ds 2 Player0X ds 1 Player0Y ds 1 Player1X ds 1 Player1Y ds 1 time_of_jump ds 1 seg code org $F000 Start CLEAN_START LDA #$0 STA COLUBK LDA #$40 STA COLUPF lda #$1E sta COLUP0 sta COLUP1 lda #0 sta time_of_jump lda #140 sta Player0Y lda #70 sta Player0X lda #14 sta Player1Y MainLoop LDA #2 STA VSYNC STA WSYNC STA WSYNC STA WSYNC LDA #0 STA VSYNC jsr VerticalBlank jsr Test0Left jsr drawscreen jsr Boundries lda Player0X jsr HorizPositioning sta WSYNC sta HMOVE jsr do_overscan jmp MainLoop VerticalBlank LDA #46 STA TIM64T LDA Magnet_L SEC SBC Player1Y STA Magnet_PTR LDA Magnet_H SBC #0 STA Magnet_PTR+1 LDA Rectangle_L SEC SBC Player0Y STA Player0PTR LDA Rectangle_H SBC #0 STA Player0PTR+1 LDA #<Mask_magnet SEC SBC Player1Y STA mask1 LDA #>Mask_magnet SBC #0 STA mask1+1 LDA #<Mask SEC SBC Player0Y STA mask0 LDA #>Mask SBC #0 STA mask0+1 LDA #%10000000 BIT CXPPMM BEQ NoCollision LDA Player0X STA COLUPF NoCollision ; STA CXCLR rts drawscreen waste_blank LDA INTIM BNE waste_blank sta VBLANK LDY #63 kernel_0 sta WSYNC dey bne kernel_0 LDY #127 kernel_loop sta WSYNC lda (Player0PTR),y and (mask0),y ldX PLAYF0 stX PF0 ldx PLAYF0 stx PF1 sta GRP0 ldx PLAYF0 stx PF2 ldx PLAYF0 stx PF0 ldx PLAYF0 stx PF1 lda (Magnet_PTR),Y ldx PLAYF0 stx PF2 and (mask1),y sta GRP1 dey bne kernel_loop LDA #37 STA TIM64T lda #2 sta VBLANK rts Test0Left lda #JOY0_LEFT bit SWCHA bne Test0Right dec Player0X Test0Right lda #JOY0_RIGHT bit SWCHA bne boton inc Player0X boton bit INPT4 bmi cae1 lda Player0Y cmp #1 bne cae1 ldx #24 JUMP_LENGHT inc Player0Y dex bne JUMP_LENGHT cae1 lda Player0Y cmp #1 bne cae2 jmp J0Test_Over cae2 dec Player0Y J0Test_Over rts ;------------------------------------------------ Boundries SUBROUTINE lax Player0X cmp #158 bcc P0XLow lda #158 sta Player0X P0XLow cpx #6 bcs BTest_Over lda #6 sta Player0X BTest_Over rts HorizPositioning SUBROUTINE sec sta WSYNC .divideBy15 sbc #15 bcs .divideBy15 eor #7 asl asl asl asl sta HMP0 sta RESP0 rts do_overscan OverScanWait LDA INTIM BNE OverScanWait rts Magnet_L: .byte #<MAGNET_TABLE Magnet_H: .byte #>MAGNET_TABLE Rectangle_L: .byte #<Rectangle_table Rectangle_H: .byte #>Rectangle_table ALIGN 256 MAGNET_TABLE: .byte #%11000011 .byte #%11000011 .byte #%00000000 .byte #%11000011 .byte #%11000011 .byte #%11000011 .byte #%11000011 .byte #%11100111 .byte #%01111110 .byte #%01111110 .byte #%00111100 .byte #%00011000 align 256 ds 90, #0 Mask: ds 37, #$FF ds 90, #0 ALIGN 256 ds 115, #0 Mask_magnet: ds 12, #$FF ds 115, #0 align 256 Rectangle_table: .byte #0 ds 36, #$FF ALIGN 256 PLAYF0 .byte #%10101010 org $FFFC .word Start .word Start END . . Attached RECTANGLE DAY SOURCE: http://atari.rf.gd/RECTANGLE_DAY.asm Edited March 25, 2018 by zilog_z80a Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted March 25, 2018 Share Posted March 25, 2018 If you look into the Stella debugger, when the screen has settled, Player0Ptr ($f4ff) and Magnet_PTR ($f1f2) are clearly causing a page fault. Y values are ranging from 0..127, so the low pointer byte must not become larger than $80. I suppose you have to increase your 0 padding area for the masks. 1 Quote Link to comment Share on other sites More sharing options...
zilog_z80a Posted March 25, 2018 Share Posted March 25, 2018 (edited) If you look into the Stella debugger, when the screen has settled, Player0Ptr ($f4ff) and Magnet_PTR ($f1f2) are clearly causing a page fault. Y values are ranging from 0..127, so the low pointer byte must not become larger than $80. I suppose you have to increase your 0 padding area for the masks. Sorry Thomas, this answer is too advanced for me is there some explanation for dummies? i can learn if you can please tell me in a more simple way or how to recognize this, i cannot understand why this is happening. i swear i have opened my stella debugger but i'm not able to see this. ty in advance. Edited March 25, 2018 by zilog_z80a Quote Link to comment Share on other sites More sharing options...
Thomas Jentzsch Posted March 25, 2018 Share Posted March 25, 2018 (edited) Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent. As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0). This code won't work: LDA #>Mask SBC #0 STA mask0+1 mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing. So this: LDA #<Mask SEC SBC Player0Y must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after). Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw) Edited March 25, 2018 by Thomas Jentzsch 1 Quote Link to comment Share on other sites More sharing options...
zilog_z80a Posted March 25, 2018 Share Posted March 25, 2018 (edited) Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent. As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0). This code won't work: LDA #>Mask SBC #0 STA mask0+1 mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing. So this: LDA #<Mask SEC SBC Player0Y must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after). Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw) Hi Thomas, thank you very much for this info and the time you has put for me to learn!! I'm still a relatively newbie, so there are subjects that i haven't cover in stella or asm coding, with your info in mind now is a new brick in all this nice puzzle. i have in mind to use sectors (or bands) and with this a little less padding, ok... and more memory... but i see this you have teached me... amd thanks to you i know that at least i can use six writes to playfield and two sprites per line. without this my project would be IMPOSSIBLE. ty again for all, i will rethink anyway all this info. cheers!! Edited March 25, 2018 by zilog_z80a Quote Link to comment Share on other sites More sharing options...
zilog_z80a Posted March 25, 2018 Share Posted March 25, 2018 (edited) Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent. As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0). This code won't work: LDA #>Mask SBC #0 STA mask0+1 mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing. So this: LDA #<Mask SEC SBC Player0Y must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after). Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw) Hi, i did a kernel of 60 scanlines, and there is the same issue. i think i must study more, i'm totally bump with this. i have near two years learning and learning. pls i need a page (s.o.s.) to learn more about masking Thomas, a step by step, if you can give me some url i will be really happy. tnx. for all Thomas. ps: cannot use switch draw, flip, do ... etc.... there is 6 writes to playfield, i cannot avoid it. Edited March 26, 2018 by zilog_z80a Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.