Advice on a masking kernel

TROGDOR · August 3, 2008

8k ROMs are a thing of beauty. Rather then spending all my time optimizing a 4k ROM, I can spend my time writing features, and not worrying as much about it being super optimized. It also allows me to try some ROM-intensive techniques to speed up my kernel.

In a recent post, vdub_bobby mentioned a nice masking technique to quickly draw a player sprite:

   lda (GfxPtr),Y
  and (MaskPtr),Y
  sta GRP0;+13 cycles
Pros: Still pretty fast. Only need to pad your Mask with zeroes, so doesn't use as much space.

Cons: Even padding one table with zeroes is a lot of wasted ROM

This code can even be used with animated sprites. If you set GfxPtr correctly, it will load garbage while the mask is on, and then load the proper graphics when the mask is off. So you can point GfxPtr to different animation cells on different frames, as long as their size matches the mask size. And you don't even have to worry about zero padding.

The problem I'm seeing with this implementation is timing. If either of the indirection calls crosses a page boundary, the instruction will take 6 cycles instead of 5 cycles. This has two negative implications. The algorithm will potentially take 15 cycles instead of 13, and you'll have to use a WSYNC because the time can vary.

The only way I see this as avoidable is if you use half vertical resolution. For example, you could use 88 zeros, then 8 player graphics, then another 88 zeros. Same with the mask.

In my case, I'm using full vertical resolution. That's 184 zeros, 8 mask bytes, and another 184 zeros, which will cross a page boundary at some point. The only way to avoid the page boundary is to use 3 pages:

184 zeros

8 mask

64 zeros

8 mask

184 zeros

124 zeros

8 mask

124 zeros

That eats 768 bytes of ROM, but it will allow you to pick a mask that spans 192 bytes and never crosses a page boundary. (Correct me if I'm wrong on this, or if there is an easier way to do it.)

The other problem is the animated sprites. There's no way I'm going to try to zero pad all my animations, so I'm going to live with the fact that a page boundary may be crossed when loading the player graphics. This will force me to use a WSYNC. Since I've already taken that hit, I may as well allow the boundary crossing on the mask as well. So my mask becomes:

184 zeros

8 mask

184 zeros

That's 376 bytes wasted. I can live with that in an 8k ROM. So the end result is I have a variable time graphics algorithm that will take at most 15 cycles, require a WSYNC, and will burn 376 bytes of ROM on a mask. Can anyone offer any optimization on this, or is this as good as it gets?

Edited August 8, 2008 by TROGDOR

+batari · August 3, 2008

If you are resorting to a WSYNC to compensate for page crossing, you are at best using 18 cycles for your routine now. I start to wonder if the masking method is worth it now, as there are non-masking, constant cycle drawing methods that may fit in 18 cycles, without wasting vast areas of ROM.

Cybergoth · August 3, 2008

Well, it depends on how cycle exact the rest of the timing has to be and lots of other factors. Of course no matter how many masked objects you use, you only need to WSYNC once per line. And a lot of my kernels could afford that (or needed syncing for other reasons anyway )

Also, even when doing a full resolution screen you can possibly still split it into a top and a bottom half or even more zones.

TROGDOR · August 3, 2008

I've considered the alternative, but it eats even more cycles:

   LDA #SPRITEHEIGHT; 2
  DCP SpriteTemp; 5
  BCC SkipDraw; 2
  LDA (GfxPtr),Y; 5
  STA GRP0; 3
ReturnFromSkipDraw;+17 cycles

This will work if you already have 0 stored in GRP0, and if you pad an extra 0 at the end of the player graphics. But, this won't work using VDEL. I need to write to GRP0 even when it's zero, to allow VDEL to work as expected. But, if I write GRP0, I'll have to make the SkipDraw section longer.

SkipDraw
; 3 cycles from the previous branch.  Total must be 12 to sync with kernel code.
LDA #0; 2
STA GRP0; force 4 cycle absolute.
JMP ReturnFromSkipDraw; 3

   LDA #SPRITEHEIGHT; 2
  DCP SpriteTemp; 5
  BCC SkipDraw; 2
  LDA (GfxPtr),Y; 5
  STA GRP0; 3
  NOP; 2
ReturnFromSkipDraw;+19 cycles

So I'm looking at 19+19 = 38 cycles for 2 sprites with skipdraw, or 15+15+3 = 33 cycles for 2 sprites with masking. Are there any other possibilities?

Ah, I've thought of one improvement. I only need VDEL on one of the two sprites. So I can squeeze it down to 19+17 = 36 cycles for two skipdraw sprites.

If I understand VDEL correctly, I can set VDEL on P0, store the P0 graphic in the VDEL buffer, then write P1 (in non-VDEL mode), and that will write out P0 and P1 simultaneously, during horizontal blank.

Edit:

I've shaved off another cycle:

SkipDraw
 ; 3 cycles from the previous branch.  Total must be 11 to sync with kernel code.
  LDA #0; 2
  STA GRP0; 3
  JMP ReturnFromSkipDraw; 3

  LDA #SPRITEHEIGHT; 2
  DCP SpriteTemp; 5
  BCC SkipDraw; 2
  LDA (GfxPtr),Y; 5
  STA GRP0; 4 (force absolute mode)
ReturnFromSkipDraw;+18 cycles

So now I'm down to 17+18 = 35 cycles for 2 sprite skipdraw with VDEL.

Edited August 3, 2008 by TROGDOR

+batari · August 3, 2008

If you are using a constant sprite height of 8, and you can be sure that bit 3 of the X register is always 0, you can use SAX to put a zero in GRP0 without needing the load.

Or if you post your whole kernel, some programmers here may be able to suggest something else

supercat · August 3, 2008

The only way I see this as avoidable is if you use half vertical resolution.

There are two ways to get around that:

-1- Subdivide the screen into 'zones', and use different pointers for each zones. One advantage of this approach is that if you use two zones of 96 pixels each you can make objects wrap cleanly from the top of the screen to the bottom and be partially-visible in both places. I know of no other practical means to do that.

-2- Use a two-line kernel that loads alternating sets of sprite data every line. Simplest style:

  ldy #95
lp:
 sta WSYNC; As though you're not going to be doing other stuff in kernel...
 lda (sprite1a),y
 and (mask1),y
 sta GRP0
 lda (color1a),y
 sta COLUP0
 sta WSYNC; As though you're not going to be doing other stuff in kernel...
 lda (sprite1b),y
 and (mask1),y
 sta GRP0
 lda (color1b),y
 sta COLUP0
 dey
 bpl lp

The interleaving (not sure about the masking) is the approach used in E.T. and it works very well. You'll probably have to write some utilities to separate out the even and odd lines of sprite data, but you get full-resolution color sprites movable in single-line increments with one zero byte between sprites. If you have some sprites that only need half the resolution, you can use those too and still move them in single-line increments (again, with one zero between sprites).

vdub_bobby · August 5, 2008

Edit:

I've shaved off another cycle:

SkipDraw
; 3 cycles from the previous branch.  Total must be 11 to sync with kernel code.
  LDA #0; 2
  STA GRP0; 3
  JMP ReturnFromSkipDraw; 3

  LDA #SPRITEHEIGHT; 2
  DCP SpriteTemp; 5
  BCC SkipDraw; 2
  LDA (GfxPtr),Y; 5
  STA GRP0; 4 (force absolute mode)
ReturnFromSkipDraw;+18 cycles

So now I'm down to 17+18 = 35 cycles for 2 sprite skipdraw with VDEL.

A (possibly) better way is to use the variation I call DoDraw:

   lda #SPRITEHEIGHT
  dcp SpriteTemp
  bcs DoDraw
  lda #0
  .byte $2C
DoDraw
  lda (GfxPtr),Y
  sta GRP0

Won't work in some of the more exotic bankswitch schemes (including the SuperCharger!) but it gets you a constant 18 cycles and writes to the graphics registers every time. Also doesn't require any outside branches, which can become a HUGE pain to manage in a complicated kernel.

Edited August 5, 2008 by vdub_bobby

TROGDOR · August 8, 2008

I've spent the last couple days working on the kernel for The Battle of Midway. Here's the evolution.

All three of the following kernels do the same thing. They display both planes, both missiles, and the bomb on every scanline, with full vertical resolution for both display and movement.

Kernel 1

The original kernel was crap, which I wrote 3 years ago. To its credit, it does actually work, but in a horribly inefficient way. When you see the code, you'll understand why I was itching to rewrite it.

NormalKernel
LDX #ENABL
TXS

LDX #184
LDY #0
LDA #0
STA Temp4
STA Temp5

;------------------------------------------------------------
JMP KernelLoop

align 256

SkipPlane0Draw
LDA #0			;2 (+1 from previous branch.)
STA Temp4		;3 Update the buffer.
NOP			;2
NOP			;2
NOP			;2
NOP			;2
BEQ ContinueKernel0	;3 Block = 17 cycles.

SkipPlane1Draw
LDA #0			;2 (+1 from previous branch.)
STA Temp5		;3 Update the buffer.
NOP			;2
NOP			;2
NOP			;2
NOP			;2
BEQ ContinueKernel1	;3 Block = 17 cycles.

KernelLoop
CPX BombYHi	 	;3 Preload the BombY comparison.
STA WSYNC		;3

; New scanline starts here.
PHP			;3 sets ENABL
STA GRP1		;3
STY GRP0		;3

CPX Bullet1Y 	;3
PHP			;3 sets ENAM1
CPX Bullet0Y 	;3
PHP			;3 sets ENAM0, Total 21

; Reset the stack
TXA			;2
LDX #ENABL		;2
TXS			;2
TAX			;2 Block = 8 cycles.

; Determine P0 graphics for next line.
SEC			;2
;	TXA			;2 A now holds scanline.  Careful!  This is implied from above.
SBC Plane0PosYHi	;3
ADC #9			;2
BCC SkipPlane0Draw	;2 or 3, Block = 9

TAY			;2
LDA (Plane0GfxLo),Y	;5
STA Temp4		;3 This is the buffer for the next player graphics.

INY			;2
LDA (Plane0GfxLo),Y	;5, Block = 17

ContinueKernel0
LDY Temp5		;3 Pull the graphics for P1 from the buffer.

DEX			;2
CPX BombYHi	 	;3 Preload the BombY comparison.
STA WSYNC		;3 Block = 11, Total = 66

;----------------------------------------
; Next scanline starts here.
PHP			;3 sets ENABL
STA GRP0		;3
STY GRP1		;3

CPX Bullet1Y	 	;3
PHP			;3 sets ENAM1
CPX Bullet0Y	 	;3
PHP			;3 sets ENAM0, Total 21

; Reset the stack
TXA			;2
LDX #ENABL		;2
TXS			;2
TAX			;2 Block = 8 cycles.

; Determine P1 graphics for next line.
SEC			;2
;	TXA			;2 A now holds scanline.  Careful!  Implied from stack hack.
SBC Plane1Y		;3
ADC #9			;2
BCC SkipPlane1Draw	;2 or 3, Block = 9

TAY			;2
LDA (Plane1GfxLo),Y	;5
STA Temp5		;3 This is the buffer for the next player graphics.

INY			;2
LDA (Plane1GfxLo),Y	;5, Block = 17

ContinueKernel1
LDY Temp4		;3 Pull the graphics for P1 from the buffer.

DEX			;2
TitleKernelEntry
CPX #ShipZone		;3
;	CPX BombYHi	 	;3 Implied above.
;	STA WSYNC		;3 Impiled above.
BNE KernelLoop		;2 or 3, Block  = 13, Total = 71

;------------------------------------------------------------

This beast would load up P0 graphics for the current line and the next line, storing the next line in a temp var. On the next line, it would do the same thing with P1. In this manner, it could write both P0 and P1 on every line. It barely fit in the time constraints.

Kernel 2

It only took me 4 years, but I finally fully understand how the VDEL buffers work and are meant to be used. Kernel 2 is a rewrite of the first kernel, but making proper use of VDEL for P0. This removes the need for the temp variables, and saves enough time that I was able to squeeze it down to a 1LK. It could be optimized further using the DCP skipdraw, which would shave off 6 more cycles. It currently uses all 76 cycles, although 3 of those cycles are a NOP. Is there a cleaner way to declare a 3 cycle NOP in dasm?

NormalKernel
LDX #ENABL
TXS

LDA #1
STA VDELP0		;Enable VDEL for P0, so it can write early in the line.
LDA #0
STA VDELP1		;Enable VDEL for P0, so it can write early in the line.

LDX #183
LDY #0
LDA #0
STA Temp4
STA Temp5

;------------------------------------------------------------
JMP KernelLoopEntry

align 256

SkipPlane0Draw
			;3 from branch
LDA Temp4		;3 cycle hack to load 0.
JMP ContinueKernel0	;3 Block = 9 cycles.

SkipPlane1Draw
			;3 from branch
LDA Temp4		;3 cycle hack to load 0.
JMP ContinueKernel1	;3 Block = 9 cycles.

;------------------------------------------------------------
KernelLoopEntry
STA WSYNC

KernelLoop
; New scanline starts here.
CPX BombYHi	 	;3
PHP			;3 sets ENABL
STA GRP1		;3

CPX Bullet1Y	 	;3
PHP			;3 sets ENAM1
CPX Bullet0Y 		;3
PHP			;3 sets ENAM0, Total 21

; Reset the stack
TXA			;2
LDX #ENABL		;2
TXS			;2
TAX			;2 Block = 8 cycles. (29)

; Determine P0 graphics for next line.
SEC			;2
;	TXA			;2 A now holds scanline.  Careful!  This is implied from above.
SBC Plane0PosYHi	;3
ADC #9			;2
BCC SkipPlane0Draw	;2 or 3,

TAY			;2
LDA (Plane0GfxLo),Y	;5
ContinueKernel0
STA GRP0		;3 Block = 19 (48)


SEC			;2
TXA			;2 A now holds scanline.
SBC Plane1Y		;3
ADC #9			;2
BCC SkipPlane1Draw	;2 or 3, Block = 9

TAY			;2
LDA (Plane1GfxLo),Y	;5 Block = 18 (66)

ContinueKernel1

DEX			;2
TitleKernelEntry
.byte #$04		;3 cycle NOP
.byte #$00
CPX #ShipZone		;2
BNE KernelLoop		;2 or 3, Block  = 10, Total = 76

Kernel 3

This kernel uses quick load masking. It does the same thing in only 60 cycles.

NormalKernel
LDX #ENABL		;2
TXS			;2

LDA #1			;2
STA VDELP0		;3 Enable VDEL for P0, so it can write early in the line.
LDA #0			;2
STA VDELP1		;3 Disable VDEL for P1.

LDX #ENABL		;2
LDY #184		;2
LDA #0			;2

;------------------------------------------------------------
JMP KernelLoop		;3 This jump is necessary because of the align.

align 256

KernelLoop
CPY BombYHi	 	;3
STA WSYNC		;3

; New scanline starts here.
PHP			;3 sets ENABL
STA GRP1		;3

CPY Bullet1Y	 	;3
PHP			;3 sets ENAM1
CPY Bullet0Y 		;3
PHP			;3 sets ENAM0, Total 18

; Reset the stack
TXS			;2 (20)

LDA (Plane0GfxLo),Y	;~6
AND (MaskGfx0Lo),Y	;~6
STA GRP0		;3 Block = 15 (35)

LDA (Plane1GfxLo),Y	;~6
AND (MaskGfx1Lo),Y	;~6 Block = 12 (47)

DEY			;2
TitleKernelEntry
CPY #ShipZone		;2
BNE KernelLoop		;2 or 3, Block = 7 (54)
			;+6 from above (60)

One interesting thing about this kernel is it uses the Y index for everything. This means I can preload X with #ENABL, and it only takes me one instruction to reset the stack hack. Even with the WSYNC I've got 16 more cycles to burn, which I could use to add multiline color or a variable height ball or missile.

The trickiest part about using the mask is correctly preloading the mask and graphics vectors. Here's the support code to set up the vectors:

UpdateSpriteDirs
LDX Plane0Dir
LDA SpriteLookupTable,X
SEC
SBC Plane0Y
STA Plane0GfxLo

LDX Plane1Dir
LDA SpriteLookupTable,X
SEC
SBC Plane1Y
STA Plane1GfxLo

LDY #>Plane00
STY Plane0GfxHi

LDY #>Plane00
STY Plane1GfxHi

...

LDA #>KernelMask	; Set up the mask for the normal kernel.
STA MaskGfx0Hi
STA MaskGfx1Hi

LDA #<KernelMask
SEC
SBC Plane0Y
STA MaskGfx0Lo

LDA #<KernelMask
SEC
SBC Plane1Y
STA MaskGfx1Lo

...

SpriteLookupTable
.byte #(<Plane00 + 
.byte #(<Plane01 + 
.byte #(<Plane01 + 
.byte #(<Plane02 + 
.byte #(<Plane02 + 
.byte #(<Plane02 + 
.byte #(<Plane03 + 
.byte #(<Plane03 + 
.byte #(<Plane04 + 
.byte #(<Plane05 + 
.byte #(<Plane05 + 
.byte #(<Plane06 + 
.byte #(<Plane06 + 
.byte #(<Plane06 + 
.byte #(<Plane07 + 
.byte #(<Plane07 + 

...

align 256

(184 bytes of whatever data you want to put here.)

Plane00
dc.b #%00111000
dc.b #%00010000
dc.b #%00111000
dc.b #%01111100
dc.b #%11111110
dc.b #%00010000
dc.b #%00010000
dc.b #%00010000

Plane01 etc...

The +8 in the LookupTable sets the origin at the bottom of the target graphics data. For the plane graphics data, you can put anything in the first 184 bytes of the page, but Plane00 has to be at least 184 bytes away from the start of the page. Also, be careful that you don't have so many animation cells that you go into the next page. For this game, I had 8 lines x 8 graphics = 64 bytes. 184 + 64 = 248, so I'm safe.

When setting the mask and graphics vectors, the vector high bytes are static. The vector low bytes are calculated by subtracting their vertical location from the mask origin label.

And the mask:

;-----------------------------------------------------
align 256

.byte #0
.byte #0

... repeat for a total of 184 zeros.

.byte #0
.byte #0
.byte #$FF
.byte #$FF
.byte #$FF
.byte #$FF
.byte #$FF
.byte #$FF
.byte #$FF
.byte #$FF
KernelMask
.byte #0
.byte #0

... repeat for another 184 zeros.

.byte #0
.byte #0

The key here is that the mask label goes right after the $FF "enable" bytes. The align is necessary to make sure the high byte of your mask vector points to the correct page.

Here's the latest Battle of Midway zip with the source, which now uses the masking kernel. It's almost a playable game now. My next blog update should have a playable version.

tbom19b.zip

Edited August 8, 2008 by TROGDOR

Cybergoth · August 8, 2008

Is there a cleaner way to declare a 3 cycle NOP in dasm?

Using the common macro.h it's as simple as:

SLEEP 3

vdub_bobby · August 12, 2008

And doesn't DASM support

   NOP 0

?

Or something similar?

Edited August 12, 2008 by vdub_bobby

Cybergoth · August 12, 2008

I think that works. Maybe it understands even DOP, though I think that particular mnemonic isn't as popular as for example LAX.

Robert M · October 3, 2008

-2- Use a two-line kernel that loads alternating sets of sprite data every line. Simplest style:
  ldy #95
lp:
 sta WSYNC; As though you're not going to be doing other stuff in kernel...
 lda (sprite1a),y
 and (mask1),y
 sta GRP0
 lda (color1a),y
 sta COLUP0
 sta WSYNC; As though you're not going to be doing other stuff in kernel...
 lda (sprite1b),y
 and (mask1),y
 sta GRP0
 lda (color1b),y
 sta COLUP0
 dey
 bpl lp
The interleaving (not sure about the masking) is the approach used in E.T. and it works very well. You'll probably have to write some utilities to separate out the even and odd lines of sprite data, but you get full-resolution color sprites movable in single-line increments with one zero byte between sprites. If you have some sprites that only need half the resolution, you can use those too and still move them in single-line increments (again, with one zero between sprites).

That's the approach I am using in the Littleman kernel. It cuts the masking table size in half, but it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool.

Cheers!

Thomas Jentzsch · October 3, 2008

MaskedDraw should be still faster, but maybe have a look at SwitchDraw, which is a bit faster than SkipDraw.

Cybergoth · October 3, 2008

it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool.

I indeed hated doing this for my crazy balloon

vdub_bobby · October 6, 2008

it also splits your sprite data into two tables which is painful to generate and maintain without some sort of editor tool.

I indeed hated doing this for my crazy balloon

Yeah, I hate it too but it is so useful I find it hard to avoid - it's almost a necessity if you use SwitchDraw.

I used it in M-4, Squish 'Em, and Elevators Amiss. Didn't use it in Reindeer Rescue, BLiP Football, and Go Fish!

zilog_z80a · March 24, 2018

hi, it's my imagination or when i put just one time in a scanline this... each instruction takes 5 cycles

lda (Ggraphic0),y  ;5 cycles
lda (Graphic0Mask) ;5 cycles

.

but.... in a single scanline IF I PUT THE SAME TWICE... THEN EVERY INSTRUCTION (or three) TAKES 6 CYCLES, WHY IS IT?

lda (Ggraphic0),y  ;6 cycles
lda (Graphic0Mask) ;6 cycles

lda (Ggraphic1),y  ;6 cycles
lda (Graphic1Mask) ;6 cycles

.

cheers.

ps: there is no crossing pages in my graphics all of them use ALIGN 256 and have less than 256 lines.

Edited March 25, 2018 by zilog_z80a

Thomas Jentzsch · March 24, 2018

Assuming you have ",y" after each instruction only a page crossing can lead to 6 cycles. Are your pointers pointing a the graphics page or is adding Y to the pointers crossing a page?

zilog_z80a · March 25, 2018

Assuming you have ",y" after each instruction only a page crossing can lead to 6 cycles. Are your pointers pointing a the graphics page or is adding Y to the pointers crossing a page?

.

Hi Thomas, this is the code

.

   processor 6502
   include "vcs.h"
   include "macro.h"

JOY0_RIGHT equ 128
JOY0_LEFT  equ 64
JOY0_DOWN  equ 32
JOY0_UP    equ 16

   seg.u variables
   org $80

mask0 ds 2
mask1 ds 2

Player0PTR ds 2
Magnet_PTR ds 2

Player0X ds 1
Player0Y ds 1

Player1X ds 1
Player1Y ds 1

time_of_jump ds 1

   seg code
   org $F000
Start
   CLEAN_START   
   
   LDA #$0      
   STA COLUBK   
   LDA #$40
   STA COLUPF
   lda #$1E
   sta COLUP0
   sta COLUP1
   lda #0
   sta time_of_jump

   lda #140
   sta Player0Y
   lda #70
   sta Player0X
   
   lda #14
   sta Player1Y
   
MainLoop
   LDA  #2
   STA  VSYNC   
   STA  WSYNC   
   STA  WSYNC    
   STA  WSYNC
   LDA #0
   STA VSYNC

   jsr VerticalBlank
   jsr Test0Left				
   jsr drawscreen
   jsr Boundries
   lda Player0X
   jsr HorizPositioning
   sta WSYNC
   sta HMOVE
   jsr do_overscan
   jmp MainLoop

VerticalBlank
   LDA  #46
   STA  TIM64T
   
   
   LDA Magnet_L
   SEC
   SBC Player1Y
   STA Magnet_PTR
   LDA Magnet_H
   SBC #0
   STA Magnet_PTR+1

   LDA Rectangle_L
   SEC
   SBC Player0Y
   STA Player0PTR
   LDA Rectangle_H
   SBC #0
   STA Player0PTR+1

   LDA #<Mask_magnet
   SEC
   SBC Player1Y
   STA mask1
   LDA #>Mask_magnet
   SBC #0
   STA mask1+1
 
   LDA #<Mask
   SEC
   SBC Player0Y
   STA mask0
   LDA #>Mask
   SBC #0
   STA mask0+1

   LDA #%10000000
   BIT CXPPMM
   BEQ NoCollision	
   LDA Player0X
   STA COLUPF	
NoCollision
;  STA CXCLR
   rts

   
drawscreen
waste_blank
   LDA INTIM   
   BNE waste_blank   
   sta VBLANK   

   LDY #63
kernel_0
   sta WSYNC
   dey		   
   bne kernel_0

   LDY #127
kernel_loop
   sta WSYNC
   lda (Player0PTR),y
   and (mask0),y
   ldX PLAYF0	
   stX PF0	    
   ldx PLAYF0	
   stx PF1	     
   sta GRP0	    
   ldx PLAYF0	
   stx PF2	    
   ldx PLAYF0	
   stx PF0	    
   ldx PLAYF0	
   stx PF1	    
   lda (Magnet_PTR),Y
   ldx PLAYF0   
   stx PF2	    
   and (mask1),y
   sta GRP1
   dey
   bne kernel_loop 

   LDA #37      
   STA TIM64T
   lda #2
   sta VBLANK
   rts
   

Test0Left
				lda #JOY0_LEFT
				bit SWCHA
				bne Test0Right
				dec Player0X
				
Test0Right
				lda #JOY0_RIGHT
				bit SWCHA
				bne boton
				inc Player0X
				

boton
				bit INPT4
				bmi cae1
				lda Player0Y
				cmp #1
				bne cae1
				ldx #24
JUMP_LENGHT
				inc Player0Y
				dex
				bne JUMP_LENGHT
cae1    		lda Player0Y
				cmp #1
				bne cae2
				jmp	J0Test_Over
cae2			dec Player0Y
J0Test_Over		rts
 ;------------------------------------------------
Boundries SUBROUTINE
				lax Player0X	
				cmp #158	
				bcc P0XLow	
				lda #158  	
				sta Player0X	
P0XLow
				cpx #6		
				bcs BTest_Over	
				lda #6  	
				sta Player0X	
BTest_Over			rts		

HorizPositioning SUBROUTINE
				sec
				sta WSYNC
.divideBy15
				sbc #15
				bcs .divideBy15
				eor #7
				asl
				asl
				asl
				asl
				sta HMP0
				sta RESP0
				rts

do_overscan   
OverScanWait
   LDA INTIM
   BNE OverScanWait
   rts

Magnet_L:
   .byte #<MAGNET_TABLE
Magnet_H:
   .byte #>MAGNET_TABLE
   
Rectangle_L:
   .byte #<Rectangle_table
Rectangle_H:
   .byte #>Rectangle_table

   ALIGN 256
MAGNET_TABLE:
   .byte #%11000011
   .byte #%11000011
   .byte #%00000000
   .byte #%11000011   
   .byte #%11000011
   .byte #%11000011
   .byte #%11000011
   .byte #%11100111
   .byte #%01111110
   .byte #%01111110
   .byte #%00111100
   .byte #%00011000


   align 256
        ds 90,  #0
Mask:   ds 37,  #$FF
        ds 90,  #0

   ALIGN 256
             ds 115, #0
Mask_magnet: ds 12,  #$FF
             ds 115, #0

   align 256
Rectangle_table: .byte  #0
                 ds 36, #$FF   
   ALIGN 256
PLAYF0	.byte #%10101010

   org $FFFC
   .word Start
   .word Start
   END

.

Attached RECTANGLE DAY SOURCE:

http://atari.rf.gd/RECTANGLE_DAY.asm

Edited March 25, 2018 by zilog_z80a

Thomas Jentzsch · March 25, 2018

If you look into the Stella debugger, when the screen has settled, Player0Ptr ($f4ff) and Magnet_PTR ($f1f2) are clearly causing a page fault. Y values are ranging from 0..127, so the low pointer byte must not become larger than $80.

I suppose you have to increase your 0 padding area for the masks.

zilog_z80a · March 25, 2018

If you look into the Stella debugger, when the screen has settled, Player0Ptr ($f4ff) and Magnet_PTR ($f1f2) are clearly causing a page fault. Y values are ranging from 0..127, so the low pointer byte must not become larger than $80.

I suppose you have to increase your 0 padding area for the masks.

Sorry Thomas, this answer is too advanced for me is there some explanation for dummies?

i can learn if you can please tell me in a more simple way or how to recognize this, i cannot understand why this is happening.

i swear i have opened my stella debugger but i'm not able to see this.

ty in advance.

Edited March 25, 2018 by zilog_z80a

Thomas Jentzsch · March 25, 2018

Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent.

As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0).

This code won't work:

  LDA #>Mask
  SBC #0
  STA mask0+1

mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing.

So this:

  LDA #<Mask
  SEC
  SBC Player0Y

must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after).

Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw)

Edited March 25, 2018 by Thomas Jentzsch

zilog_z80a · March 25, 2018

Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent.

As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0).

This code won't work:
  LDA #>Mask
  SBC #0
  STA mask0+1
mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing.

So this:
  LDA #<Mask
  SEC
  SBC Player0Y
must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after).

Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw)

Hi Thomas, thank you very much for this info and the time you has put for me to learn!!

I'm still a relatively newbie, so there are subjects that i haven't cover in stella or asm coding, with your info in mind now is a new brick in all this nice puzzle.

i have in mind to use sectors (or bands) and with this a little less padding, ok... and more memory... but i see this you have teached me...

amd thanks to you i know that at least i can use six writes to playfield and two sprites per line.

without this my project would be IMPOSSIBLE.

ty again for all, i will rethink anyway all this info.

cheers!!

Edited March 25, 2018 by zilog_z80a

zilog_z80a · March 25, 2018

Do you know how to use Stella's debugger? It is very powerful and really makes problems transparent.

As for your problem, when a pointer points to e.g $f4ff, then an extra cycle due to target address page crossing will happen with any Y value (except 0).

This code won't work:
  LDA #>Mask
  SBC #0
  STA mask0+1
mask0+1 always has to be #>Mask, never one less. Because then you have that page crossing.

So this:
  LDA #<Mask
  SEC
  SBC Player0Y
must never underflow. This means the 0 padding before Mask must be as large as the maximum value for Player0Y. And then you need the same amount of padding after Mask. So this won't work for Y ranges larger ~128. Then you need a 2nd Mask table (less padding before but more after).

Or you have to look for an (slower) alternative to masked draw (e.g. SkipDraw)

Hi, i did a kernel of 60 scanlines, and there is the same issue.

i think i must study more, i'm totally bump with this. i have near two years learning and learning.

pls i need a page (s.o.s.) to learn more about masking Thomas, a step by step, if you can give me some url i will be really happy.

tnx. for all Thomas.

ps: cannot use switch draw, flip, do ... etc.... there is 6 writes to playfield, i cannot avoid it.

Edited March 26, 2018 by zilog_z80a

Advice on a masking kernel

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members