One-line kernel strategies

+Karl G · June 28, 2020

How much is it possible to do in a single line of a one-line kernel? It seems like the conditional code to display objects eats up the available cycles pretty quickly. What tips are there for squeezing in as much as possible? I know about using VDEL* so that one of the player registers and the ball register can be written during the visible part of the screen.

+SpiceWare · June 28, 2020

In Stay Frosty I used a mask to draw the snowman. I cover it here. Tradeoff for the mask is you use extra RAM and extra ROM.

The other thing I did was to use zones, and within each zone the fireballs appeared in a fixed position. By using zero-padded graphics they are drawn using just 8 cycles (5 for the load, 3 for the store). The tradeoff of padded graphics is that you use extra ROM:

    lda (imageptr),y
    sta GRP1

FireballImages:
    .word BlankGraphic
    .word FireballLargeA-6
    .word FireballLargeB-6
    .word FireballLargeC-6
    .word FireballLargeD-6
    .word FireballSmallA-6
    .word FireballSmallB-6
    .word FireballSmallC-6
    .word FireballSmallD-6
    
FireballSmallA:
        .byte zz__XXXXX_ ; 17
        .byte zz_XXXXX__ ; 16
        .byte zz_X_X_X__ ; 15
        .byte zz_X_X_X__ ; 14
        .byte zz_X_X_X__ ; 13
        .byte zz_XXXXX__ ; 12
        .byte zz_XXXX___ ; 11
        .byte zz_XXXX__X ; 10
        .byte zz_XXXX___ ;  9
        .byte zz_XXXX___ ;  8
        .byte zz__X_X___ ;  7
        .byte zz__X_____ ;  6
        .byte zz___X_X__ ;  5
        .byte zz________ ;  4  zero padding starts here
        .byte zz________ ;  3
        .byte zz________ ;  2
        .byte zz________ ;  1
        .byte zz________ ;  0
        .byte zz________ ; -1
        .byte zz________ ; -2
        .byte zz________ ; -3
        .byte zz________ ; -4

FireballSmallB:        
        .byte zz__XXXX__ ; 17
        .byte zz_XXXXXX_ ; 16
        .byte zz_X_X_X__ ; 15
        .byte zz_X_X_X__ ; 14
        .byte zz_X_X_X__ ; 13
        .byte zz_XXXXX__ ; 12
        .byte zz__XXXX__ ; 11
        .byte zz__XXX___ ; 10
        .byte zz__XXX___ ;  9
        .byte zz___X____ ;  8
        .byte zz________ ;  7
        .byte zz________ ;  6
        .byte zz________ ;  5
        .byte zz___X__X_ ;  4
        .byte zz___X____ ;  3
        .byte zz____X___ ;  2
        .byte zz________ ;  1 zero padding starts here
        .byte zz________ ;  0
        .byte zz________ ; -1
        .byte zz________ ; -2
        .byte zz________ ; -3
        .byte zz________ ; -4      

; first part of FrostyMask is used when there is not a fireball to draw
BlankGraphic
        repeat 50 ; 148-26
        .byte 0
        repend
        repeat 25
        .byte $ff
        repend
FrostyMask
        repeat 50 ; 148-26
        .byte 0
        repend
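
For readers who have not seen the mask trick, here is a minimal sketch of how a table like FrostyMask might be used inside the kernel. It is not the actual Stay Frosty code: FrostyGfxPtr and FrostyMaskPtr are hypothetical zero-page pointers, assumed to be set up before the kernel so that the same Y index is valid for both, and GRP0 is only an assumed target register.

```asm
    ; Branch-free player draw (sketch; pointer names are hypothetical)
    lda (FrostyGfxPtr),y    ; 5  graphics byte for this scanline
    and (FrostyMaskPtr),y   ; 5  mask is $FF inside the snowman, 0 outside
    sta GRP0                ; 3  13 cycles on every line, no compare or branch
```

Bytes fetched outside the image can be anything, since the mask forces them to 0. That is why the $FF run needs zeros on both sides.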

+Karl G · June 28, 2020

Thanks. I had contemplated using massive amounts of ROM by having the player graphics and ENA** values be surrounded by (arena height - graphic height) zeros on each side, position the pointers accordingly, and then just use the same scan line counter as an index for everything, and write to GRP*/ENA** unconditionally every line.

I understand the masking approach, but I don't understand why you use the value of 50 for the zeros on each side. Does that have to do with the height of your zones? If you weren't using zones, would the zero regions be (arena height - graphic height) on each side of the (graphic height) region of $FF?
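
The padded-table idea described in this post comes down to one 16-bit subtraction per object before the kernel starts. A rough sketch, assuming the kernel counts Y down to 0 (so Y = 0 is the bottom line of the arena), a hypothetical PlayerImage label that sits between two runs of zeros with the bottom row of the image stored first, a hypothetical ObjY variable holding the line on which that bottom row should appear, and a zero-page pointer P0Ptr:

```asm
    ; Pre-kernel pointer setup (sketch; PlayerImage and ObjY are hypothetical)
    ; Afterwards (P0Ptr),y reads PlayerImage + (y - ObjY):
    ; image rows while y is inside the object, padding zeros otherwise.
    sec
    lda #<PlayerImage
    sbc ObjY            ; low byte of the image address minus the Y position
    sta P0Ptr
    lda #>PlayerImage
    sbc #0              ; propagate the borrow into the high byte
    sta P0Ptr+1
```

With ObjY ranging from 0 to (arena height - graphic height), the reads reach at most that many bytes below and above the image, which is where the padding size asked about here comes from.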

CPC464Kid · June 28, 2020

I also used a mask for the Snakes game I'm currently writing. Please see the source code on GitHub:

https://github.com/RobinSergeant/2600-Snakes

I have an 8-line kernel, but it uses so many cycles manipulating the playfield to draw the snake that I didn't have time for conditions. Hence, the snake's head is drawn with a mask (the mask is 0 when the head should not be visible). Most of the RAM is needed to store the snake data as well, so I ended up storing temporary variables like the mask on top of my stack. This actually works quite well, as I cannot call any subroutines in the kernel section anyway!

Something else you can do is make use of every spare cycle towards the end of the scanline to set things up for the next line: pre-load all your registers, etc. I did most of my logic there because my game doesn't use much of the screen. So using less of the horizontal real estate is another good option. It gives you more time to set things up at the start of each line and free time at the end.

Not using WSYNC also helps save 3 cycles when things get really tight.

+SpiceWare · June 29, 2020

8 hours ago, Karl G said:

I understand the masking approach, but I don't understand why you use the value of 50 for the zeros on each side. Does that have to do with the height of your zones?

Yes, the mask ($FF) had to be padded on either side with enough zeros to support various Y values as the snowman moved up/down thru the zone.

Quote

If you weren't using zones, would the zero regions be (arena height - graphic height) on each side of the (graphic height) region of $FF?

Yes - though be warned you'll most likely end up crossing a page boundary on a number of scanlines which will add 1 more cycle to the AND (MaskPtr),Y instruction.
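
One way to catch that extra cycle before it bites is to let the assembler complain. A sketch, assuming DASM (its ds, if, echo and err directives, with square brackets for grouping) and hypothetical MaskStart / MaskEnd labels at the first byte of the padded table and just past its last byte. The table sizes are the ones from the listing earlier in the thread. If the whole table lies in one 256-byte page and the pointer never points below MaskStart, (MaskPtr),Y never pays the page-crossing cycle.

```asm
MaskStart
    ds 50, 0        ; leading zeros
    ds 25, $ff      ; visible rows
    ds 50, 0        ; trailing zeros
MaskEnd

    ; Abort assembly if the table straddles a page boundary
    if [>MaskStart] != [>[MaskEnd - 1]]
        echo "Mask table crosses a page boundary"
        err
    endif
```

This only works while the table, zeros included, fits in 256 bytes, which is one reason to split a tall arena into zones.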

DEBRO · June 29, 2020

Hi there,

On 6/28/2020 at 11:00 AM, Karl G said:

What tips are there to squeeze in as much as possible?

Constant cycle counting.

tokumaru · June 30, 2020

I feel like a true one line kernel, where you do EVERYTHING in a single line, would be way too limiting, but you can have a longer kernel, spreading tasks over multiple lines, and still update the graphics every line to get the most out of the TIA's resolution. If you divide the screen into multiple regions, you can prepare the data for the next region (graphics and mask pointers, counters, etc.) over the course of several lines while drawing the graphics of the current region. Or you can even jump between different kernels that do different tasks depending on what you need to display in each region.

You mentioned the conditional code to display objects as a big concern, but the biggest kernel killer IMO is an asymmetrical playfield. If you're doing that, there will be barely any time left for anything else.

JeremiahK · July 1, 2020

In my nyancat kernel, I also split the screen into zones (7 rows, each 14 lines high, and separated by 5 lines, 128 lines for the whole display).

I wanted the cat to be able to move "smoothly" from row to row (it can be drawn halfway between rows). For this reason, the cat kernel uses 2 rows. The other 5 don't try to draw the cat at all, so I was able to significantly decrease the amount of padding needed in the graphics.

In the most complex part of the kernel, I am reading graphics values 8 times, and writing to TIA registers 11 times, per line, if I recall. I used every one of the 76 cycles per line, and all the registers. The stack pointer was used to hold a color value for one of the sprites, since it can be loaded into X faster than RAM.

Note that the time-sensitive areas of the kernel must not cross page boundaries, unless the extra cycle is accounted for.
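
The stack-pointer trick mentioned above could look roughly like this. It is only a sketch, not the nyancat code: SavedSP and SpriteColor are hypothetical RAM variables, and COLUP1 is an assumed target register.

```asm
    ; Before the kernel (sketch; SavedSP and SpriteColor are hypothetical)
    tsx
    stx SavedSP         ; remember the real stack pointer
    ldx SpriteColor
    txs                 ; SP now holds the color value

    ; Inside the kernel
    tsx                 ; 2 cycles, versus 3 for "ldx SpriteColor"
    stx COLUP1          ; 3

    ; After the kernel
    ldx SavedSP
    txs                 ; restore before any jsr, rts, pha or pla
```

The saving is a single cycle per use, and nothing may touch the stack while the pointer is borrowed.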

+Karl G · July 5, 2020

Thanks everyone for the advice. Another tip I need to remind myself of is to optimize for your specific need instead of for the general case.

Anyway, my solution uses 0s on each side of the graphics instead of a separate masking step, with two different zones to avoid crossing pages. I was able to draw all 5 objects as well as write to all 3 playfield registers (symmetric playfield), but no color changes. I use the same y register index for all objects, and set the pointers beforehand so that that's valid for each of them.

This will work fine for that idea I have in mind at the moment. Here's the kernel code in case anyone else might find it useful:

    ldy #83
    lda (PF1Ptr),y
    sta PF1
    lda #0
    sta WSYNC       ; 3     (0)
    sta VBLANK      ; 3     (3)
    jmp ____kernel_entry ;3 (6)

    align $100

KernelLoop
    lda (PF1Ptr),y  ; 5     (69)
    sta.a PF1       ; 4     (73)   
    lda Temp        ; 3     (0)
    stx GRP1        ; 3     (3)
    sta ENAM0       ; 3     (6)
____kernel_entry
    lda (M1Ptr),y   ; 5     (11)
    sta ENAM1       ; 3     (14)
    lda (PF0Ptr),y  ; 5     (19)
    sta PF0         ; 3     (22)
    lda (PF2Ptr),y  ; 5     (27*)
    sta PF2         ; 3     (30)
    lda (BallPtr),y ; 5     (35)
    sta ENABL       ; 3     (38)
    lda (P0Ptr),y   ; 5     (43)
    sta GRP0        ; 3     (46)
    lax (P1Ptr),y   ; 5     (51)
    lda (M0Ptr),y   ; 5     (56)   
    sta Temp        ; 3     (59)
    dey             ; 2     (61)
    bpl KernelLoop  ; 2/3   (63/64)

    ldy #83         ; 2     (65)

KernelLoop2
    lda (PF1Ptr2),y  ; 5     (70)
____kernel_bottom_entry
    sta PF1       ; 3     (73)   
    lda Temp        ; 3     (0)
    stx GRP1        ; 3     (3)
    sta ENAM0       ; 3     (6)
    lda (M1Ptr2),y   ; 5     (11)
    sta ENAM1       ; 3     (14)
    lda (PF0Ptr2),y  ; 5     (19)
    sta PF0         ; 3     (22)
    lda (PF2Ptr2),y  ; 5     (27*)
    sta PF2         ; 3     (30)
    lda (BallPtr2),y ; 5     (35)
    sta ENABL       ; 3     (38)
    lda (P0Ptr2),y   ; 5     (43)
    sta GRP0        ; 3     (46)
    lax (P1Ptr2),y   ; 5     (51)
    lda (M0Ptr2),y   ; 5     (56)   
    sta.a Temp        ; 4     (60)
    dey             ; 2     (62)
    bpl KernelLoop2 ; 2/3   (64/65)
    
    sta WSYNC       ; wait for the end of the last kernel line
    lda #0          ; then blank all five objects and the playfield
    sta GRP0
    sta ENAM0
    sta ENAM1
    sta ENABL
    sta GRP1
    sta PF0
    sta PF1
    sta PF2

Edit: It seems that the spoiler tag and the code tag don't play nicely together.

JeremiahK · July 5, 2020

2 hours ago, Karl G said:

Edit: It seems that the spoiler tag and the code tag don't play nicely together.

For larger snippets of code, I have started simply saving it to a file, and uploading as an attachment.

Kevin McGrath · July 15, 2020

On 7/5/2020 at 9:42 AM, Karl G said:

Here's the kernel code in case anyone else might find it useful:

Super minor comment here, but it looks like you have a couple of leftover illegal/non-standard opcodes whose benefits you're not actually using in your kernel (leftover code perhaps?): LAX (load the accumulator and the X register), specifically the "lax (P1Ptr),y" and "lax (P1Ptr2),y" instructions. Those both look like they should be LDX instructions, as you're trashing the accumulator right after each LAX instruction anyway and use X for a store later. It won't change your timing or anything, it just gets rid of a couple of non-standard instructions that aren't being used anyway.

I do like your perfect 76 cycle timing though, very nice!

+Karl G · July 15, 2020

Are you saying I was LAX in my coding?

Actually I do make use of it to load A and X, then just A, and I use the X value towards the top of my loop.

+splendidnut · July 15, 2020

The operation LDX (addr),Y doesn't exist as a standard 6502 instruction: http://www.6502.org/tutorials/6502opcodes.html#LDX
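
To make the trade-off concrete, here is a sketch of the two ways to get an indirect-indexed byte into X, using the P1Ptr pointer from the kernel above. The cycle counts assume no page crossing. LAX is an undocumented opcode, so this relies on the assembler and the target hardware or emulator supporting it.

```asm
    ; Documented opcodes only: 7 cycles, 3 bytes
    lda (P1Ptr),y   ; 5  there is no "ldx (zp),y"
    tax             ; 2

    ; Undocumented LAX: 5 cycles, 2 bytes, A and X both receive the byte
    lax (P1Ptr),y   ; 5
```

Even when A is overwritten right afterwards, LAX still saves 2 cycles per line over the documented pair.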

Kevin McGrath · July 15, 2020

Oh damnit, you're right of course and I'm an idiot! I must've had "LDA (zp),y" stuck in my head. Sorry about that!

+splendidnut · July 15, 2020

No worries, it happens.

I usually get tripped up with the limited STX/STY instructions and the lack of instructions supporting ZP,Y addressing mode.
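
For reference, a short sketch of the asymmetry being described. Var is a hypothetical zero-page variable and Table a hypothetical absolute address. The commented-out lines are forms the documented 6502 instruction set does not provide.

```asm
    stx Var         ; zero page: ok
    stx Var,y       ; zero page,Y: ok (STX can only be indexed by Y)
    stx Table       ; absolute: ok
    ; stx Table,y   ; absolute,Y: does not exist for STX

    sty Var,x       ; zero page,X: ok (STY can only be indexed by X)
    ; sty Table,x   ; absolute,X: does not exist for STY

    ldx Var,y       ; among documented opcodes, only LDX and STX have zero page,Y
    lda Table,y     ; "lda Var,y" has no zero-page form, so assemblers
                    ; emit this 3-byte absolute,Y encoding instead
```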
