One-line kernel strategies


Karl G

How much is possible to do in a single line for a one-line kernel? It seems like the conditional code to display objects eats up the available cycles pretty quickly. What tips are there to squeeze in as much as possible? I know about using VDEL* to be able to write to one of the player registers and the ball register during the visible screen.


In Stay Frosty I used a mask to draw the snowman. I cover it here. The tradeoff for the mask is that it uses extra RAM and extra ROM.
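For readers who haven't seen the technique, a mask draw generally looks something like the sketch below. This is not the Stay Frosty source: SnowmanPtr, MaskPtr, and the choice of GRP0 are assumptions for illustration. Both are two-byte zero-page pointers set up before the kernel, and Y is the scanline counter.

```asm
; Hypothetical names: SnowmanPtr and MaskPtr are zero-page pointers,
; Y holds the current scanline index.
    lda (SnowmanPtr),y  ; 5  graphic byte for this line (may be garbage off-sprite)
    and (MaskPtr),y     ; 5  $FF on the sprite's rows, 0 everywhere else
    sta GRP0            ; 3  13 cycles total, no branches
```

The mask pointer moves with the sprite's Y position so that the run of $FF lines up with the graphic rows; on every other line the AND forces the result to 0.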

 

The other thing I did was to use zones, and within each zone the fireballs appeared in a fixed position. By using zero-padded graphics they are drawn using just 8 cycles. The tradeoff of padded graphics is that you use extra ROM:

 

    lda (imageptr),y    ; 5  padding rows read as 0, so no test is needed
    sta GRP1            ; 3  8 cycles total

 

FireballImages:
    .word BlankGraphic
    .word FireballLargeA-6
    .word FireballLargeB-6
    .word FireballLargeC-6
    .word FireballLargeD-6
    .word FireballSmallA-6
    .word FireballSmallB-6
    .word FireballSmallC-6
    .word FireballSmallD-6
    
FireballSmallA:
        .byte zz__XXXXX_ ; 17
        .byte zz_XXXXX__ ; 16
        .byte zz_X_X_X__ ; 15
        .byte zz_X_X_X__ ; 14
        .byte zz_X_X_X__ ; 13
        .byte zz_XXXXX__ ; 12
        .byte zz_XXXX___ ; 11
        .byte zz_XXXX__X ; 10
        .byte zz_XXXX___ ;  9
        .byte zz_XXXX___ ;  8
        .byte zz__X_X___ ;  7
        .byte zz__X_____ ;  6
        .byte zz___X_X__ ;  5
        .byte zz________ ;  4  zero padding starts here
        .byte zz________ ;  3
        .byte zz________ ;  2
        .byte zz________ ;  1
        .byte zz________ ;  0
        .byte zz________ ; -1
        .byte zz________ ; -2
        .byte zz________ ; -3
        .byte zz________ ; -4

FireballSmallB:        
        .byte zz__XXXX__ ; 17
        .byte zz_XXXXXX_ ; 16
        .byte zz_X_X_X__ ; 15
        .byte zz_X_X_X__ ; 14
        .byte zz_X_X_X__ ; 13
        .byte zz_XXXXX__ ; 12
        .byte zz__XXXX__ ; 11
        .byte zz__XXX___ ; 10
        .byte zz__XXX___ ;  9
        .byte zz___X____ ;  8
        .byte zz________ ;  7
        .byte zz________ ;  6
        .byte zz________ ;  5
        .byte zz___X__X_ ;  4
        .byte zz___X____ ;  3
        .byte zz____X___ ;  2
        .byte zz________ ;  1 zero padding starts here
        .byte zz________ ;  0
        .byte zz________ ; -1
        .byte zz________ ; -2
        .byte zz________ ; -3
        .byte zz________ ; -4      

; first part of FrostyMask is used when there is not a fireball to draw
BlankGraphic
        repeat 50 ; 148-26
        .byte 0
        repend
        repeat 25
        .byte $ff
        repend
FrostyMask
        repeat 50 ; 148-26
        .byte 0
        repend
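One plausible way to pick an image from the FireballImages table before a zone is drawn is sketched below. FireballType is a hypothetical RAM variable holding the table index (0 to 8); the actual game may well do this differently.

```asm
; Assumption: FireballType (RAM) holds 0..8, the index into FireballImages.
    lda FireballType
    asl                     ; entries are .word, so multiply the index by 2
    tax
    lda FireballImages,x    ; low byte of the image address
    sta imageptr
    lda FireballImages+1,x  ; high byte of the image address
    sta imageptr+1
```

With index 0 the pointer lands on BlankGraphic, so the same unconditional lda/sta pair draws nothing in zones without a fireball.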

 

 


Thanks. I had contemplated using massive amounts of ROM by having the player graphics and ENA** values be surrounded by (arena height - graphic height) zeros on each side, position the pointers accordingly, and then just use the same scan line counter as an index for everything, and write to GRP*/ENA** unconditionally every line.
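A rough sketch of what "position the pointers accordingly" could look like, assuming the scanline counter in Y counts down from ARENA_HEIGHT-1 to 0. All names and sizes here (ARENA_HEIGHT, P0_HEIGHT, P0Padded, P0Y, P0Ptr) are made up for illustration, and the shape bytes are placeholders.

```asm
ARENA_HEIGHT = 84                       ; assumed number of kernel lines
P0_HEIGHT    = 8                        ; assumed sprite height
P0_PAD       = ARENA_HEIGHT - P0_HEIGHT

P0Padded
    repeat P0_PAD
    .byte 0
    repend
    .byte $18,$3C,$7E,$FF,$FF,$7E,$3C,$18  ; placeholder shape, bottom row first
    repeat P0_PAD
    .byte 0
    repend

; Before the kernel. P0Y (RAM) is the line of the sprite's bottom row,
; 0..P0_PAD. P0Ptr is a two-byte zero-page pointer.
    sec
    lda #<[P0Padded + P0_PAD]
    sbc P0Y
    sta P0Ptr
    lda #>[P0Padded + P0_PAD]
    sbc #0                              ; propagate the borrow
    sta P0Ptr+1
```

With the pointer set this way, (P0Ptr),y reads the first shape byte exactly when Y equals P0Y and reads padding zeros on every line outside the sprite, so the kernel can write GRP0 unconditionally.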

 

I understand the masking approach, but I don't understand why you use the value of 50 for the zeros on each side. Does that have to do with the height of your zones? If you weren't using zones, would the zero regions be (arena height - graphic height) on each side of the (graphic height) region of $FF?


I also used a mask for the Snakes game I'm currently writing.  Please see source code on github:

 

https://github.com/RobinSergeant/2600-Snakes

 

I have an 8-line kernel, but it uses so many cycles manipulating the playfield to draw the snake that I didn't have time for conditions. Hence, the snake's head is drawn with a mask (the mask is 0 when the head should not be visible). Most of the RAM is needed to store the snake data as well, so I ended up storing temporary variables like the mask on top of my stack. This actually works quite well, as I cannot call any subroutines in the kernel section anyway!

 

Something else you can do is try to make use of every spare cycle towards the end of the scanline to set things up for the next line. Pre-load all your registers etc. I did most of my logic here because my game doesn't use much of the screen. So using less of the horizontal real estate is another good option. It gives you more time to set things up at the start and free time at the end of each line.

 

Not using WSYNC also helps save 3 cycles when things get really tight.


8 hours ago, Karl G said:

I understand the masking approach, but I don't understand why you use the value of 50 for the zeros on each side. Does that have to do with the height of your zones?

 

Yes, the mask ($FF) had to be padded on either side with enough zeros to support various Y values as the snowman moved up/down thru the zone.  

 

Quote

If you weren't using zones, would the zero regions be (arena height - graphic height) on each side of the (graphic height) region of $FF?

 

Yes - though be warned you'll most likely end up crossing a page boundary on a number of scanlines which will add 1 more cycle to the AND (MaskPtr),Y instruction.
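If the table is small enough to fit in one page, an assembly-time check along these lines can catch an accidental crossing. This is only a sketch: MaskTable and MaskTableEnd are hypothetical labels for the first byte of the mask data and the byte just past its end, and the check only helps if the pointer itself always points inside the table.

```asm
; Hypothetical labels: MaskTable = first byte of the mask data,
; MaskTableEnd = address just past its last byte.
    if >MaskTable != >[MaskTableEnd - 1]
        echo "MaskTable crosses a page boundary"
        err
    endif
```

A table longer than 256 bytes cannot pass this check, which is one reason to split the screen into zones.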

 


I feel like a true one line kernel, where you do EVERYTHING in a single line, would be way too limiting, but you can have a longer kernel, spreading tasks over multiple lines, and still update the graphics every line to get the most out of the TIA's resolution. If you divide the screen into multiple regions, you can prepare the data for the next region (graphics and mask pointers, counters, etc.) over the course of several lines while drawing the graphics of the current region. Or you can even jump between different kernels that do different tasks depending on what you need to display in each region.

 

You mentioned the conditional code to display objects as a big concern, but the biggest kernel killer IMO is the asymmetrical playfield. If you're doing that, there will be barely any time left for anything else.


In my nyancat kernel, I also split the screen into zones (7 rows, each 14 lines high, and separated by 5 lines, 128 lines for the whole display).

I wanted the cat to be able to move "smoothly" from row to row (it can be drawn halfway between rows).  For this reason, the cat kernel uses 2 rows.  The other 5 don't try to draw the cat at all, so I was able to significantly decrease the amount of padding needed in the graphics.

In the most complex part of the kernel, I am reading graphics values 8 times, and writing to TIA registers 11 times, per line, if I recall.  I used every one of the 76 cycles per line, and all the registers.  The stack pointer was used to hold a color value for one of the sprites, since it can be loaded into X faster than RAM.
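The stack pointer trick can be sketched roughly as follows. This is not the nyancat source; SavedSP, CatColor, and the choice of COLUP1 are assumptions for illustration.

```asm
; Outside the kernel (hypothetical RAM variables SavedSP and CatColor)
    tsx
    stx SavedSP     ; remember the real stack pointer
    ldx CatColor
    txs             ; SP now carries the colour value

; Inside the kernel
    tsx             ; 2 cycles (ldx CatColor from zero page would be 3)
    stx COLUP1      ; 3 cycles

; After the kernel
    ldx SavedSP
    txs             ; restore SP before any JSR/RTS/PHA/PLA
```

The saving is one cycle per use, at the cost of not being able to use the stack anywhere between the two txs instructions.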

Note that the time-sensitive areas of the kernel must not cross page boundaries, unless the extra cycle is accounted for.


Thanks everyone for the advice. Another tip I need to remind myself of is to optimize for your specific need instead of for the general case.

 

Anyway, my solution uses 0s on each side of the graphics instead of a separate masking step, with two different zones to avoid crossing pages. I was able to draw all 5 objects as well as write to all 3 playfield registers (symmetric playfield), but no color changes. I use the same y register index for all objects, and set the pointers beforehand so that that's valid for each of them.

 

This will work fine for that idea I have in mind at the moment. Here's the kernel code in case anyone else might find it useful:

    ldy #83
    lda (PF1Ptr),y
    sta PF1
    lda #0
    sta WSYNC       ; 3     (0)
    sta VBLANK      ; 3     (3)
    jmp ____kernel_entry ;3 (6)

    align $100

KernelLoop
    lda (PF1Ptr),y  ; 5     (69)
    sta.a PF1       ; 4     (73)   
    lda Temp        ; 3     (0)
    stx GRP1        ; 3     (3)
    sta ENAM0       ; 3     (6)
____kernel_entry
    lda (M1Ptr),y   ; 5     (11)
    sta ENAM1       ; 3     (14)
    lda (PF0Ptr),y  ; 5     (19)
    sta PF0         ; 3     (22)
    lda (PF2Ptr),y  ; 5     (27*)
    sta PF2         ; 3     (30)
    lda (BallPtr),y ; 5     (35)
    sta ENABL       ; 3     (38)
    lda (P0Ptr),y   ; 5     (43)
    sta GRP0        ; 3     (46)
    lax (P1Ptr),y   ; 5     (51)
    lda (M0Ptr),y   ; 5     (56)   
    sta Temp        ; 3     (59)
    dey             ; 2     (61)
    bpl KernelLoop  ; 2/3   (63/64)

    ldy #83         ; 2     (65)

KernelLoop2
    lda (PF1Ptr2),y  ; 5     (70)
____kernel_bottom_entry
    sta PF1       ; 3     (73)   
    lda Temp        ; 3     (0)
    stx GRP1        ; 3     (3)
    sta ENAM0       ; 3     (6)
    lda (M1Ptr2),y   ; 5     (11)
    sta ENAM1       ; 3     (14)
    lda (PF0Ptr2),y  ; 5     (19)
    sta PF0         ; 3     (22)
    lda (PF2Ptr2),y  ; 5     (27*)
    sta PF2         ; 3     (30)
    lda (BallPtr2),y ; 5     (35)
    sta ENABL       ; 3     (38)
    lda (P0Ptr2),y   ; 5     (43)
    sta GRP0        ; 3     (46)
    lax (P1Ptr2),y   ; 5     (51)
    lda (M0Ptr2),y   ; 5     (56)   
    sta.a Temp        ; 4     (60)
    dey             ; 2     (62)
    bpl KernelLoop2 ; 2/3   (64/65)
    
    sta WSYNC
    lda #0
    sta GRP0
    sta ENAM0
    sta ENAM1
    sta ENABL
    sta GRP1
    sta PF0
    sta PF1
    sta PF2

 

Edit: It seems that the spoiler tag and the code tag don't play nicely together.


In reply to Karl G's post of 7/5/2020 (the kernel code above):

Super minor comment here, but it looks like you have a leftover illegal/non-standard opcode that you're not actually using the benefits of in your kernel (leftover code perhaps?): LAX (load the accumulator and the X register), specifically the "lax (P1Ptr),y" and "lax (P1Ptr2),y" instructions. Those both look like they should be LDX instructions, as you're trashing the accumulator right after each LAX instruction anyway and use X for a store later. It won't change your timing or anything, just gets rid of a couple of non-standard instructions that aren't being used anyway.

 

I do like your perfect 76 cycle timing though, very nice!


In reply to Kevin McGrath:

Are you saying I was LAX in my coding? :P

Actually I do make use of it to load A and X, then just A, and I use the X value towards the top of my loop. 
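One detail worth spelling out: the documented 6502 instruction set has no LDX (zp),Y addressing mode, so LAX is the only single-instruction way to load X through a pointer indexed by Y. A sketch of the two options, reusing the P1Ptr name from the kernel above:

```asm
; Option 1: undocumented LAX (opcode $B3), loads A and X together
    lax (P1Ptr),y   ; 5 cycles

; Option 2: documented opcodes only
    lda (P1Ptr),y   ; 5 cycles
    tax             ; 2 cycles, 7 in total
```

By my count each loop in the kernel has only one padding cycle to give back (the sta.a), so the documented version would not fit in 76 cycles without other changes.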

