Step 4 - 2 Line Kernel

Posted by SpiceWare, in Collect 03 July 2014

Let's review the TIA Timing diagram from last time:
[Image: TIA timing diagram]
 
We used that to determine when we could safely update the playfield data in order to draw the score and timer. For movable objects (player0, player1, missile0, missile1 and ball), if you update their graphics during the visible screen (cycles 23-76) you run the risk of shearing. For something that's moving fast, like the snowball in Stay Frosty 2, shearing may be an acceptable design compromise:
[Image: the sheared snowball in Stay Frosty 2]
That snowball should be square, but the left edge has sheared due to the ball object being updated mid-scanline.

To prevent shearing we need to update the objects during cycles 0-22. There are a lot of calculations to be done in the kernel to draw just one player. For Collect I'm using DoDraw, which looks like this for drawing player0:
 
DoDraw0:
        lda #HUMAN_HEIGHT-1 ; 2  2 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5  7 - Decrement HumanDraw and compare with height
        bcs DoDrawGrp0      ; 2  9 - (3 10) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 11 - otherwise use 0 to turn off player0
        .byte $2C           ; 4 15 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (HumanPtr),y to be skipped
DoDrawGrp0:                 ;   10 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 15 - load the shape for player0
        sta GRP0            ; 3 18 - update player0 to draw Human
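As a quick sanity check on the cycle column, here's a small Python tally (Python is only acting as a calculator here; it isn't part of the Collect source). The per-instruction costs are the ones in the comments above, and it assumes lda (HumanPtr),y doesn't cross a page boundary, which would add 1 cycle.

```python
# Cycle cost of each DoDraw0 instruction, as listed in the comments above.
# Assumes lda (HumanPtr),y does not cross a page (that would add 1 cycle).
COSTS = {
    "lda #imm": 2,
    "dcp zp": 5,
    "bcs not taken": 2,
    "bcs taken": 3,
    ".byte $2C (bit abs)": 4,
    "lda (zp),y": 5,
    "sta GRP0": 3,
}

# Humanoid NOT on this scanline: bcs falls through to lda #0 and the BIT skip.
NOT_ON_LINE = ["lda #imm", "dcp zp", "bcs not taken", "lda #imm",
               ".byte $2C (bit abs)", "sta GRP0"]
# Humanoid IS on this scanline: bcs jumps to DoDrawGrp0.
ON_LINE = ["lda #imm", "dcp zp", "bcs taken", "lda (zp),y", "sta GRP0"]

def total(path):
    """Sum the cycle costs of one path through DoDraw0."""
    return sum(COSTS[name] for name in path)

print(total(NOT_ON_LINE), total(ON_LINE))  # 18 18
```

Both paths cost the same, which is a side benefit of the .byte $2C trick discussed in the comments below.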
 
 
That's 18 cycles to draw a single player. One way to make it easier to fit all the code in is to use a 2 Line Kernel (2LK). In a 2LK we update TIA's registers over 2 scanlines in order to build the display.  For Collect, the current routines are updating them like this:
  • player0, playfield
  • player1, playfield
The actual code looks like this:
        ldy #ARENA_HEIGHT   ; 2  7 - the arena will be 180 scanlines (from 0-89)*2
        
ArenaLoop:                  ;   13 - from bpl ArenaLoop
    ; continuation of line 2 of the 2LK
    ; this precalculates data that's used on line 1 of the 2LK
        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5 20 - Decrement HumanDraw and compare with height
        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 24 - otherwise use 0 to turn off player0
        .byte $2C           ; 4 28 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (HumanPtr),y to be skipped
DoDrawGrp0:                 ;   23 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 28 - load the shape for player0
        sta WSYNC           ; 3 31
;---------------------------------------
    ; start of line 1 of the 2LK
        sta GRP0            ; 3  3 - @ 0-22, update player0 to draw Human
        ldx #%11111111      ; 2  5 - playfield pattern for vertical alignment testing
        stx PF0             ; 3  8 - @ 0-22
    ; precalculate data that's needed for line 2 of the 2LK        
        lda #HUMAN_HEIGHT-1 ; 2 10 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp BoxDraw         ; 5 15 - Decrement BoxDraw and compare with height
        bcs DoDrawGrp1      ; 2 17 - (3 18) if Carry is Set, then box is on current scanline
        lda #0              ; 2 19 - otherwise use 0 to turn off player1
        .byte $2C           ; 4 23 - $2C = BIT with absolute addressing, trick that
                            ;        causes the lda (BoxPtr),y to be skipped
DoDrawGrp1:                 ;   18 - from bcs DoDrawGrp1
        lda (BoxPtr),y      ; 5 23 - load the shape for the box
        sta WSYNC           ; 3 26
;---------------------------------------
    ; start of line 2 of the 2LK
        sta GRP1            ; 3  3 - @0-22, update player1 to draw box
        ldx #0              ; 2  5 - PF pattern for alignment testing
        stx PF0             ; 3  8 - @0-22
        dey                 ; 2 10 - decrease the 2LK loop counter
        bpl ArenaLoop       ; 2 12 - (3 13) branch if there's more Arena to draw
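The per-line timing can be re-checked the same way. This is a hypothetical Python helper, not part of Collect: it walks each scanline's instructions (using the not-taken branch path, which costs the same as the taken one here), records the cycle on which each TIA store completes, and checks that the stores land in the 0-22 window and that each line fits in 76 cycles. The instruction costs are copied from the listing above.

```python
LINE_CYCLES = 76   # CPU cycles per scanline on the 2600
HBLANK_LAST = 22   # last cycle of the 0-22 window for safe object/playfield updates

# (instruction, cycles) for each scanline of the 2LK, not-taken branch path,
# copied from the cycle column of the kernel listing.
LINE1 = [("sta GRP0", 3), ("ldx #imm", 2), ("stx PF0", 3),
         ("lda #imm", 2), ("dcp BoxDraw", 5), ("bcs", 2),
         ("lda #0", 2), ("bit abs", 4), ("sta WSYNC", 3)]
LINE2 = [("sta GRP1", 3), ("ldx #imm", 2), ("stx PF0", 3),
         ("dey", 2), ("bpl taken", 3),
         # ArenaLoop: precalculate player0's data for the next line 1
         ("lda #imm", 2), ("dcp HumanDraw", 5), ("bcs", 2),
         ("lda #0", 2), ("bit abs", 4), ("sta WSYNC", 3)]

def check_line(line):
    """Return (cycle when sta WSYNC completes, {TIA store: cycle it completes on})."""
    cycle, stores = 0, {}
    for name, cost in line:
        cycle += cost
        if name in ("sta GRP0", "sta GRP1", "stx PF0"):
            stores[name] = cycle
    return cycle, stores

for label, line in (("line 1", LINE1), ("line 2", LINE2)):
    end, stores = check_line(line)
    assert end <= LINE_CYCLES, label
    assert all(c <= HBLANK_LAST for c in stores.values()), label
    print(label, end, stores)
```

The large gap between cycle 8 and the 22 limit is the room the precalculation frees up for the objects that will be added later.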
 
 
If you look at that closely, you'll see I'm splitting DoDraw a bit so that this is how the 2LK works:
  • updates player0, playfield, precalc player1 for line 2
  • updates player1, playfield, precalc player0 for line 1
By pre-calculating data during the visible portion of the scanline, we'll have more time during the critical 0-22 cycles for when we add the other objects.

Since we're updating the players on every other scanline, each byte of graphic data is displayed twice (compare the thickness of the humanoid pixels with the red lines drawn with the playfield). Also, the players never line up as they're never updated on the same scanlines:
[Image: Collect using the 2LK; the players never line up]
Closeup:
[Image: closeup of the same screen]
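To see why each byte is doubled and why the players end up one scanline apart, here's a tiny Python model of the order in which the 2LK writes the player registers, assuming no Vertical Delay. The row lists are hypothetical stand-ins for the bytes DoDraw would load for each value of the loop counter Y, with both players given the same Y.

```python
def two_line_kernel(rows0, rows1):
    """Return (GRP0, GRP1) as seen on each scanline, top to bottom.

    rows0/rows1 hold the byte each player gets for each value of the loop
    counter Y; the kernel counts Y down, so the last entry is drawn first.
    """
    grp0 = grp1 = 0
    scanlines = []
    for y in range(len(rows0) - 1, -1, -1):
        grp0 = rows0[y]                  # line 1 of the 2LK: sta GRP0
        scanlines.append((grp0, grp1))
        grp1 = rows1[y]                  # line 2 of the 2LK: sta GRP1
        scanlines.append((grp0, grp1))
    return scanlines

# The same 2-byte shape (values 2 then 1) at the same Y for both players.
shape = [0, 0, 1, 2, 0]
lines = two_line_kernel(shape, shape)
print([p0 for p0, _ in lines])  # [0, 0, 2, 2, 1, 1, 0, 0, 0, 0]
print([p1 for _, p1 in lines])  # [0, 0, 0, 2, 2, 1, 1, 0, 0, 0]
```

Each byte shows up on two consecutive scanlines, and player1's copy starts one scanline lower than player0's.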

The designers of TIA planned for this by adding a Vertical Delay feature to the players and ball (though sadly not the missiles). The TIA registers for this are VDELP0, VDELP1 and VDELBL. For this update to Collect, I've tied the Vertical Delay to the difficulty switches: putting a switch in position A turns on the delay for that player, so we can experiment with how it works. For the next update I'll set the Vertical Delay based on the Y position of the player (this also means the maximum Y value will be double that of this build).

Left Difficulty A, Right Difficulty B, so VDELP0 = 1 and VDELP1 = 0. Sprites line up when they have the same Y:
[Image: Collect with VDELP0 = 1 and VDELP1 = 0]
Closeup:
[Image: closeup of the same screen]

Left Difficulty B, Right Difficulty A, so VDELP0 = 0 and VDELP1 = 1. Sprites line up when player1's Y = player0's Y + 1:
[Image: Collect with VDELP0 = 0 and VDELP1 = 1]
Closeup:
[Image: closeup of the same screen]


The code that preps the data used by DoDraw looks like this:
        ; HumanDraw = ARENA_HEIGHT + HUMAN_HEIGHT - Y position
        lda #(ARENA_HEIGHT + HUMAN_HEIGHT)
        sec
        sbc ObjectY
        sta HumanDraw
        
        ; HumanPtr = HumanGfx + HUMAN_HEIGHT - 1 - Y position
        lda #<(HumanGfx + HUMAN_HEIGHT - 1)
        sec
        sbc ObjectY
        sta HumanPtr
        lda #>(HumanGfx + HUMAN_HEIGHT - 1)
        sbc #0
        sta HumanPtr+1
        
        ; BoxDraw = ARENA_HEIGHT + HUMAN_HEIGHT - Y position
        lda #(ARENA_HEIGHT + HUMAN_HEIGHT)
        sec
        sbc ObjectY+1
        sta BoxDraw
        
        ; BoxPtr = HumanGfx + HUMAN_HEIGHT - 1 - Y position
        lda #<(HumanGfx + HUMAN_HEIGHT - 1)
        sec
        sbc ObjectY+1
        sta BoxPtr
        lda #>(HumanGfx + HUMAN_HEIGHT - 1)
        sbc #0
        sta BoxPtr+1

...
HumanGfx:
        .byte %00011100
        .byte %00011000
        .byte %00011000
        .byte %00011000
        .byte %01011010
        .byte %01011010
        .byte %00111100
        .byte %00000000
        .byte %00011000
        .byte %00011000
HUMAN_HEIGHT = * - HumanGfx        
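Here's a minimal Python model of how the prep values and the kernel fit together, for player0 only. It assumes ARENA_HEIGHT = 89 (taken from the "0-89" kernel comment) and puts HumanGfx at $FA50, the address DASM assigned it in this build's listing; the function names are made up for the illustration. It mirrors the 2-byte subtraction the prep code does, then runs the Y loop and records which HumanGfx byte lda (HumanPtr),y would fetch on each line where the sprite is drawn.

```python
ARENA_HEIGHT = 89       # assumption: loop counter runs 89..0 (the "0-89" comment)
HUMAN_GFX = 0xFA50      # address of HumanGfx in this build's DASM listing
HUMAN_HEIGHT = 10       # number of bytes in HumanGfx

def prep(object_y):
    """Return (HumanDraw, HumanPtr) the way the prep code computes them."""
    human_draw = (ARENA_HEIGHT + HUMAN_HEIGHT - object_y) & 0xFF
    base = HUMAN_GFX + HUMAN_HEIGHT - 1
    low = (base & 0xFF) - object_y           # sec / sbc ObjectY on the low byte
    borrow = 1 if low < 0 else 0
    high = ((base >> 8) - borrow) & 0xFF     # sbc #0 only subtracts the borrow
    return human_draw, (high << 8) | (low & 0xFF)

def kernel(object_y):
    """Map loop counter Y -> offset into HumanGfx, for the lines that draw."""
    human_draw, human_ptr = prep(object_y)
    drawn = {}
    for y in range(ARENA_HEIGHT, -1, -1):
        human_draw = (human_draw - 1) & 0xFF     # dcp: decrement...
        if HUMAN_HEIGHT - 1 >= human_draw:       # ...and compare; bcs taken
            address = (human_ptr + y) & 0xFFFF   # lda (HumanPtr),y
            drawn[y] = address - HUMAN_GFX
            assert drawn[y] == human_draw        # pointer and counter agree
    return drawn

rows = kernel(40)
print(sorted(rows))        # [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
print(rows[40], rows[31])  # 9 0
```

The first line drawn (Y = 40) fetches the last byte of HumanGfx, and the last line drawn fetches the first byte, which is why the graphics are stored upside down.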
 
 
The graphics are much easier to see using my mode file for jEdit:
[Image: HumanGfx in jEdit using the mode file]

I'm sure some of you are wondering why the human graphics are upside down. If you wanted to loop thru something 10 times, you'd normally think to write the code like this:
ldy #0
Loop:
  ; do some work
  iny
  cpy #10
  bne Loop
 
 
But on the 6507, dey automatically updates the zero flag (as well as the negative flag) based on the new value of Y, which lets you save 2 cycles per iteration by eliminating the CPY instruction:
ldy #10
Loop:
  ; do some work
  dey
  bne Loop
 
 
Alternatively, if your initial value is less than 128, you can use this:
ldy #(10-1)
Loop:
  ; do some work
  dey
  bpl Loop
 
 
Making the loop count down instead of up saves 2 cycles per iteration, but doing so requires the graphics to be stored upside down. 2 cycles doesn't sound like much, but in a 76-cycle scanline that's 2.6% of your processing time, and saving it might be what allows you to update everything you want. In kernels I've written, I often use every cycle, and that includes eliminating the sta WSYNC to buy back 3 cycles of processing time. See the reposition kernels in this post about Draconian.
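Here's a small Python model of the three loop shapes above (illustrative only; the function names are made up). It tracks Y as an 8-bit value and stops on the same flag each version relies on, confirming each loop runs 10 times and showing which Y values the loop body sees. Note that the bpl version runs with Y = 9 down to 0, which is handy when Y doubles as an index.

```python
def count_up():
    """ldy #0 / work / iny / cpy #10 / bne Loop"""
    y, seen = 0, []
    while True:
        seen.append(y)                     # do some work
        y = (y + 1) & 0xFF                 # iny
        if (y - 10) & 0xFF == 0:           # cpy #10 sets Z; bne falls through
            return seen

def count_down_bne():
    """ldy #10 / work / dey / bne Loop: dey sets Z by itself"""
    y, seen = 10, []
    while True:
        seen.append(y)
        y = (y - 1) & 0xFF                 # dey
        if y == 0:                         # Z set, bne falls through
            return seen

def count_down_bpl():
    """ldy #(10-1) / work / dey / bpl Loop: dey sets N by itself"""
    y, seen = 9, []
    while True:
        seen.append(y)
        y = (y - 1) & 0xFF                 # dey
        if y & 0x80:                       # N set ($FF after 0), bpl falls through
            return seen

# Loop overhead per iteration with the branch taken (3 cycles)
up_overhead = 2 + 2 + 3                    # iny + cpy #imm + bne
down_overhead = 2 + 3                      # dey + bne/bpl
saved = up_overhead - down_overhead
print(saved, round(100 * saved / 76, 1))   # 2 2.6
```

The 76 in the last line is the number of CPU cycles in one scanline.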
 
I've also added joystick support that lets you move the players around. Pressing FIRE slows down the movement, making it easier to line things up. The score (on the left) displays player0's Y position, and the timer displays player1's. As an added bonus, I'm showing how you can save ROM space by creating graphics that only face in one direction and using REFP0 and REFP1 (REFlect Player) to make them face the other way. The routine's fairly sizable, so I'm not posting it here; download the source code and check it out!

ROM
collect_20140703.bin (2KB)

Source
Collect_20140703.zip (43.2KB)




Is there a different way to line up missiles, since they don't have the vertical delay feature? If I wanted to, say, make a 2-player Bomberman game and use the bombs as missiles, or maybe something like Smash TV with the bullets being missiles, what could be done?


There are only so many cycles per line, so if you want to update both missiles on every line you'll have to drop something else. As an example of such a compromise, in Stay Frosty I used a reflected playfield and dropped the updates to PF0. That's why the upper-level ice blocks and platforms never go to the edges of the screen. You can see that in this blog post, where I compare Stay Frosty with Stay Frosty 2.

 

So how did I get Stay Frosty 2 to go the full width?  By using an in-cartridge coprocessor like they did back in the day for Pitfall II.   The coprocessor is known as DPC+, but that's beyond the level of a beginner course like Collect.  After you've finished working thru the Collect blog entries, go check out the Harmony DPC+ programming topic.


Thanks. I'm taking my time making sure I grasp everything going on. So in ArenaLoop you use dcp followed by bcs. Could you clarify what flag is being set? I assume it's the carry flag, but that seems strange for an instruction that compares for equality. Although if they're equal, that just means HumanDraw is 1 greater than the height, so I guess it could make sense for that to go into the carry flag. Also, after that you use .byte $2C to skip a few bytes instead of using a branch (to save those bytes). What would you need to change in the bit pattern to make it skip more or fewer bytes?

 

Thanks!

I'm going to do this as 2 replies.  This reply is for DCP.
 
The 6502 was designed with 151 opcodes, which are known by their mnemonics of LDA, STA, etc. There are 256 possible opcodes, so the remaining 105 were undefined.
 
Over time, people figured out that some of the undefined opcodes did really useful things and assigned them names. Some that are commonly used in 2600 development are DCP, LAX and SAX. You can see a list of them here under the section titled Illegal opcodes. Do note that some of them are unstable, and thus shouldn't be used. If you'd like more information, check this document, How MOS 6502 Illegal Opcodes really work. Also note that these opcodes are known interchangeably as illegal opcodes or undefined opcodes.
 
DCP is so named because it merges the DEC and CMP opcodes. Basically, this bit of code, which takes 10 cycles to run:
        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dec HumanDraw       ; 5 20 - Decrement HumanDraw by 1
        cmp HumanDraw       ; 3 23 - Compare HumanDraw with height
 
does exactly the same thing as this bit of code, which takes only 7 cycles to run:
        lda #HUMAN_HEIGHT-1 ; 2 15 - height of the humanoid graphics, subtract 1 due to starting with 0
        dcp HumanDraw       ; 5 20 - Decrement HumanDraw and compare with height
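A small Python model (not cycle-accurate; the function names are hypothetical) makes the equivalence concrete. DEC sets N and Z from the decremented memory value, but the CMP that follows overwrites N, Z and C, so for every combination of A and memory the end state matches DCP, and A itself is never changed.

```python
def flags_from_compare(a, m):
    """N, Z, C as CMP sets them: C = no borrow, i.e. A >= M unsigned."""
    diff = (a - m) & 0xFF
    return {"N": bool(diff & 0x80), "Z": diff == 0, "C": a >= m}

def dec_then_cmp(a, m, flags):
    m = (m - 1) & 0xFF                                   # DEC: memory - 1
    flags = dict(flags, N=bool(m & 0x80), Z=m == 0)      # DEC sets N and Z...
    flags.update(flags_from_compare(a, m))               # ...CMP overwrites N, Z, C
    return a, m, flags

def dcp(a, m, flags):
    m = (m - 1) & 0xFF                                   # decrement memory...
    return a, m, dict(flags, **flags_from_compare(a, m)) # ...then compare with A

for carry_in in (False, True):
    start = {"N": False, "Z": False, "C": carry_in}
    assert all(dec_then_cmp(a, m, start) == dcp(a, m, start)
               for a in range(256) for m in range(256))
```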
 
The 3-cycle savings is very handy when writing 2600 code.

This reply is for .byte $2C
 
This is a 6502 trick I learned back in the 80s on my VIC-20. It's a space-saving trick to skip over a 2-byte instruction.
 
If you take a look at that opcode matrix again, you'll see that $2C is the BIT abs opcode that takes 4 cycles to execute.  The abs means 2 bytes follow the opcode to specify an absolute address.  

If the bcs is taken, the 6507 skips over the lda #0 and .byte $2c and runs the code like this:
        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
DoDrawGrp0:                 ;   23 - from bcs DoDrawGrp0
        lda (HumanPtr),y    ; 5 28 - load the shape for player0
        sta WSYNC           ; 3 31
 
If the bcs is not taken, the 6507 runs the code like this:
        bcs DoDrawGrp0      ; 2 22 - (3 23) if Carry is Set, then humanoid is on current scanline
        lda #0              ; 2 24 - otherwise use 0 to turn off player0
        bit $93b1           ; 4 28 - the $2C, read together with the 2 bytes of lda (HumanPtr),y
        sta WSYNC           ; 3 31
 
Looking at the listing created by DASM, you'll see the lda (HumanPtr),y instruction is compiled like this:
    324  f8d7		       b1 93		      lda	(HumanPtr),y	; 5 28 - load the shape for player0
 
The bytes b1 93 become the operand of the BIT instruction; since the 6502 is little-endian, they're read as the address $93b1.
 
There are two reasons for using the $2C trick. First, it saves ROM space: the .byte $2C is 1 byte, where a jmp over the lda would be 3. Second, the code takes the same number of cycles to execute whether or not the branch is taken (both paths reach cycle 28 in the listings above). When writing a kernel, having consistent execution time is often critical. For this kernel, due to the use of sta WSYNC, the timing is not critical, though we're happy to get the space savings.
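To make the skipping concrete, here's a tiny Python decoder for just the opcodes involved (a sketch, not an emulator; the function names are made up). It walks the bytes from each entry point and shows that both paths arrive at the same sta. It also covers the 1-byte variant asked about: $24 is BIT zero page (2 bytes, 3 cycles), so it hides a single following byte, while $2C hides two. Either way, the hidden bytes become an address that BIT reads, and BIT changes the N, V and Z flags (A is untouched), so only use the trick where those flags don't matter.

```python
# A few 6502 opcodes: opcode -> (mnemonic, length in bytes, cycles)
OPCODES = {
    0xB0: ("bcs", 2, 2),          # +1 cycle when the branch is taken
    0xA9: ("lda #imm", 2, 2),
    0x2C: ("bit abs", 3, 4),      # hides the next 2 bytes
    0x24: ("bit zp", 2, 3),       # hides the next 1 byte
    0xB1: ("lda (zp),y", 2, 5),   # +1 cycle on a page crossing (ignored here)
    0x8A: ("txa", 1, 2),
    0x85: ("sta zp", 2, 3),
}

def run(code, carry, start=0):
    """Decode from `start` to the first sta; return (mnemonics, cycles before it)."""
    pc, cycles, seen = start, 0, []
    while True:
        name, length, cost = OPCODES[code[pc]]
        seen.append(name)
        if name == "sta zp":
            return seen, cycles
        cycles += cost
        if name == "bcs" and carry:
            cycles += 1
            pc += length + code[pc + 1]   # forward offset, from the next instruction
        else:
            pc += length

# bcs DoDrawGrp0 / lda #0 / .byte $2C / DoDrawGrp0: lda (HumanPtr),y / sta WSYNC
code = [0xB0, 0x03, 0xA9, 0x00, 0x2C, 0xB1, 0x93, 0x85, 0x02]
print(run(code, carry=False))  # (['bcs', 'lda #imm', 'bit abs', 'sta zp'], 8)
print(run(code, carry=True))   # (['bcs', 'lda (zp),y', 'sta zp'], 8)
print(hex(code[5] | code[6] << 8))  # 0x93b1

# One byte shorter: $24 (bit zp) hides a 1-byte instruction such as txa.
skip1 = [0x24, 0x8A, 0x85, 0x02]
print(run(skip1, carry=False))           # (['bit zp', 'sta zp'], 3)
print(run(skip1, carry=False, start=1))  # (['txa', 'sta zp'], 2)
```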
 
This blog entry, 6502 Assembly - .BYTE $2C - Insane Coding Trick, by Johnny Star may also help explain the use of .byte $2C.

In case it's not clear how the delay works:

  • If VDELP0 is on, any updates to GRP0 are delayed until GRP1 is written to.
  • If VDELP1 is on, any updates to GRP1 are delayed until GRP0 is written to.
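Here's a simplified Python model of that rule (it ignores VDELBL and everything else on the TIA; the class and function names are made up). Each player register keeps a "new" copy that a write updates and an "old" copy that gets refreshed when the other player's register is written. Fed with the 2LK's write order, it reproduces the two difficulty-switch results from the post: with VDELP0 on, the players line up at the same Y, and with VDELP1 on, player1 needs a Y one higher.

```python
class Players:
    """GRP0/GRP1 with TIA-style "new" and "old" copies (simplified model)."""

    def __init__(self, vdelp0=False, vdelp1=False):
        self.vdelp0, self.vdelp1 = vdelp0, vdelp1
        self.new0 = self.old0 = self.new1 = self.old1 = 0

    def write_grp0(self, value):
        self.new0 = value
        self.old1 = self.new1       # a GRP0 write releases the delayed GRP1

    def write_grp1(self, value):
        self.new1 = value
        self.old0 = self.new0       # a GRP1 write releases the delayed GRP0

    def shown(self):
        p0 = self.old0 if self.vdelp0 else self.new0
        p1 = self.old1 if self.vdelp1 else self.new1
        return p0, p1

def run_2lk(rows0, rows1, vdelp0, vdelp1):
    """Apply the 2LK's write order: GRP0 on line 1, GRP1 on line 2."""
    tia = Players(vdelp0, vdelp1)
    lines = []
    for y in range(len(rows0) - 1, -1, -1):   # the kernel counts Y down
        tia.write_grp0(rows0[y])
        lines.append(tia.shown())
        tia.write_grp1(rows1[y])
        lines.append(tia.shown())
    return lines

def aligned(lines):
    return all(p0 == p1 for p0, p1 in lines)

same_y = [0, 0, 1, 2, 0, 0]
one_higher = [0, 0, 0, 1, 2, 0]   # the same shape, one Y value higher

print(aligned(run_2lk(same_y, same_y, False, False)))      # False
print(aligned(run_2lk(same_y, same_y, True, False)))       # True
print(aligned(run_2lk(same_y, one_higher, False, True)))   # True
```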

Good Morning,

 

I'm having a difficult time distinguishing between each Human variable. Could you please explain?

 

Does HumanDraw = the Y position of the player on the screen? Why decrement it in the scan loop?

Does HumanPtr = the line of the player's graphic to be drawn now?

HUMAN_HEIGHT = ???

 

Also I don't understand this syntax.  The asterisk mainly:

HUMAN_HEIGHT = * - HumanGfx

 

Thanks!

Been out of town, so just a quick response on HUMAN_HEIGHT.  Will follow up on the rest later.
 
 
I always compile with the -s option to have dasm generate a symbol file.  Open up collect.sym from the zip and you'll find:
 
 
HUMAN_HEIGHT             000a
 
All the symbol values are in hex, so HUMAN_HEIGHT has a value of 10 in decimal.
 
I also compile with the -l option to generate the listing.  Open up collect.lst from the zip and you'll find:
 
 
    722  fa50    HumanGfx
    723  fa50        1c       .byte.b %00011100
    724  fa51        18       .byte.b %00011000
    725  fa52        18       .byte.b %00011000
    726  fa53        18       .byte.b %00011000
    727  fa54        5a       .byte.b %01011010
    728  fa55        5a       .byte.b %01011010
    729  fa56        3c       .byte.b %00111100
    730  fa57        00       .byte.b %00000000
    731  fa58        18       .byte.b %00011000
    732  fa59        18       .byte.b %00011000
    733  fa59        00 0a    HUMAN_HEIGHT = * - HumanGfx
    734  fa5a
 
The * denotes the current program counter (location in ROM). It's used as a synonym for "." (a period). From dasm's instruction file, dasm.txt:

. -current program counter (as of the beginning of the instruction).

* -synonym for ., when not confused as an operator.


The listing is a little deceptive - the last byte value of 18 is located at fa59, so the * in the equation has a value of fa5a, not fa59 as the listing suggests.
 
HUMAN_HEIGHT = * - HumanGfx
HUMAN_HEIGHT = fa5a - fa50
HUMAN_HEIGHT = 000a
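A toy Python model of what the assembler is doing here (a sketch, not how dasm is implemented): a label records the program counter at that point, each .byte advances it by 1, and * - HumanGfx is simply the difference. Adding or removing rows of graphics changes HUMAN_HEIGHT automatically.

```python
HUMAN_GFX_BYTES = [0b00011100, 0b00011000, 0b00011000, 0b00011000, 0b01011010,
                   0b01011010, 0b00111100, 0b00000000, 0b00011000, 0b00011000]

def assemble(origin, data):
    """Return (program counter after the data, symbols) for this snippet."""
    symbols = {"HumanGfx": origin}          # a label = program counter at that point
    pc = origin
    for _ in data:
        pc += 1                             # each .byte emits 1 byte, advancing *
    symbols["HUMAN_HEIGHT"] = pc - symbols["HumanGfx"]   # HUMAN_HEIGHT = * - HumanGfx
    return pc, symbols

pc, symbols = assemble(0xFA50, HUMAN_GFX_BYTES)
print(hex(pc), symbols["HUMAN_HEIGHT"])     # 0xfa5a 10
```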
 
Basically I'm letting dasm calculate the size of the image. I do that because I usually start projects with placeholder graphics that are later replaced with images created by graphic artists such as Nathan Strum and David Vazquez. Do note that those lists of projects they've contributed to should be larger; for instance, Nathan did the graphics for Space Rocks, but that game's not yet in the database.

Yes, HumanDraw is the Y position and then some. Remember that TIA is scanline based, so we need a way to programmatically determine if the sprite is to be drawn over a number of consecutive scanlines. The DCP is a thrifty (fast) way to do that.
 
Splitting the DCP into its components, we're running this bit of code on every scanline to determine if the sprite is drawn:
  lda #10-1 ; the height of the sprite, minus 1 (HUMAN_HEIGHT-1 in the kernel)
  dec HumanDraw
  cmp HumanDraw
  bcs DrawSprite ; if 'C'arry is set, the sprite is on this scanline
  lda #0 ; sprite not on this scanline, so use 0 to blank it out
  jmp UpdateTIA ; the .byte $2c trick is equivalent to this
DrawSprite:
  lda (HumanPtr),y ; fetch the shape for this particular scanline
UpdateTIA:
  sta GRP0
 
During the compare the Carry flag will be set if HumanDraw < 10.  So for 10 lines, when HumanDraw has the value from 0 to 9, the sprite data will be loaded into A.  All other times the 0 is loaded into A in order to blank out the sprite.
 
To have the sprite start drawing on the 10th scanline, we set HumanDraw to 19.  Due to the decrement BEFORE the compare, the code run on each scanline will check the following values of HumanDraw:
  • line 1 - is 18 < 10 - nope, blank the sprite
  • line 2 - is 17 < 10 - nope, blank the sprite
  • ...
  • line 9 - is 10 < 10 - nope, blank the sprite
  • line 10 - is 9 < 10 - yep, draw the sprite
  • line 11 - is 8 < 10 - yep, draw the sprite
  • ...
  • line 18 - is 1 < 10 - yep, draw the sprite
  • line 19 - is 0 < 10 - yep, draw the sprite
  • line 20 - is 255 < 10 - nope, blank the sprite*
  • line 21 - is 254 < 10 - nope, blank the sprite
  • ...
* When dealing with unsigned 1-byte values, 0 - 1 = 255, or in hex, $00 - $01 = $ff.
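The table above can be reproduced with a few lines of Python (a sketch; the names are made up). It applies the decrement before the compare, uses 8-bit wraparound, and treats the line as drawn when the compare leaves Carry set, i.e. when 9 >= HumanDraw.

```python
SPRITE_HEIGHT = 10

def visible_lines(human_draw, total_lines):
    """Yield (line, HumanDraw after the decrement, drawn?) for each scanline."""
    for line in range(1, total_lines + 1):
        human_draw = (human_draw - 1) & 0xFF      # dec HumanDraw (wraps 0 -> 255)
        drawn = SPRITE_HEIGHT - 1 >= human_draw   # carry set when 9 >= HumanDraw
        yield line, human_draw, drawn

table = list(visible_lines(19, 21))
print([line for line, _, drawn in table if drawn])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(table[19])                                    # (20, 255, False)
```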
And for the final part of the question: yes, HumanPtr is used to point to the line of graphics to be drawn now.

The tricky part is that it's too time-consuming to adjust HumanPtr in the kernel like we do HumanDraw, so we have to use the Y register to select the proper line of graphics for each scanline. However, Y won't be 0-9 while the sprite is being drawn; it'll be another range of 10 numbers in sequence, so we need to adjust the value of HumanPtr to compensate.

In other words, we need HumanPtr + Y to be equal to HumanGfx + HUMAN_HEIGHT - 1 when we hit the very first scanline for "draw the sprite" (as denoted in the prior reply).
 
Since Y is counting downward in the kernel, HumanPtr + Y will equal HumanGfx on the last "draw the sprite" scanline.
 
Hope these three replies clear it up for you!

Thanks! It took me quite a few reads to digest it.

 

I can follow it.. Don't ask me to recite it though!

 

So:

DASM knows that * - HumanGfx is the "distance" from where HumanGfx started, making HUMAN_HEIGHT equal to that number of positions, right?

 

One other thing.. I understand setting the carry flag, but why does DCP set the carry flag here? What's the logic of that? Is it just something you know or does it make sense somehow? I don't see what's carrying over..

That's exactly right on * - HumanGfx.
 

The DCP instruction is really DEC and CMP; we use it because it's a few cycles faster than using the "legal" instructions. From 6502.org Tutorials: Compare Instructions:

The CMP, CPX, and CPY instructions are used for comparisons as their mnemonics suggest. The way they work is that they perform a subtraction. In fact,

    CMP NUM
 
is very similar to:
    SEC
    SBC NUM
 
Both affect the N, Z, and C flags in exactly the same way. However, unlike SBC, (a) the CMP subtraction is not affected by the D (decimal) flag, (b) the accumulator is not affected by a CMP, and (c) the V flag is not affected by a CMP. A useful property of CMP is that it performs an equality comparison and an unsigned comparison. After a CMP, the Z flag contains the equality comparison result and the C flag contains the unsigned comparison result, specifically:
  • If the Z flag is 0, then A <> NUM and BNE will branch
  • If the Z flag is 1, then A = NUM and BEQ will branch
  • If the C flag is 0, then A (unsigned) < NUM (unsigned) and BCC will branch
  • If the C flag is 1, then A (unsigned) >= NUM (unsigned) and BCS will branch
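As for what's "carrying over": in binary mode the 6502 subtracts by adding the one's complement of the operand plus the carry, so SEC / SBC NUM computes A + (255 - NUM) + 1. A carry out of bit 7 means no borrow was needed, which happens exactly when A >= NUM. Here's a Python sketch of that (function names are made up), checked against the rules above for every A and NUM, plus the DoDraw case where A = HUMAN_HEIGHT-1 = 9.

```python
def sbc_with_carry_set(a, m):
    """SEC then SBC m (binary mode): A + ~m + 1, carry = bit 8 of the sum."""
    total = a + (~m & 0xFF) + 1
    return total & 0xFF, total > 0xFF        # (8-bit result, carry out)

def cmp_flags(a, m):
    result, carry = sbc_with_carry_set(a, m)  # CMP does the same subtraction
    return {"N": bool(result & 0x80), "Z": result == 0, "C": carry}

for a in range(256):
    for m in range(256):
        f = cmp_flags(a, m)
        assert f["C"] == (a >= m)            # carry out = no borrow = A >= M
        assert f["Z"] == (a == m)

# DoDraw: A = HUMAN_HEIGHT-1 = 9, so bcs branches for HumanDraw 0..9
print([m for m in range(256) if cmp_flags(9, m)["C"]])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```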

Thanks for your explanation of DoDraw; now I finally get how this works.

 

One question though: when preparing the HumanPtr pointer by subtracting the Y position of the player, I think it's possible it ends up pointing to an address in the page before the one where the gfx is located. Depending on the player's Y position, that can sometimes result in an extra cycle when doing lda (HumanPtr),y.

 

So to be cycle-exact every time, you probably need to place the player gfx somewhere at the end of a page, right? 


You are correct! Surprisingly, I'd not run into any problems due to that, most likely because there's enough slack in the kernel that an extra cycle didn't cause a problem.
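Here's a Python sketch for checking a placement, under stated assumptions: ARENA_HEIGHT = 89, Y positions limited to 0-89, and two hypothetical addresses for the graphics table (neither is claimed to be where this build puts HumanGfx). It applies the same HumanPtr formula as the prep code and reports every Y position for which a drawn line's lda (HumanPtr),y would cross a page. In this model, crossings are avoided when the table sits high enough in its page that HumanPtr never borrows into the previous page, which matches the suggestion above.

```python
def page_crossing_ys(gfx, height, arena_height, max_y):
    """Y positions whose drawn lines make lda (ptr),y cross a page boundary."""
    bad = []
    for obj_y in range(max_y + 1):
        ptr = (gfx + height - 1 - obj_y) & 0xFFFF      # the HumanPtr prep value
        for y in range(arena_height + 1):              # kernel loop counter
            addr = (ptr + y) & 0xFFFF
            drawn = gfx <= addr < gfx + height         # only drawn lines run the lda
            if drawn and (addr >> 8) != (ptr >> 8):    # high byte changed: +1 cycle
                bad.append(obj_y)
                break
    return bad

# Hypothetical placements, assuming ARENA_HEIGHT = 89 and Y limited to 0..89
print(page_crossing_ys(0xFA60, 10, 89, 89))        # []
print(page_crossing_ys(0xFA20, 10, 89, 89)[:3])    # [42, 43, 44]
```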
