This session we're going to have a preliminary look at vertical movement of sprites.
In the previous sessions we have seen that there are two 8-pixel wide sprites, each represented by a single 8-bit register in the TIA itself. The TIA displays the contents of the sprite registers at the same horizontal position on each scanline, corresponding to where on an earlier scanline the RESP0 or RESP1 register was toggled. We explored how to use this knowledge to develop some generic "position at horizontal pixel x" code which greatly simplified the movement of sprites in a horizontal direction.
Instead of having to work with the odd RESPx timing, we have abstracted that aspect of the hardware and now reference the sprite position through a variable in RAM, and our code positions the sprite to the pixel number indicated by this variable.
Let's now have a look at how to position a sprite vertically.
Our examples so far have shown how sprites appear as a vertical strip the entire height of the screen. This is due, of course, to the single byte of sprite data (8 bits = 8 pixels) being duplicated by the TIA (for each sprite) on each scanline. If we change the data held in the TIA sprite graphics registers (ie: in GRP0 or GRP1), then the next time the TIA draws the relevant sprite, we see a change in the shape that the TIA draws on-screen. We still see 8 pixels, directly under the 8 pixels of the same sprite on the previous scanline - but if we've changed the relevant GRPx register then we will see different pixels on (solid) and different pixels off (transparent).
To achieve vertical movement of "a sprite" - and by this, we mean a recognisable shape like a balloon, for example - we need to modify the data that we are writing to the GRPx register. When we're on scanlines where the shape is not visible, then we should be writing 0 to the GRPx register - and when on scanlines where the shape is visible, we should be writing the appropriate line of that shape to the GRPx register. Conceptually, it's quite simple. Doing this quickly, and with little RAM or ROM usage, is the tricky bit.
There are several ways to tackle the problem of writing the right line of the shape on the right line of the screen, and nothing when the shape isn't on the line we're drawing. Some of them take extra ROM, some require more RAM, and some of them require more cycles per line.
Most kernels keep one of the registers as a "line counter" for use in indexing into tables of data for playfield graphics - so that the correct line of data is placed in the graphics registers for each scanline. The kernels we've created so far also use this line counter to determine when we have done sufficient lines in our kernel. For example...
    ldx #0                  ;2
Kernel
    lda PF0Table,x          ;4
    sta PF0                 ;3
    lda PF1Table,x          ;4
    sta PF1                 ;3
    lda PF2Table,x          ;4
    sta PF2                 ;3
    sta WSYNC               ;3
    inx                     ;2
    cpx #192                ;2
    bne Kernel              ;3(2)
The above code segment shows a loop which iterates the X register from 0 to 191 (the loop exits when X reaches 192) while it writes three playfield registers on each of the scanlines it 'generates'. We've covered all of this in previous sessions. The numbers after the semicolon (the comment area) indicate the number of cycles that instruction will take (not taking into account possible page-boundary crossing, etc). We can see that this simple symmetrical playfield modification will take at least 31 cycles of our available 76 cycles just to do the three playfield registers on each scanline. That leaves only 45 cycles to do sprites, missiles, ball - and let's not forget the other three playfield writes if we're doing an asymmetrical playfield.
Clearly, our scanline loop is extremely starved of cycles, and any code we put in there must be extremely efficient. The biggest waste in the code above is the comparison. Remember earlier we indicated that the 6502 has a flags register, and some of these flags are set/cleared automatically after certain operations (on loads and arithmetic operations - including register increments and decrements, the negative and zero flags are automatically set/cleared). From now on we're going to use the 'standard' way of looping and instead of specifically comparing a line count with a desired value (eg: counting up to 192), we'll switch to starting at our top value and decrementing the line counter and branching UNTIL the counter gets to 0. By using our knowledge about the automatic flag setting, we are able to remove the comparison from our loop...
    ldx #192                ;2
Kernel
    lda PF0Table,x          ;4
    sta PF0                 ;3
    lda PF1Table,x          ;4
    sta PF1                 ;3
    lda PF2Table,x          ;4
    sta PF2                 ;3
    sta WSYNC               ;3
    dex                     ;2
    bne Kernel              ;3(2)
The trick here is that the "dex" instruction will set the Z (zero) flag to 1 if the x register is zero after the instruction has executed, and 0 if it is non-zero. The "bne" instruction stands for "branch if Z is zero" or more memorably "branch if the result was not equal (to zero)". In short, the branch will be taken if the x register is non-zero. Thus we have removed two cycles from our inner scanline loop. But at what cost? Since the loop is now counting "down" instead of "up", our tables will now be accessed upside-down (that is, the first scanline will show data from the bottom of the tables), and our whole playfield will "flip" upside-down. That's fine - the solution for this is to change the tables themselves so they are upside-down, too!
All of that was a bit of a diversion - but it's important to understand how we are accessing our data in an upside-down fashion merely for the purposes of efficiency - in this case, saving us just 2 cycles per scanline. But those 2 cycles are some 2.6% of the time we have, and every little bit counts.
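To make the "upside-down table" idea concrete, here is a minimal sketch using a made-up 4-line table. The names DemoKernel, PF1TableDemo and DEMO_LINES are assumptions for illustration only, as is the "-1" offset, which is one possible way of mapping a counter that runs 4..1 onto table bytes 3..0.

```asm
; Sketch only: a hypothetical 4-line count-down loop and its table.
DEMO_LINES = 4

    ldx #DEMO_LINES         ;2
DemoKernel
    lda PF1TableDemo-1,x    ;4  X runs 4,3,2,1 -> table bytes 3,2,1,0
    sta PF1                 ;3
    sta WSYNC               ;3
    dex                     ;2  sets Z when X reaches 0
    bne DemoKernel          ;3(2)
    ; ...the rest of the frame would continue here...

; The table lives elsewhere in ROM, away from executing code.
; Because the counter runs downward, the TOP scanline is stored LAST.
PF1TableDemo
    .byte %11111111         ; bottom scanline (read when X = 1)
    .byte %10000001
    .byte %10000001
    .byte %11111111         ; top scanline    (read when X = 4)
```

Note that in a count-down loop of this form the counter never has the value 0 inside the loop body, so without an offset like the "-1" above, the table byte at index 0 is never displayed.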
Even with this improvement, we have just 47 cycles left to do everything else. Let's have a look at what we need to add to this to get sprites up and running. Assume we are loading our sprite data from a table, just as with the playfield data. We'd need to add...
    lda Sprite0Data,x       ;4
    sta GRP0                ;3
That's 7 cycles, which is OK - but we find that we have an immovable (we have no ability to change the vertical position) block of sprite data the whole height of the screen - read from the table 'Sprite0Data'. This setup would also require that our sprite data table is 192 lines high.
Let's assume, just for a minute, that Sprite0Data was in RAM. Then we'd have the ability to use this kernel to do the display and have another part of our program draw different shapes into that RAM table (ie: if we were drawing a PacMan sprite, we could have the first 20 'lines' of the table with 0, then the next 16 lines with the shape for the pacman sprite, then the remainder with 0). To move this sprite up or down, we'd simply change where in the RAM table we were drawing the sprite - and when our kernel came to do the display, it wouldn't really care where the sprite was, it would just draw the continuous strip of sprite data from the RAM table, and voila! Vertically moving sprites.
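As a rough sketch of the "draw into a RAM table" idea, the copy might look something like the following. Everything named here is hypothetical: SpriteBuffer is an imagined 192-byte RAM table, SpriteY is a RAM variable holding the buffer index where the shape starts, and ShapeData is a ROM table SHAPE_HEIGHT bytes long.

```asm
; Sketch only: copy a shape from ROM into a RAM display strip.
; Assumes SpriteY + SHAPE_HEIGHT does not run past the end of SpriteBuffer.
SHAPE_HEIGHT = 16

    ldx SpriteY             ; first buffer slot the shape will occupy
    ldy #0                  ; index into the ROM shape data
CopyShape
    lda ShapeData,y         ; fetch the next line of the shape
    sta SpriteBuffer,x      ; store it into the display strip
    inx
    iny
    cpy #SHAPE_HEIGHT       ; copied every line of the shape?
    bne CopyShape
```

A real implementation would also need to clear the bytes the shape occupied on the previous frame, otherwise the sprite would leave a trail as it moves.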
And this is exactly how the Atari home computers manage vertical movement of sprites. They, too, have a single register holding the sprite data - and they, too, modify this register on-the-fly to change the shape of the sprite that is being shown on each scanline. But the difference is that the Atari computers have a bit of hardware which does EXACTLY what our little kernel above does - that is, copy sprite data from a RAM buffer into the hardware sprite register.
The problem for Atari 2600 kernels is that we simply don't have 192 bytes of RAM to spend on a draw buffer/table for each player sprite. In fact, we only have 128 bytes RAM total for our entire program! So it's a nice solution - and certainly one that should be used if you are programming for some cartridge format with ample RAM - because it provides extremely quick (7 cycles) drawing of sprites.
But for normal usage, this technique is not possible or practical.
Unfortunately, the available alternatives are costly - in terms of processing time. The quickest 'generic sprite draw' that I'm aware of at the moment takes 18 cycles. Given our 47 cycles remaining in the scanline, 36 of these would be taken up drawing just two sprites - and that makes asymmetrical playfield, balls and missiles a very problematic task. How can we fit all of these into the remaining 11 cycles of time?
The short answer is: we can't. And this is why many games revert to what is termed a "2 scanline kernel". Instead of trying to fit ALL of the updates into a single scanline, the 2 scanline kernel tries to fit all of the updates into two scanlines - taking advantage of the TIA's persistent state so that registers which have been modified on one scanline will remain the same until next modified. A typical two scanline kernel will modify the playfield (left side), sprite 0, playfield (right side) on the first scanline, then the playfield (left side), sprite 1, playfield (right side) on the second scanline - and then repeat the process.
The upshot of this is that our sprites have a maximum resolution of two scanlines - that is, we can only modify the shape of a sprite once every two lines - and in fact each sprite is updated on alternate lines. There's a bit of hardware (a graphics delay of 1 scanline) to compensate for this, so that the sprites APPEAR to update on the same scanline. This interesting hardware capability shows clearly that the designers of the '2600 were well aware of the time limitations inherent in trying to update playfield registers, sprites, missiles and ball in a single scanline - and that they designed the hardware accordingly to mask this problem.
But we're not concerned with two scanline kernels this session. Please be aware that they are extremely common - and many games extend this concept to multiple-scanline kernels - where different tasks are performed in each scanline, and after n scanlines this process repeats to build up the screen out of 'meta-scanlines'. It's a useful technique to get around the limitations of cycles per line.
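Purely to illustrate the shape of such a kernel, here is a bare skeleton. It is a sketch under assumptions, not a finished kernel: the playfield writes are left out, the graphics-delay hardware is not used, and Sprite0Lines and Sprite1Lines are hypothetical ROM tables holding one byte per PAIR of scanlines (index 0 unused, since the counter runs 96 down to 1).

```asm
; Sketch only: skeleton of a two-scanline kernel.
    ldx #96                 ; 96 pairs of scanlines = 192 scanlines
Kernel2
    sta WSYNC               ; ---- first scanline of the pair ----
    lda Sprite0Lines,x      ;4
    sta GRP0                ;3  sprite 0 is updated on this line only
    ; (playfield writes for this line would go here)

    sta WSYNC               ; ---- second scanline of the pair ----
    lda Sprite1Lines,x      ;4
    sta GRP1                ;3  sprite 1 is updated on this line only
    ; (playfield writes for this line would go here)

    dex                     ;2  one pair done
    bne Kernel2             ;3(2)
```

Each sprite costs its cycles only once per pair of lines, which is where the extra breathing room comes from.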
Before we continue, let's have a think about what we want a sprite draw to do - it's fine to be able to display a sprite shape anywhere on the screen (we've already touched on the horizontal positioning, and now we're well on the way to understanding how the vertical positioning works) - but sprites typically animate. How can we use the code shown so far to animate our sprites as well?
If we used the Atari computer method - presented above - of using a 'strip' of RAM to represent the table from which data is written to the screen, and modifying the data written to that table, then the problem is fairly simple - we just write different shapes to the table. But if we don't HAVE a RAM table, and we're forced to use a ROM table, then to get different shapes onscreen, we're going to have to use different tables. We can't modify the contents of tables in ROM! But the code above has the table hardwired into the code itself. That is...
    lda Sprite0Data,x
    sta GRP0
The problem here is that the address of the table is hardwired at the time we write our code - and the assembler will happily predetermine where this table is in the ROM, and the code will always fetch the data from the same table. What we really want to do with a sprite routine is not only fetch the data from a table - but also be able to change WHICH table we fetch the data from.
And here is an ideal use for a new addressing mode of the 6502 - indirect indexed addressing...
    lda (zp),y
In the above code, 'zp' is a zero page two-byte variable which holds a memory address. The 6502 takes the contents of that variable (ie: the address of our table), adds the y register to it, and then uses the resulting address as our location from which to load a byte of data. It's quite an expensive instruction, taking 5 cycles to execute.
But now our code for drawing sprites (in principle) can look like this...
    lda (SpriteTablePtr),y
    sta GRP0
The problem this introduces is that the Y register is used for indexing the data table, whereas we were previously using the X register. There's no way around this - the addressing mode does not work with the X register! So let's change our kernel around a bit, and instead of using the X register to count the scanlines, we'll switch to the Y register...
    ldy #192                ;2
Kernel
    lda PF0Table,y          ;4
    sta PF0                 ;3
    lda PF1Table,y          ;4
    sta PF1                 ;3
    lda PF2Table,y          ;4
    sta PF2                 ;3
    lda (SpriteTablePtr),y  ;5
    sta GRP0                ;3
    sta WSYNC               ;3
    dey                     ;2
    bne Kernel              ;3(2)
This is a bit better - now (as long as we have previously set up the zero page 2-byte variable to point to our table) we are able to display any sprite shape that we desire, using the one bit of code. Here's what you'd need to do to set up your variable to point to the sprite shape data...
    lda #<Sprite0Data       ; low byte of the table address
    sta SpriteTablePtr
    lda #>Sprite0Data       ; high byte of the table address
    sta SpriteTablePtr+1
Additionally, the variable should be defined in the RAM segment like this...
SpriteTablePtr  ds 2
Now let's review all of that and make sure we understand exactly what is happening... We have a zero page variable (2 bytes long) which holds the address of the sprite table containing the shape we want to display. Addresses are 16-bits long, and we've already seen how the 6502 represents 16-bit addresses by a pair of bytes - the low byte followed by the high byte (little-endian order). So into our sprite pointer variable, we are writing this byte-pair. The '>' operator tells the assembler to use the high byte of an address, and the '<' operator tells the assembler to use the low byte of an address. These are standard operators, but there's another way to do it...
    lda #address&$FF        ; low byte
    sta var
    lda #address/256        ; high byte
    sta var+1
Other ways exist. It doesn't really matter which one you use - the result is the same. We end up with a zero page variable which POINTS to the table which is used to give the data for the shape of the sprite. In fact, the variable points to the very start of the table.
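A worked example may help here. The address $F180 below is made up purely for illustration; the real address is whatever the assembler assigns to the table.

```asm
; Sketch only: assume (hypothetically) that Sprite0Data was placed at $F180.
    lda #<Sprite0Data       ; A = $80, the low byte of $F180
    sta SpriteTablePtr      ; first pointer byte  = $80
    lda #>Sprite0Data       ; A = $F1, the high byte of $F180
    sta SpriteTablePtr+1    ; second pointer byte = $F1

    ldy #5
    lda (SpriteTablePtr),y  ; reads the byte at $F180 + 5 = $F185,
                            ; which is the same byte as Sprite0Data+5
```

Pointing the same two bytes at a different table changes which shape this one load instruction fetches, without touching the kernel code.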
And this is our new problem! As we have earlier seen, if we had a RAM table, then we could move the sprite up and down by drawing it into that table and let our kernel display the whole 'strip' of sprite data. The effect would be that the sprite moved up and down on screen. But because we don't have that much RAM, we must programmatically determine on which scanline(s) the sprite data is to be displayed from the table, and which scanline(s) should contain 0-data for the sprite.
Essentially the process consists of comparing the current line-counter (the Y register) with the vertical position required for the sprite. If the counter comparison indicates that the sprite should be visible on the current scanline, then the data is fetched from the table - else a 0 value is used for the sprite data. Rather than stepping through the entire process and deriving the optimum result, we're going to just drop in the method used by nearly all games these days...
    sec                     ;2  can often be guaranteed, and omitted
    tya                     ;2
    sbc SpriteEnd           ;3
    adc #SPRITE_HEIGHT      ;2
    bcs .MBDraw3            ;2(3)
    nop                     ;2
    nop                     ;2
    sec                     ;2
    bcs .skipMBDraw3        ;3
.MBDraw3
    lda (Sprite),y          ;5
    sta GRP0                ;3
.skipMBDraw3
Now here things start to get a bit complex! What the above code shows is a sprite draw routine which effectively takes a constant 18 cycles of time (20 if the initial 'sec' cannot be omitted) to either draw the sprite data from a table (when it's visible), or skip the draw entirely (when it's not visible). There are a few assumptions here...
1) The last drawn line of a sprite is always 0 - thus subsequent lines onscreen do not need to be 'cleared' - the persistent state of the TIA GRP registers will be sufficient to ensure the sprite is not displayed after the sprite is finished drawing.
2) A variable 'SpriteEnd' is pre-calculated to indicate the starting line number of the sprite.
3) The sprite is of constant height (here, SPRITE_HEIGHT).
4) The branches in this code are assumed to NOT cross over page boundaries. If they did, then each would incur an additional cycle penalty - and the timing for the scanline would be incorrect.
So, that's a bit much to deal with in one whack - and to be honest you don't really need to understand the intricacies. Basically the code has two different sections - one where the sprite data is drawn from the table, and one where the draw is skipped. Each section is carefully timed so that after they rejoin at the bottom, they have both taken EXACTLY the same number of cycles to execute.
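For anyone who does want to follow the visibility test, here is a worked trace of the first five instructions of the routine above. The values SpriteEnd = 100 and SPRITE_HEIGHT = 8 are made up for illustration, and the trace assumes the 6502 is in binary (not decimal) mode.

```asm
; Sketch only: worked trace with SpriteEnd = 100 and SPRITE_HEIGHT = 8.
    sec                     ; C = 1, so sbc subtracts exactly SpriteEnd
    tya                     ; A = current line counter
    sbc SpriteEnd           ; A = Y - 100; C = 0 if the result went below zero
    adc #SPRITE_HEIGHT      ; A = A + 8 + C; C = 1 only if the sum wraps past 255
    bcs .MBDraw3            ; C = 1 means "the sprite is on this line"
.MBDraw3                    ; (draw path continues as in the routine above)
;
;   Y = 100:  100-100 =   0, C=1;    0+8+1 =   9,        C=0  -> skip
;   Y =  99:   99-100 = 255, C=0;  255+8   = 263 -> A=7, C=1  -> draw
;   Y =  92:   92-100 = 248, C=0;  248+8   = 256 -> A=0, C=1  -> draw
;   Y =  91:   91-100 = 247, C=0;  247+8   = 255,        C=0  -> skip
;
; With these values the draw path is taken for Y = 99 down to 92: 8 lines.
```

As for the timing, the common part costs 9 cycles with the 'sec' (7 without). The draw path then takes 3+5+3 = 11 cycles and the skip path takes 2+2+2+2+3 = 11 cycles, which is why both routes rejoin after the same total.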
Thomas Jentzsch has presented more optimal code, in the form of his 'skipdraw' routine - and frankly, I've not bothered taking the time to fully understand how it works, either! These sections of code are pretty much guaranteed to work efficiently and correctly, provided you set up the variables properly.
I'd like to invite Thomas to insert his wonderful code here, and explain to all of us exactly how it works!
In the meantime, though we have covered a lot of ground today I hope you will understand the basic principles of vertical sprite movement. In summary...
1) There is no hardware facility to 'move' sprites either horizontally or vertically. To achieve horizontal motion, we need to hit the RESPx register at exactly the right horizontal position in a scanline, at which point the appropriate sprite will start drawing. To achieve vertical motion, we need to adjust what data we feed to the GRPx registers, so that the shape we are drawing starts on the appropriate scanline, and scanlines where it is not visible have 0-data.
2) There are precious few cycles available on scanlines, and many of these are taken up by playfield drawing and loop management. Sprite drawing can be done efficiently with large RAM buffers, but most cartridge configurations don't offer this luxury.
3) Drawing animated sprites can be done efficiently by using an indirect zero-page pointer to point to sprite data tables. These tables can then be used as source for the sprite draw.
4) The sprite draw needs to determine, for each scanline, if the sprite would be visible on that line - and either take data from the correct table, or use 0-data.
5) Kernels can be extended to multiple-lines (at the cost of vertical resolution) to allow all the necessary hardware updates to be performed.
Exercises
1) Modify your current sprite drawing code to use a zero-page variable to point to a table of data in your ROM.
2) Create another data table, and use a variable to determine which of the two data tables to display. You might like to have it switch between these tables every second, or perhaps use the joystick button to determine which is displayed. As a hint - remember, you need to setup the zero page pointer to point to the table for your sprite draw to use - so all you need to do is change this pointer, and leave your kernel code alone.
3) The more difficult task is to attempt to integrate the generic draw (either the code above, or Thomas's code, which should appear shortly) into your kernel. This is worth doing - and waiting for! - because once you have this installed, you'll have a totally generic kernel which can draw a sprite at practically any horizontal and vertical position on the screen and all you have to do is tell it WHERE to appear - and voila!
That should keep you busy. Enjoy!