This session we're going to have a bit of a play with horizontal positioning code, and perhaps come to understand why even the simplest things on the '2600 are still an enjoyable challenge, even to experienced programmers.
As previously noted, it is not possible to just tell the '2600 the x position at which you want your sprites to display. The x positioning of the sprites is a consequence of an internal (non-accessible) timer which triggers sprite display at the same point every scanline. You can reset the timer by writing to RESP0 for sprite 0 or RESP1 for sprite 1. And based on where on the scanline you reset the timer, you effectively reposition the sprite to that position.
The challenge for us this session is to develop code which can position a sprite to any one of the 160 pixels on the scanline!
Given any pixel position from 0 to 159, how would we go about 'moving' the sprite to that horizontal position? Well, as we now know, we can't do that. What we can do is wait until the correct pixel position and then hit a RESPx register. Once we've done that, the sprite will start drawing immediately. So if we delay until, say, TIA pixel 80 - and then hit RESP0, then at that point the sprite 0 would begin display. Likewise, for any pixel position on the scanline, if we delay to that pixel and then hit RESP0, the sprite 0 will display at the pixel where we did that.
So how do we delay to a particular pixel? It's not as easy as it sounds! It turns out we have to keep track of the exact execution time (cycle count) of the instructions being executed by the 6502, and hit that RESPx register only at the right time. But it gets ugly. As we know, there are 228 TIA colour clocks on each scanline (160 of those being visible pixels), but these correspond to only 76 cycles (228/3) of 6502 processing time. Consequently there are only 160/3 = 53 and 1/3 cycles of 6502 time in the visible part of the scanline. Since each 6502 cycle corresponds to 3 TIA colour clocks, it would seem that the best precision with which we could hit RESPx is within 3 pixels. But it gets uglier still, and we'll soon see why.
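To make that arithmetic concrete, here is a small Python model (not '2600 code) of the cycle/colour-clock relationship just described. The constants come straight from the text. The function names are hypothetical and exist only for this illustration.

```python
# Timing constants for one '2600 scanline, as described above.
COLOUR_CLOCKS_PER_LINE = 228  # total TIA colour clocks per scanline
VISIBLE_PIXELS = 160          # visible colour clocks (pixels) per scanline
CLOCKS_PER_CYCLE = 3          # TIA colour clocks per 6502 cycle


def cycles_per_line() -> int:
    """6502 cycles available on one whole scanline."""
    return COLOUR_CLOCKS_PER_LINE // CLOCKS_PER_CYCLE


def visible_cycles() -> float:
    """6502 cycles that elapse during the visible part of the line."""
    return VISIBLE_PIXELS / CLOCKS_PER_CYCLE


def reachable_clocks(max_cycles: int) -> list[int]:
    """Colour-clock offsets a write can land on: one per whole 6502 cycle."""
    return [c * CLOCKS_PER_CYCLE for c in range(max_cycles + 1)]


print(cycles_per_line())           # 76
print(round(visible_cycles(), 2))  # 53.33
print(reachable_clocks(4))         # [0, 3, 6, 9, 12]
```

The gaps of 3 in the last list are the "within 3 pixels" limit. The CPU simply cannot act between those colour clocks.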
The SLEEP macro has been useful to us now, to delay a set number of 6502 cycles. Consider the following code...
    sta WSYNC      ; wait till start of line
    SLEEP 40       ; 40 cycle delay
    sta RESP0      ; reset sprite 0 position
Surely that's a simple and neat way to position the sprite at TIA colour clock 120? The 120 comes from multiplying the 6502 cycle count (40) by the 3 TIA colour clocks per 6502 cycle. (That's a rough figure: it ignores the cycles taken by the sta RESP0 itself. Note also that it's colour clock 120 counted from the start of the line, and the first 68 colour clocks of every line are horizontal blank, so this is not visible pixel 120.) The answer to the question is "yes and no". Sure, it's a neat way to hardwire a specific delay to a specific position. But say you wanted to be able to adjust the position to an arbitrary spot. We could no longer use this sort of code. Remember, SLEEP is just a macro. What it does is insert code to achieve the number of cycles of delay you request. The above might look something more like this...
    sta WSYNC
    nop            ; 2 cycles
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    nop            ; +2
    sta RESP0
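As a quick sanity check of the listing above, here is a Python model (not 6502 code) that works out how many 2-cycle nops make a given delay and how many colour clocks they span. Like the 120 figure in the text, it ignores the sta RESP0 itself. The helper names are hypothetical.

```python
# A Python model (not 6502 code) of a hand-written nop delay chain.
NOP_CYCLES = 2        # each nop costs 2 cycles
CLOCKS_PER_CYCLE = 3  # TIA colour clocks per 6502 cycle


def nops_for_delay(cycles: int) -> int:
    """How many 2-cycle nops make up an exact delay of `cycles`."""
    if cycles % NOP_CYCLES != 0:
        raise ValueError("a chain of nops can only produce an even delay")
    return cycles // NOP_CYCLES


def chain_colour_clocks(nop_count: int) -> int:
    """Colour clocks that elapse while a chain of nops executes."""
    return nop_count * NOP_CYCLES * CLOCKS_PER_CYCLE


count = nops_for_delay(40)
print(count, chain_colour_clocks(count))  # 20 120
```

The model also shows that a pure nop chain can't produce an odd delay. Whatever SLEEP does for odd values, it must insert something other than nops, which is one more reason to treat it as a black box.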
We don't really know what the SLEEP macro inserts, and we don't really care. It's documented to cause a delay of n cycles if you pass it n. That's all we can know about it. If we wanted to change n to n+1 we could do it at assembly time, but we couldn't use this sort of code to change the delay in real time. What we want is a bit of code which will wait a variable amount of time.
And here's where the fun really starts! There are, of course, many many ways to do this. And part of the fun of horizontal positioning code is that it's just begging for nifty and elegant solutions to doing just that. What we're going to do now is just develop a fairly simple, possibly inefficient, but workable solution.
The essence of our solution will be to use a loop to count down the delay, and to write the RESPx register immediately when the loop terminates. So the longer the delay, the more times our loop iterates. In principle, it's a fine idea. In practice, we'll soon see the severe limitations. We should be familiar with simple looping constructs; we have already used looping to count the scanlines in our kernels, for example. Here's a simple delay loop which will iterate exactly the number of times specified in the X register...
    ; assume X holds a delay loop count
SimpleLoop
    dex
    bne SimpleLoop
    sta RESP0      ; now reset sprite position
That's as simple a loop as we can get. Each time through the loop, the value in the X register is decremented by one. The loop continues until the Z flag is set, which happens when an instruction such as 'dex' produces a zero result; here, 'dex' is the last instruction before the branch. So, at just two instructions in size, this is a pretty 'tight' loop. There's not much you can trim out of it and still have a loop! So what's the problem with using a loop like this in our horizontal positioning code? Let's have another look at it, this time with cycle times added...
SimpleLoop
    dex              ; 2
    bne SimpleLoop   ; 3 (2)
It has been fairly standard notation for a few years now to indicate cycle times in the fashion shown above. The number in the comment (after each semicolon) represents the number of 6502 cycles required to execute the instruction on that line. In this case, the 'dex' instruction takes 2 cycles. The 'bne' instruction takes 3 cycles (if the branch is taken) and 2 cycles if not. Unfortunately, life isn't always that simple. If the branch from the bne instruction to the actual branch location crossed over a page (a 256-byte boundary), then the processor takes another cycle! So we're faced with the situation where, as we add and remove code to other parts of our program, some of our loops take longer or shorter amounts of time to execute. No kidding! So when we come to doing tightly timed loops where timing is critical, we must also remember to somehow guarantee that this sort of shifting doesn't happen! That's not our problem today, though - let's assume that our branches are always within the same page.
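Those cycle annotations can be totalled for a whole run of the loop. The Python model below (not 6502 code; the function name is hypothetical) adds 2 cycles per 'dex', 3 cycles per taken 'bne' (4 if the branch crosses a page), and 2 cycles for the final 'bne' that falls through. It also models X as an 8-bit register, so entering with X = 0 wraps round to 255 on the first 'dex'.

```python
def simple_loop_cycles(x: int, page_cross: bool = False) -> int:
    """Total 6502 cycles spent in 'SimpleLoop dex / bne SimpleLoop' entered with X = x.

    page_cross=True models the taken bne crossing a 256-byte page (4 cycles
    instead of 3). X = 0 wraps to 255 on the first dex, so it loops 256 times.
    """
    if not 0 <= x <= 255:
        raise ValueError("X is an 8-bit register")
    iterations = x if x != 0 else 256
    taken_bne = 4 if page_cross else 3
    return (2 * iterations                   # one dex per iteration
            + taken_bne * (iterations - 1)   # bne is taken every time but the last
            + 2)                             # final bne falls through: 2 cycles


for x in (1, 2, 3):
    print(x, simple_loop_cycles(x), simple_loop_cycles(x, page_cross=True))
```

Within a single page this works out to 5x - 1 cycles. An unlucky page crossing changes it to 6x - 2, which is exactly the kind of silent timing drift described above.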
So what's wrong with the above? Let's go back to our correspondence between 6502 cycles and TIA colour clocks. We know that each 6502 cycle is 3 TIA colour clocks. So a single iteration of the above loop takes 5 cycles of 6502 time, or a massive 15 TIA colour clocks. No matter how many iterations of our loop we do, we can only hit the RESPx register with a finesse of 15 TIA colour clocks! Is this a disaster? No, it's not. In fact, the TIA is specifically designed to cater for this situation. Before we delve into how, though, let's analyse this loop a bit more...
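The 15-colour-clock step can be checked directly. This Python model (not 6502 code) converts the loop's 5x - 1 cycles into colour clocks and measures the gap between consecutive loop counts. It assumes the branch never crosses a page, and the function name is hypothetical.

```python
CLOCKS_PER_CYCLE = 3  # TIA colour clocks per 6502 cycle


def loop_colour_clocks(x: int) -> int:
    """Colour clocks spent in the dex/bne loop for 1 <= x <= 255,
    assuming no page crossing: (5x - 1) cycles, 3 colour clocks each."""
    return (5 * x - 1) * CLOCKS_PER_CYCLE


# Gap between consecutive loop counts: the finest step the loop can offer.
steps = {loop_colour_clocks(x + 1) - loop_colour_clocks(x) for x in range(1, 255)}
print(steps)  # {15}
```

Every extra pass through the loop moves the RESPx write 15 colour clocks further along the line, and nothing in between is reachable by changing the count alone.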
Since each iteration of the loop chews up 15 TIA colour clocks, we must iterate (x/15) times, where x is the pixel number where we want our sprite to be positioned. Put another way, we need to know how many 15-pixel chunks to skip in our delay loop before we're at the correct position to hit RESPx and start sprite display. So when we come into this code with a desired horizontal position, we'll have to divide that value by 15 to give us a loop count. What's the divide instruction? There isn't one, of course!
So how do we divide by 15?
Another of those extremely enjoyable challenges of '2600 programming. Dividing by a power of 2 is easy. The processor provides shifting instructions which shift all the bits in a byte to the left or to the right. Consider decimal: if you shifted all the digits of a number one place to the left, and added a 0 at the end of the number, you'd have multiplied by 10. Similarly in binary, if you shift a number left once, and put a 0 on the end, you've multiplied by 2. Dividing by two is thus shifting right by one digit position, and adding a 0 at the 'top' of the number. Typically, multiplication in particular, and sometimes division, are achieved by clever combinations of shifting and adding numbers.
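Here is a Python model (not 6502 code) of the two 8-bit shifts and of a multiply built from shifts plus an add. The functions mimic the 6502's 'lsr' (shift right) and 'asl' (shift left): the bit that falls off the end goes into the carry, and a 0 comes in at the other end. times10 is a hypothetical helper showing the "shift and add" idea as x*8 + x*2.

```python
def lsr(value: int) -> tuple[int, int]:
    """Model 6502 'lsr': shift an 8-bit value right one bit.
    Returns (result, carry); bit 0 falls out into carry, a 0 enters at the top."""
    value &= 0xFF
    return value >> 1, value & 1


def asl(value: int) -> tuple[int, int]:
    """Model 6502 'asl': shift an 8-bit value left one bit.
    Returns (result, carry); bit 7 falls out into carry, a 0 enters at the bottom."""
    value &= 0xFF
    return (value << 1) & 0xFF, (value >> 7) & 1


def times10(value: int) -> int:
    """Multiply by 10 as x*8 + x*2, using only shifts and one add.
    Only correct while the result fits in a byte (value <= 25)."""
    x2, _ = asl(value)
    x4, _ = asl(x2)
    x8, _ = asl(x4)
    return (x8 + x2) & 0xFF


print(lsr(100), lsr(7))  # (50, 0) (3, 1)
print(times10(12))       # 120
```

Dividing by 15 doesn't fall out of a single shift, because 15 isn't a power of two.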
But we don't need to do that here. We know that there are only 160 possible positions for the sprite. Why not have a 160-byte table, with each entry giving the delay loop counter for that position? Something like this...
Divide15
.POS SET 0
    REPEAT 160
    .byte .POS / 15
.POS SET .POS + 1
    REPEND
DON'T do things by hand when the assembler can do it for you! What I've done here is write a little 'program' to control the assembler generation of a table of data. It has a repeat loop of 160 iterations, each iteration incrementing a counter by one and putting that counter value / 15 in the ROM (with the .byte pseudo-op). This code is equivalent to writing...
Divide15
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0   ; 15 entries
    .byte 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1   ; 15 entries
    ; etc... lots more...
Me, I'd prefer the first example - easier to maintain and modify.
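The table the REPEAT block generates can be cross-checked with this Python equivalent (not DASM code). It assumes the assembler's '/' truncates the way integer division does, and the function name is hypothetical.

```python
def divide15_table(size: int = 160) -> list[int]:
    """Python equivalent of the DASM REPEAT block: entry n holds n / 15 (integer)."""
    return [pos // 15 for pos in range(size)]


table = divide15_table()
print(len(table), table[14], table[15], table[159])  # 160 0 1 10
```

The first two runs of 15 match the hand-written listing, and the last entry is 10, because positions 150 to 159 all divide down to 10.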
In any case, the idea of having a table is to give us a quick and easy way to divide by 15. To use it, we place our number in an index register, then load the divide-by-15 result from the table, using the register to give us the offset into the table. Easier to show than explain...
    ldx Xposition
    lda Divide15,x   ; xPosition / 15
    tax
SimpleLoop
    dex
    bne SimpleLoop
    sta RESP0        ; start drawing the sprite
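To see what this code actually does for every input, here is a Python model (not 6502 code) of the lookup followed by the dex/bne loop. It uses the table exactly as first written and treats X as an 8-bit register. The names are hypothetical.

```python
DIVIDE15 = [pos // 15 for pos in range(160)]  # the table as first written


def delay_iterations(x_position: int) -> int:
    """Count passes through 'dex / bne SimpleLoop', modelling X as 8 bits."""
    x = DIVIDE15[x_position]  # ldx Xposition / lda Divide15,x / tax
    iterations = 0
    while True:
        x = (x - 1) & 0xFF    # dex: 0 wraps round to 255
        iterations += 1
        if x == 0:            # bne falls through only when X is zero
            return iterations


print(delay_iterations(15), delay_iterations(159))  # 1 10
print(delay_iterations(0))                          # 256
```

Positions 0 to 14 look up a 0, and the very first 'dex' turns that into 255. So instead of the shortest delay, they get the longest one.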
It's good, and it's bad. Bad because it can't cope with 'loop 0 times': in fact, it will loop 256 times, because the first 'dex' wraps X from 0 round to 255. So let's add one to all the entries in the table, which will 'fix' this problem. Just change the '.byte .POS / 15' to '.byte (.POS / 15) + 1'. But I think we're digressing. What I really wanted to introduce was the concept of looping to delay for a certain (variable) time, and then hitting RESPx at the end of the loop. You can see the problems this method introduces, though: we had to find a way to divide by 15, and we only get 15 colour clock resolution in our positioning. There are other (and arguably better) ways to do horizontal positioning, but let's not make the better the enemy of the good. What we're really after right now is a working solution.
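With the '+ 1' applied, the same Python model (not 6502 code; names hypothetical) confirms that no position ever produces a zero loop count.

```python
DIVIDE15_PLUS1 = [pos // 15 + 1 for pos in range(160)]  # '.byte (.POS / 15) + 1'


def delay_iterations(x_position: int) -> int:
    """Passes through 'dex / bne SimpleLoop' using the corrected table (X is 8 bits)."""
    x = DIVIDE15_PLUS1[x_position]
    iterations = 0
    while True:
        x = (x - 1) & 0xFF  # dex
        iterations += 1
        if x == 0:          # bne falls through
            return iterations


counts = [delay_iterations(p) for p in range(160)]
print(min(counts), max(counts))  # 1 11
```

Every position now loops exactly as many times as its table entry says, from 1 up to 11 passes.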
So in theory, our positioning code so far consists of dividing the x position by 15, looping (skipping 15 colour clocks each loop) and then hitting the RESP0 register to start drawing the sprite. Is this all there is to it? Yes, in a nutshell. But the devil is in the detail. Let's integrate what we have so far into a kernel which constantly increments the desired X position for the sprite, then attempts to set the x position for the sprite each frame (see the source code and sample binary).
Now this is very interesting. Clearly our sprite is moving across the screen as our desired position is incrementing. But it's moving in very big chunks. We have a bit of optimising to do before we have a sprite positioning system capable of pixel-precise horizontal positioning. But it's a start, and we understand it (I hope!).
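The chunky movement can be predicted from the timing alone. This Python model (not 6502 code) computes the colour clocks spent in the delay loop for every desired position, using the corrected table and assuming no page crossing. The figures are relative only. Everything outside the loop (the subroutine call, loads, the sta itself) adds some constant extra delay that the model ignores. The function name is hypothetical.

```python
def loop_delay_clocks(x_position: int) -> int:
    """Colour clocks spent in the dex/bne loop for a desired position,
    using the corrected table entry (x // 15) + 1 and no page crossing."""
    n = x_position // 15 + 1
    return 3 * (5 * n - 1)  # (5n - 1) cycles, 3 colour clocks per cycle


delays = [loop_delay_clocks(x) for x in range(160)]
distinct = sorted(set(delays))
print(len(distinct))  # 11
print(distinct[:3])   # [12, 27, 42]
```

All 160 desired positions collapse onto just 11 distinct delays, 15 colour clocks apart. That matches the big jumps seen in the binary.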
There are some observations to make about this code and binary. I've introduced a little more 6502, which we can examine now...
    inc SpriteXPosition   ; increment the desired position by 1 pixel
    ldx SpriteXPosition
    cpx #160              ; has it reached 160?
    bcc LT160             ; this is equivalent to branch if less than
    ldx #0                ; otherwise reload with 0
    stx SpriteXPosition
LT160
    jsr PositionSprite    ; call the subroutine to position the sprite
This is the bit of code which adjusts the desired position, loads it into the X register and calls a 'subroutine' to do the actual positioning. This is our first introduction to the 'bcc' instruction, and to the 'jsr' and 'rts' instructions ('rts' is in the subroutine itself). We have previously encountered the Z flag, and the use of flags in the processor's status register to determine whether branches are taken or not. The delay loop uses exactly this. The Z flag isn't the only flag set or cleared when operations are performed by the processor. Sometimes the 'carry flag' is also set or cleared. Specifically, this happens on arithmetic operations such as addition and subtraction, and also on comparisons. A comparison is essentially a subtraction whose result is not stored back to the register. In this case, we've compared the X register with the value 160 (cpx #160). This will clear the carry flag if the X register is LESS than 160, or set the carry flag if the X register is GREATER than or EQUAL to 160. I've always used the carry flag like this for unsigned comparisons. In the code above, we're saying 'if the X register is >= 160, then reset it to 0'. All branch instructions cost 3 cycles if taken, 2 if not taken, and an additional cycle if the branch taken crosses a page boundary. Branches can only reach code within -128 to +127 bytes of the branch, measured from the instruction following it. For longer 'jumps' one can use the 'jmp' instruction, which is unconditional.
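Here is a Python model (not 6502 code) of the comparison and the wrap-around logic above. 'cpx #operand' is modelled as an 8-bit subtraction whose result is thrown away, keeping only the flags. Carry is set when no borrow occurs, which is the unsigned "greater than or equal" test. The function names are hypothetical.

```python
def cpx(x: int, operand: int) -> dict[str, bool]:
    """Flags produced by 6502 'cpx #operand' for 8-bit x and operand:
    the subtraction x - operand is done, the result discarded, the flags kept."""
    diff = (x - operand) & 0xFF
    return {
        "C": x >= operand,       # carry set when no borrow: x >= operand (unsigned)
        "Z": diff == 0,          # equal
        "N": bool(diff & 0x80),  # bit 7 of the discarded result
    }


def next_position(pos: int) -> int:
    """Model of: inc SpriteXPosition / ldx SpriteXPosition / cpx #160 /
    bcc LT160 / ldx #0 / stx SpriteXPosition."""
    x = (pos + 1) & 0xFF      # inc on an 8-bit RAM location
    if not cpx(x, 160)["C"]:  # bcc is taken when carry is clear, i.e. x < 160
        return x
    return 0                  # x >= 160: reload with 0


print(cpx(159, 160)["C"], cpx(160, 160)["C"])  # False True
print(next_position(158), next_position(159))  # 159 0
```

Starting from 0 and applying next_position 160 times returns to 0, so the sprite position cycles through exactly the 160 legal values.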
For long conditional branches, use this sort of code...
    cpx #160
    bcs GT160          ; NOT less than 160 (bcs is a GREATER or EQUAL comparison)
    jmp TooFarForLT    ; IS less than 160
GT160
    ; lots of code
TooFarForLT
    ; etc
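Whether a branch can reach its target, and whether it pays the extra page-crossing cycle, is simple address arithmetic. This Python model (not 6502 code) uses the fact that a branch is 2 bytes long and that its signed offset is measured from the instruction after it. The function names and example addresses are hypothetical.

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Signed offset a 2-byte branch at branch_addr needs to reach target.
    The offset is relative to the address of the following instruction."""
    return target - (branch_addr + 2)


def can_branch(branch_addr: int, target: int) -> bool:
    """True if a relative branch can reach target (offset fits in -128..+127)."""
    return -128 <= branch_offset(branch_addr, target) <= 127


def taken_branch_cycles(branch_addr: int, target: int) -> int:
    """3 cycles for a taken branch, plus 1 if target is on a different
    256-byte page from the instruction following the branch."""
    next_instr = branch_addr + 2
    return 3 + (1 if (next_instr & 0xFF00) != (target & 0xFF00) else 0)


print(can_branch(0xF000, 0xF081), can_branch(0xF000, 0xF082))  # True False
print(taken_branch_cycles(0xF0FE, 0xF0F0))                      # 4
```

When can_branch is False, the inverted-branch-plus-jmp pattern above is the way out.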
But I digress! The 'jsr' instruction mnemonic stands for "Jump to SubRoutine". A subroutine is a small section of code somewhere in your program which can be 'called' to do a task, after which program execution continues from where the call was made. Subroutines are useful to encapsulate often-used code so that it doesn't need to be repeated multiple times in your ROM. When the 6502 'calls' a subroutine, it keeps track of where it is calling FROM, so that when the subroutine returns, it knows where to continue code execution. This 'return address' is placed on the 6502's 'stack', which we will learn about very soon now. The stack is really just a bit of our precious RAM where the 6502 stores these addresses, and sometimes other values. The 6502 uses as much of our RAM for its stack as it needs. Each subroutine call we make requires 2 bytes (the return address), which are freed (no longer used) when the subroutine returns. If we 'nest' our subroutines, by calling one subroutine from within another, then each nested level requires an additional 2 bytes of stack space, and our stack 'grows' and starts taking increasing amounts of our RAM! So subroutines, though convenient, can also be costly. They also take a fair number of cycles for the 6502 to do all that stack manipulation: 6 cycles for the subroutine call (the 'jsr') and another 6 for the subroutine return (the 'rts'). So we won't often see subroutines used inside a kernel!
As noted, the 6502 maintains its stack in our RAM area. It has a register called the 'stack pointer' which gives it the address of the next available byte in RAM for it to use. As the 6502 fills up the stack, it decrements this pointer (thus, the stack 'grows' downwards in RAM). As the 6502 releases values from the stack, it increments this pointer. Generally we don't play with the stack pointer, but in case you're wondering, the only way to set it directly is to transfer a value from the X register with the 'txs' instruction. If you've been following closely, you'll have noticed I added a bit to the initialisation section!
    ldx #$FF
    txs              ; initialise stack pointer
Without that initialisation, the stack pointer could point anywhere in RAM (or even at TIA registers), and when we called a subroutine, the 6502 would attempt to store its return address wherever the stack pointer was pointing. Probably with disastrous consequences!
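To tie the last few paragraphs together, here is a minimal Python model (not 6502 code) of the stack as used by 'jsr' and 'rts', starting from the SP = $FF that 'ldx #$FF / txs' sets up. A push stores a byte at $0100 + SP and then decrements SP. 'jsr' (3 bytes long) pushes the address of its own last byte, high byte first. 'rts' pulls it back and adds 1. On the '2600, those page-one addresses land in our 128 bytes of RAM at $80-$FF, which is why the stack costs us RAM. The class and the example addresses are hypothetical.

```python
JSR_CYCLES = 6
RTS_CYCLES = 6


class Stack6502:
    """Minimal model of the 6502 stack as used by jsr/rts.

    Stack bytes live at $0100 + SP. A push stores at the current SP and then
    decrements it, so the stack grows downwards from SP = $FF.
    """

    def __init__(self, sp: int = 0xFF):
        self.sp = sp                       # what 'ldx #$FF / txs' sets up
        self.memory: dict[int, int] = {}
        self.cycles = 0

    def _push(self, value: int) -> None:
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def _pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 + self.sp]

    def jsr(self, jsr_address: int) -> None:
        """Push the address of the jsr's last byte (jsr_address + 2), high byte first."""
        return_minus_one = jsr_address + 2
        self._push(return_minus_one >> 8)
        self._push(return_minus_one & 0xFF)
        self.cycles += JSR_CYCLES

    def rts(self) -> int:
        """Pull low then high byte; execution resumes at that address + 1."""
        low = self._pull()
        high = self._pull()
        self.cycles += RTS_CYCLES
        return ((high << 8) | low) + 1

    def bytes_in_use(self) -> int:
        return 0xFF - self.sp


stack = Stack6502()
stack.jsr(0xF000)            # outer call
stack.jsr(0xF100)            # nested call from inside the subroutine
print(stack.bytes_in_use())  # 4
print(hex(stack.rts()), hex(stack.rts()))  # 0xf103 0xf003
print(stack.cycles)          # 24
```

Two nested calls tie up 4 bytes of RAM until they return, and the two call/return pairs cost 24 cycles, nearly a third of a 76-cycle scanline.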
Positioning sprites is a complex task. This session we've started to explore the problem, and have some working code which does manage to roughly position the sprite at any given horizontal position we ask. Next session we're going to dig into much more robust horizontal positioning code, and learn how the TIA gives us the fine control we need to make our positioning precise to the individual TIA pixel. Once we've achieved that, we can pretty much forget about how this works forever more, and use the horizontal positioning code as a black box. Or perhaps a woodgrain box might be more appropriate.
See you next time!