Since the minimal time for a single loop iteration is 5 cycles (involving a register decrement, and a branch), and 5 cycles corresponds to 15 TIA colour-clocks, it follows that our delay-loop approach can only position RESPx writes with an accuracy of 15 TIA colour-clocks. This is fine, though, as the hardware capability of fine-positioning sprites by -8 to +7 pixels perfectly allows the correct position of the sprite to be established.
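The arithmetic above can be checked with a few lines of Python, used here only as a calculator. The names, and the choice to centre each 15-clock slot so that the fine offset runs from -7 to +7, are my own assumptions for illustration and not how the TIA itself numbers positions.

```python
CLOCKS_PER_CYCLE = 3                              # TIA colour clocks per CPU cycle
LOOP_CYCLES = 5                                   # minimal delay-loop iteration
COARSE_STEP = LOOP_CYCLES * CLOCKS_PER_CYCLE      # 15 colour clocks

def split_position(x):
    """Split x into (coarse slot, fine offset), with the offset in -7..+7."""
    slot, rem = divmod(x, COARSE_STEP)
    return slot, rem - 7          # assumption: slot is centred at 15*slot + 7

def rebuild(slot, fine):
    """Inverse of split_position."""
    return slot * COARSE_STEP + 7 + fine
```

Every one of the 160 visible positions splits into a coarse slot plus an offset that fits comfortably inside the hardware's -8 to +7 range, which is all the delay loop needs.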
The approach taken previously has been to effectively divide the position by 15 (either through a table-lookup, or 'clever' code which simulated a divide by 15 using a divide by 16 (quick) + adjustment) and use that value as the iteration counter in a delay loop. This approach works, and has been fairly standard for a number of years. This is the approach presented in our earlier tutorial.
A recent posting to the [stella] list of an independent discovery of a 'new' method much improves on this technique. In actual fact, the technique was already known and documented in the list... but for various reasons these things don't always become well-known. The 'new' technique of horizontal positioning rolls the divide-by-15 and the delay loop into a single entity.
        sec
.Div15  sbc #15         ; 2
        bcs .Div15      ; 3(2)
Now that may not look like much, but it's absolutely brilliant! Every iteration through the loop, the accumulator is decremented by 15. When the subtraction borrows (clearing the carry flag), the accumulator has gone 'past' 0, and our loop ends. Each iteration takes exactly 5 cycles (with an extra 2 cycles added for the initial 'sec' and one less for the final branch not taken). The real beauty of the code is that we also, 'for free', get the correct -8 to +7 adjustment for the fine-tuning of the position (which, with a little conversion, can be used for the HMP0 register)! Read the relevant post on [stella] here... http://www.biglist.c...3/msg00260.html
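To make the cycle counting concrete, here is a rough Python model of the three-instruction loop. The function name `div15_loop` is my own, and the model assumes the `bcs` does not itself cross a page boundary (which would cost an extra cycle on a real 6502).

```python
def div15_loop(a):
    """Model `sec / .Div15 sbc #15 / bcs .Div15` on an 8-bit accumulator.

    Returns (final_a, iterations, cycles), with cycles counted from the `sec`.
    """
    cycles = 2                  # sec
    iterations = 0
    carry = True                # set by sec, so the first sbc subtracts exactly 15
    while carry:
        result = a - 15
        carry = result >= 0     # carry stays set only while no borrow occurs
        a = result & 0xFF       # accumulator wraps to 8 bits
        iterations += 1
        cycles += 2             # sbc #15
        cycles += 3 if carry else 2   # bcs taken / not taken
    return a, iterations, cycles
```

For a position of 0 the loop falls straight through after 6 cycles, and each extra 15 in the accumulator adds exactly 5 more. The value left in A is always one of 241 to 255, which is -15 to -1 when read as a signed byte.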
For this brilliant bit of coding, our thanks go to R. Mundschau.
; Positions an object horizontally
; Inputs:    A = Desired position.
;            X = Desired object to be positioned (0-4).
; scanlines: If control comes on or before cycle 73 then 1 scanline is consumed.
;            If control comes after cycle 73 then 2 scanlines are consumed.
; Outputs:   X = unchanged
;            A = Fine Adjustment value.
;            Y = the "remainder" of the division by 15 minus an additional 15.
;            control is returned on cycle 6 of the next scanline.

PosObject   SUBROUTINE

            sta WSYNC               ; 00 Sync to start of scanline.
            sec                     ; 02 Set the carry flag so no borrow will be applied during the division.
.divideby15 sbc #15                 ; 04 Waste the necessary amount of time dividing X-pos by 15!
            bcs .divideby15         ; 06/07 11/16/21/26/31/36/41/46/51/56/61/66

            tay
            lda fineAdjustTable,y   ; 13 -> Consume 5 cycles by guaranteeing we cross a page boundary
            sta HMP0,x

            sta RESP0,x             ; 21/26/31/36/41/46/51/56/61/66/71 - Set the rough position.
            rts

;-----------------------------
; This table converts the "remainder" of the division by 15 (-1 to -15) to the correct
; fine adjustment value. This table is on a page boundary to guarantee the processor
; will cross a page boundary and waste a cycle in order to be at the precise position
; for a RESP0,x write

            ORG $F000

fineAdjustBegin

            DC.B %01110000          ; Left 7
            DC.B %01100000          ; Left 6
            DC.B %01010000          ; Left 5
            DC.B %01000000          ; Left 4
            DC.B %00110000          ; Left 3
            DC.B %00100000          ; Left 2
            DC.B %00010000          ; Left 1
            DC.B %00000000          ; No movement.
            DC.B %11110000          ; Right 1
            DC.B %11100000          ; Right 2
            DC.B %11010000          ; Right 3
            DC.B %11000000          ; Right 4
            DC.B %10110000          ; Right 5
            DC.B %10100000          ; Right 6
            DC.B %10010000          ; Right 7

fineAdjustTable EQU fineAdjustBegin - %11110001 ; NOTE: %11110001 = -15
One interesting aspect of this code is the access to the table with a (conceptual) negative index (-1 to -15 inclusive). Negative numbers are represented in two's complement form, so -1 is %11111111 which is *exactly* the same as 255 (%11111111). So how can we use negative numbers as indexes? We can't! All indexing is considered to be with positive numbers. So if our index was -1, we would actually index 255 bytes past the beginning of our table. The neat bit of code at the bottom sets the conceptual start of our table to 241 bytes BEFORE the start of the actual data so that when we attempt to access the -15th element of the table, we ACTUALLY end up at the very first byte of the "fineAdjustBegin" table. Likewise, when accessing the -1th element, we ACTUALLY access the last element of the table. It's all very neat!
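The address arithmetic behind that trick can be sketched in Python. The constant and function names are mine; the byte values and the $F000 origin are taken straight from the listing.

```python
# Table bytes in the order they appear in the listing (Left 7 ... Right 7).
FINE_ADJUST = [0x70, 0x60, 0x50, 0x40, 0x30, 0x20, 0x10, 0x00,
               0xF0, 0xE0, 0xD0, 0xC0, 0xB0, 0xA0, 0x90]

FINE_ADJUST_BEGIN = 0xF000                      # ORG $F000
FINE_ADJUST_TABLE = FINE_ADJUST_BEGIN - 0xF1    # %11110001 = 241

def to_unsigned(n):
    """Two's complement view of a small signed number as an 8-bit byte."""
    return n & 0xFF

def lookup(remainder):
    """remainder is the signed value -15..-1 left in A after the loop."""
    y = to_unsigned(remainder)          # what TAY actually puts in Y
    address = FINE_ADJUST_TABLE + y     # indexing only ever adds
    return FINE_ADJUST[address - FINE_ADJUST_BEGIN]
```

So `lookup(-15)` lands on the first byte of the real data ($70, Left 7) and `lookup(-1)` on the last ($90, Right 7), even though the index register only ever held the positive values 241 to 255.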
Finally, since we need to account for every cycle in this code very carefully (as the horizontal position depends on exactly where we write the RESP0 value), we need to take into account the possibility that an extra cycle is being thrown in when we access fineAdjustTable,y and that access crosses a page boundary. By positioning the table being accessed exactly on a page boundary, the code guarantees that every access incurs an extra cycle 'penalty' and is therefore consistent for all cases.
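As a quick sanity check on the page-crossing claim, here is a small Python sketch using the standard 6502 timing for `lda absolute,y` (4 cycles, plus 1 when adding Y carries into the next page). The helper names and the alternative table address $F0F8 are hypothetical, chosen only to show what goes wrong when the table straddles a page.

```python
def lda_abs_y_cycles(base, y):
    """Cycles for `lda base,y`: 4, plus 1 if base + Y carries into the next page."""
    crossed = (base & 0xFF) + y > 0xFF
    return 5 if crossed else 4

def cycle_counts(table_begin):
    """Set of cycle counts seen over all valid indexes (Y = $F1..$FF)."""
    base = table_begin - 0xF1           # same arithmetic as fineAdjustTable EQU
    return {lda_abs_y_cycles(base, y) for y in range(0xF1, 0x100)}
```

With the table at $F000 every access costs 5 cycles, so `cycle_counts(0xF000)` is just `{5}`. With the data straddling a page, as in the made-up $F0F8 case, some accesses cost 4 and others 5, and the RESP0 write would no longer land on a predictable cycle.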
I don't take any credit for this, I just admire it. I consider this a BRILLIANT bit of coding, so hats-off to R. Mundschau and thanks for sharing!
Another "BRILLIANT" bit of code, but this time from yours truly, is the 8-byte system clear. We touched on this earlier in Session 12, but I thought I'd give a quick run-down on exactly how that code works...
        ldx #0
        txa
Clear   dex
        txs
        pha
        bne Clear
We assume that when this code starts, the system is in a totally unknown state. Firstly, X and A are set to 0, and we enter the loop.
The loop begins: the X register is decremented (to 255) and this value is placed in the stack pointer (now $FF).
The accumulator (0) is then pushed onto the stack, so memory/hardware location $FF is set to 0, and the stack pointer decrements to $FE.
Since the txs and pha don't affect the flags, the branch is based on the result of the decrement of the X register.
If non-zero, we repeat the loop. 0 will be written to 256 consecutive memory locations starting with $FF and ending with 0 (inclusive). The loop terminates after 256 iterations.
On the final pass through, X is decremented to 0, and this is placed in the stack pointer. We then push the accumulator (0) onto the stack (which effectively writes it to memory (TIA) location 0) and as a consequence the stack pointer decrements (and wraps!) back to $FF.
At the conclusion of the above, X = 0, A = 0, SP = $FF, a near-perfect init!
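If you'd like to convince yourself of that walkthrough, here is a rough Python model of the loop. It treats the 2600's address space as a simple 256-entry list indexed by the stack pointer, which is a simplifying assumption of mine (it matches the description above, where stack writes land on locations $FF down to 0). The function name is hypothetical.

```python
def system_clear(memory):
    """Model: ldx #0 / txa / Clear: dex / txs / pha / bne Clear.

    `memory` is a 256-entry list standing in for locations $00-$FF.
    Returns (x, a, sp, iterations) at the end of the loop.
    """
    x = 0                       # ldx #0
    a = x                       # txa
    iterations = 0
    while True:
        x = (x - 1) & 0xFF      # dex: wraps 0 -> 255, sets Z when result is 0
        sp = x                  # txs: copies X to SP, flags untouched
        memory[sp] = a          # pha: store A at the stack location...
        sp = (sp - 1) & 0xFF    # ...then SP decrements (and wraps)
        iterations += 1
        if x == 0:              # bne falls through once dex produced zero
            break
    return x, a, sp, iterations
```

Starting from a list filled with junk, the model finishes with every entry zeroed and returns `(0, 0, 0xFF, 256)`, matching the X = 0, A = 0, SP = $FF result described above.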
That could be the best 8 bytes ever written.