Fractional positioning


EricBall

1,521 views

I just had an interesting thought. Typically with fractional positioning the fractional byte is treated as an unsigned value/256. So $10.10 = 16+16/256.

 

For Leprechaun I see that causing two problems. First, when the sprite moves left, it will move one pixel left on the first frame (3+n/256), but it will take multiple frames to move right one pixel. Second, my collision detection only checks the integer bounding boxes, which is okay from a sprite perspective, but is a little sloppy numerically.

 

But what if that fractional byte is instead treated as a signed value? So that first move left would be 4-a/b and the sprite wouldn't move on the first frame. The integer bounding box would be n-a/b to n+a/b, thus a more accurate representation of what is shown onscreen.

 

Okay, but how would it work? Simple addition and subtraction would work normally; the trick is in the transition from one integer value to the next.

 

My first idea was to consider the positive and negative values to be overlapping, e.g. when the fractional value overflowed from +127 to -128 I'd increment the integer byte and subtract 128 from the fractional byte. But that doesn't work that well going the other direction. Hmmm.

 

Maybe just look at the fractional byte as a continuum, almost like the unsigned version, but shifted left. So instead of n+0 through n+255/256 you have n-128/256 through n+127/256. The transition is still at the overflow from positive to negative (or vice versa) but I don't adjust the fractional byte. Sounds very possible, though I need to check the oVerflow bit behaviour with ADC/SBC to make sure it works the way I need it.
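
To make the idea concrete, here is a rough Python model of the "shifted continuum" scheme (a sketch only, not code from Leprechaun; the names `step` and `value` are made up for illustration). The fractional part is kept as a signed number in -128..127, and the integer byte is bumped only when the fraction overflows that range. The fraction is left wrapped, not otherwise adjusted.

```python
def step(pos_int, pos_frac, vel):
    """Advance one frame.

    pos_int  : integer position byte (0..255)
    pos_frac : signed fractional part, -128..127 (units of 1/256 pixel)
    vel      : signed per-frame velocity, -128..127 (units of 1/256 pixel)
    """
    total = pos_frac + vel
    if total > 127:            # signed overflow going positive
        pos_int = (pos_int + 1) & 0xFF
        total -= 256           # the byte simply wraps; no extra adjustment
    elif total < -128:         # signed overflow going negative
        pos_int = (pos_int - 1) & 0xFF
        total += 256
    return pos_int, total


def value(pos_int, pos_frac):
    """Real-valued position represented by the pair."""
    return pos_int + pos_frac / 256


def frames_until_pixel_change(vel, start=(4, 0)):
    """Count frames until the integer byte first changes."""
    pos, frames = start, 0
    while pos[0] == start[0]:
        pos = step(pos[0], pos[1], vel)
        frames += 1
    return frames
```

With a sprite starting on grid, `frames_until_pixel_change(48)` and `frames_until_pixel_change(-48)` give the same count, which is the left/right symmetry the unsigned scheme lacks.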

36 Comments





I guess I don't quite see the problem with the normal straightforward method. If someone is climbing a ladder or falling or otherwise has a 'pegged' X position, the LSB should be 128 rather than zero; if that is done, I would think everything should be fine.

 

On the other hand, if you will require motion on no more than 127 frames every 256, there might be some slight advantages to using the 'signed-LSB' method if you take advantage of the overflow flag. The code would be:

 clc ; If you don't already know that it's clear
 lda pos_lsb
 adc vel_lsb
 sta pos_lsb
 bvc no_move
 bpl move_neg
move_pos:
 inc pos_msb
 bcc move_done  ; Carry will be clear from earlier addition
move_neg:
 dec pos_msb   ; Note: Carry will be set here
no_move:
; Carry may be set or clear here
; If you want carry clear down below, this would be a good place to clear it.
move_done:

This code gains a little on efficiency because there's no need to test whether velocity is positive or negative. If there's no overflow, it doesn't matter; if there is an overflow, the sign of pos_lsb will be the opposite of the sign of direction.
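
For anyone who wants to poke at the flag behaviour without an assembler, here is a small Python model of the routine above. It is a sketch under two assumptions: binary (non-decimal) mode, and that the `move_neg` branch is meant to decrement `pos_msb`. The helper names `adc` and `move` are invented for this example.

```python
def adc(a, m, carry=0):
    """Model of 6502 ADC in binary mode. Returns (result, C, V, N)."""
    total = a + m + carry
    result = total & 0xFF
    c = 1 if total > 0xFF else 0
    # V is set when both inputs share a sign and the result's sign differs.
    v = 1 if (~(a ^ m) & (a ^ result) & 0x80) else 0
    n = 1 if result & 0x80 else 0
    return result, c, v, n


def move(pos_msb, pos_lsb, vel_lsb):
    """One frame of the signed-LSB update; all values are raw bytes."""
    pos_lsb, c, v, n = adc(pos_lsb, vel_lsb, 0)   # clc / lda / adc / sta
    if v:                                          # bvc no_move
        if n:                                      # wrapped past +127
            pos_msb = (pos_msb + 1) & 0xFF         # inc pos_msb
        else:                                      # bpl move_neg: wrapped past -128
            pos_msb = (pos_msb - 1) & 0xFF         # dec pos_msb
    return pos_msb, pos_lsb
```

Note that `move(4, 0xF0, 0x20)` leaves the integer byte alone even though the addition carries: going from 255 to 0 is not a signed overflow.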

Link to comment

I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

 

Anyway, there was a little discussion about fixed point math here:

http://www.atariage.com/forums/index.php?showtopic=84593&hl=

 

My objection in the thread was that the suggested method was wasteful of RAM, but for an SC game, that's a non-issue.

 

I guess the prevailing idea was to do arithmetic on sets of two bytes as if they were a 16-bit signed number. As long as all values were 16-bit (position, velocity, acceleration), there would be no need to check signs or do anything special at all. If you choose to put the decimal between bytes ( s8.8 ), you'd get -128 to 127, which might not give sufficient range for positions. But there's no need for that - you could do a little shifting and use a s9.7 instead.

 

It might be a little more elegant/easy to do this way (keep in mind that I don't grok Supercat's code, so I may be wrong.)
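
A quick Python sketch of the s9.7 idea, just to show the range and how the pixel position falls out with a shift. The function names are hypothetical, and on the 6502 the addition would be an ordinary two-byte ADC chain.

```python
FRAC_BITS = 7  # s9.7: 1 sign bit, 8 integer bits, 7 fraction bits


def to_s9_7(x):
    """Encode a real number as a 16-bit two's-complement s9.7 word."""
    return round(x * (1 << FRAC_BITS)) & 0xFFFF


def add16(a, b):
    """Plain 16-bit add; the same code serves either sign of velocity."""
    return (a + b) & 0xFFFF


def _signed16(w):
    return w - 0x10000 if w & 0x8000 else w


def from_s9_7(w):
    """Decode a 16-bit s9.7 word back to a real number."""
    return _signed16(w) / (1 << FRAC_BITS)


def pixel(w):
    """Integer (pixel) part, rounded toward minus infinity."""
    return _signed16(w) >> FRAC_BITS
```

For example, `from_s9_7(add16(to_s9_7(159.5), to_s9_7(-0.25)))` comes back as 159.25, and the representable range runs from -256 up to just under 256.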

Link to comment
I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

 

If you think of the values 0-255 as being wrapped around a circle, and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction, the V flag will be set any time the point crosses the threshold between 127 and 128. Note that crossing the threshold from 255 to 0 will not set the V flag.

Link to comment
...and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction

 

Slight addendum: if carry is clear, addition will move -128 to +127; if carry is set, it will move -127 to +128.

 

Another way of looking at things is that with addition, overflow occurs if the signs of both operands were the same, but the sign of the result is the opposite of both of them. With subtraction, overflow occurs if the signs of the accumulator and the memory operand differed, but the result ends up with the same sign as the memory operand.
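
The two descriptions (the "crossing 127/128" picture and the sign rules) can be checked against each other by brute force. The Python below is only a sanity-check sketch with made-up function names. It assumes binary mode and the usual 6502 convention that SBC computes A - M - (1 - C).

```python
def signed(b):
    """Interpret a byte as a two's-complement value."""
    return b - 256 if b & 0x80 else b


def v_by_range_add(a, m, c):
    # Overflow means the true signed sum does not fit in -128..127.
    return not -128 <= signed(a) + signed(m) + c <= 127


def v_by_signs_add(a, m, c):
    r = (a + m + c) & 0xFF
    # Operands share a sign, result has the other sign.
    return (a & 0x80) == (m & 0x80) and (r & 0x80) != (a & 0x80)


def v_by_range_sub(a, m, c):
    return not -128 <= signed(a) - signed(m) - (1 - c) <= 127


def v_by_signs_sub(a, m, c):
    r = (a - m - (1 - c)) & 0xFF
    # Operands differ in sign, result takes the memory operand's sign.
    return (a & 0x80) != (m & 0x80) and (r & 0x80) == (m & 0x80)


def rules_agree():
    """Compare both definitions over every input combination."""
    return all(
        v_by_range_add(a, m, c) == v_by_signs_add(a, m, c)
        and v_by_range_sub(a, m, c) == v_by_signs_sub(a, m, c)
        for a in range(256) for m in range(256) for c in (0, 1)
    )
```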

Link to comment
I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

 

If you think of the values 0-255 as being wrapped around a circle, and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction, the V flag will be set any time the point crosses the threshold between 127 and 128. Note that crossing the threshold from 255 to 0 will not set the V flag.

That's a good way of looking at it. So just like the carry is a virtual 9th bit for unsigned numbers, does the V bit simply act as a virtual 9th bit for signed numbers?

Link to comment
That's a good way of looking at it. So just like the carry is a virtual 9th bit for unsigned numbers, does the V bit simply act as a virtual 9th bit for signed numbers?

 

Sort of, except that unlike the carry flag it's not designed for value propagation (btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?) Further, unlike the carry flag whose meaning is opposite for addition and subtraction, the overflow flag has the same meaning for both (if set, it means an overflow occurred).

Link to comment
I guess I don't quite see the problem with the normal straightforward method. If someone is climbing a ladder or falling or otherwise has a 'pegged' X position, the LSB should be 128 rather than zero; if that is done, I would think everything should be fine.

That would be another way of handling it, except that it then means I need an extra CMP to determine if the sprite is on grid instead of LDA XFPOS / BNE off_grid.

 

On the other hand, if you will require motion on no more than 127 frames every 256, there might be some slight advantages to using the 'signed-LSB' method if you take advantage of the overflow flag. This code gains a little on efficiency because there's no need to test whether velocity is positive or negative. If there's no overflow, it doesn't matter; if there is an overflow, the sign of pos_lsb will be the opposite of the sign of direction.

That's a nice bit of code. I think I'll revamp SpaceWar! 7800 (which uses a lot of signed fractional addition) to use it rather than how I'm doing sign extension now (after I'm finished Leprechaun of course). For Leprechaun the sprites have an action rather than a velocity so I use SBC/ADC. Thus I can INC/DEC based on the overflow flag alone. (Of course, I'm doing a complete rewrite of that code, so who knows what it will look like.)

 

Oh, that's one problem with having a signed fractional byte rather than an unsigned fractional byte (which is the usual when treating a 16 bit (signed or unsigned) value as an 8.8 fixed point value): you can't use the carry flag to easily add two 16 bit values. You'd need code to set/clear carry based on the overflow flag before adding the second byte. Not as elegant. Ummm... thinking about that more, I'm not sure that would work right. Let's just say it's not as simple when doing x.x + y.y with signed fractional bytes versus x.x + 0.y and leave it at that.

Link to comment
(btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?)

I've always wished for ADD and SUB instructions (without carry) so I don't have to put in CLC/SEC for an extra byte & two! cycles. A disable carry would only be useful if you had to do carry-affecting instructions between a carry-affecting instruction and a carry-affected instruction. Not something I've run into that often.

Link to comment
(btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?)

I've always wished for ADD and SUB instructions (without carry) so I don't have to put in CLC/SEC for an extra byte & two! cycles. A disable carry would only be useful if you had to do carry-affecting instructions between a carry-affecting instruction and a carry-affected instruction. Not something I've run into that often.

 

My thought would be that if there were a carry-enable flag, it could in many cases be left clear except when doing multi-precision maths. Although, thinking about it, an even nicer approach (assuming the opcode space isn't available for separate instructions) might be to have "carry out" and "carry in" flags, along with instructions to set or clear the carry in flag, or copy the carry-out flag to the carry-in flag. Instructions which use carry-in would clear it.

 

The net effect would be to add two cycles to carry-propagating math operations, but zero cycles to other math operations (except when responding to an interrupt or other similar circumstance). The two-cycle penalty might be slightly annoying, but would be far less bad than not being able to do a carry-propagate add directly (the special cases necessary to simulate one can be quite bothersome).

Link to comment

That sounds...extremely complicated, all because you don't like to clear the carry before you add? :_(

 

There are other things I would add to the 650x before some complicated scheme so I didn't have to set/clear the carry flag. Like BRA, a corrected JMP (indirect), and indexed CPX/CPY opcodes.

 

Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

Link to comment
That sounds...extremely complicated, all because you don't like to clear the carry before you add? :_(

 

In some loops, that can add a lot of cycles. Actually, if the INC and DEC supported accumulator mode, that would probably take care of the most common cases.

 

There are other things I would add to the 650x before some complicated scheme so I didn't have to set/clear the carry flag. Like BRA, a corrected JMP (indirect), and indexed CPX/CPY opcodes.

There are, I believe, 23 instructions that use absolute-mode addressing to read and/or write a byte operand (the JMP instruction sets PC, but does not itself read the target address, so it doesn't count). All the other instructions combined use fewer than 64 other opcodes.

 

I wonder why the 6502's designers didn't simply use the same addressing logic for all of those instructions (basically say that if the two LSBs of an opcode are not both zero, the next three bits set the addressing mode). That would seem much easier than having all sorts of special-case logic to handle instructions like BIT, CPY, and CPX.

 

Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

 

Usually it isn't, but sometimes an extra instruction to set or clear carry can be unavoidable, and in a tight loop that can be costly.

Link to comment
I wonder why the 6502's designers didn't simply use the same addressing logic for all of those instructions (basically say that if the two LSBs of an opcode are not both zero, the next three bits set the addressing mode). That would seem much easier than having all sorts of special-case logic to handle instructions like BIT, CPY, and CPX.

Actually, there is a certain logic to the addressing modes for the different opcodes. The main ALU ops (LDA, STA, ADC, CMP, SBC, AND, ORA, EOR) all have eight addressing modes. Then there's another (larger) group of opcodes which also share (a smaller number of) similar addressing modes. Then there are the other opcodes which didn't fit. But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes.

 

That's not to say there aren't some quirks and places where the symmetry is broken for no apparent reason. (Or things which seem like they might have been done more efficiently.)

Link to comment
Actually, there is a certain logic to the addressing modes for the different opcodes. The main ALU ops (LDA, STA, ADC, CMP, SBC, AND, ORA, EOR) all have eight addressing modes. Then there's another (larger) group of opcodes which also share (a smaller number of) similar addressing modes. Then there are the other opcodes which didn't fit. But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes.

 

Let's give 8 addressing modes to these:

LDA STA LDX STX LDY STY CMP CPX CPY (9)

LSR ROR ASL ROL INC DEC (6)

AND ORA EOR ADC SBC BIT (6)

For the fun of it, let's throw in LAX, SAX, and DCP. That gets us up to 24 instructions, 192 opcodes.

 

Remaining opcodes:

BRK JSR RTS RTI JMP JMPind (6)

BRA B** (9 including adding BRA)

SE*/CL* (8--including adding SEV)

PHP PLP PHA PLA PHX PHY PLX PLY (8--including four added ones)

NOP (1)

TAX TAY TXA TYA TXY TYX (6--including two added ones)

 

By my count that's 38, including some nice "bonus" instructions. So there would be 26 opcodes left over--room for still more goodies.
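
The tally above is easy to re-run. This little Python sketch just restates the groups and counts from the lists as given (the dictionary labels are mine) and checks the arithmetic. It does not try to decide whether the lists themselves are complete.

```python
# Instructions that would each get all eight addressing modes.
eight_mode_groups = {
    "LDA STA LDX STX LDY STY CMP CPX CPY": 9,
    "LSR ROR ASL ROL INC DEC": 6,
    "AND ORA EOR ADC SBC BIT": 6,
    "LAX SAX DCP": 3,
}

# Instructions that need only a single opcode each.
single_opcode_groups = {
    "BRK JSR RTS RTI JMP JMPind": 6,
    "BRA plus the eight conditional branches": 9,
    "SE*/CL* including SEV": 8,
    "push/pull for P, A, X, Y": 8,
    "NOP": 1,
    "TAX TAY TXA TYA TXY TYX": 6,
}


def budget():
    """Return (instructions, multi-mode opcodes, single opcodes, left over)."""
    instructions = sum(eight_mode_groups.values())
    multi = instructions * 8
    single = sum(single_opcode_groups.values())
    return instructions, multi, single, 256 - multi - single
```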

 

What am I missing?

Link to comment

If you look at this opcode map, you will notice that certain opcodes only appear in certain rows and certain addressing modes always appear in certain columns (with exceptions of course).

 

I suppose this was done to save costs and maybe increase decoding speed, even though it "wastes" quite a few opcodes.

Link to comment
But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes
By my count that's 38, including some nice "bonus" instructions. So there would be 26 opcodes left over--room for still more goodies. What am I missing?

You're correct. I was remembering a thought exercise where I fantasized* about adding additional addressing modes across the board (mostly making X & Y orthogonal) and using those addressing modes for more opcodes (plus adding opcodes like ADD & SUB). In that case I ran out of bits.

 

If you look at this opcode map, you will notice that certain opcodes only appear in certain rows and certain addressing modes always appear in certain columns (with exceptions of course). I suppose this was done to save costs and maybe increase decoding speed, even though it "wastes" quite a few opcodes.

Actually, if you make a table with 00, 04, 08, 0C, 10, 14, 18, 1C across the top (addressing modes) and then group the remaining bits by the two LSBs (00, 01, 02, 03), i.e. 01,21,41,61,81,A1,C1,E1 are the ALU opcodes; you will see the logic. Really the main offenders in the whole scheme are NOP,INX,DEX,INY,DEY, and TYA. They don't seem to fall into place. In fact, all of the X & Y opcodes seem slightly misplaced.
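
The table layout described above amounts to splitting each opcode byte into three bit fields, aaa bbb cc. As an illustrative sketch (helper names invented here), this Python decodes only the cc = 01 block, using the standard documented NMOS 6502 assignments for that block.

```python
# Operation selected by the top three bits when the low two bits are 01.
GROUP_01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]

# Addressing mode selected by the middle three bits in that block.
MODES_01 = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]


def split_opcode(op):
    """Split an opcode byte into its (aaa, bbb, cc) bit fields."""
    return (op >> 5) & 7, (op >> 2) & 7, op & 3


def decode_group_01(op):
    """Return (mnemonic, mode) for a cc=01 opcode, else None."""
    aaa, bbb, cc = split_opcode(op)
    if cc != 1:
        return None
    if op == 0x89:
        return None  # STA #imm would land here; it is not a documented opcode
    return GROUP_01[aaa], MODES_01[bbb]
```

Feeding it 01, 21, 41, 61, 81, A1, C1, E1 walks down the ALU column in order, all in (zp,X) mode.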

 

But you're right; other than those exceptions, the opcode bytecodes were designed to simplify decoding and reduce the number of gates required to implement the 6502. The whole reason why the whole 03 block of 64 opcodes wasn't defined (leading to instructions like LAX) was it simplified decoding each block to a NAND gate and two wires instead of a full 1 in 4 demux.

 

 

* What? You don't fantasize about creating the ultimate 8-bit ISA?

Link to comment

IMHO the following corrects the decoding quirks:

 

INX E8 -> EA (part of the INC row, same as DEX wrt DEC)

NOP EA -> 98 (which is the lost SEV instruction)

TYA 98 -> 88 (part of the STY row, same as TXA wrt STX)

DEY 88 -> C8 (one bit off DEX)

INY C8 -> E8 (one bit off the new position of INX)

 

This also introduces a new opcode for EB, which would probably be (A AND X) SBC # -> A -> X

 

I'd also add the missing addressing modes to STX, STY, CPX, CPY and BIT. I have no idea why they were excluded. (Well, maybe some additional logic would be needed to allow CPX to use the LDX addressing modes.)

Link to comment
Really the main offenders in the whole scheme are NOP,INX,DEX,INY,DEY, and TYA. They don't seem to fall into place. In fact, all of the X & Y opcodes seem slightly misplaced.

I wonder if those NOPs are in fact opcodes like ORA A, AND A, TXX etc. which just don't affect the flags. :_(

Link to comment
Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

 

Usually it isn't, but sometimes an extra instruction to set or clear carry can be unavoidable, and in a tight loop that can be costly.

I know...I know. :_(

 

But you could apply the same logic to many, many situations - for long loops, the restriction on branching range can add many cycles to each loop:

Instead of:

   dex
  bpl LoopStartWayBackWhen

You have to do this:

   dex
  bmi LoopOver
  jmp LoopStartWayBackWhen
LoopOver

Which adds 2 cycles to every loop. So it would be awful nice to have branch instructions that were unidirectional with a full page's worth of range.

There's always something...I suppose it's fun to speculate and dream. :_(

Link to comment
But you're right; other than those exceptions, the opcode bytecodes were designed to simplify decoding and reduce the number of gates required to implement the 6502. The whole reason why the whole 03 block of 64 opcodes wasn't defined (leading to instructions like LAX) was it simplified decoding each block to a NAND gate and two wires instead of a full 1 in 4 demux.

Funny that these are called "don't care" states. Obviously we care...

 

Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

Link to comment
Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

Interestingly enough, just yesterday I wrote a routine that makes extensive use of (ZP,X) addressing. I'll probably post it soon, as soon as I test it (I wrote it on paper).

Link to comment
Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

Interestingly enough, just yesterday I wrote a routine that makes extensive use of (ZP,X) addressing. I'll probably post it soon, as soon as I test it (I wrote it on paper).

Does it rely on the fact that the 6502's stack is mirrored in zeropage on the 2600? Because the 6502's design actually places the stack at $100-$1FF, which makes (ZP,X) that much less useful on any other 6502 machine.

Link to comment
Does it rely on the fact that the 6502's stack is mirrored in zeropage on the 2600? Because the 6502's design actually places the stack at $100-$1FF, which makes (ZP,X) that much less useful on any other 6502 machine.

Nope, it doesn't.

 

EDIT: The use for (ZP,X) is when you want double indexing - something to this effect:

   lda (Ptr,X),Y

Obviously, you can't do that (hey, there's another idea for a new addressing mode), so your two options are this:

   lda PtrList,X
  sta MiscPtr
  lda PtrList+1,X
  sta MiscPtr+1
  lda (MiscPtr),Y

Or you can ditch Y altogether (instead of updating an index, you update the pointer directly) and then just do this:

   lda (PtrList,X)

This works best when you only want to pull one byte of data; pulling 2+ bytes becomes extremely unwieldy:

   lda PtrList,X
  sta MiscPtr
  lda PtrList+1,X
  sta MiscPtr+1
  lda (MiscPtr),Y
;--do something with the value
  iny
  lda (MiscPtr),Y

Versus:

   lda (PtrList,X)
;--do something with the value
  lda PtrList,X
  clc
  adc #1
  sta PtrList,X
  lda PtrList+1,X
  adc #0
  sta PtrList+1,X
  lda (PtrList,X)

It just so happens that I am rewriting my music driver to only use 1 byte per note, and I wanted to index into AUDxx registers to keep ROM-usage down - so I have many instances of double indexing and, since I am only reading 1 byte at a time, this is easier than copying the pointers to a temp location. Especially since the value I read often needs to be used to index into a lookup table itself, so what I really need is a double-indexed addressing mode plus another index (Z?) register. :_(
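
For readers less familiar with the two indirect modes, here is a hedged Python model of what each one fetches. Memory is a flat 64K bytearray. `PTR_LIST`, `MISC_PTR` and the addresses used are arbitrary example values, not anything from the music driver.

```python
def lda_indexed_indirect(mem, zp, x):
    """Model of LDA (zp,X): X picks a two-byte pointer out of zero page."""
    ptr_addr = (zp + x) & 0xFF              # index wraps within zero page
    lo = mem[ptr_addr]
    hi = mem[(ptr_addr + 1) & 0xFF]
    return mem[(hi << 8) | lo]


def lda_indirect_indexed(mem, zp, y):
    """Model of LDA (zp),Y: one fixed pointer, Y offsets the target."""
    lo = mem[zp]
    hi = mem[(zp + 1) & 0xFF]
    return mem[(((hi << 8) | lo) + y) & 0xFFFF]


def demo():
    mem = bytearray(0x10000)
    PTR_LIST, MISC_PTR = 0x80, 0x90         # hypothetical zero-page locations
    mem[PTR_LIST + 0:PTR_LIST + 2] = bytes([0x34, 0x12])   # pointer 0 -> $1234
    mem[PTR_LIST + 2:PTR_LIST + 4] = bytes([0x00, 0x20])   # pointer 1 -> $2000
    mem[0x2000] = 0x42
    x = 2                                   # byte offset of pointer 1

    # Option 1: read straight through the table entry.
    direct = lda_indexed_indirect(mem, PTR_LIST, x)

    # Option 2: copy the pointer to a temp, then use (zp),Y with Y = 0.
    mem[MISC_PTR] = mem[PTR_LIST + x]
    mem[MISC_PTR + 1] = mem[PTR_LIST + x + 1]
    copied = lda_indirect_indexed(mem, MISC_PTR, 0)
    return direct, copied
```

Both routes return the same byte; the difference on real hardware is the copying overhead versus having to update the pointer itself to step through data.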

Link to comment

I see... makes sense.

 

BTW:

Versus:

   lda (PtrList,X)
;--do something with the value
  lda PtrList,X
  clc
  adc #1
  sta PtrList,X
  lda PtrList+1,X
  adc #0
  sta PtrList+1,X
  lda (PtrList,X)

 

How about:

 

  inc PtrList,X
  bne .1
  inc PtrList+1,X   
.1
  lda (PtrList,X)
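
The INC/BNE version works because INC sets the zero flag exactly when the low byte wraps from $FF to $00, which is the only time the high byte needs bumping. A tiny Python sketch of both approaches (function names are mine) shows they agree for every 16-bit pointer value:

```python
def inc16_inc_bne(lo, hi):
    """inc lo / bne skip / inc hi"""
    lo = (lo + 1) & 0xFF
    if lo == 0:                  # Z set only on the $FF -> $00 wrap
        hi = (hi + 1) & 0xFF
    return lo, hi


def inc16_adc(lo, hi):
    """clc / adc #1 on the low byte, then adc #0 on the high byte."""
    total = lo + 1
    carry = total >> 8
    return total & 0xFF, (hi + carry) & 0xFF


def both_agree():
    return all(
        inc16_inc_bne(lo, hi) == inc16_adc(lo, hi)
        for lo in range(256) for hi in range(256)
    )
```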

Link to comment
Funny that these are called "don't care" states. Obviously we care...

No, not "don't care" - undefined. Well, I guess they are "don't care"; the ex-6800 engineering team who created the instruction set and schematics (which were then laid out by hand; and even more amazingly, successfully the first time) didn't care what those opcodes did. They simply tried to use the smallest number of gates to achieve the list of features Chuck Peddle specified.

 

it's hard to see how anyone saw real utility for (ZP,X) addressing.

Yes, (ZP,X) isn't frequently used on the 2600. But imagine that you have an array of pointers to lists, i.e. *strptr[X]. Much more useful when you have more RAM and ROM space. Think of the Apple ][ and manipulating an array of strings or multiple pseudo stack pointers.

Link to comment

:_(

 

I've often lamented the fact that increment/decrement opcodes don't set the carry flag; usually I am decrementing and nothing tells you if it goes from 0->255. Never occurred to me that going the other way sets the zero flag! :_(

 

So, thanks!

Link to comment
