EricBall's Tech Projects

supercat · April 5, 2006

I guess I don't quite see the problem with the normal straightforward method. If someone is climbing a ladder or falling or otherwise has a 'pegged' X position, the LSB should be 128 rather than zero; if that is done, I would think everything should be fine.

On the other hand, if you will require motion on no more than 127 frames every 256, there might be some slight advantages to using the 'signed-LSB' method if you take advantage of the overflow flag. The code would be:

 clc ; If you don't already know that it's clear
 lda pos_lsb
 adc vel_lsb
 sta pos_lsb
 bvc no_move
 bpl move_neg
move_pos:
 inc pos_msb
 bcc move_done  ; Carry will be clear from earlier addition
move_neg:
 dec pos_msb   ; Note: Carry will be set here
no_move:
; Carry may be set or clear here
; If you want carry clear down below, this would be a good place to clear it.
move_done:

This code gains a little on efficiency because there's no need to test whether velocity is positive or negative. If there's no overflow, it doesn't matter; if there is an overflow, the sign of pos_lsb will be the opposite of the sign of direction.

batari · April 5, 2006

I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

Anyway, there was a little discussion about fixed point math here:

http://www.atariage.com/forums/index.php?showtopic=84593&hl=

My objection in the thread was that the suggested method was wasteful of RAM, but for an SC game, that's a non-issue.

I guess the prevailing idea was to do arithmetic on pairs of bytes as if they were 16-bit signed numbers. As long as all values were 16-bit (position, velocity, acceleration), there would be no need to check signs or do anything special at all. If you choose to put the binary point between the bytes (s8.8), the integer part ranges from -128 to 127, which might not give sufficient range for positions. But there's no need for that: you could do a little shifting and use s9.7 instead.
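
A minimal sketch of that 16-bit approach, assuming hypothetical zero-page variables pos_lo/pos_hi and vel_lo/vel_hi (names not from the thread), each pair holding one two's-complement 16-bit value, and assuming decimal mode is off:

```
; Hypothetical variables: pos_lo/pos_hi, vel_lo/vel_hi
 clc
 lda pos_lo
 adc vel_lo   ; fraction bytes; carry out feeds the high byte
 sta pos_lo
 lda pos_hi
 adc vel_hi   ; high bytes plus carry; no sign test needed
 sta pos_hi

; If the format is s9.7 rather than s8.8, the low 8 bits of the
; integer part are bits 7..14 of the 16-bit value:
 lda pos_lo
 asl          ; bit 7 of pos_lo -> carry
 lda pos_hi
 rol          ; A = (pos_hi << 1) | carry; old bit 15 (sign) -> carry
```

After the final rol, A holds the on-screen coordinate for positions in the 0-255 range, and the carry holds the sign bit.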

It might be a little more elegant/easy to do this way (keep in mind that I don't grok Supercat's code, so I may be wrong.)

supercat · April 5, 2006

I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

If you think of the values 0-255 as being wrapped around a circle, and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction, the V flag will be set any time the point crosses the threshold between 127 and 128. Note that crossing the threshold from 255 to 0 will not set the V flag.

supercat · April 5, 2006

...and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction

Slight addendum: if carry is clear, addition will move -128 to +127; if carry is set, it will move -127 to +128.

Another way of looking at things is that with addition, overflow occurs if both operands had the same sign but the sign of the result is the opposite of both of them. With subtraction, overflow occurs if the signs of the accumulator and memory operand differed and the result's sign ends up matching the memory operand's.
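
A few worked cases may help here. This is only an illustrative sketch using immediate operands, assuming binary (non-decimal) mode:

```
 clc
 lda #$7f     ; +127
 adc #$01     ; result $80: crossed 127/128, so V=1 (C=0)

 clc
 lda #$ff     ; -1 signed, 255 unsigned
 adc #$01     ; result $00: crossed 255/0, so C=1 but V=0

 clc
 lda #$80     ; -128
 adc #$ff     ; -1: both operands negative, result $7f positive, V=1 (C=1)

 sec          ; carry set = no borrow going in
 lda #$80     ; -128
 sbc #$01     ; result $7f: signs differed, result sign matches operand, V=1
```

In each V=1 case the true signed answer (+128 or -129) does not fit in eight bits, which is exactly what the flag reports.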

batari · April 5, 2006

I admit that I don't fully understand the V bit (I know how it's triggered, but its utility is still somewhat of a mystery to me.)

If you think of the values 0-255 as being wrapped around a circle, and addition as moving a point up to 127 units in the positive direction or up to 128 units in the negative direction, the V flag will be set any time the point crosses the threshold between 127 and 128. Note that crossing the threshold from 255 to 0 will not set the V flag.

That's a good way of looking at it. So just like the carry is a virtual 9th bit for unsigned numbers, does the V bit simply act as a virtual 9th bit for signed numbers?

supercat · April 5, 2006

That's a good way of looking at it. So just like the carry is a virtual 9th bit for unsigned numbers, does the V bit simply act as a virtual 9th bit for signed numbers?

Sort of, except that unlike the carry flag it's not designed for value propagation (btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?) Further, unlike the carry flag whose meaning is opposite for addition and subtraction, the overflow flag has the same meaning for both (if set, it means an overflow occurred).
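
To illustrate the difference in flag meaning, here is a sketch of a 16-bit subtraction, assuming hypothetical variables a_lo/a_hi and b_lo/b_hi (not from the thread) and binary mode:

```
; a = a - b, 16 bits. Hypothetical variables: a_lo/a_hi, b_lo/b_hi
 sec          ; subtraction starts with carry SET (no borrow),
              ; where addition would start with clc
 lda a_lo
 sbc b_lo
 sta a_lo     ; C=0 here means a borrow into the high byte
 lda a_hi
 sbc b_hi
 sta a_hi
; C=1 now means no borrow (unsigned a >= b), the reverse of what
; a set carry means after adc.
; V=1 now means signed 16-bit overflow, the same meaning it
; would have after adc.
```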

EricBall · April 5, 2006

I guess I don't quite see the problem with the normal straightforward method. If someone is climbing a ladder or falling or otherwise has a 'pegged' X position, the LSB should be 128 rather than zero; if that is done, I would think everything should be fine.

That would be another way of handling it, except that it then means I need an extra CMP to determine whether the sprite is on grid, instead of just LDA XFPOS / BNE off_grid.

On the other hand, if you will require motion on no more than 127 frames every 256, there might be some slight advantages to using the 'signed-LSB' method if you take advantage of the overflow flag. This code gains a little on efficiency because there's no need to test whether velocity is positive or negative. If there's no overflow, it doesn't matter; if there is an overflow, the sign of pos_lsb will be the opposite of the sign of direction.

That's a nice bit of code. I think I'll revamp SpaceWar! 7800 (which uses a lot of signed fractional addition) to use it rather than how I'm doing sign extension now (after I'm finished Leprechaun of course). For Leprechaun the sprites have an action rather than a velocity so I use SBC/ADC. Thus I can INC/DEC based on the overflow flag alone. (Of course, I'm doing a complete rewrite of that code, so who knows what it will look like.)

Oh, that's one problem with having a signed fractional byte rather than an unsigned fractional byte (which is the usual when treating a 16-bit signed or unsigned value as an 8.8 fixed-point value): you can't use the carry flag to easily add two 16-bit values. You'd need code to set/clear carry based on the overflow flag before adding the second byte. Not as elegant. Ummm... thinking about that more, I'm not sure that would work right. Let's just say it's not as simple when doing x.x + y.y with signed fractional bytes versus x.x + 0.y and leave it at that.

EricBall · April 5, 2006

(btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?)

I've always wished for ADD and SUB instructions (without carry) so I don't have to put in CLC/SEC for an extra byte and two (!) cycles. A carry disable would only be useful if you had to do carry-affecting instructions between a carry-affecting instruction and a carry-affected instruction. Not something I've run into that often.

supercat · April 5, 2006

(btw, am I the only guy who wishes the 650x had a carry-enable flag similar in concept to the "D" flag?)

I've always wished for ADD and SUB instructions (without carry) so I don't have to put in CLC/SEC for an extra byte and two (!) cycles. A carry disable would only be useful if you had to do carry-affecting instructions between a carry-affecting instruction and a carry-affected instruction. Not something I've run into that often.

My thought would be that if there were a carry-enable flag, it could in many cases be left clear except when doing multi-precision maths. Although, thinking about it, an even nicer approach (assuming the opcode space isn't available for separate instructions) might be to have "carry out" and "carry in" flags, along with instructions to set or clear the carry in flag, or copy the carry-out flag to the carry-in flag. Instructions which use carry-in would clear it.

The net effect would be to add two cycles to carry-propagating math operations, but zero cycles to other math operations (except when responding to an interrupt or other similar circumstance). The two-cycle penalty might be slightly annoying, but would be far less bad than not being able to do a carry-propagate add directly (the special cases necessary to simulate one can be quite bothersome)

vdub_bobby · April 5, 2006

That sounds...extremely complicated, all because you don't like to clear the carry before you add?

There are other things I would add to the 650x before some complicated scheme so I didn't have to set/clear the carry flag. Like BRA, a corrected JMP (indirect), and indexed CPX/CPY opcodes.

Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

supercat · April 5, 2006

That sounds...extremely complicated, all because you don't like to clear the carry before you add?

In some loops, that can add a lot of cycles. Actually, if the INC and DEC supported accumulator mode, that would probably take care of the most common cases.

There are other things I would add to the 650x before some complicated scheme so I didn't have to set/clear the carry flag. Like BRA, a corrected JMP (indirect), and indexed CPX/CPY opcodes.

There are, I believe, 23 instructions that use absolute-mode addressing to read and/or write a byte operand (the JMP instruction sets PC, but does not itself read the target address, so it doesn't count). All the other instructions combined use fewer than 64 other opcodes.

I wonder why the 6502's designers didn't simply use the same addressing logic for all of those instructions (basically say that if the two LSBs of an opcode are not both zero, the next three bits set the addressing mode). That would seem much easier than having all sorts of special-case logic to handle instructions like BIT, CPY, and CPX.

Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

Usually it isn't, but sometimes an extra instruction to set or clear carry can be unavoidable, and in a tight loop that can be costly.

EricBall · April 6, 2006

I wonder why the 6502's designers didn't simply use the same addressing logic for all of those instructions (basically say that if the two LSBs of an opcode are not both zero, the next three bits set the addressing mode). That would seem much easier than having all sorts of special-case logic to handle instructions like BIT, CPY, and CPX.

Actually, there is a certain logic to the addressing modes for the different opcodes. The main ALU ops (LDA, STA, ADC, CMP, SBC, AND, ORA, EOR) all have eight addressing modes. Then there's another (larger) group of opcodes which also share (a smaller number of) similar addressing modes. Then there are the other opcodes which didn't fit. But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes.

That's not to say there aren't some quirks and places where the symmetry is broken for no apparent reason. (Or things which seem like they might have been done more efficiently.)

supercat · April 6, 2006

Actually, there is a certain logic to the addressing modes for the different opcodes. The main ALU ops (LDA, STA, ADC, CMP, SBC, AND, ORA, EOR) all have eight addressing modes. Then there's another (larger) group of opcodes which also share (a smaller number of) similar addressing modes. Then there are the other opcodes which didn't fit. But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes.

Let's give 8 addressing modes to these:

LDA STA LDX STX LDY STY CMP CPX CPY (9)

LSR ROR ASL ROL INC DEC (6)

AND ORA EOR ADC SBC BIT (6)

For the fun of it, let's throw in LAX, SAX, and DCP. That gets us up to 24 instructions, 192 opcodes.

Remaining opcodes:

BRK JSR RTS RTI JMP JMPind (6)

BRA B** (9 including adding BRA)

SE*/CL* (8--including adding SEV)

PHP PLP PHA PLA PHX PHY PLX PLY (8--including four added ones)

NOP (1)

TAX TAY TXA TYA TXY TYX (6--including two added ones)

By my count that's 38, including some nice "bonus" instructions. So there would be 26 opcodes left over--room for still more goodies.

What am I missing?

Thomas Jentzsch · April 6, 2006

If you look at this opcode map, you will notice that certain opcodes only appear in certain rows and certain addressing modes always appear in certain columns (with exceptions of course).

I suppose this was done to save costs and maybe increase decoding speed, even though it "wastes" quite a few opcodes.

EricBall · April 6, 2006

But if you start playing around with the instructions, you'll see that there aren't quite enough bits to give every instruction the same number of addressing modes

By my count that's 38, including some nice "bonus" instructions. So there would be 26 opcodes left over--room for still more goodies. What am I missing?

You're correct. I was remembering a thought exercise where I fantasized* about adding additional addressing modes across the board (mostly making X & Y orthogonal) and using those addressing modes for more opcodes (plus adding opcodes like ADD & SUB). In that case I ran out of bits.

If you look at this opcode map, you will notice that certain opcodes only appear in certain rows and certain addressing modes always appear in certain columns (with exceptions of course). I suppose this was done to save costs and maybe increase decoding speed, even though it "wastes" quite a few opcodes.

Actually, if you make a table with 00, 04, 08, 0C, 10, 14, 18, 1C across the top (addressing modes) and then group the remaining bits by the two LSBs (00, 01, 02, 03), you will see the logic; e.g. 01, 21, 41, 61, 81, A1, C1, E1 are the ALU opcodes. Really the main offenders in the whole scheme are NOP, INX, DEX, INY, DEY, and TYA. They don't seem to fall into place. In fact, all of the X & Y opcodes seem slightly misplaced.
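
As a rough sketch of that table layout, the three fields can be pulled out of an opcode byte like this. The zero-page temporaries op_group, op_mode and op_row are hypothetical names used only for illustration:

```
; Opcode byte in A on entry. Hypothetical temporaries:
; op_group, op_mode, op_row
 tax          ; keep a copy of the opcode
 and #$03
 sta op_group ; two LSBs: 01 selects the ALU group
 txa
 lsr
 lsr
 and #$07
 sta op_mode  ; bits 2-4: the table column (00, 04, ... 1C)
 txa
 lsr
 lsr
 lsr
 lsr
 lsr
 sta op_row   ; bits 5-7: which instruction within the group
; Example: $A1 = %101 000 01 gives group 01, column 000, row 101,
; which is LDA (zp,X).
```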

But you're right; other than those exceptions, the opcode bytecodes were designed to simplify decoding and reduce the number of gates required to implement the 6502. The whole reason why the whole 03 block of 64 opcodes wasn't defined (leading to instructions like LAX) was it simplified decoding each block to a NAND gate and two wires instead of a full 1 in 4 demux.

* What? You don't fantasize about creating the ultimate 8-bit ISA?

EricBall · April 6, 2006

IMHO the following corrects the decoding quirks:

INX E8 -> EA (part of the INC row, same as DEX wrt DEC)

NOP EA -> B8 (which is the lost SEV instruction)

TYA B8 -> 88 (part of the STY row, same as TXA wrt STX)

DEY 88 -> C8 (one bit off DEX)

INY C8 -> E8 (one bit off the new position of INX)

This also introduces a new opcode for EB, which would probably be (A AND X) SBC # -> A -> X

I'd also add the missing addressing modes to STX, STY, CPX, CPY and BIT. I have no idea why they were excluded. (Well, maybe some additional logic would be needed to allow CPX to use the LDX addressing modes.)

Thomas Jentzsch · April 6, 2006

Really the main offenders in the whole scheme are NOP,INX,DEX,INY,DEY, and TYA. They don't seem to fall into place. In fact, all of the X & Y opcodes seem slightly misplaced.

I wonder if those NOPs are in fact opcodes like ORA A, AND A, TXX etc. which just don't affect the flags.

vdub_bobby · April 6, 2006

Besides which, it generally isn't difficult to set up your routines so that the carry is known at all points and then compensate accordingly.

Usually it isn't, but sometimes an extra instruction to set or clear carry can be unavoidable, and in a tight loop that can be costly.

I know...I know.

But you could apply the same logic to many, many situations - for long loops, the restriction on branching range can add many cycles to each loop:

Instead of:

   dex
  bpl LoopStartWayBackWhen

You have to do this:

   dex
  bmi LoopOver
  jmp LoopStartWayBackWhen
LoopOver

Which adds 2 cycles to every loop. So it would be awful nice to have branch instructions that were unidirectional with a full page's worth of range.

There's always something...I suppose it's fun to speculate and dream.

batari · April 6, 2006

But you're right; other than those exceptions, the opcode bytecodes were designed to simplify decoding and reduce the number of gates required to implement the 6502. The whole reason why the whole 03 block of 64 opcodes wasn't defined (leading to instructions like LAX) was it simplified decoding each block to a NAND gate and two wires instead of a full 1 in 4 demux.

Funny that these are called "don't care" states. Obviously we care...

Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

vdub_bobby · April 6, 2006

Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

Interestingly enough, just yesterday I wrote a routine that makes extensive use of (ZP,X) addressing. I'll probably post it soon, as soon as I test it (I wrote it on paper).

batari · April 6, 2006

Regarding the 6502, things sure are clear in hindsight. But even so, it's hard to see how anyone saw real utility for (ZP,X) addressing. I'd like to see the logic diagram for the 6502, so I could see how much logic was wasted on this.

Interestingly enough, just yesterday I wrote a routine that makes extensive use of (ZP,X) addressing. I'll probably post it soon, as soon as I test it (I wrote it on paper).

Does it rely on the fact that the 6502's stack is mirrored in zeropage on the 2600? Because the 6502's design actually places the stack at $100-$1FF, which makes (ZP,X) that much less useful on any other 6502 machine.

vdub_bobby · April 6, 2006

Does it rely on the fact that the 6502's stack is mirrored in zeropage on the 2600? Because the 6502's design actually places the stack at $100-$1FF, which makes (ZP,X) that much less useful on any other 6502 machine.

Nope, it doesn't.

EDIT: The use for (ZP,X) is when you want double indexing - something to this effect:

   lda (Ptr,X),Y

Obviously, you can't do that (hey, there's another idea for a new addressing mode), so your two options are this:

   lda PtrList,X
  sta MiscPtr
  lda PtrList+1,X
  sta MiscPtr+1
  lda (MiscPtr),Y

Or you can ditch Y altogether (instead of updating an index, you update the pointer directly) and then just do this:

   lda (PtrList,X)

This works best when you only want to pull one byte of data; pulling 2+ bytes becomes extremely unwieldy:

   lda PtrList,X
  sta MiscPtr
  lda PtrList+1,X
  sta MiscPtr+1
  lda (MiscPtr),Y
;--do something with the value
  iny
  lda (MiscPtr),Y

Versus:

   lda (PtrList,X)
;--do something with the value
  lda PtrList,X
  clc
  adc #1
  sta PtrList,X
  lda PtrList+1,X
  adc #0
  sta PtrList+1,X
  lda (PtrList,X)

It just so happens that I am rewriting my music driver to only use 1 byte per note, and I wanted to index into AUDxx registers to keep ROM usage down - so I have many instances of double indexing and, since I am only reading 1 byte at a time, this is easier than copying the pointers to a temp location. Especially since the value I read often needs to be used to index into a lookup table itself, so what I really need is a double-indexed addressing mode plus another index (Z?) register.

batari · April 6, 2006

I see... makes sense.

BTW:

Versus:

   lda (PtrList,X)
;--do something with the value
  lda PtrList,X
  clc
  adc #1
  sta PtrList,X
  lda PtrList+1,X
  adc #0
  sta PtrList+1,X
  lda (PtrList,X)

How about:

  inc PtrList,X
  bne .1
  inc PtrList+1,X   
.1
  lda (PtrList,X)

EricBall · April 6, 2006

Funny that these are called "don't care" states. Obviously we care...

No, not "don't care" - undefined. Well, I guess they are "don't care"; the ex-6800 engineering team who created the instruction set and schematics (which were then laid out by hand and, even more amazingly, successfully the first time) didn't care what those opcodes did. They simply tried to use the smallest number of gates to achieve the list of features Chuck Peddle specified.

it's hard to see how anyone saw real utility for (ZP,X) addressing.

Yes, (ZP,X) isn't frequently used on the 2600. But imagine that you have an array of pointers to lists, i.e. *strptr[X]. Much more useful when you have more RAM and ROM space. Think of the Apple ][ and manipulating an array of strings or multiple pseudo stack pointers.

vdub_bobby · April 6, 2006


I've often lamented the fact that increment/decrement opcodes don't set the carry flag; usually I am decrementing and nothing tells you if it goes from 0->255. Never occurred to me that going the other way sets the zero flag!

So, thanks!
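
For the decrementing case, one common workaround is to test the low byte before the DEC rather than after it. A minimal sketch, assuming a hypothetical two-byte pointer ptr_lo/ptr_hi (not a name from the thread):

```
; 16-bit decrement. Hypothetical variable: ptr_lo/ptr_hi
; DEC gives no borrow indication when the low byte wraps
; $00 -> $ff, so check for zero first.
 lda ptr_lo
 bne no_borrow ; low byte nonzero: high byte is unaffected
 dec ptr_hi    ; low byte is about to wrap, so borrow from high
no_borrow:
 dec ptr_lo
```

This costs an extra LDA compared with the increment version, and it changes A, which may matter inside a tight loop.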
