Machine cycles

texacala · February 10, 2013

Can anyone point me to a reference for calculating machine cycles for assembly language instructions? I'm using WUDSN for Eclipse and then testing code in Altirra. I'm interested in learning ballpark numbers for different instructions and getting a sense of what is fast. For example, I'm guessing an instruction like CLC is extremely fast but ADC would be slower. I'd also like to know more about addressing, i.e. how fast is simple LDA,STA versus using indexed addressing like LDA (LOCATION),Y, STA (LOCATION),X? And how much speed do I lose moving around with JSR, JMP? Thank you in advance.

Tezz · February 10, 2013

Hi, when I'm manually cycle counting and need to double check my memory I refer to the list of opcodes at 6502.org

After a short while coding you'll learn the machine cycles by heart.

http://www.6502.org/tutorials/6502opcodes.html

potatohead · February 10, 2013

http://www.youtube.com/watch?v=K5miMbqYB4E

This is a fun talk, and it covers cycle counting in a clear, easy to understand way.

phaeron · February 10, 2013

For most instructions you can predict cycle counts by the addressing mode and the basic class of instruction -- the specific operation doesn't matter.

Implied (no arguments) - 2 cycles
Branches - 2 cycles not taken, 3-4 cycles taken
zp load/store - 3 cycles
zp,X load/store - 4 cycles
abs - 4 cycles
abs,X or abs,Y load - 4 or 5 cycles
abs,X or abs,Y store - 5 cycles
zp read/modify/write - 5 cycles
(zp),Y load - 5 or 6 cycles
abs read/modify/write - 6 cycles
zp,X read/modify/write - 6 cycles
(zp),Y store - 6 cycles
(zp,X) - 6 cycles
abs,X read/modify/write - 7 cycles

Altirra can also show you the number of cycles code is actually taking in its History window. Right click and change the timestamp format to unhalted cycles, then set the timestamp origin to where you want cycle 0 to be.
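
As a rough illustration of applying the counts above, here is a small copy loop with each instruction's cycles noted in the comments. The labels `src` and `dst`, the addresses, and the `org` are made up for the example, and immediate mode (2 cycles) is an addition to the list. It is a sketch that assumes both buffers start on a page boundary and that the loop itself does not straddle a page.

```asm
src     = $4000         ; hypothetical source buffer, page aligned
dst     = $5000         ; hypothetical destination buffer, page aligned

        org $2000
        ldy #0          ; 2 cycles (immediate)
loop    lda src,y       ; 4 cycles (abs,Y load, no page cross because src is page aligned)
        sta dst,y       ; 5 cycles (abs,Y store, always 5)
        iny             ; 2 cycles (implied)
        bne loop        ; 3 cycles when taken, 2 on the final pass
        rts
```

Each pass costs 4 + 5 + 2 + 3 = 14 cycles, and the last pass costs 13 because the branch falls through, so copying 256 bytes comes to roughly 255 * 14 + 13 + 2 = 3585 CPU cycles. On real hardware, cycles stolen by DMA come on top of that, which is why the unhalted-cycles view in Altirra is the one to compare against.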

texacala · February 11, 2013

Perfect, just the help I was looking for! That talk is great. I had wondered about indexed addressing and whether or not it was time-consuming. I am using it a lot because I find it so powerful. With JSR or JMP, does the 6502 slow down if the jump is farther away in memory? For example, does it take longer to JSR/RTS from $1000 to code located at $7000 than say from $1000 to $2000? And is it generally advisable to minimize the number of jumps when programming or does speed typically not suffer? Thanks again for the help.

potatohead · February 11, 2013

No. There is an extra cycle when crossing page boundaries to think about. Say you've got LDA $3FF0, X. If X contains $1, then A gets loaded from $3FF1. ($3FF0 + $1 = $3FF1) No big deal, that's right off the cycle count list above. 4 cycles. Let's say X contains $14. Now A gets loaded from $4004! ($3FF0 + $14 = $4004)

A page is 256 bytes, so two addresses are on the same page when their most significant bytes match. Since the 6502 is an 8 bit CPU, it adds the index to the least significant byte first; if that addition overflows, the carry has to be added to the most significant byte, and that takes another cycle. In this case, going from page $3F to page $40 takes the extra cycle, for a total of 5.

That kind of thing aside, the "distance" between jumps isn't important. Just think of the page and that the 6502 can only work with 8 bits at a time. A JMP to $1000 isn't any different than a jump to $9000. The Program Counter still needs to get loaded with a 16 bit address. Both bytes get stuffed in there, then the 6502 carries on from that address.

If the address is calculated, then you've got to think about the page and that extra cycle on overflow, as mentioned above.
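
The two cases from this post, written out with the cycle counts as comments. The `org` address is arbitrary and only there so the fragment assembles on its own.

```asm
        org $2000
        ldx #$01        ; 2 cycles
        lda $3FF0,x     ; reads $3FF1, same page as $3FF0: 4 cycles
        ldx #$14        ; 2 cycles
        lda $3FF0,x     ; reads $4004, carries from page $3F into $40: 5 cycles
        rts
```

Stores are different: `sta $3FF0,x` takes 5 cycles whether or not the page is crossed, as in the list earlier in the thread.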

Rybags · February 11, 2013

JMP reloads the entire program counter, source vs destination makes no difference ie there's no time saving if it's a short hop.

It's only a 3 cycle instruction which is relatively fast, a taken branch that crosses a page boundary is actually slower. But JMP uses 3 bytes vs 2 for a branch and JMP is also not relocatable.

potatohead · February 11, 2013

As for minimizing the number of JMP operations, you don't want to JMP unless you need to. Like anything, excessive operations slow things down. Best case is to think about it, write it, get it working, then optimize it. You might find that a simple lookup table containing a list of addresses to jump to on condition X is a lot faster than a series of compare and jump operations.

Think ON X GOTO $1000, $2000, $3000

vs

If X = 1 GOTO $1000

If X = 2 GOTO $2000

If X = 3 GOTO $3000

Even though the JMP instruction for a table is slower because it's indirect (5 cycles versus 3 for an absolute JMP), using it can be way faster than several compare operations combined with simple absolute JMP instructions! Additionally, a table takes a consistent amount of time to jump on any X condition, whereas the time for the compares varies with X, because each comparison takes time and they have to be done sequentially until the desired condition is reached.
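
One possible way to write the table version on a stock 6502 is sketched below. It pushes the target address and uses RTS instead of an indirect JMP, which is a common idiom but only one option and not necessarily what was meant above. The handler names and the `org` are hypothetical, and X is assumed to hold 0, 1 or 2 rather than 1, 2 or 3.

```asm
        org $2000
; Enter with X = 0, 1 or 2. The tables hold (handler address - 1)
; because RTS adds one to the address it pulls from the stack.
dispatch
        lda table_hi,x  ; 4 cycles (abs,X load, assuming no page cross)
        pha             ; 3 cycles
        lda table_lo,x  ; 4 cycles
        pha             ; 3 cycles
        rts             ; 6 cycles: "returns" into the chosen handler

table_lo .byte <(handler0-1), <(handler1-1), <(handler2-1)
table_hi .byte >(handler0-1), >(handler1-1), >(handler2-1)

handler0 rts            ; placeholder handlers for the sketch
handler1 rts
handler2 rts
```

The dispatch costs 4 + 3 + 4 + 3 + 6 = 20 cycles for every value of X. A chain of CPX/BEQ pairs is cheaper for the first case and gets slower with each later one, so the table tends to win once there are more than a handful of cases, at the price of two table bytes per entry.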

That's the kind of thing you want to consider. There is also code space vs code speed. Tables can speed up lots of stuff, but they take space which might be needed to actually get the program written! Always a trade off there. Where you've got a loop running all the time and a table would pay off big, generally you do it. If it's out on the periphery that isn't in the main execute path, maybe don't do it because the space costs too much.

Same thing goes with indexing. Generally speaking, indexing is worth it because it takes many more operations to get stuff done without it. Count 'em up, look at space vs speed and where your critical paths are and make your best call.

A critical path is a part of the code that gets run a lot, or that just needs to be quick when it runs. Something like a menu that starts a program or chooses an option doesn't need to be fast. Something that is writing to the disk might not need to be fast either. Playing music or animating things on the screen? Probably needs to be fast.

flashjazzcat · February 11, 2013

I hate having to use JMP... I juggle code around all kinds of ways to make sure I can almost always do relative branches, unless it's a "far" JMP. There's nearly always a dependable flag in the processor status register to force a branch.

phaeron · February 12, 2013

Using Bcc instead of a JMP is a code size optimization, though, not a speed one.

One trick I've been using lately to eliminate jumps is to use multiple entry points for a procedure:

ComputeAddress = ComputeLineAddress.with_x
.proc ComputeLineAddress
ldx #0
with_x:
...
.endp

flashjazzcat · February 12, 2013

Using Bcc instead of a JMP is a code size optimization, though, not a speed one.

Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three. Moreover, consider the following (since JMP is unconditional):

lda flag
bne destination

If "destination" is out of branching range and we have to use a JMP, we end up with:

lda flag
beq *+5
jmp destination

So I think the speed benefits of carefully arranging code to use branches should be obvious.

Edited February 12, 2013 by flashjazzcat

MaPa · February 12, 2013

Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three. Moreover, consider the following (since JMP is unconditional):

A branch takes 2 cycles if it is not taken and execution continues with the next instruction. If the branch is taken it takes 3 cycles, and if it crosses a page boundary when branching it takes 4 cycles.
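
Side by side, a forced branch and a JMP to the same place look like this. The `target` label and the `org` are hypothetical, and the two alternatives are shown in one listing only for comparison (the JMP is never reached as written).

```asm
        org $2000
        lda #0          ; 2 cycles; Z is now set, so the BEQ is always taken
        beq target      ; 2 bytes; 3 cycles taken, 4 if target is on a different
                        ; page from the instruction following the branch
; alternative to the BEQ above:
        jmp target      ; 3 bytes; always 3 cycles
target  rts
```

So a taken branch never beats JMP on speed. It ties at 3 cycles within a page, loses by one across a page, and saves one byte either way.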

flashjazzcat · February 12, 2013

Branch takes 2 cycles if it will not take the branch and continues with next instruction, if it takes branch it's 3 cycles long and if it will cross page boundary when branching it takes 4 cycles.

Yes - thanks for the correction - my mistake. I'll shut up now.

EDIT: the example I gave is irrelevant, too, since we were talking about unconditional branches anyway. I'm having one of those days...

Avery makes a good point, though: going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed).

Edited February 12, 2013 by flashjazzcat

atari8warez · February 12, 2013

I hate having to use JMP... I juggle code around all kinds of ways to make sure I can almost always do relative branches, unless it's a "far" JMP. There's nearly always a dependable flag in the processor status register to force a branch.

Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can also be said for JSRs.

atari8warez · February 12, 2013

going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed).

Really!... I found 9 JMP statements just in this source file alone.....

Jumps sometimes are necessary evils, especially if there is no other way to move around more than 127 bytes.

Unless the OP needs to know the absolute fastest ways to code (perhaps for some time critical code, or for some heavy duty number crunching routine etc...), the obsession about speed is moot anyway.

What I truly hate is not the JMP but self-modifying code, which some people are fond of using liberally. It makes for some very difficult reading, especially when not commented enough, but that's another hate-related subject.

tprintf.asm

Edited February 12, 2013 by atari8warez

Rybags · February 12, 2013

JMP (adr,X) was sorely missed from the original, as was an unconditional BRA.

Then again with all those spare opcodes, conditional JMPs would have been nice too.

flashjazzcat · February 13, 2013

Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can also be said for JSRs.

Excellent... I never used either before, but they both sound handy so I think I'll start...

Really!... I found 9 JMP statements just in this source file alone.....

What's the reasoning behind digging out this old code and presenting it here (still rubbing my eyes in disbelief)? Did you actually think I had banished JMP from my code? The point I had originally made was that I like to use branches because 1) they can be as fast as JMP, 2) they're capitalising on the state of a processor status register bit, and 3) they indisputably take up two thirds of the space of JMP.

The TPRINTF routine (written about eighteen years ago) would actually benefit from a few JMPs converted to branches, I note, since one purpose of the library was to be as compact as possible (and looking at it now, it doesn't compare well with the PRINTF routine I wrote last year). It's surprising the number of times I've had to recoup half a dozen bytes of code and changing JMPs to branches has allowed me to do just that.

Jumps sometimes are necessary evils especially if there is no other way to move around more than 127 bytes

No 6502 programmer could disagree with this...

Unless the OP needs to know the absolute fastest ways to code (perhaps for some time critical code, or for some heavy duty number crunching routine etc...), the obsession about speed is moot anyway.

In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting.

Edited February 13, 2013 by flashjazzcat

atari8warez · February 13, 2013

What's the reasoning behind digging out this old code and presenting it here

Could it be: "going out of my way to use branches for unconditional jumps.....". I don't know what changed in 6502 coding during the last 18 years to cause a change of heart in you, as the code shows you seemed to have no problem with using JMPs liberally. By the way, this code is available to the public on your own website, so no digging was necessary.

In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting.

And let me see... what could be the reason for cycle counting ... am I missing something here?

Edited February 13, 2013 by atari8warez

atari8warez · February 13, 2013

JMP (adr,X) was sorely missed from the original, as was an unconditional BRA.

Then again with all those spare opcodes, conditional JMPs would have been nice too.

Yes that could have been real handy...

Nukey Shay · February 13, 2013

Jumps are always 3 cycles, tho...so there is a slight advantage over using unconditional branches (where page-crossing might become an issue if you aren't keeping tabs on things during development).

However, there is a way to trim this down to 2 cycles if the code is only skipping 1 byte (such as using the CMP# opcode).
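
A sketch of what that trick might look like, on the assumption that it means planting a bare CMP # opcode ($C9) so the one-byte instruction after it is consumed as the operand. The labels, the `result` location and the `org` are invented for the example.

```asm
result  = $80           ; hypothetical zero page location

        org $2000
store_y                 ; entry 1: store Y unchanged
        .byte $C9       ; runs as CMP #$C8 (2 cycles), swallowing the INY below
store_y_plus1           ; entry 2: store Y + 1
        iny             ; 2 cycles (opcode $C8), only executed from entry 2
        sty result      ; 3 cycles (zp store)
        rts
```

The skip costs 2 cycles and 1 byte instead of 3 cycles and 2 or 3 bytes for a branch or JMP. The catch is that the hidden CMP changes N, Z and C, so it only works where those flags don't matter afterwards.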

Irgendwer · February 13, 2013

What I truly hate is not the JMP but self-modifying code, which some people are fond of using liberally. It makes for some very difficult reading, especially when not commented enough, but that's another hate-related subject.

http://www.cc65.org/snapshot-doc/smc.html
