
Machine cycles


texacala


Can anyone point me to a reference for calculating machine cycles for assembly language instructions? I'm using WUDSN for Eclipse and then testing code in Altirra. I'm interested in learning ballpark numbers for different instructions and getting a sense of what is fast. For example, I'm guessing an instruction like CLC is extremely fast but ADC would be slower. I'd also like to know more about addressing, i.e. how fast is simple LDA,STA versus using indexed addressing like LDA (LOCATION),Y, STA (LOCATION),X? And how much speed do I lose moving around with JSR, JMP? Thank you in advance.


For most instructions you can predict cycle counts by the addressing mode and the basic class of instruction -- the specific operation doesn't matter.

  • Implied (no arguments) - 2 cycles
  • Branches - 2 cycles not taken, 3-4 cycles taken
  • zp load/store - 3 cycles
  • zp,X load/store - 4 cycles
  • abs - 4 cycles
  • abs,X or abs,Y load - 4 or 5 cycles
  • abs,X or abs,Y store - 5 cycles
  • zp read/modify/write - 5 cycles
  • (zp),Y load - 5 or 6 cycles
  • abs read/modify/write - 6 cycles
  • zp,X read/modify/write - 6 cycles
  • (zp),Y store - 6 cycles
  • (zp,X) - 6 cycles
  • abs,X read/modify/write - 7 cycles
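
As a rough aid, the list above can be turned into a small lookup table. This is only a sketch in Python (a desk calculator, not 6502 code). The names `BASE_CYCLES`, `PAGE_CROSS_PENALTY` and `cycles` are made up for illustration, and the plain "abs" and "(zp,X)" entries are assumed to mean loads and stores.

```python
# Base cycle counts from the list above, keyed by (addressing mode, class).
# Classes: "load", "store", "rmw" (read/modify/write), "any" (implied).
BASE_CYCLES = {
    ("implied", "any"): 2,
    ("zp", "load"): 3, ("zp", "store"): 3,
    ("zp,X", "load"): 4, ("zp,X", "store"): 4,
    ("abs", "load"): 4, ("abs", "store"): 4,      # assumed: load/store
    ("abs,X", "load"): 4, ("abs,Y", "load"): 4,
    ("abs,X", "store"): 5, ("abs,Y", "store"): 5,
    ("zp", "rmw"): 5,
    ("(zp),Y", "load"): 5,
    ("abs", "rmw"): 6,
    ("zp,X", "rmw"): 6,
    ("(zp),Y", "store"): 6,
    ("(zp,X)", "load"): 6, ("(zp,X)", "store"): 6,  # assumed: load/store
    ("abs,X", "rmw"): 7,
}

# Entries listed above as "4 or 5" / "5 or 6": one extra cycle when the
# indexing crosses a page boundary.
PAGE_CROSS_PENALTY = {("abs,X", "load"), ("abs,Y", "load"), ("(zp),Y", "load")}


def cycles(mode, kind, page_crossed=False):
    """Return the cycle count for one instruction of the given mode and class."""
    base = BASE_CYCLES[(mode, kind)]
    if page_crossed and (mode, kind) in PAGE_CROSS_PENALTY:
        return base + 1
    return base
```

For example, the body of a copy loop, LDA (src),Y followed by STA (dst),Y, comes to `cycles("(zp),Y", "load") + cycles("(zp),Y", "store")`, which is 11 cycles when no page is crossed.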

 

Altirra can also show you the number of cycles code is actually taking in its History window. Right click and change the timestamp format to unhalted cycles, then set the timestamp origin to where you want cycle 0 to be.


Perfect, just the help I was looking for! That talk is great. I had wondered about indexed addressing and whether or not it was time-consuming. I am using it a lot because I find it so powerful. With JSR or JMP, does the 6502 slow down if the jump is farther away in memory? For example, does it take longer to JSR/RTS from $1000 to code located at $7000 than say from $1000 to $2000? And is it generally advisable to minimize the number of jumps when programming or does speed typically not suffer? Thanks again for the help.


No. The one thing to think about is the extra cycle when indexing crosses a page boundary. Say you've got LDA $3FF0,X. If X contains $01, then A gets loaded from $3FF1 ($3FF0 + $01 = $3FF1). No big deal, that's right off the cycle count list above: 4 cycles. Now let's say X contains $14. A gets loaded from $4004! ($3FF0 + $14 = $4004)

 

A page is 256 bytes, and the most significant byte of an address identifies which page it is in. Since the 6502 is an 8-bit CPU, it adds the index to the least significant byte first; if that addition overflows, the carry has to be added to the most significant byte afterwards, and that takes another cycle. In this case, going from page $3F to page $40 takes the extra cycle, for a total of 5.
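
The arithmetic in this example can be checked with a few lines of Python. This is just a sketch; the function names are made up, and the 4-cycle base is the abs,X load figure from the list earlier in the thread.

```python
def indexed_address(base, index):
    """Effective address of base,X (16-bit wrap-around)."""
    return (base + index) & 0xFFFF


def crosses_page(base, index):
    """True when base and base+index have different high bytes."""
    return (base & 0xFF00) != (indexed_address(base, index) & 0xFF00)


def lda_abs_x_cycles(base, x):
    """LDA abs,X: 4 cycles, plus 1 if the indexing crosses a page."""
    return 4 + (1 if crosses_page(base, x) else 0)


print(hex(indexed_address(0x3FF0, 0x01)), lda_abs_x_cycles(0x3FF0, 0x01))
print(hex(indexed_address(0x3FF0, 0x14)), lda_abs_x_cycles(0x3FF0, 0x14))
```

The first line covers the X = $01 case (stays in page $3F, 4 cycles), the second the X = $14 case (lands in page $40, 5 cycles).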

 

That kind of thing aside, the "distance" between jumps isn't important. Just think of the page and that the 6502 can only work with 8 bits at a time. A JMP to $1000 isn't any different than a jump to $9000. The Program Counter still needs to get loaded with a 16 bit address. Both bytes get stuffed in there, then the 6502 carries on from that address.

 

If the address is calculated, then you've got to think about the page and that extra cycle on overflow, as mentioned above.


JMP reloads the entire program counter, so source vs. destination makes no difference, i.e. there's no time saving if it's a short hop.

At only 3 cycles it's a relatively fast instruction; a taken branch that crosses a page boundary (4 cycles) is actually slower. But JMP uses 3 bytes vs. 2 for a branch, and JMP is also not relocatable.


As for minimizing the number of JMP operations, you don't want to JMP unless you need to. Like anything, excessive operations slow things down. Best case is to think about it, write it, get it working, then optimize it. You might find that a simple look up table containing a list of addresses to jump to on condition X is a lot faster than a series of compare operations and jump operations is.

 

Think ON X GOTO $1000, $2000, $3000

 

vs

 

If X = 1 GOTO $1000

If X = 2 GOTO $2000

If X = 3 GOTO $3000

 

Even though a table-driven jump is slower than a plain absolute JMP (the stock 6502 has no indexed JMP, so the dispatch goes through an indirect JMP via a vector, or a pushed address and an RTS), using it can be way faster than several compare operations combined with simple absolute JMP instructions! Additionally, a table jump takes a consistent amount of time whatever X is, whereas the compares have to be done sequentially, so the time to reach the desired case varies with X.
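
To get a feel for where the table starts to pay off, here is a back-of-the-envelope tally in Python. It is a sketch under assumptions that are not in the post: the compare chain is taken to be CPX #n / BNE next / JMP target for each case, the table dispatch is taken to be the common "push the address, then RTS" technique (LDA hi,X / PHA / LDA lo,X / PHA / RTS, with the table holding each target address minus one), and no page boundaries are crossed. All names are made up for illustration.

```python
# Standard 6502 timings (no page crossings assumed).
CPX_IMM = 2
BNE_NOT_TAKEN = 2
BNE_TAKEN = 3
JMP_ABS = 3
LDA_ABS_X = 4
PHA = 3
RTS = 6


def compare_chain_cycles(case):
    """Cycles to reach the 1-based `case` via CPX #n / BNE next / JMP target."""
    misses = case - 1
    miss_cost = CPX_IMM + BNE_TAKEN                 # compare fails, skip ahead
    hit_cost = CPX_IMM + BNE_NOT_TAKEN + JMP_ABS    # compare matches, jump
    return misses * miss_cost + hit_cost


def table_dispatch_cycles():
    """Cycles for LDA hi,X / PHA / LDA lo,X / PHA / RTS, whatever X is."""
    return 2 * (LDA_ABS_X + PHA) + RTS


for case in range(1, 7):
    print(case, compare_chain_cycles(case), table_dispatch_cycles())
```

Under these assumptions the chain costs 5 more cycles for every case it has to step past, while the table costs the same every time. With only three targets the chain still comes out ahead; the table wins from the fourth case on, and the gap keeps growing after that.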

 

That's the kind of thing you want to consider. There is also code space vs code speed. Tables can speed up lots of stuff, but they take space which might be needed to actually get the program written! Always a trade off there. Where you've got a loop running all the time and a table would pay off big, generally you do it. If it's out on the periphery that isn't in the main execute path, maybe don't do it because the space costs too much.

 

Same thing goes with indexing. Generally speaking, indexing is worth it because it takes many more operations to get stuff done without it. Count 'em up, look at space vs speed and where your critical paths are and make your best call.

 

A critical path is a part of the code that gets run a lot, or that just needs to be quick when it runs. Something like a menu that starts a program or chooses an option doesn't need to be fast. Something that is writing to the disk might not need to be fast either. Playing music or animating things on the screen? Probably needs to be fast.


Using Bcc instead of a JMP is a code size optimization, though, not a speed one.

 

One trick I've been using lately to eliminate jumps is to use multiple entry points for a procedure:

 

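ComputeAddress = ComputeLineAddress.with_x   ; second entry point: caller sets X
.proc ComputeLineAddress
    ldx #0          ; default entry: start with X = 0
with_x:             ; alternate entry lands here, skipping the LDX
    ...
.endp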


"Using Bcc instead of a JMP is a code size optimization, though, not a speed one."

 

Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three. Moreover, consider the following (since JMP is unconditional):

 

lda flag
bne destination

 

If "destination" is out of branching range and we have to use a JMP, we end up with:

 

    lda flag
    beq *+5          ; flag is zero: skip the JMP (2-byte BEQ + 3-byte JMP = 5)
    jmp destination  ; flag is non-zero: take the long jump

 

So I think the speed benefits of carefully arranging code to use branches should be obvious.
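
To put numbers on the two sequences above, here is a quick tally in Python using the standard 6502 timings (branch: 2 cycles not taken, 3 taken; JMP absolute: 3). The function names are made up, the LDA is left out because it is the same in both versions, and page crossings are assumed not to happen.

```python
BRANCH_NOT_TAKEN = 2
BRANCH_TAKEN = 3
JMP_ABS = 3

SHORT_FORM_BYTES = 2        # bne destination
LONG_FORM_BYTES = 2 + 3     # beq *+5 / jmp destination


def short_form_cycles(flag_nonzero):
    """bne destination: taken when the flag is non-zero."""
    return BRANCH_TAKEN if flag_nonzero else BRANCH_NOT_TAKEN


def long_form_cycles(flag_nonzero):
    """beq *+5 / jmp destination: the BEQ falls through to the JMP when
    the flag is non-zero, and skips over it when the flag is zero."""
    if flag_nonzero:
        return BRANCH_NOT_TAKEN + JMP_ABS
    return BRANCH_TAKEN


for flag_nonzero in (True, False):
    print(flag_nonzero, short_form_cycles(flag_nonzero), long_form_cycles(flag_nonzero))
```

So when the flag is non-zero the long form costs 5 cycles against 3, and when it is zero it costs 3 against 2. It is also 5 bytes against 2.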

Edited by flashjazzcat

"Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three. Moreover, consider the following (since JMP is unconditional):"

A branch takes 2 cycles if it is not taken and execution continues with the next instruction, 3 cycles if it is taken, and 4 cycles if taking it crosses a page boundary.
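
The rule can be written down as a small Python function. This is a sketch with made-up names; it assumes the page check is made between the address of the instruction following the branch (the branch itself is 2 bytes) and the branch target.

```python
JMP_ABS_CYCLES = 3  # for comparison: an absolute JMP always takes 3 cycles


def branch_cycles(branch_addr, target_addr, taken):
    """Cycles for a relative branch located at branch_addr."""
    if not taken:
        return 2
    next_pc = (branch_addr + 2) & 0xFFFF  # address of the following instruction
    crossed = (next_pc & 0xFF00) != (target_addr & 0xFF00)
    return 4 if crossed else 3


print(branch_cycles(0x2000, 0x2010, taken=False))
print(branch_cycles(0x2000, 0x2010, taken=True))
print(branch_cycles(0x20F0, 0x2105, taken=True))
```

The three calls cover a branch that is not taken, one taken within page $20, and one taken from page $20 into page $21.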


"A branch takes 2 cycles if it is not taken and execution continues with the next instruction, 3 cycles if it is taken, and 4 cycles if taking it crosses a page boundary."

 

Yes - thanks for the correction - my mistake. I'll shut up now. :)

 

EDIT: the example I gave is irrelevant, too, since we were talking about unconditional branches anyway. ;) I'm having one of those days...

 

Avery makes a good point, though: going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed).

Edited by flashjazzcat

I hate having to use JMP... I juggle code around all kind of ways to make sure I can almost always do relative branches, unless it's a "far" JMP. There's nearly always a dependable flag in the processor status register to force a branch. ;)

 

Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can also be said for JSRs.


"going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed)."

 

Really!... I found 9 JMP statements just in this source file alone..... ;)

 

Jumps sometimes are necessary evils, especially if there is no other way to move around more than 127 bytes.

 

Unless the OP requires to know the absolute fastest ways to code (perhaps for some time critical code, or for some heavy duty number crunching routine etc...), the obsession about speed is moot anyway.

 

What I truly hate is not the JMP but self-modifying code, which some people are fond of using liberally. It makes for some very difficult reading, especially when not commented enough, but that's another hate-related subject :)

tprintf.asm

Edited by atari8warez

"Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can also be said for JSRs."

 

Excellent... I never used either before, but they both sound handy so I think I'll start...

 

"Really!... I found 9 JMP statements just in this source file alone..... ;)"

 

What's the reasoning behind digging out this old code and presenting it here (still rubbing my eyes in disbelief)? Did you actually think I had banished JMP from my code? The point I had originally made was that I like to use branches because 1) they can be as fast as JMP, 2) they're capitalising on the state of a processor status register bit, and 3) they indisputably take up two thirds of the space of JMP.

 

The TPRINTF routine (written about eighteen years ago) would actually benefit from a few JMPs converted to branches, I note, since one purpose of the library was to be as compact as possible (and looking at it now, it doesn't compare well with the PRINTF routine I wrote last year). It's surprising the number of times I've had to recoup half a dozen bytes of code and changing JMPs to branches has allowed me to do just that.

 

"Jumps sometimes are necessary evils, especially if there is no other way to move around more than 127 bytes."

 

No 6502 programmer could disagree with this...

 

"Unless the OP requires to know the absolute fastest ways to code (perhaps for some time critical code, or for some heavy duty number crunching routine etc...), the obsession about speed is moot anyway."

 

In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting.

Edited by flashjazzcat

"What's the reasoning behind digging out this old code and presenting it here"

 

Could it be: "going out of my way to use branches for unconditional jumps....."? I don't know what changed in 6502 coding during the last 18 years to cause a change of heart in you, as the code shows you seemed to have no problem with using JMPs liberally. :) By the way, this code is available to the public on your own website, so no digging was necessary.

 

 

"In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting."

 

And let me see... what could be the reason for cycle counting :? ... am I missing something here?

Edited by atari8warez

Jumps are always 3 cycles, tho...so there is a slight advantage over using unconditional branches (where page-crossing might become an issue if you aren't keeping tabs on things during development).

 

However, there is a way to trim this down to 2 cycles if the code is only skipping 1 byte (such as using the CMP # opcode).
