texacala Posted February 10, 2013
Can anyone point me to a reference for calculating machine cycles for assembly language instructions? I'm using WUDSN for Eclipse and then testing code in Altirra. I'm interested in learning ballpark numbers for different instructions and getting a sense of what is fast. For example, I'm guessing an instruction like CLC is extremely fast but ADC would be slower. I'd also like to know more about addressing, i.e. how fast is a simple LDA or STA versus indexed addressing like LDA (LOCATION),Y or STA (LOCATION,X)? And how much speed do I lose moving around with JSR and JMP? Thank you in advance.
Tezz Posted February 10, 2013
Hi, when I'm manually cycle counting and need to double-check my memory, I refer to the list of opcodes at 6502.org. After a short while coding you'll learn the machine cycles by heart.
http://www.6502.org/tutorials/6502opcodes.html
potatohead Posted February 10, 2013
http://www.youtube.com/watch?v=K5miMbqYB4E
This is a fun talk, and it covers cycle counting in a clear, easy-to-understand way.
phaeron Posted February 10, 2013
For most instructions you can predict cycle counts by the addressing mode and the basic class of instruction -- the specific operation doesn't matter.

Implied (no arguments) - 2 cycles
Branches - 2 cycles not taken, 3-4 cycles taken
zp load/store - 3 cycles
zp,X load/store - 4 cycles
abs - 4 cycles
abs,X or abs,Y load - 4 or 5 cycles
abs,X or abs,Y store - 5 cycles
zp read/modify/write - 5 cycles
(zp),Y load - 5 or 6 cycles
abs read/modify/write - 6 cycles
zp,X read/modify/write - 6 cycles
(zp),Y store - 6 cycles
(zp,X) - 6 cycles
abs,X read/modify/write - 7 cycles

Altirra can also show you the number of cycles code is actually taking in its History window. Right click and change the timestamp format to unhalted cycles, then set the timestamp origin to where you want cycle 0 to be.
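The list above works well as a small lookup for ballpark estimates. What follows is a minimal sketch in Python, not something from the thread: the key names and the `estimate` helper are made up for illustration, the counts are the ones listed above, and where the list gives a range the lower figure is used (no page crossing).

```python
# Best-case cycle counts keyed by (addressing mode, instruction class),
# copied from the list above. Key names are illustrative only.
BASE_CYCLES = {
    ("implied", "any"): 2,
    ("branch", "not_taken"): 2,
    ("branch", "taken"): 3,        # 4 if the branch crosses a page
    ("zp", "load_store"): 3,
    ("zp,x", "load_store"): 4,
    ("abs", "load_store"): 4,
    ("abs,x", "load"): 4,          # 5 on a page crossing
    ("abs,y", "load"): 4,          # 5 on a page crossing
    ("abs,x", "store"): 5,
    ("abs,y", "store"): 5,
    ("zp", "rmw"): 5,
    ("(zp),y", "load"): 5,         # 6 on a page crossing
    ("abs", "rmw"): 6,
    ("zp,x", "rmw"): 6,
    ("(zp),y", "store"): 6,
    ("(zp,x)", "load_store"): 6,
    ("abs,x", "rmw"): 7,
}

def estimate(sequence):
    """Sum the best-case cycle counts for a list of (mode, class) pairs."""
    return sum(BASE_CYCLES[item] for item in sequence)

# Hypothetical loop body: LDA (zp),Y / STA abs,X / DEX / BNE (taken)
loop_body = [("(zp),y", "load"), ("abs,x", "store"),
             ("implied", "any"), ("branch", "taken")]
print(estimate(loop_body))  # 15
```

An estimate like this is only a lower bound; Altirra's History window remains the way to see what the code really costs.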
texacala Posted February 11, 2013
Perfect, just the help I was looking for! That talk is great. I had wondered about indexed addressing and whether or not it was time-consuming. I am using it a lot because I find it so powerful.
With JSR or JMP, does the 6502 slow down if the jump is farther away in memory? For example, does it take longer to JSR/RTS from $1000 to code located at $7000 than, say, from $1000 to $2000? And is it generally advisable to minimize the number of jumps when programming, or does speed typically not suffer? Thanks again for the help.
potatohead Posted February 11, 2013
No. There is an extra cycle when crossing page boundaries to think about.
Say you've got LDA $3FF0,X. If X contains $1, then A gets loaded from $3FF1 ($3FF0 + $1 = $3FF1). No big deal, that's right off the cycle count list above: 4 cycles.
Let's say X contains $14. Now A gets loaded from $4004 ($3FF0 + $14 = $4004)!
A page is 256 bytes, and it's identified by the most significant byte of the address. Since the 6502 is an 8-bit CPU, an overflow on the address add takes another cycle, because that overflow has to be added to the most significant byte right after the primary addition happens on the least significant byte. In this case, going from page $3F to page $40 takes the extra cycle, for a total of 5.
That kind of thing aside, the "distance" between jumps isn't important. Just think of the page, and that the 6502 can only work with 8 bits at a time.
A JMP to $1000 isn't any different from a JMP to $9000. The Program Counter still needs to get loaded with a 16-bit address. Both bytes get stuffed in there, then the 6502 carries on from that address. If the address is calculated, then you've got to think about the page and that extra cycle on overflow, as mentioned above.
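The page-crossing rule in the example above can be checked with a few lines of Python. This is a sketch only; `abs_indexed_load_cycles` is a made-up helper name, and it models just the abs,X / abs,Y load case from the cycle list.

```python
def abs_indexed_load_cycles(base, index):
    """Cycles for an abs,X or abs,Y load: 4, plus 1 if base + index leaves the page."""
    effective = (base + index) & 0xFFFF
    # The page is the high byte of the address; compare it before and after indexing.
    crossed = (effective & 0xFF00) != (base & 0xFF00)
    return 5 if crossed else 4

print(abs_indexed_load_cycles(0x3FF0, 0x01))  # 4: $3FF1 is still in page $3F
print(abs_indexed_load_cycles(0x3FF0, 0x14))  # 5: $4004 is in page $40
```

Aligning a table so that base + largest index stays within one page avoids the penalty altogether.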
Rybags Posted February 11, 2013
JMP reloads the entire program counter, so source vs destination makes no difference, i.e. there's no time saving if it's a short hop.
It's only a 3-cycle instruction, which is relatively fast; a taken branch that crosses a page boundary is actually slower.
But JMP uses 3 bytes vs 2 for a branch, and JMP is also not relocatable.
potatohead Posted February 11, 2013
As for minimizing the number of JMP operations, you don't want to JMP unless you need to. Like anything, excessive operations slow things down. Best case is to think about it, write it, get it working, then optimize it.
You might find that a simple lookup table containing a list of addresses to jump to on condition X is a lot faster than a series of compare and jump operations. Think

    ON X GOTO $1000, $2000, $3000

vs

    IF X = 1 GOTO $1000
    IF X = 2 GOTO $2000
    IF X = 3 GOTO $3000

Even though the jump through a table is slower because it's indirect, using it can be way faster than several compare operations combined with simple absolute JMP instructions! Additionally, jumping on condition X through a table takes a consistent amount of time, whereas the time for the compares varies with X, because each comparison takes time and they have to be done sequentially until the desired condition is reached.
That's the kind of thing you want to consider.
There is also code space vs code speed. Tables can speed up lots of stuff, but they take space which might be needed to actually get the program written! Always a trade-off there. Where you've got a loop running all the time and a table would pay off big, generally you do it. If it's out on the periphery, not in the main execution path, maybe don't do it because the space costs too much.
Same thing goes with indexing. Generally speaking, indexing is worth it because it takes many more operations to get stuff done without it. Count 'em up, look at space vs speed and where your critical paths are, and make your best call.
A critical path is a part of the code that gets run a lot, or that just needs to be quick when it runs. Something like a menu that starts a program or chooses an option doesn't need to be fast. Something that is writing to the disk might not need to be fast either. Playing music or animating things on the screen? Probably needs to be fast.
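To put rough numbers on the table-versus-compares trade-off, here is a hedged sketch in Python. Everything about the code shapes is an assumption, not something stated in the thread: the compare chain is taken to be CPX #n / BEQ target repeated per case (targets within branch range, no page crossings), and the table dispatch is taken to be the common LDA hi,X / PHA / LDA lo,X / PHA / RTS idiom, with the table holding each target address minus one. The per-instruction cycle counts are the standard NMOS 6502 figures.

```python
# Assumed cycle costs (NMOS 6502, no page crossings).
CPX_IMM = 2         # CPX #n
BEQ_NOT_TAKEN = 2
BEQ_TAKEN = 3
LDA_ABS_X = 4       # LDA table,X
PHA = 3
RTS = 6

def chain_cycles(case):
    """Cycles to reach the target matched by the case-th test (1-based) in a CPX/BEQ chain."""
    misses = case - 1
    return misses * (CPX_IMM + BEQ_NOT_TAKEN) + CPX_IMM + BEQ_TAKEN

def table_cycles():
    """Cycles for LDA hi,X / PHA / LDA lo,X / PHA / RTS; identical for every case."""
    return 2 * (LDA_ABS_X + PHA) + RTS

for case in (1, 3, 5, 8):
    print(case, chain_cycles(case), table_cycles())
# 1 5 20
# 3 13 20
# 5 21 20
# 8 33 20
```

Under these assumptions the chain is cheaper for the first few cases, while the table costs the same 20 cycles for every value and wins from the fifth case on. That is the consistency point made above, paid for with the space the table occupies.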
flashjazzcat Posted February 11, 2013
I hate having to use JMP... I juggle code around all kinds of ways to make sure I can almost always do relative branches, unless it's a "far" JMP. There's nearly always a dependable flag in the processor status register to force a branch.
phaeron Posted February 12, 2013
Using Bcc instead of a JMP is a code size optimization, though, not a speed one.
One trick I've been using lately to eliminate jumps is to use multiple entry points for a procedure:

    ComputeAddress = ComputeLineAddress.with_x

    .proc ComputeLineAddress
            ldx #0
    with_x:
            ...
    .endp
flashjazzcat Posted February 12, 2013
phaeron said: "Using Bcc instead of a JMP is a code size optimization, though, not a speed one."
Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three. Moreover, consider the following (since JMP is unconditional):

    lda flag
    bne destination

If "destination" is out of branching range and we have to use a JMP, we end up with:

    lda flag
    beq *+5
    jmp destination

So I think the speed benefits of carefully arranging code to use branches should be obvious.
Edited February 12, 2013 by flashjazzcat
MaPa Posted February 12, 2013
flashjazzcat said: "Sometimes, but not always. If the branch doesn't cross a page boundary, it takes two cycles instead of three."
A branch takes 2 cycles if it is not taken and execution continues with the next instruction. If it is taken, it takes 3 cycles, and if taking it crosses a page boundary, it takes 4 cycles.
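The 2/3/4 rule above is easy to turn into a small checker. The following is a sketch in Python with a made-up helper name, `branch_cycles`; it assumes the page comparison is made between the address of the instruction following the 2-byte branch and the branch target, which is how the 6502 is generally documented to behave.

```python
JMP_ABS_CYCLES = 3  # absolute JMP always costs 3 cycles, for comparison

def branch_cycles(branch_addr, target, taken):
    """Cycles for a 2-byte relative branch located at branch_addr."""
    if not taken:
        return 2
    next_instr = (branch_addr + 2) & 0xFFFF      # PC once the branch has been fetched
    crossed = (next_instr & 0xFF00) != (target & 0xFF00)
    return 4 if crossed else 3

print(branch_cycles(0x2000, 0x2010, taken=False))  # 2
print(branch_cycles(0x2000, 0x2010, taken=True))   # 3
print(branch_cycles(0x20F0, 0x2105, taken=True))   # 4
```

So a branch that is always taken never beats the 3 cycles of JMP; it only saves a byte.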
flashjazzcat Posted February 12, 2013
MaPa said: "Branch takes 2 cycles if it will not take the branch and continues with next instruction, if it takes branch it's 3 cycles long and if it will cross page boundary when branching it takes 4 cycles."
Yes - thanks for the correction - my mistake. I'll shut up now.
EDIT: the example I gave is irrelevant, too, since we were talking about unconditional branches anyway. I'm having one of those days... Avery makes a good point, though: going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed).
Edited February 12, 2013 by flashjazzcat
atari8warez Posted February 12, 2013
flashjazzcat said: "I hate having to use JMP... I juggle code around all kinds of ways to make sure I can almost always do relative branches, unless it's a "far" JMP. There's nearly always a dependable flag in the processor status register to force a branch."
Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can be said for JSRs.
atari8warez Posted February 12, 2013
flashjazzcat said: "going out of my way to use branches for unconditional jumps - though concise - is wasting a cycle probably twenty-five per cent of the time (when a page boundary is crossed)."
Really? I found 9 JMP statements in this source file alone.
Jumps are sometimes necessary evils, especially if there is no other way to move more than 127 bytes. Unless the OP needs to know the absolute fastest way to code (perhaps for some time-critical code, or for some heavy-duty number-crunching routine), the obsession with speed is moot anyway.
What I truly hate is not JMP but self-modifying code, which some people are fond of using liberally. It makes for very difficult reading, especially when not commented enough, but that's another hate-related subject.
Attachment: tprintf.asm
Edited February 12, 2013 by atari8warez
Rybags Posted February 12, 2013
JMP (adr,X) was a sorely missed omission from the original, as was an unconditional BRA.
Then again, with all those spare opcodes, conditional JMPs would have been nice too.
flashjazzcat Posted February 13, 2013
atari8warez said: "Jon, don't hate the jump; it's a useful opcode unless you care about relocatability. The same can be said for JSRs."
Excellent... I never used either before, but they both sound handy so I think I'll start...
atari8warez said: "Really? I found 9 JMP statements in this source file alone."
What's the reasoning behind digging out this old code and presenting it here (still rubbing my eyes in disbelief)? Did you actually think I had banished JMP from my code? The point I had originally made was that I like to use branches because 1) they can be as fast as JMP, 2) they capitalise on the state of a processor status register bit, and 3) they indisputably take up two thirds of the space of JMP. The TPRINTF routine (written about eighteen years ago) would actually benefit from a few JMPs converted to branches, I note, since one purpose of the library was to be as compact as possible (and looking at it now, it doesn't compare well with the PRINTF routine I wrote last year). It's surprising how many times I've had to recoup half a dozen bytes of code, and changing JMPs to branches has allowed me to do just that.
atari8warez said: "Jumps are sometimes necessary evils, especially if there is no other way to move more than 127 bytes."
No 6502 programmer could disagree with this...
atari8warez said: "Unless the OP needs to know the absolute fastest way to code (perhaps for some time-critical code, or for some heavy-duty number-crunching routine), the obsession with speed is moot anyway."
In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting.
Edited February 13, 2013 by flashjazzcat
atari8warez Posted February 13, 2013
flashjazzcat said: "What's the reasoning behind digging out this old code and presenting it here"
Could it be: "going out of my way to use branches for unconditional jumps..."? I don't know what changed in 6502 coding during the last 18 years to cause a change of heart in you; as the code shows, you seemed to have no problem with using JMPs liberally. By the way, this code is available to the public on your own website, so no digging was necessary.
flashjazzcat said: "In other words: unless you need the code to be fast, the speed is unimportant. Hmmm... Well, the topic is about cycle counting."
And let me see... what could be the reason for cycle counting... am I missing something here?
Edited February 13, 2013 by atari8warez
atari8warez Posted February 13, 2013
Rybags said: "JMP (adr,X) was a sorely missed omission from the original, as was an unconditional BRA. Then again, with all those spare opcodes, conditional JMPs would have been nice too."
Yes, that could have been really handy...
Nukey Shay Posted February 13, 2013
Jumps are always 3 cycles, though, so there is a slight advantage over using unconditional branches (where page-crossing might become an issue if you aren't keeping tabs on things during development). However, there is a way to trim this down to 2 cycles if the code is only skipping 1 byte (such as using the CMP# opcode).
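One reading of the CMP# trick: put the byte $C9 (CMP immediate) directly in front of a one-byte instruction, so that code falling through from above swallows that instruction as the CMP operand and skips it in 2 cycles, while code entering at the next byte executes it normally. The Python sketch below only illustrates the decoding; the `OPCODES` table and `trace` helper are made up for this example and cover just the three opcodes used. Note the side effect: the CMP changes the N, Z and C flags.

```python
# Tiny decode table: opcode -> (mnemonic, length in bytes, cycles).
# Standard NMOS 6502 values; only the opcodes needed below.
OPCODES = {
    0xC9: ("CMP #", 2, 2),
    0xE8: ("INX", 1, 2),
    0xEA: ("NOP", 1, 2),
}

def trace(code, start):
    """Decode linearly from offset start; return [(offset, mnemonic, cycles), ...]."""
    out, pc = [], start
    while pc < len(code):
        name, length, cycles = OPCODES[code[pc]]
        out.append((pc, name, cycles))
        pc += length
    return out

# $C9 placed in front of INX, followed by the shared continuation (a NOP here).
code = bytes([0xC9, 0xE8, 0xEA])
print(trace(code, 0))  # [(0, 'CMP #', 2), (2, 'NOP', 2)]  INX swallowed as operand
print(trace(code, 1))  # [(1, 'INX', 2), (2, 'NOP', 2)]    INX executed
```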
Irgendwer Posted February 13, 2013
atari8warez said: "What I truly hate is not JMP but self-modifying code, which some people are fond of using liberally. It makes for very difficult reading, especially when not commented enough, but that's another hate-related subject."
http://www.cc65.org/snapshot-doc/smc.html