accousticguitar, posted December 23, 2007 (edited)

I looked in a few books but didn't find a definitive answer. Suppose you have three routines: A, B, and C. Routine A JSRs to routine B. Routine B JSRs to routine C, which ends in an RTS. Will the program go ABCBA or will it go ABCA? Routine B ends in a JMP rather than an RTS (it JMPs to routine D, which ends in an RTS).

Devin, posted December 23, 2007 (edited)

When you jump to a subroutine with JSR, the processor pushes the return address on the stack. This allows subroutines to be nested.

A   JSR B
    ....
B   JSR C
    JSR D
    RTS
C   LDA #111
    RTS
D   LDA #222
    RTS

The order of execution will be A B C D. The accumulator will contain 222. When RTS is encountered, the processor will take whatever two bytes are on top of the stack, whether they are an address or data! Make sure that you *never* JMP to a subroutine instead of using JSR!

SeaGtGruff, posted December 23, 2007

Devin wrote: "Make sure that you *never* JMP to a subroutine instead of using JSR!"

Actually, you can JMP or branch to a subroutine as long as you know what you're doing. For example, you could rewrite Devin's code above as follows, which (in this example) would save 1 byte of code, 9 machine cycles, and 2 bytes on the stack:

A   JSR B
    ....
B   JSR C
    JMP D
C   LDA #111
    RTS
D   LDA #222
    RTS

Now, that's obviously a made-up example, and if it were part of a real program, then it could be optimized even more, such as by putting D between B and C, and letting B fall into D after it returns from C. But in general, anytime you have a JSR that's immediately followed (after returning from the subroutine) by an RTS (as in the original version of B above), you can change the JSR to a JMP and do away with the RTS (as in the revised version of B above).

Obviously, you don't want to try these sorts of optimizations unless you know what you're doing, because if you mess up, it could mess up the program flow big time! But there might be times when it would come in handy. For example, batari Basic has very limited stack space, so you can't have very many nested GOSUBs, or the stack will run into the variable space, and the return addresses on the stack can get mucked up. That can also happen with straight assembly programs, although a straight assembly program would hopefully have more unused bytes available at the top of page zero for the stack.
In any case, you need to be careful not to nest too many JSRs or GOSUBs, so that the stack doesn't run into the variable space. In some situations you can save a byte or two, save some machine cycles, and reduce the amount of stack space needed by JMPing or GOTOing to a subroutine instead of JSRing or GOSUBing to it.

Michael

supercat, posted December 23, 2007

accousticguitar wrote: "Suppose you have 3 routines; A, B, and C. In routine A you JSR to routine B. Routine B JSR's to routine C which has an RTS. Will the program go ABCBA or will it go ABCA? Routine B ends in a JMP rather than an RTS (it JMP's to routine D which ends in an RTS)."

Each time a JSR is executed, the address of the last byte of the JSR instruction is stored at the address pointed to by the stack pointer and the byte below that; the stack pointer is decremented by two. When an RTS is executed, the two bytes following the stack pointer are copied into the program counter, the stack pointer is incremented by two, and the program counter is advanced by one.

If nothing disturbs the stack, the execution sequence would be "ABCbDa". If the stack gets disturbed (deliberately or accidentally), other sequences could occur.

Devin wrote: "Make sure that you *never* JMP to a subroutine instead of using JSR!"

In most cases, the sequence

    jsr subroutine
    rts

may be replaced by a JMP to the subroutine. This can be a useful shortcut; it saves a byte of code and cuts six cycles off the return time. It may also save a stack level. In your particular example it probably doesn't (since B needed a stack level available to call C even if it doesn't use one to jump to D), but if D needs the bottom two bytes of stack storage the savings could be significant.
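
The push and pull order described above can be checked with a small model. This is a minimal Python sketch rather than an emulator: the `Cpu` class, its method names, and the addresses `$F000` and `$F100` are made up for illustration, and only JSR and RTS are modelled.

```python
# Minimal model of the 6502 JSR/RTS stack behaviour.
# Names and addresses are hypothetical; nothing else about the CPU is modelled.

class Cpu:
    def __init__(self):
        self.mem = bytearray(0x200)  # pages 0 and 1 only
        self.sp = 0xFF               # stack pointer, an offset into page 1
        self.pc = 0x0000

    def push(self, value):
        self.mem[0x100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x100 + self.sp]

    def jsr(self, target):
        # JSR is 3 bytes long; the pushed value is the address of its last byte.
        ret = (self.pc + 2) & 0xFFFF
        self.push(ret >> 8)      # high byte goes at the current stack pointer
        self.push(ret & 0xFF)    # low byte goes in the byte below it
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        # The pulled address is advanced by one to reach the next instruction.
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF


cpu = Cpu()
cpu.pc = 0xF000                  # pretend a JSR in routine A sits here
cpu.jsr(0xF100)                  # call routine B
sp_after_jsr = cpu.sp
stacked = (cpu.mem[0x1FF] << 8) | cpu.mem[0x1FE]
cpu.rts()
print(hex(stacked), hex(cpu.pc), hex(sp_after_jsr), hex(cpu.sp))
```

The value left on the stack is `$F002` (the last byte of the JSR), while execution resumes at `$F003`, which is why a hand-built return address has to be one less than the intended destination.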

SeaGtGruff, posted December 23, 2007

supercat wrote: "This can be useful shortcut; it saves a byte of code and cuts six cycles off the return time."

It saves 6 cycles by eliminating an RTS, but also saves another 3 cycles by replacing JSR (6 cycles) with JMP (3 cycles), for a total savings of 9 cycles.

Michael
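
The arithmetic is easy to tabulate. The sketch below uses the standard 6502 costs for absolute JSR, absolute JMP, and RTS; the table and helper names are only illustrative.

```python
# Cost of the two-instruction tail (JSR + RTS) versus a single JMP.
COST = {
    "JSR": {"cycles": 6, "bytes": 3},  # absolute addressing
    "JMP": {"cycles": 3, "bytes": 3},  # absolute addressing
    "RTS": {"cycles": 6, "bytes": 1},
}

def total(sequence, key):
    """Sum one cost column over a list of mnemonics."""
    return sum(COST[op][key] for op in sequence)

before = ["JSR", "RTS"]   # jsr subroutine / rts
after = ["JMP"]           # jmp subroutine

cycles_saved = total(before, "cycles") - total(after, "cycles")
bytes_saved = total(before, "bytes") - total(after, "bytes")
print(cycles_saved, bytes_saved)  # prints: 9 1
```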

supercat, posted December 23, 2007

SeaGtGruff wrote: "That can also happen with straight assembly programs, although a straight assembly program would hopefully have more unused bytes available at the top of page zero for the stack."

Well, sometimes. Often if a program has more than a couple bytes of RAM left, that's a sign that it needs more features.

Seriously, while programming in assembly language has less overhead than bB, the memory freed up by eliminating that overhead is often used to do things which would not be possible in bB. I don't remember how much RAM I had left in Toyshop Trouble. I wasn't ludicrously tight, but I didn't have a whole lot to spare. Strat-O-Gems was tight enough that while I limited my general stack depth to two levels (four bytes), there were some routines which could only be called directly from the outer level since they had to reuse some stack storage for other purposes.

supercat, posted December 23, 2007

SeaGtGruff wrote: "It saves 6 cycles by eliminating a RTS, but also saves another 3 cycles by replacing JSR (6 cycles) with JMP (3 cycles), for a total savings of 9 cycles."

That's true. In the Stella's Stocking menu code, the RTS savings are of somewhat greater significance because I have four pieces of code that must run, in rotation, on every scan line (all 264 of them). Each such piece of code takes 46 cycles and is available as a macro or a subroutine (add 6 cycles to start and end for the subroutine call). In a typical routine like my "fetch byte" routine, I do something like:

fetchbyte:
    PART1                   ; Should start 6 cycles after WSYNC
    ldy #0
    cmp (mdatabank),y
    lda (mdataptr),y
    sta temp
    nop
    MCODEBANK
    jsr Part2_5             ; Part2 with five cycles padding
    inc mdataptr
    bne no_pagewrap
    inc mdataptr+1
no_pagewrap:
    sta WSYNC
    jmp Part3_3             ; Part3 with three cycles padding

The "jsr" to fetchbyte should occur immediately following a WSYNC (or at an equivalent time); the routine will return at the same time as would a jsr to a normal "part".

Devin, posted December 24, 2007

Out of curiosity, where does our friend the 6502 put the stack? Page 1? How much stack space do we have on the Atari? I can't imagine much.

Thomas Jentzsch, posted December 24, 2007

Devin wrote: "Out of curiosity, where does our friend the 6502 put the stack? Page 1? How much stack space do we have on the Atari - I can't imagine much."

On the 2600 the stack effectively resides in zero page. It shares the 128 bytes of RAM with your variables. Since each JSR costs two bytes, you could theoretically (with no variables at all) nest 64 JSRs.

Tom, posted December 24, 2007

Thomas Jentzsch wrote: "The stack page resides in the zeropage. It is shared with the RAM (variables), 128 bytes in total. Since each JSR costs you two bytes, you can theoretically (no variables) do 64 nested JSRs."

Depends on how you look at it. From the 6502's point of view the stack IS in page 1. That is, when it uses the stack pointer for addressing, the address it generates always consists of the stack pointer (lower 8 bits) and 0x01 (upper 8 bits). Because the 2600's 128 bytes of RAM are mapped into both zero page and page 1 ($80-$FF and $180-$1FF reach the same bytes), the effect is as if the stack were in page zero.
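
The mirroring can be illustrated with a tiny address decoder. This is a sketch under the usual description of the 2600 memory map (13 address lines on the 6507, RAM selected when A12=0, A9=0, A7=1); the function names are invented for the example.

```python
def riot_ram_index(addr):
    """Return which of the 128 RAM bytes a CPU address selects, or None."""
    addr &= 0x1FFF                    # the 6507 only brings out 13 address lines
    if (addr & 0x1280) == 0x0080:     # assumed decode: A12=0, A9=0, A7=1
        return addr & 0x7F
    return None

def stack_address(sp):
    """Address the CPU generates for a stack access: page 1 plus the pointer."""
    return 0x0100 | (sp & 0xFF)

# A stack access with SP=$FF and a zero-page access to $FF hit the same byte.
print(riot_ram_index(stack_address(0xFF)), riot_ram_index(0x00FF))  # prints: 127 127
```

Under that assumption, a variable stored at `$FF` and a return address pushed with the stack pointer at `$FF` collide, which is the stack/variable overlap discussed earlier in the thread.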

SeaGtGruff, posted December 24, 2007

supercat wrote: "Strat-O-Gems was tight enough that while I limited my general stack depth to two levels (four bytes) there were some routines which could only be called directly from the outer level since they had to reuse some stack storage for other purposes."

I had similar problems when I did "E.T. Book Cart." In addition to RAM variables, I also had RAM-resident code that loaded the data for each line of text, since the text data could be in any bank, whereas the character shape data was in bank 0, and the text-loading routine needed to be available at all times while it switched banks as needed. So I had to make sure that both the RAM variables and the stack didn't run into the RAM-resident code!

Michael

accousticguitar (author), posted December 24, 2007

Thanks for all the replies. I can see now that I should have put it in code form like Devin did to help me visualize it.

RoutineA  JSR RoutineB
RoutineB  JSR RoutineC
          JMP RoutineD
RoutineC  RTS
RoutineD  JSR RoutineE
          RTS
RoutineE  RTS

So here would it go ABCBDEDA? First it JSRs to B and puts A on the stack, then it JSRs to C and puts B on the stack. It goes back to B and takes B off the stack, leaving A on the stack. It jumps to D, leaving A on the stack. It JSRs to E and puts D on the stack. It goes back to D, leaving A on the stack, then returns to A. Am I following this correctly?
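
That trace can be checked mechanically. The following is a minimal Python sketch, not 6502 code: each routine is reduced to its control-flow instructions, the `END` marker is invented to stop the run once control is back in A, and the stack holds "routine plus next instruction" pairs instead of real addresses.

```python
# Control-flow skeleton of the five routines; "END" is a made-up stop marker.
PROGRAM = {
    "A": [("JSR", "B"), ("END", None)],
    "B": [("JSR", "C"), ("JMP", "D")],
    "C": [("RTS", None)],
    "D": [("JSR", "E"), ("RTS", None)],
    "E": [("RTS", None)],
}

def trace(program, start="A"):
    """Return the order in which routines gain control and the peak stack depth."""
    stack = []                  # return points: (routine, next instruction index)
    visited = [start]
    max_depth = 0
    routine, index = start, 0
    while True:
        op, target = program[routine][index]
        if op == "JSR":
            stack.append((routine, index + 1))   # remember where to resume
            max_depth = max(max_depth, len(stack))
            routine, index = target, 0
        elif op == "JMP":
            routine, index = target, 0           # nothing is pushed
        elif op == "RTS":
            routine, index = stack.pop()         # resume the most recent caller
        else:                                    # END
            return "".join(visited), max_depth
        visited.append(routine)

order, depth = trace(PROGRAM)
print(order, depth)  # prints: ABCBDEDA 2
```

The peak depth of 2 (four bytes of stack) shows the effect of the JMP: D's call to E reuses the slot that B's call to C had occupied, and D's RTS returns straight to A.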

supercat, posted December 24, 2007

SeaGtGruff wrote: "In addition to RAM variables, I also had RAM-resident code that loaded the data for each line of text, since the text data could be in any bank, whereas the character shape data was in bank 0, and the text-loading routine needed to be available at all times while it switched banks as needed."

In a magical mystery cartridge, I need to access over 5K of data in every group of four scan lines (2.25K of that on every scan line), without any time for bank switching. There are some tricks available to magical kitties (hardware wizards) not available to most people. >:*3

SeaGtGruff, posted December 24, 2007

supercat wrote: "In a magical mystery cartridge, I need to access over 5K of data in every group of four scan lines--2.25K of that on every scan line, without any time for bank switching. There are some tricks available to magical kitties (hardware wizards) not available to most people. >:*3"

A hardware wizard I ain't. I'm not even much of a software wizard. And I don't know what I was saying before. Of course I located my RAM-resident routine in the lowest area of RAM, followed by the RAM variables, followed by the stack. So the only possible conflict I had was between the RAM variables and the stack.

Although trying to understand it might make my head explode, I'm curious to hear how you can access over 5K of data without doing some kind of bankswitching!

Michael

+batari, posted December 24, 2007

SeaGtGruff wrote: "I'm curious to hear how you can access over 5K of data without doing some kind of bankswitching!"

He means conventional bankswitching. That's the beauty of hardware-software codesign.

supercat, posted December 25, 2007

batari wrote: "He means conventional bankswitching. That's the beauty of hardware-software codesign"

Yup. I figure out when the bank switching will be necessary, and make it so that under certain conditions it will be tripped by operations I need to do anyway. I need to be in the alternate bank for cycles 6-52; there's an HMOVE store on cycle 5 and an AUDV1 store on cycle 52. Since this game uses a custom PLD anyway, it's possible to make those necessary stores trigger the banking. Of course, using that PLD with code that isn't designed for it could wreak havoc on things.

Glade Swope, posted December 28, 2013

Devin wrote: "*never* JMP to a subroutine instead of using JSR!"

That is actually useful!

    jmp subQ

is equivalent to

    jsr subQ
    rts

(saves 6 cycles and 2 bytes of stack)

Nukey Shay, posted January 18, 2014

9 cycles. JMP takes 3 cycles fewer than JSR, and the RTS (6 cycles) is eliminated.

Glade Swope, posted January 20, 2014

Nukey Shay wrote: "9 cycles. jmp is 3 less than jsr, and rts (6 cycles) is eliminated."

Oh I forgot... I guess two r(a)ts are better than one!

Nukey Shay, posted January 20, 2014

Plus the bonus that the stack does not contain 2 more values.

So just as you can JMP into a subroutine (in some circumstances), the reverse can also be true (jumping out of a subroutine). Just get rid of the most recent return address (via PLA PLA or TXS, for example). There are many games that fake a return address as a means of indirect jumping: they stick the destination-1 onto the stack via PHA and then use RTS to perform the jump.

BTW, sometimes it's better to use JMP (indirect) as a means of returning. You have control over which bytes of RAM hold the address to return to, and the return does not need to be where you came from (useful if your code is littered with JSR+JMP pairs). The downside is that you need a few additional cycles and a register to write the address.

The stack pointer itself can be exploited. If your code isn't using the stack, you could use the pointer as an extra register (kinda). Or you can have it point at the address that ENAbles the ball or missile sprites. Just remember to fix the stack pointer afterward if you are using the stack someplace else.
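
The "fake return address" trick depends on RTS adding one to whatever it pulls, so the pushed value has to be destination-1, high byte first. Here is a minimal Python sketch of that bookkeeping; the handler addresses and function names are hypothetical, and the list stands in for the hardware stack.

```python
def push_for_rts(stack, destination):
    """Push destination-1, high byte then low byte, as two PHAs would."""
    value = (destination - 1) & 0xFFFF
    stack.append(value >> 8)
    stack.append(value & 0xFF)

def rts(stack):
    """Pull low byte, then high byte, and advance by one like a real RTS."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF

# Hypothetical jump table of handler entry points.
HANDLERS = [0xF200, 0xF300, 0xF400]

def dispatch(index):
    """Return the address execution would reach after push, push, RTS."""
    stack = []
    push_for_rts(stack, HANDLERS[index])
    return rts(stack)

print(hex(dispatch(0)))  # prints: 0xf200
```

The first handler shows why the subtraction has to be done on the full 16-bit address: `$F200 - 1` is `$F1FF`, so both bytes of the pushed value differ from the destination.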