
Can you JSR twice?


accousticguitar


I looked in a few books, but didn't find a definitive answer. Suppose you have 3 routines: A, B, and C. In routine A you JSR to routine B. Routine B JSRs to routine C, which has an RTS. Will the program go ABCBA or will it go ABCA? Routine B ends in a JMP rather than an RTS (it JMPs to routine D, which ends in an RTS).


When you jump to a subroutine with JSR, the processor pushes the return address onto the stack. Because each JSR pushes its own return address, subroutines can call other subroutines.

 

A
JSR B
....

B
JSR C
JSR D
RTS

C
LDA #111
RTS

D
LDA #222
RTS

 

The routines are entered in the order A, B, C, D, with control returning to B after C and after D, and finally to A. The accumulator will contain 222. When an RTS is encountered, the processor pulls whatever two bytes are on top of the stack and uses them as the return address, whether they are an address or data! Make sure that you *never* JMP to a subroutine instead of using JSR!

Edited by Devin
Actually, you can JMP or branch to a subroutine as long as you know what you're doing. For example, you could rewrite the code above as follows, which (in this example) would save 1 byte of code, 9 machine cycles, and 2 bytes on the stack:

 

A
JSR B
....

B
JSR C
JMP D

C
LDA #111
RTS

D
LDA #222
RTS

Now, that's obviously a made-up example, and if it were part of a real program, then it could be optimized even more, such as by putting D between B and C, and letting B fall into D after it returns from C. But in general, anytime you have a JSR that's immediately followed (after returning from the subroutine) by a RTS (as in the original version of B above), you can change the JSR to a JMP, and do away with the RTS (as in the revised version of B above).
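The fall-through variant mentioned above could look roughly like this. It is only a sketch: it assumes nothing else has to sit between B and D, and the routine names are spelled out (RoutineA and so on) purely to avoid using a label called A.

```asm
RoutineA:
        JSR RoutineB
        ; ....

RoutineB:
        JSR RoutineC    ; C returns here, then execution just continues downward
                        ; no JMP and no RTS: B falls through into D
RoutineD:
        LDA #222
        RTS             ; returns to whoever called B

RoutineC:
        LDA #111
        RTS
```

Compared with the JMP version, this drops the 3-byte, 3-cycle JMP as well, at the price of tying the layout of B and D together.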

 

Obviously, you don't want to try these sorts of optimizations unless you know what you're doing, because if you mess up, it could mess up the program flow big time! But there might be times when it would come in handy. For example, batari Basic has a very limited stack space, so you can't have very many nested GOSUBs, or the stack will run into the variable space, and the return addresses on the stack can get mucked up. That can also happen with straight assembly programs, although a straight assembly program would hopefully have more unused bytes available at the top of page zero for the stack. In any case, you need to be careful not to have too many nested JSRs or GOSUBs, to avoid situations where the stack can run into the variable space, and-- in some situations-- you can save a byte or two, save some machine cycles, and reduce the amount of stack space needed, by JMPing or GOTOing to a subroutine instead of JSRing or GOSUBing to it.

 

Michael


I looked in a few books, but didn't find a definitive answer. Suppose you have 3 routines: A, B, and C. In routine A you JSR to routine B. Routine B JSRs to routine C, which has an RTS. Will the program go ABCBA or will it go ABCA? Routine B ends in a JMP rather than an RTS (it JMPs to routine D, which ends in an RTS).

Each time a JSR is executed, the address of the last byte of the JSR instruction is stored at the address pointed to by the stack pointer and the byte below that; the stack pointer is decremented by two. When an RTS is executed, the two bytes following the stack pointer are copied into the program counter, the stack pointer is incremented by two, and the program counter is advanced by one.

 

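If nothing disturbs the stack, the execution sequence would be "ABCbDa", where a lowercase letter marks a return into a routine that was already running: C returns into B, and D (reached by JMP) returns straight to A. If the stack gets disturbed (deliberately or accidentally), other sequences could occur.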
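To make the push/pull mechanics concrete, here is a minimal annotated sketch of the original scenario. The addresses ($F000 and up), the label names, and the starting stack pointer of $FF are all made up for illustration; only the JSR/RTS behavior is taken from the description above.

```asm
        processor 6502
        org $F000

RoutineA:                   ; assume SP = $FF on entry
        jsr RoutineB        ; $F000-$F002: pushes $F0 to $01FF, $02 to $01FE; SP = $FD
AfterB:
        jmp AfterB          ; $F003-$F005: the final RTS lands here (placeholder loop)

RoutineB:
        jsr RoutineC        ; $F006-$F008: pushes $F0 to $01FD, $08 to $01FC; SP = $FB
        jmp RoutineD        ; $F009-$F00B: pushes nothing; SP stays at $FD

RoutineC:
        rts                 ; $F00C: pulls $F008, adds 1, resumes at $F009; SP = $FD

RoutineD:
        rts                 ; $F00D: pulls $F002, adds 1, resumes at $F003; SP = $FF
```

Note that the value pushed by each JSR is the address of the JSR's last byte, which is why RTS has to add one before continuing.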

 

Make sure that you *never* JMP to a subroutine instead of using JSR!

 

In most cases, the sequence

  jsr subroutine
  rts

may be replaced by a JMP to the subroutine. This can be a useful shortcut; it saves a byte of code and cuts six cycles off the return time. It may also save a stack level. In your particular example it probably doesn't (since B needed a stack level available to call C even if it doesn't use one to jump to D), but if D needs the bottom two bytes of stack storage, the savings could be significant.

It saves 6 cycles by eliminating a RTS, but also saves another 3 cycles by replacing JSR (6 cycles) with JMP (3 cycles), for a total savings of 9 cycles. :)

 

Michael
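The cycle arithmetic in the post above can be laid out side by side. This is a small sketch using the standard 6502 sizes and timings for absolute JSR, absolute JMP, and RTS; the labels are placeholders, not names from anyone's program.

```asm
; Before: call D, then return to the caller
TailBefore:
        jsr RoutineD    ; 3 bytes, 6 cycles
        rts             ; 1 byte,  6 cycles   -> 4 bytes, 12 cycles of overhead

; After: let D's own RTS return to the caller
TailAfter:
        jmp RoutineD    ; 3 bytes, 3 cycles   -> 1 byte and 9 cycles saved

RoutineD:
        lda #222
        rts             ; this RTS is executed in both versions
```

The stack saving comes from the same place: in the second version no return address for the tail routine is pending while RoutineD runs.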


That can also happen with straight assembly programs, although a straight assembly program would hopefully have more unused bytes available at the top of page zero for the stack.

 

Well, sometimes. Often if a program has more than a couple bytes of RAM left that's a sign that it needs more features. :-D

 

Seriously, while programming in assembly language has less overhead than bB, the memory freed up by eliminating that overhead is often used to do things which would not be possible in bB.

 

I don't remember how much RAM I had left in Toyshop Trouble. I wasn't ludicrously tight, but I didn't have a whole lot to spare. Strat-O-Gems was tight enough that, while I limited my general stack depth to two levels (four bytes), there were some routines which could only be called directly from the outer level since they had to reuse some stack storage for other purposes.


It saves 6 cycles by eliminating a RTS, but also saves another 3 cycles by replacing JSR (6 cycles) with JMP (3 cycles), for a total savings of 9 cycles. :)

 

That's true. In the Stella's Stocking menu code, the RTS savings are of somewhat greater significance because I have four pieces of code that must run, in rotation, on every scan line (all 264 of them). Each such piece of code takes 46 cycles and is available as a macro or a subroutine (add 6 cycles to start and end for the subroutine call). In a typical routine like my "fetch byte" routine, I do something like:

fetchbyte:
 PART1  ; Should start 6 cycles after WSYNC
 ldy #0
 cmp (mdatabank),y
 lda (mdataptr),y
 sta temp
 nop MCODEBANK
 jsr Part2_5 ; Part2 with five cycles padding
 inc mdataptr
 bne no_pagewrap
 inc mdataptr+1
no_pagewrap:
 sta WSYNC
 jmp Part3_3; Part3 with three cycles padding

The "jsr" to fetchbyte should occur immediately following a WSYNC (or at an equivalent time); the routine will return at the same time as would a jsr to a normal "part".


Out of curiosity, where does our friend the 6502 put the stack? Page 1? How much stack space do we have on the Atari - I can't imagine much.

The stack page resides in the zeropage. It is shared with the RAM (variables), 128 bytes in total.

 

Since each JSR costs you two bytes, you can theoretically (no variables) do 64 nested JSRs.

Depends on how you look at it :)

 

From the 6502's point of view the stack IS in page 1. That is, when it uses the stack pointer for addressing, the address it generates always consists of the stack pointer (lower 8 bits) and 0x01 (upper 8 bits).

Because the 2600's 128 bytes of RAM ($80-$FF) are also mirrored into page 1 (at $0180-$01FF), the effect is as if the stack were in page zero.
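A short sketch of what that mirroring means in practice, assuming the usual 2600 memory map (RAM at $80-$FF, also visible at $0180-$01FF) and the conventional startup code that sets the stack pointer to $FF. The value $42 is arbitrary.

```asm
        ldx #$FF
        txs             ; SP = $FF, so the next push goes to $01FF
        lda #$42
        pha             ; CPU writes $42 to $01FF, the same RAM cell as $FF; SP = $FE
        lda $FF         ; reads $42 back through the zero-page address
        pla             ; pull it again to leave the stack balanced; SP = $FF
```

This is also why a deep chain of JSRs eats into the variables from the top of RAM downward.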


Seriously, while programming in assembly language has less overhead than bB, the memory freed up by eliminating that overhead is often used to do things which would not be possible in bB.

 

I don't remember how much RAM I had left in Toyshop Trouble. I wasn't ludicrously tight, but I didn't have a whole lot to spare. Strat-O-Gems was tight enough that, while I limited my general stack depth to two levels (four bytes), there were some routines which could only be called directly from the outer level since they had to reuse some stack storage for other purposes.

I had similar problems when I did "E.T. Book Cart." In addition to RAM variables, I also had RAM-resident code that loaded the data for each line of text, since the text data could be in any bank, whereas the character shape data was in bank 0, and the text-loading routine needed to be available at all times while it switched banks as needed. So I had to make sure that both the RAM variables and the stack didn't run into the RAM-resident code! :)

 

Michael


Thanks for all the replies. I can see now that I should have put it in code form like Devin did to help me to visualize it.

 

RoutineA
JSR	RoutineB

RoutineB
JSR	RoutineC
JMP	RoutineD

RoutineC
RTS

RoutineD
JSR	RoutineE
RTS

RoutineE
RTS

So here would it go ABCBDEDA? First it JSRs to B and puts A on the stack, then it JSRs to C and puts B on the stack. It goes back to B and takes B off the stack leaving A on the stack. It jumps to D leaving A on the stack. It JSRs to E and puts D on the stack. It goes back to D leaving A on the stack, then returns to A. Am I following this correctly?
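Tracing that listing by hand, under the assumption that nothing else touches the stack, does give A B C B D E D A. In the sketch below the comments show which return addresses are pending after each instruction; "ret A" is shorthand for "the place in RoutineA to resume at", and the Halt line is a hypothetical placeholder added only so RoutineA does not run on into RoutineB.

```asm
RoutineA:
        JSR RoutineB    ; stack: ret A
Halt:
        JMP Halt        ; execution ends up here last

RoutineB:
        JSR RoutineC    ; stack: ret A, ret B
        JMP RoutineD    ; JMP pushes nothing; stack: ret A

RoutineC:
        RTS             ; pulls ret B, back in RoutineB; stack: ret A

RoutineD:
        JSR RoutineE    ; stack: ret A, ret D
        RTS             ; pulls ret A, back in RoutineA; stack: empty

RoutineE:
        RTS             ; pulls ret D, back in RoutineD; stack: ret A
```

The deepest point is two return addresses (four bytes), reached inside RoutineC and again inside RoutineE.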


In addition to RAM variables, I also had RAM-resident code that loaded the data for each line of text, since the text data could be in any bank, whereas the character shape data was in bank 0, and the text-loading routine needed to be available at all times while it switched banks as needed.

 

In a magical mystery cartridge, I need to access over 5K of data in every group of four scan lines--2.25K of that on every scan line, without any time for bank switching. There are some tricks available to magical kitties (hardware wizards) not available to most people. >:*3

A hardware wizard I ain't. :( I'm not even much of a software wizard. :( And I don't know what I was saying before-- of course I located my RAM-resident routine in the lowest area of RAM, followed by the RAM variables, followed by the stack. So the only possible conflict I had was between the RAM variables and the stack.

 

Although trying to understand it might make my head explode, I'm curious to hear how you can access over 5K of data without doing some kind of bankswitching! :)

 

Michael

He means conventional bankswitching. That's the beauty of hardware-software codesign :)

Yup. I figure out when the bank switching will be necessary, and make it so that under certain conditions it will be tripped by operations I need to do anyway. I need to be in the alternate bank for cycles 6-52; there's an HMOVE store on cycle 5 and an AUDV1 store on cycle 52. Since this game uses a custom PLD anyway, it's possible to make those necessary stores trigger the banking.

 

Of course, using that PLD with code that isn't designed for it could wreak havoc on things.


  • 6 years later...
  • 3 weeks later...

Plus the bonus that the stack does not contain 2 more values.

 

So just as you can JMP into a subroutine (in some circumstances), the reverse can also be true (jumping out of a subroutine). Just get rid of the most-recent return address (via PLA PLA or TXS, for example). There are many games that fake a return address as a means of indirect jumping...sticking the destination-1 into the stack via PHA and then using RTS to perform the jump.
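Both tricks in that paragraph can be sketched in a few lines. The labels (Bail, Elsewhere, JumpViaRts, Target) are hypothetical, and the `>` and `<` operators are the DASM-style high-byte and low-byte selectors.

```asm
; 1) Leaving a subroutine without returning to the caller
Bail:
        pla                 ; discard the low byte of the return address
        pla                 ; discard the high byte (A is clobbered)
        jmp Elsewhere       ; carry on somewhere other than the call site

; 2) RTS used as an indirect jump
JumpViaRts:
        lda #>(Target-1)    ; high byte first, because RTS pulls the low byte first
        pha
        lda #<(Target-1)
        pha
        rts                 ; pulls Target-1, adds 1, continues at Target

Elsewhere:
Target:
        nop                 ; placeholder destination
```

The minus one is needed because RTS always adds one to the address it pulls, matching what JSR pushed.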

 

BTW, sometimes it's better to use JMP (indirect) as a means of returning. You have control over which bytes of RAM hold the address to return to, and the return does not need to be where you came from (useful if your code is littered with JSR+JMP pairs). The downside is that you need a few additional cycles and a register to write the address.
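A rough sketch of returning through JMP (indirect), assuming two free bytes of RAM at $80 for the vector; RetVec, Worker, Caller, and Resume are made-up names. Unlike the RTS trick, the stored address is the real destination, with no minus one.

```asm
RetVec  = $80               ; two bytes of RAM holding the resume address (assumed free)

Worker:
        ; ... do the work ...
        jmp (RetVec)        ; 5 cycles; goes wherever RetVec points

Caller:
        lda #<Resume        ; 2 cycles
        sta RetVec          ; 3 cycles
        lda #>Resume        ; 2 cycles
        sta RetVec+1        ; 3 cycles
        jmp Worker          ; no return address is pushed
Resume:
        ; Worker comes back here
```

One thing to watch on the original 6502: the vector must not straddle a page boundary (low byte at $xxFF), because JMP (indirect) fetches the high byte from the wrong address in that case.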

 

The stack pointer itself can be exploited. If your code isn't using the stack, you could use the pointer as an extra register (kinda). Or you can have it point at the address to ENAble the ball or missile sprites. Just remember to fix the stack pointer afterward if you are using the stack someplace else.
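One common way the stack pointer gets pointed at an enable register is the PHP trick, sketched here as an assumption about what is meant rather than as code from any particular game. ENABL is the TIA ball enable register at $1F; BallY and the use of Y as a scan line counter are hypothetical.

```asm
ENABL   = $1F           ; TIA ball enable register; bit 1 turns the ball on
BallY   = $80           ; hypothetical RAM variable: scan line where the ball appears

        ldx #ENABL
        txs             ; SP = $1F, so the next push is written to ENABL
        cpy BallY       ; assumes Y holds the current scan line; Z = 1 on a match
        php             ; pushes P; Z is bit 1 of P, the bit ENABL looks at
        ; ... rest of the kernel line ...
        ldx #$FF
        txs             ; put the stack pointer back before any JSR/RTS or PHA/PLA
```

Because PHP decrements the stack pointer, further PHPs would land on ENAM1 ($1E) and then ENAM0 ($1D), which is why the ball and missiles are often handled together this way.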
