Saving registers in a subroutine

SasQ · April 20, 2020

I stumbled upon some difficulty regarding a 6502 assembly subroutine that takes a couple of parameters in CPU registers (A,X,Y), but it shouldn't mess them up during its operation as seen from the caller's perspective.

On other architectures, normally in this situation I could simply save the values of those registers on the call stack at entry, then restore their original values from the stack in reverse order just before returning from the subroutine.

But on 6502, the only one of those registers that can be pushed onto the stack directly is the accumulator (A) :q So every other register needs to go through A first if I need to save it on the stack too. For example, if I needed to save the X value, I would have to do something like this:

TXA
PHA

But then it destroys whatever was there in A :q

The obvious solution is to save A first:

PHA
TXA
PHA

But then I still no longer have the original value in A, which was the parameter that the caller passed in it. And I cannot pull it back from the stack either, because it is buried under the value of X. I cannot pull X first, because it would reverse what I just did, and... you get the idea :q

Another idea that comes to my mind, is that perhaps I could store those values somewhere in RAM, e.g. on page 0. Something like this:

STX SAVEDX
STY SAVEDY
PHA
...
PLA
LDX SAVEDX
LDY SAVEDY

This solves the problem of saving registers without losing their values passed by the caller, but introduces another problematic side effect: the subroutine is no longer reentrant. If it gets called again during its own operation (which may happen indirectly – inb4 you say I could simply avoid calling it from itself), then the saved value of the register will get overwritten during the second call ;\ (Another problem might be that such a procedure couldn't be put in ROM anymore, unless SAVEDX and SAVEDY are locations in RAM instead of somewhere next to the subroutine's code.) That's why the stack is a better fit for such things: the same code can then save/restore register values at different (subsequent) locations in memory.

Are there any solutions to this problem on 6502? Or is it impossible to have callee-saved registers if they are also used for passing parameters into the subroutine?

(If A weren't used for parameter passing, just X and Y, then I guess this wouldn't be that much of a problem, because then I could destroy its value when moving them through A onto the stack, as I did before. So A seems to be the only problematic register here that cannot do the double duty.)

Hmm... Or maybe there is some way to read those values pushed upon the stack back into their original registers without pulling them from the stack?

danwinslow · April 20, 2020

If you want to be reentrant, you have to have a stack somewhere. I don't think that when writing for this machine reentrancy is usually a major problem.

As far as I know, there are no other solutions, you must either store on the stack or in some other location.

You can directly read the values off of the stack by using the SP, but then you wind up having to either adjust the SP manually or pull them anyway. Loading directly via LDA, LDX, LDY of course works.

Spancho · April 20, 2020

You can manipulate the stack pointer with TSX and TXS.

Once you have A and X on the stack decrease the S to the location of A and move S then back to location of X.

But be careful when exiting, as you don’t know what the stack value of former A will be.

drac030 · April 20, 2020

No stack pointer manipulation is needed:

TSX
INX
INX
LDA $0100,X ; loads the second byte counting from the top of the stack into the accumulator

You can of course use LDA $0102,X instead of INX/INX, but this way there is a small risk that the LDA will exceed the stack area (like when S=$FF, the effective address will be $0201, and when you use INX, it will wrap).

Wrathchild · April 20, 2020

1 minute ago, drac030 said:

TSX

but now you have to restore X as that was one of the passed values?

If stack being used:

PHA
STA KeepA
TXA
PHA
TYA
PHA
KeepA = *+1
LDA #0
...
PLA
TAY
PLA
TAX
PLA

or if not:

STA RetA
STX RetX
STY RetY
...
RetA = *+1
LDA #0
RetX = *+1
LDX #0
RetY = *+1
LDY #0

drac030 · April 20, 2020

9 minutes ago, Wrathchild said:

but now you have to restore X as that was one of the passed values?

I understand that they are stacked, because the routine does not modify the registers, as OP said. If they are stacked, you have to restore X anyways. Also, OP wanted reentrancy.

So something like this:

PHA
TXA
PHA
TYA
PHA
TSX
INX
INX
LDA $0100,X ; this is the former X value
TAY ; have it in Y
INX
LDA $0100,X ; this is the former A value
... processing ...
PLA ; restore registers
TAY
PLA
TAX
PLA
RTS

Risking the stack excess:

PHA
TXA
PHA
TYA
PHA
TSX
LDA $0102,X ; this is the former X value
TAY ; have it in Y
LDA $0103,X ; this is the former A value
... processing ...
PLA ; restore registers
TAY
PLA
TAX
PLA
RTS

65C02/65C816:

PHY
PHX
PHA
... processing ...
PLA
PLX
PLY
RTS

Edited April 20, 2020 by drac030

danwinslow · April 20, 2020

Hehe. I think OP was just trying to ask if there were any other (simple) solutions than going through the push/pop dance. This really wasn't about stack manipulation. So, I think the answer is no, there are no other simple solutions. There are many ways to save and restore the regs, but they all involve some variation of individual storing and loading. Also, OP mentioned reentrancy, and if you want to be reentrant, you have to have a stack somewhere, even if it's one you wrote yourself.

Edited April 20, 2020 by danwinslow

Rybags · April 20, 2020

"User stack" would be a workable idea to preserve reentrancy. But you need to ensure that subroutine reentrance doesn't occur while user stack processing is occurring. That would be a problem e.g. if the sub is called during an interrupt.

Another alternative could be to use the BRK instruction. It is supposedly actually a 2 byte instruction so you could use the following byte as parameter (push or pull).

The OS IRQ routine has BRK at the end of the food chain, so it is a bit CPU heavy. But by that stage the registers have been preserved, so you could just read them off the stack and put them into your user stack.
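A user stack along these lines could look roughly like the sketch below. It is only an illustration under stated assumptions: USTACK (a 256-byte page of RAM), USP and YTEMP (zero-page bytes, with USP initialised to $FF) are hypothetical names that do not appear in the thread, and the syntax is ca65-style. The PHP/SEI ... PLP bracket keeps IRQs out while the user stack is being updated, which is the reentrance window mentioned above. It does not mask NMIs, so an NMI handler would need its own arrangement.

```asm
; Hypothetical names, assumed to be defined elsewhere:
;   USTACK = 256-byte page in RAM, USP and YTEMP = zero-page bytes.

save_regs:
        PHP             ; remember the caller's interrupt-disable state
        SEI             ; keep IRQs out while USP and YTEMP are in flux
        STY YTEMP       ; Y becomes the index, so park its value first
        LDY USP
        STA USTACK,Y    ; push A
        DEY
        TXA
        STA USTACK,Y    ; push X
        DEY
        LDA YTEMP
        STA USTACK,Y    ; push Y
        DEY
        STY USP
        PLP             ; back to the caller's interrupt state
        RTS

restore_regs:
        PHP
        SEI
        LDY USP
        INY
        LDA USTACK,Y    ; former Y
        STA YTEMP
        INY
        LDA USTACK,Y    ; former X
        TAX
        INY
        LDA USTACK,Y    ; former A
        STY USP
        LDY YTEMP
        PLP
        RTS
```

After save_regs returns, X still holds the caller's value, but A and Y do not. Y is left equal to USP, so the original A can be re-read with LDA USTACK+3,Y and the original Y with LDA USTACK+1,Y.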

drac030 · April 20, 2020

One could also create a stackframe on the 6502 stack, so that the reentrancy is preserved and there is still a handful of static variables to do calculations. A will then be the working register, X will be used to address the stackframe contents, and Y is spare. Something like that, maybe:

PHA
TXA
PHA
TSX ;create stackframe
TXA
SEC
SBC #16 ;example stackframe size in bytes
TAX
TXS
TYA ;save Y
STA $0110,X ;byte 15 of the stackframe
LDA #$00 ;now do processing
STA $0101,X ;byte 0 of the stackframe
... etc ...
LDY $0110,X ;restore Y
TSX ;delete stackframe
TXA
CLC
ADC #16
TAX
TXS
PLA ;restore regs
TAX
PLA
RTS

Edited April 20, 2020 by drac030

flashjazzcat · April 20, 2020

4 hours ago, Rybags said:

Another alternative could be to use the BRK instruction. It is supposedly actually a 2 byte instruction so you could use the following byte as parameter (push or pull).

Unfortunately not too efficient (as I discovered when attempting to use BRK as a syscall) owing to the fact CPU bugs need to be accounted for.

R0ger · April 20, 2020

Why store the registers though? I usually accept the fact the subroutine will destroy the registers (or use them for arguments and return values) and I handle the problem on the calling side. Most of the time things in the registers are already stored somewhere, and the caller knows where. No need to store them again.

Interrupts are a different matter of course. There I use sta, stx, sty, and usually I have separate sets of zero page variables for DLI, VBI or IRQ.
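For the interrupt case, saving to dedicated zero-page bytes might look like the sketch below. The labels dli_a, dli_x and dli_y are hypothetical zero-page locations, assumed to be defined elsewhere and used by no other handler; the syntax is ca65-style.

```asm
; dli_a, dli_x, dli_y: hypothetical zero-page bytes reserved for this handler only.

dli_handler:
        STA dli_a       ; 3 cycles each with zero-page addressing
        STX dli_x
        STY dli_y
        ; ... handler body, free to use A, X and Y ...
        LDA dli_a
        LDX dli_x
        LDY dli_y
        RTI
```

On an NMOS 6502 this costs 9 cycles to save and 9 to restore, versus 13 and 16 for the PHA/TXA/PHA/TYA/PHA and PLA/TAY/PLA/TAX/PLA sequences. The price is that the handler is not reentrant, which is why each interrupt source gets its own set of variables.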

ivop · April 20, 2020

I'm totally with @R0ger. Only in extreme cases you might want A preserved across multiple, possibly nested, subroutine calls. And even then it's better to cater for that at the calling side instead of the subroutine itself.

Interrupts are another matter. Depending on whether you use the OS mechanism (OS shadow vectors, exit via the OS, everything is handled for you for VBIs for example, but not for DLIs, IIRC), or use the 6502 vectors directly, there's some register saving involved. Most people use self-modifying code like @Wrathchild mentioned. If your code is small enough, put it on page zero and you'll save a couple of cycles per register save/restore.

flashjazzcat · April 20, 2020

Useful for stuff like bank-switching and debugging calls (to printf routines and such); they are two things which leap to mind.

R0ger · April 20, 2020

What I wanted to say is that unless we know exactly why the registers have to be saved, and in what situation, it's hard to come up with the "correct" method. One is fast, another is short, yet another allows re-entrancy, and it might even be best to simply not do it. It all depends.

dmsc · April 21, 2020

Hi!

14 hours ago, drac030 said:

One could also create a stackframe on the 6502 stack, so that the reentrancy is preserved and there is still a handful of static variables to do calculations. A will then be the working register, X will be used to address the stackframe contents, and Y is spare. Something like that, maybe:

I think that creating a stack frame is the only "usable" way to create truly re-entrant functions, so there should be more examples like this for the 6502.

14 hours ago, drac030 said:

PHA
TXA
PHA
TSX ;create stackframe
TXA
SEC
SBC #16 ;example stackframe size in bytes
TAX
TXS

Note that you can simplify the return code a little if you use a "base pointer" in addition to the stack pointer: you can keep the "old" stack value in X:

PHA
TXA
PHA

TSX  ;create stackframe
TXA
SEC
SBC #16 ;example stackframe size in bytes

; Optional: detect stack wrap
; BCC STACK_OVERFLOW

TAX
TXS
ADC #15 ;assume that C was 1 from above (no stack wrap)
TAX     ; Restore original S value on X

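Now, you use locals at addresses <=$100,X (S points at the next free byte, so $100,X itself is the top byte of the frame) and the stacked registers and return address at addresses >$100,X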

And at return, you simply restore the stack from X:

TXS ;restore S
PLA ;restore X,A
TAX
PLA
RTS

Note that the above are similar to the x86 idiom " PUSH BP / MOV BP,SP / SUB SP, 16 " and "MOV SP,BP / POP BP / RET ". Sadly, with only two index registers, it is not that usable in the 6502.

Have Fun!
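One practical detail of the base-pointer variant is that the locals sit at operand addresses below $0100 (here $00F1,X up to $0100,X for a 16-byte frame), and many assemblers will pick zero-page,X addressing for such operands. Zero-page,X wraps inside page zero, so with X=$F0 a store to $F1,X would land at $00E1 instead of $01E1. The sketch below forces absolute,X. It assumes ca65-style syntax, where the a: prefix is the address-size override (other assemblers spell this differently), and the label names are hypothetical.

```asm
; Offsets relative to the base pointer in X (X = S before the frame was reserved).
LOCAL0  = $00F1         ; first byte of the 16-byte frame
LOCAL15 = $0100         ; last byte of the frame
SAVED_X = $0101         ; caller's X, pushed second
SAVED_A = $0102         ; caller's A, pushed first

        PHA
        TXA
        PHA
        TSX             ; create stackframe (assumes decimal mode is off)
        TXA
        SEC
        SBC #16
        TAX
        TXS
        ADC #15         ; C is still set, so this adds 16: A = old S
        TAX             ; X = base pointer

        LDA SAVED_A,X   ; operand >= $0100, assembled as absolute,X anyway
        STA a:LOCAL0,X  ; a: forces absolute,X, so the store hits page 1

        ; ... processing ...

        TXS             ; delete stackframe
        PLA             ; restore X, A
        TAX
        PLA
        RTS
```

The a: override is only needed for operands that would otherwise fit in one byte; anything at $0100 and above is assembled as absolute,X in any case.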

drac030 · April 21, 2020

Yes, one could also think of stacking the old stack pointer value and restoring it later:

PHA
TXA
PHA
TSX ;create stackframe
TXA
PHA ;push old S value
CLC ;compensate
SBC #16 ;example stackframe size in bytes
TAX
TXS
... processing ...
LDA $0111,X ;load old S value
TAX
TXS ;delete stackframe
PLA ;restore regs
TAX
PLA
RTS

I hope I calculated the offsets correctly. It is the general idea that counts, anyways.

This spares the ADC stuff. As for the offsets, in real life you use labels and the assembler calculates them, so the actual offset values for stuff stacked before and after the call are not so important.
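The point about labels could be sketched as follows, using the same layout as the code above. The names FRAME_SIZE, LOCALS, OLD_S, SAVED_X and SAVED_A are made up for illustration, and the syntax is ca65-style. With X = S, the frame starts at $0101,X, the stacked S value sits right above it, and the caller's X and A follow.

```asm
FRAME_SIZE = 16
LOCALS     = $0101                  ; LOCALS+n,X is byte n of the frame while X = S
OLD_S      = LOCALS+FRAME_SIZE      ; $0111: stacked S value, just above the frame
SAVED_X    = OLD_S+1                ; caller's X
SAVED_A    = OLD_S+2                ; caller's A

        PHA
        TXA
        PHA
        TSX                 ; create stackframe
        TXA
        PHA                 ; push old S value
        CLC                 ; borrow in: subtracts FRAME_SIZE+1, covering the byte just pushed
        SBC #FRAME_SIZE
        TAX
        TXS

        LDA SAVED_A,X       ; caller's A, read without moving S
        STA LOCALS+0,X      ; keep a working copy in the frame

        ; ... processing ...

        TSX                 ; re-sync X in case processing used it for something else
        LDA OLD_S,X         ; load old S value
        TAX
        TXS                 ; delete stackframe
        PLA                 ; restore regs
        TAX
        PLA
        RTS
```

Changing FRAME_SIZE then moves every offset along with it. As in the original snippet, Y is not preserved here and decimal mode is assumed to be off.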

Edited April 21, 2020 by drac030

danwinslow · April 21, 2020

For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time. I actually did some of it in CC65, I think, dropping into assembler. I'd have to look it up. Was pretty cool, but at first was not preemptive and the threads had to do yield. I looked into making it preemptive using an interrupt, and that worked...sort of. Pretty much crashed anytime I tried to do any OS or DOS calls, of course, and that wasn't surprising, but I could do simple things like increment a counter. I did get some screen IO working but I had to devote 1 thread to being the only one doing it.

Edited April 21, 2020 by danwinslow

flashjazzcat · April 21, 2020

8 minutes ago, danwinslow said:

For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time.

The GOS I was working on a few years back (and will be again) uses a combination of stack frames and stack caching. IIRC, I split the stack into four frames and have a cache of sixteen stack frames (one for each possible process) from which stacks not already in a slot when their process gets CPU time are retrieved. This really cuts down on the scheduling overhead while permitting largish frames and no harsh limits on the number of processes.

danwinslow · April 21, 2020

Hi Jon. Yep, that was almost exactly my method, although I did not extend it to 16, although I think I meant to. Copying 64 bytes back in each context switch is a little slow, but having it check if it has residency already is a nice touch.

Edited April 21, 2020 by danwinslow

ivop · April 21, 2020

8 hours ago, danwinslow said:

Hi Jon. Yep, that was almost exactly my method, although I did not extend it to 16, although I think I meant to. Copying 64 bytes back in each context switch is a little slow, but having it check if it has residency already is a nice touch.

You don't have to copy the full 64 bytes each time. Only the part of the stack that is active.

SasQ · April 22, 2020

Whoa! I didn't expect such quick and numerous replies on a forum about vintage computers! You guys are much better than the rulebook nazis from Stack Overflow

On 4/20/2020 at 1:34 PM, danwinslow said:

I don't think that when writing for this machine reentrancy is usually a major problem.

True, maybe it isn't. But it's definitely a Good Thing To Have™. Therefore my usual approach is to try having it from the start, and only loosen this requirement where I need, or where keeping it would be too troublesome/inefficient. It's one of the requirements if you want your procedure to be a "black box", not affecting the user with some weird side effects.

On 4/20/2020 at 1:34 PM, danwinslow said:

you must either store on the stack or in some other location.

Well, if the values in registers have to be retained, they surely must be stored somewhere, obviously :q

On 4/20/2020 at 1:34 PM, danwinslow said:

You can directly read the values off of the stack by using the SP

Yup, I know, that's what I was asking about in the last line of my original post. I know that there is such a technique, since I use it sometimes on x86, I just don't have much prior experience in how exactly to do the same thing on 6502, where the register manipulation seems to be somewhat limited.

On 4/20/2020 at 1:42 PM, Spancho said:

You can manipulate the stack pointer with TSX and TXS.

I suppose that I have to push A and X first, right? Because otherwise, TXS would damage at least X, and I can't push X directly, it has to go through A first.

So my guess is that I should start by first pushing the registers (first A, then X and Y through it), and only then I can copy S to X to manipulate it and peek through the stack with it to get the original values of the registers back?

On 4/20/2020 at 1:42 PM, Spancho said:

Once you have A and X on the stack decrease the S to the location of A and move S then back to location of X.

Hmm... When I already have a copy of the original S in X for restoring it later, can I then use PLA for reloading the registers instead of LDA/LDX/LDY? (PLA would do that with one-byte instruction instead of three-byte). I mean, is it safe to do that? Because, since it moves the original S down the stack, I suspect there might be a risk of some interrupt overwriting the values above the stack pointer, am I right? If that's the case, I suppose it would be better to leave S where it is, and only peek the values below it through address arithmetics?

On 4/20/2020 at 1:46 PM, drac030 said:

TSX
INX
INX
LDA $0100,X ; loads the second byte counting from the top of the stack into the accumulator

BINGO! That seems to be the thing I was looking for. Directly addressing the data on the stack with respect to the original stack pointer (because I suppose it's better to leave it where it is, if interrupts might interfere).

On 4/20/2020 at 1:46 PM, drac030 said:

there is a small risk that the LDA will exceed the stack area (like when S=$FF, the effective address will be $0201, and when you use INX, it will wrap).

Thank you for mentioning that. Definitely something worth keeping in mind.

On 4/20/2020 at 1:50 PM, Wrathchild said:

but now you have to restore X as that was one of the passed values?

Not a problem, as long as the previous value of X the user has passed is already sleeping nice & tight on the stack where I can reach it anytime with @drac030's technique

On 4/20/2020 at 1:50 PM, Wrathchild said:

PHA
STA KeepA

Why do you save A both in memory and on the stack?

On 4/20/2020 at 1:58 PM, drac030 said:

65C02/65C816:

PHY
PHX
PHA
... processing ...
PLA
PLX
PLY
RTS

Hahah so they realized their fault eventually and fixed it? :J This way it saves not only the registers, but also a lot of headache (and instruction bytes).

On 4/20/2020 at 7:48 PM, R0ger said:

Why store the registers though ? I usually accept the fact the subroutine will destroy the registers (or use them for arguments and return values) and I handle the problem on calling side.

Well, one of the reasons might be that when a subroutine can mess up the values in X and Y, it cannot be used inside of a loop that already uses X and Y for the loop counters. You then have to remember to save these registers yourself before every subroutine call and restore them later, and moreover, you have to repeat the saving/restoring code at every call site. If you save/restore them inside the subroutine instead, the save/restore code is localized inside that subroutine and doesn't have to be repeated all over the program. With the "caller saves" approach, the caller cannot assume that the subroutine won't mess up the registers, so the caller always has to save them before the call and restore them afterwards if he wants to keep their values, even if the subroutine actually doesn't mess up those registers. In that case, all that work is wasted. On the other hand, if it's the job of the subroutine to save the registers it actually uses, it can avoid that overhead if it doesn't mess with the registers (and the subroutine knows best which registers it needs to mess with, so it should be its responsibility to save their values in that case).

Not to mention that there is a conceptual benefit from treating a subroutine as a "black box", so that the caller doesn't have to know what it does with the registers inside, and isn't hit by any weird side effects.
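The loop-counter argument can be made concrete with a small sketch. All names here are hypothetical: print_char stands for a routine that preserves X and Y, raw_print for one that clobbers them, message is a zero-terminated string and saved_x a spare byte of RAM. The syntax is ca65-style.

```asm
; Callee-saved: the loop body carries no bookkeeping.
        LDX #0
next1:  LDA message,X
        BEQ done1           ; zero byte terminates the string
        JSR print_char      ; assumed to preserve X and Y
        INX
        BNE next1
done1:

; Caller-saved: the same save/restore pair is repeated at every call site.
        LDX #0
next2:  LDA message,X
        BEQ done2
        STX saved_x         ; protect the loop counter
        JSR raw_print       ; assumed to clobber X and Y
        LDX saved_x
        INX
        BNE next2
done2:
```

The second loop costs two extra instructions per call site, and the saved_x temporary brings back the non-reentrancy issue unless it is replaced by a push through A.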

14 hours ago, danwinslow said:

For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time.

Wow, threading on a machine with no hardware support for it? That's definitely something interesting that I'd like to try one day

Edited April 22, 2020 by SasQ
Typos

flashjazzcat · April 22, 2020

2 hours ago, SasQ said:

Wow, threading on a machine with no hardware support for it?

https://atari8.co.uk/gui/

Wrathchild · April 22, 2020

4 hours ago, SasQ said:

On 4/20/2020 at 12:50 PM, Wrathchild said:

PHA
STA KeepA

Why do you save A both in memory and on the stack?

Don't think of it as in memory: it's self-modifying code. Once A, X & Y are saved to the stack, A restores itself from the value 'poked' there.

SasQ · April 22, 2020

9 minutes ago, Wrathchild said:

it's self-modifying code

What? How? I don't get it...

OK people, look what I found:

http://www.6502.org/tutorials/register_preservation.html

If only I had found it earlier, I wouldn't have to ask at all... -_-

1 hour ago, flashjazzcat said:

https://atari8.co.uk/gui/

That's too cool!

Interestingly, they seem to have a link to an article about multitasking at 6502.org too:

http://wilsonminesco.com/multitask/

Wrathchild · April 22, 2020

11 minutes ago, SasQ said:

What? How? I don't get it...

If the function was in Page 6:

$600 PHA
$601 STA KeepA
$604 TXA
$605 PHA
$606 TYA
$607 PHA
KeepA = *+1
$608 LDA #0

The instruction at $601 is writing the value in A (for example, $27) to the address $609

So when the instruction at $608 is executed it will perform LDA #$27 as the zero was overwritten.
