SasQ Posted April 20, 2020

I stumbled upon a difficulty with a 6502 assembly subroutine that takes a couple of parameters in CPU registers (A, X, Y) but shouldn't mess them up, as seen from the caller's perspective. On other architectures I could simply save those registers on the call stack at entry, then restore their original values in reverse order just before returning from the subroutine. But on the 6502, the only register that can be pushed onto the stack is the accumulator (A) :q So every other register has to go through A first if I need to save it on the stack too. For example, to save the X value I would have to do something like this:

    TXA
    PHA

But that destroys whatever was in A :q The obvious solution is to save A first:

    PHA
    TXA
    PHA

But then I still no longer have the original value in A, which was the parameter the caller passed in it. And I cannot pull it back from the stack either, because it is buried under the value of X. I cannot pull X first, because that would undo what I just did, and... you get the idea :q

Another idea that comes to mind is to store those values somewhere in RAM, e.g. on page 0. Something like this:

    STX SAVEDX
    STY SAVEDY
    PHA
    ...
    PLA
    LDX SAVEDX
    LDY SAVEDY

This saves the registers without losing the values passed by the caller, but it introduces another problematic side effect: the subroutine is no longer reentrant. If it gets called again during its own operation (which may happen indirectly, inb4 you say I could simply avoid calling it from itself), the saved register value will get overwritten by the second call ;\ (Another problem might be that such a procedure couldn't be put in ROM anymore, unless SAVEDX and SAVEDY are locations in RAM instead of somewhere next to the subroutine's code.) That's why the stack is the better place for such things: the same code can then save and restore register values at different (subsequent) locations in memory.

Are there any solutions to this problem on the 6502? Or is it impossible to have callee-saved registers if they are also used for passing parameters into the subroutine? (If A weren't used for parameter passing, just X and Y, then I guess this wouldn't be much of a problem, because I could destroy its value when moving them through A onto the stack, as I did before. So A seems to be the only problematic register here that cannot do double duty.)

Hmm... Or maybe there is some way to read the values pushed onto the stack back into their original registers without pulling them from the stack?
danwinslow Posted April 20, 2020

If you want to be reentrant, you have to have a stack somewhere. I don't think reentrancy is usually a major problem when writing for this machine. As far as I know, there are no other solutions: you must store either on the stack or in some other location. You can read the values directly off the stack by using the SP, but then you wind up having to either adjust the SP manually or pull them anyway. Loading directly via lda, ldx, ldy of course works.
+Spancho Posted April 20, 2020

You can manipulate the stack pointer with TSX and TXS. Once you have A and X on the stack, move S to the location of A, then move S back to the location of X. But be careful when exiting, as you don't know what the stack value of the former A will be.
drac030 Posted April 20, 2020

No stack pointer manipulation is needed:

    TSX
    INX
    INX
    LDA $0100,X

This loads the second byte, counting from the top of the stack, into the accumulator. You can of course use LDA $0102,X instead of INX/INX, but this way there is a small risk that the LDA will exceed the stack area (for example, when S=$FF the effective address will be $0201, whereas with INX it will wrap).
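To make the addressing concrete, here is a minimal sketch of peeking at the two most recently pushed bytes without moving S. It relies on the 6502 convention that S points at the next free slot in page 1, so the last pushed byte lives at $0101+S. The pushed values $11 and $22 are arbitrary example data, not anything from the thread.

```asm
        LDA #$11
        PHA             ; stack now holds: $11
        LDA #$22
        PHA             ; stack now holds: $11, $22 (top)
        TSX             ; X = S, so $0100+X is the next free slot
        LDA $0101,X     ; A = $22, the top of the stack
        LDA $0102,X     ; A = $11, the second byte from the top
        PLA             ; pull $22 to discard it
        PLA             ; pull $11, S is back where it started
```

Because S never moves while peeking, an interrupt arriving in the middle cannot overwrite the two bytes being read.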
Wrathchild Posted April 20, 2020

1 minute ago, drac030 said:
"TSX"

but now you have to restore X as that was one of the passed values?

If the stack is being used:

    PHA
    STA KeepA
    TXA
    PHA
    TYA
    PHA
    KeepA = *+1
    LDA #0
    ...
    PLA
    TAY
    PLA
    TAX
    PLA

or if not:

    STA RetA
    STX RetX
    STY RetY
    ...
    RetA = *+1
    LDA #0
    RetX = *+1
    LDX #0
    RetY = *+1
    LDY #0
drac030 Posted April 20, 2020

9 minutes ago, Wrathchild said:
"but now you have to restore X as that was one of the passed values?"

I understand that they are stacked, because the routine does not modify the registers, as OP said. If they are stacked, you have to restore X anyway. Also, OP wanted reentrancy. So something like this:

    PHA
    TXA
    PHA
    TYA
    PHA
    TSX
    INX
    INX
    LDA $0100,X     ; this is the former X value
    TAY             ; have it in Y
    INX
    LDA $0100,X     ; this is the former A value
    ... processing ...
    PLA             ; restore registers
    TAY
    PLA
    TAX
    PLA
    RTS

At the risk of exceeding the stack area:

    PHA
    TXA
    PHA
    TYA
    PHA
    TSX
    LDA $0102,X     ; this is the former X value
    TAY             ; have it in Y
    LDA $0103,X     ; this is the former A value
    ... processing ...
    PLA             ; restore registers
    TAY
    PLA
    TAX
    PLA
    RTS

65C02/65C816:

    PHY
    PHX
    PHA
    ... processing ...
    PLA
    PLX
    PLY
    RTS
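If the routine needs all three parameters back in their own registers after saving them, one possible arrangement is sketched below. It is an illustration built on the peek technique above, not code from any poster: the label SUB is hypothetical, and the sketch assumes the stack has not wrapped, so the same page-crossing caveat as for LDA $0102,X applies. Note that TXA and TYA leave X and Y intact, so only A and (after TSX) X have to be reloaded.

```asm
SUB     PHA             ; save A
        TXA
        PHA             ; save X
        TYA
        PHA             ; save Y (X and Y still hold the caller's values)
        TSX             ; X = S, so the X parameter is lost for the moment
        LDA $0103,X     ; A = saved A
        PHA             ; park it; X still indexes the same three saved bytes
        LDA $0102,X     ; A = saved X
        TAX             ; X parameter is back
        PLA             ; A parameter is back, S is where it was after the three pushes
        ; ... body: A, X and Y all hold the values passed by the caller ...
        PLA
        TAY             ; restore Y
        PLA
        TAX             ; restore X
        PLA             ; restore A
        RTS
```

Everything lives on the hardware stack, so the routine stays reentrant. The processor flags are not preserved; that would take an extra PHP/PLP pair.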
danwinslow Posted April 20, 2020

Hehe. I think OP was just trying to ask if there were any other (simple) solutions besides going through the push/pop dance. This really wasn't about stack manipulation. So I think the answer is no, there are no other simple solutions. There are many ways to save and restore the regs, but they all involve some variation of individual storing and loading. Also, OP mentioned reentrancy, and if you want to be reentrant, you have to have a stack somewhere, even if it's one you wrote yourself.
Rybags Posted April 20, 2020

"User stack" would be a workable idea to preserve reentrancy. But you need to ensure that subroutine reentrance doesn't occur while user stack processing is occurring. That would be a problem e.g. if the sub is called during an interrupt.

Another alternative could be to use the BRK instruction. It is supposedly actually a 2 byte instruction, so you could use the following byte as a parameter (push or pull). The OS IRQ routine has BRK at the end of the food chain, so it is a bit CPU heavy. But by that stage the registers have been preserved, so you could just read them off the stack and put them into your user stack.
drac030 Posted April 20, 2020

One could also create a stackframe on the 6502 stack, so that the reentrancy is preserved and there is still a handful of static variables to do calculations. A will then be the working register, X will be used to address the stackframe contents, and Y is spare. Something like that, maybe:

    PHA
    TXA
    PHA
    TSX             ;create stackframe
    TXA
    SEC
    SBC #16         ;example stackframe size in bytes
    TAX
    TXS
    TYA             ;save Y
    STA $0110,X     ;byte 15 of the stackframe
    LDA #$00        ;now do processing
    STA $0101,X     ;byte 0 of the stackframe
    ... etc ...
    ... etc ...
    LDY $0110,X     ;restore Y
    TSX             ;delete stackframe
    TXA
    CLC
    ADC #16
    TAX
    TXS
    PLA             ;restore regs
    TAX
    PLA
    RTS
flashjazzcat Posted April 20, 2020

4 hours ago, Rybags said:
"Another alternative could be to use the BRK instruction. It is supposedly actually a 2 byte instruction so you could use the following byte as parameter (push or pull)."

Unfortunately not too efficient (as I discovered when attempting to use BRK as a syscall) owing to the fact CPU bugs need to be accounted for.
R0ger Posted April 20, 2020

Why store the registers, though? I usually accept the fact that the subroutine will destroy the registers (or use them for arguments and return values) and I handle the problem on the calling side. Most of the time the things in the registers are already stored somewhere, and the caller knows where. No need to store them again.

Interrupts are a different matter, of course. There I use sta, stx, sty, and usually I have separate sets of zero page variables for DLI, VBI or IRQ.
ivop Posted April 20, 2020

I'm totally with @R0ger. Only in extreme cases might you want A preserved across multiple, possibly nested, subroutine calls. And even then it's better to cater for that on the calling side instead of in the subroutine itself.

Interrupts are another matter. Depending on whether you use the OS mechanism (OS shadow vectors, exit via the OS, everything is handled for you for VBIs for example, but not for DLIs, IIRC) or use the 6502 vectors directly, there's some register saving involved. Most people use self-modifying code like @Wrathchild mentioned. If your code is small enough, put it on page zero and you'll save a couple of cycles per register save/restore.
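As an illustration of the self-modifying save/restore mentioned above, an interrupt handler could be laid out roughly as follows. This is only a sketch under stated assumptions: the labels DLI, DLI_A, DLI_X and DLI_Y are hypothetical, the code must run from RAM, and the handler must never be interrupted by another invocation of itself, because the patched operands are a single shared copy.

```asm
DLI     STA DLI_A+1     ; patch the operand byte of the LDA # below
        STX DLI_X+1     ; patch the operand byte of the LDX # below
        STY DLI_Y+1     ; patch the operand byte of the LDY # below
        ; ... handler body, free to use A, X and Y ...
DLI_A   LDA #$00        ; operand was overwritten at entry with the old A
DLI_X   LDX #$00        ; operand was overwritten at entry with the old X
DLI_Y   LDY #$00        ; operand was overwritten at entry with the old Y
        RTI
```

The restore uses immediate loads instead of absolute ones, which is where the saving over plain STA/LDA to ordinary memory comes from.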
flashjazzcat Posted April 20, 2020

Useful for stuff like bank-switching and debugging calls (to printf routines and such); they are two things which leap to mind.
R0ger Posted April 20, 2020

What I wanted to say is that unless we know exactly why the registers have to be saved, and in what situation, it's hard to come up with the "correct" method. One is fast, another is short, yet another allows re-entrancy, and it might even be best to simply not do it. It all depends.
dmsc Posted April 21, 2020

Hi!

14 hours ago, drac030 said:
"One could also create a stackframe on the 6502 stack, so that the reentrancy is preserved and there is still a handful of static variables to do calculations. A will then be the working register, X will be used to address the stackframe contents, and Y is spare. Something like that, maybe:"

I think that creating a stack frame is the only "usable" way to create truly re-entrant functions, so there should be more examples like this for the 6502.

14 hours ago, drac030 said:

    PHA
    TXA
    PHA
    TSX             ;create stackframe
    TXA
    SEC
    SBC #16         ;example stackframe size in bytes
    TAX
    TXS

Note that you can simplify the return code a little if you use a "base-pointer" in addition to the stack-pointer, so that you keep the "old" stack value in X:

    PHA
    TXA
    PHA
    TSX             ;create stackframe
    TXA
    SEC
    SBC #16         ;example stackframe size in bytes
    ; Optional: detect stack wrap
    ; BCC STACK_OVERFLOW
    TAX
    TXS
    ADC #15         ;assume that C was 1 from above (no stack wrap)
    TAX             ; Restore original S value on X

Now you use locals at addresses <$100,X and parameters at addresses >=$100,X. And at return, you simply restore the stack from X:

    TXS             ;restore S
    PLA             ;restore X,A
    TAX
    PLA
    RTS

Note that the above is similar to the x86 idioms "PUSH BP / MOV BP,SP / SUB SP,16" and "MOV SP,BP / POP BP / RET". Sadly, with only two index registers, it is not that usable on the 6502.

Have Fun!
drac030 Posted April 21, 2020

Yes, one could also think of stacking the old stack pointer value and restoring it later:

    PHA
    TXA
    PHA
    TSX             ;create stackframe
    TXA
    PHA             ;push old S value
    CLC             ;compensate
    SBC #16         ;example stackframe size in bytes
    TAX
    TXS
    ... processing ...
    LDA $0111,X     ;load old S value
    TAX
    TXS             ;delete stackframe
    PLA             ;restore regs
    TAX
    PLA
    RTS

I hope I calculated the offsets correctly. It is the general idea that counts, anyway. This spares the ADC stuff. As for the offsets, in real life you use labels and the offsets are calculated by the assembler, so it is not so important what the actual offset values are for stuff stacked before the call and after the call.
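To illustrate the remark about letting the assembler calculate the offsets, here is a sketch of the first stack-frame variant with symbolic names. The names FRAME, LOCAL0, FR_Y and SUB are hypothetical, and the `NAME = expression` syntax is an assumption about the assembler in use; most 6502 assemblers accept something close to it.

```asm
FRAME   = 16            ; frame size in bytes (example value)
LOCAL0  = $0101         ; byte 0 of the frame, addressed as LOCAL0,X
FR_Y    = $0100+FRAME   ; last byte of the frame, used here to hold Y

SUB     PHA             ; save A
        TXA
        PHA             ; save X
        TSX
        TXA
        SEC
        SBC #FRAME      ; reserve FRAME bytes below the saved registers
        TAX
        TXS             ; S moved down, X is the frame base
        TYA
        STA FR_Y,X      ; save Y inside the frame
        LDA #$00
        STA LOCAL0,X    ; example use of a local variable
        ; ... processing, with X kept as the frame base ...
        LDY FR_Y,X      ; restore Y
        TSX             ; recompute the base in case X was changed
        TXA
        CLC
        ADC #FRAME      ; release the frame
        TAX
        TXS
        PLA
        TAX             ; restore X
        PLA             ; restore A
        RTS
```

Changing the frame size then only means editing FRAME; the $0110 literal from the hand-computed version never appears.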
danwinslow Posted April 21, 2020

For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time. I actually did some of it in CC65, I think, dropping into assembler. I'd have to look it up. It was pretty cool, but at first it was not preemptive and the threads had to yield. I looked into making it preemptive using an interrupt, and that worked... sort of. It pretty much crashed any time I tried to do any OS or DOS calls, of course, and that wasn't surprising, but I could do simple things like increment a counter. I did get some screen IO working, but I had to devote 1 thread to being the only one doing it.
flashjazzcat Posted April 21, 2020

8 minutes ago, danwinslow said:
"For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time."

The GOS I was working on a few years back (and will be again) uses a combination of stack frames and stack caching. IIRC, I split the stack into four frames and have a cache of sixteen stack frames (one for each possible process), from which stacks not already in a slot when their process gets CPU time are retrieved. This really cuts down on the scheduling overhead while permitting largish frames and no harsh limits on the number of processes.
danwinslow Posted April 21, 2020

Hi Jon. Yep, that was almost exactly my method, although I did not extend it to 16 (I think I meant to). Copying 64 bytes back in each context switch is a little slow, but having it check if it has residency already is a nice touch.
ivop Posted April 21, 2020

8 hours ago, danwinslow said:
"Hi Jon. Yep, that was almost exactly my method, although I did not extend it to 16, although I think I meant to. Copying 64 bytes back in each context switch is a little slow, but having it check if it has residency already is a nice touch."

You don't have to copy the full 64 bytes each time. Only the part of the stack that is active.
SasQ Posted April 22, 2020 (Author)

Whoa! I didn't expect such quick and numerous replies on a forum about vintage computers! You guys are much better than the rulebook nazis from Stack Overflow.

On 4/20/2020 at 1:34 PM, danwinslow said:
"I don't think that when writing for this machine reentrancy is usually a major problem."

True, maybe it isn't. But it's definitely a Good Thing To Have™. Therefore my usual approach is to try having it from the start, and only loosen this requirement where I need to, or where keeping it would be too troublesome/inefficient. It's one of the requirements if you want your procedure to be a "black box" that doesn't affect the user with some weird side effects.

On 4/20/2020 at 1:34 PM, danwinslow said:
"you must either store on the stack or in some other location."

Well, if the values in registers have to be retained, they surely must be stored somewhere, obviously :q

On 4/20/2020 at 1:34 PM, danwinslow said:
"You can directly read the values off of the stack by using the SP"

Yup, I know, that's what I was asking about in the last line of my original post. I know that there is such a technique, since I use it sometimes on x86. I just don't have much prior experience in how exactly to do the same thing on the 6502, where register manipulation seems to be somewhat limited.

On 4/20/2020 at 1:42 PM, Spancho said:
"You can manipulate the stack pointer with TSX and TXS."

I suppose that I have to push A and X first, right? Because otherwise TSX would clobber at least X, and I can't push X directly, it has to go through A first. So my guess is that I should start by pushing the registers (first A, then X and Y through it), and only then can I copy S to X to manipulate it and peek into the stack with it to get the original values of the registers back?

On 4/20/2020 at 1:42 PM, Spancho said:
"Once you have A and X on the stack decrease the S to the location of A and move S then back to location of X."

Hmm... When I already have a copy of the original S in X for restoring it later, can I then use PLA for reloading the registers instead of LDA/LDX/LDY? (PLA would do that with a one-byte instruction instead of a three-byte one.) I mean, is it safe to do that? Since PLA moves S past the saved values, I suspect there is a risk of some interrupt overwriting them while they sit outside the area protected by the stack pointer, am I right? If that's the case, I suppose it would be better to leave S where it is, and only peek at the values through address arithmetic?

On 4/20/2020 at 1:46 PM, drac030 said:

    TSX
    INX
    INX
    LDA $0100,X

"- loads the second byte counting from the top of the stack to the accumulator."

BINGO! That seems to be the thing I was looking for. Directly addressing the data on the stack with respect to the original stack pointer (because I suppose it's better to leave it where it is, if interrupts might interfere).

On 4/20/2020 at 1:46 PM, drac030 said:
"there is a small risk that the LDA will exceed the stack area (like when S=$FF, the effective address will be $0201, and when you use INX, it will wrap)."

Thank you for mentioning that. Definitely something worth keeping in mind.

On 4/20/2020 at 1:50 PM, Wrathchild said:
"but now you have to restore X as that was one of the passed values?"

Not a problem, as long as the previous value of X the user has passed is already sleeping nice & tight on the stack, where I can reach it anytime with @drac030's technique.

On 4/20/2020 at 1:50 PM, Wrathchild said:

    PHA
    STA KeepA

Why do you save A both in memory and on the stack?

On 4/20/2020 at 1:58 PM, drac030 said:
65C02/65C816:

    PHY
    PHX
    PHA
    ... processing ...
    PLA
    PLX
    PLY
    RTS

Hahah, so they realized their fault eventually and fixed it? :J This way it saves not only the registers, but also a lot of headache (and instruction bytes).

On 4/20/2020 at 7:48 PM, R0ger said:
"Why store the registers though ? I usually accept the fact the subroutine will destroy the registers (or use them for arguments and return values) and I handle the problem on calling side."

Well, one of the reasons might be that when a subroutine can mess up the values in X and Y, it cannot be used inside a loop that already uses X and Y for the loop counters. You then have to remember to save these registers yourself before every subroutine call and restore them later, and moreover, you have to repeat the saving/restoring code at every call site. If you save/restore them inside the subroutine instead, the save/restore code is localized inside that subroutine and doesn't have to be repeated all over the program.

With the "caller saves" approach, the caller cannot assume that the subroutine won't mess up the registers, so the caller always has to save them before the call and restore them afterwards if he wants to keep their values, even if the subroutine doesn't actually touch those registers. In that case, all that work is wasted. On the other hand, if it's the job of the subroutine to save the registers it actually uses, it can avoid that overhead when it doesn't touch them (and the subroutine knows best which registers it needs to mess with, so it should be its responsibility to save their values in that case). Not to mention that there is a conceptual benefit from treating a subroutine as a "black box", so that the caller doesn't have to know what it does with the registers inside, and doesn't see any weird side effects.

14 hours ago, danwinslow said:
"For one small threading experiment I divided the stack page into 4 separate stacks, and implemented a 'stack frame push/pop' scheme so that I had 4 separate 'pseudo-threads' running at the same time."

Wow, threading on a machine with no hardware support for it? That's definitely something interesting that I'd like to try one day.
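The loop argument can be sketched side by side. Both fragments below are illustrations only: TABLE, PUTBYTE and PUTBYTE2 are hypothetical names, with PUTBYTE assumed to preserve X itself (callee saves) and PUTBYTE2 assumed free to clobber it (caller saves).

```asm
; Callee saves: the loop counter in X survives the call untouched
        LDX #9
LOOP    LDA TABLE,X
        JSR PUTBYTE     ; assumed to preserve X and Y
        DEX
        BPL LOOP        ; runs for X = 9 down to 0

; Caller saves: every call site has to protect X itself
        LDX #9
LOOP2   TXA
        PHA             ; caller saves X before the call
        LDA TABLE,X
        JSR PUTBYTE2    ; assumed to clobber X
        PLA
        TAX             ; caller restores X after the call
        DEX
        BPL LOOP2       ; runs for X = 9 down to 0
```

The second form costs four extra instructions at this call site and at every other one, which is the repetition the post describes; the first form pays that cost once, inside the subroutine.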
flashjazzcat Posted April 22, 2020

2 hours ago, SasQ said:
"Wow, threading on a machine with no hardware support for it?"

https://atari8.co.uk/gui/
Wrathchild Posted April 22, 2020

4 hours ago, SasQ said:
"Why do you save A both in memory and on the stack?" (about PHA / STA KeepA)

Don't think of it as being in memory: it's self-modifying code. Once A, X & Y are saved to the stack, A restores itself from the value 'poked' there.
SasQ Posted April 22, 2020 (Author)

9 minutes ago, Wrathchild said:
"it's self-modifying code"

What? How? I don't get it...

OK people, look what I found: http://www.6502.org/tutorials/register_preservation.html
If only I had found it earlier, I wouldn't have had to ask at all... -_-

1 hour ago, flashjazzcat said:
"https://atari8.co.uk/gui/"

That's too cool! Interestingly, they seem to have a link to an article about multitasking at 6502.org too: http://wilsonminesco.com/multitask/
Wrathchild Posted April 22, 2020

11 minutes ago, SasQ said:
"What? How? I don't get it..."

If the function was in Page 6:

    $600  PHA
    $601  STA KeepA
    $604  TXA
    $605  PHA
    $606  TYA
    $607  PHA
          KeepA = *+1
    $608  LDA #0

The instruction at $601 writes the value in A (for example, $27) to the address $609. So when the instruction at $608 is executed, it will perform LDA #$27, as the zero was overwritten.