bugbiter Posted September 3, 2015

I added a subroutine call to my code and it suddenly crashed. When I step through the code, I see something I have never encountered before: the JSR call places a wrong return address onto the stack. How is this possible?

I have never paid much attention to the call stack window in Altirra before. If I understand correctly, it always shows the current address of the PC in the top line, the last push below that, and so on. Here is what I see:

State before the JSR call: PC at $4249, last return address $214F. The JSR should now push the return address onto the stack, which is the address of the instruction after the JSR call, $424C.

After taking the JSR to $424D, the stack now has $4BFC as the current return address. And where did the $214F go? Of course there is no code at $4BFC, hence the crash after the next RTS.

WTF? How can this happen? What did I do wrong?
bugbiter Posted September 3, 2015

I just noticed that pushing registers onto the stack doesn't show up in the call stack. Since I do this often in subroutines to save and restore X and Y, I guess there must be a forgotten PHA or PLA somewhere. Could that be the reason? How can I monitor my stack better?

(Edited September 3, 2015 by bugbiter)
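To make the suspected mechanism concrete, here is a minimal Python model of the 6502 stack (a sketch, not emulator code). It follows standard 6502 behavior: JSR pushes the address of its own last byte, high byte first; RTS pulls the low byte, then the high byte, and resumes one byte later. The JSR at $4249 and the starting S of $F7 are the values reported in this thread; the subroutine body (a PHA whose matching PLA is skipped) and the value $FB in A are assumptions chosen for illustration. With exactly one stray byte left on the stack, RTS takes that byte as the low byte and the real return address's low byte ($4B) as the high byte, so with A = $FB the model "returns" to $4BFC.

```python
class Stack6502:
    """Byte-level model of the 6502 stack in page 1 ($0100-$01FF)."""

    def __init__(self, s: int) -> None:
        self.page1 = bytearray(256)  # page1[i] models address $0100 + i
        self.s = s                   # S points at the next free slot

    def push(self, value: int) -> None:
        self.page1[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF  # S wraps within page 1

    def pull(self) -> int:
        self.s = (self.s + 1) & 0xFF
        return self.page1[self.s]


def jsr(stack: Stack6502, pc: int) -> None:
    """JSR at `pc` pushes pc + 2 (its own last byte), MSB first."""
    ret = (pc + 2) & 0xFFFF
    stack.push(ret >> 8)
    stack.push(ret & 0xFF)


def rts(stack: Stack6502) -> int:
    """RTS pulls LSB, then MSB, and resumes one byte past that word."""
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def call_subroutine(a: int, forget_pla: bool) -> int:
    """Hypothetical subroutine: saves A with PHA, should restore it with PLA."""
    stack = Stack6502(s=0xF7)  # S before the JSR, as reported in the thread
    jsr(stack, 0x4249)         # JSR at $4249; next instruction is $424C
    stack.push(a)              # PHA inside the subroutine
    if not forget_pla:
        stack.pull()           # PLA: stack is balanced again
    return rts(stack)


print(hex(call_subroutine(0xFB, forget_pla=False)))  # 0x424c
print(hex(call_subroutine(0xFB, forget_pla=True)))   # 0x4bfc
```

With the PLA in place, RTS resumes at $424C, the instruction after the JSR; without it, the return address is built from the stray byte plus half of the real return address.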
phaeron Posted September 4, 2015

Altirra's call stack display works by a hybrid static/dynamic analysis engine that does virtual execution on the upcoming code. This is necessary because there is no metadata or standard for describing 6502 calling conventions. Most of the time this works, but some things can confuse it, such as temporarily storing the stack pointer in memory or using Bcc instructions as unconditional branches. It will also never show return addresses that are unreachable by direct control flow, such as a call chain that ends in an infinite loop within a subroutine.

As you've discovered, the problem is that the subroutine unbalances the stack and executes RTS at the wrong offset. In the first case, the call stack engine skips over the JSR and then processes the RTS, following up the call chain. In the second case, the engine starts from within the subroutine and has to virtually execute the subroutine to find its RTS instruction. The stack becomes unbalanced along this path, which then causes the engine to find a different return address one byte off in the stack.

The quickest way to debug this is to use the History window of the debugger. This view nests calls by execution history and will quickly show which call unbalanced the stack. The nesting is usually enough to detect this; if not, enabling special registers in the right-click menu to show the S register will reveal the problem.

For interrupt routines, the debugger's Verifier feature has a special check for whether registers are saved and restored properly in the interrupt routine, and will force a break into the debugger if a register is corrupted by an IRQ or NMI handler. This will generally catch both forgotten and mismatched pushes/pops.
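What the History window lets you check by eye can also be expressed as simple bookkeeping: count the net bytes each subroutine pushes and flag any RTS reached with a non-zero count. The sketch below assumes a hypothetical trace format of (address, mnemonic) pairs; it is not an Altirra export format. It deliberately ignores TXS/TSX tricks, interrupts, and other ways of manipulating S. The trace and its addresses are invented to mirror the bug in this thread: X and Y are saved via TXA/PHA and TYA/PHA (the NMOS 6502 has no PHX/PHY), but the PLA that should restore X is skipped.

```python
from typing import List, Tuple

# Hypothetical trace format: (address, mnemonic) pairs in execution order.
Trace = List[Tuple[int, str]]

PUSH_OPS = {"PHA", "PHP"}  # each pushes one byte
PULL_OPS = {"PLA", "PLP"}  # each pulls one byte


def find_unbalanced_returns(trace: Trace) -> List[int]:
    """Return the addresses of RTS instructions reached with stray stack bytes."""
    frames: List[int] = []  # net bytes pushed inside each active subroutine
    bad: List[int] = []
    for addr, op in trace:
        if op == "JSR":
            frames.append(0)          # new subroutine frame, balanced so far
        elif op in PUSH_OPS and frames:
            frames[-1] += 1
        elif op in PULL_OPS and frames:
            frames[-1] -= 1
        elif op == "RTS" and frames:
            if frames.pop() != 0:     # RTS would pull the wrong bytes
                bad.append(addr)
    return bad


# Invented trace: save X and Y, restore only Y, then return.
trace: Trace = [
    (0x4249, "JSR"),
    (0x424D, "TXA"), (0x424E, "PHA"),  # save X
    (0x424F, "TYA"), (0x4250, "PHA"),  # save Y
    (0x4251, "PLA"), (0x4252, "TAY"),  # restore Y
    # the PLA/TAX pair that should restore X is missing
    (0x4253, "RTS"),
]

print([hex(a) for a in find_unbalanced_returns(trace)])  # ['0x4253']
```

Adding the missing PLA/TAX before the RTS brings the frame count back to zero, and the function then reports nothing.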
bugbiter Posted September 4, 2015

Thank you, phaeron. On Monday I will try the History window with the S register enabled. That's the stack pointer, right?
bugbiter Posted September 6, 2015

Well, looking at the history doesn't make me any wiser... The stack pointer goes from F9 to F7 with JSR PRINTXYFAST and finally to F5 with the wrong return address mentioned above. Where the JMP PRINT_FAST2 should put "424C" (the next instruction after the JSR call), it puts "4B21". There are no register pushes or pulls involved up to that point that could corrupt the stack.

Anyway, it's one thing to mess up return addresses that are already on the stack, but here the CPU obviously puts a totally wrong address on the stack all on its own! What is this?? Am I going mad? Is this only Altirra, or does it also happen on the real thing? I'll have to check.

(Edited September 6, 2015 by bugbiter)
bugbiter Posted September 6, 2015

I once again captured the moment the wrong address appears, but this time I'm watching page 1 directly in a memory window. Two bytes have changed on the stack: $1F6: $4B and $1F7: $42. That could mean address $424B, which is one byte before the correct return address (I'm not quite sure what exactly gets put on the stack, or whether there is a one-byte offset for return addresses). But the stack window shows $4BFC with the pointer on F6. I don't see where that number could come from... Avery, please help!
phaeron Posted September 6, 2015

JSR instructions push the address of the last byte of the JSR onto the stack, MSB first. The S register points to the next available byte on the stack (not the first valid byte as on other CPUs), so $101+S holds the LSB and $102+S holds the MSB. RTS resumes execution one byte after the word popped off the stack. Therefore, the words you see in memory will be one less than the return address.

Remember that the call stack view in the debugger tells you which return addresses might be used, since it is predicting future execution. Seeing a bad return address in the call stack view means that there is a potential code path leading to a broken return; only after actually executing the rest of the function can you verify this through the registers and history. You haven't executed the rest of the function and don't have the disassembly scrolled down far enough, so I can't tell you what's wrong.

The best approach from the last screenshot you posted is to do Debug > Step Out (Shift+F11) or use the 'gr' command, which will cause the debugger to execute until the CPU does a return. After that, you can use the History window to check whether the pushes and pops are balanced in the code that actually got executed.
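The byte layout described above can be checked with a few lines of Python. This is only a sketch: `page1` is a plain 256-byte array standing in for $0100-$01FF, and the JSR address $4249 and the starting S of $F7 are the values reported earlier in this thread. The result matches what the memory window showed: $4B at $1F6 and $42 at $1F7, i.e. the word $424B, with RTS resuming at $424C.

```python
def jsr_push(page1: bytearray, s: int, pc: int) -> int:
    """Model a JSR at `pc`: push pc + 2 MSB first; return the new S."""
    ret = (pc + 2) & 0xFFFF       # address of the JSR's last byte
    page1[s] = ret >> 8           # MSB goes to $0100 + S
    s = (s - 1) & 0xFF
    page1[s] = ret & 0xFF         # LSB goes to the next lower address
    return (s - 1) & 0xFF         # S again points at the next free slot


def peek_return_word(page1: bytearray, s: int) -> int:
    """Read the word RTS would pull: LSB at $101+S, MSB at $102+S."""
    return page1[(s + 1) & 0xFF] | (page1[(s + 2) & 0xFF] << 8)


page1 = bytearray(256)            # page1[i] models address $0100 + i
s = jsr_push(page1, 0xF7, 0x4249)
word = peek_return_word(page1, s)
print(hex(s), hex(page1[0xF6]), hex(page1[0xF7]))  # 0xf5 0x4b 0x42
print(hex(word), hex(word + 1))                    # 0x424b 0x424c
```

The `+ 1` in the last line is the step RTS performs itself, which is why the stored word is always one less than the address execution continues at.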
bugbiter Posted September 7, 2015

Oh! So the call stack window does not necessarily show the actual state of the hardware stack, but shows that something WILL go wrong! So I'll go through the rest of the subroutine and check what's happening. Thanks!
bugbiter Posted September 7, 2015

Yep! It was an ordinary fault: one PLA that was accidentally skipped. Sorry the call stack got me so confused. But now I know :-)