
Get Lost!


Propane13


  • 1 year later...
  • 2 years later...

@Propane13 In case the bug is keeping you from working on this gem - heads up that a few AA 7800 coders collaborated on a "Debugging On Ancient Game Platforms" article at the wiki. I personally implemented some of the ideas in the Common Causes of Catastrophic Problems section inside 7800basic, and they've helped a fair bit with these sorts of random errors.

 

Specifically, I implemented a canary check, and IRQ detect routines. Both sorts of problems will now trigger a dump of stack values to the screen and then run the KIL opcode to stop everything cold. This lets you know which address triggered the problem, which is often a smoking gun.

 

Not harassing you to return - one shouldn't feel pressured to do one's hobby. Just letting you know there might be a more positive path forward with this project.


16 hours ago, RevEng said:

@Propane13 In case the bug is keeping you from working on this gem - heads up that a few AA 7800 coders collaborated on a "Debugging On Ancient Game Platforms" article at the wiki. [...]

Hey, that's neat!

 

So, I think the only issue I have is that the game was written in assembly.  Are the fixes limited to 7800basic, or is some of the code ported to the emulators as well?

 

If memory serves, the problem never manifested itself on emulators, but specifically real hardware.  You could just sit on a screen for a long period of time, and it would freeze.  It pretty much was a game-killing bug.  A few answers were thrown my direction; specifically some memory writes that happened to ROM space if you took a ladder to a new screen.  I believe I fixed that, but the freeze still happened for some folks on real hardware.  My suspicion was that there was something funky happening with MARIA because some of my code seems to draw for longer than a screen.  My last hope years ago was to grab a logic analyzer and force a dump, but maybe with all the new knowledge, there's a way we could figure it out.

 

There was one other reason I stopped back in the day-- I wanted to have some special hardware made, and that wasn't possible.  It wasn't much, but if memory serves, it was 3 things:

 - support for the ROM to be large AND have memory.

 - support for a sound chip (was going to be pokey, but I'm now flexible on that)

 - the ability to "save" 256 bytes of volatile memory.  I made a "save / load" screen feature, but couldn't use it.

 

With hardware advances, I wonder if the above requirements are now possible.

 

 


2 hours ago, Propane13 said:

 

There was one other reason I stopped back in the day-- I wanted to have some special hardware made, and that wasn't possible. [...]

Large ROM + RAM + Pokey I believe is doable on CPUWIZ's 7800 cart boards. The save key can do saving but I am not sure how much memory is available on it.

 

Mitch


8 minutes ago, Mitch said:

 

Large ROM + RAM + Pokey I believe is doable on CPUWIZ's 7800 cart boards. The save key can do saving but I am not sure how much memory is available on it.

 

Mitch

 

Using NVRAM you could save as well. Savekey or NVRAM would work perfectly. Hell, do both :P

 


9 hours ago, Propane13 said:

So, I think the only issue I have is that the game was written in assembly.  Are the fixes limited to 7800basic, or is some of the code ported to the emulators as well?

The idea is language neutral. I implemented them in 7800basic, but there's no reason why they couldn't be implemented in pure assembly. And I actually dump to the screen, without using an emulator, so real-hardware reports are possible. If someone runs into the issue, they can just send a screenshot.

 

The first level of this sort of debugging is just to add certain checks, and kill the game with a certain screen color if the check is violated, without coding up a fancy stack dump. If someone reports a crash with one of these colors, you'll at least know what kind of problem you have, and that writing the stack dump is warranted.

  • Check 1 - execution of data. (likely caused by bad bankswitch or bad jump table)  Add the following code to your IRQ vector (instead of the usual RTI). It will tell you if a "0" byte of data was executed. (0=BRK, which is the only source of IRQ on the 7800)
IRQ
  lda #$1A ; YELLOW
  sta BACKGRND
  .byte $02 ; KIL opcode. Stop the 6502.
  • Check 2 - look for a "canary" value being overwritten by the stack (likely caused by unmatched jsr/rts, unmatched push/pop, or similar). As your last ZP memory location, define a bit of memory called "canary". Then add this check to your NMI (assuming you have one) or in your main loop.
 lda canary
 beq CanaryIsAlive
 lda #$45 ; RED
 sta BACKGRND
 .byte $02 ; KIL opcode
CanaryIsAlive

Between those two kinds of checks, you can catch a good many weird-crashing glitches. Not all of them - it's possible a runaway memory-update loop takes everything out before they're triggered, or an unending loop condition happens - but they cover most of them.
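Not part of RevEng's 7800basic code, but the canary idea can be illustrated with a quick Python model of a 6502-style descending stack. The addresses match the memory map discussed later in this thread; the code itself is just a sketch.

```python
# Toy model (Python, not 7800basic) of the canary technique: a sentinel
# byte sits just below the stack region, and the main loop checks it.

RAM = bytearray(0x200)      # $0000-$01FF
CANARY = 0x13F              # last byte below the $0140-$01FF stack area
STACK_TOP = 0x1FF

RAM[CANARY] = 0x00          # canary "alive" value is zero, as in the asm check
sp = STACK_TOP

def jsr_push(ret_addr):
    """Push a 2-byte return address, as JSR does (high byte first)."""
    global sp
    RAM[sp] = (ret_addr >> 8) & 0xFF
    RAM[sp - 1] = ret_addr & 0xFF
    sp -= 2

def canary_is_dead():
    # Mirrors "lda canary / beq CanaryIsAlive": nonzero means overwritten.
    return RAM[CANARY] != 0

# Leak stack frames (unmatched JSRs) until the canary is clobbered.
leaked = 0
while not canary_is_dead():
    jsr_push(0xF123)        # arbitrary nonzero return address
    leaked += 1

print(leaked)               # 97: the 192-byte region holds 96 frames,
                            # and the 97th write lands on the canary
```

The point of the model: the canary fires on the very first write past the bottom of the stack region, well before the stack pointer wraps around and the corruption becomes untraceable.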

 

Adding stack dumps to the above is a matter of popping values off the stack, and putting out the values to the screen, same as score values or whatnot.

 

To test the routines, just add a "BRK" opcode to your game somewhere, or overwrite the canary.


I see you partially adopted my IRQ debugging technique.  Awesome.

 

I have some parts coming to expand my dev7800 to an extreme, with full real-time bus monitoring and all chip-select lines etc. visible on a small LCD touch screen, and maybe an 8-panel NeoPixel status bar.  I may even tap the port registers, to see what the hardware state is.


  • 2 weeks later...

ZeroPage Homebrew is playing Get Lost! on tomorrow's (Fri Apr 2, 2021) stream LIVE on Twitch at 12PM PT | 3PM ET | 7PM GMT! Hope everyone can watch!

PLEASE NOTE SPECIAL START TIME OF 12PM (NOON) PT DUE TO HOLIDAY WEEKEND

 

 

Games:

[attached image: 20210402-LetsPlay.jpg]


Thanks everyone for watching, and @ZeroPage Homebrew for giving "Get Lost" a try.

 

An interesting thing happened during the game-- the fabled crash actually occurred on hardware, and we have video of its occurrence:

https://www.twitch.tv/videos/972845538?t=156m50s

In my opinion, this is a good thing; maybe it'll help generate some ideas on debugging.

 

Would anyone be able to maybe slow this down frame-by-frame so I can try to figure out what happened?

 

 

 


I don't know how to go frame-by-frame in a Twitch stream, but pausing it right before the crash - it happens right after a death, and a fraction of a second after respawning. Sorry for the darkness - it grays out the screen a bit when pausing.

 

[attached screenshot: Screen Shot 2021-04-02 at 8.45.02 PM]


1 hour ago, Propane13 said:

Thanks everyone for watching, and @ZeroPage Homebrew for giving "Get Lost" a try. [...]

You're welcome! The game is amazing and I definitely want to play it to completion on the show. Hopefully this crash will help you narrow down the issue!

 

Here's the gameplay slowed down to 5% of the original speed during the time of the crash and also a screenshot of the frame before it crashed.

 

- James

 

 

[attached screenshot: frame before the crash]

 


Reproducible in emulation. It appears to require a high number of deaths.

 

The trace file of the crash is attached, John. It doesn't have the whole session in it (it took about a gigabyte's worth of tracing before it crashed) but the file I attached does have a very substantial portion of play leading into the issue. Start at the end of the file, with all of the BRK+RTI opcodes, and backtrack until you see where it went off the rails.

 

getlost-trace.txt.gz

 

(spoiler, it looks like an indirect address write took out a subroutine's return address on the stack)

 


Well, this is interesting.  It has me theorizing.

 

The stack trace dies just after the AnimateBat() routine.  And, based on the videos / screenshots above, the bat is likely JUST being drawn on-screen when it happens. If you check out the slow-mo footage, the bat isn't there, so my guess is that when it's rendering, it kills the screen.  So, we have 2 pieces of data that seem to line up.

 

Let's add the third piece of data-- it seems to take a high number of deaths for the freeze to occur.  I wonder if, each time there's a death, I'm doing a JMP and skipping an RTS somewhere, adding a few bytes onto my stack.  Do that enough times, and we could end up with a stack overflow situation.

 

It's been a while since I coded in assembly and looked at the raw source, so forgive me for being rusty.  Are the debuggers today smart enough to see if that theory is true?  i.e. is my stack slowly growing after deaths (or other events), causing an overflow?

 

My memory is very fuzzy, but I don't remember a lot of discussion about how the 7800's stack worked in the docs.

My Color demo from 2001 leveraged somebody else's code, and has the following code block:

 

    ldx    #$ff        ; set stack pointer to $01ff
    txs

I have the same code in Get Lost.

 

Question: What sets the stack's major byte to $01 (as in $01FF)?  Stack information doesn't seem to exist in the "7800 software guide". I may have just made assumptions that whoever had originated that "set stack pointer to $01ff" was truthful.  It likely is, but I'd like confirmation.

 

Looking at the memory map, if the stack lives at $0140-$01FF (counting backwards), we would certainly get into trouble if the stack pointer moved all the way down past $0140, as those locations are shadows of the zero-page TIA/MARIA ports.  If the stack got that far and we hit an RTS, we're not in RAM; we'd pull weird values and hence can't RTS properly.

 

More questions:

When a JSR occurs in 6502, how many bytes are put on the stack?  I can't remember if it's 2 (Major byte / minor byte) or 3 (major byte / minor byte / status byte).  Again, a bit fuzzy on this stuff.  If it's "2", then there's some fascinating math here:

 

Between $0140 and $01FF there are (256*.75) = 192 bytes.

If we save 2 bytes every time we JSR, that means after 192/2 = 96 JSR's, our stack has overflowed.

 

James died for the 93rd time on his run.  Assuming that each death causes a JMP that skips an RTS (and 2 bytes pushed per JSR), he would have moved the pointer 93 times.  There were probably a handful of layers of JSR's already in progress, and that last call to animate the bat likely took it over the edge.
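That arithmetic can be sanity-checked with a quick sketch. The stack region and the 2-bytes-per-JSR figure are from the thread; the baseline call depth of 3 is a pure guess for illustration.

```python
# Back-of-the-envelope check of the stack-overflow theory.

STACK_TOP = 0x1FF
STACK_BOTTOM = 0x140                      # below this are TIA/MARIA shadows

stack_bytes = STACK_TOP - STACK_BOTTOM + 1
print(stack_bytes)                        # 192 usable bytes

leak_per_death = 2                        # one skipped RTS = one 2-byte frame
max_leaked_frames = stack_bytes // leak_per_death
print(max_leaked_frames)                  # 96 leaked frames fill the stack

baseline_depth = 3                        # hypothetical JSRs already in flight
print(max_leaked_frames - baseline_depth) # 93 -- matching the 93rd death
```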

 

So, that's the theory I'm going with.  It should be easy to confirm by just playing/dying approximately 93 times and waiting for the bat to arrive and lock the screen (if someone has a few minutes), or by checking the stack pointer in the debugger (if there's a way to do that).

 

As far as debugging tools go, this sounds like another Canary situation we should check-- the stack accessing an unauthorized area.  Maybe that already exists; if not, I would suggest it since, well, it looks like what happened to me.

 

One last footnote-- my assumption has always been that the problem had to do with me writing too much data to the screen (more than one frame) so that MARIA was fighting with the main processor. I had always suspected a timing issue (DMA interrupting a "critical section" of code), which would have been a grueling thing to fix.  It's kind of interesting that MARIA's DMA interruptions are looking to be more robust than I thought all these years and may not be the problem at all.

 

BTW, I need to take a moment and specifically thank everyone for the help. The screenshots, video, and stack trace have given me more ammo to attack this problem than I've had in years.  If this is really the issue (and my gut feeling is that it is), I could not have solved this without your help.  You may have just removed my biggest impediment, and that means a lot to me.


Adding on, I just reproduced what James saw by dying over 80 times.  I saw the same "red diagonal lines" screen and everything.

So, I think that's mounting evidence of the stack being a problem.

 

Unfortunately, I am somewhat convinced that there may be 2 bugs total-- the stack problem may just be one.

Why? Because I got the dreaded "freeze" while trying to reproduce the stack bug.  It happened 46 seconds in.

 

Curiously enough, the bat's not on the screen-- I am guessing AnimateBat() froze the screen.

 

[attached photo: 20210403_103041.jpg]

 

But, this COULD be a red herring since I haven't fired up my Cuttle Cart 2 in years and maybe there's corruption on the SD card.  Or, maybe my build there is an older one that had a known issue.  I'm not sure.  Anyway, attaching that screenshot so I can dive into it later.  The Stack problem seems a good one to solve first.


 

 

On 4/3/2021 at 10:17 AM, Propane13 said:

It's been a while since I coded in assembly and looked at the raw source, so forgive me for being rusty.  Are the debuggers today smart enough to see if that theory is true?  i.e. is my stack slowly growing after deaths (or other events), causing an overflow?

Yep, dying 8 times advances the stack 16 bytes, in the debugger. There's an unmatched jsr/rts, or similar. The dying is the key, and that bat routine is likely just the victim that happened to try and use the stack.
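That observation (8 deaths advancing the stack 16 bytes) matches a 2-byte leak per death exactly. A toy sketch; the function name is made up, and only the 2-bytes-per-leak figure comes from how JSR works on the 6502:

```python
# A JMP that skips an RTS leaves the 2-byte return address pushed by the
# matching JSR on the stack forever, so the pointer creeps down per death.

START_SP = 0xFF                 # S register after "ldx #$ff / txs"

def sp_after_deaths(deaths, leak_per_death=2):
    """Stack pointer after `deaths` leaked frames, ignoring live calls."""
    return START_SP - deaths * leak_per_death

consumed = START_SP - sp_after_deaths(8)
print(consumed)                 # 16 bytes after 8 deaths, as in the debugger
```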

 

 

On 4/3/2021 at 10:17 AM, Propane13 said:

Question: What sets the stack's major byte to $01 (as in $01FF)?  Stack information doesn't seem to exist in the "7800 software guide". I may have just made assumptions that whoever had originated that "set stack pointer to $01ff" was truthful.  It likely is, but I'd like confirmation.

That hi byte is hard-wired into the 6502. Setting the stack pointer to $ff is correct start-up, and will place it at $1ff. jsr puts 2 bytes on the stack.

 

 

On 4/3/2021 at 10:17 AM, Propane13 said:

As far as debugging tools go, this sounds like another Canary situation we should check-- the stack accessing an unauthorized area.  Maybe that already exists; if not, I would suggest it since, well, it looks like what happened to me.

For sure, that's the whole idea behind the canary. It's a protector to alert you that your memory is about to be overwritten by unrestrained stack growth. Even if you're not using that memory, the canary will also protect against stack wrap-around.

 

 

 


Maybe the freeze is a different situation. It would be good if you set up the canary and BRK protection that I described earlier, so we at least know more than the fact that the game froze. Bonus points if you can output the stack values to the screen in that situation.


On 4/1/2021 at 2:55 PM, ZeroPage Homebrew said:

ZeroPage Homebrew is playing Get Lost! on tomorrow's (Fri Apr 2, 2021) stream LIVE on Twitch at 12PM PT | 3PM ET | 7PM GMT! [...]

Hey guys, thanks for showing Into The Void!  Yeah, it's still just some rough demos, but I'm planning to finish it in 2021.
-Steve
 


So... a small update:

 

1) I can compile again

2) I have a build where I fixed the stack issue and died over 100 times with no crash in emulation.

 

Before I post a new binary, I want to check a few things out.  I'd like to do RevEng's test on my CC2, and I have to figure out what build I have on there anyway (I *think* it's an older build).  I may also have added some garbage test code between the last released build and where I'm at now, so I need to do a bit of reading to make sure what I have isn't broken out of the gate.

 

I'm interested in getting Canary into the mix for debugging too, but it looks like I'll have to shuffle some bytes around to do that.  Get Lost is still just 32K, and it appears that some sections are pretty dense in code.  With a little luck, we'll see.


A quick post for those interested.

 

I've added the stack fix and addition of Canary checks.  Again, it's still likely there's a freeze here.

 

The canary check doesn't do a stack dump at this point; just the yellow/red screen:

20210407_getlost.bin

20210407_getlost.a78

Gotta crawl before we walk. :)

 

I'll play with my Cuttle Cart a bit on the weekend and see if it's still good to go or if the problem seems to lie therein.


10 minutes ago, Propane13 said:

A quick post for those interested.

 

I've added the stack fix and addition of Canary checks.  Again, it's still likely there's a freeze here.

This is incredible news that the game is being worked on again. I'm so excited at the prospect of playing it all the way to the end one day!

 

- James

