Building DASM on Linux


40 replies to this topic

#1 1FF8 OFFLINE  

1FF8

    Space Invader

  • 30 posts

Posted Fri Jan 27, 2012 3:42 PM

Hi,

Possibly stupid question here, so the "for newbies" subforum seemed the logical place to post. I am half-tempted to hack around a bit on the 2600 as part of my second childhood, and as I'm completely new to this I had to start by getting a working toolchain. So I decided to see if I could get dasm 2.20.11 built on 64-bit Linux (Ubuntu 11.10). It doesn't build as shipped, but some quality time with Google identified the problem as dasm not being 64-bit clean and after getting the flags right it works (at least to the extent of assembling a .bin file for the first tutorial kernel that is bitwise identical to the one the author posted). The tweaks to the Makefile were extremely simple, but a gotcha or two made them more tedious to work out than they're worth (basically that the link step is hard-coded in the makefile, so it doesn't obey the LDFLAGS variable), and it would be nice to help the next person out while I remember what the issues were. Is the project maintained at all, so that the Makefile in the upstream could be tweaked? I'd be willing to upgrade the makefile a bit (such as using LDFLAGS :-), maybe getting "install" to work). I came across some discussion that made me wonder if it might not get much love--if not, is there a nice central place to put a post/thread/whatever about getting the toolchain set up on Linux and lower the bar just a bit?

I may also try ca65 at some point, and IIRC it didn't build when I gave it a quick try, so maybe I should ask the same question about it. But I'm not sure yet, because I haven't put any effort into it. DASM seemed like a logical starting point because most of the code out there that I've seen targets it, not ca65.

F8

#2 Tjoppen OFFLINE  

Tjoppen

    Chopper Commander

  • 199 posts

Posted Fri Jan 27, 2012 5:41 PM

Add -m32 to CFLAGS and possibly LDFLAGS IIRC.
The DASM source is a mess. It violates C99 among other things, hence issues like these.

#3 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Fri Jan 27, 2012 6:05 PM

Add -m32 to CFLAGS and possibly LDFLAGS IIRC.


Indeed, that *should* be sufficient, though it might be friendly to some people to point out that it is src/Makefile that needs patching, not the top-level one (yes, it's obvious, but only if you're used to unixy recursive makefiles). But the makefile rule for the executable is this:

dasm: $(OBJS)
	$(CC) $(OBJS) -o dasm

which means they hard-coded the rule instead of doing it "the make way". It *should* have $(LDFLAGS) there too, though for testing I just stuck -m32 in because I wasn't sure where the problem lay yet (it's been a while since I hacked makefiles, and when possible I just use CMake instead).

Anyway, I'd hazard a guess that -m32 won't cause trouble on 32-bit platforms (I didn't check though), in which case at a minimum the makefile should just get patched upstream. Better would be to clean it up so that it uses the built-in rules, instead of half-baked stuff that doesn't work the unix way, and is somewhat portable. So my question is whether there is any hope of getting a patch accepted anywhere, or at least having a sticky here. I can't be the first person to hit this, and I'd hate for someone to get turned away just because they aren't familiar with unixy tools like make, so I'll help out if it can be done.

The reason I'm asking here is I saw some hints that the code isn't actively developed, and I thought maybe I should find out what the real situation is before I shoot off my mouth too much.

The DASM source is a mess. It violates C99 among other things, hence issues like these.


Fair enough, but patching the Makefile is a very simple thing that won't touch the rest of the code, no matter how crufty and bitrotted.

F8

#4 mgiuca OFFLINE  

mgiuca

    Space Invader

  • 10 posts

Posted Fri Jan 27, 2012 10:13 PM

Fantastic coincidence -- I was just going to build DASM on Linux 64 myself just now, and found this at the top of the forum. Indeed, the compiled binary was rubbish, and adding -m32 fixes it.

The last real commit in SVN was 2008, and the last mailing list post was 2009, so I assume this project is no longer maintained. It looks like Andrew Davie (who wrote the 2600 Programming For Newbies guide here) was maintaining it, but his site (http://www.atari2600.org/DASM/) is down at the moment as well.

#5 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sat Jan 28, 2012 1:13 AM

The last real commit in SVN was 2008, and the last mailing list post was 2009, so I assume this project is no longer maintained. It looks like Andrew Davie (who wrote the 2600 Programming For Newbies guide here) was maintaining it, but his site (http://www.atari2600.org/DASM/) is down at the moment as well.


Yeah. I suppose someone should contact the listed maintainer through the SourceForge web page and see if he regards it as abandoned or not.

The fact that a former dasm maintainer advocates abandoning it is sort of compelling. OTOH, is there a good disassembler that outputs ca65 code? I gather that distella is the community's standard disassembler, and it appears to output dasm code. It appears that this community doesn't really have a complete healthy toolchain. :-(

F8

#6 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • 1,583 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sat Jan 28, 2012 5:23 AM


The last real commit in SVN was 2008, and the last mailing list post was 2009, so I assume this project is no longer maintained. It looks like Andrew Davie (who wrote the 2600 Programming For Newbies guide here) was maintaining it, but his site (http://www.atari2600.org/DASM/) is down at the moment as well.


Yeah. I suppose someone should contact the listed maintainer through the SourceForge web page and see if he regards it as abandoned or not.

The fact that a former dasm maintainer advocates abandoning it is sort of compelling. OTOH, is there a good disassembler that outputs ca65 code? I gather that distella is the community's standard disassembler, and it appears to output dasm code. It appears that this community doesn't really have a complete healthy toolchain. :-(

F8



I consider DASM abandoned.
I would prefer people try to use CA65. I will support those efforts where I can.
I believe it to be a much better assembler and worth the short-term pain to understand it properly.
You are right about the sickness of the toolchain.
Cheers
A

#7 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sat Jan 28, 2012 11:12 AM

I consider DASM abandoned.
I would prefer people try to use CA65. I will support those efforts where I can.


Sure, but it seems to me that there are some barriers. I'm a n00b on the 2600 but I do know something about programming and can maybe articulate them better than some others in the same position, so maybe it would be productive to discuss those barriers and see if there are things that can be done about them?

I believe it to be a much better assembler and worth the short-term pain to understand it properly.


The pain isn't a big deal for me--I've hardly touched assembly since doing some C= 64 hacking thirty years ago, so one isn't more familiar than another. However, I've been reading up on 2600 programming and what people have been up to since I sold my six-switcher ages ago ( :_( ), and the community still seems to have quite a few dependencies on dasm:

* I gather that the code disassemblies I've seen were made with distella and thus are in dasm syntax. That's a big deal for hacking existing cartridges and also for journeyman programmers learning the finer points of the trade from reading disassemblies of existing practice. Worse, distella appears to have the same maintainer as dasm and be in precisely the same state of neglect. Is there another disassembler?

* I gather the batari Basic compiler targets dasm. Can it also produce output for ca65? If not, that means the beginners with the narrowest tool choices (those not yet capable of using assembly at all) *must* depend on dasm.

* All the tutorials I've seen specify dasm syntax/directives (even yours :-) ). That means that newcomers are dependent on dasm unless they have a fair amount of assembly experience. If they can tweak the directives a bit to get tutorials to assemble on ca65, they're probably almost at the point where they don't need the tutorial, right?

That's why I started by trying to get dasm to compile--I had already found your thread advocating moving to ca65, and I downloaded it to examine, but it appeared that even if I intended it to be my main assembler as you suggest I'd need dasm available at least for a while. I have barely touched assembly for thirty years and didn't do that much even back then, but I *am* a programmer and might be able to translate the dasm-dependent bits of your excellent tutorials (I know I have to change the code location syntax, hopefully not much more), but even if that works I'll probably need to have dasm available for the reasons given above. For example, even though BASIC is one of my least-favorite languages, and I'm quite comfortable starting with good old 6502 assembly (which I regard as quite friendly), if I continue hacking on this stuff I may learn batari Basic so I can look at the assembly it produces for clues as to how to perform common tasks (same as you'd do with C elsewhere). If bB targets dasm, as I suspect, then I still need it. And of course at some point I'll want to be able to read distella output to shamelessly steal ideas... er, learn the finer points. ;)

Again, I'm just going with what I could learn so far, so please correct me if I'm wrong, but it looks like the transition is easiest for those with some experience, and the community is dependent on dasm for fresh blood. Can anything be done about that? Real fixes would be a lot of work (though the community may have to do it for its own survival, and while it might not be as much fun it shouldn't be more work than many of the good homebrew games were), but it seems like a simple band-aid would be to produce a new dasm tarball with a sane makefile. Knowing something about unix make is really not a skill the community wants to require of potential new programmers, especially when the fix is so trivial. I thought about just getting out my make book to remind myself of the best fixes for some of the infelicities (such as the hard-coded link step, which I believe is a work-around that has a better solution, I just don't quite remember what it is) and offering a tarball for people to check out. I didn't mainly because I was terrified I'd immediately be named the new dasm maintainer. :-o

You are right about the sickness of the toolchain.


So what's going to be done about it? It's clear the community has the skills and time to fix it if it wills, and my initial impression is that taking control of your own toolchain is close to becoming an existential necessity. No toolchain means no future (which is why gcc was, correctly, one of the very first gnu projects). I cannot believe that you all can produce games like Thrust+ and Juno First (which are awesome, better than virtually any game I can think of produced commercially--it's like being a kid again, thus my .sig line), software tools like batari Basic and especially stella, hardware tools like the Harmony cartridge, and so on but can't protect the rest of your toolchain if necessary. It's, well, "inconceivable." :-)

[digression]
This discussion reminds me of an old project idea I had. It never went anywhere at least partly because assemblers have a vanishingly small potential userbase, but this community might have actually appreciated the idea. I actually don't like traditional assembly syntax, and have never understood why people don't treat assembly like they treat other languages. Assembly is inherently difficult, but that isn't an excuse for adding to the difficulty with what I regard as amazingly poor choices of conventions. So I, like others, would like an assembly language designed with the same criteria as higher level languages, and certainly this has been done many times. But usually I'm not quite happy with the result, so I was tempted to give it yet another try (and in any event, I'd learn a great deal by trying my hand at it). I also wanted to see how much advanced compiler technology could be applied to the problem without losing the one-to-one correspondence with machine code or the ability to control every byte of output manually (which is actually not usually necessary for people optimizing high-level language projects with bits of assembly, but appears to me to be always necessary for the 2600--I don't see how you can even afford the overhead of a normal function call with stack parameters).

Anyway, the beginning of the project was to be the disassembler part, and the core syntax would be developed and debugged there for maximum readability before being implemented in the corresponding assembler and extended. That's the kind of tool that would benefit this community, I think. 2600 programming would also be the ultimate reality test of whether such a language has succeeded in *not* hiding the kinds of issues that even intermediate-level languages usually hide, because you guys program closer to the metal than anyone I can think of short of the JPL guys still taking care of spacecraft launched decades ago.

I kind of wish I'd done that project, I'd happily give it away if there was an actual potential userbase. There aren't many places where low-level tools would still be appreciated, but this is one.
[/digression]

F8

#8 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sat Jan 28, 2012 11:22 AM

Yeah. I suppose someone should contact the listed maintainer through the SourceForge web page and see if he regards it as abandoned or not.


FWIW, I managed to get access to my old Sourceforge account and asked the current maintainer if he'd take some simple makefile patches. We'll see what happens.

F8

#9 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Sat Jan 28, 2012 2:30 PM

So what's going to be done about it? It's clear the community has the skills and time to fix it if it wills, and my initial impression is that taking control of your own toolchain is close to becoming an existential necessity. No toolchain means no future (which is why gcc was, correctly, one of the very first gnu projects). I cannot believe that you all can produce games like Thrust+ and Juno First (which are awesome, better than virtually any game I can think of produced commercially--it's like being a kid again, thus my .sig line), software tools like batari Basic and especially stella, hardware tools like the Harmony cartridge, and so on but can't protect the rest of your toolchain if necessary. It's, well, "inconceivable." :-)


I suspect people just don't have the time to do it. In my experience, most progress in this (relatively small) community comes from a few people (or even just one person) deciding that they want something fixed, and going about doing it themselves. And while they get good feedback and some help, eventually most projects end up being one or two-man jobs. I personally work on Stella, and I don't really have time for anything else. I would assume the same is true of Batari (Batari Basic and Harmony Cart), not to mention the individual homebrewers (where creating a viable game is enough work to keep them busy).

I would say that nobody has come to the point where they say "look, I need Dasm and Distella to work, so I'm just going to fix it myself". And until someone comes along and has time to do it, it's not going to happen.

That being said, I worked pretty extensively on the Distella codebase, in adding it to Stella for its built-in debugger. So right now, the code has been converted to C++ that is 32/64-bit clean and platform-agnostic. Sure, you'd have to rip it out of Stella again, but I suspect that it will be easier to do that than to start from the current C code. As for Dasm, I have no experience with it, so I can't comment any further.

#10 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sat Jan 28, 2012 6:36 PM

I suspect people just don't have the time to do it. In my experience, most progress in this (relatively small) community comes from a few people (or even just one person) deciding that they want something fixed, and going about doing it themselves.


Yeah, that's essentially the open-source way, even though this isn't precisely an open-source community. It's close in many ways, though.

That being said, I worked pretty extensively on the Distella codebase, in adding it to Stella for its built-in debugger. So right now, the code has been converted to C++ that is 32/64-bit clean and platform-agnostic. Sure, you'd have to rip it out of Stella again, but I suspect that it will be easier to do that than to start from the current C code. As for Dasm, I have no experience with it, so I can't comment any further.


Now this is interesting. I gather C++ may bother some, but I prefer it as long as it's well-written. If stella depends on a distella fork (in fact I wondered what code was doing the disassembly in debug mode), then it seems to me that that fork is the more mission-critical piece for the community already. If it were partitioned out into a library, that would be decent. If it were a dynamically loadable module, that would be even better. Just for curiosity's sake, how modular is stella? Could stella define a disassembler interface so that anything that obeys the interface could be plugged in? I don't really know anything about the internals of either stella or a disassembler, but naively it almost seems that all stella really needs is to be able to say "here is a block of machine code of this length, give me the text" as often as necessary. Or, perhaps, say "here is a code pointer, give me the text for that instruction and the new code pointer location." Either way, that's a pretty simple interface.
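A minimal sketch of the second form of that interface ("here is a code pointer, give me the text and the new code pointer"), written in Python purely for illustration. The names `Disassembler`, `TinyDisassembler` and `listing` are hypothetical and are not taken from Stella or Distella; only four real 6502 opcodes are included, and memory is assumed to be indexed directly by the code pointer.

```python
from typing import Protocol, Tuple


class Disassembler(Protocol):
    """Hypothetical plug-in interface: one instruction in, text and next pointer out."""

    def disassemble_one(self, memory: bytes, pc: int) -> Tuple[str, int]:
        ...


# opcode -> (text template, instruction length in bytes); a tiny subset of the 6502
OPCODES = {
    0xA9: ("LDA #${:02X}", 2),  # load accumulator, immediate
    0x85: ("STA ${:02X}", 2),   # store accumulator, zero page
    0x4C: ("JMP ${:04X}", 3),   # jump, absolute
    0xEA: ("NOP", 1),           # no operation
}


class TinyDisassembler:
    """Toy implementation; unknown bytes are emitted as data."""

    def disassemble_one(self, memory: bytes, pc: int) -> Tuple[str, int]:
        opcode = memory[pc]
        if opcode not in OPCODES:
            return f".byte ${opcode:02X}", pc + 1
        template, length = OPCODES[opcode]
        if length == 1:
            operand = ()
        elif length == 2:
            operand = (memory[pc + 1],)
        else:
            # 6502 addresses are stored low byte first
            operand = (memory[pc + 1] | memory[pc + 2] << 8,)
        return template.format(*operand), pc + length


def listing(dis: Disassembler, memory: bytes, start: int, end: int):
    """The caller's side: keep asking for the next instruction."""
    lines = []
    pc = start
    while pc < end:
        text, pc = dis.disassemble_one(memory, pc)
        lines.append(text)
    return lines


program = bytes([0xA9, 0x02, 0x85, 0x00, 0xEA, 0x4C, 0x00, 0xF0])
print(listing(TinyDisassembler(), program, 0, len(program)))
# ['LDA #$02', 'STA $00', 'NOP', 'JMP $F000']
```

Anything that provides `disassemble_one` could be passed to `listing`, which is the plug-in property being asked about.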

Actually there might be use for some community-wide architectural thinking like what happened in the Linux pro-audio community. There one guy identified the architecture necessary to bring together all the existing pieces, and the critical pieces that would make that architecture an ecosystem that would attract the rest of the projects to become pieces within that framework. Then he just wrote the pieces that enabled the ecosystem. There might not be that single guy with an overarching architectural vision here, but it's still a useful lesson. It might have some relevance here, if there are things that would benefit from being able to talk to each other. Maybe not, but if so you're a small enough community to be able to work it out between the projects.

F8

#11 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Sat Jan 28, 2012 7:16 PM

Now this is interesting. I gather C++ may bother some, but I prefer it as long as it's well-written. If stella depends on a distella fork (in fact I wondered what code was doing the disassembly in debug mode), then it seems to me that that fork is the more mission-critical piece for the community already. If it were partitioned out into a library, that would be decent. If it were a dynamically loadable module, that would be even better. Just for curiosity's sake, how modular is stella? Could stella define a disassembler interface so that anything that obeys the interface could be plugged in?


Stella itself is 'class-ified' very well in my opinion. The code I mention can be found in DiStella.hxx and DiStella.cxx. I removed all 7800 support and most of the commandline argument handling, but it would be easy enough to add all that back. I also added several new distella 'directives', which describe the data in different ways (CODE, ROW, PGFX, GFX, DATA, etc).

I don't really know anything about the internals of either stella or a disassembler, but naively it almost seems that all stella really needs is to be able to say "here is a block of machine code of this length, give me the text" as often as necessary. Or, perhaps, say "here is a code pointer, give me the text for that instruction and the new code pointer location." Either way, that's a pretty simple interface.


That's the general idea, but it's a little more complicated than that. The disassembly in Stella is actually quite advanced, in that it has both a static and a dynamic component. For the dynamic part, each byte of ROM is marked as it's being executed. So for example, if an address is ever part of the PC, then it is marked as CODE. If it's ever loaded into the GRPx or playfield registers, it's marked as GFX and PGFX, respectively. And of course this is done for each bank in a multi-bank ROM.
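A rough sketch of that kind of per-byte marking, in Python for illustration only. It is not Stella's implementation: the class `AccessTracker`, its hook names, and the idea of remembering the most recent ROM load to decide which byte a TIA write came from are all assumptions. The TIA write addresses used (PF0-PF2 at $0D-$0F, GRP0/GRP1 at $1B/$1C) are the standard 2600 ones.

```python
class Use:
    """Bit flags describing how a ROM byte has been seen to be used."""
    NONE = 0
    CODE = 1   # byte was fetched as part of an executed instruction
    GFX = 2    # byte ended up in a player graphics register (GRP0/GRP1)
    PGFX = 4   # byte ended up in a playfield register (PF0/PF1/PF2)


GRP_REGS = {0x1B, 0x1C}        # GRP0, GRP1
PF_REGS = {0x0D, 0x0E, 0x0F}   # PF0, PF1, PF2


class AccessTracker:
    """Hypothetical hooks an emulator core could call while running."""

    def __init__(self, banks, bank_size=4096):
        # one flag byte per ROM byte, kept separately for each bank
        self.flags = [[Use.NONE] * bank_size for _ in range(banks)]
        self.last_load = None  # (bank, offset) of the most recent ROM read

    def on_execute(self, bank, offset, length):
        for i in range(length):
            self.flags[bank][offset + i] |= Use.CODE

    def on_load(self, bank, offset):
        self.last_load = (bank, offset)

    def on_store(self, tia_register):
        if self.last_load is None:
            return
        bank, offset = self.last_load
        if tia_register in GRP_REGS:
            self.flags[bank][offset] |= Use.GFX
        elif tia_register in PF_REGS:
            self.flags[bank][offset] |= Use.PGFX


tracker = AccessTracker(banks=2)
tracker.on_execute(bank=0, offset=0x000, length=3)  # e.g. an LDA absolute
tracker.on_load(bank=0, offset=0x100)               # its operand byte in ROM
tracker.on_store(0x1B)                              # followed by STA GRP0
```

After these calls, offsets 0 to 2 of bank 0 carry `Use.CODE` and offset 0x100 carries `Use.GFX`, while bank 1 is untouched.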

This dynamic approach works fine for code that is actually referenced, but it doesn't necessarily cover all the addresses. This is where Distella comes in. It does a static analysis. That is, it disassembles from a starting point up until it can't go any further.

The way that Stella improves on this is to augment the static analysis with runtime info from the dynamic component. Upon disassembling in Distella, it passes in info about 'start points', labels, what's been marked as CODE and GFX, etc. This part obviously can't happen in a standalone Distella, so it would have to be stubbed out. But IMHO it is what makes the disassembly in Stella so versatile, so I have to question how useful Distella would be without it.

#12 mgiuca OFFLINE  

mgiuca

    Space Invader

  • 10 posts

Posted Sat Jan 28, 2012 7:42 PM

I consider DASM abandoned.
I would prefer people try to use CA65. I will support those efforts where I can.
I believe it to be a much better assembler and worth the short-term pain to understand it properly.
You are right about the sickness of the toolchain.


Okay, I take your point. But, having read through the rest of the conversation, it looks like CA65 is more advanced, but DASM is still needed by the community.

So in the true open source way, and just trying to have a "fix it quickly" solution, I have made a public fix to the Makefile to add the -m32 switch.

I've opened up a project on Launchpad (https://code.launchpad.net/dasm), which is a code hosting service that includes the ability to automatically import code from other repositories. Note that this is NOT an attempt to take over maintenance of DASM, but merely the creation of a public place where people can contribute public patches to DASM without requiring the permission of the project owner. The branch located at:
https://code.launchp...iuca/dasm/trunk
will be kept up-to-date automatically by Launchpad (by pulling from SourceForge), so it will always be identical to the Subversion repository (but converted to the more modern Bazaar revision control system).
I have also created a branch here:
https://code.launchp...iuca/dasm/32bit
which contains the one-line Makefile fix proposed by Tjoppen.
Anybody who wants to can create their own branches of DASM and push them into Launchpad, just as I did.

Thus, the easiest way to build DASM on a 64-bit platform is to download my branch. You will need the Bazaar revision control system. Then, just type:
bzr branch lp:~mgiuca/dasm/32bit dasm
and that will download my patched version of DASM.

As far as I can tell, the 32-bit version of DASM is working fine on 64-bit Linux (though it isn't passing the test cases; perhaps they are old).

If there is enough interest, we could build a release on Launchpad to avoid requiring that people use Bazaar to download the code.

FWIW, I managed to get access to my old Sourceforge account and asked the current maintainer if he'd take some simple makefile patches. We'll see what happens.


I realise DASM has had a long history of forks, and I didn't want to contribute to the mess, but if the trunk on SourceForge is no longer being maintained, then this may be a way forward. If the maintainer of the trunk on SourceForge responds and is willing to accept patches, then these branches can go away.

Edited by mgiuca, Sat Jan 28, 2012 7:42 PM.


#13 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sun Jan 29, 2012 2:11 AM

Okay, I take your point. But, having read through the rest of the conversation, it looks like CA65 is more advanced, but DASM is still needed by the community.


Just to keep blame where blame is due, I'm the only one who actually said dasm still looks needed, and I'm a newcomer poking around and making guesses. At best we can say nobody has yet said I'm wrong, but they haven't said I've guessed right either.

So in the true open source way, and just trying to have a "fix it quickly" solution, I have made a public fix to the Makefile to add the -m32 switch.


Heh. I seem to recall a talk in which the Cinepaint maintainer said that is precisely how he became the Cinepaint maintainer.

F8

#14 mgiuca OFFLINE  

mgiuca

    Space Invader

  • 10 posts

Posted Sun Jan 29, 2012 3:46 AM

Just to keep blame where blame is due, I'm the only one who actually said dasm still looks needed, and I'm a newcomer poking around and making guesses. At best we can say nobody has yet said I'm wrong, but they haven't said I've guessed right either.

Not trying to blame anyone. I think DASM, at least right now, is the easiest way to start 2600 programming. At least for the reasons you stated: tutorials mention it, and distella outputs it. So I'm happy to do a small amount of work to maintain it.

Heh. I seem to recall a talk in which the Cinepaint maintainer said that is precisely how he became the Cinepaint maintainer

He he.

#15 Andrew Davie OFFLINE  

Andrew Davie

    Stargunner

  • 1,583 posts
  • Dr.Boo
  • Location:Tasmania

Posted Sun Jan 29, 2012 6:36 AM

There comes a time when the best thing to do, for everyone concerned, is to turn off the life support systems.

#16 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Sun Jan 29, 2012 7:17 AM

So in the true open source way, and just trying to have a "fix it quickly" solution, I have made a public fix to the Makefile to add the -m32 switch.

Heh. I seem to recall a talk in which the Cinepaint maintainer said that is precisely how he became the Cinepaint maintainer.


And that is exactly how I eventually became Stella maintainer. Here it is 11 years later, and I'm very hesitant to imply such an offer again.

#17 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Sun Jan 29, 2012 3:16 PM

The way that Stella improves on this is to augment the static analysis with runtime info from the dynamic component. Upon disassembling in Distella, it passes in info about 'start points', labels, what's been marked as CODE and GFX, etc. This part obviously can't happen in a standalone Distella, so it would have to be stubbed out. But IMHO it is what makes the disassembly in Stella so versatile, so I have to question how useful Distella would be without it.


This is extremely interesting. I had wondered how well a disassembler could work (never having used one) on a von Neumann machine, since IIRC distinguishing between code and data is equivalent to the halting problem (I didn't check to see if that's correct, but handwaving suggests it is)--when and if there is a difference, since those are not exclusive categories. You have answered the question--my gut feeling was right, and the best disassembler has to execute the code.

I've barely scratched the surface of Stella's amazing array of abilities. Can it output the disassembly in a format suitable for re-assembly? If not, could it? It seems that rather than considering breaking the code out, stella itself is really the only thing capable of being the community's standard disassembler. And if it could do so in ca65 format, that would remove one of the legacy dependencies on dasm. I doubt that is a feasible thing for you to add to your plate, I'm just thinking through the dependencies on dasm I found as a beginner. And if there is a well-defined interface, it would at least be easier for someone else to hack on the disassembler without impacting the rest of the codebase.

Really just thinking to myself here, but thanks very much for extending my understanding of the whole disassembly problem.

F8

#18 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Sun Jan 29, 2012 5:06 PM


The way that Stella improves on this is to augment the static analysis with runtime info from the dynamic component. Upon disassembling in Distella, it passes in info about 'start points', labels, what's been marked as CODE and GFX, etc. This part obviously can't happen in a standalone Distella, so it would have to be stubbed out. But IMHO it is what makes the disassembly in Stella so versatile, so I have to question how useful Distella would be without it.


This is extremely interesting. I had wondered how well a disassembler could work (never having used one) on a von Neumann machine, since IIRC distinguishing between code and data is equivalent to the halting problem (I didn't check to see if that's correct, but handwaving suggests it is)--when and if there is a difference, since those are not exclusive categories. You have answered the question--my gut feeling was right, and the best disassembler has to execute the code.


Well, in some sense you always have to 'execute' the code. That's what Distella does; it 'executes' code until it can't go any further. That is, it starts from a given point, assumes this is the current PC, and keeps disassembling and running instructions from there. Of course this breaks down once you hit a conditional jump, since the condition can't actually be evaluated unless you have access to the entire state of the machine. And since it's a static analysis, there is no state, so it can't really go any further. And you typically only have one point to start from (the reset vector), so the analysis is quite incomplete.

Stella augments this by storing a list of start points each time it enters the debugger. Each of these start points is then used to do multiple disassembly passes. And of course you have other areas of addresses where Stella knows exactly what they are, and Distella just uses these as-is.
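The idea of tracing from one or more start points can be sketched as follows, in Python for illustration. This is not Distella's code: the function `trace` and the small opcode table are hypothetical, and the handling of conditional branches (queue the branch target as a further start point and carry on with the fall-through) is an assumption made for the sketch, since the condition itself cannot be evaluated statically.

```python
# opcode -> (length, kind); a handful of real 6502 opcodes, for illustration only
OPS = {
    0xA9: (2, "next"),    # LDA #imm
    0x85: (2, "next"),    # STA zp
    0xCA: (1, "next"),    # DEX
    0xEA: (1, "next"),    # NOP
    0xD0: (2, "branch"),  # BNE rel
    0xF0: (2, "branch"),  # BEQ rel
    0x20: (3, "call"),    # JSR abs
    0x4C: (3, "jump"),    # JMP abs
    0x6C: (3, "stop"),    # JMP (ind): target not known statically
    0x60: (1, "stop"),    # RTS
}


def trace(rom, origin, start_points):
    """Return the set of addresses reached as code from the given start points."""
    end = origin + len(rom)
    code = set()
    pending = list(start_points)
    while pending:
        pc = pending.pop()
        while origin <= pc < end and pc not in code:
            opcode = rom[pc - origin]
            if opcode not in OPS:
                break  # unknown byte: cannot go any further on this path
            length, kind = OPS[opcode]
            if pc + length > end:
                break
            operands = rom[pc - origin + 1 : pc - origin + length]
            code.update(range(pc, pc + length))
            if kind == "branch":
                offset = operands[0] - 256 if operands[0] >= 128 else operands[0]
                pending.append(pc + 2 + offset)  # target becomes a new start point
            elif kind == "call":
                pending.append(operands[0] | operands[1] << 8)
            elif kind == "jump":
                pc = operands[0] | operands[1] << 8
                continue
            elif kind == "stop":
                break
            pc += length
    return code


rom = bytes([
    0xA9, 0x00,        # F000: LDA #$00
    0x4C, 0x07, 0xF0,  # F002: JMP $F007
    0xFF, 0xFF,        # F005: two data bytes
    0xCA,              # F007: DEX
    0xD0, 0xFD,        # F008: BNE $F007
    0x60,              # F00A: RTS
    0xEA,              # F00B: NOP, not reachable from $F000 in this trace
])
static_only = trace(rom, 0xF000, [0xF000])
with_runtime_hint = trace(rom, 0xF000, [0xF000, 0xF00B])
```

With only the reset-style start point, `$F00B` is never reached; supplying it as an extra start point (as a runtime-informed caller could) adds it to the result, while the two data bytes at `$F005` stay unmarked in both runs.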

I've barely scratched the surface of Stella's amazing array of abilities. Can it output the disassembly in a format suitable for re-assembly? If not, could it? It seems that rather than considering breaking the code out, stella itself is really the only thing capable of being the community's standard disassembler. And if it could do so in ca65 format, that would remove one of the legacy dependencies on dasm. I doubt that is a feasible thing for you to add to your plate, I'm just thinking through the dependencies on dasm I found as a beginner. And if there is a well-defined interface, it would at least be easier for someone else to hack on the disassembler without impacting the rest of the codebase.

Really just thinking to myself here, but thanks very much for extending my understanding of the whole disassembly problem.

F8


There are commands in Stella to generate 'config' files, which are simply descriptions of address ranges that Distella uses when doing a static analysis. The idea being that it stores the runtime analysis from Stella so that the static analysis is that much more complete. I suppose Stella could be modified to take over this role completely, but it's not something I considered working on. My main goal was to get accurate output for the in-game debugger, not necessarily in getting that info back out again. But I guess it's possible.
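As an illustration of "descriptions of address ranges", per-byte markings can be collapsed into contiguous ranges like this. The sketch is in Python, the function `to_ranges` is hypothetical, and the tuple output is only a stand-in: it is not the actual syntax of Stella's config files.

```python
from itertools import groupby


def to_ranges(labels, origin):
    """Collapse a per-byte list of labels into (label, first, last) address ranges."""
    ranges = []
    addr = origin
    for label, run in groupby(labels):
        count = len(list(run))
        ranges.append((label, addr, addr + count - 1))
        addr += count
    return ranges


# per-byte markings, e.g. gathered while the program was running
labels = ["CODE"] * 4 + ["GFX"] * 2 + ["CODE"]
for label, first, last in to_ranges(labels, 0xF000):
    print(f"{label} {first:04x} {last:04x}")
```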

#19 mgiuca OFFLINE  

mgiuca

    Space Invader

  • 10 posts

Posted Mon Jan 30, 2012 12:00 AM

This is extremely interesting. I had wondered how well a disassembler could work (never having used one) on a von Neumann machine, since IIRC distinguishing between code and data is equivalent to the halting problem (I didn't check to see if that's correct, but handwaving suggests it is)--when and if there is a difference, since those are not exclusive categories. You have answered the question--my gut feeling was right, and the best disassembler has to execute the code.

That's a good way to phrase the problem. My understanding is that you are right -- distinguishing between code and data in the general case on a von Neumann machine is equivalent to the halting problem and hence impossible to do perfectly. However, a program that exercises a strict subset of the von Neumann capabilities should be possible to statically analyse perfectly without any dynamic analysis. To be specific about this strict subset, a 6502 program would have to:
  • Never modify its own source code (not an issue on Atari 2600),
  • Never branch to or allow the program counter to land on a memory location that is considered to be data,
  • Never branch to the middle of an instruction,
  • Never use an indirect jump or indirect addressing mode,
  • Never directly (with an absolute addressing mode) read memory from a location that is considered to be code,
  • Have no code in the zero page (because any zero page addressing instruction could access any data in the zero page),
  • Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an (absolute,x) or (absolute,y) instruction).
I may have missed some, but it seems if those rules are followed, a static analysis (not necessarily the Stella one, I haven't studied it) could infer with 100% accuracy the code/data status of each byte in the program. The above rules guarantee that every memory access occurs at either a known address, or within 256 bytes of one (in the case of the (absolute,x) or (absolute,y) addressing modes). As long as there is no code within this range of uncertainty, a hypothetical static analysis could observe every reachable instruction and infer all of the data that it could possibly read or write, and the two regions would be entirely separate.

I'm not sure how many games conform to the above rules, but they do not seem overly unrealistic. The only one that might be difficult to satisfy is the last one, but that can be satisfied by never having code after data. I'm also not sure why it's hard to cope with conditional branches -- couldn't the analysis simply mark both the subsequent and the target instruction as code, and continue analysing both? Anyway, interesting discussion.
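To make the "mark both the subsequent and the target instruction as code" idea concrete, here is a minimal Python sketch. It is not how Distella or Stella actually work; the opcode table is a tiny subset of the real 6502 set, and the ROM bytes, base address and function name are made up for illustration. It also assumes the "nice" rules listed above hold.

```python
# Tiny subset of real 6502 opcodes: opcode -> (mnemonic, length in bytes).
OPCODES = {
    0xA2: ("LDX #imm", 2),
    0xCA: ("DEX", 1),
    0xD0: ("BNE", 2),
    0x4C: ("JMP abs", 3),
    0x60: ("RTS", 1),
    0xEA: ("NOP", 1),
}

def mark_code(rom, base, entry):
    """Return the set of ROM offsets reachable as code from address `entry`."""
    starts, code = set(), set()
    work = [entry - base]                 # worklist of offsets still to visit
    while work:
        pc = work.pop()
        if pc in starts or not 0 <= pc < len(rom):
            continue
        if rom[pc] not in OPCODES:
            continue                      # unknown opcode: give up on this path
        name, size = OPCODES[rom[pc]]
        if pc + size > len(rom):
            continue                      # instruction would run off the ROM
        starts.add(pc)
        code.update(range(pc, pc + size))
        nxt = pc + size
        if name == "BNE":
            off = rom[pc + 1]
            if off >= 0x80:               # relative offsets are signed bytes
                off -= 0x100
            work.append(nxt + off)        # branch taken
            work.append(nxt)              # branch not taken
        elif name == "JMP abs":
            work.append((rom[pc + 1] | (rom[pc + 2] << 8)) - base)
        elif name != "RTS":
            work.append(nxt)              # ordinary instruction: fall through
    return code

# Hypothetical 12-byte ROM fragment loaded at $F000:
#   F000 LDX #3 / F002 DEX / F003 BNE F002 / F005 JMP F00A
#   F008 two bytes nothing jumps to / F00A NOP / F00B RTS
ROM = bytes([0xA2, 0x03, 0xCA, 0xD0, 0xFD, 0x4C, 0x0A, 0xF0,
             0x12, 0x34, 0xEA, 0x60])
CODE = mark_code(ROM, 0xF000, 0xF000)
UNREACHED = sorted(set(range(len(ROM))) - CODE)
```

In this toy case the conditional branch costs nothing: both successors go on the worklist, and only the two bytes after the JMP stay unclassified.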

#20 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Mon Jan 30, 2012 8:30 AM

I'm not sure how many games conform to the above rules, but they do not seem overly unrealistic. The only one that might be difficult to satisfy is the last one, but that can be satisfied by never having code after data. I'm also not sure why it's hard to cope with conditional branches -- couldn't the analysis simply mark both the subsequent and the target instruction as code, and continue analysing both? Anyway, interesting discussion.


Oops, I meant to say unconditional jump. In some sense, when you reach one of those you reach the 'end' of the current routine. How can you tell what immediately comes after it is code, data, or garbage?

To be specific about this strict subset, a 6502 program would have to:

  • Never modify its own source code (not an issue on Atari 2600),
  • Never branch to or allow the program counter to land on a memory location that is considered to be data,
  • Never branch to the middle of an instruction,
  • Never use an indirect jump or indirect addressing mode,
  • Never directly (with an absolute addressing mode) read memory from a location that is considered to be code,
  • Have no code in the zero page (because any zero page addressing instruction could access any data in the zero page),
  • Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an (absolute,x) or (absolute,y) instruction).


Some notes on your observations:
  • Code can be self-modifying if stored in ZP RAM or one of the other RAM schemes.
  • There can be addresses that are used as both code and data depending on how they're accessed (to save bytes). In Stella, CODE takes priority over DATA and GFX, but they can still be used as both.
  • Not sure about this one.
  • There are indirect jumps and addressing modes in most code.
  • Addresses can sometimes be read from what would normally be CODE areas as GFX. The most famous example is Yars' Revenge, where the entire starfield (read into either GRPx or PFx registers) is the actual code. So in some sense you're seeing the source code on the screen.
  • Code can be stored in ZP (in the RAM).
  • Not completely sure about this one, but I'm reasonably sure it happens.


#21 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Mon Jan 30, 2012 5:12 PM

That's a good way to phrase the problem. My understanding is that you are right -- distinguishing between code and data in the general case on a von Neumann machine is equivalent to the halting problem and hence impossible to do perfectly. However, a program that exercises a strict subset of the von Neumann capabilities should be possible to statically analyse perfectly without any dynamic analysis.


Yes, a von Neumann machine can emulate any machine, so it can emulate a Harvard machine. :-)

To be specific about this strict subset, a 6502 program would have to:

  • Never modify its own source code (not an issue on Atari 2600)


Generally, yes, having the program in ROM makes it kinda-sorta Harvard-like. But in full generality, I'm not so sure. 2600 programming seems to have involved such extreme optimization that I wouldn't bet people didn't squirrel little bits of code in RAM just so they could be re-written. That might be a good way to avoid short conditional branches in a display kernel, mightn't it? Just compute a couple of instructions (it has to branch out of RAM, at minimum) during your game logic and then execute them unconditionally in the time-critical sections.

  • Never branch to or allow the program counter to land on a memory location that is considered to be data


Oddly enough, I just read this:

"Bob Whitehead moved one of his subroutines so that it ended just before a block of sprite data....In this case, the first line of the sprite data was the hexadecimal value $60, which also happens to be the machine reference for the opcode RTS." -- Racing the Beam, p103.

  • Never branch to the middle of an instruction,


I don't know if there are useful instances of that or not.

  • Never use an indirect jump or indirect addressing mode,


I would think that not using indirect addressing would be an intolerable limitation. Not sure about indirect jumps, but code pointers are super useful (a much more usable way for a program to set up the code to be executed during display, for example).

  • Never directly (with an absolute addressing mode) read memory from a location that is considered to be code,


I don't know if there are other examples "in the wild" besides Yars' Revenge.

  • Have no code in the zero page (because any zero page addressing instruction could access any data in the zero page),


For a cartridge without extra ram, any code in ram would be code in zero page. Maybe if it were only accessed through one of the mirrors?

  • Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an (absolute,x) or (absolute,y) instruction).


This sounds intolerable on such a memory-constrained system.

Anyway, interesting discussion.


It's certainly educational to me.

F8

#22 mgiuca OFFLINE  

mgiuca

    Space Invader

  • 10 posts

Posted Mon Jan 30, 2012 5:49 PM

Thanks for both your responses. From reading them, and also doing a bit of investigation into a couple of Atari games, it seems my demands were probably unreasonable. I'm used to doing static analysis on nice clean high-level languages (like Haskell), where you can make a lot of assumptions (like "code is code"). In assembly language, you can theoretically write "nice clean" code that follows my assumptions, but I'm beginning to see that the machine is so constrained that everyone resorts to dodgy tricks which wreck a static analysis. It sure was a different time!

Oops, I meant to say unconditional jump. In some sense, when you reach one of those you reach the 'end' of the current routine. How can you tell what immediately comes after it is code, data, or garbage?


Oh, right. You can't. But I don't really see that as a problem. I assume every program point would be marked as "garbage" by default, unless the analyser sees that the PC goes there, or sees that it could be accessed by a data operation. So whatever comes after an unconditional jump would just be garbage unless some other code jumped to it, and then it would be code. For example, a while loop usually ends with an unconditional jump, but then the top of the loop would jump to the line that comes after it, so eventually it would all be considered code.

Some notes on your observations:

  • Code can be self-modifying if stored in ZP RAM or one of the other RAM schemes.


Ah, is that RAM on the cartridge or something? What 2600 games store RAM on the cartridge?

  • Not sure about this one.


("Never branch to the middle of an instruction")
By this one, I meant that, say an instruction was 2 bytes long. If you were to branch to the 2nd byte, you would see a totally different instruction, and as you kept stepping through, you would continue to execute a (completely nonsense) program until one of the instructions happened to land on one of the original instructions' boundaries. I can't imagine that doing this would ever be useful, but it is certainly something that could happen.
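A small Python sketch of that effect: the same bytes decoded from two different starting offsets. The byte sequence is contrived for illustration, the opcode table is a tiny subset of the real 6502 set, and `decode` is a hypothetical helper, not part of any existing tool.

```python
# opcode -> (mnemonic, length in bytes); a small subset of real 6502 opcodes
OPCODES = {
    0x2C: ("BIT abs", 3),
    0xA9: ("LDA #imm", 2),
    0x60: ("RTS", 1),
}

def decode(data, start):
    """Linearly decode `data` from offset `start`; unknown bytes count as 1 byte."""
    out = []
    pc = start
    while pc < len(data):
        name, size = OPCODES.get(data[pc], ("???", 1))
        out.append((pc, name))
        pc += size
    return out

# Read from offset 0: LDA #$00 / BIT $01A9 / RTS
# Read from offset 3: LDA #$01 / RTS  (the BIT operand is itself an instruction)
DATA = bytes([0xA9, 0x00, 0x2C, 0xA9, 0x01, 0x60])
FROM_START = decode(DATA, 0)
FROM_MIDDLE = decode(DATA, 3)
```

Here the two decodings agree again at offset 5 (the RTS), which is the "land on one of the original instructions' boundaries" case described above.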

  • Addresses can sometimes be read from what would normally be CODE areas as GFX. The most famous example is Yars' Revenge, where the entire starfield (read into either GRPx or PFx registers) is the actual code. So in some sense you're seeing the source code on the screen.


Wow, awesome! So it's like seeing Matrix code? Neo could be playing Yars' Revenge and actually be reading the code as it goes by. So I assume Stella marks that as CODE then.

Generally, yes, having the program in ROM makes it kinda-sorta Harvard-like. But in full generality, I'm not so sure. 2600 programming seems to have involved such extreme optimization that I wouldn't bet people didn't squirrel little bits of code in RAM just so they could be re-written. That might be a good way to avoid short conditional branches in a display kernel, mightn't it? Just compute a couple of instructions (it has to branch out of RAM, at minimum) during your game logic and then execute them unconditionally in the time-critical sections.

Yeah, I guess so. And I suppose that means people have done it.

Oddly enough, I just read this:

"Bob Whitehead moved one of his subroutines so that it ended just before a block of sprite data....In this case, the first line of the sprite data was the hexadecimal value $60, which also happens to be the machine reference for the opcode RTS." -- Racing the Beam, p103.

Hah, crazy. Yeah I've definitely got to read Racing the Beam. I just ordered it on Amazon. They said it would be here in six weeks --- maaaan.

For a cartridge without extra ram, any code in ram would be code in zero page. Maybe if it were only accessed through one of the mirrors?

I assume by "mirrors" you mean the fact that the top 3 bits of an address are ignored, so each physical memory location actually has 8 distinct addresses?
Hmm, I was thinking that using mirrors could help static analysis a lot. But on further thinking, it probably wouldn't, because the static analysis would need to know about mirrors and consider all of those addresses to refer to the same location.
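For what it's worth, teaching an analysis about that kind of mirroring could be as simple as normalising every address before comparing it. A minimal Python sketch, assuming only the "top 3 bits are ignored" behaviour described above (the function names are made up, and any further mirroring inside the address space is not modelled):

```python
ADDRESS_MASK = 0x1FFF  # keep the low 13 bits; the top 3 of 16 are ignored

def canonical(addr):
    """Map any 16-bit address onto a single representative address."""
    return addr & ADDRESS_MASK

def mirrors(addr):
    """List the 8 16-bit addresses that reach the same location as `addr`."""
    low = canonical(addr)
    return [low | (k << 13) for k in range(8)]

# $F000 and $1000 differ only in the ignored bits, so they are the same byte.
SAME = canonical(0xF000) == canonical(0x1000)
```

An analyser that stores its code/data marks keyed by `canonical(addr)` would then treat all eight aliases as one location.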

  • Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an (absolute,x) or (absolute,y) instruction).


This sounds intolerable on such a memory-constrained system.


Hmm, I'm not sure. Is there any space saving reason why you'd need to move code after data? Why not just have all the code up front, then have data? Assuming you aren't going to be doing other fancy tricks like we discussed above.

#23 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • 7,778 posts
  • Busy bee!
  • Location:North, England

Posted Mon Jan 30, 2012 6:20 PM

("Never branch to the middle of an instruction")
By this one, I meant that, say an instruction was 2 bytes long. If you were to branch to the 2nd byte, you would see a totally different instruction, and as you kept stepping through, you would continue to execute a (completely nonsense) program until one of the instructions happened to land on one of the original instructions' boundaries. I can't imagine that doing this would ever be useful, but it is certainly something that could happen.


You might want to look at this thread :-

http://www.atariage....02-killer-hacks

#24 stephena OFFLINE  

stephena

    River Patroller

  • 2,513 posts
  • Stella maintainer
  • Location:Newfoundland, Canada

Posted Mon Jan 30, 2012 6:53 PM


Oops, I meant to say unconditional jump. In some sense, when you reach one of those you reach the 'end' of the current routine. How can you tell what immediately comes after it is code, data, or garbage?


Oh, right. You can't. But I don't really see that as a problem. I assume every program point would be marked as "garbage" by default, unless the analyser sees that the PC goes there, or sees that it could be accessed by a data operation. So whatever comes after an unconditional jump would just be garbage unless some other code jumped to it, and then it would be code. For example, a while loop usually ends with an unconditional jump, but then the top of the loop would jump to the line that comes after it, so eventually it would all be considered code.


Right, but my point was that a standalone Distella analysis only has one start point (the reset vector), so any code not reached during that one pass will never be reached. That's why Stella keeps a list of start points; the disassembly actually gets more accurate the longer you run the program (or more to the point, the more code paths you execute in the program).
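A toy Python illustration of why extra start points matter. This is only a sketch of the idea, not Stella's actual implementation; the ROM bytes, the base address and the `reachable` helper are invented for the example, and only four real 6502 opcodes are recognised.

```python
# opcode -> length in bytes: NOP, LDA #imm, JMP abs, RTS
SIZES = {0xEA: 1, 0xA9: 2, 0x4C: 3, 0x60: 1}

def reachable(rom, base, entries):
    """Offsets marked as code when tracing from every address in `entries`."""
    seen, code = set(), set()
    work = [e - base for e in entries]
    while work:
        pc = work.pop()
        if pc in seen or not 0 <= pc < len(rom) or rom[pc] not in SIZES:
            continue
        size = SIZES[rom[pc]]
        if pc + size > len(rom):
            continue
        seen.add(pc)
        code.update(range(pc, pc + size))
        if rom[pc] == 0x4C:                       # JMP abs: follow the target only
            work.append((rom[pc + 1] | (rom[pc + 2] << 8)) - base)
        elif rom[pc] != 0x60:                     # anything but RTS falls through
            work.append(pc + size)
    return code

# F000 NOP / F001 JMP $F001 (spins forever) / F004 LDA #1 / F006 RTS
# Nothing jumps to F004 directly; imagine it is only reached at run time.
ROM = bytes([0xEA, 0x4C, 0x01, 0xF0, 0xA9, 0x01, 0x60])
RESET_ONLY = reachable(ROM, 0xF000, [0xF000])
WITH_HINT = reachable(ROM, 0xF000, [0xF000, 0xF004])
```

With only the reset address the routine at F004 is never classified; adding one observed start point covers the whole fragment.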

Ah, is that RAM on the cartridge or something? What 2600 games store RAM on the cartridge?


ZP RAM is zero-page RAM (below $1000). There are several other RAM schemes that map the RAM (with separate read and write ports) directly into ROM space (above $1000). The Superchip variations are the most common, but there are others.

  • Not sure about this one.


("Never branch to the middle of an instruction")
By this one, I meant that, say an instruction was 2 bytes long. If you were to branch to the 2nd byte, you would see a totally different instruction, and as you kept stepping through, you would continue to execute a (completely nonsense) program until one of the instructions happened to land on one of the original instructions' boundaries. I can't imagine that doing this would ever be useful, but it is certainly something that could happen.


I understand what you meant, just that I'm not aware of any ROM that does it. But it can happen.

  • Addresses can sometimes be read from what would normally be CODE areas as GFX. The most famous example is Yars' Revenge, where the entire starfield (read into either GRPx or PFx registers) is the actual code. So in some sense you're seeing the source code on the screen.


Wow, awesome! So it's like seeing Matrix code? Neo could be playing Yars' Revenge and actually be reading the code as it goes by. So I assume Stella marks that as CODE then.


Yes, CODE sections take priority over GFX. At some point I may extend the disassembly UI to show that fact (internally, the disassembler stores all the info, but it isn't presented to the UI). I'm not sure it's common enough to warrant the work required to do so, though.

#25 1FF8 OFFLINE  

1FF8

    Space Invader

  • Topic Starter
  • 30 posts

Posted Mon Jan 30, 2012 7:01 PM

I'm used to doing static analysis on nice clean high-level languages (like Haskell), where you can make a lot of assumptions (like "code is code").


Haskell is about as far from machine code as it's possible to be without just talking to the computer in English. :-) And actually, come to think of it Haskell is further than English, because English has a rich and expressive vocabulary about state. It's hard to overstate how misleading Haskell would be in guessing about machine code. Let's see--for starters, the core of Haskell starts with the absolute determination to eliminate all traces of explicit state (or at least imprison it in monads where decent folk don't have to see it), and going to any length to provide alternative abstractions. But assembly is all about state and nothing but state, with no other abstractions allowed unless you implement them by hand--with state!

Haskell: the anti-assembly language. :grin:

I'm totally comfortable in C, which is a lot closer, but actually still far too high level. For one thing, C subroutines still have hygienic, completely well-behaved parameters, which is maybe the single most useful abstraction you can have if you can only have one.

In assembly language, you can theoretically write "nice clean" code that follows my assumptions, but I'm beginning to see that the machine is so constrained that everyone resorts to dodgy tricks which wreck a static analysis.


I don't think they're dodgy, not in the context of the 2600. That's like saying that soldiers' gear is gauche because it would look gauche at a Hollywood dinner party. In the context of combat, the rules for Hollywood aren't relevant. :-)

But you're totally right about the constraint. If I were writing x86 assembly to run on a Linux machine, even a slow one, I'd naturally use proper function calls with parameters pushed on the stack (or wherever the platform spec specifies). First, I have to call the C library to do anything but invisibly flip bits on unix, and second I desperately need any simplifying abstraction I can get away with. I can afford hygienic parameters in my assembly on any machine capable of running the Linux kernel.

But on the 2600, it appears to me that you're seldom or never going to be able to afford pushing parameters (IIRC the 6502 is slow with stack gymnastics anyway), so you *have no subroutines*. Just stretches of code you can call with jsr, but with all parameters global and implicit (at best, the comments will document what registers you trash and what global state you mess with). Forget monads, and algebraic reasoning, and statelessness--*you have no subroutines (at least in your display code).* Think about *that*.

I think that's the attraction. A high-level language defines a virtual machine for which you write code--basically, we decided we couldn't handle what the machine really is (a little state machine with only very simple operations--the VAX excepted, that's a state machine with hideously complex operations :grin:), so we imagined a machine we could write for more easily and then simulated that easier machine. Then we wrote libraries to extend the vocabulary of primitives and make it even easier. That is to say, we couldn't handle reality and asked for a blue pill, please. (If you code in Haskell, you asked for a continuous IV feed of unreality drug. ;) ) By contrast, the 2600 is the purest machine for which there is an audience (well, we could quibble about microcontrollers, but people rarely appreciate the code in an embedded device). There isn't any firmware, nothing but what you put there. And you can't afford to lie to yourself about the machine, so you must face up to what the machine really is. For a certain kind of programmer, it's the ultimate red pill.

Well, OK, at least *I* appreciate that. And I think Ed Fries is right that, oddly enough, that kind of extreme difficulty is conducive to art.

  • Not sure about this one.


("Never branch to the middle of an instruction")
By this one, I meant that, say an instruction was 2 bytes long. If you were to branch to the 2nd byte, you would see a totally different instruction, and as you kept stepping through, you would continue to execute a (completely nonsense) program until one of the instructions happened to land on one of the original instructions' boundaries. I can't imagine that doing this would ever be useful, but it is certainly something that could happen.


I don't claim to be good enough to do it, but I can certainly imagine it being useful. Here is another example of "Heroic age" programming, the story of Mel:

http://www.cs.utah.e...lklore/mel.html

Mel wouldn't have any problem with finding a way to re-use at least a couple of instructions that way. By his standards, the 2600 is for softies.

Hmm...you know, it could be automated--you could write a tool to search for instances of very short routines in other code, in fact it could be done automatically in an optimizing pass. I'd write it by searching for bytes with the opcode for jsr and then comparing backwards with what I need. If you write code with a lot of subroutines with only a few instructions (which, granted, may not describe 2600 coding specifically), you might eventually get a hit. That's pretty much how you'd do object-oriented programming in assembly--you often end up writing tiny little methods (granted, this is definitely not suitable for the 2600). I've written more than my share of C++ methods that did nothing but increment a field, and that might only be an inc, a two byte address, and an rts. It doesn't seem that unlikely to find four specific bytes (or whatever it ends up being when you dereference an object pointer, my 6502 isn't good enough to visualize it without paper) in unrelated data.

While those might be unlikely things to do on a 2600, that's also the only place so memory-constrained that the payoff would be worth it.
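A rough Python sketch of the search step such a tool would need: given the assembled bytes of a tiny routine, find every place they already occur in the ROM image. The ROM contents and the `find_reusable` name are made up for illustration, and a plain forward byte search stands in for the compare-backwards approach described above. The routine is the four-byte "inc, two byte address, rts" case, using the real opcodes $EE (INC absolute) and $60 (RTS).

```python
def find_reusable(rom, routine):
    """Return every offset where the bytes of `routine` already occur in `rom`."""
    hits = []
    pos = rom.find(routine)
    while pos != -1:
        hits.append(pos)
        pos = rom.find(routine, pos + 1)   # keep looking past this hit
    return hits

# INC $0080 / RTS  ->  EE 80 00 60
ROUTINE = bytes([0xEE, 0x80, 0x00, 0x60])

# Hypothetical ROM image in which those four bytes happen to appear twice.
ROM = bytes([0x18, 0xEE, 0x80, 0x00, 0x60, 0x3C, 0x7E,
             0xEE, 0x80, 0x00, 0x60])
HITS = find_reusable(ROM, ROUTINE)
```

Any hit is a place a jsr could point at instead of storing a second copy of the routine, whether or not the original author thought of those bytes as code.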

Say--has anyone proposed writing a cartridge optimizer for the 2600? That would be a rather neat hack. The result would be incomprehensible when disassembled, but that's OK.

Wow, awesome! So it's like seeing Matrix code? Neo could be playing Yars' Revenge and actually be reading the code as it goes by.


Yeah--the code that is actually executing to create his enemies at that second. :-o

Yeah, I guess so. And I suppose that means people have done it.


I suppose so, at least once. Maybe just to impress their coworkers, at a minimum.

I assume by "mirrors" you mean the fact that the top 3 bits of an address are ignored, so each physical memory location actually has 8 distinct addresses?


That's what I meant, though I'm not experienced enough to know whether what I said precisely made sense. That's why I'm posting in the beginner forum. :-D


  • Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an (absolute,x) or (absolute,y) instruction).


This sounds intolerable on such a memory-constrained system.


Hmm, I'm not sure. Is there any space saving reason why you'd need to move code after data? Why not just have all the code up front, then have data?


Because you're putting routines in front of data that happens to be RTS, and thus interleaving code and data. :-D

Assuming you aren't going to be doing other fancy tricks like we discussed above.

Oh, well, if we can just assume that people aren't going to do the tricks the platform is renowned for, I assume that I was just given the exhaustively documented original source and don't need a disassembler. Done. ;)

F8



