mgiuca, on Mon Jan 30, 2012 5:49 PM, said:
I'm used to doing static analysis on nice clean high-level languages (like Haskell), where you can make a lot of assumptions (like "code is code").
Haskell is about as far from machine code as it's possible to be without just talking to the computer in English. :-) And actually, come to think of it, Haskell is further than English, because English has a rich and expressive vocabulary about state. It's hard to overstate how misleading Haskell would be as a guide to machine code. Let's see--for starters, Haskell is built on the absolute determination to eliminate all traces of explicit state (or at least imprison it in monads where decent folk don't have to see it), going to any length to provide alternative abstractions. But assembly is all about state and nothing but state, with no other abstractions allowed unless you implement them by hand--with state!
Haskell: the anti-assembly language.
I'm totally comfortable in C, which is a lot closer, but actually still far too high level. For one thing, C subroutines still have hygienic, completely well-behaved parameters, which is maybe the single most useful abstraction you can have if you can only have one.
In assembly language, you can theoretically write "nice clean" code that follows my assumptions, but I'm beginning to see that the machine is so constrained that everyone resorts to dodgy tricks which wreck a static analysis.
I don't think they're dodgy, not in the context of the 2600. That's like saying that soldiers' gear is gauche because it would look gauche at a Hollywood dinner party. In the context of combat, the rules for Hollywood aren't relevant. :-)
But you're totally right about the constraint. If I were writing x86 assembly to run on a Linux machine, even a slow one, I'd naturally use proper function calls with parameters pushed on the stack (or wherever the platform ABI says they go). First, I have to call the C library to do anything but invisibly flip bits on Unix, and second, I desperately need any simplifying abstraction I can get away with. I can afford hygienic parameters in my assembly on any machine capable of running the Linux kernel.
But on the 2600, it appears to me that you're seldom or never going to be able to afford pushing parameters (IIRC the 6502 is slow with stack gymnastics anyway), so you *have no subroutines*. Just stretches of code you can call with jsr, but with all parameters global and implicit (at best, the comments will document what registers you trash and what global state you mess with). Forget monads, and algebraic reasoning, and statelessness--*you have no subroutines (at least in your display code).* Think about *that*.
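To make "all parameters global and implicit" concrete, here's a toy Python model of that calling convention (the routine name, the RAM location, and the computation are all invented for illustration; only the shape of the contract is the point):

```python
# What "no subroutines" means in practice: a 2600 routine is just a label
# you jsr to, with its inputs and outputs living in global state.
A = X = Y = 0          # the 6502's registers -- the only "parameters"
scratch = [0] * 128    # the 2600's 128 bytes of RAM, all of it global

def position_sprite():           # think "jsr PositionSprite"
    """Reads X (the desired column) and trashes A -- callers beware.
    A comment in the source is the only record of this contract."""
    global A
    A = X * 3            # some computation using the implicit input
    scratch[0x10] = A    # result left in a globally known RAM location

X = 5                    # "pass" the argument by loading a register
position_sprite()
print(scratch[0x10])     # 15
```

Note that nothing in the call site says what `position_sprite` reads or writes--which is exactly what wrecks the static analysis.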
I think that's the attraction. A high-level language defines a virtual machine for which you write code--basically, we decided we couldn't handle what the machine really is (a little state machine with only very simple operations--the VAX excepted; that's a state machine with hideously complex operations), so we imagined a machine we could write for more easily and then simulated that easier machine. Then we wrote libraries to extend the vocabulary of primitives and make it even easier. That is to say, we couldn't handle reality and asked for a blue pill, please. (If you code in Haskell, you asked for a continuous IV feed of unreality drug.)
By contrast, the 2600 is the purest machine for which there is an audience (well, we could quibble about microcontrollers, but people rarely appreciate the code in an embedded device). There isn't any firmware, nothing but what you put there. And you can't afford to lie to yourself about the machine, so you must face up to what the machine really is. For a certain kind of programmer, it's the ultimate red pill.
Well, OK, at least *I* appreciate that. And I think Ed Fries is right that, oddly enough, that kind of extreme difficulty is conducive to art.
stephena, on Mon Jan 30, 2012 8:30 AM, said:
("Never branch to the middle of an instruction")
By this one, I meant that, say, an instruction is 2 bytes long. If you were to branch to its 2nd byte, you would see a totally different instruction, and as you kept stepping through, you would continue to execute a (completely nonsensical) program until one of the instructions happened to land on one of the original instructions' boundaries. I can't imagine that doing this would ever be useful, but it is certainly something that could happen.
I don't claim to be good enough to do it, but I can certainly imagine it being useful. Here is another example of "Heroic age" programming, the story of Mel:
Mel wouldn't have any problem with finding a way to re-use at least a couple of instructions that way. By his standards, the 2600 is for softies.
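To see concretely what branching into the middle of an instruction does--and why it can be deliberate--here's a toy Python decoder. The opcode table covers only the handful of real 6502 opcodes used, and the byte stream is invented, but it's the classic "BIT-skip" arrangement, where one entry point starts inside another instruction's operand:

```python
# Decode the same 6502 byte stream from two different entry points.
# The table maps opcode -> (mnemonic, total length including operands).
OPCODES = {
    0xA9: ("LDA #imm", 2),   # load accumulator, immediate
    0x2C: ("BIT abs", 3),    # bit test, absolute (the classic "skip" byte)
    0x60: ("RTS", 1),
}

def decode(rom, start):
    """Walk the stream from `start`, collecting mnemonics until RTS."""
    out, pc = [], start
    while pc < len(rom):
        name, length = OPCODES[rom[pc]]
        out.append(name)
        if name == "RTS":
            break
        pc += length
    return out

rom = bytes([0xA9, 0x01,        # entry A: LDA #$01
             0x2C,              # BIT abs -- swallows the next two bytes
             0xA9, 0x02,        # entry B: LDA #$02 (inside BIT's operand!)
             0x60])             # RTS shared by both paths

print(decode(rom, 0))  # entry A: ['LDA #imm', 'BIT abs', 'RTS']
print(decode(rom, 3))  # entry B: ['LDA #imm', 'RTS']
```

Entry B is a branch straight into the middle of the BIT instruction, yet both paths are perfectly meaningful--which is exactly what a naive disassembler can't cope with.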
Hmm...you know, it could be automated--you could write a tool to search for instances of very short routines in other code; in fact it could be done automatically in an optimizing pass. I'd write it by searching for bytes with the opcode for rts and then comparing backwards with what I need. If you write code with a lot of subroutines containing only a few instructions (which, granted, may not describe 2600 coding specifically), you might eventually get a hit. That's pretty much how you'd do object-oriented programming in assembly--you often end up writing tiny little methods (granted, this is definitely not suitable for the 2600). I've written more than my share of C++ methods that did nothing but increment a field, and that might compile to nothing but an inc, a two-byte address, and an rts. It doesn't seem that unlikely to find four specific bytes (or whatever it ends up being when you dereference an object pointer; my 6502 isn't good enough to visualize it without paper) in unrelated data.
While those might be unlikely things to do on a 2600, that's also the only place so memory-constrained that the payoff would be worth it.
Say--has anyone proposed writing a cartridge optimizer for the 2600? That would be a rather neat hack. The result would be incomprehensible when disassembled, but that's OK.
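A minimal sketch of that search pass, in Python (the function name and the ROM fragment are made up; $EE is the real opcode for INC absolute and $60 for RTS):

```python
# To "borrow" a tiny routine, look for the bytes of the routine we need
# followed by an RTS byte ($60) anywhere in the existing ROM image.
RTS = 0x60

def find_borrowable(rom, body):
    """Return every offset where `body` followed by RTS already exists."""
    hits = []
    needle = bytes(body) + bytes([RTS])
    start = rom.find(needle)
    while start != -1:
        hits.append(start)
        start = rom.find(needle, start + 1)
    return hits

# Hypothetical ROM fragment: the three bytes of INC $00C5 happen to sit
# right before a $60, so a jsr to offset 2 reuses them as a subroutine.
rom = bytes([0xA5, 0x10, 0xEE, 0xC5, 0x00, 0x60, 0xA9, 0xFF])
print(find_borrowable(rom, [0xEE, 0xC5, 0x00]))  # [2]
```

A real pass would also have to prove the bytes are reachable as code without clobbering anything, but the search itself really is this simple.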
Wow, awesome! So it's like seeing Matrix code? Neo could be playing Yars' Revenge and actually be reading the code as it goes by.
Yeah--the code that is actually executing to create his enemies at that second.
Yeah, I guess so. And I suppose that means people have done it.
I suppose so, at least once. Maybe just to impress their coworkers, at a minimum.
I assume by "mirrors" you mean the fact that the top 3 bits of an address are ignored, so each physical memory location actually has 8 distinct addresses?
That's what I meant, though I'm not experienced enough to know whether what I said precisely made sense. That's why I'm posting in the beginner forum.
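For the record, the arithmetic of those mirrors is easy to check (the 2600's CPU is a 6507, which brings out only 13 of the 6502's 16 address lines):

```python
# With only 13 address lines, the CPU's 16-bit address is reduced
# mod $2000: the top 3 bits are simply ignored by the hardware.
def physical(addr):
    return addr & 0x1FFF   # keep the low 13 bits

# Every location therefore answers to 8 distinct CPU addresses.
aliases = [0x0080 | (top << 13) for top in range(8)]
print([hex(a) for a in aliases])
print({physical(a) for a in aliases})   # all collapse to {0x80}
```

So yes: 8 distinct addresses per physical location, exactly as you said.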
- Have no code that is less than 256 bytes after the start of an array (where an array is a location accessed by an absolute,X or absolute,Y instruction).
This sounds intolerable on such a memory-constrained system.
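To spell out why the quoted rule needs 256 bytes of clearance (the addresses below are invented; the 256 comes from the 8-bit X register):

```python
# Why code within 256 bytes of an array's start worries a disassembler:
# LDA array,X can reach array+0 .. array+255, so any code placed in
# that window is also reachable as *data* by the indexed load.
ARRAY_START = 0xF100   # hypothetical table used by an LDA $F100,X
CODE_START  = 0xF180   # hypothetical routine placed right after it

reach = range(ARRAY_START, ARRAY_START + 256)   # X is an 8-bit index
overlaps = CODE_START in reach
print(overlaps)  # True: the indexed load can read the routine's bytes
```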
Hmm, I'm not sure. Is there any space saving reason why you'd need to move code after data? Why not just have all the code up front, then have data?
Because you're putting routines in front of data whose first byte happens to be an RTS ($60), so the data supplies the routine's return--and thus you're interleaving code and data.
Assuming you aren't going to be doing other fancy tricks like we discussed above.
Oh, well, if we can just assume that people aren't going to use the tricks the platform is renowned for, then I'll assume I was handed the exhaustively documented original source and don't need a disassembler. Done.