Porting Elite to the 2600

Posted by TROGDOR, 29 November 2009 · 1,218 views

After spending a couple days researching Elite, I'm convinced this game could be ported to the Atari 2600 and still retain most of its original look and feel. Here's what the original Elite looked like on the Commodore 64:

Posted Image
And here's a prototype screenshot on the Atari 2600 (this is from an emulator, not a mockup):

Posted Image
The Bitmap

The first challenge of the port is implementing a high-resolution bitmap. This can be achieved by using the 30Hz text display from Stellar Track. The 12 characters from this display provide a resolution of 12 x 8 = 96 pixels per scanline. The next question is how tall the display should be. This is defined by how large a video buffer can be provided and maintained by the hardware. I chose a 1K buffer as a reasonable target. 1024 / 12 bytes per line = 85 scanlines. This would fit nicely in the 96 scanline resolution of a 2-scanlines-per-pixel Atari display. Using square pixels also simplifies the rendering math.

To simplify the kernel code, it would be preferable to keep each column of buffer bytes in its own page. 84 x 3 = 252, so using 84 scanlines you can fit 3 1-byte columns in a single page. Multiply that by 4, and you've got 12 characters spanning 84 scanlines, so the final resolution is 96 x 84 pixels. The size of the bitmap can be seen inside the yellow border in the prototype image.
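
To make the page layout concrete, here is a minimal Python sketch of how a pixel coordinate could map to a byte and bit in such a buffer. The column-major ordering within a page, the high-bit-is-leftmost convention, and the name pixel_location are assumptions for illustration, not details taken from the prototype source.

```python
ROWS = 84             # scanlines in the bitmap
COLS = 12             # byte-wide character columns (12 * 8 = 96 pixels)
COLS_PER_PAGE = 3     # 3 * 84 = 252 bytes, which fits in one 256-byte page
PAGE_SIZE = 256

def pixel_location(x, y):
    """Return (buffer_offset, bit_mask) for pixel (x, y), origin at top left.

    Assumed layout: each column of 84 bytes is stored contiguously, three
    columns per page, so a column never straddles a page boundary.
    """
    if not (0 <= x < COLS * 8 and 0 <= y < ROWS):
        raise ValueError("pixel outside the 96 x 84 bitmap")
    column = x // 8
    page = column // COLS_PER_PAGE
    offset_in_page = (column % COLS_PER_PAGE) * ROWS + y
    mask = 0x80 >> (x % 8)   # assumes the leftmost pixel is the high bit
    return page * PAGE_SIZE + offset_in_page, mask
```

With this layout the last 4 bytes of every page go unused, which is the price of keeping each column inside a single page.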

The Hardware

The 2600 only has 128 bytes of RAM, so it's going to need some help. The only current options for hardware that would provide a 1K RAM buffer are the Supercharger and an M-Network cartridge. The Supercharger only allows for a 6K game without multi-loading, and I don't think 6K would be sufficient to do an Elite port justice. Writes to Supercharger memory are also slower, if my conclusions from studying the Supercharger are correct.

The M-Network architecture, on the other hand, is just about optimal for this scenario. The 16K of ROM provides plenty of room to make a detailed game. The 1K continuous block of RAM is perfect for the high-resolution buffer, and allows quick read and write access to the entire buffer. The extra 1K of split RAM would be useful as scratch memory for the 3D calculations, and also as extra memory to store details about your ship configuration and cargo.

Another hardware consideration is the CPU. The main processor for the Commodore 64 is a MOS 6510 running at 1.02 MHz. The Atari CPU is a MOS 6507 running at 1.19 MHz. It's surprising that the Atari CPU, released in 1977, is clocked about 17% faster (1.19 / 1.02 = 1.17) than the Commodore CPU, which was released 5 years later in 1982. Other than the I/O port and the extra address pins, I'm not aware of any 6510 feature that would make it more powerful than a 6507. So the Atari starts out with a 17% advantage over the Commodore.

The big disadvantage for the Atari is having to process the kernel. This effectively excludes 192 of the 262 scanlines from any kind of render processing, resulting in a (192/262) = 73 percent loss of processing power. So only 27% of the processor can be dedicated to rendering. When the faster Atari clock is factored in, you get 27 * 1.17 = about 32 percent speed relative to the Commodore processing. So it can only handle about one third the processing load.

Video Buffer Sizes

The upside of the small Atari video buffer is that there are fewer pixels to render. The Commodore buffer used 256 pixels by about 140 pixels, for a total of 4480 bytes. The Atari is using just under 1024 bytes (96 x 84 / 8 = 1008), so it has less than a quarter of the pixels to render. This matches well with the one-third processing power.
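
As a rough sanity check, the comparison can be recomputed from the clock speeds, scanline counts, and buffer dimensions quoted above. This is only a back-of-the-envelope model: it assumes every non-kernel cycle on the Atari is free for rendering and that the Commodore has all of its cycles available, neither of which would hold exactly in a real game.

```python
# Figures quoted in the post; the comparison itself is only a rough model.
ATARI_MHZ = 1.19
C64_MHZ = 1.02
TOTAL_SCANLINES = 262
KERNEL_SCANLINES = 192

def bitmap_bytes(width_px, height_px):
    """Bytes needed for a 1-bit-per-pixel bitmap."""
    return width_px * height_px // 8

# Share of each frame left over once the display kernel has run.
free_fraction = (TOTAL_SCANLINES - KERNEL_SCANLINES) / TOTAL_SCANLINES
# Rendering power relative to a C64 that spends all its cycles rendering.
relative_power = free_fraction * (ATARI_MHZ / C64_MHZ)

c64_bytes = bitmap_bytes(256, 140)
atari_bytes = bitmap_bytes(96, 84)
pixel_ratio = atari_bytes / c64_bytes

print(f"CPU available relative to C64: {relative_power:.3f}")
print(f"Pixels to render relative to C64: {pixel_ratio:.3f}")
```

Under these assumptions the Atari has roughly a third of the processing power but less than a quarter of the pixels, so the time available per pixel comes out somewhat in the Atari's favor.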

Rendering

The main functions needed to display the images are:

3D Rotation
Projection from a 3D object to a 2D plane
Hidden surface removal
Line drawing
Circle drawing
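
To illustrate the first two items, here is a minimal Python sketch of a rotation about the Y axis followed by a perspective projection onto the 96 x 84 bitmap. It uses floating point for readability, whereas a real 6502 implementation would use fixed-point or table-driven math. The focal length FOCAL and the function names are hypothetical choices for this example.

```python
import math

SCREEN_W, SCREEN_H = 96, 84
FOCAL = 64.0   # hypothetical focal length, in pixels

def rotate_y(point, angle):
    """Rotate a 3D point about the Y axis by angle (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(point):
    """Perspective-project a 3D point to screen coordinates.

    The viewer looks down +Z; points at or behind the viewer return None.
    Screen Y grows downward, so world Y is negated.
    """
    x, y, z = point
    if z <= 0:
        return None
    sx = SCREEN_W / 2 + FOCAL * x / z
    sy = SCREEN_H / 2 - FOCAL * y / z
    return (round(sx), round(sy))
```

For example, project((10.0, 0.0, 64.0)) lands 10 pixels right of the screen center, because x is scaled by FOCAL / z = 1 at that depth.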

I found some online resources that could really help with these functions. The most authoritative source is one of the original authors, Ian Bell, who posted the entire source code for Elite! This source was written for a BBC Micro, which also uses a MOS 6502. The only problem is there are very few comments in the code, and it's written in some kind of BASIC wrapper around the assembly code.

While googling for 6502 3D algorithms, I happened upon a great 3D tutorial written specifically for the 6502. If you scroll down to the "art of 3d" section, you'll find three links to C= Hacking issues. It's a 3 part study titled "A Different Perspective: Three-Dimensional Graphics on the C64." It includes a very detailed discussion of projections, rotations, line drawing, circle drawing, and hidden surfaces. It also provides full assembly implementations of these algorithms, and optimizing suggestions. I highly recommend it.

Code Organization

Using the M-Network bankswitching scheme, here's how I think the implementation would work.

ROM bank 7 (slice 1)

On system boot, the code would immediately bankswitch to bank 0. Bank 7 would be reserved entirely for routines that need to access the 1K RAM buffer. This is necessary since none of the other banks can access this RAM directly, and it would be far too slow to use bank switching to set the video buffer or read it for kernel processing. So, the only code that would go here is the bit-mapped section of the kernel (which is actually very small, less than 200 bytes), and the line drawing and circle drawing routines, which would need direct buffer writing access. Hopefully they could be squeezed into 1.5K.

ROM banks 0-6 (slice 0)

All the other routines, including projection, rotation, and surface removal, would go here. The only data that would have to be passed to slice 1 is an array of line and circle definitions that need to be drawn to the screen. This array could be stored in one of the 256 byte RAM sections, which are also available to slice 1. Other ROM banks would include system boot, sound effects, text display, trade interfaces, and all the 3D data for the various ships. It would also include the non-bitmapped bottom portion of the ship kernel, which would include radar and ship status meters.

Concessions

I think the Atari could keep up on the Video RAM drawing routines. I'm not so sure it could keep up on the 3D rendering. I don't know what the ratio of necessary compute power is between these two functions. However, there are several concessions that could be made to reduce the 3D rendering load on the Atari. The most obvious is to reduce the number of lines and polygons on the ships. For the ship displayed above, the rear of the ship consists of an 8-sided polygon, plus 4 4-sided polygons for the thrust ports. The 8-sided polygon could be reduced to a 6-sided or even 4-sided polygon (a trapezoid) and still retain the look of the original ship. The two smaller thrust ports could also be removed. Small alterations like this could save significant processing time.

Another optimization is to greatly reduce the polygon count when the ship is farther away. I'm not sure if the original Elite games did this, other than turning them into dots when they are very far away. For the Atari version, any ship that is sufficiently far away can be reduced to just 4 points, essentially a pyramid, of varying shapes and sizes to match their corresponding ship.

Other Platforms

There are two other systems that came to mind while I was doing this study. The first is the Channel F. Oddly enough, its hardware setup is great for 3D rendering. It already has dedicated RAM for a screen buffer, and its resolution of 102 X 58 is comparable to the Atari buffer, minus the flicker. Plus, its CPU operates at 1.79 MHz, so it has the most processing power and the smallest video buffer to render. I know that video RAM access is slow in the Channel F, but this still sounds like a possibility.

The other system of interest is the Vectrex. This system is begging to have an Elite port. Graphics rendering would be a breeze on this system. The Vectrex is powered by a Motorola 68A09 at 1.5 MHz, which is far more powerful than the Commodore CPU. Elite ported to this system could look better than any of the original 8-bit versions. A Google search produced no evidence of any effort to port Elite to the Vectrex.

The Prototype

Attached File  elite_study.zip (5.01KB)

The prototype binary I'm posting here is quite simple. It doesn't contain any 3D rendering. The ship is just a static image loaded from ROM. What it does demonstrate is the size and appearance of the bitmap, and the simplicity of the kernel. The only difference between this bitmap kernel and an actual M-Network bitmap kernel is that it would point to a different location in memory, corresponding to the M-Network RAM, rather than ROM. The VideoRAM[0-11] addresses in the source would become RAM pointers.

I haven't implemented this in an M-Network bankswitching scheme yet. That will require more research, since the M-Network switching method is fairly complicated.




Doing Elite seems like a huge challenge. If you can get the graphics right, everything else should be "just" a lot of labor (compared to the initial problem).

The hardware choice seems optimal. With the coming Melody cart, hardware for this should be no problem anymore.

Some problems you have to overcome:
- With just 96x84 pixels your resolution is pretty limited. Maybe you have to simplify some ships.
- The aspect ratio of a pixel is not 1:2, more like 1:1.6. So probably you will have to compensate for that.
- How are you going to display the crosshair? With the bitmap or with dedicated objects?
- The 3D radar screen also looks like a real challenge. I suppose the 84 lines are only meant for the space screen, right? Maybe you have to reduce that to ~75 lines to have enough space for the radar, which IMO is essential.
- Only partially updating the screen buffer between frames will result in visible problems. Some kind of double-buffering seems necessary.

What visible screen size are you aiming for? The more the better, but the less free CPU time you will have. Alternatively, you can slightly reduce the frame rate, e.g. by displaying 270 total lines.

You should use the reduced resolution to your advantage. E.g. like you describe, reducing the ships to points. This can be done earlier than with a full resolution.

For the 3D calculations you should try a table based approach. Everything else seems way too slow. Maybe the original source code can help.

For hidden-line removal you fortunately only have to calculate the Z-part of the surface direction. ;)
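
One way to read this shortcut, sketched in Python: once a face's vertices have been projected to 2D, the sign of the Z component of the cross product of two edges tells you which way the face is wound on screen, and therefore whether it faces the viewer. The winding convention below (front faces listed counter-clockwise with Y pointing up) is an assumption; with Y pointing down the comparison flips. This works for convex objects, which covers typical wireframe ships, but it is a sketch and not the method used in the original Elite source.

```python
def is_front_facing(a, b, c):
    """Return True if the projected face with vertices a, b, c faces the viewer.

    a, b, c are (x, y) screen positions of three consecutive vertices.
    Only the Z component of (b - a) x (c - a) is computed.
    Assumes counter-clockwise winding for front faces, Y pointing up.
    """
    z = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return z > 0
```

The test costs two multiplies and a subtraction per face, and any face that fails it can skip line drawing entirely.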

BTW: Elite became 25 this year.
What an impressive project! Count me in for a copy! ;)

Other than the I/O port and the extra address pins, I'm not aware of any 6510 feature that would make it more powerful than a 6507.


Interrupts.
It just occurred to me that just clearing the 1K RAM will take at least 5000 cycles. That's almost the CPU time of a whole standard frame ((262-192)*76 = 5320)! ;)
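
The arithmetic behind that estimate can be sketched as follows. It assumes a fully unrolled clear using STA absolute,X (5 cycles per store on the 6502) with no loop overhead, and it treats every non-visible scanline as free CPU time, so it is a best case rather than a measured figure.

```python
CYCLES_PER_SCANLINE = 76     # CPU cycles per scanline on the 2600
TOTAL_SCANLINES = 262
VISIBLE_SCANLINES = 192
BUFFER_BYTES = 1024
STA_ABS_X_CYCLES = 5         # 6502 timing for STA absolute,X

free_cycles_per_frame = (TOTAL_SCANLINES - VISIBLE_SCANLINES) * CYCLES_PER_SCANLINE
clear_cycles = BUFFER_BYTES * STA_ABS_X_CYCLES   # unrolled, no loop overhead
share_of_frame = clear_cycles / free_cycles_per_frame

print(free_cycles_per_frame, clear_cycles, round(share_of_frame, 2))
```

Any loop overhead pushes the clear cost higher, so the clear alone consumes nearly all of one frame's free time.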

And if 20% of the pixels of the buffer are set and you need ~20 cycles (which is pretty optimistic IMO) to set one pixel, this will require another ~4000 cycles.

Then you still have to do all the 3D coordinates calculations.

So maybe a frame rate of 20Hz would work, but unless you want to display two empty frames in between, you have to double buffer. And then you need another 1k RAM.
Thanks for the feedback guys.

Thomas, here's what I came up with for the issues you mentioned:

- With just 96x84 pixels your resolution is pretty limited. Maybe you have to simplify some ships.

I don't think the resolution will affect the ship design. The one thing I am considering is making the ships appear larger in the Atari version so they show detail sooner. I'd have to see what they look like when rendered. To be sure, at farther distances when they're only 8 pixels wide they will look more like filled polygons.

- The aspect ratio of a pixel is not 1:2, more like 1:1.6. So probably you will have to compensate for that.

Hmmm. I'm not seeing that. I looked at an individual double-height pixel for a while and they look pretty square to me.

- How are you going to display the crosshair? With the bitmap or with dedicated objects?

Bitmapped would be easier. It can be drawn very quickly into the video buffer, and then I won't have to mess with it during the kernel.

- The 3D radar screen also looks like a real challenge. I suppose the 84 lines are only meant for the space screen, right? Maybe you have to reduce that to ~75 lines to have enough space for the radar, which IMO is essential.

For the radar, I would deviate from the original design. I'm thinking 2 side-by-side radar boxes at 16 x 32 pixels (half-height pixels). The left radar would show a front-on view, so you'd see the X and Y axes. This would be primarily used for targeting. The right radar would show a top-down view, so you'd see the X and Z axes. The dots on the radar would be color coded to help you correlate them between the two displays. This should be sufficient to show their position in 3-space.

- Only partially updating the screen buffer between frames will result in visible problems. Some kind of double-buffering seems necessary.

It just occurred to me that just clearing the 1K RAM will take at least 5000 cycles. That's almost the CPU time of a whole standard frame ((262-192)*76 = 5320)! ;)

And if 20% of the pixels of the buffer are set and you need ~20 cycles (which is pretty optimistic IMO) to set one pixel, this will require another ~4000 cycles.

Then you still have to do all the 3D coordinates calculations.

So maybe a frame rate of 20Hz would work, but unless you want to display two empty frames in between, you have to double buffer. And then you need another 1k RAM.


The buffer clearing is bad, but not quite that bad. Remember that the memory "writes" are done with LDA, not STA, so it's only 4 cycles per byte using absolute,X mode. By writing 16 bytes per loop, the overhead of the loop can be reduced. So the entire buffer could be cleared in about 4300 cycles.

Another trick I can use is based on the fact that the screen is displayed at 30Hz. I'm considering changing the display so that all the pixels on the left are displayed in one frame, and all the pixels on the right are displayed in the next frame. I'll have to see if the flicker is bearable.

If I use that setup, I buy some time. I can use 512 bytes to buffer only the left side of the screen, then switch to that buffer instantly. For the right side of the screen, I will have 2 frames to update it directly in video memory, since it's only displayed every other frame. 2000 cycles would be used for clearing the right side of the buffer, so there's ~8000 cycles left for line drawing.
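
Here is a small Python sketch of that budget. It assumes all 70 non-visible scanlines of each frame are available for drawing, which ignores game logic and kernel setup, and the per-pixel cost passed in below is a placeholder figure rather than a measurement. The function name drawing_budget is made up for this example.

```python
CYCLES_PER_SCANLINE = 76
FREE_SCANLINES = 262 - 192   # scanlines outside the visible kernel

def drawing_budget(frames, clear_cycles, cycles_per_pixel):
    """Return (cycles left for drawing, pixels that fit in that time)."""
    free = frames * FREE_SCANLINES * CYCLES_PER_SCANLINE
    left = free - clear_cycles
    return left, left // cycles_per_pixel

# Two frames of free time, 2000 cycles to clear half the buffer,
# and a hypothetical 35 cycles per plotted pixel.
left, pixels = drawing_budget(frames=2, clear_cycles=2000, cycles_per_pixel=35)
print(left, pixels)
```

Under these assumptions the remaining time is in the same ballpark as the ~8000 cycles estimated above, which is enough for a few hundred plotted pixels per update.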

The overall refresh rate would vary, similar to a modern 3D game. For simple displays, it would refresh every 2 frames. For complex displays, it may only refresh every 16 frames. This applies to the buffer only. The display will always be refreshed at 30Hz, to prevent excess flicker. It will just be showing an "old" buffer.

I need to get to work now. I'll comment more later.

The buffer clearing is bad, but not quite that bad. Remember that the memory "writes" are done with LDA, not STA, so it's only 4 cycles per byte using absolute,X mode.

Are you sure that you write with LDA? Where did you get that info from?


The buffer clearing is bad, but not quite that bad. Remember that the memory "writes" are done with LDA, not STA, so it's only 4 cycles per byte using absolute,X mode.

Are you sure that you write with LDA? Where did you get that info from?


The LDA could be used to trigger the correct write address, but you're right, the STA is necessary to correctly populate the data bus.

LDA might actually work if you loaded zero into the accumulator and did a STA to populate the data bus, and then followed it by running all the LDAs to write zeros to the buffer. This would leave the databus in the Z state, likely retaining the previous zero. But that's sketchy. Whether it would work would depend on the hardware implementation, and how the emulators chose to handle it, so I'll avoid using this.
I suggest you start with a demo, e.g. drawing an animated 3D cube or Elite ship. That should give you an idea of the necessary tasks and the required cycles.

After writing some pseudocode, I estimate that setting one pixel will cost ~35 cycles, not 20. So my estimated 4000 cycles now becomes 7000 cycles.

I suggest you start with a demo, e.g. drawing an animated 3D cube or Elite ship. That should give you an idea of the necessary tasks and the required cycles.

After writing some pseudocode, I estimate that setting one pixel will cost ~35 cycles, not 20. So my estimated 4000 cycles now becomes 7000 cycles.

Hi Thomas,

The first thing I'll need is a working M-Network implementation. I've started looking at Wickeycolumbus' E7 Template. Hopefully I can get that working this weekend. Once I have the bitmap running in actual RAM, I'll try implementing a line drawing routine.

Your cycle estimate is accurate. The C= Hacking article mentioned above gave an estimate of 38 cycles average per point plotted:

When combined with the earlier line drawing routine, this gives an average time of 38 cycles or so (with a best time of 34 cycles); six of those cycles are for PHA and PLA, since the line drawing routine uses A for other things.

Like most of the code, you can improve on this method if you think about it a little. Most of the time is spent checking the special cases, so how can you avoid them? Maybe if we do another article, we'll show you our solution(s).

Fortunately, the third installment of the 3D article offers a clever optimization for line drawing that will reduce this value. But even a highly optimized solution will probably average 30 cycles per pixel.

The biggest problem is going to be drawing long lines, because they contain the most pixels. The demo image above contains ~700 pixels. A single line that spans the width of the buffer is 96 pixels, and it takes a long time to draw.
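
To show why long lines dominate, here is a Python sketch of a standard integer Bresenham line routine that returns the pixels it would plot, together with a cost estimate. Bresenham is used here as a stand-in and is not necessarily the routine from the C= Hacking articles. The 35 cycles per pixel is the rough figure discussed in this thread, not a measured value.

```python
def line_pixels(x0, y0, x1, y1):
    """Integer Bresenham line covering all octants; returns a list of (x, y)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:      # step along X
            err += dy
            x0 += sx
        if e2 <= dx:      # step along Y
            err += dx
            y0 += sy
    return points

CYCLES_PER_PIXEL = 35    # rough per-pixel estimate from this discussion

full_width = line_pixels(0, 42, 95, 42)
full_width_cycles = len(full_width) * CYCLES_PER_PIXEL
print(len(full_width), full_width_cycles)
```

A line always plots one pixel per step along its longer axis, so a corner-to-corner diagonal costs the same 96 pixels as a full-width horizontal line.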

The half screen buffer method will get me down to ~350 pixels, but it will still be tough. On the bright side, this image is a worst case scenario. Most of the time the ships will be farther back, and therefore smaller and less effort to draw. My first demo will be something simple, like a box. You can't go wrong with a box.

The fall-back plan if I can't get this half-buffer solution to work would be to use two full 1K buffers, or reduce the screen size to 64 pixel height, so it would fit in 768 bytes, with 2 3-page buffers. Then I could spend multiple frames updating the buffer without worrying about a half-drawn buffer.

Or I could just try using half-drawn buffers, and see if that still looks acceptable. If you're spending 8 frames to draw a complex display, it might not be too bad if 1 of those 8 frames is only partially drawn.

The reality is I can't use the entire 2K of RAM on just screen buffers. Significant memory will still be needed for other aspects of the game. For example, every ship is going to need:

6 bytes RAM to define its position (X,Y,Z) x 2
3 bytes RAM to define its rotational orientation (X,Y,Z)
Either 1 or 3 bytes RAM to define its velocity (Either X,Y,Z velocity, or a 1-byte scalar that incorporates the rotational orientation to define the velocity vector)

And at least 256 bytes RAM will be needed to define all the lines necessary to be drawn for the display, with 4 bytes defining a line (X0,Y0 to X1,Y1). The demo ship is 41 lines, but I will likely reduce that down to 27 lines by making the back of the ship a trapezoid.
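
A sketch of what such a line list might look like, in Python for illustration. The 4-byte X0,Y0,X1,Y1 record comes from the description above; the packing order, the capacity check, and the name pack_lines are assumptions.

```python
LINE_LIST_BYTES = 256    # one 256-byte RAM section
BYTES_PER_LINE = 4       # X0, Y0, X1, Y1, one byte each
MAX_LINES = LINE_LIST_BYTES // BYTES_PER_LINE

def pack_lines(lines):
    """Pack (x0, y0, x1, y1) tuples into a flat byte string.

    Raises ValueError if the list would overflow the 256-byte section.
    Coordinates must fit in a byte, which holds for a 96 x 84 bitmap.
    """
    if len(lines) > MAX_LINES:
        raise ValueError("too many lines for the 256-byte line list")
    data = bytearray()
    for x0, y0, x1, y1 in lines:
        data += bytes((x0, y0, x1, y1))
    return bytes(data)
```

At 4 bytes per line the section holds 64 lines, so the 41-line demo ship uses 164 bytes and leaves room for about 23 more lines from other objects.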

Using the Melody cart, I could probably just throw RAM at the problem. An ideal bank-switching solution for me would be:

3 slices
slice 1 - A 2K block of selectable ROM, from 8 2K sections
slice 2 - A 512 byte block of selectable RAM that uses 1K of addresses (for read and write), selectable from 8 512 byte sections, for a total of 4K RAM.
slice 3 - A fixed 1K block of ROM for common routines and to define the bankswitching for the other slices.

I took an upper-level computer graphics course back in college in the mid 90s that will help with some of this work. For the final project, we had to implement a 3D wire-frame object moving around the screen. Of course I chose to design a spaceship. But the class used these expensive high-end 60 MHz Pentium machines, so performance optimization wasn't much of a concern.
It should be easy for the ARM to provide some hardware assistance while maintaining the 'feel' of raw 6502 coding. For example, a cart could provide a bank-switch mode in which fetching a byte from RAM would stuff a 0 into the RAM while putting A9 on the data bus (the hardware to do that on a real cart wouldn't be hard; my 4A50 cart CPLD could provide such a mode if I went to the next larger CPLD). To clear a range of memory, one would simply bank it into an area which was followed by the code one wanted to execute after the "clear". Voila--memory cleared at a rate of one cycle per byte.

Incidentally, my 4A50 cart supports some rather nice put-pixel abilities. To set a pixel whose coordinates are in X and Y:
  lda $1F00,x
  ora $1E00,y
  sta $1E00,y
Nice and quick and easy.

Incidentally, you could use a 108-pixel-wide bitmap.
I had some fun doing some real line draw coding on the 2600 over the weekend, 35 cycles/pixel turned out to be a very good guess. One could save maybe 3 cycles by using self-modified code.

Regarding double buffering, I had a closer look at C64 Elite again. It seems that it is XORing the pixels. Which means, instead of clearing the RAM, you can draw the same line(s) again and erase them that way. No double buffer needed in this case. Clearing the RAM takes ~5000-5500 cycles, you can clear maybe 140-150 pixels with XOR in the same time. Not too much.

The 2600 Elite ship displayed above consists of about 650 pixels, so drawing and erasing it once with XOR will take ~45500 cycles, which is about 600 scan lines, which equals the free CPU time of maybe 8-10 frames.
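
The XOR idea and its cost can be sketched in Python. For simplicity the buffer here is row-major with 12 bytes per row, which is an assumption and differs from a column-per-page layout. The 35 cycles per pixel is the estimate from this thread.

```python
WIDTH_BYTES = 12
HEIGHT = 84

def xor_plot(buffer, points):
    """Toggle each pixel in points; plotting the same points twice erases them."""
    for x, y in points:
        buffer[y * WIDTH_BYTES + x // 8] ^= 0x80 >> (x % 8)

def xor_cost_cycles(pixel_count, cycles_per_pixel=35):
    """Cycles to draw an image once and erase it once with XOR."""
    return 2 * pixel_count * cycles_per_pixel

buffer = bytearray(WIDTH_BYTES * HEIGHT)
segment = [(x, 10) for x in range(20, 40)]

xor_plot(buffer, segment)        # first pass draws the segment
drawn = any(buffer)
xor_plot(buffer, segment)        # second pass restores the empty buffer
erased = not any(buffer)

cycles = xor_cost_cycles(650)
scanlines = cycles / 76
print(drawn, erased, cycles, round(scanlines))
```

One side effect worth noting is that a pixel shared by two lines is toggled twice and ends up off, so line crossings and shared vertices show small gaps.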

Now I wonder if there is any news about this. ;)
I hope I didn't kill your blog. :)