Boulder Dash® Developers' Diary

+Andrew Davie · January 27, 2008

Follow this thread for updates to progress on Atari 2600 Boulder Dash®

The 2600 doesn't have enough processing 'grunt' to implement the game the game the same way as all other implementations appear to. That is, have a large board, and scan it from top to bottom, left to right. There simply isn't enough time available to do this rapidly enough. So, the engine I've developed uses a stack-based system which holds all active objects in a small stack and intelligently determines what objects should be active. For example, butterflies, player, firefly are always active, but mostly boulders and diamonds are inactive so never need to be processed on a frame-by-frame basis. By maintaining an active stack, no board scanning is necessary.

Boulders and diamonds become active when they start to fall, of course, and that's one of the reasons the game can slow down when under heavy loads. It's pretty good, though, and not a real issue during game playing.

The original game relies on the scanning method for correct operation of objects. That is, boulders at the top of the 'board' are always processed before boulders at the bottom of the 'board'. It's possible, for example, to run upwards past the left of a firefly and not get killed. This is an important part of how the original works, and this behaviour needs to be duplicated in the Atari 2600 version.

But because I'm using a first-in-first-off stack-based system, I don't get this correct processing order; in fact I get an (essentially) arbitrary ordering of creature processing, depending on what gets added to the stack as a result of what.

So, the major task left for me is to fix this. And that's really just a matter of sorting the stack such that objects are processed in the same order as they would be if the board was being scanned. It's not exactly the most difficult of tasks, but sorting can be an incredibly slow process -- and I have to implement this in such a way that it can be divided up into small time "slices" so that the slices can be executed while the screen is drawing. I've been thinking about this for a month or so, just never getting around to actually doing it.

That's what I'm working on now. I've been rather slow getting this done, possibly because I know it's the final hurdle, but also because I'm a bit afraid that implementing a sort will kill the performance.

Meanwhile, Thomas is working hard on improving the scrolling engine. The layout used for representing the board spreads it out, 1 row per RAM bank. This has great potential advantages for vertical scrolling, as rather than redrawing the ENTIRE visual screen (as my version currently does) every time there's a vertial scroll, it's possible to simply change the ordering of the rows. However, this does require an extra row to draw into (over multiple frames) so that we don't get flicker when scrolling. This is almost working, Thomas has been working hard on this for a couple of weeks.

Of course, horizontal scrolling still uses the old system (which redraws only every CHANGED character).

There are various 'superstructure' items to come, basically dealing with loss of life, scoring and when to display score or diamond count, etc. These should come along in the next few weeks.

Thomas Jentzsch · January 28, 2008

Meanwhile, Thomas is working hard on improving the scrolling engine. The layout used for representing the board spreads it out, 1 row per RAM bank. This has great potential advantages for vertical scrolling, as rather than redrawing the ENTIRE visual screen (as my version currently does) every time there's a vertial scroll, it's possible to simply change the ordering of the rows.
However, this does require an extra row to draw into (over multiple frames) so that we don't get flicker when scrolling. This is almost working, Thomas has been working hard on this for a couple of weeks.

Not so much hard work, mainly wondering about solutions and alternatives.

Our screen is 8 rows high, which is a nice number as it makes programming at certain points much easier. The hard scroll (as I call it) works quite nicely already, but like Andrew said, it still produces glitches. I already ordered the drawing so that the new row gets processed last, but those *** glitches are still very noticable.

Those glichtes occur due to the objects of the new scrolling in row have to be drawn into a screen buffer (from there they get displayed during the kernel). This draw either has to happen before or after the scroll. If you do it before the scroll (as we do now), the new graphics data shows up for ~1 frame at the row at opposite side of the screen (the row which will be replaced with the new one). If you draw after the scroll, the new row will (also for ~1 frame) show the data from the old, scrolled out row from the opposite side (I hope this makes sense). This is unavoidable, since redrawing a complete row often takes longer than one frame.

I first thought those glitches would be too short to be noticed, but it turned out that I was wrong.

So the prefered idea now is, to add a 9th row and only display 8 of them. So when scrolling, we first fill the currently unused row with the new graphics data and then do the scroll. This will work perfectly at the cost of one additional whole RAM bank. And people who know me better will understand that such a "waste" really hurts my programmer ego.

Another idea that occured to me is, that the glitches will disappear if we temporarily blank the affected row. So instead of glitches, we get (minimal) flicker. I suppose this will be almost unnoticable, definitely much less noticable than the current glitches (though I could be wrong here too :ponder: ).

So before finally switching to 9 rows, I will at least give the alternative a chance.

There are some more problems of hard scrolling, e.g. vertical moving objects might look like jumping up and down during the scroll, but that's a different story...

Edited January 28, 2008 by Thomas Jentzsch

Cybergoth · January 28, 2008

You possibly know that already, but this issue of C= Hacking has some pretty cool articles regarding sorting:

http://www.ffd2.com/fridge/chacking/c=hacking18.txt

supercat · January 29, 2008

You possibly know that already, but this issue of C= Hacking has some pretty cool articles regarding sorting:
http://www.ffd2.com/fridge/chacking/c=hacking18.txt

Incidentally, "counting sort" is essentially the same as radix sort. A couple of interesting things to note about it:

-1- If one sorts by the least-significant byte of data, and then by the next least significant byte, etc. until one reaches the most significant byte, one will end up with a fully sorted data set in time proportional to the number of items (and the number of passes).

-2- Radix sorting is a comparatively simple method for sorting punched cards, and in fact was the primary method for automated sorting until the advent of the electronic computer. I don't know if any place still uses card sorters (the last time I saw one in use was probably around 1997) but they are quite something to see.

happymonster · January 29, 2008

Can I ask how you are drawing the tiles? Are you reading from the rom and writing each 4 pixel character for each row on a scanline basis?

supercat · January 29, 2008

Any possibility of "shimmering" the diamonds? It shouldn't require too many cycles. Not sure how the memory usage would work out in 3E; in 4A50 there would be no problem.

The approach I'd suggest for shimmering would be to use 6x9 bytes to hold a rough bitmap of which 'character cells' are diamonds. The six bytes per row would have bits set to indicate where diamonds were in two values each for PF0, PF1, and PF2. To generate the 'shimmer', one would use something like:

shimmer_all_diamonds:
 ldx #8; Number of character rows -1
 ldy shimmer_subrow; scan line within character
shimmerLp:
 lda shimPF0a,x
 eor PF0Abitmap,y
 sta PF0Aitmap,y
 lda shimPF1a,x
 eor PF1Abitmap,y
 sta PF1Abitmap,y
 lda shimPF2a,x
 eor PF2Abitmap,y
 sta PF2Abitmap,y
 lda shimPF0b,x
 eor PF0Bbitmap,y
 sta PF0Bbitmap,y
 lda shimPF1b,x
 eor PF1Bbitmap,y
 sta PF1Bbitmap,y
 lda shimPF2b,x
 eor PF2Bbitmap,y
 sta PF2Bbitmap,y
 dex
 bpl shimmerLp

I'd guess it would work well to change one scan line of the diamonds per frame (i.e. running the code once). Figure five frames of adding a 'red' pixel to the diamonds (which would otherwise be green+blue), then five frames of removing a red pixel, etc. That would make all the diamonds shimmer at 6Hz. Doing that would add about 700 cycles/frame overhead. Not trivial, but hardly outrageous. The biggest caveat would be that when drawing diamonds one would need to be careful to draw them in the correct state. That would be balanced, however, by the fact that one would not have to redraw them for purposes of animation.

+Andrew Davie · January 30, 2008

Can I ask how you are drawing the tiles? Are you reading from the rom and writing each 4 pixel character for each row on a scanline basis?

No. Each 'character' row has its own bank. In that bank is reserved enough space to hold a bitmap representation of the screen. That is, 6 bytes/line * 21 lines. This bitmap data is copied direct to the screen when drawing, in other words a 1:1 correspondence with the PF0/1/2 writes.

When drawing 'tiles' (or 'characters', same thing), their shape data is copied into the correct position in the bitmap itself. So when scrolling, all of the bitmap needs to be corrected. But it's typically not necessary to redraw every tile; many tiles remain the same -- and so an 'only when changing' draw is performed. Also, there are various optimisations used to speed this process up (eg: only masking the adjacent character when drawing to half of a PF byte when absolutely necessary).

So in summary, a complete bitmap representation of the entire screen is maintained in RAM, split over multiple banks, one bank per character row. Characters are drawn into the bitmap. Bitmap is used to draw the screen.

happymonster · January 30, 2008

And for each of the 21 lines per character row, would it not be faster if you did a check to see if that line had any gfx and then skipped if not? For example, the earth gfx only uses the orange scan lines, the new diamonds only use blue and white and air doesn't use any lines at all.

You may do something like this already, which is why I asked. If so, I wonder if like with the diamonds it's possible to optimise the enemy colours to save time?

+Andrew Davie · January 30, 2008

And for each of the 21 lines per character row, would it not be faster if you did a check to see if that line had any gfx and then skipped if not? For example, the earth gfx only uses the orange scan lines, the new diamonds only use blue and white and air doesn't use any lines at all.

You may do something like this already, which is why I asked. If so, I wonder if like with the diamonds it's possible to optimise the enemy colours to save time?

No, it wouldn't be faster. You still have to erase what was there before. This is achieved by copying the entire new character in, even if it consists of stripes of colour.

Thomas Jentzsch · January 30, 2008

Doing that would add about 700 cycles/frame overhead.

Since CPU time is the limiting factor, 700 extra cycles/frame would slow down the whole game more than 10%.

happymonster · January 30, 2008

No, it wouldn't be faster. You still have to erase what was there before. This is achieved by copying the entire new character in, even if it consists of stripes of colour.

If you have an old tile of air and the new tile is earth, you only have to update the 7 orange lines. Or to reverse the process, overwrite the 7 orange lines rather than the full 21.

Is the check slower than doing the extra writes?

Thomas Jentzsch · January 30, 2008

If you have an old tile of air and the new tile is earth, you only have to update the 7 orange lines. Or to reverse the process, overwrite the 7 orange lines rather than the full 21.

Is the check slower than doing the extra writes?

Hard to say.

At least the check would add quite a lot of overhead and make the code more complicated. And the drawing is already pretty complicated.

While we might save 50% of drawing time for certain objects, but at the cost of adding some overhead to all objects. So in some scenarios we might be faster, but in some we will be slower.

happymonster · January 30, 2008

That's true. It just depends on how fast the check can be done I guess. But it would save time on earth-air-diamond combinations. Just an idea I had!

Thomas Jentzsch · January 30, 2008

That's true. It just depends on how fast the check can be done I guess. But it would save time on earth-air-diamond combinations. Just an idea I had!

Only earth-air and diamonds/air would profit.

A more general approach would be to calculate a table for each object combination and then only update different rows.

Some pseudo code:

; copy graphics data into draw buffer, central loop:
.loop:
 lda Graphics,y
 sta Buffer,y
 lda NextY,y
 tay
 bpl .loop

Given our current, pretty much optimized drawing code, I am not sure if it would save cycles.

happymonster · January 30, 2008

You would know better than myself. Just thought it might help you.

vdub_bobby · January 30, 2008

I know how much I just love it when someone suggests a complete rewrite when I'm close to completion... :ponder:

But I'm going to post this anyway. :twisted:

Have you looked into using indices into precreated tiles rather than storing the entire bitmap in memory?

Basically, your kernel would look like this:

KernelLoop3
SLEEP 4;+4	 9
IntoKernelLoop3
sta.w GRP0;+4	13


lda ColorTable,X
sta COLUPF;+7	20

MarkerR3PF1
lda $1800,X;		this absolute address is modified outside the kernel
sta PF1
MarkerR3PF2
lda $1800,X;		this absolute address is modified outside the kernel
sta PF2
MarkerR3PF4
lda $1800,X;		this absolute address is modified outside the kernel
sta PF1
MarkerR3PF3
lda $1800,X;		this absolute address is modified outside the kernel
sta PF2	;+28	48

MarkerOtherSpriteR3
lda PlayerMaskingData,X;		this absolute address is modified outside the kernel
sta GRP1	;+12/14	60/62	VDELed
lda (PlayerDataPtr),Y;
and (PlayerMaskPtr),Y;+10/12	65/68
dey
sta WSYNC

dex
bpl KernelLoop3;+5	5

With slight modifications. You are only using a single sprite (correct?) so drop one of the sprites and that gives you enough time to write to all six PF registers.

So your code would have to reside in RAM, obviously. The main problem with this is that you have to have prerendered tile-pairs, for every possible combination of tiles. I *think* there are 12 different tiles in BD, so that gives you ~144 tile-pairs, each of which will need a 21-byte bitmap. Plus animation... :ponder:

And don't forget that you'll need separate copies for PF1 vs. PF2, so double that. Basically works out to almost 6K of tile-pair bitmaps. Which shouldn't be a problem, since you already are using a 32K+ (correct?) ROM anyway, except that they won't all fit in a bank. But some wizardry should be able to work around that limitation, I would think.

The big advantage is that you have a lot fewer writes to do when you scroll, since you don't have to modify the bitmap directly. Even with the painfully slow SC RAM writes, I could still get 30Hz scrolling in the demo I wrote (see below).

Anyway, just a thought or two for you.

Kinda funny - inspired by your BD demos, I wrote a scrolling, tiled engine for the Supercharger that works as I described above. I posted the source to the [stella] list about a year ago, I think; I'll attach it.

SCRPG20080102.zip

SCRPG20080102.bin

Edited January 30, 2008 by vdub_bobby

Thomas Jentzsch · January 31, 2008

With slight modifications. You are only using a single sprite (correct?) so drop one of the sprites and that gives you enough time to write to all six PF registers.

Actually we are already using all six PF registers. And self-modifying code of course.

So your code would have to reside in RAM, obviously. The main problem with this is that you have to have prerendered tile-pairs, for every possible combination of tiles. I *think* there are 12 different tiles in BD, so that gives you ~144 tile-pairs, each of which will need a 21-byte bitmap. Plus animation...

Well, there are at least 20 different tiles, probably more.

Edited January 31, 2008 by Thomas Jentzsch

vdub_bobby · January 31, 2008

With slight modifications. You are only using a single sprite (correct?) so drop one of the sprites and that gives you enough time to write to all six PF registers.

Actually we are already using all six PF registers. And self-modifying code of course.

I meant drop one of the sprites in the code I posted.

So your code would have to reside in RAM, obviously. The main problem with this is that you have to have prerendered tile-pairs, for every possible combination of tiles. I *think* there are 12 different tiles in BD, so that gives you ~144 tile-pairs, each of which will need a 21-byte bitmap. Plus animation...

Well, there are at least 20 different tiles, probably more.

Ah, of course. Nevermind, then.

EDIT: Wait a minute. Just replayed A800 BD a bit, and all the animation is simultaneous; i.e., all the fireflies change animation frame at the same time. Also noticed that the exit is not a separate tile (sort of), but a blinking outer-wall tile.

So the total tiles is 11: wall, dirt, boulder, jewel, outer wall, explosion, blank, butterfly, firefly, magic wall, and amoeba, .

So figure 11 x 11 possible tile-pairs, x 2 for forward/reverse PF, x 21 tile height = 5082 bytes.

But all the tiles don't appear on any one level. So divide the tiles into a couple sets of 9 and you can fit all the pre-rendered tile-pairs for a level into one 4K bank. To animate, just switch banks (i.e., character set flipping ).

Here's a rough stab at it...

levels A-F, I-L, N: wall, dirt, boulder, jewel, outer wall, explosion, blank, butterfly, firefly

level G: wall, dirt, boulder, jewel, outer wall, explosion, blank, firefly, amoeba

levels H, O, P: wall, dirt, boulder, jewel, outer wall, explosion, blank, firefly, magic wall

level M: wall, dirt, boulder, jewel, outer wall, explosion, blank, butterfly, amoeba

9 x 9 tile-pairs = 81 x 2 copies for PF1 vs. PF2 = 162 x 21 bytes per tile-pair = 3402 bytes. Easily fit in a bank.

4 different tile sets, doubled for 2-frame animation = 8 banks. That's pretty huge, but you're already at least that big, right? Since you're using 1 bank per row?

Edited January 31, 2008 by vdub_bobby

vdub_bobby · January 31, 2008

Talk is cheap. :twisted: I partially mocked up the first level using the SC demo.

Only finished the top third of the map, but all tiles but the jewels/diamonds are supported; there's room to add the jewels/diamonds but I ran out of time.

EDIT: Pulled binary for copyright reasons.

Edited February 4, 2008 by vdub_bobby

+Andrew Davie · February 1, 2008

9 x 9 tile-pairs = 81 x 2 copies for PF1 vs. PF2 = 162 x 21 bytes per tile-pair = 3402 bytes. Easily fit in a bank.
4 different tile sets, doubled for 2-frame animation = 8 banks. That's pretty huge, but you're already at least that big, right? Since you're using 1 bank per row?

Only one 1K bank per row, not 4K bank.

Cheers

A

vdub_bobby · February 1, 2008

9 x 9 tile-pairs = 81 x 2 copies for PF1 vs. PF2 = 162 x 21 bytes per tile-pair = 3402 bytes. Easily fit in a bank.
4 different tile sets, doubled for 2-frame animation = 8 banks. That's pretty huge, but you're already at least that big, right? Since you're using 1 bank per row?

Only one 1K bank per row, not 4K bank.

Cheers

A

Ahhh. That makes more sense. Well, if you've got a board with 32K of RAM on it, maybe consider this... :ponder: :lol:

Anyway, I finished semi-mocking up the first level. I said I could fit all the tiles in but, well, I lied. I forgot about the explosion tile. But, everything else is in.

Went to all that work, thought I'd share.

EDIT: Pulled binary for copyright reasons.

Edited February 4, 2008 by vdub_bobby

+Andrew Davie · February 1, 2008

Went to all that work, thought I'd share.

And a nice effort it is, too! I'm quite impressed. However, I don't believe predefining all tile pairs is the way to go. TJ and I considered this and it does restrict things somewhat with the number of tiles possible. The current system is a bit more general-purpose. I think your comments and demo are very interesting, and sobering, but wish you'd go and find another project to bully!!!

Here's the current character count...

2 PF variants x

( blank / soil / rock / diamond(2) / butterfly(2) / firefly(2) / explosion(3) / amoeba(2) / wall/magic wall (4) / steel wall )

and that's for BD#1. I expect that if this engine were to be used for further BD variants then predefined pairs become more untenable. TJ and I have discussed using predefined pairs for only common variants (eg: soil/boulder) and this might happen.

+Andrew Davie · February 1, 2008

I'd guess it would work well to change one scan line of the diamonds per frame (i.e. running the code once). Figure five frames of adding a 'red' pixel to the diamonds (which would otherwise be green+blue), then five frames of removing a red pixel, etc. That would make all the diamonds shimmer at 6Hz. Doing that would add about 700 cycles/frame overhead. Not trivial, but hardly outrageous. The biggest caveat would be that when drawing diamonds one would need to be careful to draw them in the correct state. That would be balanced, however, by the fact that one would not have to redraw them for purposes of animation.

An interesting idea I'll keep in mind. A similar thing could be used for amoeba animation. But it's a fairly radical change, so not high on my list of priorities, and the cost is kind of daunting.

vdub_bobby · February 1, 2008

And a nice effort it is, too! I'm quite impressed. However, I don't believe predefining all tile pairs is the way to go. TJ and I considered this and it does restrict things somewhat with the number of tiles possible. The current system is a bit more general-purpose. I think your comments and demo are very interesting, and sobering, but wish you'd go and find another project to bully!!!

Thanks. Just wanted to throw this idea out there...

happymonster · February 2, 2008

As an experiment I tried seeing what the gfx would look like with only 2 alternating colour lines, rather than 3. The 21 pixel height of the 'tiles' doesn't quite fit here, but this is just a mockup.

Well, although this might look better when full size on a TV (due to less spacing between the colour line effect), it's missing the wider range of your colour palette, especially the grey as well as the blue for the boulders. Still, it was an interesting experiment..

Boulder Dash® Developers' Diary

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members