Jump to content
IGNORED

Compression conundrums


TheMole

Recommended Posts

I've been thinking about how to add efficient, fast compression of level data to Alex Kidd. Currently, the widest level possible (unpacked) due to memory constraints is 1280px wide (or, 160 characters, or 5 screens). The level in the current Alex Kidd demo fully exploits that space, and it is the smallest I have in my current designs. This takes slightly less than 4k of RAM (3840 bytes, to be exact).

 

So, I need to come up with a form of compression of the levels that has a number of characteristics:

  1. Compress the data enough so that total memory consumption for a map of 3072px wide (the largest map I currently have designed) is less than the 3840 bytes currently available (and actually, I probably need to shoot for something around 3kb 'cause I need some memory back for other things, such as longer music tracks).
  2. Is extremely fast to unpack to VDP: I need to copy three to six rows (96 - 192 bytes) of unpacked data to the VDP every frame, depending on how fast Alex is running (he runs fast enough to need two-pixel scroll increments at times). Add to that the 128 bytes for the sprite table that need to be uploaded every frame, as well as the game logic even with the uncompressed map I'm close to the edge of the the machine can do with the current C code.
  3. Allow for dynamic modifications to the map. This is probably the most annoying requirement, but since Alex can destroy parts of the world (rocks get replace by nothing, boxes replaced by money bags or special items, specials items and money bags by nothing) I need to be able to take those out for as long as that part is potentially on screen. Because of that, simple RLE is out of the question.

So I ruled out RLE. Besides that, the obvious one is to look at the way the levels are built up: every item on screen can be represented by a 16x16 meta-tile. Using the meta-tile approach, the map data itself of course compresses to 25% of it's original size, and the meta-tile dictionary will take up around 800 - 1024 bytes, depending on the complexity of the map. This puts the largest map at around 2k, so the compression result is excellent.

 

However, uploading this to the VDP turns out to be slower than expected. In iterating over the metatiles, I need to see whether I'm uploading odd or even rows, and starting from odd or even columns and that code slows things down enough to push me over the limit of what I still have available in the time for one frame (well, when Alex is running at least, when he moves slowly, I only need to do this for three rows and that still seems to work fine).

 

I can do a couple of things to remedy this (optimize the lookup code, unroll some loops, hand code it in assembly, ...), but I wanted to see of someone else can think of a super-duper way of compressing the data that is nearly free when it comes to turning it into something that I can upload to the VDP.

 

Link to comment
Share on other sites

In XB256 I used a very simple method to compress the parts of the VDP memory which might work for you. You can store bytes from the VDP as compressed DATA statements and then use an A/L subprogram to quickly restore them to memory. It looks for repeating bytes. If there are up to 127 bytes that do not repeat more than twice it will make a length byte from 1 to 127, then the non repeating bytes follow the length byte. If bytes repeat more than twice then the repeating bytes are counted up and the MSB of the length byte is set. If there are 32 spaces then the length byte would be >A0, followed by a single >20. Another way of looking at it is to check if the length byte is greater than 128. If so then subtract 128 from the length byte and write the next byte that many times to the VDP. This method is very fast and can lead to some pretty good compression of the screen because there are often repeating characters. Not so much with character definitions which tend to not repeat a lot. Also, if the screen was filled with ABCD repeating through the entire screen there would be no compression because it is not smart enough to see that pattern. This method would not be restricted to XB; it would work fine in A/L.

Link to comment
Share on other sites

That sounds like a simplified RLE compression algorithm you're proposing. For static stuff, I agree that this would work very well and is speedy enough for real-time decompression. However, the problem is that it's hard (impossible?) to change the compressed data in-memory without going through a full compression cycle, and since the player can destroy parts of the map/level, I need it to be dynamically modifiable.

Link to comment
Share on other sites

I can't suggest anything better than metatiles, so I think you have to optimize your code. Even though this makes the code less elegant, deal with the odd ends of a row separately. Then you can make the inner/middle loop where you transfer the full tiles as tight and unrolled as possible. Perhaps make one routine for odd rows and another for even rows so you can eliminate any checks from the inner loop?

Link to comment
Share on other sites

A key issue is how to get from the metatile to the actual bytes you need to transfer in the fewest possible instructions. If you can afford to store the metatile as a word (the address of the bytes to transfer) instead of a byte, you should be able to eliminate one shift and one add - at least for the even rows. For the odd rows you still need to increment the address by two.

Link to comment
Share on other sites

Yeah, I do think optimization of the meta-tile approach is the only way forward. I want to be sure that I'm not missing some super fast, super intelligent approach that I haven't thought of.

 

As for storing the meta-tile dictionary as words instead of bytes... I could, but I'm afraid that doubling the size of it will come back to bite me in the ass later on when I get new ideas/maps/artwork/music/whatever... that require the memory :). I'll probably try to get it fast enough with bytes first.

Link to comment
Share on other sites

It seems that the problem is that you have destructable blocks, but they are at fixed, known locations. All you need to know is which block to place at that spot.

You could RLE compress the static portions, but do not include any block which could be changed. Fill those blocks with sky or something. Keep another RLE map of all the variable sections. Render one pass with the static map, then run another pass to overlay the variable map.

The variable map should be very sparse and should compress well, and should render quickly. When a block change is needed, it should be easy to convert screen coordinates back to inidices in the RLE map and make the update there. Updates will happen rarely, so this part can be a little slow.

Here's a badly described example in 1D.
For the RLE representation, I'm using (repeat count)(symbol)

Expanded static map:         aaaabcccXXcX
Expanded variable map:       ________AB_C
RLE compressed static map:   4a,1b,3c,2X,1c,1X
RLE compressed variable map: 8_,1A,1B,1_,1C
Expanded composite map:      aaaabcccABcC


When block A needs to change to D (or blocks for money bags), only that value needs to change. The rest of the RLE structure is unaffected.

After changing A to D:

Expanded static map:         aaaabcccXXcX
Expanded variable map:       ________DB_C
                                     ^--- changed block
RLE compressed static map:   4a,1b,3c,2X,1c,1X
RLE compressed variable map: 8_,1D,1B,1_,1C
                                 ^--- changed block
Expanded composite map:      aaaabcccDBcC
                                     ^--- changed block

I'm not familiar with your rendering algorithm, so this may not be feasable, but I think this should help reduce the memory needed for a level.

Link to comment
Share on other sites

I just had another thought, you could scrap the second pass altogether, and just not include variable blocks when calculating the RLE.

 

Another bad example:

Expanded map: aaaabcccAAcC
RLE compressed map: 4a,1b,3c,1A,1A,1c,1C

Change the first block A to block D:

Expanded map: aaaabcccDAcC
                      ^--- changed block
RLE compressed map: 4a,1b,3c,1D,1A,1c,1C
                              ^--- changed block

This has the same effect as what I mentioned earlier, but is a lot simpler, do it should be faster too.

Link to comment
Share on other sites

True, I could opt not to compress destructible blocks. The only problem that's left when using an RLE method then is how to quickly determine the starting column of a block of data that needs to be copied to the VDP.

 

Hard to explain, so I'll take your cue and show a 1D example. Basically, for every 8 frames, I need to make sure that the full screen table (32 columns x 24 rows) has been update in VDP memory. Every 8 frames, the starting column of the map section I need to copy needs to be increased by 1.

 

So, the first time I need to copy columns 0 - 31. After the screen has scrolled 8 pixels, I need to copy columns 1 - 32, after 16 pixels 2 - 33, and so forth. With the non-compressed map, this is trivial: I just copy 32 bytes (per row) from the array that represents the map to the VDP starting at the item that corresponds to the column that needs to be displayed on the left-most side of the screen (map[row][0], map[row][1], map[row][2], ...). All I need to do is increase a counter and use that as an index for the array.

 

With RLE compression, I can't predict where in the map array to start looking, so I'd need to make sure I have a buffer of at least 32 bytes decompressed in advance. That can still work, and it's worth investigating actually. I'll run some tests to see how much the maps compress with RLE when not compressing destructible parts of the map.

 

As an example: all blue (*edit*: cyan actually, not the blue background obviously :) ) and yellow blocks in this map are destructible:

miracle_island_1.png

Edited by TheMole
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...