
extended discussion about compression


newcoleco


For me, graphics are what take the most space in ColecoVision projects and what can be compressed the most. So I've been working on data compression from time to time, and most seriously during the last month, trying to figure out the secrets of a good compressor and data format, based on what others have made, that saves memory space without needing a too-big decompressor or any extra (CPU) memory.

 

My research led to two realizations: Run-Length Encoding is fast but isn't good enough even when encoded cleverly, and Huffman is good but you need a variation of LZ77 to really get great compression. LZ77 is the basis of the popular compressors in the homebrew scene and of modern compressors including WinRAR, WinZip, GZip and 7-Zip.

 

I tried to make my own compression based on LZ77, with the option of saving extra bits with a fixed Huffman code. At first, coding the compressor gave me headaches because a greedy LZ77 algorithm doesn't give the best compressed size. I went back to the source, reading papers about LZ77, and finally understood what I was doing wrong; I then coded an algorithm that finds the best parsing every time, even if it means taking a long time to process. When the first results came out, I was shocked to discover that this simple change from greedy parsing to a better strategy took my own format from "as good as Pletter and ZX7" to "a bit better than Exomizer".
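To make the greedy-versus-optimal difference concrete, here is a minimal C sketch (not newcoleco's actual code) of an exhaustive LZ77 parser: instead of always taking the longest match at the current position, it computes, by dynamic programming from the end of the data, the cheapest way to encode each suffix. The window size, match limits and bit costs are placeholder assumptions; a real format would use its own variable-length codes and would also record which choice won at each position so the tokens can be written out.

#include <stdlib.h>

#define WINDOW   2048   /* assumed sliding-window size  */
#define MIN_LEN  2      /* assumed minimum match length */
#define MAX_LEN  255    /* assumed maximum match length */

#define LITERAL_COST  9   /* assumed: 1 flag bit + 8 data bits             */
#define MATCH_COST    17  /* assumed: 1 flag bit + fixed offset and length */

/* cost[i] = minimal number of bits needed to encode data[i..n) */
unsigned long *parse_optimal(const unsigned char *data, int n)
{
    unsigned long *cost = malloc((size_t)(n + 1) * sizeof *cost);
    if (!cost) return NULL;
    cost[n] = 0;

    for (int i = n - 1; i >= 0; i--) {
        /* option 1: emit data[i] as a literal */
        cost[i] = LITERAL_COST + cost[i + 1];

        /* option 2: emit a match copied from the window behind position i */
        int start = (i > WINDOW) ? i - WINDOW : 0;
        for (int j = start; j < i; j++) {
            int len = 0;
            while (len < MAX_LEN && i + len < n && data[j + len] == data[i + len])
                len++;
            for (int l = MIN_LEN; l <= len; l++)
                if (MATCH_COST + cost[i + l] < cost[i])
                    cost[i] = MATCH_COST + cost[i + l];
        }
    }
    return cost;   /* caller frees; cost[0] is the total compressed size in bits */
}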

 

So before yelling "Victory!", I did more tests and started to write a paper based on my research, showing not only what I've learned but also compression tests on real graphics. The following table is from my paper; the decompressor sizes mentioned are for the routines I've encountered that decompress directly into VRAM, to respect the rule of no extra memory needed.

 

1st: decompression routine size, in bytes
2nd: compressed data size of the Smurf Challenge title screen by Youki
3rd: compressed data size of the pixel art Unreal by Rion
4th: compressed data size of Lena (used in many computer science papers; original photo from Playboy)

 

Compressor              Decompressor   Smurf   Unreal    Lena
Run Length Encoding               43    1994    14814   14562
PuCrunch                         297    1438     6780    9358
Bitbuster                        173    1262     6588    9188
APLib (commercial)               289    1227     6556    9138
Pletter                          241    1235     6476    9081
ZX7                              128    1234     6474    9079
MegaLZ (commercial)              110    1243     6486    8960
Exomizer                         250    1130     6236    8694
My own compression               200    1144     6136    8515

 

[attached: thumbnails of the 3 sample pictures]

 

These 3 pictures in PC format (PATTERN 6144 bytes + COLOR 6144 bytes) : 3 sample pics in PC format.zip

 

Victory? Not yet; I still have to code a lot, including a tool that uses my own format.

 

Please leave a comment


I agree Daniel, graphics take too much space in ROM :), really nice to see you again giving this kind of Coleco help to homebrewers.

Is your own compression based on your Dan0 compression algorithm?

Can't wait to see news regarding your new tool!

 

Dan0 is Run-Length Encoding with a bit of my magic: a somewhat adaptive Huffman code and the possibility of re-using part of another data table to reduce even further the number of bytes to be encoded. It's a bit complicated to explain, and my tool based on this format isn't for everyone (no interface; you modify my tool coded in Java and use the JDK to compile and run it each time), but it works. Essentially, every raw part of an RLE stream is a chance to save more bytes by saying: "hey! I've seen this byte not long ago, let's just encode a reference to it in a few bits instead of encoding it raw". Since it's RLE-based first, decoding is relatively fast, faster than all the LZ77-based ones, which can be a big deal for some projects. Even if Dan0 isn't the best compression, it can benefit some projects; it depends on the needs.
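To illustrate the idea of "encode a reference to a byte seen not long ago instead of encoding it raw", here is a toy decoder sketch. It is NOT the real Dan0 format: the token layout, the bit widths and the vdp_write()/vdp_read_back() output helpers are all invented for this example.

#include <stddef.h>

extern void vdp_write(unsigned char value);        /* hypothetical: write one byte to VRAM      */
extern unsigned char vdp_read_back(size_t offset); /* hypothetical: re-read a byte already sent */

void rle_backref_decode(const unsigned char *src, size_t src_len)
{
    size_t out = 0;   /* number of bytes produced so far */
    size_t in  = 0;

    while (in < src_len) {
        unsigned char token = src[in++];

        if (token & 0x80) {
            /* run: repeat the next byte (token & 0x7F) + 1 times */
            unsigned char count = (token & 0x7F) + 1;
            unsigned char value = src[in++];
            while (count--) { vdp_write(value); out++; }
        } else if (token & 0x40) {
            /* back-reference: copy one byte seen (token & 0x3F) + 1 positions ago */
            unsigned char dist = (token & 0x3F) + 1;
            vdp_write(vdp_read_back(out - dist));
            out++;
        } else {
            /* literals: copy the next (token + 1) bytes unchanged */
            unsigned char count = token + 1;
            while (count--) { vdp_write(src[in++]); out++; }
        }
    }
}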

 

The compression I propose today is based on LZ77, with the option of a fixed Huffman code that helps in some cases to get great compression. Mixing LZ77 with Huffman isn't a new idea; it's used in many formats, like Deflate in WinZip. Since my work is based on my ColecoVision projects, this custom compression gives great compression for me, and I assume it should work great for your projects too.
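As a rough illustration of what a "fixed Huffman" can buy on top of LZ77 (in the spirit of Deflate's fixed-Huffman mode, not newcoleco's actual code tables), here is a toy example where the most frequent match lengths get the shortest codes. Because the code table is fixed, nothing extra has to be stored next to the compressed data.

#include <stdio.h>

struct code { unsigned bits; unsigned nbits; };

/* invented fixed table: short matches get short codes */
static const struct code length_code[] = {
    /* len 2 */ { 0x0, 1 },   /* "0"    */
    /* len 3 */ { 0x2, 2 },   /* "10"   */
    /* len 4 */ { 0x6, 3 },   /* "110"  */
    /* len 5 */ { 0xE, 4 },   /* "1110" */
    /* len 6 */ { 0xF, 4 },   /* "1111" */
};

static void put_bits(unsigned bits, unsigned nbits)
{
    /* placeholder bit writer: prints one character per bit for demonstration */
    while (nbits--)
        putchar((bits >> nbits) & 1 ? '1' : '0');
}

/* encode the length field of an LZ77 match (2..6 in this toy example) */
static void emit_match_length(unsigned len)
{
    const struct code *c = &length_code[len - 2];
    put_bits(c->bits, c->nbits);
}

int main(void)
{
    /* lengths 2 and 3 together cost only 3 bits here, instead of 6 bits
       with a flat 3-bit length field */
    emit_match_length(2);
    emit_match_length(3);
    putchar('\n');
    return 0;
}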


Very interesting!

 

It would be interesting if you put the comparison with your Dan0 in that list.

 

I've been using Dan0 in all my projects for a while, and it already does some miracles for me. But if you put a better compression/decompression tool in your dev kit, I will use it for sure.

 

Curious that for Smurf Challenge, Exomizer does better than yours, because on the others the difference is really significant.

 

The Rion Unreal image is a ZX Spectrum image, no?

 

I saw some very impressive images in some ZX Spectrum demos; it is incredible what artists manage to do with so many constraints.


Very good results. When do you plan to release the compressor and decompressor? They could be very handy on MSX too.

Just for comparison, I tried to compress the sample images using msx-o-mizer (based on Exomizer 2.0 beta 6) and got this:

  • [youki] smurf challenge 1130 bytes
  • [rion] unreal 6426 bytes
  • Lena 8697 bytes

Basically the results confirm that msx-o-mizer is derived from Exomizer.


  • 2 weeks later...

Update

 

Error in the previous results. Oops!

 

Corrected results suggest a compression ratio between MegaLZ and Exomizer, or similar to Exomizer; it depends on the data to compress and the way you approach the case ("naive" or "clever"). What I mean by "clever" is mostly that it needs human intervention (or maybe a highly advanced algorithm with a lot of computation) to cut the dataset into subsets where the sum of the bytes of each compressed subset is smaller than the whole dataset compressed as one set. With mode 2 pictures, the groups are quite easy to detect and this particular case can be automated (duh!): the PATTERN and COLOR tables. I'm working on the final format, so results may change again in my next post. No evaluation tool for the moment.
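For the mode 2 picture case, the automated version of that "clever" split can be as simple as the sketch below: compress the PATTERN and COLOR tables separately and keep the split only if it wins. compress() stands in for whatever packer is being tested (Exomizer, Pletter, the proposed format, ...) and is assumed to return the compressed size in bytes.

#include <stddef.h>
#include <stdio.h>

extern size_t compress(const unsigned char *src, size_t len);  /* hypothetical packer */

void compare_segmentation(const unsigned char picture[12288])
{
    const unsigned char *pattern = picture;          /* 6144-byte PATTERN table */
    const unsigned char *color   = picture + 6144;   /* 6144-byte COLOR table   */

    size_t whole = compress(picture, 12288);
    size_t split = compress(pattern, 6144) + compress(color, 6144);

    printf("one block : %zu bytes\n", whole);
    printf("two blocks: %zu bytes\n", split);
    printf("best      : %s\n", split < whole ? "split" : "one block");
}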

 

Another thing: most of the mode 2 pictures we use in our projects are relatively simple graphics, with unused data. Many times we get the values 00 and FF for a pattern byte, which makes one of the nibbles of the corresponding color byte totally meaningless (unused). There are also cases where consecutive color codes could have been just alternate versions of the same 2 colors, had the pattern codes been coded differently. This makes the graphic tool generate arbitrary color codes that are not used anywhere else in the picture, creating more possible byte variations for color than needed and making the picture harder to compress for no good reason.
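A sketch of that clean-up, assuming the standard mode 2 layout where the high nibble of a color byte is the color of the "1" pixels and the low nibble the color of the "0" pixels; copying the used nibble into the unused one is just one possible canonical choice, not something the thread prescribes.

#include <stddef.h>

void normalize_colors(const unsigned char *pattern, unsigned char *color, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char fg = color[i] >> 4;    /* color of the "1" pixels */
        unsigned char bg = color[i] & 0x0F;  /* color of the "0" pixels */

        if (pattern[i] == 0x00)
            fg = bg;          /* no pixel uses the foreground nibble */
        else if (pattern[i] == 0xFF)
            bg = fg;          /* no pixel uses the background nibble */

        color[i] = (unsigned char)((fg << 4) | bg);
    }
}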

 

  • [youki] smurf challenge (naive) 1155 or (clever) 1136 bytes
  • [rion] unreal (naive) 6329 or (clever) 6204 bytes
  • Lena (naive) 8803 or (clever) 8601 bytes
Link to comment
Share on other sites

Hmm... partitioning the input data into statistically homogeneous subsets would also improve the compression rate of the other compressors. In msx-o-mizer (a close relative of Exomizer), compressing colors and patterns separately improves the results a lot.

 

I suspect that if you segment the input data into statistically homogeneous subsets and apply the other algorithms, their compression rate would improve and outperform the results of your proposed algorithm.

Link to comment
Share on other sites


 

Well, my current tests show that, if cleverly segmented, Exomizer does beat my proposed data compression on all the samples I've tried.

 

With that said, I'm happy with the results. And a tool using this compression is coming next month.

Link to comment
Share on other sites

  • 10 months later...

Cool,

 

I'm using pletter for all the graphics.

 

I've set up a batch file which takes the uncompressed binary data, pletters it, and then re-outputs it as hex data after being plettered.

 

The one step I still find tedious is converting the hex data back into binary, which involves cropping out the "0x", the "," and the spaces. I do it with 3 mass find & replace searches, which isn't too bad really.

 

So ideally, an option to output uncompressed binary data, which you can then compress with your own preferred method, would be nice.
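Something like the small C program below could replace the three find & replace passes: it scans an exported listing for "0x" pairs and writes the bytes out as a raw binary file ready for Pletter or any other packer. The file names are examples only.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void)
{
    FILE *in  = fopen("gfx_data.c", "r");    /* exported hex listing (example name) */
    FILE *out = fopen("gfx_data.bin", "wb"); /* raw binary for the compressor       */
    if (!in || !out) { perror("open"); return EXIT_FAILURE; }

    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c != '0')
            continue;
        c = fgetc(in);
        if (c != 'x' && c != 'X')
            continue;

        /* found "0x": accumulate the hex digits that follow */
        unsigned value = 0;
        while ((c = fgetc(in)) != EOF && isxdigit(c))
            value = value * 16 + (unsigned)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);

        fputc((int)(value & 0xFF), out);   /* byte-sized data assumed */
    }

    fclose(in);
    fclose(out);
    return 0;
}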

 

I developed (out of necessity) a nice way to compress music. It's not really compressing, but rather using predefined notes and a shorthand version of the song being played. It then plays the notes one at a time and won't play the next note until a delay of, say, 16 has passed.

 

I noticed, when I analyzed my straight-up music data, that the same notes & lengths were used over and over again, so I decided to code each note only once and use a lookup table whenever that same note was played again. It cut the music data down to about 1/8th of the original size, since the same notes are reused over & over, with a custom playback routine that controls the lookup and timing.
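A minimal sketch of that lookup scheme, using the note bytes shown later in this thread (the C3/D3 examples) as table entries; play_note(), the 1-based indexing and the fixed delay of 16 are assumptions for illustration, not the actual driver.

#include <stddef.h>

#define NOTE_LEN 11   /* bytes of sound data per predefined note, as in the example data */

/* each row holds the sound bytes for one note, stored only once */
static const unsigned char note_table[][NOTE_LEN] = {
    { 0x40, 0x57, 0x63, 0x04, 0x42, 0x57, 0x83, 0x04, 0x24, 0x11, 0x50 }, /* 1: C3 8th note */
    { 0x40, 0xFA, 0x62, 0x04, 0x42, 0xFA, 0x82, 0x04, 0x24, 0x11, 0x50 }, /* 2: D3 8th note */
    /* ... */
};

/* the song is just indexes into note_table (1-based; 0 marks the end) */
static const unsigned char song[] = { 1, 2, 1, 1, 2, 0 };

extern void play_note(const unsigned char *data, size_t len); /* hypothetical sound driver call */

void music_frame(void)       /* assumed to be called once per video frame */
{
    static unsigned char pos = 0;
    static unsigned char delay = 0;

    if (delay) { delay--; return; }   /* wait until the current note has lasted long enough */
    if (song[pos] == 0) return;       /* end of song */

    play_note(note_table[song[pos] - 1], NOTE_LEN);
    delay = 16;                       /* fixed delay before the next note */
    pos++;
}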

 

 

 

My presentation during ADAMCon summer 2015 was about my work on data compression and my upcoming tool for it.

 

For the moment, the research and development process is still ongoing. When the first beta version of this new tool is done, I'll announce it here.

Link to comment
Share on other sites

  • 2 weeks later...
  • 8 months later...


Please tell me more about how to make things less tedious.

 

And I would love to learn about your music format. Send me a message about it.


Consider also that Exomizer needs a RAM buffer of about 384 bytes IIRC; it is not trivial to fit that in the RAM of a plain Coleco.

 

If your algorithm does not need RAM buffers, it could be the best choice in many cases where the buffer for Exomizer simply would not fit in RAM.
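A sketch of why a decoder that writes straight to VRAM can skip the RAM history buffer: the already-decompressed bytes are sitting in VRAM, so a match can be copied by reading them back from there. vdp_read_back() and vdp_write() are hypothetical helpers and the surrounding token parsing is omitted; this illustrates the principle, not anyone's actual decompressor.

#include <stddef.h>

extern unsigned char vdp_read_back(size_t offset); /* hypothetical: read a byte already written to VRAM */
extern void vdp_write(unsigned char value);        /* hypothetical: write the next byte to VRAM         */

/* copy an LZ77-style match of `length` bytes starting `offset` bytes behind
   the current output position `out`; returns the new output position */
size_t copy_match(size_t out, size_t offset, size_t length)
{
    while (length--) {
        vdp_write(vdp_read_back(out - offset));
        out++;
    }
    return out;
}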

 

I kept my focus on not using any extra memory, and I'm quite confident about its efficiency.

Expect a first BETA version in a few days.


Cool,

 

So there is a predefined note table for a 1-voice melody only. It plays the notes based on a lookup data array, so to play a simple scale you could have a data array = {1,2,3,4,5,6,7};

rather than

0x40,0x57,0x63,0x04,0x42,0x57,0x83,0x04,0x24,0x11,0x50, //c3 8th note
0x40,0xFA,0x62,0x04,0x42,0xFA,0x82,0x04,0x24,0x11,0x50, //d3 8th note
etc. It really cuts down on the reuse because you'll use the same C3 note again and again; to call C3 again you just use {1,1,1} or whatever.
The bass or accompaniment is a timed beat that is repeated.
The biggest problem is that the predefined notes/sounds array was limited to approximately 62 sounds, so I could only do it for 1 voice.
As for the data from ICVGM, which is an excellent program (we all appreciate it): if you ever design another similar tool, it would be great to have an option to get the output like this.
Currently:
byte NAME[] = {
0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
};
I have to convert it to this before I can compress it:
20202020202020202020202020202020
So maybe just as an extra option, have all the ",", spaces & "0x" already taken out.
I've gotten used to it, so it's a nice-to-have. Perhaps even output to a window so it can just be cut & pasted into the final data file.
If you are going to build the Dan0 compression option right into it, then perhaps it doesn't matter.

 

Got it! Thanks!

