My name is Einar Saukas, I'm the original author of ZX7. I'm glad to know people have been using it here, thanks for the interest!
The ZX7 compression tool is "optimal", meaning it produces the best compression results possible for this format; the documentation file says so explicitly. If you read a comment anywhere that suggests otherwise, please let me know so I can post a clarification there too.
The compression tool runs very fast because it implements a faster algorithm I developed myself, but it doesn't sacrifice compression ratio to achieve this performance.
I'm fairly confident my ZX7 compressor is optimal, thus it should never be possible to produce an even smaller result for any file using the same format. However, I recognize I'm not infallible; there's always a chance I made a mistake somewhere. Could you please provide one example where you believe you got a better result? If I really made a mistake, I would very much appreciate the chance to learn about it!
Sounds interesting! I'm curious about your format also. Where can I find it?
When I designed ZX7, I was aiming for the best tradeoff between compression ratio in typical cases, decompression speed, and decompressor size, while restricting myself to "pure" LZ77/LZSS. I'm aware that better results in any one of these characteristics can be obtained by sacrificing the others; it's always interesting to see different algorithms and the choices they make.
Indeed, ZX7 is a tradeoff between compression ratio and decompression speed and size, because it reads bytes most of the time instead of individual bits. It looks like there was a misunderstanding on my part regarding your proof of concept not providing optimal results. When I tried to repeat the experiment by rewriting in Java the ZX7 data compression written in C, my results differed from the optimal algorithm's. Clearly, I made a mistake somewhere. One paper compared Elias gamma with Elias gamma on deltas and so on, and its conclusion is that Elias gamma with lazy LZSS gives the best results; this paper confused me at first about the "lazy" LZSS, but I decided not to investigate further... maybe I should.
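For reference, here is what plain Elias gamma coding looks like, as a quick C sketch. It writes the bits out as characters just so the codes are easy to inspect; a real packer would pack them into a bit stream.

```c
#include <stdio.h>

/* A minimal sketch of standard Elias gamma coding (for n >= 1), as used
   for match lengths in ZX7-style LZSS formats: floor(log2(n)) zero bits,
   followed by the binary digits of n, MSB first. */
static void elias_gamma(unsigned n, char *out)
{
    unsigned bits = 0, tmp = n;
    while (tmp > 1) { tmp >>= 1; bits++; }     /* bits = floor(log2(n)) */
    for (unsigned i = 0; i < bits; i++)
        *out++ = '0';                          /* unary length prefix */
    for (unsigned i = bits + 1; i-- > 0; )     /* n itself, MSB first */
        *out++ = ((n >> i) & 1) ? '1' : '0';
    *out = '\0';
}

int main(void)
{
    char buf[64];
    for (unsigned n = 1; n <= 8; n++) {
        elias_gamma(n, buf);
        printf("%u -> %s\n", n, buf);          /* 1->1, 2->010, 3->011, 4->00100 ... */
    }
    return 0;
}
```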
Journey to DAN1 format
My goal is to make graphics fit in my ColecoVision projects. The biggest graphics are bitmap screens, made of two 6K tables (patterns and colors) for a total of 12K; charsets are also quite big and can certainly benefit from good compression.
In LZSS compression, offset values are usually the biggest part of the data, and just by switching to more suitable offset sizes you get better compression at the cost of speed. For example, all the Graphic Mode II bitmap screens I've tested, from a simple logo to pixel art (ZX7, MSX1) to heavily dithered pictures, fit into two categories: simple and elaborate. Offsets of 4 and 9 bits are best for simple bitmaps (a simple logo), while 4 and 11 or 12 bits are best for elaborate bitmaps (the vast majority of ColecoVision bitmap title screens). Since the biggest challenge in my CV projects is making the graphics fit, I had fun adapting ZX7 to use these 4/9 and 4/11 offsets for better compression, but I still couldn't get close to Exomizer's results.

I also started writing a paper in PDF about these results, and even wrote a presentation last year for ADAMCon, but I decided to keep going with my R&D before finalizing my new format. This year, I discovered that 4 fixed sizes for offset values are the best fit overall (for both graphics and text compression), providing compression results closer to Exomizer without needing extra memory (see the cost sketch after Table 1). And since the waiting time before seeing a cool title screen doesn't really matter, I decided to adopt the 4 sizes to encode the offsets.
Table 1 - Size in bytes of the compressed data for a 12K bitmap sample, using one to six different sizes to encode the offset values:
- ONE: 11 bits = 2961
- ZX7: 7 and 11 bits = 2901
- TWO: 8 and 11 bits = 2841
- THREE: 4, 8 and 11 bits = 2814
- FOUR: 1, 4, 8 and 11 bits = 2798
- FIVE: 1, 3, 5, 8 and 11 bits = 2804
- SIX: 1, 2, 3, 5, 8 and 11 bits = 2823
Based on these results, four different sizes for offset values is the sweet spot; going beyond four starts to hurt compression.
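To give an idea of why a few well-chosen offset sizes help, here is a toy cost model in C. It's only a sketch: the unary class selector and these exact widths are illustrative assumptions, not the actual ZX7 or DAN1 bit layout.

```c
#include <stdio.h>

/* Rough cost model: pick the smallest size class that can hold the
   offset, and charge a unary class selector plus the class width.
   The unary selector is an assumption for illustration only. */
static int offset_cost(unsigned offset, const int *widths, int nclasses)
{
    for (int i = 0; i < nclasses; i++) {
        if (offset < (1u << widths[i]))
            return (i + 1) + widths[i];  /* i+1 selector bits + payload */
    }
    return -1;                           /* offset too large for this scheme */
}

int main(void)
{
    const int two[]  = { 4, 11 };        /* two-size pair, ZX7-style */
    const int four[] = { 1, 4, 8, 11 };  /* four classes, as in Table 1 */
    unsigned samples[] = { 1, 7, 60, 500, 1800 };

    for (int i = 0; i < 5; i++)
        printf("offset %4u: two-size %2d bits, four-size %2d bits\n",
               samples[i],
               offset_cost(samples[i], two, 2),
               offset_cost(samples[i], four, 4));
    return 0;
}
```

With four classes, small offsets get much cheaper while large ones pay only a little extra, which matches the trend in Table 1.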
According to the paper about gamma-encoding variants in LZSS optimization mentioned earlier, plain Elias gamma with lazy LZSS gives the best results. But since I don't fully understand what the paper means by "lazy", I just go with what I know.
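For what it's worth, my best guess is that "lazy" means the usual one-step deferred matching: after finding a match, also try the position one byte later, and if that match is strictly longer, emit a literal now and take the longer match next. A self-contained C sketch, using a toy bit-cost model rather than any real format's layout:

```c
#include <stdio.h>
#include <string.h>

#define MIN_MATCH 2
#define WINDOW 2048

/* Bits for the Elias gamma code of n: 2*floor(log2(n)) + 1. */
static int gamma_bits(int n)
{
    int b = 1;
    while (n > 1) { n >>= 1; b += 2; }
    return b;
}

/* Brute-force longest previous match for data[pos..] in the window. */
static int find_match(const unsigned char *data, int pos, int len)
{
    int best = 0;
    for (int i = (pos > WINDOW ? pos - WINDOW : 0); i < pos; i++) {
        int l = 0;
        while (pos + l < len && data[i + l] == data[pos + l]) l++;
        if (l > best) best = l;
    }
    return best >= MIN_MATCH ? best : 0;
}

/* Greedy parse, optionally with one-step lazy evaluation. Returns an
   estimated size in bits (toy model: 9 bits per literal, 1 flag +
   gamma(len) + 12 offset bits per match). */
static int parse(const unsigned char *data, int len, int lazy)
{
    int pos = 0, bits = 0;
    while (pos < len) {
        int m = find_match(data, pos, len);
        if (lazy && m && pos + 1 < len && find_match(data, pos + 1, len) > m)
            m = 0;                      /* defer: take the longer match next */
        if (m) { bits += 1 + gamma_bits(m) + 12; pos += m; }
        else   { bits += 9; pos += 1; }
    }
    return bits;
}

int main(void)
{
    const unsigned char s[] = "bcdefghabcdeabcdefgh";
    int len = (int)strlen((const char *)s);
    /* On this sample the lazy parse saves a few bits over greedy. */
    printf("greedy: %d bits, lazy: %d bits\n",
           parse(s, len, 0), parse(s, len, 1));
    return 0;
}
```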
I started encoding my new format, DAN1, as an LZSS variant with Elias gamma and 4 different sizes for offset values. Because I knew some people would want to use this format on more than simple bitmap title screens, DAN1 also avoids constantly adding a flag bit for each literal byte in a long sequence of literals (in the spirit of RLE compression).
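The saving from run-coded literals is easy to estimate with a bit of arithmetic (again only a sketch; the exact DAN1 layout may differ): instead of one flag bit before every literal byte, pay a single flag plus a gamma-coded run length, then the raw bytes.

```c
#include <stdio.h>

/* Bits for the Elias gamma code of n. */
static int gamma_bits(int n)
{
    int b = 1;
    while (n > 1) { n >>= 1; b += 2; }
    return b;
}

int main(void)
{
    /* Cost of a run of N literals: one flag bit per literal (ZX7-style)
       versus one flag plus a gamma-coded run length (the DAN1-style idea;
       this is the principle, not the actual bit layout). */
    for (int n = 1; n <= 256; n *= 4) {
        int per_flag = n * (1 + 8);
        int run_code = 1 + gamma_bits(n) + n * 8;
        printf("%3d literals: %4d bits vs %4d bits (save %d)\n",
               n, per_flag, run_code, per_flag - run_code);
    }
    return 0;
}
```

Short runs break even or cost a bit more, but the win from long literal sequences is substantial, and those are common in graphics data.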
Table 2 - Size in bytes of different routines used in various ColecoVision projects to decompress data to VRAM:
- RLE = 43
- DAN0 = 97 (RLE with literals encoded using a fixed Huffman table)
- ZX7 = 131 (LZSS with 7- and 11-bit offsets)
- DAN1 = 220* (LZSS with 1-, 4-, 8- and 12-bit offsets, plus RLE for uncompressed literals)
- PLETTER = 222
Table 3 - Size in bytes of the compressed data for a 12K bitmap sample:
- RLE = 4528
- DAN0 = 4143
- ZX7 = 3551
- DAN1 = 3419*
- PLETTER = 3557
Table 4 - Combined size in bytes of the decompression routine and the compressed data:
- RLE = 4571
- DAN0 = 4240
- ZX7 = 3682
- DAN1 = 3639*
- PLETTER = 3779
DAN1 results are marked with * because the format is still in development and hasn't been used in a ColecoVision project yet.
I'll finish my tests and then present my results at ADAMCon 2016, July 14-18.