newcoleco Posted December 4, 2017 Share Posted December 4, 2017 (edited) DAN3 Lossless Data CompressionProgrammed by Daniel Bienvenu, 2017. What is DAN3?DAN3 is a LZ77/LZSS variant using unary encoding, k=1 Exp-Golomb and various sizes relative offsets without the need for extra memory prios to decompress data. Details below. Ask questions, post comments, share experiences. Download DAN3 build 20180126 (de)compression tool final? (BIN + SRC) : dan3_20180126.zip DAN3 build 20180123 (de)compression tool *BUG* (BIN + SRC) : dan3_20180123.zip DAN3 build 20180118 (de)compression tool *BUG* (BIN + SRC) : dan3_20180118.zip (previous version) DAN3 compression tool *BUG* only experimental beta (BIN) : dan3beta.zip ColecoVision Slide Show Sample (SRC, BIN) : SLIDESHOWDAN3.zip Technical InformationDAN3 is a LZ77/LZSS variant developed based on DAN1 and DAN2, which explains their similarities.The format starts, like in DAN2, with defining what's the size in bits for the large offset values used to identify a match with the following table with a unary code (sequence of bits that looks like this): 0 : 9 bits long <- 512 bytes (characters on a screen). 10 : 10 bits long <- 1K (character set, some bitmap screens) 110 : 11 bits long <- 2K (most bitmap screens) 1110 : 12 bits long <- 4K (dithered bitmap screens) 11110 : 13 bits long <- 8K (my decompression routine limit as-is) 111110 : 14 bits long <- 16K 1111110 : 15 bits long <- 32K 11111110 : 16 bits long <- 64K (only good for 32bits/64bits PC files at this point) ... (no limit in theory) Then, like in DAN2, the first byte of the uncompressed data is stored as is. Afterward, things are a little different, which makes DAN3 different than the others; offering a better compression on average but not always. For each match, the size and the relative offset values are encoded. While DAN1 and DAN2 are using Elias Gamma to encode the size value, DAN3 is using a k=1 Exp-Golomb encoding instead, which somehow helps to optimize a little bit both in term of space and time to decode. As for the relative offset values, DAN3 is using a completely different set of possible bit-size offsets; using {5, 8, max} instead of {1, 4, 8, max} bits to encode offsets. As for the special case of a nearby single byte being the same as the current byte to encode, DAN3 limits to the first 3 nearest bytes, instead of the 18 nearest bytes, which limits its potential to find a match and save space, but since the big impact is with sequences of more than 2 bytes, this change has no impact besides offering a better compression than Pletter and the others that do not support sequences of a single byte, acting if you like as a local fixed Huffman encoding. Here's a comparison side by side of Elias Gamma and k=1 Exp-Golomb to show you what I mean by size and speed possible gain in DAN3 since reading fewer bits means taking less time to decode. Elias Gamma (DAN1 and DAN2) versus k=1 Exp-Golomb (DAN3) 1 : 10 = size 1 010 : 11 = size 2 011 : 0100 = size 3 00100 : 0101 = size 4 00101 : 0110 = size 5 00110 : 0111 = size 6 00111 : 001000 = size 7 0001000 : 001001 = size 8 0001001 : 001010 = size 9 0001010 : 001011 = size 10 0001011 : 001100 = size 11 0001100 : 001101 = size 12 0001101 : 001110 = size 13 0001110 : 001111 = size 14 0001111 : 00010000 = size 15 000010000 : 00010001 = size 16 ... 000000011111110 : 00000011111111 = size 254 In DAN3, the support of Exp-Golomb stops at 00000011111111 = size 254. There are 3 reasons for that: It allows an optimization of decoding only into a single byte instead of supporting to carry the bits into a 16 bits register pair. In Z80 opcodes, that simplifies the decoding routine, making it faster and smaller. It makes the specials markers for END OF DATA and RLE about a byte smaller and faster to read. Very large sequences of nothingness sadly will need more than one match of size 254 each but since we're talking about compressing our elaborated graphics which are mostly not that empty, the limitation should be barely an issue and satisfy plenty our needs. Special Markers 00000001 + byte : RLE - Copy raw the next (byte value + 1) bytes 00000000 : END OF DATA Relative Offset For a size of 1 byte, the relative offset is either 0 (the byte just before), 1, or 2 encoded respectively as 0, 10, and 11. For sizes of 2 or more, the offset is encoded as follow: 10 + 5 bits = 5-bits Offset 0 + byte = 8-bits Offset = byte + 32 11 + N bits + byte = large Offset = (N bits and byte together as a 9 or more bits value) + 288 ListingDecompression to VRAM Routine for Z80, using BE and BF ports (ColecoVision)Written in SDCC style, to be added to your compression toolbox.DAN3 decompression routine is only 14 bytes bigger than DAN1 decompression. ; dan3.s ; DAN3 Lossless Data Compression Format by Daniel Bienvenu ; DAN3 Decompression to VRAM ; 6 December, 2017 ; Size: 201 bytes ; HL = SOURCE ; DE = DESTINATION ; global from this code ;================ .globl _dan3 ; void dan3 (void *data, unsigned vram_offset); .globl dan3 ; HL = ADDR. TO COMPRESSED DATA , DE = DESTINATION IN VRAM ; Wrapper to get values from parameters (register pairs) _dan3: pop bc pop de pop hl push hl push de push bc ; HL = SOURCE ; DE = DESTINATION dan3: ; Set Write in VRAM at DE ld c,#0xbf out (c),e set 6,d out (c),d res 6,d ; Set A for reading a bit routine ld a,#0x80 ; Important to save the IX register push ix ld ix, #get_bit_e+3 dan3_offsetsize_loop: dec ix dec ix dec ix call get_bit ; check next bit jr c, dan3_offsetsize_loop ; Copy literal byte dan3_copy_byte: ld b,#0x01 dan3_literal2main: ld c,#0xbe dan3_literals_loop: outi inc de jr nz, dan3_literals_loop ; Main loop dan3_main_loop: call get_bit ; check next bit jr c,dan3_copy_byte ; Decode Exp-Golomb + Special Marker push de ld de, #0x0001 ld b,e dan3_expgolomb_0: inc b call get_bit ; check next bit jr c, dan3_expgolomb_value bit 3,b jr z, dan3_expgolomb_0 ; Special Marker pop de call get_bit ; check next bit jr c, dan3_literals pop ix ret ; EXIT dan3_literals: ld b, (hl) ; load counter value (8 bits) inc hl inc b jr dan3_literal2main dan3_expgolomb_value: dec b dan3_expgolomb_value_loop: call get_bit_e ; check next bit -> DE djnz dan3_expgolomb_value_loop dec e push de pop bc jr z, dan3_offset1 ; D = 0, E = ??, BC = LENGTH ; Decode Offset value ld e,d ; e = 0 call get_bit ; check next bit jr nc, dan3_offset3 call get_bit jr nc, dan3_offset2 call get_highbits_e ; read some bits -> E inc e ld d,e ; D = E + 1 dan3_offset3: ld e, (hl) ; load offset offset value (8 bits) inc hl ex af, af' ld a,e add a,#0x20 ; Skip the short offsets covered by 5 bits ones ld e,a jr nc, dan3_offset_nocarry inc d dan3_offset_nocarry: ex af, af' jr dan3_copy_from_offset dan3_offset2: call get_5bits_e ; read 5 bits -> E jr dan3_copy_from_offset dan3_offset1: call get_bit ; check next bit jr nc, dan3_copy_from_offset call get_bit_e inc e ; Copy previously seen bytes dan3_copy_from_offset: ex (sp), hl ; store source, restore destination push hl ; store destination scf sbc hl, de ; HL = source = destination - offset - 1 pop de ; DE = destination ; BC = count ; COPY BYTES ex af,af' set 6,d dan3_copybytes_loop: push bc ld c,#0xbf out (c),l nop out (c),h inc hl nop nop in a,(#0xbe) nop nop nop out (c),e nop out (c),d inc de nop nop out (#0xbe),a pop bc dec bc ld a,b or c jr nz, dan3_copybytes_loop res 6,d ex af,af' pop hl ; restore source address (compressed data) jp dan3_main_loop get_highbits_e: jp (ix) ; COVER 16K ; call get_bit_e ; get next bit -> E ; COVER 8K get_5bits_e: call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E get_bit_e: call get_bit ; get next bit rl e ; push bit into E ret ; get a bit get_bit: add a,a ret nz ld a,(hl) inc hl rla ret Comparison with DAN1 DAN3 to VRAM decompression routine is only 14 bytes more than the one for DAN1 to VRAM. As for the compression ratio, it really depends on the data. For example, here's a table showing the size obtained with DAN1 and DAN3 for each picture in the SlideShow sample. awb1p: DAN1 3677+2298, DAN3 3689+2328 f1spp: DAN1 2366+1679, DAN3 2349+1664 h6exp: DAN1 3412+2313, DAN3 3398+2297 mgfap: DAN1 3554+1935, DAN3 3551+1928 sotbp: DAN1+ 3394+1956, DAN3 3381+1921 Updates* Dec 5, 2017 - Added Offset encoding details.* Dec 6, 2017 - Bug-Fixed and optimized ASM decompression routine. Added comparison with DAN1 for the SlideShow sample. * Jan 18, 2018 - Updated (de)compression tool * Jan 23, 2018 - Fixed fast compression method "-f" to be closer to perfect compression optimization * Jan 26, 2018 - Fixed RLE compression, now provides the expected results for hard to compress data Edited January 26, 2018 by newcoleco 3 Quote Link to comment Share on other sites More sharing options...
R.Cade Posted December 5, 2017 Share Posted December 5, 2017 What is the Weissman score? Quote Link to comment Share on other sites More sharing options...
newcoleco Posted December 5, 2017 Author Share Posted December 5, 2017 What is the Weissman score? A fan of Silicon Valley TV series. I still have to watch it, and my library has it to be watched freely, but I never find the time to do it. You can compare GZIP and DAN3 as being basically the same since they are both variants of LZ77/LZSS. If the TV series mention a score to GZIP, which is the format we use in our Internet communications, then you have basically the Weissman score of DAN3... if only I was not the only one working on this and many tools were done supporting DAN3 format for various needs. Since the compression time is meaningless compared to the resulting size to be used in our homebrew projects (decompression routine + data), I will not even try to give a Weissman score. My compression tool here is written in a way to try to optimize the compression ratio regardless the time it will take to do so. Once compressed, we put together the compressed data and the decompression routine in our homebrew projects, which is the part that affects the users' experience, to save in loading time if on tapes and disks, and to offer more content (graphics, levels, text) inside the limited space of the memory chips of our cartridges. As for the decompression time, it's about the same as the other LZ77-LZSS variants, sometimes faster, sometimes slower, making really the compression ratio the most important part and that's what I've focussed on. DAN3 compression ratio tends to be closer to PuCrunch and Exomizer than Pletter, ZX7, ApLib, MegaLZ and others, but it uses a format close to the latter group and it doesn't need extra memory space like Exomizer to set up a table of values in memory. I believe that the difference is mostly because DAN3 tries to even compress a single byte rather than just sequences of 2 or more bytes. At the extreme, DAN3 will give worse compression ratio results like MegaLZ will do for meaningless data like a text file with the only the letter Q thousands of times. But since I'm not concerned about meaningless data and expect DAN3 to be used to compress even better-detailed content (hi-res graphics and elaborated levels for example), I'm quite confident that DAN3 fits for our needs in average, even if not always the best solution. Quote Link to comment Share on other sites More sharing options...
newcoleco Posted December 20, 2017 Author Share Posted December 20, 2017 (edited) By reusing what works from DAN1 and DAN2, I've missed an optimization with the DAN3 to VRAM routine. At the same time, why fix what is not broken. So, use the previous version that works just fine just to be sure. The optimization I've missed is the maximum sequence length allowed in DAN3, which is only 254 bytes, an 8-bits value, so there is no need for a full 16-bits pair register BC in the loop. So, we can remove the "pop bc" and "push bc" and get a smaller routine. It should not really improve the decompression speed considering how tricky dealing with Video RAM timing can be. According to the Z80 documentation, "DJNZ" is 13 cycles long when the register B is not equal to zero, and 3 NOPs (the typical delay) is 12 cycles, so this code here doesn't need an extra delay like "NOP" before the "DJNZ" opcode. *TESTED* IT WORKS ON A REAL COLECOVISION GAME SYSTEM JUST FINE *Updated: December 24, 2017 - fixed a few lines in order to use only B and not BC for match length value. Routine even smaller and faster now. ; dan3.s ; DAN3 Lossless Data Compression Format by Daniel Bienvenu ; DAN3 Decompression to VRAM ; 24 December, 2017 ; Size: 195 (?) bytes ; HL = SOURCE ; DE = DESTINATION ; global from this code ;================ .globl _dan3 ; void dan3 (void *data, unsigned vram_offset); .globl dan3 ; HL = ADDR. TO COMPRESSED DATA , DE = DESTINATION IN VRAM ; Wrapper to get values from parameters (register pairs) _dan3: pop bc pop de pop hl push hl push de push bc ; HL = SOURCE ; DE = DESTINATION dan3: ; Set Write in VRAM at DE ld c,#0xbf out (c),e set 6,d out (c),d res 6,d ; Set A for reading a bit routine ld a,#0x80 ; Important to save the IX register push ix ld ix, #get_bit_e+3 dan3_offsetsize_loop: dec ix dec ix dec ix call get_bit ; check next bit jr c, dan3_offsetsize_loop ; Copy literal byte dan3_copy_byte: ld b,#0x01 dan3_literal2main: ld c,#0xbe dan3_literals_loop: outi inc de jr nz, dan3_literals_loop ; Main loop dan3_main_loop: call get_bit ; check next bit jr c,dan3_copy_byte ; Decode Exp-Golomb + Special Marker push de ld de, #0x0001 ld b,e dan3_expgolomb_0: inc b call get_bit ; check next bit jr c, dan3_expgolomb_value bit 3,b jr z, dan3_expgolomb_0 ; Special Marker pop de call get_bit ; check next bit jr c, dan3_literals pop ix ret ; EXIT dan3_literals: ld b, (hl) ; load counter value (8 bits) inc hl inc b jr dan3_literal2main dan3_expgolomb_value: dec b dan3_expgolomb_value_loop: call get_bit_e ; check next bit -> DE djnz dan3_expgolomb_value_loop dec e ld b, e dec e jr z, dan3_offset1 ; D = 0, E = ??, B = LENGTH ; Decode Offset value ld e,d ; e = 0 call get_bit ; check next bit jr nc, dan3_offset3 call get_bit jr nc, dan3_offset2 call get_highbits_e ; read some bits -> E inc e ld d,e ; D = E + 1 dan3_offset3: ld e, (hl) ; load offset offset value (8 bits) inc hl ex af, af' ld a,e add a,#0x20 ; Skip the short offsets covered by 5 bits ones ld e,a jr nc, dan3_offset_nocarry inc d dan3_offset_nocarry: ex af, af' jr dan3_copy_from_offset dan3_offset2: call get_5bits_e ; read 5 bits -> E jr dan3_copy_from_offset dan3_offset1: call get_bit ; check next bit jr nc, dan3_copy_from_offset call get_bit_e inc e ; Copy previously seen bytes dan3_copy_from_offset: ex (sp), hl ; store source, restore destination push hl ; store destination scf sbc hl, de ; HL = source = destination - offset - 1 pop de ; DE = destination ; BC = count ; COPY BYTES ex af,af' set 6,d ld c,#0xbf dan3_copybytes_loop: out (c),l nop out (c),h inc hl nop nop in a,(#0xbe) nop nop nop out (c),e nop out (c),d inc de nop nop out (#0xbe),a djnz dan3_copybytes_loop res 6,d ex af,af' pop hl ; restore source address (compressed data) jp dan3_main_loop get_highbits_e: jp (ix) ; COVER 16K ; call get_bit_e ; get next bit -> E ; COVER 8K get_5bits_e: call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E call get_bit_e ; get next bit -> E get_bit_e: call get_bit ; get next bit rl e ; push bit into E ret ; get a bit get_bit: add a,a ret nz ld a,(hl) inc hl rla ret Edited January 27, 2018 by newcoleco 1 Quote Link to comment Share on other sites More sharing options...
newcoleco Posted January 18, 2018 Author Share Posted January 18, 2018 (edited) UPDATE January 23, 2018 Fixed DAN3 (de)compression tool final(?) version with source code. Fixed the fast compression method "-f" to get better compression ratio. Modified the help text "-h" to show DAN3 instead of DAN37. January 18, 2018 Added DAN3 (de)compression tool final(?) version with source code. Limit the maximum of bits used for large offset values. " -m 13 " (13 bits max which covers 8K like the ASM decompression code posted). Remove support of RLE with " -r ". Decompress a DAN3 file with " -d ". Faster compression algorithm (experimental), less optimization, with " -f ". The help information " -h " mentions this tool as "DAN37" which was its project branch name on my computer. REMINDER This is not a miracle solution. DAN3 performs better with elaborated graphics rather than simple and empty screens (use DAN2 in that case). I hope you enjoy the new version of this DAN3 tool. Recompile and adapt it based on your needs. I would love to hear news from people who actually used my compression tools since DAN1. Edited January 23, 2018 by newcoleco Quote Link to comment Share on other sites More sharing options...
newcoleco Posted January 26, 2018 Author Share Posted January 26, 2018 (edited) WARNING MAJOR UPDATE JANUARY 26, 2018 FIXED BUG IN RLE COMPRESSION - PLEASE UPDATE TO BUILD # 20180126 An error of logic in the DAN3 compression tool causing no possible RLE compression resulting to artificially inflated compressed files, and also the decompression " -d " mode of the same tool refused to decode any RLE codes from the compressed files. Both problems are now fixed and should be quite bug-free. Sorry for any bad compression ratio you may have experienced with a previous bugged version of this tool. Have a nice day! Edited January 26, 2018 by newcoleco Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.