DAN3 Lossless Data Compression (tool and sample)

newcoleco · December 4, 2017

DAN3 Lossless Data Compression
Programmed by Daniel Bienvenu, 2017.

What is DAN3?
DAN3 is a LZ77/LZSS variant using unary encoding, k=1 Exp-Golomb and various sizes relative offsets without the need for extra memory prios to decompress data. Details below.

Ask questions, post comments, share experiences.

Download

DAN3 build 20180126 (de)compression tool final? (BIN + SRC) : dan3_20180126.zip

DAN3 build 20180123 (de)compression tool *BUG* (BIN + SRC) : dan3_20180123.zip

DAN3 build 20180118 (de)compression tool *BUG* (BIN + SRC) : dan3_20180118.zip

(previous version) DAN3 compression tool *BUG* only experimental beta (BIN) : dan3beta.zip

ColecoVision Slide Show Sample (SRC, BIN) : SLIDESHOWDAN3.zip

Technical Information
DAN3 is a LZ77/LZSS variant developed based on DAN1 and DAN2, which explains their similarities.
The format starts, like in DAN2, with defining what's the size in bits for the large offset values used to identify a match with the following table with a unary code (sequence of bits that looks like this):

0 : 9 bits long <- 512 bytes (characters on a screen).
10 : 10 bits long <- 1K (character set, some bitmap screens)
110 : 11 bits long <- 2K (most bitmap screens)
1110 : 12 bits long <- 4K (dithered bitmap screens)
11110 : 13 bits long <- 8K (my decompression routine limit as-is)
111110 : 14 bits long <- 16K
1111110 : 15 bits long <- 32K
11111110 : 16 bits long <- 64K (only good for 32bits/64bits PC files at this point)
... (no limit in theory)

Then, like in DAN2, the first byte of the uncompressed data is stored as is.

Afterward, things are a little different, which makes DAN3 different than the others; offering a better compression on average but not always.

For each match, the size and the relative offset values are encoded. While DAN1 and DAN2 are using Elias Gamma to encode the size value, DAN3 is using a k=1 Exp-Golomb encoding instead, which somehow helps to optimize a little bit both in term of space and time to decode. As for the relative offset values, DAN3 is using a completely different set of possible bit-size offsets; using {5, 8, max} instead of {1, 4, 8, max} bits to encode offsets. As for the special case of a nearby single byte being the same as the current byte to encode, DAN3 limits to the first 3 nearest bytes, instead of the 18 nearest bytes, which limits its potential to find a match and save space, but since the big impact is with sequences of more than 2 bytes, this change has no impact besides offering a better compression than Pletter and the others that do not support sequences of a single byte, acting if you like as a local fixed Huffman encoding.

Here's a comparison side by side of Elias Gamma and k=1 Exp-Golomb to show you what I mean by size and speed possible gain in DAN3 since reading fewer bits means taking less time to decode.

Elias Gamma (DAN1 and DAN2) versus k=1 Exp-Golomb (DAN3)

1 : 10 = size 1
010 : 11 = size 2
011 : 0100 = size 3
00100 : 0101 = size 4
00101 : 0110 = size 5
00110 : 0111 = size 6
00111 : 001000 = size 7
0001000 : 001001 = size 8
0001001 : 001010 = size 9
0001010 : 001011 = size 10
0001011 : 001100 = size 11
0001100 : 001101 = size 12
0001101 : 001110 = size 13
0001110 : 001111 = size 14
0001111 : 00010000 = size 15
000010000 : 00010001 = size 16
...
000000011111110 : 00000011111111 = size 254

In DAN3, the support of Exp-Golomb stops at 00000011111111 = size 254. There are 3 reasons for that:

It allows an optimization of decoding only into a single byte instead of supporting to carry the bits into a 16 bits register pair. In Z80 opcodes, that simplifies the decoding routine, making it faster and smaller.
It makes the specials markers for END OF DATA and RLE about a byte smaller and faster to read.
Very large sequences of nothingness sadly will need more than one match of size 254 each but since we're talking about compressing our elaborated graphics which are mostly not that empty, the limitation should be barely an issue and satisfy plenty our needs.

Special Markers

00000001 + byte : RLE - Copy raw the next (byte value + 1) bytes
00000000 : END OF DATA

Relative Offset
For a size of 1 byte, the relative offset is either 0 (the byte just before), 1, or 2 encoded respectively as 0, 10, and 11. For sizes of 2 or more, the offset is encoded as follow:

10 + 5 bits = 5-bits Offset
0 + byte = 8-bits Offset = byte + 32
11 + N bits + byte = large Offset = (N bits and byte together as a 9 or more bits value) + 288

Listing
Decompression to VRAM Routine for Z80, using BE and BF ports (ColecoVision)
Written in SDCC style, to be added to your compression toolbox.
DAN3 decompression routine is only 14 bytes bigger than DAN1 decompression.

; dan3.s
; DAN3 Lossless Data Compression Format by Daniel Bienvenu

; DAN3 Decompression to VRAM 
; 6 December, 2017
; Size: 201 bytes

; HL = SOURCE
; DE = DESTINATION

	; global from this code
	;================
	.globl  _dan3
	; void dan3 (void *data, unsigned vram_offset);
	.globl  dan3 ;  HL = ADDR. TO COMPRESSED DATA , DE = DESTINATION IN VRAM

;	Wrapper to get values from parameters (register pairs)
_dan3:
	pop	bc
	pop	de
	pop	hl
	push	hl
	push	de
	push	bc

; 	HL = SOURCE
;	DE = DESTINATION
dan3:
	; Set Write in VRAM at DE
	ld	c,#0xbf
	out	(c),e
	set	6,d
	out	(c),d
	res	6,d
	; Set A for reading a bit routine
	ld	a,#0x80
	; Important to save the IX register
	push	ix
	ld	ix, #get_bit_e+3
dan3_offsetsize_loop:
	dec	ix
	dec	ix
	dec	ix
	call    get_bit            ; check next bit
	jr	c,	dan3_offsetsize_loop

; Copy literal byte
dan3_copy_byte:
	ld	b,#0x01

dan3_literal2main:
	ld	c,#0xbe
dan3_literals_loop:
	outi
	inc	de
	jr	nz,	dan3_literals_loop

; Main loop
dan3_main_loop:
	call    get_bit            ; check next bit
	jr	c,dan3_copy_byte

; Decode Exp-Golomb + Special Marker
	push	de
	ld	de, #0x0001
	ld	b,e
dan3_expgolomb_0:
	inc	b
	call    get_bit            ; check next bit
	jr      c, dan3_expgolomb_value
	bit	3,b
	jr	z, dan3_expgolomb_0

; Special Marker
	pop	de
	call    get_bit            ; check next bit
	jr	c, dan3_literals
	pop	ix
	ret	; EXIT
dan3_literals:
	ld      b, (hl)                 ; load counter value (8 bits)
	inc	hl
	inc	b
	jr	dan3_literal2main

dan3_expgolomb_value:
	dec	b
dan3_expgolomb_value_loop:
	call	get_bit_e			; check next bit -> DE
	djnz	dan3_expgolomb_value_loop
	dec	e
	push	de
	pop	bc
	jr	z, dan3_offset1
	
	; D = 0, E = ??, BC = LENGTH
	
; Decode Offset value
	ld	e,d				; e = 0
	call	get_bit			; check next bit
	jr	nc, dan3_offset3
	call	get_bit
	jr	nc, dan3_offset2
	call	get_highbits_e		; read some bits -> E
	inc	e
	ld	d,e				; D = E + 1
dan3_offset3:
	ld      e, (hl)			; load offset offset value (8 bits)
	inc     hl
	ex	af, af'
	ld	a,e
	add	a,#0x20			; Skip the short offsets covered by 5 bits ones
	ld	e,a
	jr	nc, dan3_offset_nocarry
	inc	d
dan3_offset_nocarry:	
	ex	af, af'
	jr	dan3_copy_from_offset
dan3_offset2:
	call	get_5bits_e			; read 5 bits -> E
	jr	dan3_copy_from_offset
dan3_offset1:
	call    get_bit            ; check next bit
	jr	nc, dan3_copy_from_offset
	call	get_bit_e
	inc	e
		
; Copy previously seen bytes
dan3_copy_from_offset:
	ex      (sp), hl                ; store source, restore destination
	push    hl                      ; store destination
	scf
	sbc     hl, de                  ; HL = source = destination - offset - 1
	pop     de                      ; DE = destination
	; BC = count
	; COPY BYTES
	ex	af,af'
	set	6,d
dan3_copybytes_loop:
	push	bc
	ld	c,#0xbf
	out	(c),l
	nop
	out	(c),h
	inc	hl
	nop
	nop
	in	a,(#0xbe)
	nop
	nop
	nop
	out	(c),e
	nop
	out	(c),d
	inc	de
	nop
	nop
	out	(#0xbe),a
	pop	bc
	dec	bc
	ld	a,b
	or	c
	jr	nz, dan3_copybytes_loop
	res	6,d
	ex	af,af'
	pop	hl		; restore source address (compressed data)
	jp	dan3_main_loop

get_highbits_e:
	jp	(ix)

;	COVER 16K
;	call	get_bit_e	; get next bit -> E
;	COVER 8K
get_5bits_e:
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
get_bit_e:
	call	get_bit	; get next bit
	rl      e		; push bit into E
	ret

; get a bit
get_bit:
	add	a,a
  	ret	nz
	ld	a,(hl)
	inc	hl
	rla
	ret

Comparison with DAN1

DAN3 to VRAM decompression routine is only 14 bytes more than the one for DAN1 to VRAM.

As for the compression ratio, it really depends on the data.

For example, here's a table showing the size obtained with DAN1 and DAN3 for each picture in the SlideShow sample.

awb1p: DAN1 3677+2298, DAN3 3689+2328
f1spp: DAN1 2366+1679, DAN3 2349+1664
h6exp: DAN1 3412+2313, DAN3 3398+2297
mgfap: DAN1 3554+1935, DAN3 3551+1928
sotbp: DAN1+ 3394+1956, DAN3 3381+1921

Updates
* Dec 5, 2017 - Added Offset encoding details.
* Dec 6, 2017 - Bug-Fixed and optimized ASM decompression routine. Added comparison with DAN1 for the SlideShow sample.

* Jan 18, 2018 - Updated (de)compression tool

* Jan 23, 2018 - Fixed fast compression method "-f" to be closer to perfect compression optimization

* Jan 26, 2018 - Fixed RLE compression, now provides the expected results for hard to compress data

Edited January 26, 2018 by newcoleco

R.Cade · December 5, 2017

What is the Weissman score?

newcoleco · December 5, 2017

What is the Weissman score?

A fan of Silicon Valley TV series.

I still have to watch it, and my library has it to be watched freely, but I never find the time to do it.

You can compare GZIP and DAN3 as being basically the same since they are both variants of LZ77/LZSS. If the TV series mention a score to GZIP, which is the format we use in our Internet communications, then you have basically the Weissman score of DAN3... if only I was not the only one working on this and many tools were done supporting DAN3 format for various needs.

Since the compression time is meaningless compared to the resulting size to be used in our homebrew projects (decompression routine + data), I will not even try to give a Weissman score.

My compression tool here is written in a way to try to optimize the compression ratio regardless the time it will take to do so. Once compressed, we put together the compressed data and the decompression routine in our homebrew projects, which is the part that affects the users' experience, to save in loading time if on tapes and disks, and to offer more content (graphics, levels, text) inside the limited space of the memory chips of our cartridges.

As for the decompression time, it's about the same as the other LZ77-LZSS variants, sometimes faster, sometimes slower, making really the compression ratio the most important part and that's what I've focussed on.

DAN3 compression ratio tends to be closer to PuCrunch and Exomizer than Pletter, ZX7, ApLib, MegaLZ and others, but it uses a format close to the latter group and it doesn't need extra memory space like Exomizer to set up a table of values in memory. I believe that the difference is mostly because DAN3 tries to even compress a single byte rather than just sequences of 2 or more bytes. At the extreme, DAN3 will give worse compression ratio results like MegaLZ will do for meaningless data like a text file with the only the letter Q thousands of times. But since I'm not concerned about meaningless data and expect DAN3 to be used to compress even better-detailed content (hi-res graphics and elaborated levels for example), I'm quite confident that DAN3 fits for our needs in average, even if not always the best solution.

newcoleco · December 20, 2017

By reusing what works from DAN1 and DAN2, I've missed an optimization with the DAN3 to VRAM routine. At the same time, why fix what is not broken. So, use the previous version that works just fine just to be sure.

The optimization I've missed is the maximum sequence length allowed in DAN3, which is only 254 bytes, an 8-bits value, so there is no need for a full 16-bits pair register BC in the loop. So, we can remove the "pop bc" and "push bc" and get a smaller routine. It should not really improve the decompression speed considering how tricky dealing with Video RAM timing can be.

According to the Z80 documentation, "DJNZ" is 13 cycles long when the register B is not equal to zero, and 3 NOPs (the typical delay) is 12 cycles, so this code here doesn't need an extra delay like "NOP" before the "DJNZ" opcode.

*TESTED* IT WORKS ON A REAL COLECOVISION GAME SYSTEM JUST FINE

*Updated: December 24, 2017 - fixed a few lines in order to use only B and not BC for match length value. Routine even smaller and faster now.

; dan3.s
; DAN3 Lossless Data Compression Format by Daniel Bienvenu

; DAN3 Decompression to VRAM 
; 24 December, 2017
; Size: 195 (?) bytes

; HL = SOURCE
; DE = DESTINATION

	; global from this code
	;================
	.globl  _dan3
	; void dan3 (void *data, unsigned vram_offset);
	.globl  dan3 ;  HL = ADDR. TO COMPRESSED DATA , DE = DESTINATION IN VRAM

;	Wrapper to get values from parameters (register pairs)
_dan3:
	pop	bc
	pop	de
	pop	hl
	push	hl
	push	de
	push	bc

; 	HL = SOURCE
;	DE = DESTINATION
dan3:
	; Set Write in VRAM at DE
	ld	c,#0xbf
	out	(c),e
	set	6,d
	out	(c),d
	res	6,d
	; Set A for reading a bit routine
	ld	a,#0x80
	; Important to save the IX register
	push	ix
	ld	ix, #get_bit_e+3
dan3_offsetsize_loop:
	dec	ix
	dec	ix
	dec	ix
	call    get_bit            ; check next bit
	jr	c,	dan3_offsetsize_loop

; Copy literal byte
dan3_copy_byte:
	ld	b,#0x01

dan3_literal2main:
	ld	c,#0xbe
dan3_literals_loop:
	outi
	inc	de
	jr	nz,	dan3_literals_loop

; Main loop
dan3_main_loop:
	call    get_bit            ; check next bit
	jr	c,dan3_copy_byte

; Decode Exp-Golomb + Special Marker
	push	de
	ld	de, #0x0001
	ld	b,e
dan3_expgolomb_0:
	inc	b
	call    get_bit            ; check next bit
	jr      c, dan3_expgolomb_value
	bit	3,b
	jr	z, dan3_expgolomb_0

; Special Marker
	pop	de
	call    get_bit            ; check next bit
	jr	c, dan3_literals
	pop	ix
	ret	; EXIT
dan3_literals:
	ld      b, (hl)                 ; load counter value (8 bits)
	inc	hl
	inc	b
	jr	dan3_literal2main

dan3_expgolomb_value:
	dec	b
dan3_expgolomb_value_loop:
	call	get_bit_e			; check next bit -> DE
	djnz	dan3_expgolomb_value_loop
	dec	e
    ld b, e
    dec    e
	jr	z, dan3_offset1
	
	; D = 0, E = ??, B = LENGTH
	
; Decode Offset value
	ld	e,d				; e = 0
	call	get_bit			; check next bit
	jr	nc, dan3_offset3
	call	get_bit
	jr	nc, dan3_offset2
	call	get_highbits_e		; read some bits -> E
	inc	e
	ld	d,e				; D = E + 1
dan3_offset3:
	ld      e, (hl)			; load offset offset value (8 bits)
	inc     hl
	ex	af, af'
	ld	a,e
	add	a,#0x20			; Skip the short offsets covered by 5 bits ones
	ld	e,a
	jr	nc, dan3_offset_nocarry
	inc	d
dan3_offset_nocarry:	
	ex	af, af'
	jr	dan3_copy_from_offset
dan3_offset2:
	call	get_5bits_e			; read 5 bits -> E
	jr	dan3_copy_from_offset
dan3_offset1:
	call    get_bit            ; check next bit
	jr	nc, dan3_copy_from_offset
	call	get_bit_e
	inc	e
		
; Copy previously seen bytes
dan3_copy_from_offset:
	ex      (sp), hl                ; store source, restore destination
	push    hl                      ; store destination
	scf
	sbc     hl, de                  ; HL = source = destination - offset - 1
	pop     de                      ; DE = destination
	; BC = count
	; COPY BYTES
	ex	af,af'
	set	6,d
	ld	c,#0xbf
dan3_copybytes_loop:
	out	(c),l
	nop
	out	(c),h
	inc	hl
	nop
	nop
	in	a,(#0xbe)
	nop
	nop
	nop
	out	(c),e
	nop
	out	(c),d
	inc	de
	nop
	nop
	out	(#0xbe),a
	djnz	dan3_copybytes_loop
	res	6,d
	ex	af,af'
	pop	hl		; restore source address (compressed data)
	jp	dan3_main_loop

get_highbits_e:
	jp	(ix)

;	COVER 16K
;	call	get_bit_e	; get next bit -> E
;	COVER 8K
get_5bits_e:
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
	call	get_bit_e	; get next bit -> E
get_bit_e:
	call	get_bit	; get next bit
	rl      e		; push bit into E
	ret

; get a bit
get_bit:
	add	a,a
  	ret	nz
	ld	a,(hl)
	inc	hl
	rla
	ret

Edited January 27, 2018 by newcoleco

newcoleco · January 18, 2018

UPDATE

January 23, 2018

Fixed DAN3 (de)compression tool final(?) version with source code.

Fixed the fast compression method "-f" to get better compression ratio.
Modified the help text "-h" to show DAN3 instead of DAN37.

January 18, 2018

Added DAN3 (de)compression tool final(?) version with source code.

Limit the maximum of bits used for large offset values. " -m 13 " (13 bits max which covers 8K like the ASM decompression code posted).
Remove support of RLE with " -r ".
Decompress a DAN3 file with " -d ".
Faster compression algorithm (experimental), less optimization, with " -f ".
The help information " -h " mentions this tool as "DAN37" which was its project branch name on my computer.

REMINDER

This is not a miracle solution. DAN3 performs better with elaborated graphics rather than simple and empty screens (use DAN2 in that case).

I hope you enjoy the new version of this DAN3 tool. Recompile and adapt it based on your needs.

I would love to hear news from people who actually used my compression tools since DAN1.

Edited January 23, 2018 by newcoleco

newcoleco · January 26, 2018

WARNING

MAJOR UPDATE JANUARY 26, 2018

FIXED BUG IN RLE COMPRESSION - PLEASE UPDATE TO BUILD # 20180126

An error of logic in the DAN3 compression tool causing no possible RLE compression resulting to artificially inflated compressed files, and also the decompression " -d " mode of the same tool refused to decode any RLE codes from the compressed files. Both problems are now fixed and should be quite bug-free.

Sorry for any bad compression ratio you may have experienced with a previous bugged version of this tool.

Have a nice day!

Edited January 26, 2018 by newcoleco

Sign In

DAN3 Lossless Data Compression (tool and sample)

Recommended Posts

newcoleco

Link to comment

Share on other sites

R.Cade

Link to comment

Share on other sites

newcoleco

Link to comment

Share on other sites

newcoleco

Link to comment

Share on other sites

newcoleco

Link to comment

Share on other sites

newcoleco

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members

Apps

My Activity Streams

More