rensoup

Any compressor between rle and lz4 ?


5 hours ago, elmer said:

LZ4 was never designed to produce the best compression ratios, it was specifically designed for both fast compression and decompression in order to reduce the memory usage of databases and other large datasets on modern PCs and servers.

I wasn't criticizing LZ4, merely those newer compressors. LZ4 is obviously useful, but for someone to call their compressor 'ultra' or 'small' when their gains are less than 1% is overselling a little.

 

4 hours ago, elmer said:

I suspect that the problem that you've found with my aPLpak format is because you only skipped the first 4 bytes of header info instead of the 12 bytes that you can skip if you are only compressing a single file.

So I did try skipping 12 bytes and indeed it started decrunching the first few bytes correctly but went wrong very quickly... must have messed up the conversion to MADS format... I'll wait for your example.

2 hours ago, xxl said:

seems to be a popular compressor.
https://github.com/svendahl/cap.

 

Thanks for that link, I hadn't seen it.

 

It's another implementation of Jorgen Ibsen's aPLib format, but Svendahl wrote it in C# instead of C, and apparently there may be a bug in it (looking at the outstanding pull-request).

 

Emmanuel Marty references "cap" as an inspiration for his apultra (https://github.com/emmanuel-marty/apultra).

 

apultra compresses gpl-3.0.txt down to 12662 bytes, nearly 1/2 KB smaller than Jorgen Ibsen's compression code that I've been using in aplpak.

 

 

Anyway, svendahl's decompressor is fairly similar to mine, and a big improvement on Peter Ferrie's code.

 

My decompressor should be a little faster than svendahl's because I inline a bit more, although looking at his, I expect that the code sizes are very similar.

 

Apart from a few other things in there, just his use of a subroutine call to read bytes from the compressed source should cost him 4 extra frames to decompress gpl-3.0.txt! ;-)

52 minutes ago, rensoup said:

So I did try skipping 12 bytes and indeed it started decrunching the first few bytes correctly but went wrong very quickly... must have messed up the conversion to MADS format...

 

Or there's a bug in my decompressor! ;-)


Hi!

Quote

To follow up on my original question...

 

I was looking to compress 2 types of data and decompress them in realtime:

 

1. music: which @dmsc masterfully solved with LZSS

2. sprite data.

 

There is probably no good solution for 2.

For a single uncompressed 200 bytes frame, LZ4 can't do much, it gave almost no compression. I also tried @Irgendwer's autogamy which compressed slightly better than LZ4. Only deflate gave reasonable results but the decompression time was astronomical.

I got almost as good results as deflate by simply removing empty bytes and storing an extra byte per line (sprites are 8 bytes large max) but the prospect of having to write a sprite routine for this format was too daunting so I gave up.
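As a rough illustration of the "remove empty bytes and store an extra byte per line" idea, here is a minimal host-side Python sketch. The exact layout is an assumption, not the format that was actually tried: each row of up to 8 bytes becomes one mask byte (bit i set means column i is non-zero) followed by only the non-zero bytes. The names `pack_rows` and `unpack_rows` are made up for this example.

```python
def pack_rows(rows):
    """Pack rows of up to 8 bytes: one mask byte, then only the non-zero bytes."""
    out = bytearray()
    for row in rows:
        if len(row) > 8:
            raise ValueError("a row may hold at most 8 bytes")
        mask = 0
        kept = bytearray()
        for i, value in enumerate(row):
            if value != 0:
                mask |= 1 << i      # bit i marks column i as stored
                kept.append(value)
        out.append(mask)
        out += kept
    return bytes(out)


def unpack_rows(data, width, height):
    """Inverse of pack_rows: rebuild `height` rows of `width` bytes."""
    rows = []
    pos = 0
    for _ in range(height):
        mask = data[pos]
        pos += 1
        row = bytearray(width)      # columns not in the mask stay zero
        for i in range(width):
            if mask & (1 << i):
                row[i] = data[pos]
                pos += 1
        rows.append(bytes(row))
    return rows
```

The cost is one extra byte per row, so this only pays off when rows contain at least two empty bytes on average.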

 

Trying to compress short data (200 bytes is really short) with a generic compressor is not a good idea, as the compressor would not have enough bytes to take advantage of the structure of the data.

 

So, I advise you to use a custom compression format, it is not that difficult.

 

One idea for sprite data: Suppose that consecutive frames of a sprite have some bytes repeated, and that there are a lot of bytes with the value "0".

 

You can design a decompressor as:

 

- Start with one "frame" with all "0".

 

- For each byte to decompress, read one bit:

-- If the bit is 1, copy from the last sprite frame into the current byte

-- If the bit is 0, read one byte and store into the current byte.

 

In 6502 that should be:

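; Size of each sprite frame
SPRITE_SIZE = 16

; Decompress one sprite frame.
; Assumes read_bit returns the next bit in the carry flag, and that
; read_bit and read_byte both preserve X.
  ldx #SPRITE_SIZE-1
loop:
  jsr read_bit
  lda sprite - SPRITE_SIZE, x   ; byte from the previous frame
  bcs store                     ; bit = 1: keep the previous frame's byte
  jsr read_byte                 ; bit = 0: read a new byte into A
store:
  sta sprite, x
  dex
  bpl loop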

 

The compressor is also trivial to write.
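A minimal Python sketch of such a compressor, with a matching decompressor for checking the round trip. The stream layout is an assumption made for this example only (the scheme above does not say how bits and bytes are interleaved): each frame is stored as its flag bits packed MSB-first into whole bytes, followed by the literal bytes. `compress_frames` and `decompress_frames` are hypothetical names, and bytes are processed in ascending order rather than the descending order of the 6502 loop.

```python
def compress_frames(frames, size):
    """Encode frames against the previous frame (the first against all zeros)."""
    prev = bytes(size)
    out = bytearray()
    for frame in frames:
        if len(frame) != size:
            raise ValueError("every frame must be exactly `size` bytes")
        flags = bytearray((size + 7) // 8)
        literals = bytearray()
        for i, value in enumerate(frame):
            if value == prev[i]:
                flags[i // 8] |= 0x80 >> (i % 8)   # bit 1: copy from previous frame
            else:
                literals.append(value)             # bit 0: literal byte follows
        out += flags + literals
        prev = frame
    return bytes(out)


def decompress_frames(data, size, count):
    """Inverse of compress_frames."""
    prev = bytes(size)
    frames = []
    pos = 0
    flag_bytes = (size + 7) // 8
    for _ in range(count):
        flags = data[pos:pos + flag_bytes]
        pos += flag_bytes
        frame = bytearray(size)
        for i in range(size):
            if flags[i // 8] & (0x80 >> (i % 8)):
                frame[i] = prev[i]
            else:
                frame[i] = data[pos]
                pos += 1
        prev = bytes(frame)
        frames.append(prev)
    return frames
```

With this layout a frame never costs more than size + size/8 bytes, and a frame identical to the previous one costs only the flag bytes.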

 

Have Fun!

 

Edited by dmsc

2 hours ago, rensoup said:

So I did try skipping 12 bytes and indeed it started decrunching the first few bytes correctly but went wrong very quickly... must have messed up the conversion to MADS format... I'll wait for your example.

 

Here you go, I hope that it works for you!

 

aplib_6502.mad

1 hour ago, elmer said:

Here you go, I hope that it works for you!

yep, that did the trick, thanks!

 

I forgot to remove a '<'  ( ldx    <apl_length + 1)

 

I'm only using it on the main executable which is about 40KB uncompressed and about 23KB compressed with deflate and about 300 bytes bigger with aplib.

The major difference is decompression speed, about 4-5s for inflate and about 2s for your apl decompressor. 👍

 

For those who have tried Pop, it's the time taken by that red bar moving across the screen which I had to put there because that pause was a little awkward.

 

2 hours ago, dmsc said:

Trying to compress short data (200 bytes is really short) with a generic compressor is not a good idea, as the compressor would not have enough bytes to take advantage of the structure of the data.

 

So, I advise you to use a custom compression format, it is not that difficult.

 

One idea for sprite data: Suppose that consecutive frames of a sprite have some bytes repeated, and that there are a lot of bytes with the value "0".

Yes it was a bad idea. Fortunately I didn't need compression in the end because all the frames seem to fit (at least in modeE format)

 

I didn't expect the display routine to be so slow either (it has to shift, mirror, clip) so any kind of decompression on top of it would have been unacceptable anyway.

 

Your technique seems to rely on reusing the previous frame's content? That could be tricky with all the possible animations in Pop (although most of them seem sequential), plus you are required to write the data into a buffer before putting it on screen.

 

For the next version I've split the sprites in 2 horizontally and used a tighter bounding box, which gave me a 15% saving (about 5KB) as well as a slight speed increase.

 

So it's all good hopefully...

 

On 12/16/2019 at 6:29 PM, rensoup said:

I'm only using it on the main executable which is about 40KB uncompressed and about 23KB compressed with deflate and about 300 bytes bigger with aplib.

The major difference is decompression speed, about 4-5s for inflate and about 2s for your apl decompressor. 👍

 

@rensoup very kindly sent me his PoP main executable to test, so I've taken a look to see how the different compressors do with a real Atari program:

 

Size   File
===========================
40,164 popcore.bin

28,227 popcore.bin.lz4ultra
28,225 popcore.bin.smallz4

26,345 popcore.bin.lzsa1

24,366 popcore.bin.pucrunch

23,929 popcore.bin.lzsa2
23,280 popcore.bin.aplib
23,174 popcore.bin.exomizer
23,012 popcore.bin.deflate

22,926 popcore.bin.apultra

 

Edited by elmer
Added pucrunch and exomizer

Also try:

 

- Exomizer (e.g. Superpacker/Exomizer by TeBe: Mad Team )

- Dj-Packer (e.g. here: AOL, 2 versions ; max. 32k length per segment)

- Fast Packer by SR-U (e.g. here: AOL , can split segments longer than 32k)

- Flash Pack 2.1 by f0x/Taquart (e.g. here: pigwa/Holmes apps , max. 32k length)

 

For Exomizer there are sources available (somewhere at bitbucket). I also had some depacker-sources (text) for Flashpack 2.1 but cannot find them atm., too bad that it is limited to a max. of 32kbytes (it packs+depacks very fast). Fast Packer also packs+depacks quite fast (it uses page 4 for depacking), maybe it uses RLE or something similar ?!?

 

1 hour ago, CharlieChaplin said:

Also try:

 

- Exomizer (e.g. Superpacker/Exomizer by TeBe: Mad Team )

- Dj-Packer (e.g. here: AOL, 2 versions ; max. 32k length per segment)

- Fast Packer by SR-U (e.g. here: AOL , can split segments longer than 32k)

- Flash Pack 2.1 by f0x/Taquart (e.g. here: pigwa/Holmes apps , max. 32k length)

 

Sorry, but I have no interest in trying dj-packer, fast packer or flash pack ... I'll leave those tests up to you.

 

Exomizer is one of the other well-known-excellent compressors, so that was worth trying, even though its decompression speed is known to be slow (as shown earlier in this thread).

 

I've added the exomizer results to my earlier post.

 

 

 

 

 

16 hours ago, elmer said:

 

Sorry, but I have no interest in trying dj-packer, fast packer or flash pack ... I'll leave those tests up to you.

 

Exomizer is one of the other well-known-excellent compressors, so that was worth trying, even though its decompression speed is known to be slow (as shown earlier in this thread).

 

I've added the exomizer results to my earlier post.

 

 

 

 

 

 

Well, I would do the tests, but I don't have the (40,164 Bytes) Pop main executable... which would be required for a packer comparison.

 

Attached is an older POP intro packed with Code3 Cruncher (depacking is slow/takes long).

 

POPINTRO.xex

2 hours ago, CharlieChaplin said:

Well, I would do the tests, but I don't have the (40,164 Bytes) Pop main executable... which would be required for a packer comparison.

 

Here you go, the very same binary... I doubt any of these packers will do a better job though.

 

aplib is a bit of an oddity as it has lz4 like decompression speed with a better compression ratio than deflate (while taking less memory)

 

I've tried on the graphics files but it doesn't perform nearly as well. (.dta is uncompressed )

 

7,337 CHTAB1.dta
5,338 CHTAB1.apu
4,960 CHTAB1.dta.deflate


7,262 CHTAB2.dta
5,170 CHTAB2.apu
4,791 CHTAB2.dta.deflate


5,027 CHTAB3.dta
2,455 CHTAB3.apu
2,284 CHTAB3.dta.deflate

 

inflate is probably the best all around packer on A8

 

PoPCore.bin

On 5/13/2019 at 7:34 PM, dmsc said:

LZ4 can be decompressed "almost" in place (with about 8 bytes of gap between compressed/decompressed data).

 

With a "small" (124 bytes) implementation, I get a little less than 49 cycles per byte, this is 500 bytes/PAL frame on GR.0 (the slowest possible), about 650 bytes/PAL frame without DMA.

 

I have finally written an LZSA2 decompressor to compare against my aPLib decompressor.

 

LZ4 definitely benefits in speed by having fewer transitions between matches and literal runs (with its 4-byte-minimum match length). That means that it can spend more time in the fast inner-loop of the copy, and less time decoding the next type of compressed data. The downside is that its compression suffers.

 

Anyway, by comparison, my aPLib decompressor runs at 69 cycles per byte on the gpl-3.0.txt test.

 

My new LZSA2 decompressor runs at 51 cycles per byte in "small" mode, dropping to 46 cycles per byte if you allow it to use an extra 15 bytes of code for more inlining.

 

So very comparable to the performance of @dmsc's "small" LZ4 implementation, but with much better compression.

 

It is checked into github, but I'm not going to post it here, because I'm chatting to Emmanuel Marty about some possible changes to the LZSA2 format to make the 6502 decompressor a little bit smaller (and a tiny bit faster).


And now there's an LZSA1 decompressor, too!

 

Testing with the same gpl-3.0.txt test file that we've been using for comparison in this thread ...

 

The new LZSA1 decompressor runs at 51 cycles per byte in "small" mode (168 bytes), dropping to 39 cycles per byte in "fast" mode (205 bytes).

 

To put those results in "frames" terms, so that they can be compared to timings that @xxl posted earlier in the thread ...

 

gpl3.txt - 35147 bytes

exomizer - 12382 bytes + depacker 1 page =~ 12.3 KB, decompress 128 frames (2.6 sec)

deflate - 11559 bytes + depacker 2 pages =~ 11.8 KB, decompress 179 frames (3.6 sec)

LZ4 - 15622 bytes + depacker <150 bytes =~ 15.3 KB, decompress 55 frames (1.1 sec)

LZSA1 - 14621 bytes + depacker 168 bytes =~ 14.4 KB, decompress 50 frames (1.0 sec)

LZSA1 - 14621 bytes + depacker 205 bytes =~ 14.5 KB, decompress 39 frames (0.8 sec)
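
For anyone who wants to redo the cycles-per-byte to frames conversion, here is a small Python sketch. It assumes a PAL frame of 312 scanlines at 114 machine cycles each and ignores ANTIC DMA and memory refresh, so real figures with the display enabled will be worse. The constants and the function name are assumptions made for this example, not numbers taken from the decompressor sources.

```python
PAL_SCANLINES = 312            # assumed scanlines per PAL frame
CYCLES_PER_SCANLINE = 114      # assumed machine cycles per scanline
CYCLES_PER_FRAME = PAL_SCANLINES * CYCLES_PER_SCANLINE
FRAMES_PER_SECOND = 50         # nominal PAL rate


def frames_to_decompress(output_bytes, cycles_per_byte):
    """Estimated PAL frames needed to produce `output_bytes` of output."""
    return output_bytes * cycles_per_byte / CYCLES_PER_FRAME


# gpl-3.0.txt is 35147 bytes uncompressed.
for label, cycles_per_byte in (("LZSA1 small", 51), ("LZSA1 fast", 39)):
    frames = frames_to_decompress(35147, cycles_per_byte)
    seconds = frames / FRAMES_PER_SECOND
    print(f"{label}: {frames:.1f} frames, {seconds:.2f} s")
```

Under those assumptions the estimates land close to the 50 and 39 frames listed above; the cycles-per-byte figures are themselves rounded, so an off-by-one frame is to be expected.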

 

 

The LZSA2 decompressor has been trimmed down a bit, and is now 256 bytes long for the "fast" mode (at the cost of a couple of % in decompression). Those couple of % can be optionally regained for the cost of an extra 11 bytes of code.

 

The LZSA2 decompressor also has another couple of optional enhancements that take off 10 bytes of length, and boost performance by 3%, but they break compatibility with the standard LZSA2 format.

 

Emmanuel Marty (understandably) doesn't want to change the LZSA2 format, so I'll probably create a fork of his LZSA for 6502 users.

 

On top of that, Emmanuel Marty *was* willing to make some optional changes to aPLib, so APULTRA now supports an "enhanced" mode that is approximately 11% faster to decompress on the 6502.

 

All of the decompressors are checked into the LZSA and APULTRA projects on github, which you can find here ...

 

https://github.com/emmanuel-marty/lzsa

https://github.com/emmanuel-marty/apultra/

 

Note: I don't use MADS, so I'll leave it up to someone else to convert the decompressors to MADS format.

 

The LZSA ones have been converted to ACME assembler format, but the aPLib one is still in PCEAS/HuC assembler format ... aka FUJIAS assembler format now, since I've added Atari .car format to the assembler so that I can develop code for Atari cartridges.

 

 

Anyway ... I *hope* that the overall result of all of this has been to show folks that LZ4 really doesn't have much of a purpose on old 8-bit and 16-bit machines, and that there are other alternatives available to developers ... including the old plain-LZSS variants, which @dmsc has shown still have a use with his LZSS-SAP utility.

 

Edited by elmer
1 hour ago, Heaven/TQA said:

maybe I am late on the party.... but did those RMT players support dual pokey streams, too?

First let's remember that it's not an RMT player: it can play any tune from the usual trackers because it's just a compressed recording of the pokey streams.

 

Dmsc only wrote the implementation for 1 Pokey but it's piss easy to do both. it would require twice as many buffers though (2KB x number of Pokeys)

 

Small problem: Altirra doesn't record multiple Pokey, it seems easy to add from checking the Altirra source but in the meantime you can just record Pokey0, then hack RMT to send Pokey1 data to Pokey0 and record that and now you have 2 Pokey streams which you can decode on their own.

 

Altirra only records Pokey data once per frame so if you wanted to play a 100-200hz tune you'd just call the RMT/CMC/... player once per frame instead of 2 or 4 times. (Although I find the quality that 100+Hz brings not good enough to justify the cost)

 

The problem that LZS has:

It can only loop to the beginning of a song, so if you play an intro, it will play it again when looping (hack is to have 2 tunes, 1 for the intro and 1 main)

 

Another potential problem: Playing fx by overriding one of the channels, should be ok ?

 

I have mentioned the player many times because I can't stress enough what an improvement it is over standard players: everybody should use it🙂

 


You can use atari800 to record pokey registers. It's what inspired phaeron to add SAP-R recording to Altirra :)

	-pokeyrec                  Enable Pokey registers recording
	-pokeyrec-interval <n>     Sampling interval in scanlines (default: 312)
	-pokeyrec-ascii            Store ascii values (default: raw)
	-pokeyrec-file <filename>  Specify output filename (default: pokeyrec.dat)
	-pokeyrec-stereo           Record second Pokey, too (default: mono)

With atari800, you have to add your own SAP-R header manually to the recorded file. But, it can record stereo (18 bytes per interval, instead of 9), and/or NTSC (set interval to 262), and 2x,3x and 4x speed players (divide interval by 2,3 or 4).
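
Since the streams can be decoded on their own, a stereo capture can be split back into two mono streams on the host. This Python sketch assumes the raw (non-ascii) output and assumes that, within each 18-byte interval, the first 9 bytes belong to the first Pokey and the last 9 to the second; check that ordering against a real capture before relying on it. The function names are made up for this example, and the SAP-R header is not handled here.

```python
REGS_PER_POKEY = 9   # bytes recorded per Pokey per interval


def split_stereo(recording):
    """Split a raw stereo capture into (first_pokey, second_pokey) streams."""
    interval = 2 * REGS_PER_POKEY
    if len(recording) % interval != 0:
        raise ValueError("length is not a multiple of 18 bytes")
    first = bytearray()
    second = bytearray()
    for pos in range(0, len(recording), interval):
        first += recording[pos:pos + REGS_PER_POKEY]
        second += recording[pos + REGS_PER_POKEY:pos + interval]
    return bytes(first), bytes(second)


def sampling_interval(scanlines_per_frame, player_speed):
    """Value for -pokeyrec-interval: scanlines per frame divided by 1, 2, 3 or 4.

    Integer division is an assumption; 262 / 4 does not divide evenly.
    """
    return scanlines_per_frame // player_speed
```

For example, `sampling_interval(312, 2)` gives 156 for a 2x PAL player and `sampling_interval(262, 2)` gives 131 for NTSC.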

 

atari800's pokey emulation is sub-par, but it doesn't matter how it sounds. You just want to record the registers.

 


yeah sorry for calling the stream player "RMT" but basically because RMT has its pitfalls with the cycle footprint... ;) doesn't matter which music driver is used but stereo... i mean 20 scanlines (simple calc) for having stereo RMT music... now that would be something my guys in Desire would love to use... :D

 

 

2 hours ago, Heaven/TQA said:

yeah sorry for calling the stream player "RMT" but basically because RMT has its pitfalls with the cycle footprint... ;) doesn't matter which music driver is used but stereo... i mean 20 scanlines (simple calc) for having stereo RMT music... now that would be something my guys in Desire would love to use... :D

 

 

stereo at the CPU cost of mono 🙂 (at twice the memory cost though)

35 minutes ago, rensoup said:

stereo at the CPU cost of mono 🙂 (at twice the memory cost though)

 

You know... on machines built in the 70s we always balance out speed vs RAM ;)


Since ZX0 exists now, it seems like a good time to revisit compression results ...

 

==============================

28,227 popcore.bin.lz4ultra

26,345 popcore.bin.lzsa1

23,929 popcore.bin.lzsa2
23,280 popcore.bin.aplib
23,174 popcore.bin.exomizer
23,012 popcore.bin.deflate

22,926 popcore.bin.apultra
22,646 popcore.bin.zx0

==============================

15,509 gpl-3.0.lz4ultra

14,621 gpl-3.0.lzsa1

13,372 gpl-3.0.lzsa2
13,148 gpl-3.0.aplib

12,971 gpl-3.0.zx0
12,658 gpl-3.0.apultra
12,382 gpl-3.0.exomizer

11,559 gpl-3.0.deflate

==============================

 

Looking at those results, and at just how annoyingly slow the ZX0 compressor is currently, then IMHO if a programmer needs better compression than LZSA-2, I guess that it would be best (for the moment) to compress files with APULTRA for development, and then try out ZX0 when your project is nearly ready to ship.
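
To put the popcore.bin figures above on a common scale, this small Python sketch expresses each result as a percentage of the 40,164-byte original. The sizes are copied from the list; the dictionary layout and function name are just for this example.

```python
ORIGINAL_SIZE = 40164   # popcore.bin, uncompressed

RESULTS = {
    "lz4ultra": 28227,
    "lzsa1": 26345,
    "lzsa2": 23929,
    "aplib": 23280,
    "exomizer": 23174,
    "deflate": 23012,
    "apultra": 22926,
    "zx0": 22646,
}


def percent_of_original(packed_size, original_size):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * packed_size / original_size


# Largest output first, matching the order of the list above.
for name, size in sorted(RESULTS.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {size:6d} bytes  {percent_of_original(size, ORIGINAL_SIZE):5.1f}%")
```

Everything from LZSA2 down to ZX0 sits within a few percentage points of each other, which is why decompression speed and decompressor size end up deciding the choice.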


yes, the ZX0 becomes remarkable thanks to its speed of decompression and short decompressor (it requires no buffers) and a compression ratio comparable to aPLib.

 

for use, if you don't need to decompress from a file (memory only) then this can be simplified.

 

https://xxl.atari.pl/zx0-decompressor/

 

and an example of using decompression while loading

bootloader-zx0.atr


Hi!

1 hour ago, xxl said:

yes, the ZX0 becomes remarkable thanks to its speed of decompression and short decompressor (it requires no buffers) and a compression ratio comparable to aPLib.

 

for use, if you don't need to decompress from a file (memory only) then this can be simplified.

 

https://xxl.atari.pl/zx0-decompressor/

Only a quick review, you can replace the last part with:

dzx0s_elias   inc   lenL
dzx0s_elias_loop
              asl   @
              bne   dzx0s_elias_skip
              jsr   xBIOS_GET_BYTE
              sec
              rol   @
dzx0s_elias_skip
              bcc   dzx0s_elias_backtrack
              rts
dzx0s_elias_backtrack
              asl   @
              rol   lenL
              rol   lenH
              jmp   dzx0s_elias_loop

Faster and one byte less. See attached.

 

Have Fun!

bootloader-zx0-dmsc.atr


PHP/PLP corrected. Thanks !!!


:D

 

if you decompress from memory and not from a file then we can remove this SEC too.


I've now written, and experimented with optimizing, a decompressor for ZX0 ... and it's an interesting format. :)

 

In comparison to the Z80, our 6502's branches are faster, but the lack of registers hurts us when it comes to all of the bit-shifting that is used in ZX0's Elias-gamma coding.
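
For readers who haven't met this kind of coding, here is a minimal Python sketch of an interlaced Elias-gamma code. The bit convention is read off the dzx0s_elias routine quoted earlier in the thread: a control bit of 0 means another data bit follows, a control bit of 1 ends the number, and the value starts at 1. It is an illustration of why the decoder is all shifts and branches, not a complete description of the ZX0 format, and the function names are made up for this example.

```python
def encode_gamma(value):
    """Return the interlaced Elias-gamma bits for value >= 1 as a list of 0/1."""
    if value < 1:
        raise ValueError("only values >= 1 can be encoded")
    bits = []
    # Skip the implicit leading 1, then emit (control 0, data bit) pairs.
    for shift in range(value.bit_length() - 2, -1, -1):
        bits.append(0)
        bits.append((value >> shift) & 1)
    bits.append(1)          # control 1: end of number
    return bits


def decode_gamma(bits):
    """Decode one number from an iterable of bits."""
    stream = iter(bits)
    value = 1
    while next(stream) == 0:
        value = (value << 1) | next(stream)   # shift the next data bit in
    return value
```

Each decoded data bit costs one control-bit test plus a multi-byte shift, which is the part that is cheap with Z80 register pairs and comparatively expensive on the 6502.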

 

That means that the loop-unrolling seen in the ZX0's Z80 decompressors doesn't really help us much on the 6502, especially in comparison to the increase in code size.

 

Heck, even simple inlining of some of the gamma decoding doesn't help much, although optimizing the gamma decoding loop itself did bring a worthwhile speedup (i.e. %age speed gained was at least as good as %age increase in code size).

 

Just like my previous decompressors, the code has been written to specifically allow for decompression from banked cartridges, or the Atari's banked memory.

 

 

The decompressor is 192 bytes long, and since it is so small, I'm not really bothering to provide a build option for speed/size.

 

Testing shows that ZX0 is about 15% faster to decompress than aPLib, but only about 5% faster than the "enhanced" aPLib format that Emmanuel Marty's APULTRA supports (with the "-e" flag).

 

Then again, the ZX0 decompressor is 78 bytes shorter than my "enhanced" aPLib decompressor, so I have to conclude that ZX0 is a pretty impressive format! ;-)

 

 

In comparison with LZSA2, ZX0 is about 30% slower to decompress, which is pretty good, but it makes me think that there will still be times where a programmer would choose to use LZSA2 or LZSA1 for their faster decompression speed.

 

zx0_6502.mad

Edited by elmer
New code file, because I missed one of the page-increment macros.
