Jump to content

Recommended Posts

So I had a few K of data, a font and some G2F screen data, and noticed it had lots of repeated runs interspersed with sequential runs of bytes.  So I made this RLE decoder for it, and saved about 32% (on this particular data).  Maybe someone will find it useful.  It doesn't handle runs of non-repeated, non-sequential data, as I didn't have many.  So it was sort of purpose-built.  Uses 5 zero page locations.

 

Quote

 

RLEDECOMP        SUBROUTINE

                 ldy    #1
                 lda    (src_addr),Y     ; load the data byte first
                 pha                          ; and push it
                 dey
                 sty    temp               ; disable the incrementer
                 lda    (src_addr),Y    ; get the command
                 bne    .ahead3
                 pla                         ; pop 'A' and then,
                 rts                         ; exit when CMD is $00
.ahead3     asl
                 bcc    .ahead2        ; if high bit set,
                 inc    temp            ; enable the incrementer
.ahead2     lsr
                 tax
                 pla                        ; pop the data byte
.rle1          sta    (dst_addr),Y
                ;clc                        ; unneeded I guess??
                 adc    temp           ; add the increment
                 inc    dst_addr
                 bne    .ahead1
                 inc    dst_addr+1
.ahead1     dex
                bne    .rle1
                ;clc                        ; apparently unneeded also?
                 lda    #2
                 adc    src_addr
                 sta    src_addr
                 txa                        ; faster than LDA #0
                 adc    src_addr+1
                 sta     src_addr+1
                 bne    RLEDECOMP  ; unconditional

 

 

Sorry about the formatting, I couldn't figure out how to do that on this board.  I think the two CLC instructions are unneeded, it works for me without them.  I also don't have an encoder, because I did it by hand, hahaha...  Just posting it in case someone finds it useful.

 

EDIT FOR EXAMPLE:

 

.BYTE 3,$01,128+5,$04 decodes to: 01 01 01 04 05 06 07 08

Edited by glurk
  • Like 1
  • Thanks 1

Share this post


Link to post
Share on other sites

But it's an ASL soon followed by an LSR.  So I guess the same zero that went in comes back out !  Hahaha.

 

It's the second CLC that I wonder about.  I left them in there just in case so I can put them back in.

 

EDIT:  Looking at it again, the carry should still be clear at that point (I think)

Edited by glurk
  • Like 1

Share this post


Link to post
Share on other sites

If you have cc65, I have a version of the Stac technique online, but it's for CBM machines and will need to be converted for the Atari.  I can give you a link.  Are you interested?

Share this post


Link to post
Share on other sites

Don't know much about 'Stac' but example code is always good...  This one really came about by accident, and it worked really well for me, so I cleaned up the code and posted.  I tend to like using code I write myself, so I can (usually) understand what it's doing :)

  • Like 1

Share this post


Link to post
Share on other sites

If you're asking me in particular, I'm no expert on data compression at all.  I wrote something similar to the above on an Apple ][ years ago, but this was something I just hacked together...  I'm sure there are others here that know much more about this kinda stuff.

  • Like 1

Share this post


Link to post
Share on other sites

glurk: I thank you for your response.  I was not asking you in particular.  I was asking the group.  I'm asking because I want to improve it.  I tried MTF and some of my own ideas.  Only one of them worked.  It was simply an optimization for performance that performs extra checks to make sure LZ77 blocks were efficient.  I need to redo it.

Share this post


Link to post
Share on other sites

RLE Encoder 1.2

http://madteam.atari8.info/index.php?prod=uzytki

 

the fastest and shortest

rle_depacker

    sta inputPointer+1

loop    jsr getByte
    beq stop
    lsr @

    tay
lp0    jsr getByte
lp1    sta $ffff
outputPointer    equ *-2

    inw outputPointer

    dey
_bpl    bmi loop

    bcs lp0
    bcc lp1

getByte    inx
    sne
    inc inputPointer+1

    lda $ff00,x        ; lo(inputPointer) = 0 !!!
inputPointer    equ *-2

stop    rts

Share this post


Link to post
Share on other sites

tebe-

 

I wrote this one to run from ROM (cartridge-based), so no self-modifying code, plus it decodes runs like $00,$00,$32,$33,$34,$35 which I didn't see any other schemes doing, really.  So hopefully I'm not re-inventing the wheel.  I even looked at NES stuff, which uses RLE a lot, but that stuff is too different architecture.

  • Like 1

Share this post


Link to post
Share on other sites

Replying to myself, I've worked on this some more and got it to also handle runs of arbitrary data.  Here's a before/after sample of g2f data:

 

(128 BYTES)
00 00 00 00 00 01 02 00 00 00 3B 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 07 08 09 0A 3C 3D 3E 3F 40 0F 41 11 3F 42 0B 02 15 43 44 45 0F 41 11 00 00 00 00 00
00 00 00 00 17 18 19 1A 46 47 48 49 4A 4B 4C 4D 46 4E 4F 26 23 50 51 52 53 4C 4D 00 00 00 00 00
00 00 00 00 29 2A 2B 2C 54 55 56 35 57 58 59 5A 5B 5C 34 38 5D 5E 5F 2F 00 59 5A 00 00 00 00 00

(72 BYTES)
05 00 82 01 03 00 01 3B 08 00 01 04 0F 00 84 07 85 3C 48 0F 41 11 3F 42 0B 02 15 83 43 43 0F 41
11 09 00 84 17 88 46 01 46 82 4E 42 26 23 84 50 82 4C 09 00 84 29 83 54 01 35 86 57 42 34 38 83
5D 42 2F 00 82 59 05 00

I think 128 bytes down to 72 is 44% compression.  That's pretty good, IMO.  It's really intended for g2f data, which generates a lot of runs of consecutive bytes.  I haven't cleaned up the code yet, but it works.  But no-one seems interested in it anyway, haha....  🙄

  • Like 1
  • Thanks 2

Share this post


Link to post
Share on other sites
3 hours ago, glurk said:

But no-one seems interested in it anyway, haha....  🙄

it's not true

Share this post


Link to post
Share on other sites

Ok, here's the new version then.  It even uses an (optional) undocumented opcode, which saves 1 byte and 2 cycles.  I guess I need to write an encoder to go with it, but not sure how to go about that...  As a Windows program, or in C, or something else? 

 

Anyway, this code is as tight and fast as I can make it.  Someone more clever than myself might be able to optimize further:

 

RLEDECOMP        SUBROUTINE

            ldy    #1
            lda    (TEMP3),Y        ; load the data byte first
            pha                ; and push it
            dey
            sty    TEMP5            ; disable the incrementer
            lda    (TEMP3),Y        ; get the command
            bne    .ahead3
            pla                ; pop 'A' and then,
            rts                ; exit when CMD is $00
.ahead3        asl
            bmi    .dorun
            bcc    .ahead2        ; if high bit set,
            inc    TEMP5            ; enable the incrementer
.ahead2        lsr
            tax
            pla                ; pop the data byte
.rle1            sta    (TEMP1),Y
            adc    TEMP5            ; add the increment
            inc    TEMP1
            bne    .ahead1
            inc    TEMP2
.ahead1        dex
            bne    .rle1
            lda    #2
            adc    TEMP3
            sta    TEMP3
            bcc    RLEDECOMP
            inc    TEMP4
            bne    RLEDECOMP        ; unconditional
.dorun        .byte $4B,$7F        ; undocumented ASR #$7F, same as AND #$7F, LSR
            tax
            pla
            inc    TEMP3
            bne    .drloop
            inc    TEMP4
.drloop        inc    TEMP3
            bne    .ahead4
            inc    TEMP4
.ahead4        sta    (TEMP1),Y
            lda    (TEMP3),Y
            inc    TEMP1
            bne    .ahead6
            inc    TEMP2
.ahead6        dex
            bne    .drloop
            beq    RLEDECOMP        ; unconditional

TEMP1 and 2 are the destination, TEMP3 and 4 are the source, TEMP5 is the "increment-er" - all zero-page locations.

 

$00 = END of stream

$01-$3F = Repeat the following byte that number of times

$41-$7F = Dump the following n-$40 bytes directly

$81-$FF = Repeat the following byte n-$80 times, incrementing by 1 each byte

 

I (think) that's an accurate description.  In any case, I'm using it and it works...

Edited by glurk

Share this post


Link to post
Share on other sites

There was a slight bug in the above code, the relevant part should be this:

 

.ahead3        asl
            bcc    .ahead2        ; if high bit set,
            inc    TEMP5            ; enable the incrementer
.ahead2        bmi    .dorun
            lsr
            tax

Harry Potter-

This is just a de-compressor for a certain type of data, with specific characteristics.  Graph2Font generated data primarily.  It wouldn't be great at general purpose stuff.  It's "kind of" based on RLEINC for the NES, but they decompress into a 'port' (a single address) on their PPU, and this one works with src/dest addresses.

 

I probably need to test it more to make sure there are no other bugs like the one I caught above.  It's really easy to make an "off-by-one" error that messes up the whole thing.  I haven't tested every possible case, but I think it's solid now(?)  I mean, I'm using it in a program now, and it's working with the data there...

Share this post


Link to post
Share on other sites

The way I look at it - at least for "game stuff", which this is - is that it's OK for the COMPRESSOR to be slow, since more time can/should allow it to compress better, and you only do that part once anyway.  But you want the in-game DECOMPRESSOR to be really fast.

 

That said, I "invented" this scheme in my head, and then wrote a decoder for it.  Writing the code to actually do the compression itself is a whole other thing.  Recursion, ouch, makes my head hurt.  Plus, I guess it should be a Windows-based compressor, and Windows programming is not my thing at all.

 

I actually encoded/compressed my own data by hand, which was not that big a deal, really, mostly involved removing bytes from a text file of data and putting in the counts etc...  For this to really be useful for others, a compressor will need to be written, but I'm not sure how to do that part of it, hahaha.  Never written one before, I just came up with the (simple, really) RLE-variant compression scheme.

Share this post


Link to post
Share on other sites

tebe-

 

I decided to test this on a real g2f image, and I used your "Gollum" image.  The file Gollum_C64_Rensoup_Tebe.scr which is 1200 bytes, compresses down to 253 bytes using this scheme.  I actually encoded it by hand (lots of work) because I don't yet have an encoder, and I'm not sure at all that I really have the skills/know how to write one.  It would have to be a Windows program, or at least commandline C.

But, 1-(253/1200) is 0.789166666 which is 79% compression!!  I'd say that's pretty good.

 

I could REALLY use help writing an encoder for this, from anyone who wants to volunteer.  I don't think I can do it, I'm mostly a 6502 guy, and can't do much with Windows-based programming.

Share this post


Link to post
Share on other sites

It's not just simple RLE, it encodes sequences like 06,07,08,09,0A,0B,0C,0D etc.  That sequence would be encoded like:

 

.byte $88,$06

 

And those n+1 increasing sequences are so common in the .SCR files, that's why I wrote this thing.  Another example:

 

.byte 06,FF,86,DD,42,BA,AE,02,FF decodes to: FF,FF,FF,FF,FF,FF,DD,DE,DF,E0,E1,E2,,BA,AE,FF,FF

 

It's really efficient at n+1 increasing sequences.

Edited by glurk
fixed bad mistake!
  • Like 1

Share this post


Link to post
Share on other sites

I'm working on writing an encoder for this, it's going very slowly.  Here is the current output when run on the "Gollum_C64_Rensoup_Tebe.scr" file:

 

File is 1200 bytes long.

Repeated run of 81 bytes of value: 0

Increasing run of 31 bytes starting with: 1

Increasing run of 3 bytes starting with: 160

Increasing run of 2 bytes starting with: 35

Repeated run of 3 bytes of value: 0

Increasing run of 4 bytes starting with: 37

 

Those values are all decimal - and correct.  It then gets to the $0D byte, which I haven't handled yet.  I'm trying to write this encoder for Windows, and the language I'm using is QB64 (BASIC) and I've never really used it before. 

 

rle.jpg

Share this post


Link to post
Share on other sites
1 hour ago, glurk said:

It's not just simple RLE, it encodes sequences like 06,07,08,09,0A,0B,0C,0D etc.  That sequence would be encoded like:

 

.byte $88,$06

ahhh, now I understand your idea, clever :)

Edited by tebe

Share this post


Link to post
Share on other sites

Perhaps an easier way to implement this is to see it as a form of delta compression. First create the deltas, then use plain RLE on the deltas. Increasing runs result in runs of 1. Decreasing runs result in runs of $ff. Normal runs turn into delta, 0, 0, 0  etc...

 

 

Edited by ivop
  • Like 1

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...