Jump to content
IGNORED

sp65 changed - a lot


karri

Recommended Posts

I'm not as good as those cheaters... but I've got 5454 bytes. 40 bytes to go. Some bytes added because I don't generate jagged edge on the right in one place, but the original image is generally compressed better (about 1 byte in few rows).

 

In the attachment the pallete has bad background color (brownish instead of blue, don't know why), but other colors are more or less correct.

SS.palSS.spr

Link to comment
Share on other sites

Great! I made a typo and my packed score is 5537. And shaped 5499. The original packing seems to leave some strides to eol ungenerated as they have cleared the screen to a colour from start. Your packing packs everything. And to my eye you could easily save the last bytes by choosing an egde colour 12 and leave out the right edges ending in colour 12.

 

You save 42 bytes by using "shaped". My guess is that we have reached the minimum size already.

 

Several of the original games from the gold age of the Lynx use a linked sprite at startup. A pixel zoomed to fill the whole screen in one colour followed by a shaped sprite with edge colour the same as the background pixel.

 

At least I now know how the 4-bit sprite should be packed. The next question is that is there a consistent algorithm that would work for all sprite depths? BPP4...BPP1?

 

Perhaps the question is pre-mature as we are only working with BPP4 right now.

Link to comment
Share on other sites

I've analyzed the differences using diagnostic information from SprTool and implementing the "shape" feature won't save to many bytes. 8 bytes maybe. The other thing... I hypothetize that they did not use any particular algorithm to achieve such compression. Maybe they had "some" algorithm and a tool to manually switch sections between RLE and literal compression and it might have been hand optimized. It might be simpler than writing algorithm with such "prediction" that would be needed to make choices that are embedded in this compression.

 

I think that my algorithm is bit-agnostic and will compress 1bpp images the same way, because it makes choices after counting how many bits is needed to store arbitrary sequence of pixels.

 

 

  • Like 1
Link to comment
Share on other sites

1 hour ago, laoo said:

I think that my algorithm is bit-agnostic and will compress 1bpp images the same way, because it makes choices after counting how many bits is needed to store arbitrary sequence of pixels.

How long sequence of pixels do you have to analyse at a time for finding out the best compression?

Do you always optimize the entire line from the reference point to the edge of the scanline?

Obviously there is nothing being passed over from one scanline to the next as every scanline begins at a byte boundary thelling the bytes used by this scanline.
My algorithm was looking for the longest RRR. Then I compare
LL...LLLLLL vs LL...LLLRRR to see if it saves space. If the longest was space saving I continue by taking away the RRR part and repeat the analysis on LL...LLL until I end up with either a L that is the best run. Or R if run-length made sense.

In theory things like RRLRR vs LLLRR vs LLLLL could provide a bad decision as @42bs pointed out. I really don't know. Creating new algorithms is a bit difficult for me. Obviously you have better compression than me at this point.

Perhaps we need some kind of pattern matching rules? It should be simple to analyse the stride and create a pattern like R3L16L13R3L5R7L15R8L9R4L3R4L2R5L15R3L2R10L2R10L2R3L9R7. All repeating pixels are coded as Rxx. From this we could process the patterns to remove Rx elements if they are better expressed as Lx. The final pattern would then be used to do the actual packing.

'''
    The inpattern is like L10R4R3L3... till end of line
    bpp is 4,3,2,1
    outpattern is like inpattern but saves more space
'''
def bestline(inpattern, bpp):
    ''' Do magic '''
    return outpattern

 

Now it is coded ;) 
Feel free to add some code to make this work.

 

I could make a brute force variant that uses annealing to randomly remove one Rxx record at a time and saves the result if it is better. After a few hundred attempts we call it a day and use the best packing of the line that we found. Computing power is cheap ;) .

Link to comment
Share on other sites

@karri Actually my idea is trivial and it's a reification of your idea:

  

1 hour ago, karri said:

It should be simple to analyse the stride and create a pattern like R3L16L13R3L5R7L15R8L9R4L3R4L2R5L15R3L2R10L2R10L2R3L9R7. All repeating pixels are coded as Rxx. From this we could process the patterns to remove Rx elements if they are better expressed as Lx. The final pattern would then be used to do the actual packing.

 

Only implementation is cumbersome.

 

Phase 1: I'm generating sequence of blocks of unlimited length that are either literal or rle. Run of repeated (at least two) bytes is a rle block, run of bytes where no consecutive bytes are repeated is literal.

Phase 2: Repeat until source sequence is empty: take a number of consecutive (from left to right) blocks that will span at least 16 pixels. If there are n rle blocks try 2^n combinations of treating such block as rle or literal (in 16 pixels there are at most 8 such blocks that gives 256 combinations). It could lead to merging few consecutive literal blocks (created by making neighbor rle block literal in given combination) to one longer literal block. Compute size of encoding for each combination. Take the first to the left block of best encoding and append it to the result sequence.

 

I was trying different number of pixels but for Super Skweek the best result was for 16. It's greedy algorithm and it can't produce an encoding where locally worse pick could lead to better options later on. The places where Super Skweek image was compressed better was such scenarios - worse pick to the left lead to better on further to the right. I thought that increasing scan size (to 32 pixels for example) could overcome this, but then the scoring preferred different encodings that was even worse. Maybe running the algorithm forward and backward and then stitching the results could be fruitful? I don't know.

 

Edited by laoo
  • Like 1
Link to comment
Share on other sites

Ok. Time to fire up my annealing code. The problem is likely to be a discrete optimisation with multiple minimums. As a fitness function the number of bits used for the line is ok. The optimisation vector could be the vector of runlengths to change to literals. Like [1,0,0,0,0] change the 1st run length to literal [1,1,0,0,0] change the 1st and 2nd run length to literal. At most the optimisation vector is 254 elements long if we have the maximum sprite width of 508 pixels.

 

I assume that this would make sense at the end. Hmm. perhaps I have to do some reduction as going through all 2.894802231×10⁷⁶ alternatives may be abit time consuming.

 

Perhaps we could limit the focus to the next 4 run lengths? Like get the best solution for the next 4 run lenths. For 4 run lengths you need to go through only 16 paths. Then accept the leftmost chunk (literal or packed) and proceed. Here we could calculate all options so no need for annealing.

Link to comment
Share on other sites

I am still testing it for bugs.

 

The idea is to compress a scanline at a time.

 

First I find the best solution in doing all permutations on the next 4 runlength packets. It is a moving window trial that solves the first chunk of the remaining line.

 

After that I try the same algorithm with 5 runlength packets. If the result is better I go with 6 runlength packets.

 

So the total run time is not predictable. It may run forever.

 

But it has already found surprising solutions in SuperSqueek.

 

Once I have the bugs ironed out I tell more.

Link to comment
Share on other sites

Well, I still don't know how many bytes this takes exactly as the simulation retuns bits instead of proper bytes per line.

 

The number I have (without stripping away stuff from edges) is 5485 bytes. Some elements got squeezed more than in SuperSkweek. But obviously I need to code this into sp65 to get real data.

 

What bothers me is the huge amount of simultaneous data you need to optimize. My original idea that 4 runlength chunks (with some interleaved literal chunks) would be enough to find the global minimum was completely wrong.

 

The best compressions required to simultaneously manipulate 9 runlenght chunks to find the best solution. After that 10, 11... did not turn up anything better.

 

A modern i7 laptop solved the compression in 0.357 seconds.

 

I will still spend some time to manually compare my findings with the original compression to see if I have missed something.

 

Edit: I continued with more time consuming optimisations. Currently trying 40 runlength chunks simultanously and the space is still getting smaller. This is crazy.

 

Obviously it is not possible to create a simple rule for the optimum compression. Also trying to do this by hand is a complete waste of time.

Link to comment
Share on other sites

I was working on pixels instead od runlenghts and spotted that some possible optimizations are not taken into account at 16 pixels (I've found one such spot in SuperSkweek) but when I've widened the work window (to 20 or 32 pixels) the algorithm had a tendention to prefer literal chunks where using rle was giving better result. This is indeed crazy and I'm not sure why it was so and how to overcome this.

  • Like 1
Link to comment
Share on other sites

A small update/comments from my side:

- The two pass approach is a nice idea.

- My current state (single pass) is 5134 for Karri's BG sprite and 5492 for SuperSqueek

 

So I am 21bytes off compared to the two pass version and I wonder what pattern I did not pay attention for.

 

Current algo is like this:

- Any sequence of 3 or more identical pixels is packed

- Repeating sequences of 2 identical pixels are stored literal if more then 5 times (6*P2 = 54 , L12 = 53)

- Pixels before a 3 or sequence are stored literal (including repeating sequences of 2)

 

 

Edited by 42bs
  • Thanks 1
Link to comment
Share on other sites

@42bs your algorithm sounds really effective.

 

One thing I have been wondering if it is sometimes beneficial to split a part of a long run-length to a literal and a 16 bit runlength. Like:

 

L8P17 -> L9P16 not L8P16L1

 

My algorithm does not take this as a possibility.

 

I also found out that I keep getting wrong results as the algorithm keep making optimisations far away from the start of the line in a counter productive way.

 

Fingers crossed that you nail this.

Link to comment
Share on other sites

1 hour ago, karri said:

@42bs your algorithm sounds really effective.

 

One thing I have been wondering if it is sometimes beneficial to split a part of a long run-length to a literal and a 16 bit runlength. Like:

 

L8P17 -> L9P16 not L8P16L1

Good point, need to check my algo for this use case.

Link to comment
Share on other sites

  • 2 weeks later...
50 minutes ago, laoo said:

@karri @42bs Any progress in the competition? :)

 

Yes and no.

I still have a small bug some where (thanks to Karri´s image I could see it).

Your idea with a second pass seems now the best. I think I could also pin-point all possible optimizations (packed => literal).

But the multiquadrant packing is not working yet and I only support 4 bit sprites for optimization yet.

 

As soon as the last bug is fixed I will put it on github and will describe my approach here.

 

Link to comment
Share on other sites

@42bs My idea can be implemented in one pass because it just checks by brute force every possible encoding in moving window of 16 pixels. It was just easier to implement it in two passes. But I already know it's not optimal as SuperSkweek has places that can't be produced by this approach. So I'm hoping for some groundbreaking results :)

 

Link to comment
Share on other sites

13 minutes ago, laoo said:

@42bs My idea can be implemented in one pass because it just checks by brute force every possible encoding in moving window of 16 pixels. It was just easier to implement it in two passes. But I already know it's not optimal as SuperSkweek has places that can't be produced by this approach. So I'm hoping for some groundbreaking results :)

 

I tried it with brute force method, but then I was thinging of going back and forth in the line but skipped going backwards again. But the multi-pass stayed so I can now add an option "-O0" to disable this stage if it might generate false data.

Also, the code is easier to read (IMHO).

 

Link to comment
Share on other sites

3 minutes ago, 42bs said:

Also, the code is easier to read (IMHO).

Yeah, main reason for multipass. The fundamental theorem of software engineering (FTSE) states that any problem can be solved by introducing an extra level of indirection :)

  • Haha 1
Link to comment
Share on other sites

15 minutes ago, laoo said:

Yeah, main reason for multipass. The fundamental theorem of software engineering (FTSE) states that any problem can be solved by introducing an extra level of indirection :)

*(*(*sp))++;

 

  • Haha 1
Link to comment
Share on other sites

Ok, published it on github. Multi quadrant is working and Karri's idea with the edge pen is also implemented.

 

https://github.com/42Bastian/sprpck

 

Only cygwin exe so far. Linux users may just run make to compile it.

 

Currently optimizing is only tested for 4 bits/pixel, but it can be disabled by -O0.

 

The BMP fix is not implemented yes, but 4bit/1plane PCX are supported now.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...