Beats me what the use case of the PF family of algorithms is, honestly. It's supposed to be a better Shrinkler, but sometimes it isn't. Marginal benefits, big buffers, a sucky Windows-only packer, etc...
LZMA derivatives, such as Shrinkler and PF, take their time to spin up the adaptive part of the encoding. Ironically, the more state you have, the longer it all takes to converge. With a bigger test case they would have done quite a bit better.
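To make the spin-up concrete, here's a minimal sketch of the kind of adaptive bit model LZMA-style coders use (the constants and names here are illustrative assumptions, not lifted from Shrinkler or PF): the probability starts at 0.5, so every early symbol costs a full bit, and it only drifts toward the real statistics as data flows through.

```python
import math

# Illustrative LZMA-style adaptive bit model (assumed parameters,
# not Shrinkler's or PF's actual ones).
PROB_BITS = 11
PROB_INIT = 1 << (PROB_BITS - 1)   # start at p = 0.5
SHIFT = 5                          # adaptation rate

def update(p, bit):
    # Nudge the probability a fixed fraction of the way toward the observed bit.
    if bit:
        return p + (((1 << PROB_BITS) - p) >> SHIFT)
    return p - (p >> SHIFT)

def cost_bits(p, bit):
    # Ideal code length in bits for `bit` when P(bit=1) = p / 2**PROB_BITS.
    q = p / (1 << PROB_BITS)
    return -math.log2(q if bit else 1.0 - q)

# Feed a long run of zeros: the very first symbol costs a full bit,
# but once the model has spun up, each zero costs only a fraction of a bit.
p = PROB_INIT
for _ in range(64):
    p = update(p, 0)
```

With more state (many such models selected by context), each individual model sees fewer symbols, which is why bigger contexts take longer to pay off on small inputs.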
I actually kind of have an interest in packers, though in a very unusual context. I am unpacking ARMv7 firmware that runs at GHz speeds. Initially, however, I only have 64 kilobytes of SRAM available, before the megabytes of DDR come online. I am using compression to shorten the time it takes to download the firmware from the SPI flash, while relying only on SRAM for decompression. I needed a very simple format, since I had to give it a streaming interface (unpacking that can stop on any byte boundary), which meant writing a complete depacker myself. In this context there is not much more to explore beyond Shrinkler, LZSA2 and now possibly ZX0. Here's what they do to my firmware. The Shrinkler used here is a hack with double parity, since ARMv7 code is structured around 32-bit machine words.
Size (bytes):
PF: 171691
Shrinkler: 172516
ZX0: 199685
LZSA2: 207889
Original: 406996
Time to compress:
LZSA2: under 1s
PF: 2s
Shrinkler: 1m and a few seconds
ZX0: 22+ minutes. >_<
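For the curious, the "streaming interface" constraint looks roughly like this: a push-style depacker that consumes one input byte at a time and keeps its state between calls, so the SPI download and the unpacking can overlap. The format below is a toy RLE purely to show the interface shape; none of the packers above actually work this way, and all the names are made up.

```python
# Hypothetical push-style streaming depacker. The point is the interface
# (one byte in per call, safe to stop on any byte boundary), not the format.
class StreamingDepacker:
    def __init__(self):
        self.out = bytearray()
        self._pending_count = None   # decoder state survives between push() calls

    def push(self, byte):
        # Consume exactly one input byte, then return; the caller can
        # stop feeding us at any boundary and resume later.
        if self._pending_count is None:
            self._pending_count = byte            # first byte of a pair: run length
        else:
            self.out += bytes([byte]) * self._pending_count
            self._pending_count = None

# Toy input: 3 copies of 'A', then 2 copies of 'B'.
dec = StreamingDepacker()
for b in bytes([3, 0x41, 2, 0x42]):
    dec.push(b)
```

A pull-based decoder would need the whole compressed block in memory first; the push shape is what lets the 64 KB of SRAM hold only the output window while bytes trickle in from flash.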
The one obvious improvement I am still expecting to see in compression is a departure from universal coding for sizes. ZX0 uses one of the Elias codes, but that kind of encoding has no upper bound. If you are willing to have a bounded top, you might be able to eke out a few more bits here and there. It won't improve Shrinkler, though; the encoding used there works very well with the adaptive probabilities.
I'd really like to see Squishy let loose on arbitrary data, but right now it can only do executables: http://logicoma.io/squishy/