TI-99/4a disk-based CRPG



#226 PeteE

    Star Raider

  • 99 posts
  • Location:Beaverton, OR

Posted Thu Aug 17, 2017 10:29 AM

Which LZW algorithm do you use?  I'm looking into using LZW and Huffman coding to compress pattern tables.  My goal is something very simple to decompress since I want the decompression code to take up as little space as possible.  I've made a sprite compression scheme that takes advantage of horizontal/vertical symmetry and reflection/rotation, but it won't work as well on pattern tables.
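A minimal Python sketch of the symmetry idea (the actual scheme would be in TMS9900 assembly; these names and the 8x8 layout are illustrative assumptions): store only the top half of a vertically symmetric pattern and rebuild the rest by reflection.

```python
# Hypothetical sketch of symmetry-based sprite decompression: store only the
# top half of an 8x8 pattern and rebuild the full pattern by reflection.
def unfold_vertical(top_rows):
    """Rebuild 8 rows from the stored top 4 by mirroring about the middle."""
    return top_rows + top_rows[::-1]

def reflect_horizontal(rows):
    """Reflect each 8-bit row left-to-right (per-byte bit reversal)."""
    return [int(f"{r:08b}"[::-1], 2) for r in rows]
```

A symmetric sprite then costs half the bytes, plus one flag byte to say which reflection to apply.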



#227 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Thu Aug 17, 2017 11:33 AM

I use a custom one. I essentially created a library of common English letter combinations that are mapped to the bottom 31 characters (1-31). Characters 32-127 are stored as normal ASCII, and 128-255 denote a leading space followed by the corresponding character from the lower range. This gets me around 18-20% compression.
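The decode side of that scheme can be sketched in Python like this (the combo table here is hypothetical; the real 31-entry library isn't shown in the post):

```python
# Sketch of the decoder for the scheme above. COMBOS is a stand-in for the
# real 31-entry library of common letter combinations (codes 1-31).
COMBOS = ["th", "he", "in", "er", "an"]  # ...up to 31 entries

def decode_byte(b):
    if 1 <= b <= 31:
        return COMBOS[b - 1]             # dictionary code
    if 32 <= b <= 127:
        return chr(b)                    # plain ASCII, stored as-is
    return " " + decode_byte(b - 128)    # 128-255: leading space + same mapping

def decode(data):
    return "".join(decode_byte(b) for b in data)
```

The 128-255 range effectively makes the very common space-before-a-word pattern free, which is where much of the 18-20% comes from.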

 

I tested some LZW creator applications, but they all get too complex too fast, attempting to encode the ENTIRE text block and using an entire word (16 to 32 bits) per code. Plus, almost all of them strip out punctuation and are case-insensitive, and I NEED both punctuation and case.

 

My only thought is that if I could use the 128-255 range for more keyword replacement, I might get slightly better results, but I would need a way to analyze a large text block and cleanly identify the most optimal matches. For now I've decided to forge ahead with my current system, since it's better than nothing; I can always investigate better compression later.

 

Here's an assembly snippet of my encoding/decoding algorithm:

Spoiler (assembly listing not captured)


#228 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Fri Aug 25, 2017 1:08 PM

New blog entry up!

 

EDIT: Or not... Yahoo is being terrible with WordPress lately, and WordPress is refusing to publish new posts (even though it SAYS they're published). The direct link is:

 

http://adamantyr.com...ng-it-together/



#229 Vorticon

    River Patroller

  • 2,811 posts
  • Location:Eagan, MN, USA

Posted Sat Aug 26, 2017 9:28 AM

Looking really good  :thumbsup: It's great to see that you seem to have little difficulty coming back to complex code and picking up where you left off, even if it's been a while. I, on the other hand, really suck at this and have to literally relearn the entire code before picking it up again... This is particularly true for ALC.



#230 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Sat Aug 26, 2017 2:24 PM

> Looking really good  :thumbsup: It's great to see that you seem to have little difficulty coming back to complex code and picking up where you left off even if it's been a while. I on the other hand really suck at this, and I have to literally relearn the entire code before picking it up again... This is particularly true for ALC.

 

It helps that I comment my code pretty heavily, and the original structure has become pretty ingrained in my head...

 

That said, turning it into a module-based structure helps, because I have to go over ALL the code. I haven't touched the travel portion of the code in years, so I'm looking forward to going in and optimizing it.



#231 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Thu Aug 31, 2017 11:25 AM

Did some analysis of my text encoder... it's actually pretty good, all things considered, with room for improvement!

 

I copied all the text for the first disk to a large text file and did some case-sensitive searches for my suffix replacements, and the results are interesting... The "value" listed is the length of the suffix multiplied by the number of occurrences.

 

Also note that the longer suffixes are matched first, so "the" supersedes "he", which then loses those counts. This has been accounted for in the numbers.

Suffix  Count  Value
the       536   1608
in        635   1270
er        601   1202
you       324    972
and       319    957
es        450    900
re        439    878
ea        395    790
to        379    758
on        374    748
is        367    734
it        356    712
of        304    608
or        282    564
an        274    548
at        256    512
as        246    492
he        233    466
for       136    408
hat       134    402
ti        192    384
are       127    381
so        185    370
we        183    366
th        172    344
ent       104    312
can        76    228
ion        52    156
tio        39    117
has        39    117
by         36     72
tis         2      6

The last one, "tis", is definitely going to be replaced. :) I think what I'll do is replace the bottom five, requiring a value of at least 200.
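The longest-first scoring described above can be sketched in Python like so (a hypothetical helper, not the actual tool used): longer entries are counted first and their matches consumed, so "the" doesn't also feed the "he" count, and each entry scores length times occurrences.

```python
def score_dictionary(text, entries):
    """Count and score dictionary entries, longest first, consuming matches."""
    scores = {}
    for entry in sorted(entries, key=len, reverse=True):
        count = text.count(entry)
        scores[entry] = (count, count * len(entry))
        text = text.replace(entry, "\x00")  # consume so shorter entries can't recount
    return scores
```

Running this over the disk's text and sorting by the value column would reproduce a table like the one above.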

 



#232 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Sat Sep 2, 2017 8:07 PM

Okay, so I've written a new text encoder. It now uses 160 characters (0-31, 128-255) for dictionary key/value replacement. I wrote an encoder that takes in a dictionary list of values and produces a score sheet (how many of each were replaced) so I can actually see which ones work and which ones don't.

 

To determine the appropriate n-grams, I wrote a utility to calculate ALL the n-grams in the block of text that comprises the first disk. I think the word usage is general enough that it should be reusable for all the disks.
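An n-gram counter like the utility described can be sketched in a few lines of Python (the length range and ranking are assumptions; the real tool may differ):

```python
from collections import Counter

def top_ngrams(text, min_n=2, max_n=4, top=10):
    """Tally every substring of length min_n..max_n, ranked by count * length."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return sorted(counts.items(), key=lambda kv: kv[1] * len(kv[0]), reverse=True)[:top]
```

The count-times-length ranking matches the "value" metric from the suffix table: a long n-gram with fewer hits can still beat a short, frequent one.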

 

I also tested a bit-wise approach, with the most "common" letters stored in 4-bit increments with a length nybble, while uncommon characters are stored as full 8-bit values. In practice, though, it didn't produce any better results.
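A simplified sketch of that nibble-packing idea (the letter set is hypothetical, and this uses an escape nibble rather than the length-nybble run encoding described, to keep it short):

```python
COMMON = " etaoinshrdlucm"  # 15 assumed "common" letters; nibble 15 = escape

def encode_nibbles(text):
    """Pack common letters as 4-bit codes; escape everything else to 12 bits."""
    nibbles = []
    for ch in text:
        idx = COMMON.find(ch)
        if idx >= 0:
            nibbles.append(idx)
        else:
            b = ord(ch)
            nibbles += [15, b >> 4, b & 0xF]  # escape marker + two byte nibbles
    if len(nibbles) % 2:
        nibbles.append(15)                    # pad odd-length output
    return bytes((a << 4) | b for a, b in zip(nibbles[::2], nibbles[1::2]))
```

This also hints at why the approach broke even: every uncommon character costs 12 bits instead of 8, which eats the savings on mixed-case prose with punctuation.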



#233 adamantyr

    Stargunner

  • Topic Starter
  • 1,189 posts

Posted Mon Nov 13, 2017 10:58 PM

Just a heads up, you can see all my latest updates on this project on my blog!

 

http://www.adamantyr.com/blog

 

Solved my earlier problem with Yahoo by moving to AWS as well. :D





