# Game Text Encoding Problem

8 replies to this topic

### #1 DanBoris

DanBoris

Dragonstomper

• 976 posts
• Location:New Jersey, USA

Posted Sat Jul 17, 2010 10:22 AM

I need some help figuring out something, and thought some of the people here might spot whatever I am missing...

I'm playing around with decoding the format of the MS-DOS Adventure Construction Set game files. Some of it has been pretty easy to figure out, but when I got to the object names I found that the text is compressed, and I can't quite figure out the logic. Here is what I know:

- Each character can be one of 40 characters (26 letters, 10 digits, and 4 symbols)
- Every three characters are encoded into 2 bytes
- Here are some of the encodings I have seen, showing the character, hex values and binary values:

```
1   = C0 A8    11000000 10101000
A   = 40 06    01000000 00000110
B   = 80 0C    10000000 00001100
C   = C0 12    11000000 00010010
D   = 00 19    00000000 00011001
E   = 40 1F    01000000 00011111
F   = 80 25    10000000 00100101

A   = 40 06    01000000 00000110
AA  = 38 13    00111000 00010011
AAA = 69 06    01101001 00000110

ABC = 93 06    10010011 00000110
```

Anyone have any ideas on this?

### #2 GroovyBee

GroovyBee

Games Developer

• 8,244 posts
• Busy bee!
• Location:North, England

Posted Sat Jul 17, 2010 10:45 AM

It's some form of radix 40

e.g.

(X-'A')*1
+ (Y-'A')*40
+ (Z-'A')*1600

That'll fit into an unsigned 16-bit word.
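The capacity claim is easy to check. A minimal Python sketch (the variable names are illustrative only, not anything from the game files):

```python
# Three base-40 digits give 40**3 distinct combinations.
RADIX = 40
combinations = RADIX ** 3

# Largest packed value: digit 39 in every position.
largest = 39 + 39 * RADIX + 39 * RADIX ** 2

# An unsigned 16-bit word holds 0..65535, so every combination fits.
assert largest <= 0xFFFF
print(combinations, largest)  # 64000 63999
```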

### #3 SeaGtGruff

SeaGtGruff

• 5,493 posts
• Location:Georgia, USA

Posted Sat Jul 17, 2010 11:33 AM

> It's some form of radix 40
>
> e.g.
>
> (X-'A')*1
> + (Y-'A')*40
> + (Z-'A')*1600
>
> That'll fit into an unsigned 16 bit word.

Yes, I said the same thing, but I lost my internet connection while I was typing my reply, so it got lost when I tried to post it.

Notice the pattern with some of the values shown:

b = 00 00 (I'm guessing that 00 00 is a space) ?
A = 40 06 (add 40 06)
B = 80 0C (add 40 06)
C = C0 12 (add 40 06)
D = 00 19 (add 40 06, then add the carry flag)
E = 40 1F (add 40 06)
F = 80 25 (add 40 06)
G = C0 2B (add 40 06) ?
H = 00 32 (add 40 06, then add the carry flag) ?
etc.
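The "add 40 06, then add the carry" pattern can be reproduced byte by byte. This is only a sketch of the arithmetic described in the list above; `add_step` is a made-up helper name, and the starting value of 00 00 for a space is the same guess made in the post:

```python
def add_step(lo, hi, step_lo=0x40, step_hi=0x06):
    """Add the two-byte step to (lo, hi), carrying from the first byte into the second."""
    total_lo = lo + step_lo
    carry = total_lo >> 8  # 1 if the first byte overflowed, else 0
    return total_lo & 0xFF, (hi + step_hi + carry) & 0xFF

lo, hi = 0x00, 0x00  # guessed encoding of a space
rows = []
for letter in "ABCDEFGH":
    lo, hi = add_step(lo, hi)
    rows.append(f"{letter} = {lo:02X} {hi:02X}")

print("\n".join(rows))
```

The rows for A through F match the values posted in the first message, and G and H come out as C0 2B and 00 32, the two guesses marked with "?".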

Michael

Edit: Also, notice that it takes the same number of bytes to code 1, 2, or 3 characters, further pointing to a base-40 system.

The only other way I know to code 3 characters in 2 bytes is to split the bits, 5 bits per character, with 1 bit left over, but that gives only 32 characters.

Edited by SeaGtGruff, Sat Jul 17, 2010 11:35 AM.

### #4 SeaGtGruff

SeaGtGruff

• 5,493 posts
• Location:Georgia, USA

Posted Sat Jul 17, 2010 12:05 PM

> Anyone have any ideas on this?

Adding to my previous comments, I suggest looking at the encoding systematically (b = SPACE):

bbb = ?? ??
bbA = ?? ??
bbB = ?? ??
bbC = ?? ??
etc.

That should give you the values for the 1s place (0 to 39).

The rest should be a matter of just multiplying by decimal 40 for the 10s place, or by decimal 1600 for the 100s place, but you could verify that systematically:

bAb = ?? ?? (should be the same as bbA times decimal 40)
bBb = ?? ?? (should be the same as bbB times decimal 40)
bCb = ?? ?? (should be the same as bbC times decimal 40)
etc.

Abb = 40 06 (should be the same as bAb times decimal 40, or bbA times decimal 1600)
Bbb = 80 0C
etc.

But you'd have to take the carry into consideration, since it appears that the carry might be getting added back to the lo byte?

Michael

### #5 SeaGtGruff

SeaGtGruff

• 5,493 posts
• Location:Georgia, USA

Posted Sat Jul 17, 2010 12:14 PM

Another thought:

I think the values shown are lo byte first:

Abb = hex 40 06 = \$0640 = decimal 1600 = 1*1600
Bbb = hex 80 0C = \$0C80 = decimal 3200 = 2*1600
Cbb = hex C0 12 = \$12C0 = decimal 4800 = 3*1600
Dbb = hex 00 19 = \$1900 = decimal 6400 = 4*1600
etc.
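The same check can be run over all six single-letter samples from the first post. A short Python sketch, assuming each pair is a 16-bit word stored low byte first (the dictionary simply restates the posted values):

```python
# Byte pairs exactly as listed in the first post.
observed = {
    "A": "40 06", "B": "80 0C", "C": "C0 12",
    "D": "00 19", "E": "40 1F", "F": "80 25",
}

words = {}
for letter, hex_pair in observed.items():
    # Interpret the two bytes as a little-endian unsigned word.
    words[letter] = int.from_bytes(bytes.fromhex(hex_pair), "little")
    quotient, remainder = divmod(words[letter], 1600)
    print(f"{letter}: ${words[letter]:04X} = {words[letter]} = {quotient}*1600 + {remainder}")
```

Every remainder is zero and the quotients run 1 through 6, which is what a leading base-40 digit followed by two blanks would look like.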

Michael

### #6 GroovyBee

GroovyBee

Games Developer

• 8,244 posts
• Busy bee!
• Location:North, England

Posted Sat Jul 17, 2010 12:17 PM

> I think the values shown are lo byte first:

Makes sense since the files come from an x86 based machine which is little endian.

### #7 SeaGtGruff

SeaGtGruff

• 5,493 posts
• Location:Georgia, USA

Posted Sat Jul 17, 2010 12:30 PM

This seems to work for some, but not all, of the examples you posted:

bbb = \$0000 = 0
bbA = \$0001 = 1
bbB = \$0002 = 2
bbC = \$0003 = 3

bAb = \$0028 = 1*40
bBb = \$0050 = 2*40
bCb = \$0078 = 3*40

Abb = \$0640 = 1*1600
Bbb = \$0C80 = 2*1600
Cbb = \$12C0 = 3*1600

AAb = \$0640+\$0028=\$0668 -- you gave \$1338, or 38 13
AAA = \$0640+\$0028+\$0001=\$0669
ABC = \$0640+\$0050+\$0003=\$0693

By my figuring, \$1338 should be CCb, not AAb.
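For anyone who wants to check these by machine, here is a rough Python sketch of the packing. It assumes space = 0 and A..Z = 1..26, that short names are padded on the right with spaces, and that the word is written low byte first; `char_value` and `pack3` are hypothetical names, and digits and symbols are left out on purpose:

```python
def char_value(ch):
    """Value of one character, assuming space = 0 and A..Z = 1..26."""
    if ch == " ":
        return 0
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    raise ValueError(f"character not covered by this sketch: {ch!r}")

def pack3(text):
    """Pack up to three characters into two bytes, low byte first."""
    if len(text) > 3:
        raise ValueError("at most three characters per word")
    first, second, third = (char_value(c) for c in text.ljust(3))
    word = first * 1600 + second * 40 + third
    return word.to_bytes(2, "little")

for name in ("A", "AA", "AAA", "ABC", "CC"):
    encoded = " ".join(f"{b:02X}" for b in pack3(name))
    print(f"{name:<3} = {encoded}")
```

With those assumptions, AAA and ABC reproduce the posted 69 06 and 93 06, AA comes out as 68 06, and CC gives 38 13.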

Michael

Edited by SeaGtGruff, Sat Jul 17, 2010 12:32 PM.

### #8 SeaGtGruff

SeaGtGruff

• 5,493 posts
• Location:Georgia, USA

Posted Sat Jul 17, 2010 12:45 PM

Since 1 (presumably 1bb) is encoded as C0 A8, or \$A8C0, which is decimal 43200, which is 27*1600, I'm guessing the characters have the following values:

b = 0 (space)
A = 1
B = 2
C = 3
D = 4
E = 5
F = 6
G = 7
H = 8
I = 9
J = 10
K = 11
L = 12
M = 13
N = 14
O = 15
P = 16
Q = 17
R = 18
S = 19
T = 20
U = 21
V = 22
W = 23
X = 24
Y = 25
Z = 26
1 = 27
2 = 28
3 = 29
4 = 30
5 = 31
6 = 32
7 = 33
8 = 34
9 = 35
0 = 36
? = 37 (unknown symbol)
? = 38 (unknown symbol)
? = 39 (unknown symbol)

These are multiplied by 40^0=1, 40^1=40, or 40^2=1600, depending on their position. In example ABC, A is in the 100s place, B is in the 10s place, and C is in the 1s place.
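Going the other way, a decoder built on this table might look like the sketch below. The three unidentified symbols are shown as "?" placeholders, and treating a longer name as consecutive two-byte words is an assumption; `ALPHABET`, `unpack3`, and `decode_name` are illustrative names only:

```python
# Index = character value from the table above.
# Values 37..39 are unknown symbols, so "?" stands in for them here.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890" + "???"
assert len(ALPHABET) == 40

def unpack3(pair):
    """Decode one two-byte, low-byte-first word into three characters."""
    word = int.from_bytes(pair, "little")
    if len(pair) != 2 or word >= 40 ** 3:
        raise ValueError("not a valid base-40 word")
    first, rest = divmod(word, 1600)   # 100s place
    second, third = divmod(rest, 40)   # 10s and 1s places
    return ALPHABET[first] + ALPHABET[second] + ALPHABET[third]

def decode_name(data):
    """Decode consecutive words and drop trailing padding spaces."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    chunks = (unpack3(data[i:i + 2]) for i in range(0, len(data), 2))
    return "".join(chunks).rstrip()

print(repr(unpack3(bytes.fromhex("C0 A8"))))  # '1  '
print(decode_name(bytes.fromhex("93 06")))    # ABC
```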

Michael

### #9 DanBoris

DanBoris

Dragonstomper

• Topic Starter
• 976 posts
• Location:New Jersey, USA

Posted Sat Jul 17, 2010 4:40 PM

You guys rock! Thanks!

Yes, "AAb = \$1338" was a mistake; \$0668 is the correct value.

Dan

Edited by DanBoris, Sat Jul 17, 2010 4:40 PM.
