Dealer Demo, part 4: Some Forth at last

Entry posted by Atari_Ace September 1, 2018

564 views

So we spent the last two posts working on tooling to reverse engineer the Dealer Demo, and got the bootloader disassembled for study. Now we can start disassembling the Forth kernel. Let's start with "-dis d00 8":

0D00: EA        ORIG    NOP
0D01: 4C 8D 1C          JMP $1C8D
0D04: EA                NOP
0D05: 4C A1 1C          JMP $1CA1

Looking at the fig-Forth listing, this looks like the start of the kernel. 1C8D will be COLD+2, the cold start routine, and 1CA1 will be WARM, the warm start routine.

The next 13 words should be data words according to the fig-Forth listing, configuration parameters used where initializing the kernel:

0D08: 00 00             .WORD 0
0D0A: 01 00             .WORD 1
0D0C: 00 00             .WORD 0
0D0E: 7F 00             .WORD $7F
0D10: 80 04             .WORD $0480
0D12: EC 00             .WORD $EC
0D14: FF 01             .WORD $01FF
0D16: 00 01             .WORD $0100
0D18: 1F 00             .WORD $1F
0D1A: 00 00             .WORD 0
0D1C: FB 86             .WORD $86FB
0D1E: FB 86             .WORD $86FB
0D20: 01 1C             .WORD $1C01

Of these, we can identify $EC as the TOS (top of stack), $86FB is the pointer to the top of memory (TOP), $1C01 should be the initial vocabulary pointer (VL0).

The next bit of data should be the first definition, the LIT word. In fig-Forth, all definitions start with a byte declaring the length of the word in the bottom 5 bits, then the word itself with the high bit set on the last letter, although in rare cases there will be an extra byte to avoid having the definition fall on a $..FF boundary (why will become clear later). This is called the name field, and its address is referenced as the NFA. There are then two words, the first is the link field which points to the last definition (and is called the LFA). The next word is the parameter field (at the PFA), which points to the code to run for this word (this is the value that must not be at $..FF). For primitive words, it is usually just .WORD *+2, meaning that the code immediately follows the definition.

Knowing this we could slowly disassemble the code definition by definition by hand, but we want our tools to do as much of this as possible for use, so let's implement a definition decompiler.

The first thing we need is a get_string method from a buffer. It should output something that can be appended to a a .BYTE directive. In most cases, surrounding the values with single quotes should be adequate, but recall that the last byte in a definition string will have it's high bit set and thus be unprintable, so let's handle that case in the most general fashion possible so we can use the code for other purposes later:

sub get_string {
  my ($buff) = @_;
  my ($string, $inquote) = ('', 0);
  foreach my $val (unpack "C*", $buff) {
    my $printable = $val >= 0x20 && $val <= 0x7e;
    ($inquote = 1, $string .= "'") if ($printable && !$inquote);
    ($inquote = 0, $string .= "',") if (!$printable && $inquote);
    $string .= sprintf "%c", $val if $printable && $val != 0x27;
    $string .= sprintf "''", $val if $val == 0x27;
    $string .= sprintf "\$%02X", $val if $val >= 10 && !$printable;
    $string .= $val if $val < 10;
    $string .= ',' if !$printable;
  }
  $string .= "'" if $inquote;
  $string =~ s/,$//;  $string;
}

OK, now let's define def_buf, the routine that will parse a definition:

sub def_buf {
  my ($buff, $addr, $size) = @_;
  my $first = unpack "C", substr($buff, 0, 1);
  my $nfaSize = $first & 0x3f;
  if (($first & 0x80) == 0 || $nfaSize == 0 || $nfaSize > 0x1f) {
    forth_buf($buff, $addr, $size);
    return;
  }
  $nfaSize += 1 if (($addr + $nfaSize + 3) & 0xff) == 0xff; # handle CFA = ..ff
  my $string = get_string(substr($buff, 1, $nfaSize));
  my $count = $nfaSize + 1;
  $string = sprintf "%s.BYTE \$%02X,%s", get_label($addr), $first, $string;
  multi_buf($buff, $addr, $count, $string);
  forth_buf(substr($buff, $count), $addr + $count, $size - $count);
}

sub def {
  my ($buff, $addr, $size) = read_img(@_);
  def_buf($buff, $addr, $size);
}

This looks for the length byte first, if the value found doesn't look like a length byte, we go to our general parser called forth_buf (which mostly just dumps .WORDs of data, more on that shortly). It then adjusts the size if we're at the $..FF boundary, gets the string, passes it to multi_buf to output, and then resumes in the general forth_buf parser.

So what does forth_buf look like:

sub print_word {
  my ($buff, $addr, $i) = @_;
  my ($b1, $b2) = unpack "C*", substr($buff, $i, 2);
  my $val = $b1 | ($b2 << ;
  my $sval = sprintf "\$%04X", $val;
  $sval = sprintf "\$%02X", $val if $val < 256;
  $sval = $names->{$val} if exists $names->{$val};
  $sval = sprintf "\$%01X", $val if $val < 16;
  $sval = $val if $val < 10;
  $sval = '*+2' if $addr + $i + 2 == $val;
  my $string = sprintf "%s.WORD %s\n", get_label($addr + $i), $sval;
  print sb($addr + $i, $b1, $b2), $string;  $val;
}

sub forth_buf {
  my ($buff, $addr, $size) = @_;
  my ($val, $val1) = (-1, -1);
  for (my $i = 0; $i + 2 < $size; ) {
    $val1 = $val;
    $val = print_word($buff, $addr, $i);
    $i += 2;
    if ($addr + $i == $val) {
      disasm_buf(substr($buff, $i), $addr + $i, $size - $i);
      last;
    }
  }
}

sub forth {
  my ($buff, $addr, $size) = read_img(@_);
  forth_buf($buff, $addr, $size);
}

It calls print_word to print the word sensibly. When the word is *+2, we switch to disasm_buf, otherwise we call print_word again on the next word.

Now to hook it up into main:

  elsif ($opt =~ /^\-def/) {
    def(@_);
  }
  else {
    forth(@_);
  }

So now let's run "-def d22"

0D22: 83 4C 49          .BYTE $83,'LI',$D4
0D25: D4
0D26: 00 00             .WORD 0
0D28: 2A 0D             .WORD *+2
0D2A: B1 F8             LDA ($F8),Y
0D2C: 48                PHA
0D2D: E6 F8             INC $F8
...

This exactly matches up to the fig-Forth listing, but with some missing constants. It looks like $F8 is the interpreter pointer (IP), so we need to add the following to the listing:

     =00EC      TOS = $EC
     =00F0      N = $F0
     =00F8      IP = N+8
     =00FB      W = IP+3
     =00FD      UP = W+2
     =00FF      XSAVE = UP+2

We can now start disassembling and inserting the labels as we go along from the fig-Forth listing to get a more readable listing.

Continuing disassembling we see the PUSH, PUT, NEXT, CLIT and SETUP routines. The first difference between Dealer Demo and the published fig-Forth can now be seen. The NEXT routine doesn't have a JSR TRACE call and all the associated tracing code has been omitted. This is to be expected, since that code is there for the initial debugging of the kernel during bootstrapping, there's no need to keep it once the Forth is up an running.

There is another small difference, the single byte literal command CLIT is implemented, but the name and link for it have been dropped so you can't use it. Nonetheless we'll see it used in the code later, so how it got removed is a bit of a mystery.

NEXT is the most important routine in all of Forth, so it's helpful to study it for a moment.

0D42: A0 01     NEXT    LDY #1
0D44: B1 F8             LDA (IP),Y
0D46: 85 FC             STA W+1
0D48: 88                DEY
0D49: B1 F8             LDA (IP),Y
0D4B: 85 FB             STA W
0D4D: 18                CLC
0D4E: A5 F8             LDA IP
0D50: 69 02             ADC #2
0D52: 85 F8             STA IP
0D54: 90 02             BCC L54
0D56: E6 F9             INC IP+1
0D58: 4C FA 00  L54     JMP W-1

How does this work? First, we move the value pointed to by the interpreter pointer (IP) into the W register. Then we increment IP by 2 and then JMP to W-1. W-1 was setup with the indirect jump opcode at startup (6C), so this immediately jumps again to the value at the address we just picked up from the instruction pointer. So Forth basically uses this NEXT routine to automatically jump to addresses held in memory one after the other. In a way it executes code a bit like the 6502, except there are no opcodes, every operation is a 2-byte pointer to an address in memory, which itself points to the code we need to execute. This implementation style for Forth is known as indirect threaded code.

Let's work through an example. Suppose the IP is currently pointing to an address defined with .WORD LIT (i.e. the code for a literal value). LIT equals $0D28, and the value at that address is $0D2A. So when NEXT is called, it copies the value $0D28 into the W register, increments IP by 2, and then calls JMP W-1. The next instruction is therefore JMP ($0D28), which sets the program counter to $0D2A and continues running LDA (IP),Y, et cetera. The code for LIT itself consumes the next word and puts it on the data stack, and increments IP by 2, then falls through to NEXT to fetch the next word.

It's now hopefully obvious why no definition's parameter field can be on address $..FF. JMP (xxxx) on the 6502 has a bug where it will incorrectly fetch the destination when the address crosses the page boundary, and the NEXT routine relies on JMP (xxxx) to function. This isn't a problem though, we just need to detect when this is about to happen and move the definition by one byte, which is why the name string sometime has an extra byte.

Continuing our disassembly, we find the next few words:

-def  WORD     NFA   PFA    same as fig-Forth?
d74   EXECUTE  L75   EXEC   Yes
d8d   BRANCH   L89   BRAN   Yes
dab   0BRANCH  L107  ZBRAN  Yes
dcd   (LOOP)   L127  PLOOP  Yes
dfc   (+LOOP)  L154  PPLOO  Yes
e36   (DO)     L185  PDO    Yes

In each case we have to add at least a couple of labels to the list (one at the NFA, and then one two lines later at the PFA), more if there are branches in the code. In a few cases, we need to revert a symbol inserted to better match the fig-Forth listing, but largely the tool we have is doing much of the work once we recognize the start of a definition.

In all of these cases, the listing matches the implementation in the original fig-Forth listing.

As we've been disassembling, we have also been modifying set_name to cover more cases. In particular W-1, N-1, N+1, N+2, ..., N+7, NEXT+2, and BRAN+2 should be recognized, so we've add the following lines to set_name:

  $names->{$val+2} = "$label+2" if grep { $_ eq $label } qw/NEXT BRAN/;
  $names->{$val-1} = "$label-1" if grep { $_ eq $label } qw/N W/;
  if ($label eq 'N') {
    $names->{$val+$_} = "$label+$_" for (2..7);
  }