Dealer Demo part 2, Let's make a disassembler!


Atari_Ace


So to disassemble the Dealer Demo, we need to start by peeking at the boot sectors to see how it starts.

000010: SECTOR: 001: FILE:
     0  ff 01 80 04 c0 e4 a9 f4-d0 06 00 0d 01 00 80 00   ................
    10  85 f0 a9 52 8d 02 03 ad-8c 04 8d 0a 03 ad 8d 04   ...R............

The first six bytes of the boot sector describe the boot sequence. The first byte is a flag byte (0xff) and the second byte is the number of sectors to load (DBSECT), in this case one. The third and fourth bytes are the load address (BOOTAD), where the sectors are copied on boot ($0480). The fifth and sixth bytes are the initialization address (DOSINI), in this case $e4c0, a pointer to an RTS instruction in the 800 A/B OS. Execution then continues at the seventh byte, just past this six-byte header.
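
As a quick check of that layout, here is a minimal sketch that unpacks the six header bytes copied from the hex dump above. The variable names are mine, not part of the tool built later in this post.

```perl
use strict;
use warnings;

# First six bytes of sector 1, copied from the hex dump above.
my $header = pack "C*", 0xff, 0x01, 0x80, 0x04, 0xc0, 0xe4;

# C = unsigned byte, v = 16-bit little-endian word.
# $vector is the word held in the fifth and sixth bytes.
my ($flags, $sectors, $load, $vector) = unpack "C C v v", $header;

# Code starts right after the six-byte header.
my $summary = sprintf "flags=\$%02X sectors=%d load=\$%04X vector=\$%04X entry=\$%04X",
  $flags, $sectors, $load, $vector, $load + 6;
print "$summary\n";
```

The entry value is simply the load address plus six, which is where the first real instruction of the sector lives.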

So the Dealer Demo boot is pretty spartan. It loads a single sector and then that sector has to load the rest of the program. So we need to disassemble the rest of sector 1.

When I first disassembled this, I used Altirra's debugger (after booting the disk, break into the debugger and type "u 480"), but that had a few drawbacks. By default, Altirra will disassemble both valid and "invalid" opcodes, so when the disassembly runs through a mixture of data and code, it generates a lot of weird disassembly that is really just data. I'd rather it just emit the data as .BYTE statements when it hits invalid opcodes, since those are rarely used. Another drawback is that it disassembles what's in memory, and by the time we get around to disassembling it, some of that memory has changed. So I decided it was time to move away from Altirra and write my own disassembler.

Writing a disassembler is pretty easy. You need an array containing a definition of each of the 256 possible opcodes. The undefined ones will just map to .BYTE <value>, but the valid ones will have a name and a mode. The mode determines how much additional data to read (0, 1 or 2 bytes) and how to format the line. Move the pointer forward that many bytes and repeat the process. In other words, something like so:

sub disasm_buf {
  my ($buff, $addr, $size) = @_;
  my $opcodes = opcodes();
  for (my $i = 0; $i + 1 < $size;) {
    my $code = unpack "C", substr($buff, $i, 1);
    my $def = $opcodes->[$code];
    my $count = print_op($buff, $addr, $i, $def);
    $i += $count;
  }
}

Here all the heavy lifting will be done by print_op, which relies on the definitions from the opcodes() function to know what to print.

What does opcodes look like?

sub opcodes {
  my $opcodes = [];
  $opcodes->[$_] = { name => (sprintf ".BYTE \$%02X", $_), mode => 'ILL' } for (0..255);
  my $modes = {
    'IMP' => [
      '00BRK', '40RTI', '60RTS',
      '08PHP', '18CLC', '28PLP', '38SEC', '48PHA', '58CLI', '68PLA', '78SEI',
      '88DEY', '98TYA', 'a8TAY', 'b8CLV', 'c8INY', 'd8CLD', 'e8INX', 'f8SED',
      '8aTXA', '9aTXS', 'aaTAX', 'baTSX', 'caDEX', 'eaNOP' ],
    'ZP' => [
      '24BIT', '84STY', 'a4LDY', 'c4CPY', 'e4CPX',
      '05ORA', '25AND', '45EOR', '65ADC', '85STA', 'a5LDA', 'c5CMP', 'e5SBC',
      '06ASL', '26ROL', '46LSR', '66ROR', '86STX', 'a6LDX', 'c6DEC', 'e6INC' ],
    'ZP,X' => [
      '94STY', 'b4LDY',
      '15ORA', '35AND', '55EOR', '75ADC', '95STA', 'b5LDA', 'd5CMP', 'f5SBC',
      '16ASL', '36ROL', '56LSR', '76ROR', 'd6DEC', 'f6INC' ],
    'ZP,Y' => [ '96STX', 'b6LDX' ],
    'IMM' => [
      'a0LDY', 'c0CPY', 'e0CPX', 'a2LDX',
      '09ORA', '29AND', '49EOR', '69ADC',
      'a9LDA', 'c9CMP', 'e9SBC' ],
    'REL' => [
      '10BPL', '30BMI', '50BVC', '70BVS', '90BCC', 'b0BCS', 'd0BNE', 'f0BEQ' ],
    'ABS' => [
      '20JSR', '2cBIT', '4cJMP', '8cSTY', 'acLDY', 'ccCPY', 'ecCPX',
      '0dORA', '2dAND', '4dEOR', '6dADC', '8dSTA', 'adLDA', 'cdCMP', 'edSBC',
      '0eASL', '2eROL', '4eLSR', '6eROR', '8eSTX', 'aeLDX', 'ceDEC', 'eeINC' ],
    'ABS,X' => [
      'bcLDY', '1dORA', '3dAND', '5dEOR', '7dADC',
      '9dSTA', 'bdLDA', 'ddCMP', 'fdSBC', '1eASL', '3eROL', '5eLSR', '7eROR',
      'deDEC', 'feINC' ],
    'ABS,Y' => [
      '19ORA', '39AND', '59EOR', '79ADC', '99STA', 'b9LDA', 'd9CMP', 'f9SBC', 'beLDX' ],
    ')Y' => [
      '11ORA', '31AND', '51EOR', '71ADC', '91STA', 'b1LDA', 'd1CMP', 'f1SBC' ],
    'X)' => [
      '01ORA', '21AND', '41EOR', '61ADC', '81STA', 'a1LDA', 'c1CMP', 'e1SBC' ],
    'ACC' => [ '0aASL', '2aROL', '4aLSR', '6aROR' ],
    'IND' => [ '6cJMP' ],
  };
  foreach my $mode (keys %$modes) {
    $opcodes->[hex substr($_, 0, 2)] = { name => substr($_, 2), mode => $mode } for (@{$modes->{$mode}});
  }
  $opcodes;
}

Although it looks formidable, it really isn't. The function creates an array of 256 definitions, each with a name of ".BYTE $xx" and a mode of 'ILL', for illegal. It then takes data from the $modes hash table to fill in the valid opcodes. For example, under the key 'ACC' there is a value '0aASL', which expands to $opcodes->[0x0a] = { name => 'ASL', mode => 'ACC' }; that is, opcode 0x0a is "ASL" in accumulator mode (i.e. ASL A). In other words, I've represented the valid opcode information in a compact form, and used a little code to expand it into a more useful representation.
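
To make the expansion step concrete, here is a small sketch of what happens to a single entry. The helper name expand_entry is hypothetical; opcodes() does the same thing inline with hex and substr.

```perl
use strict;
use warnings;

# Hypothetical helper: split a compact entry such as '0aASL' into its parts.
sub expand_entry {
  my ($entry, $mode) = @_;
  my $code = hex substr($entry, 0, 2);   # first two characters: opcode in hex
  my $name = substr($entry, 2);          # remainder: the mnemonic
  return ($code, { name => $name, mode => $mode });
}

my ($code, $def) = expand_entry('0aASL', 'ACC');
printf "opcode \$%02X => %s (%s)\n", $code, $def->{name}, $def->{mode};
```

Keeping the opcode and mnemonic in one short string is what lets the whole table fit on a screen or two, at the cost of a little string slicing at startup.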

When I first did this, of course, I missed a few opcodes and reversed SEI and SED (which went undetected for a long time because they are rare opcodes), but I've used this routine for several months and am fairly certain it's accurate now.

print_op is a bit more complicated, but not too horrible. To implement it, let's first write a helper function.

sub sb {
  if (scalar(@_) == 2) {
    sprintf "%04X: %02X        ", @_;
  }
  elsif (scalar(@_) == 3) {
    sprintf "%04X: %02X %02X     ", @_;
  }
  elsif (scalar(@_) == 4) {
    sprintf "%04X: %02X %02X %02X  ", @_;
  }
}

This routine is designed to show bytes (thus the name sb) in the first 16 characters of a line: sb(0x480, 0xff, 0x01, 0x80) for instance would show:

0480: FF 01 80

with an appropriate amount of padding after the bytes.

OK, we're now ready for print_op:

our $names = {};
sub print_op {
  my ($buff, $addr, $i, $def) = @_;
  my $mode = $def->{mode};
  my $sval = '';
  my $count = 1;
  if ($mode eq 'IMM') {
    my $val = unpack "C", substr($buff, $i + 1, 1);
    $sval = sprintf "\$%02X", $val;
    $sval = $val if $val < 10;
    $sval = " #$sval";
    $count = 2;
  }
  elsif ($mode eq 'ZP' || $mode eq 'ZP,X' || $mode eq 'ZP,Y' ||
         $mode eq ')Y' || $mode eq 'X)') {
    my $val = unpack "C", substr($buff, $i + 1, 1);
    $sval = $names->{$val} || sprintf "\$%02X", $val;
    $sval = $val if $val < 10;
    $sval = " $sval" if $mode eq 'ZP';
    $sval = " $sval,X" if $mode eq 'ZP,X';
    $sval = " $sval,Y" if $mode eq 'ZP,Y';
    $sval = " ($sval,X)" if $mode eq 'X)';
    $sval = " ($sval),Y" if $mode eq ')Y';
    $count = 2;
  }
  elsif ($mode eq 'REL') {
    my ($b1) = unpack "c", substr($buff, $i + 1, 1);
    my $val = $addr + $i + 2 + $b1;
    $sval = $names->{$val} || sprintf "\$%04X", $val;
    $sval = " $sval";
    $count = 2;
  }
  elsif ($mode eq 'ABS' || $mode eq 'ABS,Y' || $mode eq 'ABS,X' || $mode eq 'IND') {
    my $val = unpack "v", substr($buff, $i + 1, 2);
    $sval = $names->{$val} || sprintf "\$%04X", $val;
    $sval = " $sval" if $mode eq 'ABS';
    $sval = " $sval,X" if $mode eq 'ABS,X';
    $sval = " $sval,Y" if $mode eq 'ABS,Y';
    $sval = " ($sval)" if $mode eq 'IND';
    $count = 3;
  }
  elsif ($mode eq 'ACC') {
    $sval = ' A';
  }
  elsif ($mode eq 'IMP' || $mode eq 'ILL') {
  }
  else {
    die "$mode";
  }
  my $string = sprintf "%s%s%s\n", '        ', $def->{name}, $sval;
  print sb($addr + $i, unpack "C*", substr($buff, $i, $count)), $string;
  return $count;  # bytes consumed, so disasm_buf knows how far to advance
}

So for each mode, we compute the value to append after the definition name. For instance, for 'ACC' we set $sval = ' A', and for 'REL' (a relative branch) we compute the destination and use that for $sval.
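
The relative branch case is the one with real arithmetic in it, so here is a worked sketch. The helper name branch_target is hypothetical, and the 16-bit mask is my addition; print_op does the same sum inline without it.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the destination of a relative branch.
# $pc is the address of the branch opcode, $operand the raw offset byte.
sub branch_target {
  my ($pc, $operand) = @_;
  my $offset = unpack "c", pack "C", $operand;  # reinterpret as signed (-128..127)
  return ($pc + 2 + $offset) & 0xffff;           # relative to the next instruction
}

printf "\$%04X\n", branch_target(0x0488, 0x06);  # the BNE found in sector 1
printf "\$%04X\n", branch_target(0x0490, 0xfe);  # offset -2: a branch to itself
```

The "+ 2" is there because the offset is counted from the instruction after the two-byte branch, not from the branch itself.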

In a few spots there are references to the $names hash. It is empty for now, but eventually we will put well-known symbols into it so that we don't always output raw numbers.
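
As a rough idea of what that could look like, here is a sketch that seeds the hash with a few standard Atari OS device control block equates. Which symbols actually go in, and the operand_name helper, are my assumptions rather than the final tool.

```perl
use strict;
use warnings;

# A few Atari OS device control block equates, keyed by address.
our $names = {
  0x0301 => 'DUNIT',
  0x0302 => 'DCOMND',
  0x0304 => 'DBUFLO',
  0x030a => 'DAUX1',
  0x030b => 'DAUX2',
};

# Hypothetical helper using the same fallback as print_op:
# the symbol if one is known, a four-digit hex address otherwise.
sub operand_name {
  my ($val) = @_;
  return $names->{$val} || sprintf "\$%04X", $val;
}

print operand_name(0x0302), "\n";  # a known symbol
print operand_name(0x0480), "\n";  # falls back to hex
```

With entries like these, a line such as STA $0302 would come out as STA DCOMND instead.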

OK, we are almost there. We need a read_file routine of course. We also need a read_img routine that will help us translate addresses to offsets into the file. For now, we'll just hard code that the first sector is 0x480 to 0x4ff and change this in the future.

sub read_file {
  my ($file, $size, $pos) = @_;
  $size = -s $file if !defined $size || $size == 0;
  open my $fh, '<', $file or die "open: $file\n";
  binmode $fh;  seek $fh, $pos, 0 if defined $pos;
  my $buff;  read $fh, $buff, $size;  close $fh;
  return $buff;
}

sub read_img {
  my $buff = read_file('../../atr/dealerdemo.atr', 0x80, 0x10);
  ($buff, 0x480, length($buff));
}
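
The 0x10 and 0x80 passed to read_file are the size of the ATR header and the size of one sector. A minimal sketch of the general mapping, assuming 128-byte sectors throughout (the helper name sector_offset and the constants are hypothetical):

```perl
use strict;
use warnings;

# Assumptions: a 16-byte ATR header and 128-byte sectors throughout.
use constant ATR_HEADER  => 0x10;
use constant SECTOR_SIZE => 0x80;

# Hypothetical helper: file offset of a 1-based sector number.
sub sector_offset {
  my ($sector) = @_;
  die "sector numbers start at 1\n" if $sector < 1;
  return ATR_HEADER + ($sector - 1) * SECTOR_SIZE;
}

printf "sector %d at file offset \$%06X\n", $_, sector_offset($_) for (1, 2, 3);
```

Sector 1 lands at offset $000010, matching the "000010: SECTOR: 001" line in the dump at the top of this post.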

The path in read_img is for my particular setup, so you may need to change it. OK, now let's hook it all up.

sub disasm {
  my ($buff, $addr, $size) = read_img(@_);
  disasm_buf($buff, $addr, $size);
}

disasm(@ARGV);

And run it to get a listing:

0480: FF                .BYTE $FF
0481: 01 80             ORA ($80,X)
0483: 04                .BYTE $04
0484: C0 E4             CPY #$E4
0486: A9 F4             LDA #$F4
0488: D0 06             BNE $0490
048A: 00                BRK
048B: 0D 01 00          ORA $0001
048E: 80                .BYTE $80
048F: 00                BRK
0490: 85 F0             STA $F0
0492: A9 52             LDA #$52
0494: 8D 02 03          STA $0302
0497: AD 8C 04          LDA $048C
049A: 8D 0A 03          STA $030A
049D: AD 8D 04          LDA $048D
04A0: 8D 0B 03          STA $030B
04A3: A9 01             LDA #1
04A5: 8D 01 03          STA $0301
04A8: AD 8A 04          LDA $048A
04AB: 8D 04 03          STA $0304

Not perfect, but a great start for only ~160 lines of code. We need to start populating the $names hash to make this a bit more readable, and improve read_img so we can target where the disassembly takes place, but we'll talk about how to do that next time.
 


3 Comments



Is your plan to disassemble the boot-loader, then translate that to Forth? I guess I lost the connection to the first post... :\

-dZ.


The goal is to produce a listing similar to the published Ragsdale 6502 fig-Forth listing so that we can see the Forth 'bones' in the Dealer Demo, and then be able to study its implementation (in particular the customizations made to the Forth kernel). But I want to walk through all the steps I went through in the process of making a readable/valuable listing, since I believe the techniques are at least as important as the result. It should hopefully be clearer soon; this post was focused on getting the tooling started.

The next post will extend the tooling a bit to complete the bootloader and start disassembling the Forth kernel (the next 128 bytes or so). In the post after that, I'll extend the tooling a bit more to simplify and accelerate the process.

I actually started this project in 2016, disassembling parts of the In-Store Demonstration cartridge using just the Altirra debugger. I didn't make a lot of progress, but I recognized some bits of the design. In March of 2018 I started looking at APX Forth with the PDF of the original fig-Forth handy, but still using just the Altirra debugger. Doing it by hand took a lot of time, but I persevered and produced a fairly complete listing of the APX Forth kernel. In May, I went back to looking at the cartridge and some other Forths as well. I wrote and refined tools to do more and more of the disassembly automatically, eventually reaching the point where I could typically generate a listing of a kernel with only a couple of days' work.
