
Disassembling assembly language


Willsy


What's the accepted way to disassemble an op-code into an instruction mnemonic?

 

I mean, given an op-code, how does one first identify which instruction format the op-code belongs to? There must be an efficient sequence to go through, but starting at the op-code formats and their op-code fields is not inspiring me.

 

Anyone?


In TIImageTool's disassembler, I keep the mnemonics with their base opcode in a table, together with a number specifying the format, and the number is an index in a table with masks. For each presented opcode, I run through the list of mnemonics, applying the associated mask to the opcode, and compare the result to the base opcode. If they match, I know the command and the format and can then continue with the operands.
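A minimal sketch of that mask-and-compare search, in Python. This is not TIImageTool's actual code: the table names, the abridged mnemonic list, and the format labels are assumptions chosen for illustration. Only the base opcodes are standard TMS9900 values.

```python
# Hypothetical tables for illustration; not taken from TIImageTool.
# One mask per format: the bits of the word that hold the opcode.
FORMAT_MASKS = {
    1: 0xF000,  # two general operands (MOV, A, ...)
    2: 0xFF00,  # jumps and single-bit CRU (JMP, SBO, ...)
    3: 0xFC00,  # COC, CZC, XOR
    5: 0xFF00,  # shifts
    6: 0xFFC0,  # single operand (B, CLR, ...)
    8: 0xFFE0,  # immediate (LI, AI, ...)
}

# (mnemonic, base opcode, format): a small sample, not the full set.
MNEMONICS = [
    ("MOV", 0xC000, 1),
    ("A",   0xA000, 1),
    ("JMP", 0x1000, 2),
    ("SBO", 0x1D00, 2),
    ("XOR", 0x2800, 3),
    ("SRL", 0x0900, 5),
    ("B",   0x0440, 6),
    ("CLR", 0x04C0, 6),
    ("LI",  0x0200, 8),
]

def identify(word):
    """Linear search: mask the word per format and compare to the base opcode."""
    for mnemonic, base, fmt in MNEMONICS:
        if (word & FORMAT_MASKS[fmt]) == base:
            return mnemonic, fmt
    return None  # not in the (abridged) table
```

For example, identify(0x0460) returns ("B", 6), after which the operand bits can be decoded according to the format.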


In MAME's emulation of the TMS 99xx, I create a kind of B-tree with four levels. Every node may have up to 16 children. The tree is traversed in up to four steps, according to the four hex digits. Commands may appear more than once in that tree.

 

Suppose that the machine instruction is 0460; in that case, we start with child 0 of the root node, then go to its child 4. In the next level, children 4, 5, 6, and 7 all point to the B microprogram. Since B is defined to have a 10 bit opcode, the search is terminated at that point, and control is transferred to the B microprogram.

 

I had to do this tree search because I could not afford a linear search through the list every time the emulated 99xx encounters an operation.

 

This is only possible because the opcode is a continuous bit string starting at the left; also, we know that every opcode is at least 4 bits long.
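A rough Python sketch of such a four-level, 16-way tree. This is not the MAME source; the class and function names are made up for illustration, and only a handful of instructions are registered. Each instruction is entered with its base opcode and opcode length in bits, and an opcode that ends in the middle of a hex digit fills all the children it covers at that level.

```python
class Node:
    def __init__(self):
        self.children = [None] * 16  # one slot per hex digit
        self.leaf = None             # mnemonic if decoding stops here

def insert(root, mnemonic, base, opcode_bits):
    node, shift, remaining = root, 12, opcode_bits
    # Descend one level for every hex digit fully covered by the opcode.
    while remaining > 4:
        digit = (base >> shift) & 0xF
        if node.children[digit] is None:
            node.children[digit] = Node()
        node = node.children[digit]
        shift -= 4
        remaining -= 4
    # Last level: 'remaining' bits (1..4) are fixed, the others are don't-care,
    # so the same instruction appears under several children.
    digit = (base >> shift) & 0xF
    for d in range(digit, digit + (1 << (4 - remaining))):
        leaf = Node()
        leaf.leaf = mnemonic
        node.children[d] = leaf

def lookup(root, word):
    node = root
    for shift in (12, 8, 4, 0):
        node = node.children[(word >> shift) & 0xF]
        if node is None:
            return None
        if node.leaf is not None:
            return node.leaf
    return None

root = Node()
insert(root, "B",   0x0440, 10)  # ends up under 0 -> 4 -> {4, 5, 6, 7}
insert(root, "CLR", 0x04C0, 10)
insert(root, "LI",  0x0200, 11)
insert(root, "JMP", 0x1000, 8)
insert(root, "MOV", 0xC000, 4)
```

With this, lookup(root, 0x0460) walks 0, then 4, then 6 and stops at the B leaf, as in the example above.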


If you want to do it manually, then look at section 24.9 of the E/A manual, which lists op-codes and instructions. Then looking at your code, for many instructions you can identify the op-code just by looking at the first 1, 2 or 3 hex digits, without having to get into instruction formats or fields. For example, if the op-code starts with a C then it's a MOV instruction. A 1D is SBO. 09 is SRL. 020 is LI. And so on.


There's a working implementation in the tms9900 binutils package in binutils-2.19.1/opcodes/tms9900-dis.c. Look for the print_insn_tms9900 function. This function does basically the same thing mizapf describes.

 

Here's some pseudocode:

index = (opcode >> 12) & 0x0F 
switch(index)
{
  case 0:
    index = (opcode >> 8) & 0x0F
    switch(index)
    {
      case 0, 1, 12, 13, 14, 15:
        format[] = {"","","","","","","","","","","","","","","",""}
        break

      case 2, 3:
        index = (opcode >> 4) & 0x1F
        format[] = {"li","", "ai","", "andi","", "ori","", "ci","", "stwp","", "stst","", "lwpi","", "limi","", "","", "idle","", "rset","", "rtwp","", "ckon","", "ckof","", "lrex",""}
        break

      case 4, 5, 6, 7:
        index = (opcode >> 6) & 0x0F
        format[] = {"blwp", "b", "x", "clr", "neg", "inv", "inc", "inct", "dec", "dect", "bl", "swpb", "seto", "abs", "", ""}
        break

      case 8, 9, 10, 11:
        format[] = {"", "", "", "", "", "", "", "", "sra", "srl", "sla", "src"}
        break
    }
    break
    
  case 1:
    index = (opcode >> 8) & 0x0F
    format[] = {"jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc", "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"}
    break

  case 2, 3:
    index = (opcode >> 10) & 0x07
    format[] = {"coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"}
    break
      
  default:
    format[] = {"", "", "", "", "szc", "szcb", "s", "sb", "c", "cb", "a", "ab", "mov", "movb", "soc", "socb"}
    break
}

if(format[index] != "")
  decode as format[index]
else
  invalid instruction

There's more to it than this, but this should give you somewhere to start.
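For anyone who wants something to run, here is one possible Python rendering of the same nested decode. It is only a sketch: the table and function names are invented here and are not from binutils, and it returns just the mnemonic without decoding operands. The tables are filled in from the standard TMS9900 opcode assignments.

```python
# Mnemonic tables, indexed by the opcode field of each group ("" = unused).
IMMEDIATE = ["li", "ai", "andi", "ori", "ci", "stwp", "stst", "lwpi",
             "limi", "", "idle", "rset", "rtwp", "ckon", "ckof", "lrex"]
SINGLE = ["blwp", "b", "x", "clr", "neg", "inv", "inc", "inct",
          "dec", "dect", "bl", "swpb", "seto", "abs", "", ""]
SHIFT = ["sra", "srl", "sla", "src"]
JUMP = ["jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc",
        "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"]
MATH = ["coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"]
DUAL = ["szc", "szcb", "s", "sb", "c", "cb", "a", "ab",
        "mov", "movb", "soc", "socb"]

def mnemonic(opcode):
    """Return the mnemonic for a 16-bit instruction word, or None if invalid."""
    top = (opcode >> 12) & 0x0F
    if top >= 4:                       # 4xxx..Fxxx: two-operand group
        return DUAL[top - 4]
    if top >= 2:                       # 2xxx..3xxx: 6-bit opcodes
        return MATH[(opcode >> 10) & 0x07]
    if top == 1:                       # 1xxx: jumps and single-bit CRU
        return JUMP[(opcode >> 8) & 0x0F]
    second = (opcode >> 8) & 0x0F      # 0xxx: look at the second hex digit
    if 8 <= second <= 11:              # 08xx..0Bxx: shifts
        return SHIFT[second - 8]
    if 4 <= second <= 7:               # 04xx..07xx: single operand
        return SINGLE[(opcode >> 6) & 0x0F] or None
    if second in (2, 3):               # 02xx..03xx: immediate and control
        if opcode & 0x10:              # treat a set bit 11 as invalid
            return None
        return IMMEDIATE[(opcode >> 5) & 0x0F] or None
    return None                        # 00xx, 01xx, 0Cxx..0Fxx: unused on the 9900
```

So mnemonic(0x0460) gives "b" and mnemonic(0x0201) gives "li"; anything that falls on an empty table slot comes back as None.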

 

Good Luck!


It's not a straight lookup, unless you want a lookup table with 65536 entries. Each instruction format has a particular opcode length, so you have to iterate over all formats and see whether the word you have, ANDed with a mask covering that format's opcode bits, matches any opcode with that format.

 

To see an implementation, see xda99.py in the xdt99 suite, search for "def decode".


  • 2 weeks later...

9900 instruction decoding is pretty straightforward, especially when compared to the 8-bit CPUs of the day. The 9900 is very orthogonal and uses 7 (depending on how you count) so-called "instruction formats":

         0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
        ---------------------------------------------------------------+
1 arith  1 |opcode | B |  Td   |       D       |  Ts   |       S       |
2 arith  0   1 |opc| B |  Td   |       D       |  Ts   |       S       |
3 math   0   0   1 | --opcode- |     D or C    |  Ts   |       S       |
4 jump   0   0   0   1 | ----opcode--- |     signed displacement       |
5 shift  0   0   0   0   1 | --opcode- |       C       |       W       |
6 pgm    0   0   0   0   0   1 | ----opcode--- |  Ts   |       S       |
7 ctrl   0   0   0   0   0   0   1 | ----opcode--- |     not used      |
7 ctrl   0   0   0   0   0   0   1 | opcode & immd | X |       W       |

To isolate the format you use a priority encoder. Notice that in the first 7 bits, the position of the first bit that is set is unique to each format. After that the opcode field is 1..4 bits depending on the format, which means there are no more than 16 instructions in the most complex format. The other bits are used in various ways to identify what registers are being operated on, what kind of memory operations, etc. Notice, for example, that the source (S) field is *always* in bits 12..15 and the source-mode (Ts) is always bits 10..11, in all the formats that use Ts and S. Same for Td, D, and W.
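In software, the priority encoder amounts to finding the first set bit from the left. A small Python sketch, with function names made up for illustration; the row numbers are the ones from the table above, and bit 0 is the most significant bit as in TI's numbering:

```python
def format_row(word):
    """Priority encoder: row 1..7 of the table, chosen by the first set bit
    among bits 0..6 (bit 0 = MSB). Returns None if all seven are clear."""
    for row in range(1, 8):
        if word & (0x8000 >> (row - 1)):
            return row
    return None

def operand_fields(word):
    """Pull out the fields that sit at fixed positions in every format using them."""
    return {
        "Td": (word >> 10) & 0x3,  # bits 4..5
        "D":  (word >> 6) & 0xF,   # bits 6..9
        "Ts": (word >> 4) & 0x3,   # bits 10..11
        "S":  word & 0xF,          # bits 12..15
    }
```

For 0xC081 (MOV R1,R2) this gives row 1 with Ts=0, S=1, Td=0, D=2, and for 0x0460 it gives row 6.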

 

If you approach this from a hardware perspective instead of a functional model (i.e. don't think like a programmer), it is really very easy to take apart the instructions and figure out what they are, and what they are operating on.

