Disassembling assembly language

Willsy · December 2, 2017

What's the accepted way to disassemble an op-code into an instruction mnemonic?

I mean, given an op-code, how does one first identify which instruction format the op-code belongs to? There must be an efficient sequence to go through, but starting from the op-code formats and their op-code fields isn't inspiring me.

Anyone?

+mizapf · December 2, 2017

In TIImageTool's disassembler, I keep the mnemonics with their base opcodes in a table, together with a number specifying the format; that number is an index into a table of masks. For each opcode presented, I run through the list of mnemonics, apply the associated mask to the opcode, and compare the result to the base opcode. If they match, I know the command and the format and can then continue with the operands.
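
A minimal Python sketch of that mask-table scan, for illustration only: it is not TIImageTool's code. The names `FORMAT_MASKS`, `INSTRUCTIONS` and `find_instruction` are hypothetical, the format numbers follow TI's I..IX numbering, and only the base TMS9900 opcodes are listed (no 9995/99000 extensions).

```python
# Opcode mask per instruction format (TI numbering I..IX).
# Assumption: formats VII and VIII use 0xFFE0, i.e. bit 11 is ignored.
FORMAT_MASKS = {
    1: 0xF000,  # I:    two general addresses
    2: 0xFF00,  # II:   jumps, single-bit CRU
    3: 0xFC00,  # III:  COC, CZC, XOR
    4: 0xFC00,  # IV:   LDCR, STCR
    5: 0xFF00,  # V:    shifts
    6: 0xFFC0,  # VI:   single general address
    7: 0xFFE0,  # VII:  control, no operands
    8: 0xFFE0,  # VIII: immediate / register only
    9: 0xFC00,  # IX:   XOP, MPY, DIV
}

# (mnemonic, base opcode, format number)
INSTRUCTIONS = [
    ("szc", 0x4000, 1), ("szcb", 0x5000, 1), ("s", 0x6000, 1), ("sb", 0x7000, 1),
    ("c", 0x8000, 1), ("cb", 0x9000, 1), ("a", 0xA000, 1), ("ab", 0xB000, 1),
    ("mov", 0xC000, 1), ("movb", 0xD000, 1), ("soc", 0xE000, 1), ("socb", 0xF000, 1),
    ("jmp", 0x1000, 2), ("jlt", 0x1100, 2), ("jle", 0x1200, 2), ("jeq", 0x1300, 2),
    ("jhe", 0x1400, 2), ("jgt", 0x1500, 2), ("jne", 0x1600, 2), ("jnc", 0x1700, 2),
    ("joc", 0x1800, 2), ("jno", 0x1900, 2), ("jl", 0x1A00, 2), ("jh", 0x1B00, 2),
    ("jop", 0x1C00, 2), ("sbo", 0x1D00, 2), ("sbz", 0x1E00, 2), ("tb", 0x1F00, 2),
    ("coc", 0x2000, 3), ("czc", 0x2400, 3), ("xor", 0x2800, 3),
    ("ldcr", 0x3000, 4), ("stcr", 0x3400, 4),
    ("sra", 0x0800, 5), ("srl", 0x0900, 5), ("sla", 0x0A00, 5), ("src", 0x0B00, 5),
    ("blwp", 0x0400, 6), ("b", 0x0440, 6), ("x", 0x0480, 6), ("clr", 0x04C0, 6),
    ("neg", 0x0500, 6), ("inv", 0x0540, 6), ("inc", 0x0580, 6), ("inct", 0x05C0, 6),
    ("dec", 0x0600, 6), ("dect", 0x0640, 6), ("bl", 0x0680, 6), ("swpb", 0x06C0, 6),
    ("seto", 0x0700, 6), ("abs", 0x0740, 6),
    ("idle", 0x0340, 7), ("rset", 0x0360, 7), ("rtwp", 0x0380, 7),
    ("ckon", 0x03A0, 7), ("ckof", 0x03C0, 7), ("lrex", 0x03E0, 7),
    ("li", 0x0200, 8), ("ai", 0x0220, 8), ("andi", 0x0240, 8), ("ori", 0x0260, 8),
    ("ci", 0x0280, 8), ("stwp", 0x02A0, 8), ("stst", 0x02C0, 8), ("lwpi", 0x02E0, 8),
    ("limi", 0x0300, 8),
    ("xop", 0x2C00, 9), ("mpy", 0x3800, 9), ("div", 0x3C00, 9),
]


def find_instruction(word):
    """Linear scan: mask the word with each entry's format mask, compare to its base."""
    for mnemonic, base, fmt in INSTRUCTIONS:
        if (word & FORMAT_MASKS[fmt]) == base:
            return mnemonic, fmt
    return None  # no match: illegal opcode or a data word
```

For example, `find_instruction(0x0460)` returns `("b", 6)`. The scan costs up to one comparison per instruction, which is fine for a disassembler but is the cost the tree in the next post avoids.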

+mizapf · December 2, 2017

In MAME's emulation of the TMS 99xx, I create a kind of B-tree with four levels. Every node may have up to 16 children. The tree is traversed in up to four steps, according to the four hex digits. Commands may appear more than once in that tree.

Suppose that the machine instruction is 0460; in that case, we start with child 0 of the root node, then go to its child 4. In the next level, children 4, 5, 6, and 7 all point to the B microprogram. Since B is defined to have a 10 bit opcode, the search is terminated at that point, and control is transferred to the B microprogram.

I had to do this tree search because I could not afford a linear search through the list every time the emulated 99xx encounters an operation.

This is only possible because the opcode is a contiguous bit string starting at the left; also, we know that every opcode is at least 4 bits long.
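
The tree walk can be sketched in Python as a 16-way tree keyed on hex digits (what is usually called a trie). This is not MAME's C++ code: `OPCODES`, `build_trie` and `lookup` are hypothetical names, and only a subset of TMS9900 opcodes is listed, at least one per opcode length. An opcode whose length is not a multiple of 4 fills several adjacent children, which is why B (10 bits) ends up in children 4 through 7 in the 0460 example.

```python
# Hypothetical subset of TMS9900 opcodes: (mnemonic, base opcode, opcode length in bits).
OPCODES = [
    ("mov", 0xC000, 4), ("a", 0xA000, 4),
    ("coc", 0x2000, 6), ("czc", 0x2400, 6),
    ("jmp", 0x1000, 8), ("sbo", 0x1D00, 8), ("srl", 0x0900, 8),
    ("blwp", 0x0400, 10), ("b", 0x0440, 10), ("clr", 0x04C0, 10),
    ("li", 0x0200, 11), ("ai", 0x0220, 11), ("rtwp", 0x0380, 11),
]


def digit(word, level):
    """Hex digit number `level` of a 16-bit word, counting from the left (0..3)."""
    return (word >> (12 - 4 * level)) & 0xF


def build_trie(opcodes):
    root = [None] * 16
    for mnemonic, base, length in opcodes:
        full, rem = divmod(length, 4)
        last = full if rem else full - 1        # level where the opcode ends
        node = root
        for level in range(last):               # fully specified digits: descend
            d = digit(base, level)
            if node[d] is None:
                node[d] = [None] * 16
            node = node[d]
        first = digit(base, last)
        span = 1 << (4 - rem) if rem else 1     # low bits of this digit are operand bits
        for child in range(first, first + span):
            node[child] = mnemonic
    return root


def lookup(root, word):
    """Follow up to four hex digits; stop at the first leaf (mnemonic) or gap (None)."""
    node = root
    for level in range(4):
        entry = node[digit(word, level)]
        if not isinstance(entry, list):
            return entry
        node = entry
    return None
```

With `root = build_trie(OPCODES)`, `lookup(root, 0x0460)` visits child 0, then child 4, then finds `"b"` in child 6 of the third level, taking three steps instead of a scan over the whole instruction list.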

Stuart · December 2, 2017

If you want to do it manually, then look at section 24.9 of the E/A manual, which lists op-codes and instructions. Then looking at your code, for many instructions you can identify the op-code just by looking at the first 1, 2 or 3 hex digits, without having to get into instruction formats or fields. For example, if the op-code starts with a C then it's a MOV instruction. A 1D is SBO. 09 is SRL. 020 is LI. And so on.
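
That hex-digit rule can be checked mechanically: format the word as four hex digits and try the longest known prefix first. Only the four prefixes quoted above are included; `PREFIXES` and `by_prefix` are hypothetical names, and a complete table would need an entry (or several) for every instruction.

```python
# Hex-digit prefixes quoted in the post; not a complete instruction table.
PREFIXES = {"C": "mov", "1D": "sbo", "09": "srl", "020": "li"}


def by_prefix(word):
    text = f"{word:04X}"                 # e.g. 0x0205 -> "0205"
    for length in (3, 2, 1):             # longest prefix first
        mnemonic = PREFIXES.get(text[:length])
        if mnemonic:
            return mnemonic
    return None                          # no known prefix
```

Instructions whose opcode ends partway through a hex digit, such as B (0440 to 047F), need several prefixes, which is the same situation the tree in the previous post handles by filling adjacent children.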

insomnia · December 2, 2017

There's a working implementation in the tms9900 binutils package in binutils-2.19.1/opcodes/tms9900-dis.c. Look for the print_insn_tms9900 function. This function does basically the same thing mizapf describes.

Here's some pseudocode, written as runnable Python:

def disassemble(opcode):
    """Return the TMS9900 mnemonic for a 16-bit opcode word, or None if invalid."""
    index = (opcode >> 12) & 0x0F
    if index == 0:
        index = (opcode >> 8) & 0x0F
        if index in (2, 3):
            # 0200-03FF: immediate/control instructions, one every 0x20
            index = (opcode >> 4) & 0x1F
            names = ["li", "", "ai", "", "andi", "", "ori", "", "ci", "",
                     "stwp", "", "stst", "", "lwpi", "", "limi", "", "", "",
                     "idle", "", "rset", "", "rtwp", "", "ckon", "", "ckof", "",
                     "lrex", ""]
        elif index in (4, 5, 6, 7):
            # 0400-07FF: single-operand instructions, one every 0x40
            index = (opcode >> 6) & 0x0F
            names = ["blwp", "b", "x", "clr", "neg", "inv", "inc", "inct",
                     "dec", "dect", "bl", "swpb", "seto", "abs", "", ""]
        elif index in (8, 9, 10, 11):
            # 0800-0BFF: shifts; index is still the second hex digit
            names = ["", "", "", "", "", "", "", "",
                     "sra", "srl", "sla", "src", "", "", "", ""]
        else:
            # 0000-01FF and 0C00-0FFF: nothing defined on the TMS9900
            names = [""] * 16
    elif index == 1:
        # 1000-1FFF: jumps and single-bit CRU, selected by the second hex digit
        index = (opcode >> 8) & 0x0F
        names = ["jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc",
                 "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"]
    elif index in (2, 3):
        # 2000-3FFF: 6-bit opcodes
        index = (opcode >> 10) & 0x07
        names = ["coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"]
    else:
        # 4000-FFFF: two-operand instructions, selected by the first hex digit
        names = ["", "", "", "", "szc", "szcb", "s", "sb",
                 "c", "cb", "a", "ab", "mov", "movb", "soc", "socb"]

    if names[index]:
        return names[index]  # next step: decode the operands for this format
    return None              # invalid instruction

There's more to it than this, but this should give you somewhere to start.

Good Luck!

ralphb · December 4, 2017

It's not a straight lookup, unless you want a lookup table with 65536 entries. Each instruction format has a particular opcode length, so you have to iterate over all formats and check whether the word, ANDed with a mask covering that format's opcode bits, matches any opcode of that format.
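
A sketch of that iteration in Python, grouping the TMS9900 opcodes by opcode length rather than by format (several formats share a length, so five masks suffice). It is not the xda99.py code; `OPCODES_BY_LENGTH` and `decode_by_length` are hypothetical names. Each step is a dictionary probe, so the cost is one probe per distinct opcode length instead of one comparison per instruction.

```python
# Base TMS9900 opcodes grouped by opcode length in bits.
OPCODES_BY_LENGTH = {
    4:  {0x4000: "szc", 0x5000: "szcb", 0x6000: "s", 0x7000: "sb",
         0x8000: "c", 0x9000: "cb", 0xA000: "a", 0xB000: "ab",
         0xC000: "mov", 0xD000: "movb", 0xE000: "soc", 0xF000: "socb"},
    6:  {0x2000: "coc", 0x2400: "czc", 0x2800: "xor", 0x2C00: "xop",
         0x3000: "ldcr", 0x3400: "stcr", 0x3800: "mpy", 0x3C00: "div"},
    8:  {0x1000: "jmp", 0x1100: "jlt", 0x1200: "jle", 0x1300: "jeq",
         0x1400: "jhe", 0x1500: "jgt", 0x1600: "jne", 0x1700: "jnc",
         0x1800: "joc", 0x1900: "jno", 0x1A00: "jl", 0x1B00: "jh",
         0x1C00: "jop", 0x1D00: "sbo", 0x1E00: "sbz", 0x1F00: "tb",
         0x0800: "sra", 0x0900: "srl", 0x0A00: "sla", 0x0B00: "src"},
    10: {0x0400: "blwp", 0x0440: "b", 0x0480: "x", 0x04C0: "clr",
         0x0500: "neg", 0x0540: "inv", 0x0580: "inc", 0x05C0: "inct",
         0x0600: "dec", 0x0640: "dect", 0x0680: "bl", 0x06C0: "swpb",
         0x0700: "seto", 0x0740: "abs"},
    11: {0x0200: "li", 0x0220: "ai", 0x0240: "andi", 0x0260: "ori",
         0x0280: "ci", 0x02A0: "stwp", 0x02C0: "stst", 0x02E0: "lwpi",
         0x0300: "limi", 0x0340: "idle", 0x0360: "rset", 0x0380: "rtwp",
         0x03A0: "ckon", 0x03C0: "ckof", 0x03E0: "lrex"},
}


def decode_by_length(word):
    for length, table in OPCODES_BY_LENGTH.items():
        mask = (0xFFFF << (16 - length)) & 0xFFFF   # e.g. length 10 -> 0xFFC0
        mnemonic = table.get(word & mask)
        if mnemonic is not None:
            return mnemonic
    return None                                      # illegal opcode or data
```

Because no opcode is a prefix of another, at most one length can match, so the order in which the lengths are tried does not affect the result.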

For an implementation, see xda99.py in the xdt99 suite and search for "def decode".

matthew180 · December 13, 2017

9900 instruction decoding is pretty straightforward, especially when compared to the 8-bit CPUs of the day. The 9900 is very orthogonal and uses 7 (depending on how you count) so-called "instruction formats":

         0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
        ---------------------------------------------------------------+
1 arith  1 |opcode | B |  Td   |       D       |  Ts   |       S       |
2 arith  0   1 |opc| B |  Td   |       D       |  Ts   |       S       |
3 math   0   0   1 | --opcode- |     D or C    |  Ts   |       S       |
4 jump   0   0   0   1 | ----opcode--- |     signed displacement       |
5 shift  0   0   0   0   1 | --opcode- |       C       |       W       |
6 pgm    0   0   0   0   0   1 | ----opcode--- |  Ts   |       S       |
7 ctrl   0   0   0   0   0   0   1 | ----opcode--- |     not used      |
7 ctrl   0   0   0   0   0   0   1 | opcode & immd | X |       W       |

To isolate the format you use a priority encoder. Notice that within the first 7 bits only the position of the leftmost 1 bit matters: every bit to its left is 0, and that position *alone* tells you the format. After that, the opcode field is 1..4 bits depending on the format, which means there are no more than 16 instructions in the most complex format. The other bits are used in various ways to identify which registers are being operated on, what kind of memory access is performed, etc. Notice, for example, that the source (S) field is *always* in bits 12..15 and the source mode (Ts) is always in bits 10..11, in all the formats that use Ts and S. Same for Td, D, and W.
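
A Python sketch of the priority-encoder idea: the number of leading zero bits gives the format number from the table above, and fixed shifts pull out the opcode field and the register/mode fields. The names `MNEMONICS`, `OPCODE_WIDTH`, `priority_encode` and `decode` are hypothetical; the format numbers are the table's 1..7 (not TI's I..IX), and only base TMS9900 opcodes are listed.

```python
# Opcode field -> mnemonic for each format of the table above (1..7).
# None marks an opcode value that is unused on the base TMS9900.
MNEMONICS = {
    1: ["c", "a", "mov", "soc"],                                       # bits 1-2
    2: ["szc", "s"],                                                   # bit 2
    3: ["coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"],     # bits 3-5
    4: ["jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc",
        "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"],           # bits 4-7
    5: ["sra", "srl", "sla", "src", None, None, None, None],           # bits 5-7
    6: ["blwp", "b", "x", "clr", "neg", "inv", "inc", "inct",
        "dec", "dect", "bl", "swpb", "seto", "abs", None, None],       # bits 6-9
    7: ["li", "ai", "andi", "ori", "ci", "stwp", "stst", "lwpi",
        "limi", None, "idle", "rset", "rtwp", "ckon", "ckof", "lrex"],  # bits 7-10
}
# Width of the opcode field that follows the leading 1 in each format.
OPCODE_WIDTH = {1: 2, 2: 1, 3: 3, 4: 4, 5: 3, 6: 4, 7: 4}


def priority_encode(word):
    """Format = 1 + number of leading zeros, if the first 1 is within bits 0-6."""
    leading_zeros = 16 - word.bit_length()   # bit 0 is the MSB, as in the table
    return leading_zeros + 1 if leading_zeros < 7 else None


def decode(word):
    fmt = priority_encode(word)
    if fmt is None:
        return None                           # 0000-01FF: not a TMS9900 instruction
    width = OPCODE_WIDTH[fmt]
    last_bit = fmt + width - 1                # bit number of the field's rightmost bit
    field = (word >> (15 - last_bit)) & ((1 << width) - 1)
    mnemonic = MNEMONICS[fmt][field]
    if mnemonic is None:
        return None                           # unused opcode inside a valid format
    if fmt in (1, 2) and word & 0x1000:       # bit 3 is the B (byte) flag
        mnemonic += "b"
    # Fixed field positions; which ones are meaningful depends on the format.
    fields = {"S": word & 0xF, "Ts": (word >> 4) & 3,
              "D": (word >> 6) & 0xF, "Td": (word >> 10) & 3}
    return fmt, mnemonic, fields
```

For example, `decode(0xC060)` yields format 1, `"mov"`, with Ts=2, S=0 (a symbolic source) and Td=0, D=1 (register R1 as destination). No table search is involved at any point, which is the hardware-style view described above.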

If you approach this from a hardware perspective instead of a functional model (i.e. don't think like a programmer), it is really very easy to take apart the instructions and figure out what they are, and what they are operating on.
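
Working out what an instruction operates on comes down, for each Ts/S (or Td/D) pair, to four addressing modes. The sketch below assumes the usual TI assembler syntax (`>` for hex, `@` for symbolic and indexed); `format_operand` and `two_operand` are hypothetical names, and it covers only general-address operands, not jump displacements, shift counts or immediates. Mode 2 with register 0 is symbolic, because R0 cannot serve as an index register, and mode 2 consumes the next word of the instruction stream, the source's word coming before the destination's.

```python
from functools import partial


def format_operand(t, reg, next_word):
    """Render a general address from its 2-bit mode t and 4-bit register field.

    next_word is a callable returning the next instruction word; it is only
    called for mode 2 (symbolic / indexed), which carries an address.
    """
    if t == 0:
        return f"R{reg}"                 # workspace register
    if t == 1:
        return f"*R{reg}"                # register indirect
    if t == 3:
        return f"*R{reg}+"               # indirect with auto-increment
    address = next_word()                # t == 2: address word follows
    if reg == 0:
        return f"@>{address:04X}"        # symbolic
    return f"@>{address:04X}(R{reg})"    # indexed


def two_operand(words):
    """Source and destination of a two-address instruction starting at words[0]."""
    stream = iter(words)
    opcode = next(stream)
    next_word = partial(next, stream)
    source = format_operand((opcode >> 4) & 3, opcode & 0xF, next_word)
    destination = format_operand((opcode >> 10) & 3, (opcode >> 6) & 0xF, next_word)
    return source, destination
```

For instance, `two_operand([0xA820, 0x1234, 0x5678])` returns `("@>1234", "@>5678")`, i.e. `A @>1234,@>5678`.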

Willsy · December 13, 2017

Thank you Matthew. That is the clearest explanation I've ever seen. I was wondering how the 9900 did it. That's how.
