Disassembling assembly language


7 replies to this topic

#1 Willsy OFFLINE  

Willsy

    River Patroller

  • 3,029 posts
  • Location:Uzbekistan (no, really!)

Posted Sat Dec 2, 2017 3:42 PM

What's the accepted way to disassemble an op-code into an instruction mnemonic?

I mean, given an op-code, how does one first identify which instruction format the op-code belongs to? There must be an efficient sequence to go through, but starting at the op-code formats and their op-code fields is not inspiring me.

Anyone?

#2 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,622 posts
  • Location:Germany

Posted Sat Dec 2, 2017 4:38 PM

In TIImageTool's disassembler, I keep the mnemonics with their base opcode in a table, together with a number specifying the format; that number is an index into a table of masks. For each presented opcode, I run through the list of mnemonics, apply the associated mask to the opcode, and compare the result to the base opcode. If they match, I know the command and the format and can then continue with the operands.
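
A minimal Python sketch of the mask-and-compare search described above. This is not TIImageTool's code: the names `MASKS`, `MNEMONICS` and `find_mnemonic` are made up for illustration, the format numbers are arbitrary, and only a handful of instructions are listed.

```python
# Hypothetical tables, not taken from TIImageTool.
# One mask per format index: the bits of the word that form the opcode.
MASKS = [
    0xF000,  # 0: two-operand instructions, opcode in the top 4 bits
    0xFC00,  # 1: COC, CZC, XOR, ... opcode in the top 6 bits
    0xFF00,  # 2: jumps, single-bit CRU, shifts, opcode in the top 8 bits
    0xFFC0,  # 3: single-operand instructions, top 10 bits
    0xFFE0,  # 4: immediate instructions, top 11 bits
]

# (mnemonic, base opcode, format index); deliberately incomplete.
MNEMONICS = [
    ("MOV", 0xC000, 0),
    ("A",   0xA000, 0),
    ("XOR", 0x2800, 1),
    ("JMP", 0x1000, 2),
    ("SBO", 0x1D00, 2),
    ("SRL", 0x0900, 2),
    ("B",   0x0440, 3),
    ("CLR", 0x04C0, 3),
    ("LI",  0x0200, 4),
    ("AI",  0x0220, 4),
]

def find_mnemonic(word):
    """Linear search: mask the word per entry and compare to the base opcode."""
    for name, base, fmt in MNEMONICS:
        if (word & MASKS[fmt]) == base:
            return name, fmt
    return None  # not in this (partial) table
```

With these tables, `find_mnemonic(0x0460)` returns `("B", 3)`, after which the operand fields for that format can be pulled out of the remaining bits.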



#3 mizapf OFFLINE  

mizapf

    River Patroller

  • 2,622 posts
  • Location:Germany

Posted Sat Dec 2, 2017 4:50 PM

In MAME's emulation of the TMS 99xx, I create a kind of B-tree with four levels. Every node may have up to 16 children. The tree is traversed in up to four steps, according to the four hex digits. Commands may appear more than once in that tree.

 

Suppose that the machine instruction is 0460; in that case, we start with child 0 of the root node, then go to its child 4. In the next level, children 4, 5, 6, and 7 all point to the B microprogram. Since B is defined to have a 10 bit opcode, the search is terminated at that point, and control is transferred to the B microprogram.

 

I had to do this tree search because I could not afford a linear search through the list every time the emulated 99xx encounters an operation.

 

This is only possible because the opcode is a contiguous bit string starting at the left; also, we know that every opcode is at least 4 bits long.
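
A rough Python sketch of such a four-level, 16-way tree, to make the 0460 walk-through concrete. It is not MAME's implementation: `new_node`, `insert` and `lookup` are hypothetical names, leaves are plain mnemonic strings instead of microprograms, and only a few instructions are registered.

```python
def new_node():
    # Each node has 16 children, one per hex digit.
    return [None] * 16

def insert(root, base, bits, name):
    """Register an instruction whose opcode is the top `bits` bits of `base`."""
    node, level = root, 0
    # Descend through the hex digits that lie completely inside the opcode.
    while bits > 4:
        digit = (base >> (12 - 4 * level)) & 0xF
        if node[digit] is None:
            node[digit] = new_node()
        node = node[digit]
        bits -= 4
        level += 1
    # The last digit may be only partly opcode: fill every child it covers.
    digit = (base >> (12 - 4 * level)) & 0xF
    for child in range(digit, digit + (1 << (4 - bits))):
        node[child] = name

def lookup(root, word):
    """Follow the hex digits of `word` until a leaf (mnemonic) is reached."""
    node = root
    for shift in (12, 8, 4, 0):
        node = node[(word >> shift) & 0xF]
        if node is None:
            return None          # no instruction registered on this path
        if isinstance(node, str):
            return node          # search ends as soon as the opcode is complete
    return None

root = new_node()
insert(root, 0x0440, 10, "B")    # 10-bit opcode: fills children 4..7 at level 3
insert(root, 0x0680, 10, "BL")
insert(root, 0x1000, 8, "JMP")
insert(root, 0xC000, 4, "MOV")
insert(root, 0x0200, 11, "LI")
```

Here `lookup(root, 0x0460)` goes root child 0, then child 4, then child 6, which holds `"B"`, so the fourth digit is never examined.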



#4 Stuart OFFLINE  

Stuart

    Dragonstomper

  • 712 posts
  • Location:Southampton, UK

Posted Sat Dec 2, 2017 5:30 PM

If you want to do it manually, then look at section 24.9 of the E/A manual, which lists op-codes and instructions. Then looking at your code, for many instructions you can identify the op-code just by looking at the first 1, 2 or 3 hex digits, without having to get into instruction formats or fields. For example, if the op-code starts with a C then it's a MOV instruction. A 1D is SBO. 09 is SRL. 020 is LI. And so on.
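
The same shortcut can be written down as a lookup on leading hex digits. This Python sketch only contains the four prefixes mentioned above; `PREFIXES` and `guess_by_prefix` are made-up names, and a real table would need the rest of the list from the E/A manual.

```python
# Only the four examples from the post; extend from the manual's op-code list.
PREFIXES = {"C": "MOV", "1D": "SBO", "09": "SRL", "020": "LI"}

def guess_by_prefix(word):
    """Try the first 1, 2, then 3 hex digits of the word against the table."""
    digits = f"{word:04X}"
    for length in (1, 2, 3):
        name = PREFIXES.get(digits[:length])
        if name is not None:
            return name
    return None  # prefix not in this small table
```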



#5 insomnia OFFLINE  

insomnia

    Star Raider

  • 78 posts
  • Location:Pittsburgh, PA

Posted Sat Dec 2, 2017 5:47 PM

There's a working implementation in the tms9900 binutils package in binutils-2.19.1/opcodes/tms9900-dis.c. Look for the print_insn_tms9900 function. This function does basically the same thing mizapf describes.

 

Here's some pseudocode:

index = (opcode >> 12) & 0x0F              // first hex digit
switch(index)
{
  case 0:
    index = (opcode >> 8) & 0x0F           // second hex digit
    switch(index)
    {
      case 0, 1, 12, 13, 14, 15:
        // no TMS9900 instructions in these ranges, so all 16 entries are empty
        format[] = {"","","","","","","","","","","","","","","",""}
        break

      case 2, 3:
        // immediate and control instructions, spaced 0x20 apart from 0x0200
        // (0x0320 is unused, hence the gap between limi and idle)
        index = (opcode >> 4) & 0x1F
        format[] = {"li","", "ai","", "andi","", "ori","", "ci","", "stwp","", "stst","", "lwpi","", "limi","", "","", "idle","", "rset","", "rtwp","", "ckon","", "ckof","", "lrex",""}
        break

      case 4, 5, 6, 7:
        // single-operand instructions, spaced 0x40 apart from 0x0400
        index = (opcode >> 6) & 0x0F
        format[] = {"blwp", "b", "x", "clr", "neg", "inv", "inc", "inct", "dec", "dect", "bl", "swpb", "seto", "abs", "", ""}
        break

      case 8, 9, 10, 11:
        // shifts; index is still the second hex digit (8..11)
        format[] = {"", "", "", "", "", "", "", "", "sra", "srl", "sla", "src"}
        break
    }
    break

  case 1:
    // jumps and single-bit CRU instructions, selected by the second hex digit
    index = (opcode >> 8) & 0x0F
    format[] = {"jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc", "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"}
    break

  case 2, 3:
    index = (opcode >> 10) & 0x07
    format[] = {"coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"}
    break

  default:
    // two-operand instructions; index is still the first hex digit (4..15)
    format[] = {"", "", "", "", "szc", "szcb", "s", "sb", "c", "cb", "a", "ab", "mov", "movb", "soc", "socb"}
    break
}

if(format[index] != "")
  decode as format[index]
else
  invalid instruction

There's more to it than this, but this should give you somewhere to start.

 

Good Luck!



#6 ralphb OFFLINE  

ralphb

    Dragonstomper

  • 533 posts
  • Location:Germany

Posted Mon Dec 4, 2017 5:27 AM

It's not a straight lookup, unless you want a lookup table with 65536 entries. Each instruction format has a particular opcode length associated with it, so you have to iterate over all formats and see if the word you have, ANDed with a mask covering that format's opcode bits, matches any opcode with that format.

 

To see an implementation, see xda99.py in the xdt99 suite, search for "def decode".
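
For illustration, the iteration over formats might look like the Python sketch below. It is not the xda99 code: `FORMATS`, `opcode_mask` and `decode_word` are hypothetical names, and each format lists only a few of its opcodes.

```python
def opcode_mask(length):
    """Mask covering the top `length` bits of a 16-bit word."""
    return (0xFFFF << (16 - length)) & 0xFFFF

# (opcode length in bits, {opcode: mnemonic}); a partial, hand-made table.
FORMATS = [
    (4,  {0xC000: "MOV", 0xD000: "MOVB", 0xA000: "A", 0x8000: "C"}),
    (6,  {0x2000: "COC", 0x2800: "XOR"}),
    (8,  {0x1000: "JMP", 0x1300: "JEQ", 0x1D00: "SBO", 0x0900: "SRL"}),
    (10, {0x0440: "B", 0x04C0: "CLR", 0x0680: "BL"}),
    (11, {0x0200: "LI", 0x0220: "AI"}),
]

def decode_word(word):
    """Try each format: mask the word to that format's opcode length, then look it up."""
    for length, opcodes in FORMATS:
        name = opcodes.get(word & opcode_mask(length))
        if name is not None:
            return name, length
    return None  # no format matched
```

Since there are only a few distinct opcode lengths, this costs a handful of dictionary lookups per word. If a straight lookup is wanted after all, the same function could be run once over all 65536 words to fill the table.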



#7 matthew180 OFFLINE  

matthew180

    River Patroller

  • 2,412 posts
  • Location:Castaic, California

Posted Yesterday, 11:27 PM

9900 instruction decoding is pretty straightforward, especially when compared to the 8-bit CPUs of the day. The 9900 is very orthogonal and uses 7 (depending on how you count) so-called "instruction formats":
 

         0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
        ---------------------------------------------------------------+
1 arith  1 |opcode | B |  Td   |       D       |  Ts   |       S       |
2 arith  0   1 |opc| B |  Td   |       D       |  Ts   |       S       |
3 math   0   0   1 | --opcode- |     D or C    |  Ts   |       S       |
4 jump   0   0   0   1 | ----opcode--- |     signed displacement       |
5 shift  0   0   0   0   1 | --opcode- |       C       |       W       |
6 pgm    0   0   0   0   0   1 | ----opcode--- |  Ts   |       S       |
7 ctrl   0   0   0   0   0   0   1 | ----opcode--- |     not used      |
7 ctrl   0   0   0   0   0   0   1 | opcode & immd | X |       W       |

To isolate the format you use a priority encoder. Notice that within the first 7 bits, the position of the leftmost set bit is different for every format, so finding that bit identifies the format. After that the opcode field is 1..4 bits depending on the format, which means there are no more than 16 instructions in the most complex format. The other bits are used in various ways to identify what registers are being operated on, what kind of memory operations, etc. Notice, for example, that the source (S) field is *always* in bits 12..15 and the source-mode (Ts) is always bits 10..11, in all the formats that use Ts and S. Same for Td, D, and W.

 

If you approach this from a hardware perspective instead of a functional model (i.e. don't think like a programmer), it is really very easy to take apart the instructions and figure out what they are, and what they are operating on.
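
In software, the priority encoder amounts to counting the zero bits in front of the first set bit. The Python sketch below follows the rows of the table above; `field`, `classify`, `FORMAT_NAMES` and `OPCODE_BITS` are made-up names, and the bit positions are read off the table using its numbering (bit 0 is the leftmost bit).

```python
# Row labels from the table, indexed by the number of leading zero bits.
FORMAT_NAMES = ["arith", "arith", "math", "jump", "shift", "pgm", "ctrl"]
# (first, last) bit of the opcode field for each row, bit 0 = leftmost bit.
OPCODE_BITS = [(1, 2), (2, 2), (3, 5), (4, 7), (5, 7), (6, 9), (7, 10)]

def field(word, first, last):
    """Extract bits first..last, numbered from the left as in the table."""
    width = last - first + 1
    return (word >> (15 - last)) & ((1 << width) - 1)

def classify(word):
    """Return (format name, opcode field), or None if no bit in 0..6 is set."""
    word &= 0xFFFF
    zeros = 16 - word.bit_length()   # priority encoder: leading zero count
    if zeros > 6:
        return None
    first, last = OPCODE_BITS[zeros]
    return FORMAT_NAMES[zeros], field(word, first, last)

def source_operand(word):
    """Ts (bits 10..11) and S (bits 12..15), for formats that have them."""
    return field(word, 10, 11), field(word, 12, 15)
```

For the word 0460, `classify` gives `("pgm", 1)` and `source_operand` gives `(2, 0)`, that is, opcode 1 in the "pgm" row with Ts = 2 and S = 0.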



#8 Willsy OFFLINE  

Willsy

    River Patroller

  • Topic Starter
  • 3,029 posts
  • Location:Uzbekistan (no, really!)

Posted Today, 4:58 AM

Thank you Matthew. That is the clearest explanation I've ever seen. I was wondering how the 9900 did it. That's how.



