Re: shellcode -> asm?

From: Stephen (sa7oriat_private)
Date: Tue Oct 08 2002 - 13:53:09 PDT

    Many people have purported to be able to go from the hex of the shellcode
    back to the actual human-readable asm. Many of them don't seem to do it
    properly, so I started writing something of my own to do it. One of the
    biggest difficulties I had (specifically on x86) is demonstrated below.
    Many assume that all you need to do is construct a big struct or array of
    the hex values of all the x86 instructions, and simply step through the
    shellcode translating each byte back to the corresponding asm
    instruction. This method is REALLY unreliable, and is basically
    impossible to get right, because x86 encodes the same instruction with
    different opcodes depending on its operands, etc.
    
    for example:
    
    0x80483b0 <main+20>:    mov    $0xb,%eax
    0x80483b5 <main+25>:    mov    %esi,%ebx
    
    Two mov instructions that presumably have the same opcode, right!?
    So if we x/bx main+20 and main+25, the same hex opcode should presumably
    be there. This isn't the case.
    
    (gdb) x/bx main+20
    0x80483b0 <main+20>:    0xb8
    (gdb) x/bx main+25
    0x80483b5 <main+25>:    0x89
    
    If you get the Intel x86 developer's notes (24319101.pdf) you can
    generally get a list of the hex opcodes for the instructions.
    We can see that MOV has many faces, one of which is 0x89, but as
    demonstrated above, we can't rely on one opcode per instruction as a
    general rule, so it is not as easy as it looks.
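
    As a rough illustration of the two encodings above, here is a minimal
    Python sketch that decodes just these two MOV forms. Assumptions: 32-bit
    mode, no prefixes, and the bytes following each opcode (0b 00 00 00 and
    f3) are inferred from the standard encoding, since the gdb output above
    only shows the first byte. The names decode_one and REG32 are made up for
    this example.

```python
# Sketch only: handles two MOV encodings, 32-bit mode, no prefixes.
# Register numbers as used in x86 opcode and ModR/M fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_one(code, pos=0):
    """Return (AT&T-style text, length in bytes) for the instruction at pos."""
    op = code[pos]
    if (op & 0xF8) == 0xB8:
        # B8+rd: MOV r32, imm32. The low 3 bits of the opcode pick the register.
        imm = int.from_bytes(code[pos + 1:pos + 5], "little")
        return "mov    $%#x,%%%s" % (imm, REG32[op & 7]), 5
    if op == 0x89:
        # 89 /r: MOV r/m32, r32. The operands live in the following ModR/M byte.
        modrm = code[pos + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 3:  # register-to-register form only; memory forms not handled
            return "mov    %%%s,%%%s" % (REG32[reg], REG32[rm]), 2
    raise ValueError("unhandled opcode byte %#04x" % op)


if __name__ == "__main__":
    code = bytes.fromhex("b80b00000089f3")  # assumed bytes for main+20..main+26
    pos = 0
    while pos < len(code):
        text, length = decode_one(code, pos)
        print(text)
        pos += length
```

    Note that the first instruction is 5 bytes and the second is 2, so a
    decoder also has to get the length right before it can find the next
    opcode.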
    
    Many "disassemblers" just construct a large matrix of opcodes, their sizes
    and such, but this really isn't accurate. What I see most people have
    done is take the hex opcodes, convert them to binary, take the bits that
    correspond to the actual x86 instruction, OR them with the values of the
    operands of the operation (registers, etc.), convert the result back to
    hex, and test whether it matches the values in the shellcode string.
    This is VERY painstaking, and again considerably unreliable. I suggest
    perusing the source code of gdb to see how it handles the operand bits
    and the rest of the x86 decoding. opcodes/i386-dis.c is a good place to
    get started (in the gdb src tree).
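
    The OR approach described above could look roughly like this in Python.
    This is a sketch under assumptions, not how gdb does it: it covers only
    "MOV r32, imm32" (base opcode 0xB8) and the register-to-register form of
    opcode 0x89, and the helper names are hypothetical.

```python
# Sketch of the "OR the operand bits into the opcode" matching approach.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def mov_imm32_opcode(reg):
    """Opcode byte for MOV r32, imm32: base 0xB8 ORed with the register number."""
    return 0xB8 | REG32.index(reg)


def modrm_reg_reg(src, dst):
    """ModR/M byte for a register-to-register operation.

    mod = 11 (register direct), reg field = source, r/m field = destination.
    """
    return 0xC0 | (REG32.index(src) << 3) | REG32.index(dst)


def match_mov_reg_reg(code, pos=0):
    """Try every src/dst pair and return the one whose encoding matches."""
    for src in REG32:
        for dst in REG32:
            candidate = bytes([0x89, modrm_reg_reg(src, dst)])
            if code[pos:pos + 2] == candidate:
                return src, dst
    return None


if __name__ == "__main__":
    print(hex(mov_imm32_opcode("eax")))
    print(hex(modrm_reg_reg("esi", "ebx")))
    print(match_mov_reg_reg(bytes.fromhex("89f3")))
```

    Building every candidate and comparing is what makes this painstaking.
    Masking the bits out of the shellcode byte instead (the reverse of the
    OR) gives the same answer without the search.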
    
    When it comes down to it, x86 is VERY nasty. Good luck. I would try to
    start small, and just keep building upon the routines that do the
    conversion. Using the bitwise OR is just as good a method to start with as
    any. For most x86 shellcode, building a really rough matrix of conversion
    values and doing ORs has worked in most GENERAL cases.
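
    One way to "start small and keep building" is a table of (mask, value)
    patterns that grows one entry at a time, with unknown bytes dumped
    as-is. The Python sketch below is only an illustration: the table layout
    and function names are hypothetical, the three entries are all
    fixed-length instructions without a ModR/M byte, and after an unknown
    byte the decoder may well be out of sync with the real instruction
    boundaries.

```python
# Sketch of a small, extensible table-driven decoder (32-bit mode, no prefixes).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fmt_nop(code, pos):
    return "nop"


def fmt_int(code, pos):
    # CD ib: INT imm8
    return "int    $%#x" % code[pos + 1]


def fmt_mov_imm32(code, pos):
    # B8+rd: MOV r32, imm32 (register number in the low 3 opcode bits)
    imm = int.from_bytes(code[pos + 1:pos + 5], "little")
    return "mov    $%#x,%%%s" % (imm, REG32[code[pos] & 7])


# (mask, value, total length in bytes, formatter). Add rows as you go.
TABLE = [
    (0xFF, 0x90, 1, fmt_nop),
    (0xFF, 0xCD, 2, fmt_int),
    (0xF8, 0xB8, 5, fmt_mov_imm32),
]


def disassemble(code):
    """Step through code linearly and return a list of text lines."""
    out = []
    pos = 0
    while pos < len(code):
        for mask, value, length, fmt in TABLE:
            if (code[pos] & mask) == value and pos + length <= len(code):
                out.append(fmt(code, pos))
                pos += length
                break
        else:
            # Unknown opcode: emit the raw byte and move on by one.
            out.append(".byte  %#04x" % code[pos])
            pos += 1
    return out


if __name__ == "__main__":
    for line in disassemble(bytes.fromhex("90b80b000000cd80")):
        print(line)
```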
    
    
    On Tue, 8 Oct 2002, Sean Zadig wrote:
    
    > Hi,
    > I'm doing some research into creating variants of common attacks, but I ran
    > into a problem of sorts. For most of the attacks I have, the shellcode
    > consists of the overflow and the actual malicious code that is run. I want
    > to be able to isolate the overflow from the rest of the shellcode and use
    > that to create attack variants. Problem is, I don't know where one ends and
    > the other begins! I figure if I turn the hex-encoded shellcode back into
    > assembly code, I could probably figure it out. I'm familiar with how to do
    > the reverse in gdb, but is it possible to do what I want? To restate:
    > shellcode -> asm is what I need. If this is a simple thing, my apologies -
    > but the security-basics list rejected my post =)
    >    -Sean Zadig
    >
    > -----
    > Sean Zadig
    > Student, UC Davis
    > PGP Key ID: 0xDE44A79F
    > 7EE1 C80A A0C1 B224 45CE  F74B 5835 0115 DE44 A79F
    


