Many people have purported to be able to go from the hex of shellcode back to the actual human-readable asm. Many of them don't seem to do it properly, so I started writing something of my own to do it. One of the biggest difficulties I had (specifically on x86) is demonstrated below.

Many assume that all you need to do is construct a big struct or array of the hex values of all the x86 instructions, then simply step through the shellcode, translating each value back to the corresponding asm instruction. This method is REALLY unreliable, and basically impossible, because x86 encodes some instructions differently depending on their operands. For example:

  0x80483b0 <main+20>:    mov    $0xb,%eax
  0x80483b5 <main+25>:    mov    %esi,%ebx

Two mov instructions that presumably have the same opcode, right!? So if we x/bx main+20 and main+25, the same hex opcode should presumably be there. This isn't the case:

  (gdb) x/bx main+20
  0x80483b0 <main+20>:    0xb8
  (gdb) x/bx main+25
  0x80483b5 <main+25>:    0x89

0xb8 is the "move immediate into register" form, with the destination register number OR'ed into its low 3 bits (eax is 0), while 0x89 is the "move register to register/memory" form, whose operands are encoded in a ModRM byte that follows the opcode.

If you get the Intel x86 developer's manuals (24319101.pdf) you can generally get a list of the hex opcodes for the instructions. We can see that MOV has many faces, one of which is 0x89, but as demonstrated above we can't rely on any single value as a general rule, so it is not as easy as it looks. Many "disassemblers" just construct a large matrix of opcodes, their sizes and such, but this really isn't accurate.

What I see most people have done is take the hex opcodes, convert them to binary, take the bits that correspond to the actual x86 instruction, OR them with the values of the operands (registers, etc.), then convert the result back to hex and test whether it matches the values in the shellcode string. This is VERY painstaking, and again considerably unreliable. I suggest perusing the gdb source code to see how it does the ORs and the rest of its x86 handling; opcodes/i386-dis.c (in the gdb source tree) is a good place to get started.
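To make the 0xb8/0x89 difference concrete, here is a minimal Python sketch that decodes just those two MOV forms. It assumes the standard IA-32 encodings of the two gdb lines above (b8 0b 00 00 00 and 89 f3; x/bx only shows the first byte of each), and the helper name decode_mov is hypothetical. It is not how i386-dis.c is structured; it only shows the register number OR'ed into the opcode in one case and carried in a separate ModRM byte in the other.

```python
import struct

# IA-32 register numbers 0-7, as they appear in the low 3 bits of an
# opcode such as B8+rd or in the reg and r/m fields of a ModRM byte.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov(code, offset=0):
    """Decode one MOV at code[offset]; return (AT&T text, length in bytes).

    Hypothetical helper handling only two forms:
      B8+rd imm32   mov $imm32,%r32   register number OR'ed into the opcode
      89 /r         mov %r32,%r32     operands carried in a ModRM byte
    """
    op = code[offset]
    if (op & 0xF8) == 0xB8:
        reg = op & 0x07  # low 3 bits select the destination register
        (imm,) = struct.unpack_from("<I", code, offset + 1)  # little-endian imm32
        return f"mov ${imm:#x},%{REGS32[reg]}", 5
    if op == 0x89:
        modrm = code[offset + 1]
        mod, reg, rm = modrm >> 6, (modrm >> 3) & 0x07, modrm & 0x07
        if mod != 3:
            raise ValueError("memory operands are not handled in this sketch")
        # For opcode 89, the reg field is the source and r/m is the destination.
        return f"mov %{REGS32[reg]},%{REGS32[rm]}", 2
    raise ValueError(f"opcode {op:#04x} is not handled in this sketch")


# Assumed full encodings of main+20 and main+25 from the gdb session.
code = bytes([0xB8, 0x0B, 0x00, 0x00, 0x00, 0x89, 0xF3])
pos = 0
while pos < len(code):
    text, length = decode_mov(code, pos)
    print(f"+{pos}: {text}")  # +0: mov $0xb,%eax  then  +5: mov %esi,%ebx
    pos += length
```

Both bytes decode to "mov", but only because each form is recognized separately; a single opcode-to-mnemonic entry could never produce both lines.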
When it comes down to it, x86 is VERY nasty. Good luck. I would start small and just keep building on the routines that do the conversion. Using the bitwise OR is just as good a method to start with as any; for most x86 shellcode, building a really rough matrix of conversion values and doing ORs has worked in most GENERAL cases.

On Tue, 8 Oct 2002, Sean Zadig wrote:
> Hi,
> I'm doing some research into creating variants of common attacks, but I ran
> into a problem of sorts. For most of the attacks I have, the shellcode
> consists of the overflow and the actual malicious code that is run. I want
> to be able to isolate the overflow from the rest of the shellcode and use
> that to create attack variants. Problem is, I don't know where one ends and
> the other begins! I figure if I turn the hex-encoded shellcode back into
> assembly code, I could probably figure it out. I'm familiar with how to do
> the reverse in gdb, but is it possible to do what I want? To restate:
> shellcode -> asm is what I need. If this is a simple thing, my apologies -
> but the security-basics list rejected my post =)
> -Sean Zadig
>
> -----
> Sean Zadig
> Student, UC Davis
> PGP Key ID: 0xDE44A79F
> 7EE1 C80A A0C1 B224 45CE F74B 5835 0115 DE44 A79F
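The "start small" approach, a rough matrix of opcode values plus ORs, might be sketched in Python as a linear sweep like the one below. Everything here is a hypothetical illustration: the table (TABLE, MODRM_OPS), the function name sweep, and the sample bytes are made up for this sketch; the table covers only a handful of opcodes common in Linux/x86 shellcode; only register-to-register ModRM forms are decoded; and any byte it doesn't recognize is printed as .byte and skipped. Because it sweeps linearly, it can desynchronize on data (return addresses, strings) embedded in an exploit buffer, so its output should be checked against gdb or objdump.

```python
import struct

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# Hypothetical "rough matrix": (base opcode, mask, length, AT&T template).
# With mask 0xF8 the low 3 bits of the byte are the register number that
# was OR'ed into the base; with mask 0xFF the byte must match exactly.
TABLE = [
    (0x40, 0xF8, 1, "inc %{r}"),
    (0x48, 0xF8, 1, "dec %{r}"),
    (0x50, 0xF8, 1, "push %{r}"),
    (0x58, 0xF8, 1, "pop %{r}"),
    (0x68, 0xFF, 5, "push ${imm32:#x}"),
    (0x90, 0xFF, 1, "nop"),
    (0x99, 0xFF, 1, "cltd"),
    (0xB0, 0xF8, 2, "mov ${imm8:#x},%{r8}"),
    (0xB8, 0xF8, 5, "mov ${imm32:#x},%{r}"),
    (0xCD, 0xFF, 2, "int ${imm8:#x}"),
]

# Opcodes followed by a ModRM byte; only register-to-register (mod == 3)
# forms are decoded, with reg as the source and r/m as the destination.
MODRM_OPS = {0x31: "xor", 0x89: "mov"}


def sweep(code):
    """Linear-sweep disassembly of code; returns a list of (offset, text)."""
    out = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        text, length = None, 1
        for base, mask, size, template in TABLE:
            if (op & mask) == base and pos + size <= len(code):
                fields = {"r": REGS32[op & 7], "r8": REGS8[op & 7]}
                if size == 2:
                    fields["imm8"] = code[pos + 1]
                elif size == 5:
                    fields["imm32"] = struct.unpack_from("<I", code, pos + 1)[0]
                text, length = template.format(**fields), size
                break
        if text is None and op in MODRM_OPS and pos + 1 < len(code):
            modrm = code[pos + 1]
            if modrm >> 6 == 3:
                src, dst = REGS32[(modrm >> 3) & 7], REGS32[modrm & 7]
                text, length = f"{MODRM_OPS[op]} %{src},%{dst}", 2
        if text is None:
            # Unknown or unhandled byte (including memory-operand and
            # truncated forms): print it raw and resume at the next byte.
            text = f".byte {op:#04x}"
        out.append((pos, text))
        pos += length
    return out


shellcode = bytes.fromhex("90 90 31 c0 50 68 2f 2f 73 68 89 e3 b0 0b cd 80")
for offset, text in sweep(shellcode):
    print(f"{offset:4d}: {text}")
```

On a buffer like this one, the run of nop (0x90) bytes and the point where recognizable instructions begin stand out, which may help with judging where padding ends and code starts; it will not, by itself, tell you which bytes are overwritten return addresses.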
This archive was generated by hypermail 2b30 : Tue Oct 08 2002 - 15:09:12 PDT