Re: Webtrends HTTP Server %20 bug (UTF-8)

From: Peter W (peterwat_private)
Date: Fri Jun 08 2001 - 12:40:56 PDT

    On Fri, Jun 08, 2001 at 04:51:57AM +0100, Glynn Clements wrote:
    > 
    > Eric Hacker wrote:
    
    > > Conveniently, UTF8 uses the same
    > > values as ASCII for ASCII representation. Above the standard ASCII 127
    > > character representation, UTF8 uses multi-byte strings beginning with 0xC1.
    > 
    > No; the sequences for codes 128 to 255 begin with 0xC2 and 0xC3
    
    And encodings for 256 through 2^31 - 1 use other values in the first octet.
    
    Two points here:
    
     1) Eric wrote "As a URL cannot contain spaces or other special characters, 
    URL encoding is used to transport them. Thus all UTF8 characters above ASCII 
    are supposed to be URL encoded in order to be sent."
    
    It's not at all clear to me a) that UTF-8 sequences are allowed in *any*
    HTTP headers (request or response), or b) how a server or client would
    decide whether a possible UTF-8 sequence like %C3%B3 is UTF-8 for the single
    value 0xF3 or the two-character sequence 0xC3 + 0xB3. All indications in the
    RFCs (2068, 1738, 1808) suggest that only single-octet characters
    0x00 - 0xFF are expected in the various headers, and that no UTF-8,
    double-byte, or other multi-octet representations are allowed.
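
    To make the ambiguity in b) concrete, here is a minimal Python sketch. It
    assumes the receiver has already percent-decoded the URL into raw octets;
    the variable names are illustrative and not taken from any server. Nothing
    in the octets themselves says which reading is intended.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding only yields octets; it says nothing about their charset.
raw = unquote_to_bytes("%C3%B3")

# Reading 1: treat the octets as UTF-8, giving a single character.
as_utf8 = raw.decode("utf-8")

# Reading 2: treat each octet as its own character (ISO-8859-1 style).
as_latin1 = raw.decode("latin-1")

print([hex(ord(c)) for c in as_utf8])    # ['0xf3']
print([hex(ord(c)) for c in as_latin1])  # ['0xc3', '0xb3']
```

    Both decodes succeed, so the choice has to come from a convention outside
    the data itself.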
    
     2) The UTF-8 rules are kinda funny. 0xFE and 0xFF are illegal everywhere,
    and other octets may be illegal depending on their placement, e.g. a
    "starting" octet with 2^7 on and 2^6 off (10xxxxxx), or a "subsequent"
    octet that doesn't have 2^7 on and 2^6 off. I wouldn't be surprised if some
    UTF-8 parsing routines don't handle illegal sequences gracefully, or if
    applications don't gracefully trap errors reported by the UTF-8 parsing
    routines, etc. This might be worth some testing.
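
    As a starting point for that kind of testing, the sketch below classifies
    octets by the bit rules described above and feeds a few malformed inputs
    to a strict decoder. The helper names octet_kind and try_decode are made
    up for illustration. Python's built-in decoder is used as a stand-in for
    "a UTF-8 parsing routine"; it follows the later, stricter definition of
    UTF-8 and so rejects somewhat more than the rules listed here.

```python
def octet_kind(b: int) -> str:
    """Classify one octet by its high bits."""
    if b in (0xFE, 0xFF):
        return "illegal"        # never valid anywhere in UTF-8
    if b & 0x80 == 0:
        return "ascii"          # 0xxxxxxx
    if b & 0xC0 == 0x80:
        return "continuation"   # 10xxxxxx: 2^7 on, 2^6 off
    return "lead"               # 11xxxxxx: starts a multi-octet sequence


def try_decode(data: bytes) -> str:
    """Decode strictly, trapping the error instead of letting it propagate."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"rejected at offset {exc.start}"


samples = [
    b"\xc3\xb3",  # well-formed two-octet sequence
    b"\xb3",      # continuation octet in the starting position
    b"\xc3A",     # lead octet followed by a non-continuation octet
    b"\xff",      # octet that is illegal everywhere
]
for s in samples:
    print(s, [octet_kind(b) for b in s], try_decode(s))
```

    An application that calls the decoder without the try/except is the case
    worth looking for: the parser reports the error, but nothing traps it.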
    
    -Peter
    


