'unicode' vs URL encoding.

From: Cris Bailiff (c.bailiffat_private)
Date: Wed May 30 2001 - 06:46:20 PDT


    To eDvice Security Services,
    
    Your bugtraq item on the NetGAP appliance, describing how the NetGAP system
    misinterprets '%65', incorrectly calls it a 'unicode' encoding of the letter 'e'.
    
    This misconception has become prevalent in recent bugtraq postings, so I hope to
    clear it up for future reference: '%' encoding is used to encode any 'non-legal'
    characters in URL strings, and '%65' is simply the hexadecimal ASCII code (0x65)
    of 'e'. The bug is that NetGAP does not 'URL decode' the string before doing
    comparisons.
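    A minimal Python sketch of this class of bug, assuming a hypothetical filter
    meant to block any request under '/secret/' (the path and function names are
    illustrative only, not taken from NetGAP). The naive check compares the raw
    string, so '/s%65cret/' slips past it; removing the URL encoding with the
    standard urllib.parse.unquote before comparing closes that particular hole:

```python
from urllib.parse import unquote

BLOCKED_PREFIX = "/secret/"  # hypothetical protected area, for illustration only

def naive_is_blocked(path: str) -> bool:
    # Compares the raw request string, so '%65' is never treated as 'e'.
    return path.startswith(BLOCKED_PREFIX)

def decoded_is_blocked(path: str) -> bool:
    # Removes URL (%) encoding first, then compares.
    return unquote(path).startswith(BLOCKED_PREFIX)

request = "/s%65cret/index.html"
print(naive_is_blocked(request))    # False: the filter is bypassed
print(decoded_is_blocked(request))  # True
```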
    
    '%' (URL) encoding is *not* unicode encoding. Unicode is a multibyte character
    set, most of whose characters have values outside the 32-126 range of printable
    ASCII. When unicode characters are used in URLs, they are usually expressed in
    'utf-8' encoding, which uses a short sequence of bytes to encode a full unicode
    character. Many of the byte values used in the utf-8 encoding of unicode are
    illegal in URLs unless they are 'URL encoded' (% escaped), but not all %-escaped
    characters represent either utf-8 or unicode.
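    To illustrate the distinction with Python's standard library: '%65' decodes to a
    single ASCII byte, whereas a non-ASCII character is first encoded as utf-8 and
    only then %-escaped byte by byte. %-escaping can equally carry a byte that is not
    valid utf-8 at all, so the two layers really are separate:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# '%65' is just the byte 0x65, i.e. ASCII 'e'; no unicode is involved.
print(unquote_to_bytes("%65"))   # b'e'
print(unquote("%65"))            # e
print(quote("e"))                # e  ('e' is legal in a URL, so it needs no escape)

# A non-ASCII character is first encoded as utf-8 bytes...
print("é".encode("utf-8"))       # b'\xc3\xa9'
# ...and each of those bytes is then %-escaped (quote() uses utf-8 by default).
print(quote("é"))                # %C3%A9

# %-escaping can also carry a byte that is not valid utf-8 on its own.
print(unquote_to_bytes("%FF"))   # b'\xff'
```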
    
    The two are often confused because a number of recent Microsoft IIS
    vulnerabilities have been due to incorrect 'unicode' decoding and/or incorrect
    detection of utf-8 encoded unicode characters, some of which were due to
    ambiguities in the checking/removing of URL encoding. However, many more web
    server bugs stem solely from the common mistake of simply not removing URL
    encoding before doing security checks, such as the one demonstrated in NetGAP.
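    One example of such an ambiguity, assumed here purely for illustration, is double
    decoding. In the hypothetical Python sketch below (the path and functions are
    illustrative, not IIS code), a check runs after one decode, but the code that
    later uses the path decodes a second time, so '%252e' survives the check as
    '%2e' and finally becomes '.'. Decoding exactly once and using that same string
    for both the check and the lookup avoids the mismatch:

```python
from typing import Optional
from urllib.parse import unquote

def passes_check(path: str) -> bool:
    # Hypothetical security check: reject '..' after decoding once.
    return ".." not in unquote(path)

def buggy_resolve(path: str) -> str:
    # Hypothetical buggy path handling: decodes again after the check.
    return unquote(unquote(path))

def safe_resolve(path: str) -> Optional[str]:
    # Decode exactly once; check and use the *same* string.
    decoded = unquote(path)
    if ".." in decoded:
        return None            # rejected
    return decoded             # no further decoding happens after this point

request = "/scripts/%252e%252e/private/data.txt"
print(passes_check(request))    # True: one decode only yields '%2e%2e'
print(buggy_resolve(request))   # /scripts/../private/data.txt
print(safe_resolve(request))    # /scripts/%2e%2e/private/data.txt (taken literally)
```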
    
    I feel it is important to distinguish these two classes wherever possible, as
    common unicode decoding errors are likely to affect a variety of security-related
    software in future, even when that software has nothing to do with web
    applications or URL processing. Care in unicode handling is still required even
    when URL encoding issues have been correctly dealt with, and likewise, not using
    unicode does not prevent URL encoding from being a security problem.
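    A Python sketch of a unicode problem with no URL involved, assuming raw bytes
    read from a file or socket. A hypothetical lax utf-8 decoder (lax_utf8_decode is
    illustrative, handles only 1- and 2-byte forms, and is deliberately wrong) that
    accepts 'overlong' sequences turns the two bytes 0xC0 0xAF into '/', even though
    a byte-level check for '/' saw nothing. Python's own strict utf-8 codec rejects
    that sequence:

```python
def lax_utf8_decode(data: bytes) -> str:
    """Hypothetical lax decoder: handles ASCII and 2-byte utf-8 only,
    and (wrongly) does not reject overlong 2-byte forms."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and (data[i + 1] & 0xC0) == 0x80:
            # 110xxxxx 10yyyyyy -> xxxxxyyyyyy.
            # Missing check: a 2-byte result must be >= 0x80, else it is overlong.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02x} not handled by this sketch")
    return "".join(out)

raw = b"..\xc0\xafetc"   # raw bytes, e.g. from a file or socket (no URL involved)

print(b"/" in raw)             # False: a byte-level check finds no '/'
print(lax_utf8_decode(raw))    # ../etc: the overlong form became '/'
try:
    raw.decode("utf-8")        # Python's strict utf-8 codec
except UnicodeDecodeError:
    print("strict decoder rejected the overlong sequence")
```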
    
    Cris Bailiff
    c.bailiffat_private
    



    This archive was generated by hypermail 2b30 : Wed May 30 2001 - 10:17:55 PDT