> It's not at all clear to me a) that UTF-8 sequences are allowed in *any*
> HTTP headers (request or response) or b) how a server or client would decide

MS Internet Explorer has an option to "Always send URLs as UTF-8". The help text states that this option "Specifies whether to use UTF-8, a standard that defines characters so they are readable in any language. This enables you to exchange Internet addresses (URLs) that contain characters from any language." It is unclear whether IE sends UTF-8 URLs in requests, when sending links via e-mail, when saving bookmarks, or in some other case.

> 2) The UTF-8 rules are kinda funny. 0xFE and 0xFF are illegal everywhere,
> and other characters may be illegal depending on their placement, e.g. a
> "starting" octet with 2^7 on and 2^6 off, or a "subsequent" octet that
> doesn't have 2^7 on and 2^6 off. I wouldn't be surprised if some UTF-8
> parsing routines don't handle illegal characters gracefully, or if
> applications don't gracefully trap errors reported by the UTF-8 parsing
> routines, etc. This might be worth some testing.
>
> -Peter

I attempted to post a query regarding this a while back but it got rejected. A very thorough and robust Unicode sanity-checking routine would be highly useful (and probably such a thing exists; I've never had to deal with this).

z
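For anyone who wants to try the testing Peter suggests, here is a rough sketch of what such a sanity-checking routine might look like. It is not taken from any existing library; the function name `utf8_error` and its return convention are made up for illustration. It assumes the stricter modern definition of UTF-8 (RFC 3629: at most four octets per character, no overlong forms, no surrogates), which is tighter than the rules current when this thread was written.

```python
def utf8_error(data: bytes):
    """Return None if data is well-formed UTF-8, else (offset, reason).

    Hypothetical helper, assuming the RFC 3629 rules (max 4 octets).
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                  # 0xxxxxxx: plain ASCII
            i += 1
            continue
        if b < 0xC0:                  # 10xxxxxx: 2^7 on, 2^6 off in start position
            return (i, "continuation octet where a start octet was expected")
        if b < 0xE0:                  # 110xxxxx: one continuation octet follows
            need, cp, lowest = 1, b & 0x1F, 0x80
        elif b < 0xF0:                # 1110xxxx: two follow
            need, cp, lowest = 2, b & 0x0F, 0x800
        elif b < 0xF8:                # 11110xxx: three follow
            need, cp, lowest = 3, b & 0x07, 0x10000
        else:                         # 0xF8..0xFF, which includes 0xFE and 0xFF
            return (i, "octet can never start a sequence")
        for k in range(1, need + 1):
            if i + k >= n:
                return (i, "sequence truncated at end of input")
            c = data[i + k]
            if c & 0xC0 != 0x80:      # subsequent octets must be 10xxxxxx
                return (i + k, "expected a continuation octet")
            cp = (cp << 6) | (c & 0x3F)
        if cp < lowest:
            return (i, "overlong encoding")
        if 0xD800 <= cp <= 0xDFFF:
            return (i, "encoded surrogate")
        if cp > 0x10FFFF:
            return (i, "code point above U+10FFFF")
        i += need + 1
    return None


if __name__ == "__main__":
    for sample in (b"plain", b"\xc3\xa9", b"\x80", b"a\xff", b"\xc0\xaf"):
        print(sample, utf8_error(sample))
```

Reporting the offset of the first bad octet, rather than just true/false, is a deliberate choice here: it makes it easier to see which of the rules above a given parser or application is tripping over when fed malformed input.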
This archive was generated by hypermail 2b30 : Mon Jun 11 2001 - 09:53:22 PDT