> If the base level standard is 7-bit ASCII (not! 8-bit), it is
> really easy to extend it to UTF-8 without breaking stuff.
> Double-byte charset stuff is IMHO evil and should just plain
> be avoided.

Well, isn't UTF-8 a kind of DBCS encoding? And have you followed the
limited acceptance Unicode receives in Japan? The problem is
statements like yours, IMHO. If I were Japanese, I wouldn't like to
read that the encoding I need to use to make things work is "evil".

I agree there are issues with DBCS and it is not easy to use. But
there is JIS, S-JIS and EUC, and we need to live with that. If we
don't, our standards will probably not be of any interest in those
markets that need DBCS. And over the years, these markets will
outgrow the others, at least in the number of people involved.

However, I agree that first steps should be taken first. Let's get an
initial version running with ANSI. Then let's think about what we can
do for other encodings. BTW: this is something that BEEP has already
solved ;)

> Just keep in mind that a log receiver that only understands
> ASCII could potentially parse a message COMPLETELY
> differently from one that understands UTF-8, since e.g.
> double quotes can be (mis)represented in alternate UTF-8
> encodings. :/ [1]

Agree - there should be an exchange on which charset is to be used...

> [1] I'm of the view that any UTF-8 generator that uses UTF-8
> escapes to represent 7-bit ASCII characters is plain b0rken,
> and an UTF-8 parser should just refuse to listen to it. It is
> unfortunate that it is even possible to _do_ this; the spec
> should have been built so that an encoded \x00 is \x80, but
> that's too late now.

Fully agree on that - including that it's too late ;)

Rainer
_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis
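
The concern in the quoted footnote [1] can be illustrated with a
minimal Python sketch. It is not from the thread, and the names
decode_two_byte and is_rejected_by_python are hypothetical, chosen
only for illustration. The bytes 0xC0 0xA2 are an "overlong" two-byte
form of the ASCII double quote (0x22): an ASCII-only receiver sees two
non-ASCII bytes, a lenient UTF-8 decoder sees a quote, and a strict
decoder refuses the input.

```python
def decode_two_byte(seq: bytes, strict: bool = True) -> str:
    """Decode a single 2-byte UTF-8 sequence (110xxxxx 10xxxxxx).

    Hypothetical helper for illustration only; it handles nothing
    but the 2-byte form.
    """
    if len(seq) != 2 or (seq[0] & 0xE0) != 0xC0 or (seq[1] & 0xC0) != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # Combine the 5 payload bits of the lead byte with the 6 payload
    # bits of the continuation byte.
    code_point = ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F)
    # A code point below 0x80 fits in one byte, so a 2-byte form of
    # it is overlong and must be refused by a strict parser.
    if strict and code_point < 0x80:
        raise ValueError("overlong encoding of an ASCII character")
    return chr(code_point)


# Overlong form of '"' (0x22): payload bits 00000 100010.
OVERLONG_QUOTE = b"\xc0\xa2"


def is_rejected_by_python(data: bytes) -> bool:
    """Report whether Python's built-in strict UTF-8 decoder refuses data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

With strict=False the sketch maps the overlong bytes to a double
quote, which is the mismatch the footnote warns about. With the
default strict=True it raises an error, as does Python's own
bytes.decode("utf-8").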