Re: [logs] Charset selection (Was: Re: EventLog library)

From: Mikael Olsson (mikael.olssonat_private)
Date: Tue Jan 07 2003 - 13:09:58 PST

    Rainer Gerhards wrote:
    > 
    > > > * output modules character conversion
    > > >   should we use a single character encoding on the wire
    > > >   (UTF8?) would it be
    > > >   mandatory or it should be configurable?
    > >
    > > Just one. I prefer plain ASCII ;)
    > 
    > I opt for configurable. ANSI does not work for double-byte character
    > sets. 
    
    Just a quick note on charsets.
    
    If the base-level standard is 7-bit ASCII (not 8-bit!), it is really
    easy to extend it to UTF-8 without breaking anything, since every 7-bit
    ASCII byte means exactly the same thing in UTF-8.
    Double-byte charsets are IMHO evil and should simply be avoided.
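
    A minimal Python sketch of why 7-bit ASCII is the safe base. The log
    line is a made-up example, and Latin-1 merely stands in for "some 8-bit
    charset"; neither comes from any proposal on this list.

```python
# 7-bit ASCII text is byte-for-byte identical when encoded as UTF-8.
line = 'host=fw1 user="alice" action=login'  # hypothetical log line
same_bytes = line.encode("ascii") == line.encode("utf-8")

# A non-ASCII character becomes a multi-byte sequence whose bytes are all
# >= 0x80, so it can never be mistaken for an ASCII delimiter.
o_umlaut = "\u00d6".encode("utf-8")  # O with diaeresis: bytes 0xC3 0x96
high_bytes_only = all(b >= 0x80 for b in o_umlaut)

# An 8-bit base charset does not extend cleanly: the single Latin-1 byte
# for the same character (0xD6) is not valid UTF-8 on its own.
try:
    "\u00d6".encode("latin-1").decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(same_bytes, high_bytes_only, latin1_is_valid_utf8)
```

    So an ASCII-only sender is already a valid UTF-8 sender, while an
    8-bit sender is not.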
    
    Just keep in mind that a log receiver that only understands ASCII could
    potentially parse a message COMPLETELY differently from one that
    understands UTF-8, since e.g. double quotes can be (mis)represented by
    overlong (non-shortest-form) UTF-8 sequences. :/  [1]
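
    To make that concrete, here is a rough Python sketch. lax_decode() is a
    hypothetical, deliberately sloppy decoder written only for this
    illustration (it handles 1- and 2-byte sequences and skips the
    shortest-form check); the message bytes are made up as well.

```python
def lax_decode(data: bytes) -> str:
    """Hypothetical sloppy UTF-8 decoder: no overlong (shortest-form) check."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain 7-bit ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # 2-byte sequence, accepted even when it encodes an ASCII value.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            # Anything else is replaced; enough for this illustration.
            out.append("\ufffd")
            i += 1
    return "".join(out)


# 0xC0 0xA2 is an overlong 2-byte encoding of '"' (0x22).
msg = b'user="bob\xc0\xa2 role=\xc0\xa2admin"'

ascii_quotes = msg.count(b'"')            # what an ASCII-only receiver sees
lax_quotes = lax_decode(msg).count('"')   # what the sloppy decoder sees

print(ascii_quotes, lax_quotes)
print(lax_decode(msg))
```

    The ASCII-only receiver sees two quotes, i.e. one long value for
    "user". The sloppy decoder sees four, i.e. user="bob" role="admin".
    Same bytes, two different log records.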
    
    
    -- 
    Mikael Olsson, Clavister AB
    Storgatan 12, Box 393, SE-891 28 ÖRNSKÖLDSVIK, Sweden
    Phone: +46 (0)660 29 92 00   Mobile: +46 (0)70 26 222 05
    Fax: +46 (0)660 122 50       WWW: http://www.clavister.com
    
    [1] I'm of the view that any UTF-8 generator that uses multi-byte UTF-8
    sequences to represent 7-bit ASCII characters is plain b0rken, and a
    UTF-8 parser should just refuse to listen to it.  It is unfortunate that
    it is even possible to _do_ this; the spec should have been built so
    that an encoded \x00 is \x80, but that's too late now.
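
    A sketch of the "refuse to listen" behaviour, assuming a receiver
    written in Python 3, whose built-in UTF-8 decoder rejects overlong
    sequences. accept_payload() is a hypothetical helper name, not part of
    any library.

```python
def accept_payload(data: bytes) -> bool:
    """Hypothetical receiver check: accept only strictly valid UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


ok_ascii = accept_payload(b'user="bob"')             # plain ASCII
ok_utf8 = accept_payload("\u00d6".encode("utf-8"))   # proper 2-byte sequence
bad_quote = accept_payload(b"\xc0\xa2")              # overlong '"'
bad_nul = accept_payload(b"\xc0\x80")                # overlong NUL
bad_quote3 = accept_payload(b"\xe0\x80\xa2")         # 3-byte overlong '"'

print(ok_ascii, ok_utf8, bad_quote, bad_nul, bad_quote3)
```

    The first two payloads are accepted and the three overlong ones are
    rejected, which is what one would want from every log receiver.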
    


