Re: [logs] Charset selection (Was: Re: EventLog library)

From: Mikael Olsson (mikael.olssonat_private)
Date: Wed Jan 08 2003 - 03:32:25 PST


    Rainer Gerhards wrote:
    > > If the base level standard is 7-bit ASCII (not! 8-bit), it is
    > > really easy to extend it to UTF-8 without breaking stuff.
    > > Double-byte charset stuff is IMHO evil and should just plain
    > > be avoided.
    > Well, isn't UTF-8 a kind of DBCS encoding? And have you followed the
    > limited acceptance Unicode receives in Japan? The problem is statements
    > like yours IMHO. If I were Japanese, I wouldn't like to read that the
    > encoding I need to use to make things work is "evil".

    Japanese and Chinese writing systems are evil, too :)  </flamebait>

    Hrm, I might have gone a bit overboard there. DBCS using lead bytes
    might still be easy to use (it doesn't insert NULs, does it?).
    I was thinking more along the lines of Win32 Unicode, which I do
    believe is nothing but evil, partly from a storage/protocol point
    of view, but mostly from a programming point of view.
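
    A minimal sketch of the NUL concern, assuming "Win32 Unicode" means
    the 16-bit little-endian encoding (UTF-16LE) and using Shift_JIS as
    one example of a lead-byte DBCS. The helper c_strlen is hypothetical;
    it only mimics how a NUL-terminated C string routine would measure a
    byte buffer.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C strlen: count bytes up to (not including) the first NUL."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx


text = "log"
utf8 = text.encode("utf-8")        # one byte per ASCII character, no NULs
utf16 = text.encode("utf-16-le")   # every ASCII character is followed by a NUL byte

utf8_len = c_strlen(utf8)          # a byte-oriented routine sees the whole string
utf16_len = c_strlen(utf16)        # the same routine stops after the first character

# Assumption: Shift_JIS stands in here for "DBCS using lead bytes".
sjis = "日本語".encode("shift_jis")
sjis_has_nul = b"\x00" in sjis     # lead-byte encodings do not embed NULs in text
```

    Under these assumptions the 16-bit form is the only one of the three
    that a NUL-terminated string routine would cut short.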

    I've been forced to deal with Unicode in the past, only to get
    tripped up by such trivial facts as "how the HELL do you store
    a Unicode string in an SQL database?  -- Whoops, can't be done,
    unless you store it as a blob, and then you can't search on it".

    UTF-8 doesn't really have such problems.  It can be copied/stored/etc.
    with normal string management routines, as long as you keep the
    string intact and don't truncate it.  Is this also the case
    with DBCS encoding?
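
    A rough sketch of the truncation caveat, assuming the input is valid
    UTF-8. The function truncate_utf8 is a hypothetical helper, not part
    of any library: it backs up over continuation bytes (bit pattern
    10xxxxxx) so a cut never lands inside a multi-byte character. The
    last lines touch on the DBCS question using Shift_JIS as one example;
    other double-byte charsets may behave differently.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 to at most `limit` bytes without splitting a character."""
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut is mid-character: back up.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


raw = "ÖRN".encode("utf-8")        # b'\xc3\x96RN': the first character takes two bytes

naive = raw[:1]                    # plain byte slice splits the two-byte character
try:
    naive.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

safe = truncate_utf8(raw, 1)       # backs up to the previous character boundary

# Assumption: Shift_JIS as a sample DBCS. A trail byte may fall in the
# ASCII range, so a byte-wise search for an ASCII character can match
# the second half of a double-byte character.
sjis = "表".encode("shift_jis")
backslash_in_sjis = b"\\" in sjis
```

    In this sketch, copying and storing the bytes works the same for both
    encodings; the difference shows up when bytes are inspected one at a
    time, since UTF-8 never reuses ASCII values inside a multi-byte
    character.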
    Mikael Olsson, Clavister AB
    Storgatan 12, Box 393, SE-891 28 ÖRNSKÖLDSVIK, Sweden
    Phone: +46 (0)660 29 92 00   Mobile: +46 (0)70 26 222 05
    Fax: +46 (0)660 122 50       WWW:
    "Senex semper diu dormit"
    LogAnalysis mailing list
