Re: [logs] Charset selection (Was: Re: EventLog library)

mikael.olssonat_private

Rainer Gerhards wrote:
> 
> > If the base level standard is 7-bit ASCII (not! 8-bit), it is
> > really easy to extend it to UTF-8 without breaking stuff.
> > Double-byte charset stuff is IMHO evil and should just plain
> > be avoided.
> 
> Well, isn't UTF-8 a kind of DBCS encoding? And have you followed the
> limited acceptance Unicode receives in Japan? The problem is statements
> like yours, IMHO. If I were Japanese, I wouldn't like to read that the
> encoding I need to use to make things work is "evil".

Japanese and Chinese writing systems are evil, too :)  </flamebait>

Hrm, I might have gone a bit overboard there. DBCS using lead bytes
might still be easy to use (it doesn't insert NULs, does it?).
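
The NUL question can be checked directly. This is only a minimal
Python sketch: Shift_JIS is assumed as a stand-in for "DBCS using
lead bytes", UTF-16-LE as a stand-in for the Win32 wide-character
encoding, and the sample strings are arbitrary.

```python
# Assumption: Shift_JIS represents "DBCS with lead bytes" and
# UTF-16-LE represents Win32-style wide characters.
sample = "ログ log"

sjis = sample.encode("shift_jis")
utf16 = sample.encode("utf-16-le")

# A NUL byte would terminate the string early in C-style string routines.
sjis_has_nul = b"\x00" in sjis
utf16_has_nul = b"\x00" in utf16

# The trail byte of a double-byte character can still fall in the
# ASCII range: in Shift_JIS the second byte of this character is 0x5C,
# the same value as an ASCII backslash.
hyou = "表".encode("shift_jis")
trail_is_backslash = hyou[1:] == b"\\"

print("Shift_JIS contains NUL:", sjis_has_nul)
print("UTF-16-LE contains NUL:", utf16_has_nul)
print("Shift_JIS trail byte equals backslash:", trail_is_backslash)
```

So, under these assumptions, a lead-byte DBCS does not insert NULs,
but code that scans byte by byte for ASCII delimiters can still be
fooled by a trail byte.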

I was thinking more along the lines of Win32 Unicode, which I do
believe is nothing but evil, partly from a storage/protocol point
of view, but mostly from a programming point of view.
I've been forced to deal with Unicode in the past, only to get
tripped up by such trivial questions as "how the HELL do you store
a Unicode string in an SQL database?  -- Whoops, can't be done,
unless you store it as a blob, and then you can't search on it".

UTF-8 doesn't really have such problems.  It can be copied, stored,
and so on with normal string management routines, as long as you keep
the string intact and don't truncate it in the middle of a multi-byte
character.  Is this also the case with DBCS encodings?
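
To illustrate the UTF-8 claim, here is a minimal Python sketch. The
helper name truncate_utf8 and the sample string are made up for this
example; it shows that UTF-8 text contains no NUL bytes and no stray
ASCII bytes inside multi-byte characters, and how truncation can be
done without splitting a character.

```python
text = "ÖRNSKÖLDSVIK log"
data = text.encode("utf-8")

# No NUL bytes, so C-style string routines see the whole string.
has_nul = b"\x00" in data

# Every byte of a multi-byte UTF-8 sequence has the high bit set, so
# the ASCII bytes in the encoded data are exactly the ASCII characters
# of the original text.
ascii_bytes = bytes(b for b in data if b < 0x80)
ascii_chars = "".join(c for c in text if ord(c) < 0x80).encode("ascii")


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Return at most `limit` bytes without splitting a UTF-8 sequence.

    Hypothetical helper: continuation bytes match 10xxxxxx, so back up
    while the first excluded byte is a continuation byte.
    """
    if len(data) <= limit:
        return data
    end = limit
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# "Ö" takes two bytes in UTF-8, so a naive cut at one byte would split it.
short = truncate_utf8(data, 1)
print(has_nul, ascii_bytes == ascii_chars, short)
```

A naive data[:1] here would leave a lone lead byte that no longer
decodes, which is the truncation hazard mentioned above.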

-- 
Mikael Olsson, Clavister AB
Storgatan 12, Box 393, SE-891 28 ÖRNSKÖLDSVIK, Sweden
Phone: +46 (0)660 29 92 00   Mobile: +46 (0)70 26 222 05
Fax: +46 (0)660 122 50       WWW: http://www.clavister.com

"Senex semper diu dormit"
_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
http://lists.shmoo.com/mailman/listinfo/loganalysis