Re: [logs] Charset selection (Was: Re: EventLog library)

From: Bennett Todd (betat_private)
Date: Thu Jan 09 2003 - 09:59:26 PST


    2003-01-08T20:34:34 Darren Reed:
    > Is there a compelling reason to keep traffic between log daemons in
    > "text strings" rather than wrap them up in something else with byte
    > counts and no CR-LF stuff and just exchange typed data in a manner
    > that allows you to be ignorant of what character set is in use ?
    
    I'd argue rather that if we aren't going to ignore this issue, we
    should settle it by mandating strict 7-bit US-ASCII printables
    in the normal 8-bit embedding. If we produce a specification or
    implementation that's tolerant of 8-bit messages, we're setting
    ourselves up for a bomb to go off under our keisters down the
    road, when different log text processors apply radically different
    interpretations to the exact same logged message --- and some of
    those interpretations tickle bugs causing security problems.
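
A minimal sketch of the strict check argued for here, in Python. It assumes "printables" means the byte range 0x20 through 0x7E (space through tilde); the function name is hypothetical and the exact range is an assumption, not something the message specifies.

```python
def is_strict_ascii_printable(message: bytes) -> bool:
    """Return True only if every byte is a 7-bit US-ASCII printable.

    Assumption: "printable" is taken to mean 0x20..0x7E inclusive, so
    control characters (including CR and LF) and all bytes with the
    high bit set are rejected.
    """
    return all(0x20 <= b <= 0x7E for b in message)
```

A receiver applying a check like this never has to guess at a charset: a message either passes as plain ASCII or is refused outright.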
    
    If instead we force people who want to syslog kanji, or accented
    characters, or anything else outside of strict 7-bit US-ASCII, to
    use some encoding into US-ASCII, e.g. SGML entity references,
    then implementations would have the privilege of being blind to
    charsets without running the risk of introducing security
    problems.
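
One way the sender side of this could look, sketched in Python. It uses the standard `xmlcharrefreplace` codec error handler to turn each non-ASCII character into a numeric character reference. The function name is hypothetical, and escaping a literal "&" first is an added assumption so that a reader can tell references apart from ordinary text.

```python
def to_ascii_with_charrefs(text: str) -> bytes:
    """Encode text as 7-bit US-ASCII using numeric character references.

    Assumption: a literal "&" is written as "&amp;" so that decoding
    the references later is unambiguous. Control characters are not
    handled here and would need a separate rule.
    """
    escaped = text.replace("&", "&amp;")
    # Characters outside ASCII become "&#NNN;" (decimal code point).
    return escaped.encode("ascii", "xmlcharrefreplace")
```

The output contains only bytes below 0x80, so any intermediate log daemon can pass it along without knowing what charset the original text was in.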
    
    This isn't a critique of the appropriateness of the general concept
    of being binary-transparent and letting people pick interpretations
    that suit 'em; in many venues that works really well. But logging
    tends to lie fairly near to security concerns, and right now
    charsets are a fraught area, with different people advocating
    different solutions, applying different interpretations to
    8-bit-binary data, and in some cases opening unexpected ways to
    slip dangerous embedded characters past screeners trying to block
    them.
    
    Suppose someone wants to write a nice generic logfile viewer that
    presents sliced-n-diced log data to a web browser. They're already
    going to have to escape "<", ">", and "&" in the logged text
    before croaking it out at the browser. Let's not force them to also
    know every possible way anyone can ever invent to encode those in
    any possible multibyte charset.
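
A sketch of what the viewer's escaping step might look like in Python, assuming the logged text is already restricted to 7-bit US-ASCII. The function name is hypothetical; `html.escape` is the standard library routine for replacing "&", "<", and ">".

```python
import html


def render_log_line(raw: bytes) -> str:
    """Return a logged line made safe for inclusion in an HTML page.

    Assumption: the input is strict 7-bit US-ASCII. Decoding as ASCII
    raises UnicodeDecodeError on any byte with the high bit set, so no
    alternative encoding of "<", ">", or "&" can slip through.
    """
    text = raw.decode("ascii")
    # quote=False escapes only "&", "<", and ">", the three characters
    # named above.
    return html.escape(text, quote=False)
```

With the ASCII restriction in place, these three replacements are the whole job; the viewer needs no per-charset logic.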
    
    -Bennett
    
    
    



