2003-01-08T20:34:34 Darren Reed:
> Is there a compelling reason to keep traffic between log daemons in
> "text strings" rather than wrap them up in something else with byte
> counts and no CR-LF stuff and just exchange typed data in a manner
> that allows you to be ignorant of what character set is in use ?

I'd argue rather that if we aren't going to ignore this issue, we
should settle it by mandating strict 7-bit US-ASCII printables in the
normal 8bit embedding.

If we produce a specification or implementation that's tolerant of
8bit messages, we're setting ourselves up for a bomb to go off under
our kiesters down the road, when different log text processors apply
radically different interpretations to the exact same logged message
--- and some of those interpretations tickle bugs causing security
problems.

If instead we force people who want to syslog kanji, or accented
characters, or anything else outside of strict 7bit US-ASCII to go
with some encoding onto US-ASCII, like e.g. SGML entity references;
then we'd have the characteristic that implementations would have the
privilege of being blind to charsets without running a risk of
introducing security problems.

This isn't a critique of the appropriateness of the general concept of
being binary-transparent and letting people pick interpretations that
suit 'em; in many venues that works really well. But logging tends to
lie fairly near to security concerns, and right now charsets are a
fraught area, with different people advocating different solutions,
applying different interpretations to 8-bit-binary data, and in some
cases opening unexpected ways to slip dangerous embedded characters
past screeners trying to block them.

Suppose someone wants to write a nice generic logfile viewer, that
presents sliced-n-diced log data to a web browser. They're already
going to be having to escape "<", ">", and "&" in the logged text
before croaking it out at the browser.
Let's not force them to also know every possible way anyone can ever invent to encode those in any possible multibyte charset.

-Bennett
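
A minimal sketch of the encoding-onto-US-ASCII idea argued for above, in Python. Everything in it is an assumption for illustration, not part of any syslog specification: the function names to_ascii_log_text and render_for_browser are hypothetical, decimal numeric character references ("&#233;") stand in for "SGML entity references", and "&" is escaped as "&amp;" on the sending side so the encoding stays unambiguous.

```python
import html


def to_ascii_log_text(message: str) -> str:
    """Encode a message as strict printable 7-bit US-ASCII (0x20..0x7E).

    Assumption: anything outside that range, including control characters,
    becomes a decimal numeric character reference such as "&#233;".
    """
    out = []
    for ch in message:
        code = ord(ch)
        if ch == "&":
            # Escape the escape character itself so a literal "&#233;"
            # typed by a user cannot be confused with an encoded one.
            out.append("&amp;")
        elif 0x20 <= code <= 0x7E:
            out.append(ch)
        else:
            out.append(f"&#{code};")
    return "".join(out)


def render_for_browser(log_text: str) -> str:
    """Escape "<", ">", and "&" before sending logged text to a browser.

    Because the input is already plain ASCII, the viewer needs no
    knowledge of any character set to do this safely.
    """
    return html.escape(log_text, quote=False)


if __name__ == "__main__":
    wire = to_ascii_log_text("login failed for caf\u00e9 <admin>")
    print(wire)
    print(render_for_browser(wire))
```

With this approach the viewer only ever sees bytes in the printable ASCII range, so its "<", ">", "&" escaping is the whole job; whether it also decodes the references back into characters for display is a separate choice it can make or skip.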