Re: Re[2]: [logs] Logging: World Domination

cadamsat_private

On Thursday, August 22, 2002, at 09:58 , Bennett Todd wrote:
> 2002-08-21-17:59:58 Chris Adams:
>> On Wednesday, August 21, 2002, at 12:05 , Greg Black wrote:
>>> | if you propose something like this and don't use XML, the first
>>> | question you're going to get will invariably be "why didn't you
>>> | use XML?"
>>>
>>> To which a reasonable answer is: "because it sucks."
>>
>> Hyperbole is unlikely to prove very persuasive here.
>
> I can't speak for Greg's intent here, but I didn't take his comment
> as hyperbole, just a calm and reasonable statement of the well-known
> truth.

It's hyperbole. Your message is useful because it contains actual points
in favor as opposed to an unsupported statement which is obviously
exaggerated.

> He put it compactly. A more elaborate statement might be "XML is a
> very heavy-weight framework for constructing languages; while it may
> be valuable in certain contexts involving highly automated,
> distributed, and heterogenous maintenance of gigantic corpuses of

That sure sounds like logging to me. We need automated creation, 
distribution and analysis to work across numerous different platforms 
with products from hundreds of different vendors. That's the whole point
of interchange formats like XML - you can translate it into anything
you like when it gets to the final storage point but use a standard 
format to cross the vendor boundaries.

>> Not using XML means giving up everything from the existing
>> parsers and language support to the XML support many databases
>> are starting to have (given the value of a database for ad-hoc
>> queries, I'm inclined to say that's worth a little bloat to get
>> all of your logs into one).
>
> Ad-hoc queries are practical with various frameworks. XML-based
> databases are just the hairiest, slowest, most fragile one around.

Pure XML databases have improved considerably in the last couple years 
but I wasn't thinking about those so much as the fact that there are now 
a large number of tools to pull XML data into more conventional 
databases.
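
As a rough sketch of that second approach, here is how XML events might be
pulled into an ordinary SQL table using nothing but Python's standard
library. The <event> element and its host, timestamp and facility
attributes are assumptions made up for illustration, not an agreed format:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample log; the element and attribute names are assumptions.
SAMPLE = """<log>
  <event host="fw1.example.com" timestamp="1234567890" facility="firewall">denied tcp 10.0.0.5</event>
  <event host="mx1.example.com" timestamp="1234567895" facility="smtp">message accepted</event>
</log>"""


def load_events(xml_text, conn):
    """Insert every <event> in xml_text into the events table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(host TEXT, timestamp INTEGER, facility TEXT, message TEXT)"
    )
    rows = [
        (e.get("host"), int(e.get("timestamp")), e.get("facility"),
         (e.text or "").strip())
        for e in ET.fromstring(xml_text).iter("event")
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_events(SAMPLE, conn)

# Ad-hoc query: which hosts logged firewall events?
firewall_hosts = [
    row[0]
    for row in conn.execute("SELECT host FROM events WHERE facility = 'firewall'")
]
```

Once the events are in a table, the ad-hoc queries are plain SQL and the
XML is only the transport.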

> Giving up existing parsers and language support is only a negative
> if there exists a securely-written, high-performance, portable XML
> parser toolkit. I've never heard of one. So it sounds like giving up
> XML support tools would be beneficial for this application.

I get the impression that you haven't seriously looked at XML in several 
years. There are now a number of high-performance validating XML parsers 
available, both commercial and open source. Language support has also 
improved considerably - it's now increasingly common simply to give a 
function some XML and get back an (array|object|hash) with the contents 
or vice versa.

We could provide the same thing for any new format but that's a huge
amount of work we can get for free if we decide that while XML isn't
perfect it is good enough to do the job.

>> The size issue becomes a lot less of a problem if you've designed your
>> DTD properly (e.g. resisting the urge to be unnecessarily verbose -
>> <event host="..." timestamp="1234567890"> instead of
>> <event><ip_hostname>fqdn.example.com</ip_hostname><timestamp>Fri Feb 13
>> 15:31:30 PST 2009</timestamp>) and are using compression.
>
> But even:
>
> 	<event host="..." timestamp="1234567890">
>
> would seem to me to be less desirable than
>
> 	1234567890 ...
>
> I sure know which I'd rather parse.

The second one is easier. Unfortunately, "1234567890 ..." contains no 
information about what each of the fields actually means and we need to 
have a number of optional fields which are only applicable to certain 
classes of message or are vendor specific. If we use XML, we can add
those fields without changing the format. If we're using
whitespace-separated lists, we have to throw out everything and replace
it with some sort of tagged format once we realize that we'd like to do
more complex analysis and that an SMTP server has a fundamentally
different set of things to report than a firewall, database, web server
or storage manager.
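
To make the optional-field point concrete, here is a small sketch in
Python. The two events and every attribute name other than host and
timestamp are hypothetical, chosen only to show an SMTP server and a
firewall reporting different things through the same parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical events: the attribute names below are illustrative assumptions.
SMTP_EVENT = ('<event host="mx1" timestamp="1234567890" '
              'queue_id="A1B2C3" recipient="bob@example.com"/>')
FW_EVENT = ('<event host="fw1" timestamp="1234567891" '
            'src="10.0.0.5" dst_port="22" action="deny"/>')


def parse_tagged(line):
    """Return the event's fields as a dict keyed by attribute name."""
    return dict(ET.fromstring(line).attrib)


def parse_positional(line):
    """Whitespace-separated fields carry no names, only positions."""
    return line.split()


smtp = parse_tagged(SMTP_EVENT)
fw = parse_tagged(FW_EVENT)

# Common fields are found by name regardless of what else is present.
common = sorted(set(smtp) & set(fw))

# The positional form cannot say what its third field means.
positional = parse_positional("1234567891 fw1 10.0.0.5 22 deny")
```

The tagged parser needs no changes when a vendor adds a field, while the
positional one needs a separate field map for every class of message.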

Of course, there's a different answer to "I sure know which I'd rather 
parse": it'll take me less time to do "$events = XMLIn('syslog.xml')" 
than it will to parse anything.
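
The one-liner above assumes XMLIn from Perl's XML::Simple module. For
comparison, a rough Python analogue fits in a few lines; the xml_in helper
and the sample document here are hypothetical, written just for this
sketch:

```python
import xml.etree.ElementTree as ET


def xml_in(xml_text):
    """Parse a log document and return one dict per child element of the root."""
    events = []
    for elem in ET.fromstring(xml_text):
        record = dict(elem.attrib)
        text = (elem.text or "").strip()
        if text:
            # Element text, if any, is stored under an assumed "message" key.
            record["message"] = text
        events.append(record)
    return events


events = xml_in(
    '<log>'
    '<event host="web1" timestamp="1234567890">GET /index.html 200</event>'
    '<event host="db1" timestamp="1234567899"/>'
    '</log>'
)
```

All of the actual parsing is done by the library; the helper only decides
what shape the result takes.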

>> The processing time concern is more of a problem but XML parsers have
>> advanced considerably over the last few years. A well designed DTD
>> should be surprisingly close to something like the typical Perl script
>> which has to parse all of the slightly different variations of the same
>> syslog message.
>
> The point of this discussion (assuming I've understood Tina's intent
> properly) is to do away with the slightly different variations, to
> produce a canonical structured format suitable for highly automated
> processing. I've yet to see anything that XML would add to this,
> that I would like to see added.

There seemed to be a general consensus that we need a replacement for 
syslog which is tag based to allow different bits of information to be 
recorded in a structured fashion - that's why we need more than the 
simpler format you proposed can deliver. The question is whether we 
should invent our own format or use a standard format like XML. I think 
that the overhead of using XML will not be significant compared to using 
any other tagged format and that there's a big advantage to picking a 
widely used, well supported standard.

In a perfect world we'd have time to develop the One True Log format and 
legions of programmers to spend months providing support for it 
everywhere. Since that's not the case, I'm inclined to say that anything
which allows us to spend less time reinventing the wheel and more time 
on analysis is a good thing.

Chris

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
https://lists.shmoo.com/mailman/listinfo/loganalysis