On Thursday, August 22, 2002, at 09:58, Bennett Todd wrote:

> 2002-08-21-17:59:58 Chris Adams:
>> On Wednesday, August 21, 2002, at 12:05 , Greg Black wrote:
>>> | if you propose something like this and don't use XML, the first
>>> | question you're going to get will invariably be "why didn't you
>>> | use XML?"
>>>
>>> To which a reasonable answer is: "because it sucks."
>>
>> Hyperbole is unlikely to prove very persuasive here.
>
> I can't speak for Greg's intent here, but I didn't take his comment
> as hyperbole, just a calm and reasonable statement of the well-known
> truth.

It's hyperbole. Your message is useful because it contains actual points in favor, as opposed to an unsupported statement which is obviously exaggerated.

> He put it compactly. A more elaborate statement might be "XML is a
> very heavy-weight framework for constructing languages; while it may
> be valuable in certain contexts involving highly automated,
> distributed, and heterogenous maintenance of gigantic corpuses of

That sure sounds like logging to me. We need automated creation, distribution and analysis to work across numerous different platforms with products from hundreds of different vendors. That's the whole point of interchange languages like XML - you can translate it into anything you like when it gets to the final storage point, but use a standard format to cross the vendor boundaries.

>> Not using XML means giving up everything from the existing
>> parsers and language support to the XML support many databases
>> are starting to have (given the value of a database for ad-hoc
>> queries, I'm inclined to say that's worth a little bloat to get
>> all of your logs into one).
>
> Ad-hoc queries are practical with various frameworks. XML-based
> databases are just the hairiest, slowest, most fragile one around.

Pure XML databases have improved considerably in the last couple of years, but I wasn't thinking about those so much as the fact that there are now a large number of tools to pull XML data into more conventional databases.

> Giving up existing parsers and language support is only a negative
> if there exists a securely-written, high-performance, portable XML
> parser toolkit. I've never heard of one. So it sounds like giving up
> XML support tools would be beneficial for this application.

I get the impression that you haven't seriously looked at XML in several years. There are now a number of high-performance validating XML parsers available, both commercial and open source. Language support has also improved considerably - it's now increasingly common simply to give a function some XML and get back an (array|object|hash) with the contents, or vice versa. We could provide the same thing for any new format, but that's a huge amount of work which we get for free if we decide that, while XML isn't perfect, it is good enough to do the job.

>> The size issue becomes a lot less of a problem if you've designed your
>> DTD properly (e.g. resisting the urge to be unnecessarily verbose -
>> <event host="..." timestamp="1234567890"> instead of
>> <event><ip_hostname>fqdn.example.com</ip_hostname><timestamp>Fri Feb 13
>> 15:31:30 PST 2009</timestamp>) and are using compression.
>
> But even:
>
> <event host="..." timestamp="1234567890">
>
> would seem to me to be less desireable than
>
> 1234567890 ...
>
> I sure know which I'd rather parse.

The second one is easier. Unfortunately, "1234567890 ..." contains no information about what each of the fields actually means, and we need to have a number of optional fields which are only applicable to certain classes of message or are vendor specific. If we use XML, we don't have to change anything.
If we're using white-space separated lists, we have to throw out everything and replace it with some sort of tagged format when we realize that we'd like to do more complex analysis and that an SMTP server has a fundamentally different set of things it can report than a firewall, database, web server or storage manager.

Of course, there's a different answer to "I sure know which I'd rather parse": it'll take me less time to do "$events = XMLIn('syslog.xml')" than it will to parse anything.

>> The processing time concern is more of a problem but XML parsers have
>> advanced considerably over the last few years. A well designed DTD
>> should be surprisingly close to something like the typical Perl script
>> which has to parse all of the slightly different variations of the same
>> syslog message.
>
> The point of this discussion (assuming I've understood Tina's intent
> properly) is to do away with the slightly different variations, to
> produce a canonical structured format suitable for highly automated
> processing. I've yet to see anything that XML would add to this,
> that I would like to see added.

There seemed to be a general consensus that we need a replacement for syslog which is tag based, to allow different bits of information to be recorded in a structured fashion - that's why we need more than the simpler format you proposed can deliver. The question is whether we should invent our own format or use a standard format like XML. I think that the overhead of using XML will not be significant compared to using any other tagged format, and that there's a big advantage to picking a widely used, well supported standard.

In a perfect world we'd have time to develop the One True Log format and legions of programmers to spend months providing support for it everywhere. Since that's not the case, I'm inclined to say that anything which allows us to spend less time reinventing the wheel and more time on analysis is a good thing.
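
As a rough illustration of the XMLIn point above, here is a minimal Python sketch. Python's standard xml.etree.ElementTree stands in for the Perl XMLIn call, and the <log>/<event> layout, the class attribute, and the per-class attributes such as queue_id and action are all made up for the example, not a proposed DTD. It only shows that events of different classes can carry different optional fields without the reader having to change:

```python
import xml.etree.ElementTree as ET

# Hypothetical log document: every element and attribute name here is an
# assumption made for illustration, not part of any agreed format.
SAMPLE_LOG = """\
<log>
  <event host="mail.example.com" timestamp="1234567890" class="smtp"
         queue_id="A1B2C3" recipients="3"/>
  <event host="fw.example.com" timestamp="1234567891" class="firewall"
         action="deny" src="192.0.2.7" dport="22"/>
</log>
"""

COMMON_FIELDS = ("host", "timestamp", "class")


def load_events(xml_text):
    """Return one dict per <event>, holding whatever attributes it carries."""
    root = ET.fromstring(xml_text)
    return [dict(event.attrib) for event in root.iter("event")]


events = load_events(SAMPLE_LOG)
for event in events:
    # Fields shared by every event are read the same way for all classes...
    host, when = event["host"], int(event["timestamp"])
    # ...while class-specific fields are simply present or absent.
    extras = {k: v for k, v in event.items() if k not in COMMON_FIELDS}
    print(event["class"], host, when, extras)
```

A whitespace-separated line would need a new column convention for each of those extra fields; here the SMTP and firewall events go through the same loader unchanged.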

Chris

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
https://lists.shmoo.com/mailman/listinfo/loganalysis