2002-08-21-17:59:58 Chris Adams:
> On Wednesday, August 21, 2002, at 12:05, Greg Black wrote:
> >| if you propose something like this and don't use XML, the first
> >| question you're going to get will invariably be "why didn't you
> >| use XML?"
> >
> >To which a reasonable answer is: "because it sucks."
>
> Hyperbole is unlikely to prove very persuasive here.

I can't speak for Greg's intent here, but I didn't take his comment as
hyperbole, just a calm and reasonable statement of the well-known truth.
He put it compactly. A more elaborate statement might be "XML is a very
heavy-weight framework for constructing languages; while it may be
valuable in certain contexts involving highly automated, distributed,
and heterogeneous maintenance of gigantic corpuses of structured text,
both XML the language specification standard and also the tools
available to help implement it are vastly too complex for many, perhaps
most, of the jobs to which people try to apply it."

Or, in short, XML sucks. At least for anything besides a somewhat
cleaned-up replacement for SGML.

> Not using XML means giving up everything from the existing
> parsers and language support to the XML support many databases
> are starting to have (given the value of a database for ad-hoc
> queries, I'm inclined to say that's worth a little bloat to get
> all of your logs into one).

Ad-hoc queries are practical with various frameworks. XML-based
databases are just the hairiest, slowest, most fragile ones around.

Giving up existing parsers and language support is only a negative if
there exists a securely-written, high-performance, portable XML parser
toolkit. I've never heard of one. So it sounds like giving up XML
support tools would be beneficial for this application.

> The size issue becomes a lot less of a problem if you've designed your
> DTD properly (e.g. resisting the urge to be unnecessarily verbose -
> <event host="..." timestamp="1234567890"> instead of
> <event><ip_hostname>fqdn.example.com</ip_hostname><timestamp>Fri Feb 13
> 15:31:30 PST 2009</timestamp>) and are using compression.

But even:

    <event host="..." timestamp="1234567890">

would seem to me to be less desirable than

    1234567890 ...

I sure know which I'd rather parse.

> The processing time concern is more of a problem but XML parsers have
> advanced considerably over the last few years. A well designed DTD
> should be surprisingly close to something like the typical Perl script
> which has to parse all of the slightly different variations of the same
> syslog message.

The point of this discussion (assuming I've understood Tina's intent
properly) is to do away with the slightly different variations, to
produce a canonical structured format suitable for highly automated
processing. I've yet to see anything that XML would add to this that I
would like to see added.

> In both cases, neither would be a significant problem even now and
> Moore's law suggests this won't change for the worse.

Extra complexity needs some justification. How will XML improve our
position relative to a few fixed fields followed by
hierarchically-assigned tokens? A whitespace-separated token list is
sufficiently expressive for everything I've heard claimed that we want
to do now; and the increased flexibility that XML offers would seem to
me to be a negative for this job.

Or, to put it succinctly, "XML sucks".

-Bennett
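
For illustration only, here is a minimal sketch of how little code a
whitespace-separated format of the kind argued for above might need.
The exact layout is an assumption, not anything proposed in this thread:
a Unix timestamp, a host name, then zero or more tokens whose hierarchy
is written with dots. The names Event and parse_event, and the sample
tokens, are hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int                  # seconds since the Unix epoch
    host: str                       # reporting host
    tokens: list[tuple[str, ...]]   # each token split into its hierarchy levels


def parse_event(line: str) -> Event:
    """Parse one 'timestamp host token...' line (assumed, hypothetical layout)."""
    fields = line.split()  # split on any run of whitespace
    if len(fields) < 2:
        raise ValueError(f"expected at least a timestamp and a host: {line!r}")
    timestamp = int(fields[0])  # raises ValueError if not an integer
    host = fields[1]
    # Assumed convention: 'auth.login.failure' means auth > login > failure.
    tokens = [tuple(tok.split(".")) for tok in fields[2:]]
    return Event(timestamp, host, tokens)


if __name__ == "__main__":
    event = parse_event("1234567890 fqdn.example.com auth.login.failure user.root")
    print(event.timestamp)
    print(event.host)
    print(event.tokens)
```

A format like this has no escaping rule, so a token can never contain
whitespace. Whether that limit is acceptable is exactly the kind of
design decision a canonical format would have to settle.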
This archive was generated by hypermail 2b30 : Thu Aug 22 2002 - 10:13:51 PDT