Re: Re[2]: [logs] Logging: World Domination

From: Chris Adams (cadamsat_private)
Date: Thu Aug 22 2002 - 12:25:18 PDT

    On Thursday, August 22, 2002, at 09:58 , Bennett Todd wrote:
    > 2002-08-21-17:59:58 Chris Adams:
    >> On Wednesday, August 21, 2002, at 12:05 , Greg Black wrote:
    >>> | if you propose something like this and don't use XML, the first
    >>> | question you're going to get will invariably be "why didn't you
    >>> | use XML?"
    >>>
    >>> To which a reasonable answer is: "because it sucks."
    >>
    >> Hyperbole is unlikely to prove very persuasive here.
    >
    > I can't speak for Greg's intent here, but I didn't take his comment
    > as hyperbole, just a calm and reasonable statement of the well-known
    > truth.
    
    It's hyperbole. Your message is useful because it contains actual points
    in favor, as opposed to an unsupported statement which is obviously
    exaggerated.
    
    > He put it compactly. A more elaborate statement might be "XML is a
    > very heavy-weight framework for constructing languages; while it may
    > be valuable in certain contexts involving highly automated,
    > distributed, and heterogenous maintenance of gigantic corpuses of
    
    That sure sounds like logging to me. We need automated creation,
    distribution and analysis to work across numerous different platforms
    with products from hundreds of different vendors. That's the whole point
    of interchange languages like XML: you can translate it into anything
    you like when it gets to the final storage point, but use a standard
    format to cross the vendor boundaries.
    
    >> Not using XML means giving up everything from the existing
    >> parsers and language support to the XML support many databases
    >> are starting to have (given the value of a database for ad-hoc
    >> queries, I'm inclined to say that's worth a little bloat to get
    >> all of your logs into one).
    >
    > Ad-hoc queries are practical with various frameworks. XML-based
    > databases are just the hairiest, slowest, most fragile one around.
    
    Pure XML databases have improved considerably in the last couple years 
    but I wasn't thinking about those so much as the fact that there are now 
    a large number of tools to pull XML data into more conventional 
    databases.
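
A minimal sketch of that kind of import, using only the Python standard
library. The <log>/<event> layout, the attribute names and the table
schema are assumptions made up for illustration; nothing in this thread
has defined them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical log fragment; element and attribute names are assumptions.
SAMPLE = """\
<log>
  <event host="mail.example.com" timestamp="1030044318" service="smtp">queue full</event>
  <event host="fw.example.com" timestamp="1030044320" service="firewall">denied tcp</event>
</log>
"""

def load_events(xml_text, conn):
    """Copy each <event> into a conventional SQL table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(host TEXT, timestamp INTEGER, service TEXT, message TEXT)"
    )
    rows = [
        (e.get("host"), int(e.get("timestamp")), e.get("service"),
         (e.text or "").strip())
        for e in ET.fromstring(xml_text).iter("event")
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
load_events(SAMPLE, conn)

# An ad-hoc query once the events are in the database.
firewall_hosts = conn.execute(
    "SELECT host FROM events WHERE service = 'firewall'"
).fetchall()
```

Once the events are in a table, ad-hoc queries are ordinary SQL and the
XML only has to survive the trip between vendors.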
    
    > Giving up existing parsers and language support is only a negative
    > if there exists a securely-written, high-performance, portable XML
    > parser toolkit. I've never heard of one. So it sounds like giving up
    > XML support tools would be beneficial for this application.
    
    I get the impression that you haven't seriously looked at XML in several 
    years. There are now a number of high-performance validating XML parsers 
    available, both commercial and open source. Language support has also 
    improved considerably - it's now increasingly common simply to give a 
    function some XML and get back an (array|object|hash) with the contents 
    or vice versa.
    
    We could provide the same thing for any new format, but that's a huge
    amount of work we can get for free if we decide that, while XML isn't
    perfect, it is good enough to do the job.
    
    >> The size issue becomes a lot less of a problem if you've designed your
    >> DTD properly (e.g. resisting the urge to be unnecessarily verbose -
    >> <event host="..." timestamp="1234567890"> instead of
    >> <event><ip_hostname>fqdn.example.com</ip_hostname><timestamp>Fri Feb 13
    >> 15:31:30 PST 2009</timestamp>) and are using compression.
    >
    > But even:
    >
    > 	<event host="..." timestamp="1234567890">
    >
    > would seem to me to be less desirable than
    >
    > 	1234567890 ...
    >
    > I sure know which I'd rather parse.
    
    The second one is easier. Unfortunately, "1234567890 ..." contains no
    information about what each of the fields actually means, and we need
    a number of optional fields which are only applicable to certain
    classes of message or are vendor specific. An SMTP server has a
    fundamentally different set of things it can report than a firewall,
    database, web server or storage manager. If we use XML, we don't have
    to change anything to accommodate that. If we're using white-space
    separated lists, we have to throw out everything and replace it with
    some sort of tagged format as soon as we realize that we'd like to do
    more complex analysis.
    
    Of course, there's a different answer to "I sure know which I'd rather 
    parse": it'll take me less time to do "$events = XMLIn('syslog.xml')" 
    than it will to parse anything.
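
To make the optional-field point concrete, here is a rough Python
analogue of that one-liner. It is not XML::Simple itself: xml_in() is a
hypothetical helper, and the attribute names (class, queue_id, src,
dport) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two events from different kinds of source, each with its own optional
# attributes. All names here are assumptions for illustration.
SAMPLE = """\
<log>
  <event host="mail.example.com" timestamp="1030044318" class="smtp" queue_id="A1B2C3"/>
  <event host="fw.example.com" timestamp="1030044320" class="firewall" src="192.0.2.7" dport="22"/>
</log>
"""

def xml_in(xml_text):
    """Return one dict per <event>, keeping whatever attributes it carries."""
    return [dict(e.attrib) for e in ET.fromstring(xml_text).iter("event")]

events = xml_in(SAMPLE)

# Fields are looked up by name, so an attribute that only firewalls emit
# does not disturb the handling of SMTP events.
firewall = [e for e in events if e["class"] == "firewall"]
```

The loader never has to know which fields exist; a whitespace-separated
line would need a new positional layout for each class of message.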
    
    >> The processing time concern is more of a problem but XML parsers have
    >> advanced considerably over the last few years. A well designed DTD
    >> should be surprisingly close to something like the typical Perl script
    >> which has to parse all of the slightly different variations of the same
    >> syslog message.
    >
    > The point of this discussion (assuming I've understood Tina's intent
    > properly) is to do away with the slightly different variations, to
    > produce a canonical structured format suitable for highly automated
    > processing. I've yet to see anything that XML would add to this,
    > that I would like to see added.
    
    There seemed to be a general consensus that we need a replacement for
    syslog which is tag-based to allow different bits of information to be
    recorded in a structured fashion - that's why we need more than the
    simpler format you proposed can deliver. The question is whether we 
    should invent our own format or use a standard format like XML. I think 
    that the overhead of using XML will not be significant compared to using 
    any other tagged format and that there's a big advantage to picking a 
    widely used, well supported standard.
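
The overhead question is easy to check empirically. The sketch below
renders the same synthetic events as compact XML and as a made-up
key=value format, then compares raw and zlib-compressed sizes. Both
line formats and the sample data are assumptions, so the numbers it
prints say nothing about real logs; it only shows how one might measure.

```python
import zlib

def xml_line(host, ts, msg):
    # Compact, attribute-based XML event (hypothetical layout).
    return f'<event host="{host}" timestamp="{ts}">{msg}</event>\n'

def kv_line(host, ts, msg):
    # A made-up non-XML tagged format, for comparison only.
    return f'host={host} timestamp={ts} msg="{msg}"\n'

def sizes(render, n=1000):
    """Return (raw_bytes, compressed_bytes) for n synthetic events."""
    data = "".join(
        render(f"host{i % 10}.example.com", 1030044318 + i, "connection accepted")
        for i in range(n)
    ).encode("ascii")
    return len(data), len(zlib.compress(data))

xml_raw, xml_packed = sizes(xml_line)
kv_raw, kv_packed = sizes(kv_line)
print("xml:", xml_raw, "raw,", xml_packed, "compressed")
print("kv: ", kv_raw, "raw,", kv_packed, "compressed")
```

Running the same comparison over a real day of logs would be the
meaningful test.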
    
    In a perfect world we'd have time to develop the One True Log format and
    legions of programmers to spend months providing support for it
    everywhere. Since that's not the case, I'm inclined to say that anything
    which allows us to spend less time reinventing the wheel and more time
    on analysis is a good thing.
    
    Chris
    
    


