Re: Re[2]: [logs] Logging: World Domination

From: Bennett Todd (betat_private)
Date: Thu Aug 22 2002 - 14:02:01 PDT

    2002-08-22-15:25:18 Chris Adams:
    > >He put it compactly. A more elaborate statement might be "XML is a
    > >very heavy-weight framework for constructing languages; while it may
    > >be valuable in certain contexts involving highly automated,
    > >distributed, and heterogeneous maintenance of gigantic corpuses of
    > 
    > That sure sounds like logging to me.
    
    If, however, you hadn't truncated my comment, it's possible it would
    sound less like logging.
    
    When I refer to "gigantic corpuses of structured text", I'm not
    referring to log data; I'm referring to e.g. all the web pages on
    the world-wide web; all the reference manuals in the Linux
    Documentation Project; all the documentation that a company the size
    of Sun cranks out; etc. The traditional homes of SGML. XML, which is
    just SGML with some of the more obscure cruft trimmed off, is useful
    in the same places that SGML is useful. It remains a lousy choice
    for lightweight applications. Its flexibility is actually in
    opposition to the goal of the project currently under discussion.
    
    > We need automated creation, distribution and analysis to work
    > across numerous different platforms with products from hundreds of
    > different vendors.
    
    We can express the semantics that we need with a record that's a
    linear list of whitespace-separated tokens on a single text line,
    with some fixed fields always required, followed by hierarchically
    assigned tokens, first one defining a category ("OS"; "Firewall";
    "DB"; "Webserver"; ...); then a separate list of tokens for each of
    those categories; the remainder of the record with format defined
    appropriately for each of those; and so forth.
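
A minimal sketch of the record layout described above, in Python. The two
fixed fields (timestamp, host), the host name, and the sample firewall
tokens are assumptions chosen for illustration; nothing in this thread
fixes them.

```python
# Hypothetical layout: <timestamp> <host> <category> <category-specific tokens...>
# The fixed field names below are illustrative assumptions.
FIXED_FIELDS = ("timestamp", "host")


def parse_record(line):
    """Split one log line into fixed fields, a category, and the remainder."""
    tokens = line.split()  # split on any run of whitespace
    if len(tokens) < len(FIXED_FIELDS) + 1:
        raise ValueError("record too short: %r" % line)
    record = dict(zip(FIXED_FIELDS, tokens))
    record["timestamp"] = int(record["timestamp"])
    rest = tokens[len(FIXED_FIELDS):]
    record["category"] = rest[0]  # e.g. "OS", "Firewall", "DB", "Webserver"
    record["fields"] = rest[1:]   # format is defined per category
    return record


example = parse_record("1234567890 gw1 Firewall drop tcp 10.0.0.1 10.0.0.2")
```

The whole parser is one call to split plus some bookkeeping, which is the
point of the comparison.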
    
    Or we could express the same thing with XML, to buy ourselves some
    buzzword compliance at the expense of a preposterously more complex
    (==inefficient, nonportable, bugridden, security-problem-inducing)
    parser for a class of complex structured languages. If we want to
    sabotage this project, XML would be a fine step.
    
    Or, as the previous poster more reasonably put it, "XML sucks".
    
    > That's the whole point to interchange languages like XML - you
    > can translate it into anything you like when it gets to the final
    > storage point but use a standard format to cross the vendor
    > boundaries.
    
    That's the whole point to a canonical reference format. XML is more
    complex than we need for this application; the only things it
    contributes --- increased flexibility in record format, increased
    likelihood of security and performance and portability problems due
    to the vastly more complex parser required --- are things that we do
    not want.
    
    > I get the impression that you haven't seriously looked at XML in several 
    > years.
    
    I keep an eye on it, but the only problems for which it's a
    reasonable answer that I've seen are problems in the traditional
    SGML spaces.
    
    > There are now a number of high-performance validating XML parsers 
    > available, both commercial and open source.
    
    Find me one anywhere approaching the simplicity of
    split-on-whitespace and I get interested. If it buys us nothing but
    increased complexity, then it's a lose.
    
    > We could provide the same thing for any new format but that's a huge 
    > amount of work we can get for free if we decide that while XML isn't 
    > perfect it is good enough to do the job.
    
    It's only a huge amount of work for people who have trouble
    splitting text on whitespace. I hope we won't have them working on
    these tools. I sure wouldn't want anybody who couldn't easily code
    split on whitespace doing security-critical coding with an XML
    toolkit. That sounds like a really really bad idea.
    
    > >But even:
    > >
    > >	<event host="..." timestamp="1234567890">
    > >
    > >would seem to me to be less desirable than
    > >
    > >	1234567890 ...
    > >
    > >I sure know which I'd rather parse.
    > 
    > The second one is easier. Unfortunately, "1234567890 ..." contains no 
    > information about what each of the fields actually means and we need to
    > have a number of optional fields which are only applicable to certain 
    > classes of message or are vendor specific.
    
    If we want to be able to process the XML in a fashion more automated
    than existing syslog noise, we have to force the XML to be
    structured and conform to strict rules. We can enforce the same
    constraints on a string of tokens. XML doesn't make this job easier.
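
As a sketch of that equivalence, assuming two required fields named
timestamp and host (my assumption, as is the self-closing form of the
element): the same strict check has to run whichever format carries the
data. The XML side uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("timestamp", "host")  # assumed required fields, for illustration


def check(fields):
    """Apply the same strict rules regardless of the wire format."""
    for name in REQUIRED:
        if name not in fields:
            raise ValueError("missing required field: " + name)
    if not fields["timestamp"].isdigit():
        raise ValueError("timestamp must be decimal digits")
    return fields


def from_xml(text):
    """Parse an <event .../> element, then run the shared check."""
    element = ET.fromstring(text)
    if element.tag != "event":
        raise ValueError("unexpected element: " + element.tag)
    return check(dict(element.attrib))


def from_tokens(line):
    """Split a token line into the required fields, then run the shared check."""
    return check(dict(zip(REQUIRED, line.split())))


xml_event = from_xml('<event host="gw1" timestamp="1234567890"/>')
token_event = from_tokens("1234567890 gw1")
```

Both paths end in the same check() and yield the same dictionary; the
only difference is how much parser sits in front of it.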
    
    > If we use XML, we don't have to change anything. If we're using
    > white-space separated lists, we have to throw out everything
    > and replace it with some sort of tagged format when we realize
    > that we'd like to do more complex analysis and an smtp server
    > has a fundamentally different set of things it can report than a
    > firewall, database, web server or storage manager.
    
    Nope, we just need to have a different family of tags for SMTP
    server records from the family of tags for firewalls, which in turn
    would be distinct from database, webserver, storage manager, etc. We
    must enforce the same data structuring requirements either way; if
    we _use_ the extra flexibility XML gives us, we lose the ability to
    automate the processing. They're equivalent for this job; let's use
    the simpler one.
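
One way to picture the per-category tag families, as a sketch only: the
category names echo the ones mentioned in this thread, but every field
name in the table below is hypothetical.

```python
# Hypothetical per-category field lists; none of these names come from the thread.
SCHEMAS = {
    "SMTP": ("action", "sender", "recipient"),
    "Firewall": ("action", "proto", "src", "dst"),
}


def parse_category(tokens):
    """Interpret tokens as a category name followed by that category's fields."""
    category, values = tokens[0], tokens[1:]
    names = SCHEMAS.get(category)
    if names is None:
        raise ValueError("unknown category: " + category)
    if len(values) != len(names):
        raise ValueError("%s expects %d fields, got %d"
                         % (category, len(names), len(values)))
    return category, dict(zip(names, values))


smtp = parse_category("SMTP reject a@example.org b@example.org".split())
fw = parse_category("Firewall drop tcp 10.0.0.1 10.0.0.2".split())
```

Adding a new class of device means adding one entry to the table, not
replacing the format.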
    
    > Of course, there's a different answer to "I sure know which I'd rather 
    > parse": it'll take me less time to do "$events = XMLIn('syslog.xml')" 
    > than it will to parse anything.
    
    Now find me an XML parser that can be used that way, portably across
    platforms, which performs reasonably well, and whose code isn't
    enough more complex than a simple whitespace split to guarantee new
    and special security problems, and you'll have succeeded in making a
    reasonable argument that XML isn't much worse than a simple
    whitespace list of tokens. I have yet to see any way in which it
    would be better.
    
    > There seemed to be a general consensus that we need a replacement for 
    > syslog [...]
    
    I missed that consensus. We need a more formally structured log
    format, allowing more structured representation of hierarchical
    classification, and perhaps some other things.
    
    > [...] which is tag based to allow different bits of information to
    > be recorded in a structured fashion - that's why we need more than
    > the simpler format you proposed can deliver.
    
    I've yet to see that. We need a prefix of fixed fields, followed by
    hierarchically assigned tokens, possibly followed by something else.
    But the only use for the flexibility of XML would be to help ensure
    that the data could not be automatically processed any better than
    the current syslog raw strings.
    
    > The question is whether we should invent our own format or use a
    > standard format like XML.
    
    XML is a lovely standard, well-suited to a wildly different
    application domain. For jobs like this, it sucks.
    
    > I think that the overhead of using XML will not be significant
    > compared to using any other tagged format and that there's a big
    > advantage to picking a widely used, well supported standard.
    
    Whew. Where's my XML parser that's not substantially heavier-weight
    than perl's split// or C's strtok(3)? Find me one and I'll back off.
    Paying orders of magnitude in performance and code complexity to buy
    the flexibility to prevent our project from succeeding sounds like a
    lose to me.
    
    > In a perfect world we'd have time to develop the One True Log
    > format and legions of programmers to spend months providing
    > support for it everywhere. Since that's not the case, I'm inclined
    > to say that anything which allows us to spend less time
    > reinventing the wheel and more time on analysis is a good thing.
    
    XML is more work for everybody to implement, and makes the job
    harder, and less likely to succeed. XML sucks.
    
    -Bennett
    
    
    

    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis


