Re: [logs] Re: Generic Log Message Parsing Tool

From: Marcus J. Ranum (mjrat_private)
Date: Wed Jun 05 2002 - 12:36:26 PDT

    Sweth Chandramouli wrote:
    >few enough people really understand how they work
    >that, beyond a certain level of complexity, they invariably get them
    >wrong in one way or another.  That said, I don't have the same religious
    >objections to regexes that you seem to[...]
    
    I agree with you that people invariably get them wrong!! That's
    not a "religious" objection - that's a perfectly legitimate and
    rational objection. Indeed, all my objections to regexps are
    based on technical rationales that may be externally assessed.
    When someone claims something is a matter of "religion" they
    are implying that it's an argument based on faith or imaginary
    playmates or other metaphysical attributes, not on something that
    can be externally measured. My comments about regexps were not
    motivated by theistic dogma, or dogma of any sort whatsoever -
    they were motivated by a hell of a lot of experience with parsing
    and writing regexps and an appreciation of the strengths and
    weaknesses of the technique. :) Anyhow - I know you didn't
    intend to hit a hot button but calling something "religious" is
    a hefty insult where I come from. ;)
    
    >> The approach I was working on relied on correct matching of
    >> combinations of space and non-space. Regexps are really a pain
    >> in the butt if you want to match on whitespace. You need to use
    >> something like: " *" oops wait there could be "[ \t]*" and oops
    >> you can't handle newlines right... Eeeew...  Regexps are a good
    >> tool for simple searching - they're not a good tool for simple
    >> parsing.
    >        Here's where we disagree most.  For simple parsing, I
    >think there's little better than a well-understood regex engine.
    
    OK, fine. We're not doing simple parsing. :)
    
    >  The
    >two I mentioned earlier are stellar for things like you are describing,
    >with macros like "\s" to match whitespace (and a flag to allow that to
    >include newlines if dealing with a multiline pattern space), and it's
    >trivial to set case-insensitivity for either an entire regex
    >("/your_regex_here/i") or a small portion of it
    >("/your_(?i:regex)_here/").
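
To make the two points above concrete, here is a minimal sketch using
Python's `re` module rather than Perl (the choice of Python, the sample
log line, and the pattern names are assumptions for illustration only).
It shows a spaces-only pattern failing on a tab, the `\s` class
handling it, and a case-insensitive group scoped to part of a pattern.

```python
import re

# Hypothetical syslog-style line; note the tab before the host name.
line = "Jun  5 12:36:26\thost1 SSHD[412]: Failed password"

# " +" only consumes spaces, so the tab before the host breaks the match.
spaces_only = re.compile(r"^(\w+) +(\d+) +([\d:]+) +(\S+)")

# "\s+" covers spaces, tabs, and newlines with a single token.
any_space = re.compile(r"^(\w+)\s+(\d+)\s+([\d:]+)\s+(\S+)")

# Scoped case-insensitivity: only "sshd" ignores case, "Failed" does not.
scoped = re.compile(r"(?i:sshd)\[\d+\]: Failed")

print(spaces_only.match(line))         # no match because of the tab
print(any_space.match(line).groups())  # month, day, time, host
print(bool(scoped.search(line)))       # matches "SSHD[412]: Failed"
```

Both sides of the argument are visible here: the fix is a one-token
change, but only for someone who already knows the first pattern is
wrong.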
    
    You are correct. What you're saying is that with sufficiently
    energetic application of duct tape, spit, and baling wire you
    can build Notre Dame Cathedral. I'll grant you that. But I think
    that developing vastly superior syntax(es) and approaches for
    the kind of parsing needed for log analysis would take much less
    time than even bothering to understand the regexps and overcome
    their flaws. :) You can do gotos in regexps, too! Which means
    you can implement a Turing machine in regexps. So you can parse
    anything with a sufficiently butt-ugly regexp and enough tape.
    But why bother!?
    
    I didn't even go into the question of parsing binary data, which
    would be darned useful for any advanced log parser...
    
    If the industry is going to make big strides in log parsing
    it's got to be sh&t simple to write new parse rules for new
    log messages as they appear. 
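
As a rough illustration of what a simple rule format built on
"combinations of space and non-space" might look like, here is a
hypothetical sketch in Python. It is not the tool described in this
thread; the `parse` function, the `%name` field syntax, and the `%rest`
marker are all invented for the example.

```python
def parse(rule, message):
    """Match a message against a rule made of literals and %name fields.

    Both strings are split on runs of whitespace, so spaces, tabs, and
    newlines are treated alike. Returns a dict of captured fields, or
    None when the message does not fit the rule.
    """
    rule_tokens = rule.split()
    msg_tokens = message.split()
    fields = {}
    for i, tok in enumerate(rule_tokens):
        if tok == "%rest":
            # Swallow everything that remains in the message.
            fields["rest"] = " ".join(msg_tokens[i:])
            return fields
        if i >= len(msg_tokens):
            return None  # message is shorter than the rule
        if tok.startswith("%"):
            fields[tok[1:]] = msg_tokens[i]  # capture a named field
        elif tok != msg_tokens[i]:
            return None  # literal token mismatch
    # Without %rest, the message must not have trailing tokens.
    return fields if len(msg_tokens) == len(rule_tokens) else None


rule = "%month %day %time %host sshd: Failed password for %user from %addr %rest"
msg = "Jun  5 12:36:26\thost1 sshd: Failed password for root from 10.0.0.1 port 22"
print(parse(rule, msg))
```

A new log message then needs only a new rule string written in roughly
the shape of the message itself, with no escaping or whitespace classes
to get wrong.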
    
    >(Let me
    >again iterate that I am NOT advocating a pure regex implementation.
    >I've seen attempts at that, and they make my stomach churn.))
    
    I know you're not. :) And I'm not trying to be deliberately a pain
    in the neck on this issue - I just fear that a lot of people who
    look at the log parsing problem immediately reach for duct tape,
    spit, and baling wire and start hammering nails without thinking
    the problem through. And - in this environment - perl regexps appear
    to be the most popular form of duct tape. ;) It's a shame to think
    that lots of people are going to burn lots of brain cycles
    re-implementing the same things that don't work very well when
    it'd be really straightforward for someone to blast a single
    bullet through the whole problem.
    
    mjr.
    ---
    Marcus J. Ranum				http://www.ranum.com
    Computer and Communications Security	mjrat_private
    
    
    


