Re: [logs] Re: Generic Log Message Parsing Tool

From: Marcus J. Ranum (mjrat_private)
Date: Wed Jun 05 2002 - 12:36:26 PDT

Next message: Sweth Chandramouli: "Re: [logs] Re: Generic Log Message Parsing Tool"

Previous message: Dale.Drewat_private: "RE: [logs] Re: Generic Log Message Parsing Tool"
In reply to: Sweth Chandramouli: "Re: [logs] Re: Generic Log Message Parsing Tool"
Next in thread: Sweth Chandramouli: "Re: [logs] Re: Generic Log Message Parsing Tool"
Next in thread: Marcus J. Ranum: "Re: [logs] Re: Generic Log Message Parsing Tool"
Reply: Sweth Chandramouli: "Re: [logs] Re: Generic Log Message Parsing Tool"

Sweth Chandramouli wrote:
>few enough people really understand how they work
>that, beyond a certain level of complexity, they invariably get them
>wrong in one way or another.  That said, I don't have the same religious
>objections to regexes that you seem to[...]

I agree with you that people invariably get them wrong!! That's
not a "religious" objection - that's a perfectly legitimate and
rational objection. Indeed, all my objections to regexps are
based on technical rationales that may be externally assessed.
When someone claims something is a matter of "religion" they
are implying that it's an argument based on faith or imaginary
playmates or other metaphysical attributes, not on something that
can be externally measured. My comments about regexps were not
motivated by theistic dogma, or dogma of any sort whatsoever -
they were motivated by a hell of a lot of experience with parsing
and writing regexps and an appreciation of the strengths and
weaknesses of the technique. :) Anyhow - I know you didn't
intend to hit a hot button but calling something "religious" is
a hefty insult where I come from. ;)

>> The approach I was working on relied on correct matching of
>> combinations of space and non-space. Regexps are really a pain
>> in the butt if you want to match on whitespace. You need to use
>> something like: " *" oops wait there could be "[ \t]*" and oops
>> you can't handle newlines right... Eeeew...  Regexps are a good
>> tool for simple searching - they're not a good tool for simple
>> parsing.
>        Here's where we disagree most.  For simple parsing, I
>think there's little better than a well-understood regex engine.

OK, fine. We're not doing simple parsing. :)
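
To make the whitespace complaint above concrete, here is a minimal
sketch in Python's re module (an assumption; the thread itself is about
Perl-style regexps). The sample log line is made up for illustration.
It shows how " +" and "[ \t]+" each miss part of what a reader would
call whitespace, while "\s+" covers spaces, tabs, and newlines.

```python
import re

# Hypothetical log fragment mixing a tab, a double space, and a newline.
line = "sshd[42]:\tFailed  password\nfor root"

# Each pattern splits the sample on runs of what it considers whitespace.
spaces_only = re.split(r" +", line)      # spaces only: tab and newline survive
spaces_tabs = re.split(r"[ \t]+", line)  # spaces and tabs: newline survives
any_space = re.split(r"\s+", line)       # any whitespace character

print(spaces_only)  # ['sshd[42]:\tFailed', 'password\nfor', 'root']
print(spaces_tabs)  # ['sshd[42]:', 'Failed', 'password\nfor', 'root']
print(any_space)    # ['sshd[42]:', 'Failed', 'password', 'for', 'root']
```

The same input yields three, four, or five tokens depending on which
notion of whitespace the pattern encodes.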

>  The
>two I mentioned earlier are stellar for things like you are describing,
>with macros like "\s" to match whitespace (and a flag to allow that to
>include newlines if dealing with a multiline pattern space), and it's
>trivial to set case-insensitivity for either an entire regex
>("/your_regex_here/i") or a small portion of it
>("/your_(?i:regex)_here/").
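
The features described in the quoted text can be sketched roughly as
follows. This uses Python's re module as a stand-in for the engines
under discussion (an assumption), and the message string is made up.
In Python, "\s" always includes newlines, re.IGNORECASE plays the role
of a trailing /i, and the scoped "(?i:...)" form has been available
since Python 3.6.

```python
import re

# Hypothetical message with mixed case and a newline-plus-tab in the middle.
msg = "Login FAILED for\n\tuser Root"

# Whole-pattern case-insensitivity, comparable to /.../i.
whole = re.search(r"failed\s+for\s+user", msg, re.IGNORECASE)

# Case-insensitivity scoped to one group, comparable to (?i:...).
scoped = re.search(r"Login (?i:failed) for", msg)

# Outside the scoped group matching stays case-sensitive,
# so lowercase "login" does not match "Login".
strict = re.search(r"login (?i:failed) for", msg)

print(whole is not None, scoped is not None, strict is None)
```

All three printed values are True for this sample.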

You are correct. What you're saying is that with sufficiently
energetic application of duct tape, spit, and baling wire you
can build Notre Dame Cathedral. I'll grant you that. But I think
that developing vastly superior syntax(es) and approaches for
the kind of parsing needed for log analysis would take much less
time than even bothering to understand the regexps and overcome
their flaws. :) You can do gotos in regexps, too! Which means
you can implement a Turing machine in regexps. So you can parse
anything with a sufficiently butt-ugly regexp and enough tape.
But why bother!?

I didn't even go into the question of parsing binary data, which
would be darned useful for any advanced log parser...

If the industry is going to make big strides in log parsing
it's got to be sh&t simple to write new parse rules for new
log messages as they appear. 
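
As one guess at what a "simple to write" parse rule could look like,
here is a minimal sketch built on the space/non-space tokenizing idea
mentioned earlier in this message. It is not the tool under discussion:
the parse() function, the <name> capture syntax, and the sample log
line are all hypothetical, invented for illustration.

```python
def parse(rule, line):
    """Match a whitespace-tokenized line against a template rule.

    Literal tokens must match exactly; tokens written as <name> capture
    the corresponding field. Returns a dict of captures, or None when
    the line does not fit the rule.
    """
    want, got = rule.split(), line.split()
    if len(want) != len(got):
        return None
    fields = {}
    for w, g in zip(want, got):
        if w.startswith("<") and w.endswith(">"):
            fields[w[1:-1]] = g   # capture token under its name
        elif w != g:
            return None           # literal token mismatch
    return fields


# Hypothetical rule and log line; runs of spaces or tabs do not matter.
rule = "Failed password for <user> from <ip> port <port>"
line = "Failed   password for\troot from 10.0.0.5 port 2222"
result = parse(rule, line)
print(result)  # {'user': 'root', 'ip': '10.0.0.5', 'port': '2222'}
```

A rule here is just the message with the variable parts named, so a
new rule can be written by pasting a sample line and marking up the
fields. The sketch ignores fields that contain embedded spaces.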

>(Let me
>reiterate that I am NOT advocating a pure regex implementation.
>I've seen attempts at that, and they make my stomach churn.)

I know you're not. :) And I'm not trying to be deliberately a pain
in the neck on this issue - I just fear that a lot of people who
look at the log parsing problem immediately reach for duct tape,
spit, and baling wire and start hammering nails without thinking
the problem through. And - in this environment - perl regexps appear
to be the most popular form of duct tape. ;) It's a shame to think
that lots of people are going to burn lots of brain cycles
re-implementing the same things that don't work very well when
it'd be really straightforward for someone to blast a single
bullet through the whole problem.

mjr.
---
Marcus J. Ranum				http://www.ranum.com
Computer and Communications Security	mjrat_private

This archive was generated by hypermail 2b30 : Wed Jun 05 2002 - 12:41:28 PDT