Sweth Chandramouli wrote:
>few enough people really understand how they work
>that, beyond a certain level of complexity, they invariably get them
>wrong in one way or another. That said, I don't have the same religious
>objections to regexes that you seem to[...]

I agree with you that people invariably get them wrong!! That's not a
"religious" objection - that's a perfectly legitimate and rational
objection. Indeed, all my objections to regexps are based on technical
rationales that may be externally assessed. When someone claims
something is a matter of "religion" they are implying that it's an
argument based on faith or imaginary playmates or other metaphysical
attributes, not on something that can be externally measured. My
comments about regexps were not motivated by theistic dogma, or dogma
of any sort whatsoever - they were motivated by a hell of a lot of
experience with parsing and writing regexps and an appreciation of the
strengths and weaknesses of the technique. :)

Anyhow - I know you didn't intend to hit a hot button but calling
something "religious" is a hefty insult where I come from. ;)

>> The approach I was working on relied on correct matching of
>> combinations of space and non-space. Regexps are really a pain
>> in the butt if you want to match on whitespace. You need to use
>> something like: " *" oops wait there could be "[ \t]*" and oops
>> you can't handle newlines right... Eeeew... Regexps are a good
>> tool for simple searching - they're not a good tool for simple
>> parsing.
> Here's where we disagree most. For simple parsing, I
>think there's little better than a well-understood regex engine.

OK, fine. We're not doing simple parsing.
:)

> The
>two I mentioned earlier are stellar for things like you are describing,
>with macros like "\s" to match whitespace (and a flag to allow that to
>include newlines if dealing with a multiline pattern space), and it's
>trivial to set case-insensitivity for either an entire regex
>("/your_regex_here/i") or a small portion of it
>("/your_(?i:regex)_here/").

You are correct. What you're saying is that with sufficiently energetic
application of duct tape, spit, and baling wire you can build Notre Dame
Cathedral. I'll grant you that. But I think that developing vastly
superior syntax(es) and approaches for the kind of parsing needed for
log analysis would take much less time than even bothering to
understand the regexps and overcome their flaws. :)

You can do gotos in regexps, too! Which means you can implement a
Turing computer in regexps. So you can parse anything with a
sufficiently butt-ugly regexp and enough tape. But why bother!?

I didn't even go into the question of parsing binary data, which would
be darned useful for any advanced log parser...

If the industry is going to make big strides in log parsing it's got to
be sh&t simple to write new parse rules for new log messages as they
appear.

>(Let me
>again iterate that I am NOT advocating a pure regex implementation.
>I've seen attempts at that, and they make my stomach churn.)

I know you're not. :) And I'm not trying to be deliberately a pain in
the neck on this issue - I just fear that a lot of people who look at
the log parsing problem immediately reach for duct tape, spit, and
baling wire and start hammering nails without thinking the problem
through. And - in this environment - perl regexps appear to be the most
popular form of duct tape. ;)

It's a shame to think that lots of people are going to burn lots of
brain cycles re-implementing the same things that don't work very well
when it'd be really straightforward for someone to blast a single
bullet through the whole problem.

mjr.
---
Marcus J. Ranum http://www.ranum.com
Computer and Communications Security mjrat_private
---------------------------------------------------------------------
To unsubscribe, e-mail: loganalysis-unsubscribeat_private
For additional commands, e-mail: loganalysis-helpat_private
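
The whitespace patterns quoted in the message above (" *", "[ \t]*",
and "\s") and the two case-insensitivity forms can be compared side by
side. What follows is only a rough sketch, assuming Python's re module
as a stand-in for the engines discussed in the thread; the sample
strings and all names (samples, patterns, match_table, whole, scoped)
are hypothetical and do not come from either poster. Note that in
Python "\s" already includes newlines without any flag, which may
differ from the engines the thread has in mind.

```python
import re

# Hypothetical two-word log fragments, differing only in the separator.
samples = {
    "spaces": "login   failed",
    "tab": "login\tfailed",
    "newline": "login\nfailed",
}

# The three whitespace patterns mentioned in the thread.
patterns = {
    "spaces_only": re.compile(r"login *failed"),        # literal spaces only
    "space_or_tab": re.compile(r"login[ \t]*failed"),   # spaces or tabs
    "any_whitespace": re.compile(r"login\s*failed"),    # \s also covers newlines in Python
}


def match_table():
    """Return {pattern_name: {sample_name: bool}} for every pattern/sample pair."""
    return {
        pname: {sname: bool(pat.fullmatch(text)) for sname, text in samples.items()}
        for pname, pat in patterns.items()
    }


# Case-insensitivity for a whole pattern versus one scoped group.
# The scoped form (?i:...) needs Python 3.6 or later.
whole = re.compile(r"login failed", re.IGNORECASE)
scoped = re.compile(r"login (?i:failed)")

if __name__ == "__main__":
    for name, row in match_table().items():
        print(name, row)
```

Each step up in generality accepts one more separator, which is the
"oops wait" progression the quoted paragraph describes: a rule written
with " *" silently misses tab-separated input until someone notices.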
This archive was generated by hypermail 2b30 : Wed Jun 05 2002 - 12:41:28 PDT