Sweth Chandramouli wrote:
> A Parse::RecDescent grammar in Perl is almost exactly that.
> Case in point, here's a quick P::RD grammar I came up with the other day
> to parse shell globbing syntax:
>
> globstring: token(s)
> token: metastring | globchar | literal
> metastring: escaped_char | character_class
> escaped_char: /\\./
> character_class: openbracket
>                  [ negation | internal_bracket ]
>                  class_token(s)
>                  closebracket
> openbracket: /\[/
> negation: /!/
> internal_bracket: /[][]/
> class_token: escaped_char | character_range | class_char
> character_range: /.-./
> class_char: /[^]]/
> closebracket: /]/
> globchar: asterisk | questionmark
> asterisk: '*'
> questionmark: '?'
> literal: /./

This is pretty cool!!! I think it's a bit low-level for log parsing,
but that may just be off the cuff without sufficient thought.

The first thing I noticed when I was writing Fargo was that many (but
sadly not all!) log messages have invariant formatted data at the
beginning of each line. On most UNIX boxes that's a date/time string.
Following that - on many UNIX boxes - comes a machine name/program
name/PID and then the real meat. I'd been figuring one would want to
parse on the order of:

headerstuff: datestamp machinename: programname '[' pid ']'
             datestamp machinename: datestamp programname '[' pid ']'

oops. Now we can't be sure what's what, because the layout of
machinename and programname is O/S dependent, and since they are
arbitrary strings we'd need to either parse by programname (EEK!) or
have a system/version switch inside. Ugh! :(

I'd been working it so that the Fargo rules could be loaded with
OS-specific info to help the parser - that way you could tell the
parser which rules to load or not to load based on clues from the
environment. This would be a pain with systems forwarding syslogs
between various UNIX flavors...
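To make the "which rules to load" idea concrete, here is a minimal
sketch in Python. It is not Fargo's actual rule format. The profile
names ("plain", "forwarded"), the parse_header helper, and the
"Mmm dd hh:mm:ss" timestamp shape are all assumptions made for
illustration; the two layouts simply mirror the two headerstuff
alternatives above.

```python
import re

# Assumed classic "Mmm dd hh:mm:ss" timestamp; real formats vary by system.
DATESTAMP = r"[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d"

# Hypothetical header rules, one per made-up platform profile.
HEADER_RULES = {
    # datestamp machinename: programname '[' pid ']'
    "plain": re.compile(
        rf"(?P<date>{DATESTAMP}) (?P<host>\S+): "
        r"(?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]"
    ),
    # datestamp machinename: datestamp programname '[' pid ']'
    "forwarded": re.compile(
        rf"(?P<date>{DATESTAMP}) (?P<host>\S+): (?P<date2>{DATESTAMP}) "
        r"(?P<prog>[^\[\s]+)\[(?P<pid>\d+)\]"
    ),
}


def parse_header(line, profile):
    """Parse the invariant header using only the rule for one profile.

    Returns a dict of header fields plus the remaining message text,
    or None if the line does not fit that profile's layout.
    """
    m = HEADER_RULES[profile].match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["rest"] = line[m.end():].lstrip(": ")
    return fields


local_line = "Jun  5 13:13:00 gw: sshd[4242]: connection closed"
relayed_line = "Jun  5 13:13:00 gw: Jun  5 13:12:58 sshd[4242]: connection closed"

print(parse_header(local_line, "plain"))
print(parse_header(relayed_line, "plain"))      # wrong profile: no match
print(parse_header(relayed_line, "forwarded"))
```

The caller has to know the profile up front, which is exactly the
pain point with forwarded syslogs: one stream can mix layouts, so a
single environment clue is not enough to pick the rule.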
> No left recursion, which I think you were worried about
> in a different post, and the regexes could just as easily be replaced
> with string matches if you really wanted to; I'll address the concerns
> about regexes in another post, however. :)

Left recursion was what scared me. I'm not sure you need it to do
left-to-right log parsing. Do you? If left recursion is omitted you
can play the trick I talked about in my first posting, where you make
the evaluator build a prefix tree. Heck, you could actually build the
prefix tree at a character level and then just parse character by
character like a regexp evaluator does inside!!! :) :)

mjr.
---
Marcus J. Ranum http://www.ranum.com
Computer and Communications Security mjrat_private

---------------------------------------------------------------------
To unsubscribe, e-mail: loganalysis-unsubscribeat_private
For additional commands, e-mail: loganalysis-helpat_private
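For readers who want to see the character-level prefix tree idea in
code, here is a rough sketch in Python. It only handles rules that
are literal strings, which is a simplifying assumption; the rule
names, the sample messages, and the build_trie / match_prefix helpers
are made up for illustration and are not taken from Fargo.

```python
def build_trie(rules):
    """Build a character-level prefix tree.

    rules maps a rule name to the literal text that starts a message.
    Each node is a dict of next-character -> child node; the key None
    marks "a rule ends at this node" and holds that rule's name.
    """
    root = {}
    for name, prefix in rules.items():
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[None] = name
    return root


def match_prefix(trie, line):
    """Walk the line one character at a time, left to right.

    Returns (rule_name, rest_of_line) for the longest rule prefix that
    matches, or None if no rule matches. There is no backtracking: each
    input character is looked at once.
    """
    node = trie
    best = None
    for i, ch in enumerate(line):
        node = node.get(ch)
        if node is None:
            break
        if None in node:
            best = (node[None], line[i + 1:])
    return best


# Hypothetical rules and messages, for illustration only.
rules = {
    "login_fail": "Failed password for ",
    "login_ok": "Accepted password for ",
    "fail_generic": "Failed ",
}
trie = build_trie(rules)

print(match_prefix(trie, "Failed password for root from 10.0.0.1"))
print(match_prefix(trie, "Failed su for nobody"))
print(match_prefix(trie, "Something else entirely"))
```

Because all rules share one tree, the cost of matching a line depends
on the length of the line, not on how many rules are loaded. Variable
fields (user names, addresses) would need extra node types, which this
sketch leaves out.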
This archive was generated by hypermail 2b30 : Wed Jun 05 2002 - 13:13:00 PDT