Re: [logs] Re: Generic Log Message Parsing Tool

From: Marcus J. Ranum (mjrat_private)
Date: Wed Jun 05 2002 - 13:00:30 PDT

    Sweth Chandramouli wrote:
    >        A Parse::RecDescent grammar in Perl is almost exactly that.
    >Case in point, here's a quick P::RD grammar I came up with the other day
    >to parse shell globbing syntax:
    >
    >      globstring:         token(s)
    >      token:              metastring | globchar | literal
    >      metastring:         escaped_char | character_class
    >      escaped_char:       /\\./
    >      character_class:    openbracket
    >                          [ negation | internal_bracket ]
    >                          class_token(s)
    >                          closebracket
    >      openbracket:        /\[/
    >      negation:           /!/
    >      internal_bracket:   /[][]/
    >      class_token:        escaped_char | character_range | class_char
    >      character_range:    /.-./
    >      class_char:         /[^]]/
    >      closebracket:       /]/
    >      globchar:           asterisk | questionmark
    >      asterisk:           '*'
    >      questionmark:       '?'
    >      literal:            /./
    
    This is pretty cool! I think it's a bit low-level for log
    parsing, but that may just be an off-the-cuff reaction without
    sufficient thought.
    
    The first thing I noticed when I was writing Fargo was that
    many (but sadly not all!) log messages have invariantly formatted
    data at the beginning of each line. On most UNIX boxes that's a
    date/time string. Following that, on many UNIX boxes, comes
    a machine name/program name/PID and then the real meat. I'd
    been figuring one would want to parse something on the order of:
    
    headerstuff:
            datestamp machinename: programname '[' pid ']'
            datestamp machinename:
            datestamp programname '[' pid ']'
    
    Oops. Now we can't be sure what's what, because the layout of
    machinename and programname is OS-dependent, and since they are
    arbitrary strings we'd need to either parse by programname (EEK!)
    or have a system/version switch inside. Ugh! :(
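
    To make the ambiguity concrete, here is a minimal Python sketch of
    the three header layouts above. The regexes, the layout names, and
    the sample lines are all assumptions made up for illustration; none
    of this is Fargo code, and real datestamps vary more than this.

```python
import re

# Assumed datestamp shape ("Jun  5 13:00:30"); real systems differ.
DATESTAMP = r"(?P<datestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})"
# An arbitrary name: anything without whitespace, ':' or brackets.
WORD = r"[^\s:\[\]]+"

# One hypothetical regex per header layout, in the order listed above.
HEADER_LAYOUTS = [
    ("host_prog_pid",
     re.compile(rf"{DATESTAMP} (?P<machinename>{WORD}): "
                rf"(?P<programname>{WORD})\[(?P<pid>\d+)\]")),
    ("host_only",
     re.compile(rf"{DATESTAMP} (?P<machinename>{WORD}):")),
    ("prog_pid",
     re.compile(rf"{DATESTAMP} (?P<programname>{WORD})\[(?P<pid>\d+)\]")),
]


def matching_layouts(line):
    """Return the names of every layout whose regex matches the line start."""
    return [name for name, rx in HEADER_LAYOUTS if rx.match(line)]


# Two layouts accept the same line, so the grammar alone cannot choose.
both = matching_layouts("Jun  5 13:00:30 fargo: sendmail[123]: hello")

# Only one layout matches here, but "fargo" is an arbitrary string: nothing
# in the line says whether it is a machine name or a PID-less program name.
one = matching_layouts("Jun  5 13:00:30 fargo: hello")
```

    Both header alternatives that start with a machinename accept the
    first line, and the second line only looks unambiguous because the
    layouts say so, which is the O/S-dependence problem in miniature.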
    
    I'd been working it so that the Fargo rules could be loaded with
    OS-specific info to help the parser - that way you could tell the
    parser which rules to load or not to load based on clues from the
    environment. This would be a pain with systems forwarding
    syslogs between various UNIX flavors...
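
    A rough sketch of that kind of rule selection, in Python. The rule
    names and the OS tags are hypothetical placeholders, not Fargo's
    actual rule format; the only real call is platform.system() from
    the standard library.

```python
import platform

# Hypothetical rule table: each rule lists the OS names it applies to,
# with "*" meaning "any system".
RULES = [
    {"name": "bsd_header", "os": {"FreeBSD", "OpenBSD"}},
    {"name": "solaris_header", "os": {"SunOS"}},
    {"name": "generic_datestamp", "os": {"*"}},
]


def load_rules(os_name=None):
    """Return the names of the rules to load for the given OS name.

    With no argument, fall back to the local system as the clue.
    """
    if os_name is None:
        os_name = platform.system()
    return [r["name"] for r in RULES
            if "*" in r["os"] or os_name in r["os"]]


solaris_rules = load_rules("SunOS")
```

    The forwarding problem shows up right away: the local system's name
    is the wrong clue for a forwarded message, so the caller would have
    to pass os_name per originating host, and something has to know it.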
    
    >        .  No left recursion, which I think you were worried about
    >in a different post, and the regexes could just as easily be replaced
    >with string matches if you really wanted to; I'll address the concerns
    >about regexes in another post, however.  :)
    
    Left recursion was what scared me. I'm not sure you need it to do
    left-to-right log parsing. Do you? If left recursion is omitted you
    can play the trick I talked about in my first posting, where you make
    the evaluator build a prefix tree. Heck, you could actually build the
    prefix tree at a character level and then just parse character by character
    like a regexp evaluator does inside!!! :)  :)
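
    Here is a minimal Python sketch of the character-level prefix tree
    idea, assuming the rules boil down to literal message prefixes. The
    pattern names and strings are invented for illustration, and a real
    evaluator would also need wildcard fields, which this leaves out.

```python
def build_trie(patterns):
    """Build a character-level prefix tree from {rule_name: literal_text}."""
    root = {}
    for name, text in patterns.items():
        node = root
        for ch in text:
            node = node.setdefault(ch, {})
        node[None] = name  # the None key marks "a rule ends here"
    return root


def match_prefix(trie, line):
    """Walk the line one character at a time; return the name of the
    longest rule that is a prefix of the line, or None if there is none."""
    node = trie
    best = None
    for ch in line:
        if None in node:
            best = node[None]
        if ch not in node:
            return best
        node = node[ch]
    return node.get(None, best)


# Hypothetical rules sharing common prefixes.
trie = build_trie({
    "login_fail": "login failed for ",
    "login_ok": "login succeeded for ",
    "link_down": "link down",
})

hit = match_prefix(trie, "login failed for mjr")
miss = match_prefix(trie, "lonely message")
```

    Each input character is examined once no matter how many rules are
    loaded, since the rules that share a prefix share the same path.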
    
    mjr.
    ---
    Marcus J. Ranum				http://www.ranum.com
    Computer and Communications Security	mjrat_private
    
    