Re: [logs] Generic Log Message Parsing Tool

From: Marcus J. Ranum (photonerdat_private)
Date: Tue Jun 11 2002 - 15:12:05 PDT

    Sweth Chandramouli wrote:
    >        The trick to this is that the abstract values usually
    >have to be context-dependent--the values that you extract from, say,
    >a sendmail message are different from the ones you extract from a BIND
    >message.
    
    You're right. Unfortunately, you want _some_ to be context dependent
    and some not to be. Paul Robertson and I spent a couple days and
    luncheons arguing about this one a few months ago, and concluded
    that basically you can't make everyone (or every process) happy, and have
    to make some WAGs as you do the parse.
    
    For example, part of the strength of a log analysis system would be
    its ability to correlate common values, e.g., to recognize that the source
    IP address is the same in 13 messages that happened within 0.4 seconds
    of each other. That's interesting. But it implies that you can't nail
    every data item down to something very specific, unless you get into
    typed values, in which case you have a much bigger parsing problem. :(
    I don't think this is cleanly solvable, because a good logging standard
    needed to have been written around 1981 and there wasn't one.
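
A minimal sketch of the correlation described above: flag any source address that appears in many messages within a short time window. The function name `find_bursts`, the `(timestamp, source_address)` tuple layout, and the defaults (13 messages, 0.4 seconds, taken from the example in the text) are assumptions for illustration, not part of any existing tool.

```python
from collections import defaultdict


def find_bursts(events, window=0.4, min_count=13):
    """Return {source_address: burst_size} for bursty sources.

    events: iterable of (timestamp_in_seconds, source_address) pairs.
    A burst is a group of messages from one source whose timestamps
    all fall within `window` seconds of each other.
    """
    by_source = defaultdict(list)
    for timestamp, source in events:
        by_source[source].append(timestamp)

    bursts = {}
    for source, stamps in by_source.items():
        stamps.sort()
        start = 0
        best = 0
        # Slide a window over the sorted timestamps and remember the
        # largest number of messages that fit inside it.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_count:
            bursts[source] = best
    return bursts
```

Note that this only works because source_address is compared as a plain string across all message types, which is the argument for keeping that field generic.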
    
    Consider what happens if you have every message try to stick an
    appropriate value into an output field called:
    alert_severity=
    That's cool because you can now do something useful with alert_severity
    matching. But what if you've got an alert that already has a severity?
    Do you map it to alert_severity? What if it's scaled differently? That
    doesn't fly, so you now have:
    alert_severity=6
    firewall_alert_severity=65
    Oops. :(  I think you need to be draconian.
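
One way to read "draconian" is to force every native severity onto a single common scale while keeping the original value under its app prefix. The sketch below assumes that reading. The scale ranges (a 0-100 firewall scale, a 0-10 common scale), the linear mapping, and the names `rescale_severity` and `normalize_severity` are all hypothetical; the text does not say how the scaling was actually handled.

```python
def rescale_severity(value, native_min, native_max, common_min=0, common_max=10):
    """Linearly map a native severity onto the common scale (integer result)."""
    if native_max <= native_min:
        raise ValueError("native_max must be greater than native_min")
    # Clamp out-of-range values rather than rejecting the message.
    clamped = min(max(value, native_min), native_max)
    span = native_max - native_min
    return common_min + (clamped - native_min) * (common_max - common_min) // span


def normalize_severity(record, app, native_range):
    """Return a copy of record with alert_severity derived from <app>_alert_severity.

    The native value stays in place under its app prefix, so nothing is lost.
    """
    native_key = f"{app}_alert_severity"
    out = dict(record)
    if native_key in out:
        low, high = native_range
        out["alert_severity"] = rescale_severity(int(out[native_key]), low, high)
    return out
```

With an assumed 0-100 firewall scale, `firewall_alert_severity=65` would come out as `alert_severity=6`, and both fields remain available for matching.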
    
    Let's try a worse example: Let's do what Paul and I were doing and
    have 2 values:
    target_address=
    source_address=
    Now, everything _tries_ to map something that makes sense to target_address
    and source_address. That would be cool because now you can search for
    instances where event1:target_address==eventN:target_address or even
    target_address==source_address or whatever. So there's clear value in not
    making things too specific. You could say:
    sendmail_target_address=
    telnet_target_address=
    but then you're too specific. What we did was define a bunch of things we
    figured you could probably munge just about anything into, and then the
    munger could always use specific_application_values as it saw fit for the
    app-specific data. That actually works nicely. Consider:
    
    target_address=some ip...
    source_address=some ip...
    target_path=http://some/url/or/other
    target_app=httpd
    source_app=unknown
    httpd_method=GET
    httpd_user=whatever
    
    So the parser did its best to munge things down to a _SMALL_ common
    data dictionary but allowed lots of room for specific prefixed stuff. You could
    get all the app-specific stuff by just pulling all the things prefixed by the app
    name.
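
The record layout above can be sketched as follows: parse `key=value` lines into a dictionary, pull the app-specific fields by prefix, and compare common fields across events. The function names, the `COMMON_FIELDS` set, and the sample IP addresses (documentation-range placeholders standing in for "some ip...") are assumptions for illustration.

```python
# The small common data dictionary, using the field names from the example.
COMMON_FIELDS = {
    "target_address", "source_address", "target_path",
    "target_app", "source_app",
}

# Hypothetical sample record; the addresses are placeholders.
SAMPLE_LINES = [
    "target_address=192.0.2.10",
    "source_address=198.51.100.7",
    "target_path=http://some/url/or/other",
    "target_app=httpd",
    "source_app=unknown",
    "httpd_method=GET",
    "httpd_user=whatever",
]


def parse_record(lines):
    """Turn key=value lines into a dict, splitting on the first '=' only."""
    record = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key:
            record[key] = value
    return record


def app_specific(record):
    """Return only the fields prefixed with the record's target_app name."""
    app = record.get("target_app")
    if not app:
        return {}
    prefix = app + "_"
    return {k: v for k, v in record.items()
            if k.startswith(prefix) and k not in COMMON_FIELDS}


def same_value(event_a, field_a, event_b, field_b):
    """True if both fields are present and hold the same string."""
    a, b = event_a.get(field_a), event_b.get(field_b)
    return a is not None and a == b
```

Because the common fields are shared by every message type, a check such as `same_value(e1, "target_address", eN, "source_address")` works without knowing which application produced either event.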
    
    Typing is a special kind of hell we avoided. We figured it's useful if you're
    trying to correlate when there are ambiguous values, e.g.:
    (ip_addr)httpd_remote=
    (ip_addr)source_address=
    
    Then you can try to correlate on common values of similar types. Except
    that then you have to correctly parse and munge data into correct types,
    and that'd be a heck of a lot more work than just trying to match strings.
    Of course, then you can match based on type conversion, e.g.:
    (ip_addr)source_address=16.67.32.100
    (ip_addr)source_address=burfle.dco.dec.com
    
    (I forget if those addresses are the same anymore, but matching them
    implies a DNS lookup.)
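
A rough sketch of what typed matching would involve, assuming the `(type)name=value` notation shown above. The parser, the function names, and the injectable `resolve` callback are hypothetical. The callback stands in for the implied DNS lookup so the example does not depend on a live resolver.

```python
import ipaddress
import re

# Matches the "(type)name=value" notation used in the examples above.
TYPED_FIELD = re.compile(r"^\((\w+)\)(\w+)=(.*)$")


def parse_typed(line):
    """Split '(ip_addr)source_address=1.2.3.4' into (type, name, value)."""
    match = TYPED_FIELD.match(line.strip())
    if match is None:
        raise ValueError(f"not a typed field: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def canonical_ip(value, resolve=None):
    """Return an ipaddress object for an IP literal or a resolvable hostname.

    resolve: optional callable mapping a hostname to an IP string (or None).
    Returns None when the value cannot be converted.
    """
    try:
        return ipaddress.ip_address(value)
    except ValueError:
        pass
    if resolve is None:
        return None
    resolved = resolve(value)
    return ipaddress.ip_address(resolved) if resolved else None


def same_host(value_a, value_b, resolve=None):
    """Compare two ip_addr values after type conversion, not as raw strings."""
    a = canonical_ip(value_a, resolve)
    b = canonical_ip(value_b, resolve)
    return a is not None and a == b
```

A plain string comparison treats `16.67.32.100` and `burfle.dco.dec.com` as different values. `same_host` only matches them if the supplied resolver maps the name to that address, which is exactly the extra parsing and lookup work the typed approach brings with it.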
    
    >  Trying to force all of your messages to fit the same arbitrary
    >data structure will just cause you headaches
    
    I actually think that the headaches of doing procrustean translation
    are smaller than the headaches of trying to have an open-ended data
    structure. I think procrustean translation with the ability to put app-specific
    data in a form that is clearly separate is probably a good way to go.
    
    mjr.
    ---
    Marcus J. Ranum			Computer and communications Security
    mjrat_private		http://www.ranum.com
    
    
    


