Re: [logs] Generic Log Message Parsing Tool

From: Marcus J. Ranum (photonerdat_private)
Date: Tue Jun 11 2002 - 15:12:05 PDT

Next message: Michael Katz: "Re: [logs] nimda web server logs"

Previous message: Tina Bird: "Re: [logs] nimda web server logs"
In reply to: Sweth Chandramouli: "Re: [logs] Generic Log Message Parsing Tool"
Next in thread: Russell Fulton: "Re: [logs] Generic Log Message Parsing Tool"

Sweth Chandramouli wrote:
>        The trick to this is that the abstract values usually
>have to be context-dependent--the values that you extract from, say,
>a sendmail message are different from the ones you extract from a BIND
>message.

You're right. Unfortunately, you want _some_ to be context dependent
and some not to be. Paul Robertson and I spent a couple days and
luncheons arguing about this one a few months ago, and concluded
that basically you can't make everyone (or every process) happy, and have
to make some WAGs as you do the parse.

For example, part of the strength of a log analysis system would be
its ability to correlate common values. I.e.: to recognize that the source
IP address of a message is the same in 13 messages that happened
within .4 seconds of each other. That's interesting. But it implies that
you can't nail every data item down to something very specific - unless
you get into typed values, in which case you have a much bigger
parsing problem. :(  I don't think this is cleanly solvable because there
needed to have been a good logging standard written around 1981 and
there wasn't.
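
A minimal sketch of that kind of correlation, in Python, assuming each
parsed message is a dict with a numeric "time" field (seconds) and a
"source_address" string. The function name, the field names and the
thresholds are illustrative assumptions, not part of any existing tool.

```python
from collections import defaultdict

def find_bursts(events, field="source_address", window=0.4, min_count=13):
    """Return {value: [timestamps]} for every value of `field` that shows up
    at least `min_count` times within `window` seconds."""
    by_value = defaultdict(list)
    for ev in events:
        if field in ev:
            by_value[ev[field]].append(ev["time"])

    bursts = {}
    for value, times in by_value.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Advance the left edge until the span fits inside the window.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_count:
                bursts[value] = times[start:end + 1]
                break
    return bursts
```

Note that this compares the values as plain strings, so it only works if
every parser wrote the address into the same field in the same form.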

Suppose you have every message parser try to stick an appropriate value
into an output field called:
alert_severity=
that's cool because you can now do something useful with alert_severity
matching. But what if you've got an alert that already has a severity?
Do you map it to alert_severity - what if it's scaled wrong? That doesn't
fly so you now have:
alert_severity=6
firewall_alert_severity=65
oops. :(  I think you need to be draconian.
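
One way to read "draconian" is to force every native severity onto the
single common scale at parse time and keep the original number only under
an app-prefixed name. A rough Python sketch under that reading follows;
the 0-10 common range and the 0-100 firewall range are made-up
assumptions for illustration.

```python
COMMON_MAX = 10  # assumed top of the common alert_severity scale

# Hypothetical native severity ranges, as (lowest, highest) per application.
NATIVE_SCALES = {"firewall": (0, 100)}

def normalize_severity(record, app, native):
    """Return a copy of `record` with the native severity rescaled into the
    common alert_severity field and preserved under an app-prefixed field."""
    lo, hi = NATIVE_SCALES[app]
    clamped = max(lo, min(hi, native))  # out-of-range values are pinned
    out = dict(record)
    out["alert_severity"] = (clamped - lo) * COMMON_MAX // (hi - lo)
    out[app + "_alert_severity"] = native
    return out
```

With these assumed ranges a firewall severity of 65 lands on
alert_severity=6, and the original 65 stays available as
firewall_alert_severity.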

Let's try a worse example: Let's do what Paul and I were doing and
have 2 values:
target_address=
source_address=
now, everything _tries_ to map something that makes sense to target_address
and source_address. That would be cool because now you can search for
instances where event1:target_address==eventN:target_address or even
target_address==source_address or whatever. So there's clear value in not
making things too specific. You could say:
sendmail_target_address=
telnet_target_address=
but then you're too specific. What we did was define a bunch of things we
figured you could probably munge just about anything into, and then the
munger could always use specific_application_values as it saw fit for the
app specific data. That actually works nicely. Consider:

target_address=some ip...
source_address=some ip...
target_path=http://some/url/or/other
target_app=httpd
source_app=unknown
httpd_method=GET
httpd_user=whatever

So the parser did its best to munge things down to a _SMALL_ common
data dictionary but allowed lots of room for specific prefixed stuff. You could
get all the app specific stuff by just pulling all the things prefixed by the app
name.
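
The record layout above can be handled with very little code. The
following Python sketch assumes one key=value pair per line; the set of
common field names is taken from the example record, and the function
names are hypothetical.

```python
# Small common data dictionary, taken from the example record above.
COMMON_FIELDS = {
    "target_address", "source_address",
    "target_path", "target_app", "source_app",
}

def parse_record(text):
    """Parse key=value lines into a dict, splitting on the first '=' only."""
    record = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            record[key.strip()] = value.strip()
    return record

def common(record):
    """Fields that belong to the shared data dictionary."""
    return {k: v for k, v in record.items() if k in COMMON_FIELDS}

def app_specific(record, app):
    """Fields whose names carry the given application prefix."""
    prefix = app + "_"
    return {k: v for k, v in record.items() if k.startswith(prefix)}
```

For the example record, app_specific(record, "httpd") picks up
httpd_method and httpd_user but not target_app, because the prefix
includes the trailing underscore.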

Typing is a special kind of hell we avoided. We figured it's useful if you're trying
to correlate when there are ambiguous values. I.e.:
(ip_addr)httpd_remote=
(ip_addr)source_address=

then you can try to correlate on common values of similar types. Except
that then you have to correctly parse and munge data into correct types
and that'd be a heck of a lot more work than just trying to match strings.
Of course then you can match based on type conversion, i.e.:
(ip_addr)source_address=16.67.32.100
(ip_addr)source_address=burfle.dco.dec.com

(I forget if those addresses are the same anymore but it's an implied
DNS lookup)
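
A sketch of what matching through a type conversion might look like in
Python. The resolver is passed in as a function so the lookup policy
stays with the caller; in real use it could be socket.gethostbyname. The
function names are hypothetical, and the lookup table in the usage note
is a stand-in, not a claim about what that hostname resolves to.

```python
import ipaddress

def normalize_ip(value, resolve=None):
    """Return a canonical IP string for `value`, or None if it cannot be typed.

    `value` may already be an IP literal. Otherwise, if `resolve` is given,
    it is called with the name and must return an IP literal."""
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        pass
    if resolve is None:
        return None
    try:
        return str(ipaddress.ip_address(resolve(value)))
    except (ValueError, KeyError, OSError):
        # Unresolvable or malformed: treat as untyped rather than guess.
        return None

def same_host(a, b, resolve=None):
    """True if both values normalize to the same IP address."""
    na = normalize_ip(a, resolve)
    nb = normalize_ip(b, resolve)
    return na is not None and na == nb
```

For example, with a stand-in table such as
fake = {"burfle.dco.dec.com": "16.67.32.100"}, the call
same_host("16.67.32.100", "burfle.dco.dec.com", fake.__getitem__)
returns True, while without a resolver the two strings do not match.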

>  Trying to force all of your messages to fit the same arbitrary
>data structure will just cause you headaches

I actually think that the headaches of doing procrustean translation
are smaller than the headaches of trying to have an open-ended data
structure. I think procrustean translation with the ability to put app-specific
data in a form that is clearly separate is probably a good way to go.

mjr.
---
Marcus J. Ranum			Computer and communications Security
mjrat_private		http://www.ranum.com



This archive was generated by hypermail 2b30 : Tue Jun 11 2002 - 16:42:58 PDT