Sweth Chandramouli wrote:
> The trick to this is that the abstract values usually
> have to be context-dependent--the values that you extract from, say,
> a sendmail message are different from the ones you extract from a BIND
> message.

You're right. Unfortunately, you want _some_ to be context dependent and some not to be. Paul Robertson and I spent a couple of days and luncheons arguing about this one a few months ago, and concluded that basically you can't make everyone (or every process) happy, and have to make some WAGs as you do the parse.

For example, part of the strength of a log analysis system would be its ability to correlate common values, e.g., to recognize that the source IP address of a message is the same in 13 messages that happened within .4 seconds of each other. That's interesting. But it implies that you can't nail every data item down to something very specific - unless you get into typed values, in which case you have a much bigger parsing problem. :(

I don't think this is cleanly solvable, because there needed to have been a good logging standard written around 1981 and there wasn't.

Consider if you have all your messages try to stick a value, as appropriate, into an output field called:

    alert_severity=

That's cool, because you can now do something useful with alert_severity matching. But what if you've got an alert that already has a severity? Do you map it to alert_severity - what if it's scaled wrong? That doesn't fly, so you now have:

    alert_severity=6
    firewall_alert_severity=65

oops. :( I think you need to be draconian.

Let's try a worse example. Let's do what Paul and I were doing and have 2 values:

    target_address=
    source_address=

Now, everything _tries_ to map something that makes sense to target_address and source_address. That would be cool because now you can search for instances where event1:target_address==eventN:target_address or even target_address==source_address or whatever. So there's clear value in not making things too specific.
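A minimal sketch of that kind of correlation, assuming each parsed event is a Python dict carrying the common source_address key plus a numeric "time" field in seconds. The "time" field, the function name, and the sample addresses are assumptions made up for illustration; they are not from the thread.

```python
from collections import defaultdict


def burst_sizes(events, key="source_address", window=0.4):
    """For each distinct value of `key`, return the largest number of
    events carrying that value within any `window`-second span."""
    times_by_value = defaultdict(list)
    for ev in events:
        if key in ev:  # events lacking the common key are simply skipped
            times_by_value[ev[key]].append(ev["time"])

    result = {}
    for value, times in times_by_value.items():
        times.sort()
        best, start = 0, 0
        for end, t in enumerate(times):
            # Advance the left edge until the span fits inside the window.
            while t - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        result[value] = best
    return result


# Hypothetical events; the addresses are documentation-range placeholders.
events = [
    {"time": 0.0, "source_address": "192.0.2.1", "target_app": "httpd"},
    {"time": 0.1, "source_address": "192.0.2.1", "target_app": "sendmail"},
    {"time": 0.3, "source_address": "192.0.2.1", "target_app": "telnet"},
    {"time": 5.0, "source_address": "192.0.2.1", "target_app": "httpd"},
    {"time": 0.2, "source_address": "192.0.2.2", "target_app": "httpd"},
]
sizes = burst_sizes(events)
```

Because every parser maps into the same source_address key, the match works across sendmail, httpd, and telnet events with plain string comparison and no per-application logic.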
You could say:

    sendmail_target_address=
    telnet_target_address=

but then you're too specific. What we did was define a bunch of things we figured you could probably munge just about anything into, and then the munger could always use specific_application_values as it saw fit for the app-specific data. That actually works nicely. Consider:

    target_address=some ip...
    source_address=some ip...
    target_path=http://some/url/or/other
    target_app=httpd
    source_app=unknown
    httpd_method=GET
    httpd_user=whatever

So the parser did its best to munge things down to a _SMALL_ common data dictionary but allowed lots of room for specific prefixed stuff. You could get all the app-specific stuff by just pulling all the things prefixed by the app name.

Typing is a special kind of hell we avoided. We figured it's useful if you're trying to correlate when there are ambiguous values. I.e.:

    (ip_addr)httpd_remote=
    (ip_addr)source_address=

Then you can try to correlate on common values of similar types. Except that then you have to correctly parse and munge data into correct types, and that'd be a heck of a lot more work than just trying to match strings. Of course, then you can match based on type conversion, i.e.:

    (ip_addr)source_address=16.67.32.100
    (ip_addr)source_address=burfle.dco.dec.com

(I forget if those addresses are the same anymore, but it's an implied DNS lookup.)

> Trying to force all of your messages to fit the same arbitrary
> data structure will just cause you headaches

I actually think that the headaches of doing procrustean translation are smaller than the headaches of trying to have an open-ended data structure. I think procrustean translation with the ability to put app-specific data in a form that is clearly separate is probably a good way to go.

mjr.
---
Marcus J.
Ranum Computer and communications Security mjrat_private http://www.ranum.com
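The small common dictionary plus app-prefixed keys described in the message can be sketched roughly as follows. This is only an illustration under stated assumptions: the record is taken to be a flat Python dict, the COMMON_KEYS set is limited to the five common keys that appear in the message's example, and the function names and sample address values are hypothetical.

```python
# Common keys taken from the example in the message; a real dictionary
# would presumably contain more, which is an assumption here.
COMMON_KEYS = {
    "target_address",
    "source_address",
    "target_path",
    "target_app",
    "source_app",
}


def common_fields(record):
    """Return only the fields that belong to the shared data dictionary."""
    return {k: v for k, v in record.items() if k in COMMON_KEYS}


def app_fields(record, app):
    """Return the app-specific fields for `app`, with the prefix removed."""
    prefix = app + "_"
    return {
        k[len(prefix):]: v
        for k, v in record.items()
        if k.startswith(prefix) and k not in COMMON_KEYS
    }


# Hypothetical parsed record; the addresses are made-up placeholders.
record = {
    "target_address": "192.0.2.10",
    "source_address": "192.0.2.20",
    "target_path": "http://some/url/or/other",
    "target_app": "httpd",
    "source_app": "unknown",
    "httpd_method": "GET",
    "httpd_user": "whatever",
}
httpd_only = app_fields(record, record["target_app"])
```

Correlation code only ever needs common_fields(), while anything that understands a particular application can pull its own data by prefix without the common dictionary growing.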
This archive was generated by hypermail 2b30 : Tue Jun 11 2002 - 16:42:58 PDT