[logs] Re: regex-less parsing of messages

From: Rainer Gerhards (rgerhards@private)
Date: Thu Dec 08 2005 - 23:48:12 PST


Well,

I didn't jump on this thread because I do not have a method that would
relieve me of knowing how to parse. Seeing how things have evolved now,
my information might be helpful anyhow. In my paper

http://www.monitorware.com/en/workinprogress/nature-of-syslog-data.php

I describe how to use patterns/templates for raw logs that translate
them into "higher-level" objects. These higher-level objects are
generalized. The analysis algorithm works on them. So while you still
need to know the lower-level formats, it is much easier to adapt to
changing formats and new devices - at least in theory. We are currently
implementing this approach in a product. When I have some experience
from the outcome, I can report (if there is interest in that).
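
To make the idea concrete, here is a minimal sketch of templates that map
device-specific raw lines onto one generalized object. It is not the
implementation from the paper or the product: the Event class, the
TEMPLATES table, the normalize() function and both sample message formats
are hypothetical, and regular expressions are used only as one convenient
way to write a template.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Generalized, device-independent object (hypothetical shape)."""
    kind: str
    user: Optional[str] = None
    source: Optional[str] = None


# Hypothetical templates: each one ties a raw, device-specific format
# to the same generalized event kind. Supporting a new device means
# adding a template here, not touching the analysis code.
TEMPLATES = [
    ("login_failure", re.compile(
        r"Failed password for (?:invalid user )?(?P<user>\S+) "
        r"from (?P<source>\S+)")),
    ("login_failure", re.compile(
        r"Logon failure: user=(?P<user>\S+) addr=(?P<source>\S+)")),
]


def normalize(line: str) -> Optional[Event]:
    """Return the generalized event for a raw line, or None if no template fits."""
    for kind, pattern in TEMPLATES:
        match = pattern.search(line)
        if match:
            return Event(kind=kind, **match.groupdict())
    return None
```

The analysis algorithm would then only ever see Event objects, which is
why a format change touches the template table and nothing else.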

I personally believe it is not possible - with current technology - to
process logs without knowledge of their format. But maybe (hopefully ;))
I am wrong. I am most interested in any generic approach.

Rainer 

> -----Original Message-----
> From: loganalysis-bounces+rgerhards=hq.adiscon.com@private
> [mailto:loganalysis-bounces+rgerhards=hq.adiscon.com@private]
> On Behalf Of Anton Chuvakin
> Sent: Friday, December 09, 2005 12:44 AM
> To: LogAnalysis@private
> Subject: [logs] Re: regex-less parsing of messages
> 
> All,
> 
> So, it looks like this discussion did generate some solutions and very
> cool ideas!
> 
> Here is my summary with comments, of sorts.
> 
> 1. If parsing/tokenizing is hard, wait for the XML standard to emerge.
> And then things will be easy indeed. There is definite value in this
> one... a humorous value :-)
> 2. Don't tokenize; there are good (?) tools to analyze logs without
> it. (BTW, by tokenizing I meant not only 'splitting things', but also
> naming/categorizing the resulting chunks.)
> 3. I am surprised that nobody picked up on the 'can we solve this
> problem if you have a lot of similar log data to look at?' question.
> Clustering and similar approaches seem, IMHO, "almost doable."
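
As a rough illustration of the clustering idea in point 3, the sketch
below groups similar lines by masking tokens that look variable. It is
only one crude heuristic, not a method proposed in this thread: the
signature() and cluster() names are hypothetical, and treating every
token that contains a digit as variable is an assumption.

```python
import re
from collections import defaultdict

# Assumption: a token containing a digit (IP, port, PID, counter) is
# variable data; everything else is part of the fixed message text.
_VARIABLE = re.compile(r"\d")


def signature(line):
    """Replace variable-looking tokens with '*' to get a message skeleton."""
    return " ".join("*" if _VARIABLE.search(tok) else tok
                    for tok in line.split())


def cluster(lines):
    """Group raw lines that share the same skeleton."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature(line)].append(line)
    return dict(groups)
```

With enough similar data, each group approximates one message format,
but the chunks are still unnamed: the skeleton does not say which '*' is
a source address and which is a user.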
> 
> I also wanted to comment on analysis methods that do not rely on
> tokenized logs.  I agree that they can solve some problems (mentioned
> in this thread), but I suspect that they will hit a sturdy wall in
> some others.  For example, I do not think that tracing an email
> message thru logs from multiple diverse devices can be solved without
> understanding each device log format. Similarly, rule-based
> correlation approaches require knowledge of the nature of specific log
> fields (such as source, destination, table name, etc.).
> 
> And, finally, I wanted to address this one:
> 
> >I see it, we'll be stuck with "expert systems" for a while - the
> >market for log analysis software is not that rich to justify the
> >type of investments required to keep a couple of Ph.D's on your
> >payroll.
> Hmmm, what makes you say so? Some solutions that help with logs are
> not exactly bargain priced, if you know what I mean :-) I do not think
> that the market is small, considering that *everybody* has the 'log
> problem' to some extent... And the problem can only become worse, thus
> increasing the market and providing jobs for those Ph.D.s :-)
> 
> Next, I will launch something on the use of data mining for
> logs... stand by :-)
> 
> Best,
> --
> Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA
>          http://www.chuvakin.org
>     http://www.securitywarrior.com
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis@private
> http://lists.shmoo.com/mailman/listinfo/loganalysis
> 



This archive was generated by hypermail 2.1.3 : Fri Dec 09 2005 - 00:23:11 PST