All,

So, it looks like this discussion did generate some solutions and very cool ideas! Here is my summary, with comments of sorts.

1. If parsing/tokenizing is hard, wait for the XML standard to emerge. And then things will be easy indeed. There is definite value in this one... a humorous value :-)

2. Don't tokenize; there are good (?) tools to analyze logs without it. (BTW, by tokenizing I meant not only 'splitting things', but also naming/categorizing the resulting chunks.)

3. I am surprised that nobody picked up on the 'can we solve this problem if you have a lot of similar log data to look at' question. Clustering and similar approaches seem, IMHO, "almost doable."

I also wanted to comment on analysis methods that do not rely on tokenized logs. I agree that they can solve some problems (mentioned in this thread), but I suspect that they will hit a sturdy wall in some others. For example, I do not think that tracing an email message through logs from multiple diverse devices can be solved without understanding each device's log format. Similarly, rule-based correlation approaches require knowledge of the nature of specific log fields (such as source, destination, table name, etc.).

And, finally, I wanted to address this one:

>I see it, we'll be stuck with "expert systems" for a while - the market for
>log analysis software is not that rich to justify the type of investments
>required to keep a couple of Ph.D's on your payroll.

Hmmm, what makes you say so? Some solutions that help with logs are not exactly bargain priced, if you know what I mean :-) I do not think that the market is small, considering that *everybody* has the 'log problem' to some extent...
And the problem can only become worse, thus increasing the market and providing jobs for those Ph.D.s :-)

Next, I will launch something on the use of data mining for logs... stand by :-)

Best,
--
Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA
http://www.chuvakin.org
http://www.securitywarrior.com
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis