Re: [logs] Bayes - good or bad?

From: John Reuning (john@private)
Date: Wed Feb 23 2005 - 13:12:29 PST


I did something similar a short while ago.  The software used for the
experiment processed syslog files and classified log messages as
anomalous based on TF-IDF weighting of the terms in those files.

http://www.ibiblio.org/john/pubs/johnreuning_sils_unc.pdf
This document has pretty graphs. :-)
http://www.ibiblio.org/john/statlog/

The results were what you'd expect from a generic term weight approach
-- okay but not great.  The problem in syslog messages is that
independent terms (non-numeric tokens, by my definition) aren't great
discriminators.  They're either very common (login, logout) or almost
random (timestamps, tcp sequence numbers).  I've wondered if results
would be improved by adding what you described: 1) relevance feedback
for known good or bad messages and 2) manual classification and
weighting of terms.
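
For anyone who wants to try the idea, here is a minimal sketch of generic
TF-IDF scoring over log lines.  It is not the code from the paper above.
The names tokenize() and score_lines() are made up for illustration, each
message is treated as its own document, and "non-numeric token" is assumed
to mean any alphanumeric run that contains no digit.

```python
import math
import re
from collections import Counter


def tokenize(line):
    """Split on non-alphanumerics and keep tokens with no digits (assumption)."""
    parts = re.split(r"[^A-Za-z0-9]+", line)
    return [p.lower() for p in parts if p and not any(c.isdigit() for c in p)]


def score_lines(lines):
    """Return one score per line: the mean IDF of its tokens.

    Each line is treated as one document, so a token's document frequency
    is the number of lines it appears in. Lines made of rare tokens score high.
    """
    docs = [tokenize(line) for line in lines]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token once per line

    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)  # nothing but numbers or punctuation
            continue
        tf = Counter(doc)
        total = sum(count * math.log(n / df[tok]) for tok, count in tf.items())
        scores.append(total / len(doc))
    return scores


if __name__ == "__main__":
    sample = ["sshd[%d]: session opened for user alice" % pid for pid in range(100, 105)]
    sample.append("kernel: Out of memory: killed process 4242")
    for score, line in sorted(zip(score_lines(sample), sample), reverse=True):
        print("%.3f  %s" % (score, line))
```

Sorting by score puts the lines with the rarest terms first.  A term seen
in every line gets a weight of zero, which is the "very common" half of
the problem described above.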

Thanks,

-jrr

On Wed, 2005-02-23 at 15:07 -0500, Anton A. Chuvakin wrote:
> All,
> 
> I figured I would come out of hibernation with this fun inquiry: what's
> the overall opinion of the list on 'going Bayesian' on logs? Sure, it
> works for spam, but log challenges are a pretty different beast.
> 
> I've been playing with my reimplementation of Marcus Ranum's fnort, and it
> seems that the only way to get good sensible results out of it is to have
> good training data. As you can guess, the above is just another way of
> saying that "it doesn't work" :-)
> 
> If I separate log lines into good and bad (easy, huh...), feed them
> line by line into a Bayesian classifier (such as bogofilter) for
> training, and then stuff an unknown sample into it, the only lines
> classified as bad are those identical to lines in the bad training set.
> E.g. if 'ssh auth failed' was in a 'known bad' sample, bogofilter will
> mark it as bad in the unknown sample. In other words, the results are
> the same as with simple pattern matching.
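
The experiment quoted above can be sketched roughly as follows.  This is
a plain naive Bayes classifier with add-one smoothing and a uniform prior,
swapped in for illustration; it is not how bogofilter computes its scores,
and train() and classify() are made-up names.

```python
import math
from collections import Counter


def train(lines_by_label):
    """Count whitespace-separated tokens per label, e.g. {"good": [...], "bad": [...]}."""
    counts = {label: Counter() for label in lines_by_label}
    for label, lines in lines_by_label.items():
        for line in lines:
            counts[label].update(line.lower().split())
    return counts


def classify(counts, line):
    """Return the label with the highest log-probability for this line.

    Assumes a uniform prior over labels and add-one (Laplace) smoothing.
    """
    vocab = set().union(*counts.values())
    best_label, best_lp = None, float("-inf")
    for label, c in counts.items():
        total = sum(c.values())
        lp = 0.0
        for tok in line.lower().split():
            lp += math.log((c[tok] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label


if __name__ == "__main__":
    model = train({
        "good": ["session opened for user alice", "session closed for user alice"],
        "bad": ["ssh auth failed for user root"],
    })
    for line in ["ssh auth failed for user bob", "session opened for user bob"]:
        print(classify(model, line), "<-", line)
```

Lines that share tokens with the bad training lines come out as bad, much
like a pattern match would.  A line made up entirely of unseen tokens is
decided by the smoothing and the prior alone, so its label says nothing
about the line itself.
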
> 
> Any other experiences? Ideas? Comments?
> 
> Best,
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis


