[logs] Bayes - good or bad?

From: Anton A. Chuvakin (anton@private)
Date: Wed Feb 23 2005 - 12:07:46 PST


All,

I figured I would come out of hibernation with this fun inquiry: what's
the list's overall opinion on 'going Bayesian' on logs? Sure, it
works for spam, but log challenges are a pretty different beast.

I've been playing with my reimplementation of Marcus Ranum's fnort, and it
seems that the only way to get good, sensible results out of it is to have
good training data. As you can guess, the above is just another way of
saying that "it doesn't work" :-)

If I separate log lines into good and bad (easy, huh...), feed them line
by line into a Bayesian classifier (such as bogofilter) for training, and
then stuff an unknown sample into it, the only lines classified as bad are
the ones that match whatever was already in the 'known bad' sample. E.g. if
'ssh auth failed' was in the 'known bad' sample, bogofilter will mark such
lines as bad in the unknown sample. In other words, the results are the
same as with simple pattern matching.
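
To make the experiment above concrete, here is a rough sketch of the same
idea in Python. It is a toy per-line naive Bayes with add-one smoothing,
not bogofilter's actual algorithm and not fnort; the class name LineBayes,
the tokenizer, and the four training lines are all made up for
illustration.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Lowercased alphabetic words only; numbers (PIDs, IPs, ports) are dropped.
    return re.findall(r"[a-z]+", line.lower())


class LineBayes:
    """Toy two-class naive Bayes over log lines (hypothetical helper)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def train(self, label, lines):
        for line in lines:
            self.lines[label] += 1
            self.counts[label].update(tokenize(line))

    def p_bad(self, line):
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        # Start from the prior odds of bad vs. good lines.
        log_odds = math.log(self.lines["bad"]) - math.log(self.lines["good"])
        for tok in tokenize(line):
            if tok not in vocab:
                continue  # a token never seen in training carries no evidence
            for label, sign in (("bad", 1), ("good", -1)):
                num = self.counts[label][tok] + 1  # add-one smoothing
                den = sum(self.counts[label].values()) + len(vocab)
                log_odds += sign * math.log(num / den)
        return 1 / (1 + math.exp(-log_odds))


# Made-up training samples, one log line per entry.
clf = LineBayes()
clf.train("good", ["session opened for user alice",
                   "session closed for user alice"])
clf.train("bad", ["ssh auth failed for user root",
                  "ssh auth failed for user admin"])

for sample in ["ssh auth failed for user bob",
               "session opened for user bob",
               "kernel panic detected"]:
    print(f"{clf.p_bad(sample):.3f}  {sample}")
```

With this toy data, the line that repeats the 'known bad' tokens scores
high, while a genuinely new line such as 'kernel panic detected' shares no
tokens with either sample and comes out at a neutral 0.5. The classifier
has nothing to say about anything it was not trained on.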

Any other experiences? Ideas? Comments?

Best,
-- 
Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
     http://www.info-secure.org
   http://www.securitywarrior.com

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis


