[logs] Bayes - good or bad?


anton@private

All,

I figured I would come out of hibernation with this fun inquiry: what's
the list's overall opinion on 'going Bayesian' on logs? Sure, it
works for spam, but log challenges are a pretty different beast.

I've been playing with my reimplementation of Marcus Ranum's fnort, and it
seems that the only way to get good sensible results out of it is to have
good training data. As you can guess, the above is just another way of
saying that "it doesn't work" :-)

If I separate log lines into good and bad (easy, huh...), feed them
line by line into a Bayesian classifier (such as bogofilter) for
training, and then stuff an unknown sample into it, the only lines
classified as bad are the ones that match what was already in the 'known
bad' sample. E.g. if 'ssh auth failed' was in the 'known bad' sample,
bogofilter will mark those lines as bad in the unknown sample. In other
words, the results are the same as with simple pattern matching.
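
To make the experiment concrete, here is a rough sketch of the same
line-by-line training in Python. It is a toy multinomial naive Bayes
with add-one smoothing, not bogofilter's actual algorithm, and the
class name, the tokenizer, and the sample log lines are all made up
for illustration.

```python
import math
from collections import Counter


def tokenize(line):
    # Plain lowercase whitespace split. A real log tokenizer would
    # probably also strip timestamps, PIDs and addresses (not done here).
    return line.lower().split()


class LineClassifier:
    """Toy naive Bayes over log-line tokens (hypothetical helper)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def train(self, line, label):
        self.counts[label].update(tokenize(line))
        self.lines[label] += 1

    def bad_probability(self, line):
        # Assumes at least one training line per class.
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        total_lines = self.lines["good"] + self.lines["bad"]
        log_score = {}
        for label in ("good", "bad"):
            total_tokens = sum(self.counts[label].values())
            # Class prior from the share of training lines.
            score = math.log(self.lines[label] / total_lines)
            for tok in tokenize(line):
                # Add-one smoothing so unseen tokens do not zero out.
                score += math.log(
                    (self.counts[label][tok] + 1)
                    / (total_tokens + len(vocab) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into P(bad | line).
        return 1.0 / (1.0 + math.exp(log_score["good"] - log_score["bad"]))


# Invented sample data, standing in for the 'known good' and
# 'known bad' training files.
clf = LineClassifier()
for known_good in (
    "sshd session opened for user alice",
    "cron job started",
    "sshd session closed for user alice",
):
    clf.train(known_good, "good")
for known_bad in (
    "sshd auth failed for user root",
    "sshd auth failed for user admin",
):
    clf.train(known_bad, "bad")

for unknown in ("sshd auth failed for user bob", "cron job started"):
    print(f"{clf.bad_probability(unknown):.3f}  {unknown}")
```

On this tiny sample the 'auth failed' line scores as bad almost
entirely because of the tokens 'auth' and 'failed', which is the
pattern-matching behaviour described above. A line made up only of
tokens the classifier has never seen gets a score driven by the prior
and the smoothing alone, not by any evidence.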

Any other experiences? Ideas? Comments?

Best,
-- 
Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
     http://www.info-secure.org
   http://www.securitywarrior.com

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis