All,

I figured I would come out of hibernation with this fun inquiry: what's the list's overall opinion of 'going Bayesian' on logs? Sure, it works for spam, but logs are a pretty different beast.

I've been playing with my reimplementation of Marcus Ranum's fnort, and it seems that the only way to get sensible results out of it is to have good training data. As you can guess, the above is just another way of saying that "it doesn't work" :-)

If I separate log lines into good and bad (easy, huh...), feed them line by line into a Bayesian classifier (such as bogofilter) for training, and then stuff an unknown sample into it, the only lines classified as bad are the ones that match what was already in the 'known bad' training set. E.g. if 'ssh auth failed' was in the 'known bad' sample, bogofilter will mark such lines as bad in the unknown sample. In other words, the results are the same as with simple pattern matching.

Any other experiences? Ideas? Comments?

Best,
-- 
Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
http://www.info-secure.org
http://www.securitywarrior.com

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
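
The experiment described in the post (train on 'known good' and 'known bad' lines, then score an unknown sample) can be sketched roughly as follows. This is a minimal illustration only: it uses a plain multinomial naive Bayes with add-one smoothing, not bogofilter's or fnort's actual algorithm, and the `LineClassifier` class, the whitespace tokenizer, and all sample log lines are hypothetical stand-ins rather than anything taken from the post.

```python
import math
from collections import Counter


def tokenize(line):
    # Assumption: lowercase whitespace tokens; real filters tokenize differently.
    return line.lower().split()


class LineClassifier:
    """Two-class multinomial naive Bayes over log-line tokens (illustrative)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def train(self, label, line):
        # One training example per log line, as in the post.
        self.counts[label].update(tokenize(line))
        self.lines[label] += 1

    def p_bad(self, line):
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        total_lines = self.lines["good"] + self.lines["bad"]
        log_score = {}
        for label in ("good", "bad"):
            total_tokens = sum(self.counts[label].values())
            # Class prior from line counts, add-one smoothed.
            score = math.log((self.lines[label] + 1) / (total_lines + 2))
            for tok in tokenize(line):
                # Add-one smoothing so unseen tokens get a small nonzero weight.
                score += math.log(
                    (self.counts[label][tok] + 1)
                    / (total_tokens + len(vocab) + 1)
                )
            log_score[label] = score
        # Normalize the two log scores into a probability of "bad".
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["bad"] / (weight["good"] + weight["bad"])


# Hypothetical training data, invented for this sketch.
clf = LineClassifier()
for good_line in [
    "sshd session opened for user alice",
    "cron job started",
    "sshd session closed for user alice",
]:
    clf.train("good", good_line)
for bad_line in [
    "sshd ssh auth failed for user root",
    "sshd ssh auth failed for user admin",
]:
    clf.train("bad", bad_line)

# Hypothetical "unknown" sample.
for unknown in [
    "sshd ssh auth failed for user bob",  # resembles the known-bad lines
    "cron job started",                   # seen verbatim as good
    "kernel: disk error on sda",          # shares no tokens with training data
]:
    print(f"{clf.p_bad(unknown):.3f}  {unknown}")
```

With this toy data, lines that share tokens with the 'known bad' set score high, while a line made entirely of unseen tokens stays close to the prior, which fits the observation above that the classifier flags little beyond what a pattern match on the training lines would already catch.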