My general opinion on learning in IDS/IPS/SEM systems is that it can be
useful, but you have to know what you are learning and why. It has to be
in the context of a model of what the underlying variable is, and in
which directions it might and might not vary. Completely abstract
learning technologies aren't terribly useful because the underlying data
is not stationary.

(This is true of human learning too - we come into the world wired to
believe that the world is made of 3D moving objects, faces, languages
with subject-object-action grammars, etc. We just learn details to flesh
out the general scheme. Similarly, security monitoring systems need to
come into the world knowing about worms, viruses, portscans, attacks,
normal behavior of protocols, etc., and just learn the details of what
normal behavior looks like at a given site in order to better
distinguish the bad from the good.)

I think the experience of the community has also been that the specific
nature of the learning technology (Bayesian, neural nets, decision
trees, etc.) is of less importance than the choice of features to do
inference on.

Stuart.

Stuart Staniford, Principal Scientist
Nevis Networks
stuart@private
408-327-4652

> -----Original Message-----
> From: loganalysis-bounces+stuart=nevisnetworks.com@private
> [mailto:loganalysis-bounces+stuart=nevisnetworks.com@private]
> On Behalf Of Anton A. Chuvakin
> Sent: Wednesday, February 23, 2005 12:08 PM
> To: loganalysis@private
> Subject: [logs] Bayes - good or bad?
>
> All,
>
> I figured I would come out of hibernation with this fun inquiry:
> what's the overall opinion of the list on 'going Bayesian' on logs?
> Sure, it works for spam, but log challenges are a pretty different
> beast.
>
> I've been playing with my reimplementation of Marcus Ranum's fnort,
> and it seems that the only way to get good, sensible results out of it
> is to have good training data. As you can guess, the above is just
> another way of saying that "it doesn't work" :-)
>
> If I separate log lines into good and bad (easy, huh...), feed them
> line by line into a Bayesian classifier (such as bogofilter) for
> training, and then stuff an unknown sample into it, the only lines
> classified as bad are the ones that match whatever was in the bad
> training set. E.g. if 'ssh auth failed' was in a 'known bad' sample,
> bogofilter will mark those lines as bad in the unknown sample. In
> other words, the results are the same as with simple pattern matching.
>
> Any other experiences? Ideas? Comments?
>
> Best,
> --
> Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
> http://www.info-secure.org
> http://www.securitywarrior.com
>
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis@private
> http://lists.shmoo.com/mailman/listinfo/loganalysis
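
For readers who want to try the experiment described in the quoted
message, here is a rough sketch in Python. It is not bogofilter or
fnort: the class name, the tokenizer, and the sample log lines are all
made up for illustration, and the classifier is a plain word-level
naive Bayes with add-one smoothing. It only aims to show the behavior
under discussion: lines that share words with the 'known bad' sample
are flagged, while a bad line made of unseen words falls back to the
class prior.

```python
import math
import re
from collections import Counter


def tokenize(line):
    """Lowercase a log line and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", line.lower())


class LogLineBayes:
    """Hypothetical word-level naive Bayes over 'good' and 'bad' log lines."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def train(self, label, line):
        self.counts[label].update(tokenize(line))
        self.lines[label] += 1

    def log_odds_bad(self, line):
        """Log odds of 'bad' versus 'good'; needs at least one line of each."""
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        # Start from the class prior estimated from training line counts.
        score = math.log(self.lines["bad"]) - math.log(self.lines["good"])
        for tok in tokenize(line):
            for label, sign in (("bad", 1), ("good", -1)):
                total = sum(self.counts[label].values())
                # Add-one smoothing, with one extra slot for unseen words.
                p = (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
                score += sign * math.log(p)
        return score

    def classify(self, line):
        return "bad" if self.log_odds_bad(line) > 0 else "good"


def demo_classifier():
    """Train on a tiny invented sample (not real log data)."""
    clf = LogLineBayes()
    for line in ("sshd: session opened for user alice",
                 "sshd: session closed for user alice",
                 "cron: job started"):
        clf.train("good", line)
    for line in ("sshd: ssh auth failed for user root",
                 "sshd: ssh auth failed for user admin"):
        clf.train("bad", line)
    return clf


if __name__ == "__main__":
    clf = demo_classifier()
    for line in ("sshd: ssh auth failed for user bob",
                 "sshd: session opened for user carol",
                 "kernel: segfault in httpd"):
        print(clf.classify(line), "<-", line)
```

With this toy sample, the third line contains no word seen in training,
so only the prior (more good lines than bad) decides its label. Swapping
tokenize() for site-specific features (per-host failure rates, protocol
state, time of day) is where the choice of features would come in; that
part is left out here because it depends entirely on the site.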