RE: [logs] Bayes - good or bad?

stuart@private

My general opinion on learning in IDS/IPS/SEM systems is that it can be
useful, but you have to know what you are learning and why.  The learning
has to happen in the context of a model of what the underlying variable is,
and in which directions it might and might not vary.  Completely abstract
learning technologies aren't terribly useful because the underlying data is
not stationary.

(This is true of human learning too - we come into the world wired to
believe that the world is made of 3D moving objects, faces, languages with
subject-object-action grammars, etc.  We just learn details to flesh out the
general scheme.  Similarly, security monitoring systems need to come into
the world knowing about worms, viruses, portscans, attacks, normal behavior
of protocols, etc., and just learn the details of what normal behavior looks
like at a given site in order to better distinguish the bad from the good.)
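
To make the parenthetical concrete, here is a rough sketch of "built-in
model, learned details" for one case, portscans.  The model (a scanner
touches many distinct destination ports in a time window) is hard-wired; the
only thing learned from the site is how many distinct ports counts as
normal.  All names, the (src_ip, dst_port) event format, and the
mean-plus-k-standard-deviations threshold are assumptions made for
illustration, not a description of any particular product.

```python
from collections import defaultdict
import statistics


def distinct_ports_per_source(events):
    """Built-in model: count distinct destination ports per source.

    `events` is an iterable of (src_ip, dst_port) pairs from one time
    window (a hypothetical, simplified event format).
    """
    ports = defaultdict(set)
    for src, port in events:
        ports[src].add(port)
    return {src: len(seen) for src, seen in ports.items()}


def learn_threshold(baseline_windows, k=3.0):
    """Learned detail: what a normal port count looks like at this site.

    Uses mean + k * population standard deviation over the baseline
    windows; k is an arbitrary illustrative choice.
    """
    samples = [
        count
        for window in baseline_windows
        for count in distinct_ports_per_source(window).values()
    ]
    return statistics.mean(samples) + k * statistics.pstdev(samples)


def scanners(events, threshold):
    """Sources whose distinct-port count exceeds the learned threshold."""
    counts = distinct_ports_per_source(events)
    return sorted(src for src, n in counts.items() if n > threshold)


# Baseline traffic assumed to be free of scans.
baseline = [
    [("10.0.0.1", 80), ("10.0.0.1", 443), ("10.0.0.2", 22)],
    [("10.0.0.1", 80), ("10.0.0.3", 25), ("10.0.0.3", 80)],
]
threshold = learn_threshold(baseline)

# A new window: one host sweeps ports 1-20, another makes one connection.
window = [("10.0.0.9", p) for p in range(1, 21)] + [("10.0.0.1", 80)]
print(scanners(window, threshold))
```

The learner here never has to discover what a portscan is; it only tunes
one number, and that number can be re-learned as the site's traffic drifts.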

I also think the experience of the community has been that the specific
learning technology (Bayesian, neural nets, decision trees, etc.) matters
less than the choice of features to do inference on.
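
A small sketch of that point, using the example from the quoted message
below.  The same naive Bayes learner is run twice, once with each whole log
line as a single feature and once with each word as a feature.  This is a
toy classifier written for illustration; it is not bogofilter's actual
algorithm, and the class, function names, and sample log lines are all
made up.

```python
import math
from collections import Counter


def whole_line(line):
    """One feature per line: the entire text."""
    return [line.strip()]


def words(line):
    """One feature per whitespace-separated token, lowercased."""
    return line.lower().split()


class NaiveBayes:
    """Toy two-class naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"good": Counter(), "bad": Counter()}
        self.docs = Counter()

    def train(self, line, label):
        self.docs[label] += 1
        self.counts[label].update(self.featurize(line))

    def score(self, line):
        """Log-odds that the line is bad; positive leans 'bad'."""
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        v = len(vocab) + 1  # +1 leaves room for unseen features
        total = {c: sum(self.counts[c].values()) for c in self.counts}
        odds = math.log((self.docs["bad"] + 1) / (self.docs["good"] + 1))
        for f in self.featurize(line):
            p_bad = (self.counts["bad"][f] + 1) / (total["bad"] + v)
            p_good = (self.counts["good"][f] + 1) / (total["good"] + v)
            odds += math.log(p_bad / p_good)
        return odds


GOOD = ["sshd session opened for user alice", "cron job finished ok"]
BAD = ["sshd auth failed for user root", "ftpd login failed for user admin"]


def build(featurize):
    clf = NaiveBayes(featurize)
    for line in GOOD:
        clf.train(line, "good")
    for line in BAD:
        clf.train(line, "bad")
    return clf


by_line = build(whole_line)
by_word = build(words)

unseen = "telnetd auth failed for user guest"
print("whole-line features:", by_line.score(unseen))
print("word features:      ", by_word.score(unseen))
```

With whole-line features the unseen line carries no evidence either way, so
the classifier can only flag lines it has already been shown, which is just
pattern matching.  With word features the same learner leans "bad" on a line
it has never seen, because "auth" and "failed" showed up in the bad training
set.  The learner did not change; the features did.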

Stuart.

Stuart Staniford, Principal Scientist
Nevis Networks
stuart@private
408-327-4652

> -----Original Message-----
> From: loganalysis-bounces+stuart=nevisnetworks.com@private
> [mailto:loganalysis-bounces+stuart=nevisnetworks.com@private]
> On Behalf Of Anton A. Chuvakin
> Sent: Wednesday, February 23, 2005 12:08 PM
> To: loganalysis@private
> Subject: [logs] Bayes - good or bad?
> 
> 
> All,
> 
> I figured I would come out of hibernation with this fun inquiry: what's
> the overall opinion of the list on 'going Bayesian' on logs? Sure, it
> works for spam, but log challenges are a pretty different beast.
> 
> I've been playing with my reimplementation of Marcus Ranum's fnort, and
> it seems that the only way to get good, sensible results out of it is to
> have good training data. As you can guess, the above is just another way
> of saying that "it doesn't work" :-)
> 
> If I separate log lines into good and bad (easy, huh...), feed them line
> by line into a Bayesian classifier (such as bogofilter) for training,
> and then stuff an unknown sample into it, the only lines classified as
> bad are the ones identical to lines in the bad training set. E.g. if
> 'ssh auth failed' was in the 'known bad' sample, bogofilter will mark
> those lines as bad in the unknown sample. In other words, the results
> are the same as with simple pattern matching.
> 
> Any other experiences? Ideas? Comments?
> 
> Best,
> -- 
> Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
>      http://www.info-secure.org
>    http://www.securitywarrior.com
> 
> _______________________________________________
> LogAnalysis mailing list
> LogAnalysis@private
> http://lists.shmoo.com/mailman/listinfo/loganalysis
> 
