Re: [logs] Bayes - good or bad?

anton@private

>Yes, to start with it's no better than pattern matching. But over time
>will it save work in maintaining the patterns and detecting new issues?

Why would it? Admittedly, the best way to verify this is to try it on a
large scale, diligently training the system on good/bad logs. The only
advantage I suspect you will realize is that you might avoid the pain of
writing regexes for all the _variations_ of a 'known bad' thing.
Hopefully, the Bayesian classifier will figure those out on its own :-)
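
To make the 'variations' point concrete, here is a minimal sketch of a
naive Bayes classifier over log tokens, using only the Python standard
library. The class name, the tokenizer, and the four training lines are
all hypothetical and chosen for illustration; a real trial would need a
far larger hand-labeled corpus.

```python
import math
import re
from collections import Counter

def tokenize(line):
    # Keep lowercase alphabetic tokens only; numbers (IPs, ports, PIDs)
    # are dropped so that variations of one message share tokens.
    return re.findall(r"[a-z_]+", line.lower())

class NaiveBayes:
    """Toy two-class naive Bayes; an illustration, not a tuned tool."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.docs = {"good": 0, "bad": 0}

    def train(self, line, label):
        self.counts[label].update(tokenize(line))
        self.docs[label] += 1

    def score(self, line, label):
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        total = sum(self.counts[label].values())
        # Log prior from the share of training lines with this label.
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokenize(line):
            # Add-one smoothing, with one extra slot for unseen tokens.
            logp += math.log(
                (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
            )
        return logp

    def classify(self, line):
        return max(("good", "bad"), key=lambda lab: self.score(line, lab))

# Hypothetical, hand-labeled training lines.
clf = NaiveBayes()
clf.train("Failed password for root from 10.0.0.1 port 22 ssh2", "bad")
clf.train("Failed password for invalid user admin from 10.0.0.2 port 4022 ssh2", "bad")
clf.train("Accepted publickey for deploy from 10.0.0.5 port 22 ssh2", "good")
clf.train("session opened for user deploy by (uid=0)", "good")

# A variation that was never seen verbatim during training.
print(clf.classify("Failed password for invalid user guest from 10.0.0.9 port 5122 ssh2"))
```

No regex was written for the 'invalid user guest' variant; it is scored
from tokens it shares with the trained 'bad' lines. Whether this holds
up on real, large-scale logs is exactly what would have to be tested.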

>Normalization of the data might also help. Making sure that the input is more
>similar each time by adding fields. Like if some syslog messages don't have a
>year, put one in, etc.

Removing the date and [in some cases] the host name helps, as does message
type classification of some kind.  On the other hand, mentioning
'normalization' and 'syslog' in the same sentence is kind of blasphemy
:-), since the data is so unstructured. But since Bayesian methods deal
with spam successfully, unstructured logs should not be a problem from
that point of view. However, unlike spam, logs are not always simply good
or bad, and explaining the concept of 'attention-worthy' to the classifier
program seems nearly impossible :-)
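
As a rough sketch of the date/host stripping mentioned above, assuming
classic BSD-style syslog lines ("Mon DD HH:MM:SS host tag[pid]: text"):
the regex, the function name, and the sample lines are hypothetical, and
plenty of real syslog output will not match this pattern. The program
tag stands in for a crude 'message type'.

```python
import re

# Assumed line layout: timestamp, host, tag with optional [pid], message.
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s*"
    r"(?P<msg>.*)$"
)

def normalize(line, keep_host=False):
    """Return (message_type, text) with the timestamp removed.

    The host name is dropped unless keep_host is True. Lines that do
    not match the assumed layout come back as type 'unknown'.
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        return ("unknown", line.strip())
    msg = m.group("msg")
    if keep_host:
        msg = m.group("host") + " " + msg
    return (m.group("tag"), msg)

print(normalize("Oct  3 14:02:11 web01 sshd[2311]: Failed password for root from 10.0.0.1 port 22 ssh2"))
print(normalize("Oct 13 09:00:00 db02 kernel: eth0 link up", keep_host=True))
```

The normalized text, not the raw line, would then be what gets fed to
the classifier, so that timestamps and host names do not dilute the
token statistics.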

Best,
-- 
Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
     http://www.info-secure.org
   http://www.securitywarrior.com

_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis