Re: [logs] Bayes - good or bad?

From: Anton A. Chuvakin (anton@private)
Date: Mon Mar 21 2005 - 10:03:41 PST

>Yes, to start with it's no better than pattern matching. But over time
>will it save work in maintaining the patterns and detecting new issues?

Why would it? Admittedly, the best way to verify it is to try it on a
large scale, diligently training the system on good/bad logs. The only
advantage I suspect you will realize is that you might avoid the pain of
writing regexes for all the _variations_ of a 'known bad' thing.
Hopefully, the Bayesian classifier will figure it out on its own :-)
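
To make the "train it on good/bad logs" idea concrete, here is a minimal sketch of a word-level naive Bayes classifier in plain Python. Everything in it is an assumption made for illustration: the class name, the whitespace tokenizer, the add-one smoothing, and the sample log lines are hypothetical and not taken from any particular tool.

```python
import math
from collections import Counter


class NaiveBayesLogClassifier:
    """Toy two-class naive Bayes over whitespace-separated tokens."""

    LABELS = ("good", "bad")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.line_counts = Counter()

    def train(self, line, label):
        # Count one training line and each of its lowercased tokens.
        self.line_counts[label] += 1
        self.token_counts[label].update(line.lower().split())

    def score(self, line, label):
        # Log prior plus summed log likelihoods, with add-one smoothing.
        vocab = set(self.token_counts["good"]) | set(self.token_counts["bad"])
        total = sum(self.token_counts[label].values())
        denom = total + len(vocab) + 1  # the extra 1 covers unseen tokens
        log_prob = math.log(self.line_counts[label] / sum(self.line_counts.values()))
        for token in line.lower().split():
            log_prob += math.log((self.token_counts[label][token] + 1) / denom)
        return log_prob

    def classify(self, line):
        return max(self.LABELS, key=lambda label: self.score(line, label))


# Hypothetical training data, invented for this sketch.
clf = NaiveBayesLogClassifier()
for text in (
    "Failed password for root from 10.0.0.5 port 22",
    "Failed password for invalid user admin from 10.0.0.9 port 22",
    "authentication failure for user root",
):
    clf.train(text, "bad")
for text in (
    "Accepted password for alice from 10.0.0.7 port 22",
    "session opened for user alice",
    "session closed for user bob",
):
    clf.train(text, "good")

# A variation never seen in training: new user name, new address.
print(clf.classify("Failed password for invalid user guest from 10.0.0.12"))
```

The point of the sketch is only that a variation of a 'known bad' message can score as bad without anyone writing a regex for it. Whether that holds up on real logs at scale is exactly what would need testing.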

>Normalization of the data might also help. Making sure that the input is more
>similar each time by adding fields. Like if some syslog messages don't have a
>year, put one in, etc.

Removing the date and [in some cases] the host name helps, as does message
type classification of some kind.  On the other hand, mentioning
'normalization' and 'syslog' in the same sentence is kind of blasphemy
:-), since the data is so unstructured. But since Bayes methods deal with
spam successfully, unstructured logs should not be a problem from that
point of view. However, unlike spam, logs are not always good or bad and
explaining the concept of 'attention-worthy' to the classifier program
seems pretty impossible :-)
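
As a rough sketch of the "remove the date and host name" step, the following assumes the common BSD-style syslog layout ("Mon DD HH:MM:SS host message"). The regex, the function name, and the sample line are hypothetical; real syslog sources vary enough that any single pattern will miss some of them, in which case the line is passed through unchanged.

```python
import re

# Assumed layout: "Mar 21 10:03:41 host rest-of-message".
SYSLOG_PREFIX = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<rest>.*)$"
)


def normalize(line, drop_host=True):
    """Strip the timestamp, and optionally the host name, from a syslog line."""
    match = SYSLOG_PREFIX.match(line)
    if not match:
        # Unknown layout: leave the content alone rather than guess.
        return line.strip()
    rest = match.group("rest").strip()
    if drop_host:
        return rest
    return f"{match.group('host')} {rest}"


sample = "Mar 21 10:03:41 gw1 sshd[4242]: Failed password for root"
print(normalize(sample))
print(normalize(sample, drop_host=False))
```

Keeping the host optional reflects the "in some cases" above: sometimes which box logged the message is itself the interesting part.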

Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA

LogAnalysis mailing list
