>Yes, to start with it's no better than pattern matching. But over time
>will it save work in maintaining the patterns and detecting new issues?

Why would it? Admittedly, the best way to verify it is to try it on a
large scale, diligently training the system on good/bad logs. The only
advantage I suspect you will realize is that you might avoid the pain of
writing regexes for all the _variations_ of a 'known bad' thing.
Hopefully, the Bayesian classifier will figure it out on its own :-)
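
To make the idea concrete, here is a minimal sketch of such a classifier
in Python: a word-count naive Bayes with add-one smoothing. The names
(tokenize, NaiveBayes) and the two labels are my own illustration, not any
particular tool, and the tokenizer is deliberately crude.

```python
import math
import re
from collections import Counter


def tokenize(line):
    # Keep lowercase alphabetic runs only, so PIDs, ports and IP
    # addresses do not become features (an assumption of this sketch).
    return re.findall(r"[a-z]+", line.lower())


class NaiveBayes:
    """Hypothetical two-class word-count classifier for log lines."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.docs = Counter()

    def train(self, line, label):
        self.counts[label].update(tokenize(line))
        self.docs[label] += 1

    def classify(self, line):
        vocab = set(self.counts["good"]) | set(self.counts["bad"])
        total_docs = sum(self.docs.values())
        scores = {}
        for label, words in self.counts.items():
            total = sum(words.values())
            # Log prior, smoothed so an untrained label never gives log(0).
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokenize(line):
                # Add-one smoothing; the extra +1 leaves room for unseen words.
                score += math.log((words[tok] + 1) / (total + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)
```

Trained on a handful of "Failed password ..." lines labelled bad, it
should also flag variations it has never seen verbatim (a different user
name, different case), which is the one advantage over hand-written
regexes mentioned above.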

>Normalization of the data might also help. Making sure that the input is more
>similar each time by adding fields. Like if some syslog messages don't have a
>year, put one in, etc.

Removing the date and [in some cases] the host name helps, as does message
type classification of some kind. On the other hand, mentioning
'normalization' and 'syslog' in the same sentence is kind of blasphemy
:-), since the data is so unstructured. But since Bayes methods deal with
spam successfully, unstructured logs should not be a problem from that
point of view. However, unlike spam, logs are not always clearly good or
bad, and explaining the concept of 'attention-worthy' to the classifier
program seems close to impossible :-)
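
For what it's worth, stripping the date and host name could look roughly
like the sketch below. It assumes the classic BSD-style line layout
("Mmm dd hh:mm:ss host tag[pid]: message"); the regex and the normalize()
helper are my own illustration, and anything that does not match is passed
through unchanged. Keeping the program tag in front serves as a very crude
form of message type classification.

```python
import re

# Assumed layout: "Jan  5 03:12:45 web01 sshd[2211]: message text"
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s*"
    r"(?P<msg>.*)$"
)


def normalize(line, keep_host=False):
    """Drop the timestamp and PID (and, by default, the host name)."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        # Not in the assumed format: leave the line alone rather than guess.
        return line.strip()
    parts = [m.group("tag"), m.group("msg")]
    if keep_host:
        parts.insert(0, m.group("host"))
    return " ".join(parts)
```

With this, the same sshd failure logged on two hosts at two different
times reduces to one identical string, so the classifier sees the message
and not the noise around it.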

