>Yes, to start with it's no better than pattern matching. But over time
>will it save work in maintaining the patterns and detecting new issues?

Why would it? Admittedly, the best way to verify it is to try it on a
large scale, diligently training the system on good/bad logs. The only
advantage I suspect you will realize is that you might avoid the pain of
writing regexes for all the _variations_ of a 'known bad' thing.
Hopefully, the Bayesian classifier will figure those out on its own :-)

>Normalization of the data might also help. Making sure that the input is more
>similar each time by adding fields. Like if some syslog messages don't have a
>year, put one in, etc.

Removing the date and [in some cases] the host name helps, as does
message type classification of some kind. On the other hand, mentioning
'normalization' and 'syslog' in the same sentence is kind of blasphemy :-),
since the data is so unstructured. But since Bayes methods deal with spam
successfully, unstructured logs should not be a problem from that point
of view. However, unlike spam, logs are not always simply good or bad, and
explaining the concept of 'attention-worthy' to the classifier program
seems pretty impossible :-)

Best,
-- 
Anton A. Chuvakin, Ph.D., GCIA, GCIH, GCFA
http://www.info-secure.org
http://www.securitywarrior.com
_______________________________________________
LogAnalysis mailing list
LogAnalysis@private
http://lists.shmoo.com/mailman/listinfo/loganalysis
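
To make the normalization and Bayes points in the message above a bit more concrete, here is a minimal sketch in Python. It is not code from this thread. It assumes a classic BSD-style syslog prefix ("Mar 21 13:02:54 host "), which real feeds often do not follow, and every name in it (PREFIX, normalize, tokens, NaiveBayes) and both sample log lines are made up for illustration. It strips the date and host name, masks numbers, and feeds the remaining tokens to a small multinomial naive Bayes classifier with add-one smoothing.

```python
import math
import re
from collections import Counter

# Assumption: lines start with "Mon DD HH:MM:SS host ". Illustrative only.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
NUMBER = re.compile(r"\d+")


def normalize(line):
    """Drop timestamp and host name, lowercase, mask digit runs as 'N'."""
    body = PREFIX.sub("", line.strip()).lower()
    return NUMBER.sub("N", body)


def tokens(line):
    """Split a normalized line into word tokens ('N' stands for a number)."""
    return re.findall(r"[a-zN]+", normalize(line))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens
        self.doc_counts = Counter()  # label -> number of training lines

    def train(self, line, label):
        self.word_counts.setdefault(label, Counter()).update(tokens(line))
        self.doc_counts[label] += 1

    def classify(self, line):
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for tok in tokens(line):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training lines, one per label.
nb = NaiveBayes()
nb.train("Mar 21 13:02:54 web1 sshd[411]: Failed password for root "
         "from 10.0.0.5 port 4022 ssh2", "bad")
nb.train("Mar 21 13:03:10 web1 sshd[412]: Accepted publickey for anton "
         "from 10.0.0.7 port 5000 ssh2", "good")
print(nb.classify("Mar 22 01:00:00 db2 sshd[999]: Failed password for admin "
                  "from 10.0.0.9 port 1234 ssh2"))
```

With only two labels this still leaves the 'attention-worthy' problem open: the classifier can only repeat whatever good/bad judgment went into the training lines.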