On Tue, 20 Aug 2002 nateat_private wrote:

> The SA developers have a huge corpus of SPAM and "not-SPAM" messages
> that they apply a genetic algorithm to (not my term) which ends up with
> patterns that describe SPAM and non-SPAM messages. I suspect many of you
> are reading this message right now with a "X-Spam-Status:" header in it,
> SA is a highly effective tool.

i'll just chime in a bit: SA is ok. but it's not terribly bright, either,
and unless you tweak it a lot it fouls up a lot.

instead consider a Bayesian model. i've been digging around in spam a lot
lately and have come to loathe signature matching models like SA and
prefer statistical methods like Bayesian filtering.

some links:

http://www.ai.mit.edu/~jrennie/ifile/
    small tool to do spam filtering using this.

http://www.paulgraham.com/spam.html
    another discussion on this.

follow the links off of those two pages. lots to read.

applying this to logs would be easy, robust, and extendable.

___________________________
jose nazario, ph.d.     joseat_private
                        http://www.monkey.org/~jose/

_______________________________________________
LogAnalysis mailing list
LogAnalysisat_private
https://lists.shmoo.com/mailman/listinfo/loganalysis
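
As a rough illustration of how the Bayesian idea above might carry over to logs, here is a minimal sketch of a two-class naive Bayes filter for log lines. Everything in it is an assumption made for illustration: the class name `NaiveBayesLogFilter`, the labels "normal" and "suspect", the tokenizer, and the add-one smoothing are not taken from ifile or from the Paul Graham essay, and a real deployment would need far more training data.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: keep alphabetic words only, so PIDs, ports and
# IP addresses do not each become their own one-off token.
TOKEN_RE = re.compile(r"[A-Za-z_]+")
LABELS = ("normal", "suspect")  # illustrative labels, not from the post


def tokenize(line):
    """Lowercase a log line and split it into word tokens."""
    return TOKEN_RE.findall(line.lower())


class NaiveBayesLogFilter:
    """Two-class naive Bayes over log-line tokens with add-one smoothing."""

    def __init__(self):
        self.token_counts = {label: Counter() for label in LABELS}
        self.line_counts = Counter()

    def train(self, line, label):
        """Record one hand-labelled log line."""
        self.token_counts[label].update(tokenize(line))
        self.line_counts[label] += 1

    def log_score(self, line, label):
        """Return log P(label) + sum of log P(token | label)."""
        counts = self.token_counts[label]
        total = sum(counts.values())
        vocab = len(set().union(*self.token_counts.values()))
        # Smoothed class prior, so an untrained class never yields log(0).
        prior = (self.line_counts[label] + 1) / (
            sum(self.line_counts.values()) + len(LABELS)
        )
        score = math.log(prior)
        for tok in tokenize(line):
            # "+ 1" in the denominator reserves one slot for unseen tokens.
            score += math.log((counts[tok] + 1) / (total + vocab + 1))
        return score

    def classify(self, line):
        """Return the label with the higher log score."""
        return max(LABELS, key=lambda label: self.log_score(line, label))


if __name__ == "__main__":
    f = NaiveBayesLogFilter()
    f.train("sshd[123]: Accepted password for alice from 10.0.0.5 port 22", "normal")
    f.train("cron[77]: session opened for user root", "normal")
    f.train("sshd[456]: Failed password for invalid user admin from 1.2.3.4", "suspect")
    f.train("sshd[789]: Failed password for root from 1.2.3.4", "suspect")
    print(f.classify("sshd[999]: Failed password for invalid user test from 5.6.7.8"))
```

Unlike a signature rule, nothing here names a specific pattern to match: the filter only learns which tokens tend to show up in lines a human has already labelled, so extending it means labelling more lines rather than writing more rules.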