Re: [logs] Logging: World Domination

From: Jose Nazario (joseat_private)
Date: Tue Aug 20 2002 - 12:13:31 PDT

    On Tue, 20 Aug 2002 nateat_private wrote:
    
    > The SA developers have a huge corpus of SPAM and "not-SPAM" messages
    > that they apply a genetic algorithm to (not my term) which ends up with
    > patterns that describe SPAM and non-SPAM messages. I suspect many of you
    > are reading this message right now with a "X-Spam-Status:" header in it,
    > SA is a highly effective tool.
    
    i'll just chime in a bit:
    
    SA is ok, but it's not terribly bright, and unless you tweak it a lot
    it misclassifies a lot of mail.
    
    instead consider a Bayesian model. i've been digging around in spam a lot
    lately and have come to loathe signature matching models like SA and
    prefer statistical methods like Bayesian classifiers.
    
    some links:
    	http://www.ai.mit.edu/~jrennie/ifile/
    		small tool to do spam filtering using this.
    	http://www.paulgraham.com/spam.html
    		another discussion on this
    
    follow the links off of those two pages. lots to read.
    
    applying this to logs would be easy, robust, and extendable.
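
    as a rough illustration of that idea, here is a minimal naive Bayes
    sketch for log lines. it is not taken from ifile or from the Graham
    essay; the class name NaiveBayesLogFilter, the tokenizer, and the
    labels "normal" and "suspicious" are all assumptions made up for this
    example. it counts word-like tokens per label and scores new lines
    with add-one (Laplace) smoothing.

```python
import math
import re
from collections import Counter

# Assumption: word-like tokens are the only features; numbers such as
# PIDs, ports and IP addresses are dropped because they rarely repeat.
TOKEN_RE = re.compile(r"[A-Za-z_]+")


def tokenize(line):
    """Return the lowercased word-like tokens of a log line."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


class NaiveBayesLogFilter:
    """Hypothetical multinomial naive Bayes classifier for log lines."""

    def __init__(self):
        self.token_counts = {}        # label -> Counter of token frequencies
        self.line_counts = Counter()  # label -> number of training lines
        self.vocab = set()            # every token seen during training

    def train(self, line, label):
        """Add one hand-labeled log line to the model."""
        tokens = tokenize(line)
        self.token_counts.setdefault(label, Counter()).update(tokens)
        self.line_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, line):
        """Return {label: log prior + sum of smoothed token log likelihoods}."""
        total_lines = sum(self.line_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(line)
        scores = {}
        for label, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.line_counts[label] / total_lines)
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def classify(self, line):
        """Return the label with the highest score for this line."""
        scores = self.log_scores(line)
        return max(scores, key=scores.get)
```

    usage would be along the lines of calling train() on a few
    hand-labeled lines per label and then classify() on new ones. a real
    deployment would need far more training data and probably a smarter
    tokenizer than this one.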
    
    ___________________________
    jose nazario, ph.d.			joseat_private
    					http://www.monkey.org/~jose/
    
    _______________________________________________
    LogAnalysis mailing list
    LogAnalysisat_private
    https://lists.shmoo.com/mailman/listinfo/loganalysis
    