SPAM/PORN DETECTED (was Re: CRIME Interesting way around spam filter)

From: SPAM/PORN FILTER (spam_porn_filter@private)
Date: Wed Jun 04 2003 - 00:36:13 PDT


    This note has been flagged as 
    	Likely PORN
    	Possibly SPAM
    
    The following test(s) were positive
    	the word(s)
    		'penis' 
    	followed by the phrase(s) 
    		'move', 'back and forth across', and 'forcing'
    
            Canadian Grammar/Phrasing/Spelling
    
    The results of the test(s) show that this is
    	77% likely PORN (23/30)
    	82% likely SPAM (27/33)
    
    
    On Tue, 2003-06-03 at 13:39, Crispin Cowan wrote:
    > Shaun Savage wrote:
    > 
    > > It looks at raw text. The tokens are found using a fixed set of
    > > delimiters.  The reason for this is that the Mozilla spam filter uses
    > > the HTML tags to help determine spam; a lot of spam uses 'color' font.
    > > Also, '<' and '>' are among the delimiters, so it can't determine what
    > > is an HTML tag.
    > 
    > Thanks!
    > 
    > Unfortunate that it is only looking at raw text. There is valuable info 
    > in the formatted text, precisely because of this hack of splitting words 
    > with HTML comments, so that word-recognizing filters like Bayes won't 
    > recognize "pe<!-- interruption -->nis" as "penis". The spammer can move 
    > the interruption back and forth across the word, put arbitrarily clean 
    > text (e.g. from Project Gutenberg) in the "interruption", forcing 10X 
    > training time on the Bayesian filter.
    > 
    > Crispin
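
To make the quoted point concrete, here is a rough Python sketch of why a raw-text tokenizer misses a word split by an HTML comment, and how stripping comments first would rejoin it. The delimiter set, the regular expression, and the function names are assumptions for illustration only; the thread does not say which delimiters the filter actually uses.

```python
import re

# Assumed delimiter set (whitespace plus '<' and '>'); the real filter's
# delimiters are not listed in the thread.
DELIMITERS = re.compile(r"[\s<>]+")

# Matches an HTML comment, including one that spans several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def raw_tokens(text):
    """Tokenize the raw text on the fixed delimiters, as the quoted filter does."""
    return [tok.lower() for tok in DELIMITERS.split(text) if tok]


def comment_stripped_tokens(text):
    """Remove HTML comments first, so a word split by a comment is rejoined."""
    return raw_tokens(HTML_COMMENT.sub("", text))


sample = "pe<!-- interruption -->nis"
print(raw_tokens(sample))               # the word arrives in fragments
print(comment_stripped_tokens(sample))  # the word is seen whole
```

Moving the comment to a different position in the word changes the fragments the raw tokenizer sees but not the comment-stripped result, which is why a filter that looks at the rendered text would not need retraining for each variant.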
    


