Re: SPAM/PORN DETECTED (was Re: CRIME Interesting way around spam filter)

From: Crispin Cowan (crispin@private)
Date: Wed Jun 04 2003 - 01:38:36 PDT


    I can't actually tell whether this is a joke or not. In either case, 
    it's funny :-)
    
    Crispin
    
    SPAM/PORN FILTER wrote:
    
    >This note has been flagged as 
    >	Likely PORN
    >	Possibly SPAM
    >
    >The following test(s) were positive
    >	the word(s)
    >		'penis' 
    >	followed by the phrase(s) 
    >		'move', 'back and forth across', and 'forcing'
    >
    >        Canadian Grammar/Phrasing/Spelling
    >
    >The results of the test(s) show that this is
    >	77% likely PORN (23/30)
    >	81% likely SPAM (27/33)
    >
    >
    >On Tue, 2003-06-03 at 13:39, Crispin Cowan wrote:
    >  
    >
    >>Shaun Savage wrote:
    >>
    >>    
    >>
    >>>It looks at raw text. The tokens are found using a fixed set of
    >>>delimiters. The reason for this is that the Mozilla spam filter uses the
    >>>HTML tags to help determine spam; a lot of spam uses 'color' font. Also,
    >>>'<' and '>' are among the delimiters, so it can't determine what is an
    >>>HTML tag.
    >>>      
    >>>
    >>Thanks!
    >>
    >>Unfortunate that it is only looking at raw text. There is valuable info 
    >>in the formatted text, precisely because of this hack of splitting words 
    >>with HTML comments, so that word-recognizing filters like Bayes won't 
    >>recognize "pe<!-- interruption -->nis" as "penis". The spammer can move 
    >>the interruption back and forth across the word, put arbitrarily clean 
    >>text (e.g. from Project Gutenberg) in the "interruption", forcing 10X 
    >>training time on the Bayesian filter.
    >>
    >>Crispin
    >>    
    >>
    >
    >  
    >
    
    -- 
    Crispin Cowan, Ph.D.           http://immunix.com/~crispin/
    Chief Scientist, Immunix       http://immunix.com
                http://www.immunix.com/shop/
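
Editor's sketch of the comment-splitting trick discussed in the quoted
thread. This is a minimal illustration, not the actual filter: the
delimiter set, the function names raw_tokens and rendered_tokens, and the
sample text (with a neutral word standing in for the one in the thread)
are all assumptions made for the example. It shows how a tokenizer that
treats '<' and '>' as delimiters sees only word fragments, while
stripping HTML comments first recovers the whole word.

```python
import re

# Hypothetical fixed delimiter set: whitespace, some punctuation, '<' and '>'.
DELIMITERS = re.compile(r"[\s<>!\-.,;:()]+")

# Matches an HTML comment, including any filler text inside it.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def raw_tokens(text):
    """Tokenize raw text on the fixed delimiter set (no HTML awareness)."""
    return [t.lower() for t in DELIMITERS.split(text) if t]


def rendered_tokens(text):
    """Remove HTML comments first, so a word split by a comment is rejoined."""
    return raw_tokens(HTML_COMMENT.sub("", text))


sample = "cheap mort<!-- interruption -->gage rates"

print(raw_tokens(sample))
# ['cheap', 'mort', 'interruption', 'gage', 'rates']
print(rendered_tokens(sample))
# ['cheap', 'mortgage', 'rates']
```

Moving the comment to a different position in the word yields different
fragments on the raw-text path, which is why a word-based filter would
need far more training examples there than on the comment-stripped path.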
    



    This archive was generated by hypermail 2b30 : Wed Jun 04 2003 - 02:17:23 PDT