Re: SPAM/PORN DETECTED (was Re: CRIME Interesting way around spam filter)

From: Jacob Redding (dextor@private)
Date: Wed Jun 04 2003 - 00:09:42 PDT

      that's hilarious!!! ;)
    
    On Wed, 4 Jun 2003, SPAM/PORN FILTER wrote:
    
    > This note has been flagged as
    > 	Likely PORN
    > 	Possibly SPAM
    >
    > The following test(s) were positive
    > 	the word(s)
    > 		'penis'
    > 	followed by the phrase(s)
    > 		'move', 'back and forth across', and 'forcing'
    >
    >         Canadian Grammar/Phrasing/Spelling
    >
    > The results of the test(s) show that this is
    > 	77% likely PORN (23/30)
    > 	82% likely SPAM (27/33)
    >
    >
    > On Tue, 2003-06-03 at 13:39, Crispin Cowan wrote:
    > > Shaun Savage wrote:
    > >
    > > > It looks at raw text. The tokens are found using a fixed set of
    > > > delimiters.  The reason for this is that the Mozilla spam filter uses
    > > > the HTML tags to help determine spam; a lot of spam uses 'color' font.
    > > > Also, '<' and '>' are among the delimiters, so it can't determine what
    > > > is an HTML tag.
    > >
    > > Thanks!
    > >
    > > Unfortunate that it is only looking at raw text. There is valuable info
    > > in the formatted text, precisely because of this hack of splitting words
    > > with HTML comments, so that word-recognizing filters like Bayes won't
    > > recognize "pe<!-- interruption -->nis" as "penis". The spammer can move
    > > the interruption back and forth across the word, put arbitrarily clean
    > > text (e.g. from Project Gutenberg) in the "interruption", forcing 10X
    > > training time on the Bayesian filter.
    > >
    > > Crispin
    >
    >
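
For anyone following the quoted exchange, here is a rough sketch of the two behaviours being discussed: a raw-text tokenizer that splits on a fixed delimiter set (so tag names become tokens and a comment-split word is never seen whole), and a variant that drops HTML comments first. The delimiter set and the function names are assumptions made for illustration; the thread only says the set is fixed and includes '<' and '>'. This is not the Mozilla filter's actual code.

```python
import re

# Assumed delimiter set: the thread only states that it is fixed and that
# '<' and '>' are in it. The remaining characters are illustrative guesses.
DELIMITERS = " \t\r\n<>\"'=/.,;:!?()-"

_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]+")
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def tokenize_raw(text):
    """Split raw text on the fixed delimiters, with no HTML awareness."""
    return [t.lower() for t in _SPLIT.split(text) if t]


def tokenize_without_comments(text):
    """Drop HTML comments first, so a word split by a comment is rejoined."""
    return tokenize_raw(_COMMENT.sub("", text))


sample = "pe<!-- interruption -->nis"
print(tokenize_raw(sample))               # the word arrives in pieces
print(tokenize_without_comments(sample))  # the word arrives whole
print(tokenize_raw('<font color="red">Buy</font>'))  # tag names are tokens
```

Stripping comments only undoes this one trick. The filler text inside the comment disappears along with it, which is the part Crispin describes as inflating training time, but other ways of breaking up a word (empty tags, for instance) would need their own handling.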
    


