Re: CRIME Interesting way around spam filter

From: Crispin Cowan (crispin@private)
Date: Tue Jun 03 2003 - 13:39:49 PDT

Next message: Andrew Plato: "RE: CRIME Port scanning from an ISP"

Previous message: Shaun Savage: "Re: CRIME Interesting way around spam filter"
In reply to: Shaun Savage: "Re: CRIME Interesting way around spam filter"
Next in thread: SPAM/PORN FILTER: "SPAM/PORN DETECTED (was Re: CRIME Interesting way around spam filter)"
Reply: SPAM/PORN FILTER: "SPAM/PORN DETECTED (was Re: CRIME Interesting way around spam filter)"
Reply: Alan: "Re: CRIME Interesting way around spam filter"

Shaun Savage wrote:

> It looks at raw text. The tokens are found using a fixed set of
> delimiters.  The reason for this is that the Mozilla spam filter uses the
> HTML tags to help determine spam; a lot of spam uses a 'color' font.  Also,
> '<' and '>' are among the delimiters, so it can't determine what is an HTML
> tag.

Thanks!

Unfortunate that it is only looking at raw text. There is valuable info
in the formatted text, precisely because of this hack of splitting words
with HTML comments, so that word-recognizing filters like Bayes won't
recognize a split word (e.g. "pe<!-- x -->nis") as "penis". The spammer
can move the interruption back and forth across the word and put
arbitrarily clean text (e.g. from Project Gutenberg) in the
"interruption", forcing 10X training time on the Bayesian filter.
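
To make the difference concrete, here is a rough Python sketch of the two
ways of tokenizing. It is only an illustration: the delimiter set and the
function names are assumptions of mine, not the actual Mozilla code, and
the sample word is a harmless stand-in.

```python
import re

# Hypothetical delimiter set; the real filter's delimiters are not
# listed in this thread. '<' and '>' are included, as described above.
DELIMITERS = re.compile(r"[\s<>.,;:!?\"'()=/-]+")

# Matches an HTML comment, including one that spans several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def raw_tokens(text):
    """Tokenize the raw text, treating '<' and '>' as plain delimiters."""
    return [t.lower() for t in DELIMITERS.split(text) if t]


def rendered_tokens(text):
    """Drop HTML comments first, so a word split by a comment is rejoined."""
    return raw_tokens(HTML_COMMENT.sub("", text))


sample = "Cheap mort<!-- call me ishmael -->gage today"
print(raw_tokens(sample))       # word fragments plus the filler text
print(rendered_tokens(sample))  # the word as the reader would see it
```

The raw tokenizer sees "mort" and "gage" as separate tokens mixed in with
the filler words, while the comment-stripping version sees the whole word.
A real filter would need a proper HTML parser rather than one regex, since
there are many other ways to hide text in markup.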

Crispin

-- 
Crispin Cowan, Ph.D.           http://immunix.com/~crispin/
Chief Scientist, Immunix       http://immunix.com
            http://www.immunix.com/shop/


This archive was generated by hypermail 2b30 : Tue Jun 03 2003 - 14:12:36 PDT