RE: A question for the list...

From: Jonathan A. Zdziarski (jonathanat_private)
Date: Thu May 29 2003 - 07:19:57 PDT

Next message: David Gillett: "RE: DDoS Attack"

Previous message: Rob Shein: "RE: A question for the list..."
In reply to: Jeff: "Re: A question for the list..."

Hi,

Your arguments are definitely valid, but what I hear you saying is that spam
filters generally do a poor job, and you're right...most do.  There are only
a few that adequately filter based on content.  Approach is definitely an
important factor, but my point is that a good spam filter is far more
effective than network-based management.  Static filters that search for
certain phrases are useless for the reasons you mentioned; dynamic
filtering, however, is much more accurate.  Most Bayesian filters, while your
mileage may vary, rarely have more than a 0.05% risk of false positives - and
that's at the upper boundary.  SpamAssassin seems to have around 0.04% and
DSPAM ranges from 0.01% to 0.03%.
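
To put those percentages in perspective, here is a rough back-of-the-envelope
sketch.  The rates are the ones quoted above; the volume of 10,000 legitimate
messages is purely an assumption for illustration, and the function name is
made up for this example.

```python
def expected_false_positives(legit_messages: int, fp_rate_percent: float) -> float:
    """Expected number of legitimate messages wrongly flagged as spam."""
    return legit_messages * fp_rate_percent / 100.0


# Rates quoted in this message; the 10,000-message volume is an assumption.
quoted_rates = {
    "upper boundary": 0.05,
    "SpamAssassin (approx.)": 0.04,
    "DSPAM (low end)": 0.01,
}

for name, rate in quoted_rates.items():
    lost = expected_false_positives(10_000, rate)
    print(f"{name}: about {lost:.0f} of 10,000 legitimate messages misfiled")
```

Even at the upper boundary that works out to a handful of misfiled messages
per 10,000 legitimate ones.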

> From an empiric scalability perspective, spam is 'lots of the same thing',
> or bulk messages,

Out of the "lots", however, your network may only get hit with a few copies
of the message at a time, making detection less accurate.  

> If you content filter the Postmaster mailbox along with your other
> mailboxes

So don't content-filter the postmaster box =)  This is another good reason
that network-based spam filtering isn't necessarily a good idea.  Many
blackholes are already very inaccurate, so adding spam-filtering ISPs to the
list certainly isn't good for anyone.  But please keep in mind spam was a
muzzle-loader when this RFC was written, and is now more of an assault
weapon.  I think trying to place a single human element into the equation
for spam filtering still results in the same effect - stolen resources.  

> Content filtering can also be bad because of context - it has been seen to
> reject discussions of: chicken breasts and thighs; Breast Cancer; Erectile
> Dysfunction; and objectionable email.

Agreed, we need to make sure our content filters aren't this dumb.  Julia
Child needs to be able to discuss her chicken breasts, and lawyers need to
discuss their erectile dysfunction...but if I receive any emails about either
they will most likely be spam.  I believe strongly in per-user corpus-based
filtering for this very reason.  The DSPAM project maintains a separate
dictionary for each user based on their email behavior, which is one reason
it's so effective at what it does.
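
To illustrate the per-user idea, here is a minimal sketch of a naive Bayes
filter that keeps a separate token dictionary for each user.  This is not
DSPAM's actual algorithm or API; the class name, the tokenizer, the add-one
smoothing, the equal priors and the sample messages are all assumptions made
up for the example.

```python
import math
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Lower-case the text and return its set of distinct tokens."""
    return set(TOKEN_RE.findall(text.lower()))


class UserFilter:
    """Hypothetical per-user dictionary: token counts in spam and in ham."""

    def __init__(self):
        self.spam_counts = defaultdict(int)
        self.ham_counts = defaultdict(int)
        self.spam_total = 0  # spam messages this user has trained on
        self.ham_total = 0   # legitimate messages this user has trained on

    def train(self, text, is_spam):
        counts = self.spam_counts if is_spam else self.ham_counts
        for tok in tokenize(text):
            counts[tok] += 1
        if is_spam:
            self.spam_total += 1
        else:
            self.ham_total += 1

    def spam_probability(self, text):
        """Naive Bayes estimate with add-one smoothing and equal priors."""
        log_odds = 0.0
        for tok in tokenize(text):
            p_spam = (self.spam_counts.get(tok, 0) + 1) / (self.spam_total + 2)
            p_ham = (self.ham_counts.get(tok, 0) + 1) / (self.ham_total + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


# One dictionary per user (user names and messages are invented).
filters = defaultdict(UserFilter)

filters["chef"].train("roast chicken breasts and thighs tonight", is_spam=False)
filters["chef"].train("cheap pills online now", is_spam=True)

filters["admin"].train("hot chicken breasts and thighs pics", is_spam=True)
filters["admin"].train("meeting agenda for monday", is_spam=False)

message = "chicken breasts and thighs"
chef_score = filters["chef"].spam_probability(message)
admin_score = filters["admin"].spam_probability(message)
print(f"chef: {chef_score:.2f}  admin: {admin_score:.2f}")
```

The same message scores as legitimate for the user who has received such mail
as ham, and as spam for the user who has only ever seen it in spam, which is
the point of keeping the dictionaries separate.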

> While I use content-based rules (if you can call header fields content) to
> process some of my email, those rules only serve to sort and categorize my
> email, not to reject it.

I agree.  I think filtering based on "Characteristics of Spam" is generally
bad, because characteristics change.  A great example is the MUA.  Tools
like SpamAssassin will make an email "more innocent" if the MUA is pine...so
what did spammers do?  They started using a pine MUA.  Headers change, and
many spammers are smart enough to send from valid stockpiled domains...the
one thing that never changes, however, is the content of the message.




This archive was generated by hypermail 2b30 : Thu May 29 2003 - 08:14:11 PDT