RE: A question for the list...

From: Jonathan A. Zdziarski (jonathanat_private)
Date: Thu May 29 2003 - 07:19:57 PDT


    Hi,
    
    Your arguments are definitely valid, but what I hear you saying is that spam
    filters generally do a poor job, and you're right...most do.  There are only
    a few that adequately filter based on content.  Approach is definitely an
    important factor, but my point is that a good spam filter is far more
    effective than network-based management.  Static filters that search for
    certain phrases are useless for the reasons you mentioned; dynamic
    filtering, however, is much more accurate.  Most Bayesian filters (your
    mileage may vary) rarely have more than a 0.05% risk of false positives,
    and that's at the upper boundary.  SpamAssassin seems to have around 0.04%,
    and DSPAM ranges from 0.01% to 0.03%.
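    
    To put those percentages in perspective, here is a rough back-of-the-envelope
    sketch.  It assumes the rates are measured against legitimate mail only and
    uses a hypothetical volume of 10,000 legitimate messages; neither assumption
    comes from the figures quoted above.

```python
# Hypothetical volume of legitimate mail; not a figure from this thread.
LEGIT_MESSAGES = 10_000

# False-positive rates quoted in the message, in percent.
RATES_PERCENT = {
    "Bayesian upper bound": 0.05,
    "SpamAssassin": 0.04,
    "DSPAM (low)": 0.01,
    "DSPAM (high)": 0.03,
}


def expected_false_positives(rate_percent, messages=LEGIT_MESSAGES):
    """Expected number of legitimate messages misfiled as spam."""
    return messages * rate_percent / 100.0


for name, rate in RATES_PERCENT.items():
    print(f"{name}: ~{expected_false_positives(rate):.0f} per {LEGIT_MESSAGES}")
```

    Under those assumptions, the quoted upper bound works out to roughly five
    misfiled messages per 10,000 legitimate ones.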
    
    > From an empiric scalability perspective, spam is 'lots of the same thing',
    > or bulk messages,
    
    Out of the "lots", however, your network may only get hit with a few copies
    of the message at a time, making detection less accurate.  
    
    > If you content filter the Postmaster mailbox along with your other
    > mailboxes
    
    So don't content-filter the postmaster box =)  This is another good reason
    that network-based spam filtering isn't necessarily a good idea.  Many
    blackhole lists are already very inaccurate, so adding spam-filtering ISPs
    to them certainly isn't good for anyone.  But please keep in mind that spam
    was a muzzle-loader when this RFC was written, and is now more of an assault
    weapon.  I think trying to place a single human element into the equation
    for spam filtering still results in the same effect: stolen resources.
    
    > Content filtering can also be bad because of context - it has been seen to
    > reject discussions of: chicken breasts and thighs; Breast Cancer; Erectile
    > Dysfunction; and objectionable email.
    
    Agreed, we need to make sure our content filters aren't this dumb.  Julia
    Child needs to be able to discuss her chicken breasts, and lawyers need to
    discuss their erectile dysfunction...but if I receive any emails about
    either, they will most likely be spam.  I believe strongly in per-user
    corpus-based filtering for this very reason.  The DSPAM project maintains a
    separate dictionary for each user based on their email behavior, which is
    one reason it's so effective at what it does.
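    
    As a minimal sketch of the per-user idea (not DSPAM's actual code or data
    format), the example below keeps a separate token dictionary for each user
    and scores mail with a simple naive Bayes estimate.  The class name
    UserFilter, the tokenizer, the add-one smoothing, and the sample messages
    are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict


class UserFilter:
    """Token statistics for one user (hypothetical; not DSPAM's structure)."""

    def __init__(self):
        self.spam_tokens = defaultdict(int)  # token -> spam messages containing it
        self.ham_tokens = defaultdict(int)   # token -> legitimate messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Unique lowercase word tokens; a real filter would do much more.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        counts = self.spam_tokens if is_spam else self.ham_tokens
        for tok in self.tokenize(text):
            counts[tok] += 1
        if is_spam:
            self.spam_msgs += 1
        else:
            self.ham_msgs += 1

    def spam_probability(self, text):
        # Naive Bayes in log-odds form with add-one smoothing.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in self.tokenize(text):
            p_spam = (self.spam_tokens.get(tok, 0) + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_tokens.get(tok, 0) + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent dictionary per user.
filters = defaultdict(UserFilter)

# A cook who legitimately receives mail about chicken breasts.
filters["chef"].train("chicken breasts and thighs recipe", is_spam=False)
filters["chef"].train("roast chicken breasts tonight", is_spam=False)
filters["chef"].train("cheap pills online", is_spam=True)

# Someone for whom the same words only ever show up in spam.
filters["admin"].train("hot chicken breasts click here", is_spam=True)
filters["admin"].train("breasts pills cheap", is_spam=True)
filters["admin"].train("server backup finished", is_spam=False)

message = "chicken breasts recipe"
chef_score = filters["chef"].spam_probability(message)
admin_score = filters["admin"].spam_probability(message)
print(f"chef: {chef_score:.2f}  admin: {admin_score:.2f}")
```

    The same message scores low for the first user and high for the second,
    because each score is computed only from that user's own training history.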
    
    > While I use content-based rules (if you can call header fields content) to
    > process some of my email, those rules only serve to sort and categorize my
    > email, not to reject it.
    
    I agree; I think filtering based on "Characteristics of Spam" is generally
    bad, because characteristics change.  A great example is the MUA.  Tools
    like SpamAssassin will make an email "more innocent" if the MUA is Pine...so
    what did spammers do?  They started using a Pine MUA.  Headers change, and
    many spammers are smart enough to send from valid stockpiled domains...the
    one thing that never changes, however, is the content of the message.
    
    
    
    ----------------------------------------------------------------------------
    


