Re: HTML email "bug", of sorts.

From: Sean Straw / PSE (PSE-Lat_private)
Date: Mon Aug 20 2001 - 21:41:24 PDT


    At 15:33 2001-08-20 -0600, Bear Giles wrote:
    
    >1) run them through a simple filter for image tags.  With regex,
    >the pattern could be as simple as "<img ([^>]+)>", case insensitive.
    >You might need to include some backslash quotes.
    
    .. which immediately screws up _CODE_ embedded into messages.  "Here, joe, 
    the solution to the niggling problem is to replace the code in somefunction 
    with  <img src..."
    
    KLUNK.  This method would have broken valid code - code which may be 
    expected to be copied and pasted as-is.
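
A minimal Python sketch of that breakage, assuming the filter simply deletes whatever the quoted pattern matches. The message text and the strip_img_tags helper are made up for illustration:

```python
import re

# The pattern proposed in the quoted message, applied case-insensitively.
IMG_TAG = re.compile(r"<img ([^>]+)>", re.IGNORECASE)

def strip_img_tags(body):
    """Delete every match of the naive pattern from a message body."""
    return IMG_TAG.sub("", body)

# Hypothetical plain-text message that carries markup as code to be pasted.
message = (
    "Here, joe, replace the code in somefunction with\n"
    '<img src="logo.gif" alt="logo">\n'
    "and reload the page."
)

filtered = strip_img_tags(message)
print(filtered)  # the line joe was supposed to paste is gone
```

The filter cannot tell markup that will be rendered from markup that is merely being discussed, so the code sample is silently removed.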
    
    >For everything that matches, look for any height and width attributes
    >for the image.  If it's 1, you have a web bug.  Even if it's 2-8 or so,
    >it's probably still a web bug.
    
    And for code embedded in valid pages, it may not be.  What about images 
    without explicit height and width attributes?  Many clients don't show a 
    preview, or at least show an outline (even for single-pixel images), so the 
    dimensions wouldn't matter in email.  In fact, the 'web bug' could just as 
    easily be a *REGULAR GRAPHIC* (such as a horizontal rule), since you're 
    viewing HTML email, and by the time you realize an image is being loaded - 
    whether it is visible or not - the request has already been made.
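
As a rough sketch of why the size test misses things, assume the heuristic flags a tag only when it carries height/width values of 8 or less. The flagged_as_bug helper, its limit parameter and the sample attribute strings are hypothetical:

```python
import re

# Pull numeric height/width values out of the attribute text of one tag.
DIMENSION = re.compile(r'\b(height|width)\s*=\s*"?(\d+)', re.IGNORECASE)

def flagged_as_bug(tag_attrs, limit=8):
    """Flag a tag only if it declares dimensions and all are tiny."""
    sizes = [int(value) for _name, value in DIMENSION.findall(tag_attrs)]
    return bool(sizes) and all(size <= limit for size in sizes)

classic_bug = 'src="http://www.somesite.com/dot.gif?uniqueid" height="1" width="1"'
no_dimensions = 'src="http://www.somesite.com/rule.gif?uniqueid"'

print(flagged_as_bug(classic_bug))    # True
print(flagged_as_bug(no_dimensions))  # False, yet it is fetched all the same
```

Either image triggers the same request when the message is rendered; only one of them trips the heuristic.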
    
    >Either comment it out or delete it.  The latter may be preferable
    >if don't want to break scripts.
    
    Now you're stuck needing to match brackets, which very likely will not work 
    properly the instant you receive a quoted message:
    
     > the tag <img src="some tag"
     > height="1" width="1">
    
    Where does the IMG SRC closing bracket appear when you're using a simple 
    regexp?  What if the second line doesn't appear?
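
Running the quoted pattern over exactly that quoted text shows one way it goes wrong. This is only a sketch using Python's re module; other regex engines may differ in detail:

```python
import re

IMG_TAG = re.compile(r"<img ([^>]+)>", re.IGNORECASE)

# The two quoted lines from above, including the "> " reply markers.
quoted = (
    ' > the tag <img src="some tag"\n'
    ' > height="1" width="1">'
)

match = IMG_TAG.search(quoted)
attrs = match.group(1)
print(repr(attrs))
```

The match ends at the reply marker on the second line, so the captured attributes stop before height and width, and a delete would leave the tail of the tag behind.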
    
    Arguably, if the message body is HTML, the MIME type should indicate as 
    much, there should be an opening HTML tag (but there might not be, and 
    email HTML renderers are pretty lax about this), and any < and > characters 
    that aren't part of the HTML markup would be properly escaped as &lt; and 
    &gt;.  Then again, what stops the spammer from obfuscating their code in 
    the same way?  Try embedding ORDINALS (numeric character references such 
    as &#64; for "@") in your page: a good HTML renderer will render it fine, 
    but most regexps will fail to find a match.  (I use ordinals to 
    "mailfuscate" mailto URLs and even non-URL plaintext email addresses on all 
    of my webpages - it significantly reduces the spam which arrives via 
    web-spidering spambots.)
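
A small sketch of the ordinal trick, assuming every character is written as a decimal character reference. The address, the encode_ordinals helper and the deliberately simple EMAIL pattern are illustrative only:

```python
import html
import re

def encode_ordinals(text):
    """Write each character as a decimal character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

# Deliberately simple address pattern of the kind a harvester might use.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

address = "user@example.com"  # hypothetical address
encoded = encode_ordinals(address)

print(EMAIL.search(encoded))   # None: no literal "@" left to match
print(html.unescape(encoded))  # decodes back to the original address
```

The same mismatch works against a tag-stripping filter: the renderer decodes the references before acting on them, while a pattern run over the raw text never sees the decoded form.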
    
    Besides BGSOUND, page backgrounds and even TABLE backgrounds could utilize 
    an embedded image, in which case, you won't even see it as an IMG SRC 
    tag.  Suddenly, your filter needs to fully parse HTML in order to have a 
    prayer of stripping these tags.
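
For a sense of what "fully parse" means in practice, here is a sketch built on Python's standard html.parser. The table of tag/attribute pairs covers only the cases named above and is certainly incomplete; the file names in the sample are invented:

```python
from html.parser import HTMLParser

# Tag/attribute pairs that can trigger a fetch; an incomplete, assumed list.
REMOTE_ATTRS = {
    ("img", "src"),
    ("bgsound", "src"),
    ("body", "background"),
    ("table", "background"),
}

class RemoteRefFinder(HTMLParser):
    """Collect (tag, url) pairs for attributes that load remote content."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from the parser.
        for name, value in attrs:
            if (tag, name) in REMOTE_ATTRS and value:
                self.refs.append((tag, value))

def find_remote_refs(markup):
    finder = RemoteRefFinder()
    finder.feed(markup)
    finder.close()
    return finder.refs

sample = (
    '<BODY BACKGROUND="http://www.somesite.com/bg.gif?uniqueid">'
    '<TABLE background="http://www.somesite.com/cell.gif?uniqueid">'
    "<TR><TD>hi</TD></TR></TABLE>"
    '<BGSOUND SRC="http://www.somesite.com/tune.wav?uniqueid">'
    "</BODY>"
)

for tag, url in find_remote_refs(sample):
    print(tag, url)
```

None of these three references would be seen by a pattern that looks only for IMG tags.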
    
    Which makes blocking (via RBL, etc) and effectively filtering spam a pretty 
    darn good solution.
    
    
    Someone mentioned having a port-80 filter on your firewall -- what of dot 
    trackers which reference a specific port number?
    
             <img src="http://www.somesite.com:110/dot_tracker.file?uniqueid">
    
    Anyone running a firewall would probably block certain services -- but all 
    the spammer has to do is run their tracking system on the port of a 
    standard service which a mail client would be expected to access, and that 
    firewalling isn't going to do you much good.  The exception is a firewall 
    which only allows POP3 (110) access out to one specific server, but joe 
    user is unlikely to configure their machine this way, joe poweruser 
    probably won't because they have multiple accounts, and joe corporateadmin 
    won't because too many users check their various mail accounts from the 
    office, and limiting them in this fashion would be too grievous.
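
A short sketch of the port point, assuming the firewall rule does nothing more than block outbound connections to port 80. The helper names are hypothetical; the URL is the example above:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Port the client will actually connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

def passes_port_80_filter(url):
    """True when a rule that only blocks outbound port 80 lets the fetch through."""
    return effective_port(url) != 80

tracker = "http://www.somesite.com:110/dot_tracker.file?uniqueid"
print(effective_port(tracker), passes_port_80_filter(tracker))
```

The request is still plain HTTP; only the destination port differs, which is all a port-based rule looks at.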
    
    
    Sorry if I've pointed out another exploit that the spammers could use to 
    circumvent such firewall rules.
    
    ---
      Please DO NOT carbon me on list replies.  I'll get my copy from the list.
    
      Sean B. Straw / Professional Software Engineering
      Post Box 2395 / San Rafael, CA  94912-2395
    



    This archive was generated by hypermail 2b30 : Tue Aug 21 2001 - 09:30:48 PDT