[Politech] Google's SafeSearch is overzealous, blocks innocuous domains [fs]

From: Declan McCullagh (declan@private)
Date: Fri Apr 23 2004 - 06:22:52 PDT

    Google's chastity belt too tight
    Last modified: April 23, 2004, 4:00 AM PDT
    By Declan McCullagh
    Staff Writer, CNET News.com

    PartsExpress.com proudly touts itself as the Net's No. 1 source for 
    audio, video and speaker components--but online shoppers who rely on an 
    optional feature in the Google search engine to block porn sites would 
    never know it.

    By an accident of spelling, the domain name of the Ohio electronics 
    retailer includes an unfortunate string of letters, "sex," which is 
    enough to block the Web site from Google's filtered results.

    PartsExpress.com is not alone. A CNET News.com investigation shows that 
    Google's SafeSearch filter technology incorrectly blocks many innocuous 
    Web sites based solely on strings of letters such as "sex," "girls" or 
    "porn" embedded in their domain names.

    Google's SafeSearch flaws are more than academic--they can have serious 
    consequences for innocent Web site operators whose sites it blocks. Google 
    is the most widely used search engine on the Web, and failure to appear 
    in its listings can have a direct impact on sales for some companies, 
    particularly smaller enterprises with limited marketing budgets.

    Research company WebSideStory reported last month that Google claimed an 
    all-time high in search referrals, 41 percent of the United States 
    total, and the search giant's market share is steadily expanding.

    "Traffic from Google can make or break a business," said Maria Medina, 
    whose family-run clothing business at ALittleGirlsBoutique.com doesn't 
    pass the SafeSearch censor. "Here I am, a mom of four children, creating 
    an at-home business that sells little girl dresses and accessories, in 
    order to spend more time with my children, and I have been filtered out 
    as not being family friendly. Ridiculous."

    Matt Cutts, the Google engineer who designed SafeSearch four years ago, 
    said his algorithm looks for a "relatively small" number of trigger 
    words in a Web page's address. If one of those words appears, the 
    SafeSearch algorithm puts the address on a block list and does not take 
    the next step of evaluating the content of the site. "We try to find the 
    best trade-off of precision, recall and safety," Cutts said. "People who 
    opt in to SafeSearch are mostly OK with us being on the conservative side."

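    The address-only check Cutts describes can be illustrated with a minimal 
    sketch. It is not Google's code: the trigger list is hypothetical (Google 
    did not disclose its list) and uses only the three strings named in this 
    article, and the function name is_blocked is invented for illustration. 
    Like the filter described above, it never looks at a site's content.

```python
# Minimal sketch of a URL-only filter of the kind described in the article.
# HYPOTHETICAL: Google never disclosed its trigger list or its code; these are
# just the three strings named in the article.
TRIGGER_WORDS = ("sex", "girls", "porn")


def is_blocked(domain: str) -> bool:
    """Return True if any trigger word appears anywhere in the domain name.

    Plain substring matching: no word boundaries are checked, and the
    site's content is never fetched or evaluated.
    """
    name = domain.lower()
    return any(word in name for word in TRIGGER_WORDS)


if __name__ == "__main__":
    for domain in ("PartsExpress.com", "ALittleGirlsBoutique.com", "example.com"):
        print(f"{domain}: {'blocked' if is_blocked(domain) else 'allowed'}")
```

    Run as written, it reports PartsExpress.com and ALittleGirlsBoutique.com 
    as blocked and example.com as allowed. Substring matching is what catches 
    PartsExpress.com: the "s" ending "parts" and the "ex" starting "express" 
    spell "sex". Requiring a whole-word match would avoid that false positive 
    but would miss domains that run an explicit word into other words, one 
    form of the precision/recall trade-off Cutts mentions.
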
    Cutts would not disclose how many Web searches are done with SafeSearch 
    enabled, saying only that it's a small percentage of the millions of 
    queries handled by Google each day. But the sloppy filter stands out as 
    a rare black eye for a company that prides itself on superior search 
    technology and boasts on its payroll one of the world's highest 
    concentrations of computer science doctoral degrees. Google claims 
    SafeSearch "uses advanced proprietary technology that checks keywords 
    and phrases" and filters out only Web pages "containing pornography and 
    explicit sexual content."

    "That's not very bright," said Karen Schneider, a librarian who runs the 
    Librarians' Index to the Internet and has made a study of filtering 
    software. SafeSearch is "certainly evocative of the very primitive 
    CyberSitter-type tools of the mid-1990s--not a tool of fairly 
    sophisticated development."

    [...remainder snipped...]
    Politech mailing list
    Archived at http://www.politechbot.com/
    Moderated by Declan McCullagh (http://www.mccullagh.org/)

    This archive was generated by hypermail 2b30 : Fri Apr 23 2004 - 06:46:09 PDT