[ISN] The perils of Googling

From: InfoSec News (isn@private)
Date: Wed Mar 10 2004 - 05:20:41 PST

  • Next message: InfoSec News: "[ISN] Inside the DoD's crime lab"

    http://www.theregister.co.uk/content/55/36142.html
    
    [Its worthwhile to visit the story on the Register's website for links 
    to Google on various search terms.  - WK]
    
    
    By Scott Granneman
    SecurityFocus
    Posted: 10/03/2004 
    
    Google is in many ways most dangerous website on the Internet for
    thousands of individuals and organisations, writes SecurityFocus
    columnist Scott Granneman. Most computers users still have no idea
    that they may be revealing far more to the world than they would want.
    
    I'm not putting down Google. Far from it: it's a great search engine,
    and I use it all the time. I couldn't do my many jobs without Google,
    so I've spent some time learning how to maximize its value, how to
    find exactly what I want, how to plumb its depths to find just the
    right nugget of information that I need. In the same way that Google
    can be used for good, though, it can also be used by malevolent
    individuals to root out vulnerabilities, discover passwords and other
    sensitive data, and in general find out way more about systems than
    they need to know. And, of course, Google's not the only game in town
    - but it is certainly the biggest, the most widely-used, and in many
    ways the easiest to use.
    
    
    Throwing back the curtain
    
    Most people just head to Google, type in the words they're looking
    for, and hit Google Search. Some more knowledgeable folks know that
    they put quotation marks around phrases, or put a "+" in front of
    required words or a "-" in front of words that should not appear, or
    even use Boolean search terms like AND, OR, and NOT. Greater Google
    aficionados know about Google's Advanced Search page, where you get
    really specific.
    
    The page that Google provides for its Advanced Search is nice, and
    it's certainly easy and full of necessary tips, but if you really want
    to master all the tricks that Google offers the dedicated searcher,
    you need to learn at least some of what is detailed on the Google
    Advanced Search Operators page. For instance, let's say you just type
    the word "budget" into a Google search box, without the quotation
    marks. You're going to get over 11,000,000 hits, so many that it would
    take a tremendously long time to find anything troublesome from a
    security perspective.
    
    Now try that same search, but include the search operator "filetype"  
    along with it. Using the filetype operator, you can specify the kind
    of file you're looking for. Google's Advanced Search page lists
    several common formats, including Microsoft Word, Microsoft Excel, and
    Adobe Acrobat PDF, but you actually search for far more than those.  
    Let's change our search from just "budget" to "budget filetype:xls"  
    (again without the quotes; in fact, just ignore the quotation marks
    unless I mention otherwise) and see what we get.
    
    
    63,000 hits and counting
    
    Hmmm ... now we're down to 63,000 hits. Still an overwhelming number,
    but if you start looking through the first couple of pages, you'll
    notice some items of interest if you were an attacker looking for
    information you shouldn't have. Let's add another operator into the
    mix.
    
    The "site" operator allows you to narrow down your results to a
    particular subdomain, a second-level domain, or even a top-level
    domain. For instance, if you wanted to find out what Google has
    indexed at SecurityFocus on the topic of password cracking, try this
    search: "site:www.securityfocus.com password cracking", which gives
    you 449 results. I often use this trick even when a site provides its
    own search engine, as Google's index is often far better than the
    search that many sites include.
    
    Let's try our search, but stick to the .edu top-level domain, so we're
    looking for "budget filetype:xls site:edu". 15,200 hits. Not bad.  
    Things are starting to look very interesting.
    
    Let's introduce another tool into your toolbox: the ability to look
    only on pages that use a certain word or words in their title by
    incorporating the "intitle" operator into your search. At
    SecurityFocus, this query would narrow our results list down to only
    five, an incredible tightening of our search:  
    "site:www.securityfocus.com intitle:password cracking" (note that
    "password" is the only word that must be in the title; "cracking"  
    should appear on the page as a search term, but not in the title,
    since I didn't place "intitle:" prior to it).
    
    
    Enter the bad guys
    
    Bad guys know about the "intitle" operator, but they know something
    else that makes it even more powerful. Often Web servers are left
    configured to list the contents of directories if there is no default
    Web page in those directories; on top of that, those directories often
    contain lots of stuff that the website owners don't actually want to
    be on the Web. That makes such directory lists prime targets for
    snoopers. The title of these directory listings almost always start
    with "Index of", so let's try a new query that I guarantee will
    generate results that should make you sit up and worry:  
    "intitle:"index of" site:edu password". 2,940 results, and many, if
    not most, would be completely useless to a potential attacker. Many,
    however, would yield passwords in plain text, while others could be
    cracked using common tools like Crack and John the Ripper.
    
    There are other operators, but these should be enough to make the
    picture clear. Once you start to think about it, the potentially
    troublesome words and phrases that can be searched for and leveraged
    should begin to multiply in your mind: passwd. htpasswd. accounts.  
    users.pwd. web_store.cgi. finances. admin. secret. fpadmin.htm. credit
    card. ssn. And so on. Heck, even "robots.txt" would be useful: after
    all, if someone doesn't want search engines to find the stuff listed
    in robots.txt, that stuff could very well be worth a look. Remember,
    robots.txt just indicates that the website doesn't want search engines
    to index the files and folders listed in robots.txt; nothing
    inherently stops users from accessing that content once they know it
    exists.
    
    
    Sensitive information
    
    A couple of websites have even sprung up dedicated to listing words
    and phrases that reveal sensitive information and vulnerabilities. My
    favorite of these, Googledorks, is a treasure trove of ideas for the
    budding attacker. As a protective countermeasure, all security pros
    should visit this site and try out some of the suggestions on the
    sites that they oversee or with whom they consult. With a little elbow
    grease, some Perl, and the Google Web API, you could write scripts
    that would automate the process and generate some nice reports that
    you could show to your clients. Of course, so could the bad guys...  
    except I don't think your clients will ever see those reports, just
    the end results.
    
    Even the Google cache can aid in exposing holes in systems. Couple the
    operators outlined above with Google's cache, which can provide you
    with a look at files that have changed or been removed, and attackers
    have an incredibly powerful tool at their disposal.
    
    
    Responses
    
    As I said at the beginning of this column, the fact that it is
    actually quite easy to find dangerous information using just a search
    engine and some intelligent guesses is not exactly news to people who
    think about security professionally. But I'm afraid that there are
    many uneducated folks putting content onto Web servers that they think
    is hidden to the world, when it is in reality anything but.
    
    We have two seemingly opposite problems at work here: simplicity and
    complexity. On the one hand, it has become very easy for non-technical
    users to post content onto Web servers, sometimes without realizing
    that they're in fact placing that content on a Web server. It has even
    become easier to Web-enable databases, which has led in one case to
    the exposure of a database containing the records of a medical
    college's patients (and by the way, the search terms discussed in that
    article are still very much active at Google, one year later).
    
    Even when people do understand that their content is about to go onto
    the Web, many do not fully think through what they're about to post.  
    They don't examine that content in light of a few simple questions:  
    How could this information be used against me? Or my organisation? And
    should this even go on the Web in the first place?
    
    Well, of course ordinary users don't think to ask these questions!  
    They're just interested in getting their content out there, and most
    of the time are just pleased as punch that they could publish on the
    Web in the first place. Critically examining that content for security
    vulnerabilities is not something they've been trained to do.
    
    
    Points of failure
    
    On the other side of the coin we have complexity. For all the ease
    that has come about in the past several years, no matter how simple it
    has become for Bob in Marketing to publish the company's public sales
    figures online, the fact remains that we're dealing with complex
    systems that have many, many points of potential failure. That
    knowledge scares the hell out of the people who live security, while
    Bob goes blithely on successfully publishing the company's public
    sales figures ... and accidentally publishing the spreadsheet
    containing the company's top customers, complete with contact info,
    sales figures, and notes about who the salespeople think are good for
    a few thousand more this year.
    
    For instance, FrontPage is touted by Microsoft as an extremely
    simple-to-use Web publishing solution that enables users to "move
    files easily between local and remote locations and publish in both
    directions". Unfortunately for those average Joes who buy into the
    hype, FrontPage is still a very complicated program that can easily
    expose passwords and other sensitive data if it is not administered
    correctly. Don't believe me? Just search Google for "_vti_pvt password
    intitle:index.of" and take a look at what you find.
    
    FrontPage is not the only offender, but it is certainly an easy one to
    find in abundance on our favourite search engine. Now think about all
    the other programs out there that people are using every day. Personal
    Web servers that come with operating systems. Turnkey shopping cart
    software. Web-enabled Access databases. The list goes on and on. Take
    a moment and start to think about the organisations you oversee. See
    the list of potential problems tumble off into infinity. Oy.
    
    Sure, it's possible for the folks creating Web content to tell Google
    and other search engines not to index that content. O'Reilly's website
    has a marvellous short piece titled "Removing Your Materials From
    Google" that should be required reading for anyone who even thinks
    about putting anything on or even near a Web server. Of course, as I
    mentioned above, relying on robots.txt to protect sensitive content is
    a bit like putting a sign up saying "Please ignore the expensive
    jewels hidden inside this shack". But at least it will get folks
    thinking.
    
    
    Understand the threat
    
    And really, that's what it comes down to: we have to get folks
    thinking. Sure, those of us responsible for security can try to shut
    everything down and turn everything off that could pose a threat - and
    we should, within reason. But those pesky users are going to do their
    job: use the systems we provide them, and some we don't provide. We
    need to help them understand the threats that any Web-enabled
    technology can provide.
    
    Print out this column and hand it out. Show them how easy it is to
    find sensitive content online. Talk to them about appropriate and
    inappropriate content. Try to get them on your side so they trust you
    and come to you with requests for help beforehand instead of coming to
    you after the fact, when it's too late and the toothpaste is out of
    the tube. Finally, realise that humans have an innate need to
    communicate and will seize on any tool to do so, and if that means
    talking to your users and setting up a wiki or bulletin board or other
    collaborate tool, then do so.
    
    Google and other search tools have made the world available to us all,
    if we just know what to ask for. It's our job as security pros to help
    make the folks we work and interact with aware of that fact, in all of
    its far-reaching ramifications.
    
    -=-
    
    Scott Granneman is a senior consultant for Bryan Consulting Inc. in
    St. Louis. He specializes in Internet Services and developing Web
    applications for corporate, educational, and institutional clients.
    
    
    
    -
    ISN is currently hosted by Attrition.org
    
    To unsubscribe email majordomo@private with 'unsubscribe isn'
    in the BODY of the mail.
    



    This archive was generated by hypermail 2b30 : Wed Mar 10 2004 - 08:21:13 PST