Possible DOS against search engines?

From: Philip Stoev (philipat_private)
Date: Mon Feb 03 2003 - 02:33:38 PST


    Hello.
    
    I would like to gather opinions on whether the scenario described below is
    feasible. Please excuse me if I am talking nonsense.
    
    1. You create a generator of fake web pages whose purpose is to spit out
    HTML containing a huge number of (pseudo)random _non-existent_ words, as
    well as links to other pages within the generator;
    
    2. You place that generator somewhere and submit the URL to search engines
    for crawling;
    
    3. The search engines then crawl the site, possibly reaching their
    predefined maximum crawling depth (or, if badly broken, crawling the site
    indefinitely, jumping from one freshly generated page to another);
    
    4. When the gathered words are added to the search engine's index, the
    index becomes heavily overloaded with new entries, since none of them are
    among the real-language words already present in the index. The following
    should be theoretically possible:
    
        - craft fake words that attack a specific hash function: make a bunch
    of fakes that hash to the same value as a legitimate English word. This
    could degrade the performance of search engines using that particular hash
    function when they look up the targeted legitimate words.
    
        - craft fake words that unbalance a B-tree index, if one is used. I
    am not entirely sure, but it appears to me that words could be crafted so
    as to alter the shape of the B-tree and thus degrade the performance of
    lookups wherever it is used.
    
        - craft random fake words so that the index simply grows. To the best
    of my understanding, most search engines will index and retain keywords
    that appear on only one web page in the entire Internet. However, I think
    the search engines' capacity to keep track of such one-off, non-English
    letter sequences is limited and can eventually be exhausted.
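    To make the hash-collision bullet concrete, here is a minimal Python sketch
    under loudly stated assumptions: the index is a toy hash table with
    separate chaining, the hypothetical `toy_hash` (a base-31 polynomial hash
    reduced modulo a hypothetical `TABLE_SIZE` of 1024 buckets) stands in for
    whatever hash function a real engine might use, and "search" is an
    arbitrary target word. None of these names describe any actual search
    engine. Random fake words are brute-forced until they land in the target's
    bucket; every fake indexed ahead of the target adds one comparison to each
    lookup of it.

```python
from __future__ import annotations

import random
import string

# Hypothetical bucket count for a toy index; not taken from any real search engine.
TABLE_SIZE = 1024


def toy_hash(word: str) -> int:
    """Hypothetical weak string hash: base-31 polynomial reduced to a bucket index."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h


def craft_collisions(target: str, count: int, length: int = 8, seed: int = 0) -> list[str]:
    """Brute-force distinct random lowercase 'fake words' that share target's bucket."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    bucket = toy_hash(target)
    found: list[str] = []
    seen = {target}  # never return the target itself or a duplicate
    while len(found) < count:
        candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if candidate not in seen and toy_hash(candidate) == bucket:
            seen.add(candidate)
            found.append(candidate)
    return found


def build_table(words: list[str]) -> list[list[str]]:
    """Insert words, in order, into a separately chained hash table."""
    table: list[list[str]] = [[] for _ in range(TABLE_SIZE)]
    for w in words:
        table[toy_hash(w)].append(w)
    return table


def lookup_comparisons(table: list[list[str]], word: str) -> int:
    """Count string comparisons needed to find word by scanning its chain."""
    chain = table[toy_hash(word)]
    for i, w in enumerate(chain, start=1):
        if w == word:
            return i
    return len(chain)  # not found: the whole chain was scanned


if __name__ == "__main__":
    fakes = craft_collisions("search", count=5)
    table = build_table(fakes + ["search"])  # fakes indexed before the real word
    print("fake words:", fakes)
    print("comparisons to find 'search':", lookup_comparisons(table, "search"))
```

    With only 1024 buckets, each random candidate collides with the target
    with probability of roughly 1/1024, so the brute-force search is cheap;
    the point of the sketch is only that chain length, and therefore lookup
    cost for the targeted word, grows with every colliding fake.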
    
    If the above is feasible, one could even construct a worm of some sort
    that auto-installs such fake page generators on legitimate sites, thus
    increasing the traffic to the crawler even more. Writing a short Apache
    handler meant to be silently added to httpd.conf during a rootkit
    installation should not be that difficult. When did you last review the
    module list of your Apache? Would you spot a malicious module called
    mod_ip_vhost_alias, loaded in between two other modules that you never knew
    were vital or not?
    
    Please note that the setup described differs from the practice of
    generating fake pages containing lots of real (mostly adult) keywords.
    Such real-language words already exist in the index, whereas I suggest
    bombing the index with a huge number of previously unseen, freshly
    generated random letter sequences. Also note that the purpose of the attack
    is to damage the index, not to make the crawler waste bandwidth in an
    endless loop or the like (though the crawler does have to fetch the pages
    first so that the generated keywords ultimately reach the index).
    
    I would appreciate any and all thoughts on the issue.
    
    Philip Stoev
    



    This archive was generated by hypermail 2b30 : Mon Feb 03 2003 - 14:11:06 PST