I see a few problems here. Problems are listed below each concept, for
clarity, and assume a decent webcrawler. Sketches illustrating several
of the points are collected at the end of this message.

> 1. You create a generator for fake web pages, whose purpose
> is to spit out HTML containing a huge amount of (pseudo)
> random _non-existing_ words, as well as links to other pages
> within the generator;

I doubt this would make even a slight dent in things. Seeing as how
webcrawlers already walk the entire internet, with its various
languages, enormous expanse, and endless misspellings, I think anything
you could create would end up being a drop in the bucket. (A sketch of
such a generator is below.)

> 2. You place that generator somewhere and submit the URL to
> search engines for crawling;
>
> 3. The search engines then crawl the site, possibly reaching
> their pre-defined maximum crawling depth (or, if badly
> broken, crawl the site indefinitely, jumping from one freshly
> generated page to another);

But they don't crawl indefinitely. What do they do if they hit two
sites that link to each other? They notice this, and move on. (The
crawl sketch below shows the usual mechanism.)

> 4. Upon adding the gathered words to the search engine's
> index, the index becomes heavily overloaded with the newly
> added words, as they are outside of the real-language words
> already present in the index. The following should be
> theoretically possible:

But who would search on them?

> - craft fake words so that they attack a specific hash
> function. Make a bunch of fakes that hash to the same value
> as a legitimate word in the English language. This will
> possibly impact the performance of search engines using that
> particular hash function when they try to look up the
> legitimate words that are being targeted.

This would be noticed by the search engine long before it became a real
problem, and it would be addressed. This is how they deal with many
things, including people who try to influence their ranking using
various means. (The collision sketch below shows the mechanism in
miniature.)

> - craft fake words so that they unbalance a b-tree
> index, if one is used. I am not entirely sure, however it
> appears to me that it is possible to craft words in such a
> way as to alter the shape of the b-tree and thus impact the
> performance of the lookups where it is used.
>
> - craft fake words randomly so that the index just grows.
> To the best of my understanding, most search engines will
> index and retain keywords that are only seen on one web page
> in the entire Internet. However, I think the capacity of the
> search engines to keep track of such one-time non-English
> letter sequences is limited and can be eventually exhausted.

It is my belief that, again, they will notice the impact on their
database and quickly address the issue. What about a bit of code that
states that if more than 5% of the words in a page are unique in the
database, then that page is dropped? (A sketch of such a filter is
below.)

> If the above-mentioned things are feasible, then one can even
> construct a worm of some sort, that will auto-install such
> fake page generators on valid sites, thus increasing the
> traffic to the crawler even more. Writing a short Apache
> handler meant to be silently installed in httpd.conf at
> root-kit installation should not be that difficult. When is
> the last time you reviewed the module list of your Apache?
> Will you spot a malicious module if it is called
> mod_ip_vhost_alias, loaded in between two other modules that
> you never knew were vital or not?

No, but I'd notice an abrupt lack of space on my web server. And the
sudden oddly-named URLs in my logs. And the corresponding oddly-named
pages on my site. (A sketch for auditing the module list is also
below.)
And if I didn't notice, my hosting provider would.

> Please note that the setup described differs from the
> practice of generating fake pages containing a lot of real
> (mostly adult) keywords. After all, such real-language words
> already exist in the index, whereas I suggest bombing the
> index with a huge number of not-previously-existing,
> freshly-generated random letter sequences. Also, please note
> that the purpose of the attack is to damage the index, and
> not to make the crawler consume bandwidth by going in an
> endless loop or something like that (though the crawler has
> to scan the pages first so that the generated keywords are
> ultimately delivered to the index).
>
> I will appreciate any and all thoughts on the issue.
>
> Philip Stoev
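The promised sketches follow, all in Python. First, point 1, the
generator itself. This is roughly the thing being described, a page of
random letter sequences plus links back into the generator; the names
here (make_page, the /gen/<id> URL scheme) are my own illustration,
not anything from an actual attack:

    import random
    import string

    def random_word(rng, length=8):
        # A random lowercase letter sequence; almost certainly not a
        # real word in any language.
        return "".join(rng.choice(string.ascii_lowercase)
                       for _ in range(length))

    def make_page(page_id, words=500, links=10):
        # Seed from the page id so repeated fetches of the same URL
        # return the same page, which looks more legitimate to a crawler.
        rng = random.Random(page_id)
        body = " ".join(random_word(rng) for _ in range(words))
        anchors = " ".join('<a href="/gen/%d">next</a>'
                           % rng.randrange(10 ** 9)
                           for _ in range(links))
        return "<html><body>%s %s</body></html>" % (body, anchors)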
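Second, why mutual links don't trap a decent crawler (point 3): a
visited set plus a depth cap. fetch and extract_links are placeholders
here, not real crawler internals:

    from collections import deque

    def crawl(start_url, fetch, extract_links, max_depth=5):
        seen = {start_url}
        queue = deque([(start_url, 0)])
        while queue:
            url, depth = queue.popleft()
            page = fetch(url)
            if depth >= max_depth:
                continue                  # depth cap: stop expanding here
            for link in extract_links(page):
                if link not in seen:      # two pages linking to each other
                    seen.add(link)        # are caught by this check
                    queue.append((link, depth + 1))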
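Third, the hash-collision idea in miniature. It is real in principle,
but only against a weak hash; the additive hash below is an assumption
made for the sake of the toy, not something any real engine uses. Since
this hash ignores character order, every anagram of a legitimate word
collides with it, and lookups of that word degrade to a linear scan of
the bucket's chain:

    from itertools import permutations

    def weak_hash(word, buckets=1024):
        # Additive hash: character order is ignored, so anagrams collide.
        return sum(ord(c) for c in word) % buckets

    target = "search"
    crafted = {"".join(p) for p in permutations(target)}  # 720 fake words

    # Every crafted word lands in the same bucket as the real word.
    assert len({weak_hash(w) for w in crafted}) == 1
    print(len(crafted), "crafted words in bucket", weak_hash(target))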
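Fourth, the 5% filter I suggested. known_words stands in for the
engine's existing index; the threshold is just the number from my
reply, not anything a real engine documents:

    def should_drop(page_words, known_words, threshold=0.05):
        # Drop a page when too large a fraction of its distinct words
        # have never been seen anywhere else in the index.
        distinct = set(page_words)
        if not distinct:
            return False
        unseen = sum(1 for w in distinct if w not in known_words)
        return unseen / float(len(distinct)) > threshold

Run against the generator's output above, essentially every word is
unseen, so the page is dropped on first sight.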
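Finally, auditing the Apache module list. The low-tech answer is to
diff the LoadModule lines against a baseline saved while the box was
known-clean; the paths below are assumptions, and a real audit would
also have to follow Include directives:

    BASELINE = "/root/httpd-modules.known"  # LoadModule lines saved earlier
    CONF = "/etc/httpd/conf/httpd.conf"

    with open(BASELINE) as f:
        known = set(line.strip() for line in f if line.strip())

    with open(CONF) as f:
        for line in f:
            line = line.strip()
            if line.startswith("LoadModule") and line not in known:
                print("not in baseline: " + line)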