FC: Trapping anti-music piracy spiders: "RIAA Pit of Confusion"

From: Declan McCullagh (declanat_private)
Date: Sat May 17 2003 - 09:10:41 PDT


    [One problem with this approach is that it looks like the RIAA is spending 
    more time targeting FTP sites than web sites. --Declan]
    
    ---
    
    From: "Paul \"Evil Genius\" Music" <evlpawlat_private>
    To: "DeClan" <declanat_private>
    Date: Sat, 17 May 2003 00:24:49 -0500
    
    http://www.kuro5hin.org/print/2003/5/16/163447/493
    
    RIAA Pit of Confusion (Culture)
    
    By salimfadhley
    Fri May 16th, 2003 at 10:13:06 PM EST
    
    After reading about the RIAA threatening to sue yet another innocent 
    archive operator, I decided to take some direct action: it occurred to me 
    that the RIAA keeps falsely accusing others of piracy because it puts its 
    faith in an unintelligent spider - a fact which can easily be exploited to 
    make my servers into an RIAA no-go-zone...
    
    
    
    Whilst spidering is nothing to worry about (and only to be expected on a 
    public site), the way the association fires off legal threats based on the 
    spider's results alone seems wrong. Since this spider does not actually look 
    at the whole title of a file, or even its content, I figured I could 
    have some fun at their expense:
    
    What if I could write a `tarpit' script that creates a large number of 
    interlinked, automatically generated web sites? If their spider tried to 
    scan my server, it would be fooled into thinking it had found a treasure 
    trove of MP3 sites, while anybody who took the time to look could see 
    that the sites contain no pirated content at all.
    
    How might the RIAA react to such a thing?
    
    - They could upgrade their spider so that it only recognises track names 
      that correspond to real songs (e.g. it would know that 
      `elephant_wiggle-Madonna.mp3' is not a real Madonna song). This would 
      limit their spider to detecting only correctly named MP3 files, and 
      force them to use it more responsibly.
    - Every suspect site would need to be hand-checked to verify that a 
      genuine breach of copyright has taken place. This would substantially 
      decrease the return on investment of their spidering project, because 
      it would be labour-intensive - again forcing a more responsible 
      approach to detecting offenders.
    - They could blacklist my server to prevent their spider from looking at 
      it in future - that would be at least a small victory. If they 
      blacklisted enough servers, it would be the same as giving up!
    - They could send me a legal nastygram instructing me to disable my 
      tarpit... Since I do not live in the USA, this might not be enforceable.
    
    How it works
    
    The Pit of Confusion is a pure PHP script that can automatically generate a 
    very large number of web sites full of links to MP3s. It reads a settings 
    file containing lists of famous artists' names and random words that can 
    be combined into silly song titles. There is also a download manager 
    component, designed to deliver MP3 files in the most inefficient way possible.
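    The title-generation step can be illustrated like this (the original is 
    PHP; this is an equivalent sketch in Python, and the word lists below are 
    invented stand-ins for whatever the real settings file contains):

```python
import random

# Invented stand-ins for the settings file's lists of famous artists
# and random words; the real script reads these from its PHP config.
ARTISTS = ["Madonna", "Metallica", "Eminem"]
WORDS = ["elephant", "wiggle", "pickle", "moonbeam", "sprocket"]

def fake_track_name(artist, rng=random):
    """Combine two random words with an artist's name into a
    silly-but-plausible MP3 filename."""
    title = "_".join(rng.sample(WORDS, 2))
    return f"{title}-{artist}.mp3"

print(fake_track_name("Madonna"))  # e.g. elephant_wiggle-Madonna.mp3
```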
    
    As with any web site, the action starts with a URL. Normally the first 
    part of the URL just signifies the server on which the site runs; however, 
    I have used a dynamic DNS service to encode the two key site parameters 
    into the hostname (a trick I learnt from this website). The first two 
    parts of the domain name tell the script how to build the page. If you visit:
    
    http://madonna.ricky.music.stodge.org
    
    It will show you `Ricky's' Madonna page. The script does not know anything 
    about Madonna or any of her songs - it just uses information provided at 
    run-time to set up the basic variables. Anything in the form of 
    a.b.music.stodge.org will get handled by the same server.
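    A minimal sketch of that hostname trick (Python rather than the original 
    PHP; the function name is mine):

```python
def parse_site_params(hostname):
    """Pull the two site parameters out of a wildcard hostname of the
    form a.b.music.stodge.org, returning (artist, owner) or None."""
    parts = hostname.lower().split(".")
    if len(parts) == 5 and parts[2:] == ["music", "stodge", "org"]:
        return parts[0], parts[1]  # (artist, owner)
    return None

print(parse_site_params("madonna.ricky.music.stodge.org"))
```

    Every request under the wildcard domain hits the same script; the 
    "different sites" exist only in the hostname.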
    
    Notice how slowly the page loads - that is because a configurable 
    `annoying delay' is built into each transaction. Assuming the spider 
    system has a fixed maximum number of threads, it makes sense to tie them 
    up for as long as possible - but not so long as to deter a person wishing 
    to verify that there are no pirated files on the site.
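    The delay itself needs nothing fancy - sketched here in Python, with an 
    invented default value (the real script makes this configurable):

```python
import time

def serve_page(build_page, delay_seconds=5.0):
    """Sleep before building the page, holding the connection (and,
    with luck, one of the spider's worker threads) open meanwhile.
    The 5-second default is an assumption, not the script's value."""
    time.sleep(delay_seconds)
    return build_page()
```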
    
    Next, it builds a list of randomly named MP3 links that include the 
    chosen artist's name in the title. If you click on one of the links, 
    instead of delivering a pirated file it sends a non-copyrighted music 
    file via a download manager that ensures the download takes a very long 
    time. The idea is to tie up as many threads as possible on whatever 
    system is doing the spidering.
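    A download manager with that behaviour can be sketched as a throttled 
    generator (again a Python illustration, with made-up chunk-size and 
    pause values):

```python
import time

def trickle(data, chunk_size=64, pause=1.0):
    """Yield a harmless file a few bytes at a time, sleeping between
    chunks so each download monopolises a spider thread for ages."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
        time.sleep(pause)
```

    A web server would stream these chunks as the response body; it is the 
    pause between chunks, not limited bandwidth, that ties the client up.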
    
    Finally, it makes links to a selection of other random sites produced by 
    the same system. The idea is to keep the spider in the tarpit for as 
    long as possible.
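    Generating those cross-links is just more of the same random machinery 
    (a Python sketch; the artist and owner lists are placeholders):

```python
import random

def random_site_links(n, artists, owners, rng=random):
    """Invent n links to other auto-generated 'sites' under the same
    wildcard domain, so the spider never runs out of pages to crawl."""
    return [
        f"http://{rng.choice(artists)}.{rng.choice(owners)}.music.stodge.org/"
        for _ in range(n)
    ]
```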
    
    Notes
    
    This is just my first attempt. No doubt more talented scripters can 
    already see weaknesses in my plan - which is why I intend to share the 
    source code of my project with anybody who wants it. If you want to help 
    out, please leave a message on this board and I will get back to ya!
    
    Full discussion: http://www.kuro5hin.org/story/2003/5/16/163447/493
    
    
    
    
    -------------------------------------------------------------------------
    POLITECH -- Declan McCullagh's politics and technology mailing list
    You may redistribute this message freely if you include this notice.
    -------------------------------------------------------------------------
    To subscribe to Politech: http://www.politechbot.com/info/subscribe.html
    This message is archived at http://www.politechbot.com/
    Declan McCullagh's photographs are at http://www.mccullagh.org/
    Like Politech? Make a donation here: http://www.politechbot.com/donate/
    -------------------------------------------------------------------------
    



    This archive was generated by hypermail 2b30 : Sat May 17 2003 - 09:46:31 PDT