FC: Trapping anti-music piracy spiders: "RIAA Pit of Confusion"

From: Declan McCullagh (declanat_private)
Date: Sat May 17 2003 - 09:10:41 PDT


    [One problem with this approach is that it looks like the RIAA is spending 
    more time targeting FTP sites than web sites. --Declan]
    
    ---
    
    From: "Paul \"Evil Genius\" Music" <evlpawlat_private>
    To: "DeClan" <declanat_private>
    Date: Sat, 17 May 2003 00:24:49 -0500
    
    http://www.kuro5hin.org/print/2003/5/16/163447/493
    
    RIAA Pit of Confusion (Culture)
    
    By salimfadhley
    Fri May 16th, 2003 at 10:13:06 PM EST
    
    After reading about the RIAA threatening to sue yet another innocent 
    archive operator, I decided to take some direct action: it occurred to me 
    that the RIAA keeps falsely accusing others of piracy because it puts its 
    faith in an unintelligent spider - a fact which can easily be exploited to 
    make my servers into an RIAA no-go-zone...
    
    
    
    Whilst spidering is nothing to worry about (and only to be expected on a 
    public site), the way the association fires off legal threats based on the 
    spider's results alone seems wrong. Since this spider does not actually look 
    at the whole title of a file, or even its content, I figured I could 
    have some fun at their expense:
    
    What if I could write a `tarpit' script that creates a large number of 
    interlinked, automatically generated web sites? If their spider tried to 
    scan my server, it would be fooled into thinking it had found a treasure 
    trove of MP3 sites, while anybody who took the time to look could see 
    that the sites contain no pirated content at all.
    
    How might the RIAA react to such a thing?
    
    - They could upgrade their spider so that it only recognises track names 
      that correspond to real songs (e.g. it would know that 
      `elephant_wiggle-Madonna.mp3' is not a real Madonna song). This would 
      limit their spider to detecting only correctly named MP3 files, and 
      force them to use it more responsibly.
    - Every suspect site would need to be hand-checked to verify that a 
      genuine breach of copyright has taken place. This would substantially 
      decrease the return on investment of their spidering project, because 
      it would be labour-intensive - again forcing a more responsible 
      approach to detecting offenders.
    - They could blacklist my server to prevent their spider from looking at 
      it in future - that would be at least a small victory. If they 
      blacklisted enough servers, it would be the same as giving up!
    - They could send me a legal nastygram instructing me to disable my 
      tarpit... Since I do not live in the USA, this might not be enforceable.
    
    How it works
    
    The Pit of Confusion is a pure PHP script that can automatically generate a 
    very large number of web sites full of links to MP3s. It reads a settings 
    file containing lists of famous artists' names and random words that can 
    be combined into silly song titles. There is also a download manager 
    component, designed to deliver MP3 files in the most inefficient way possible.
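    The title-generation step can be illustrated like this (the original is 
    PHP; this is an equivalent sketch in Python, and the word lists below are 
    invented stand-ins for whatever the real settings file contains):

```python
import random

# Invented stand-ins for the settings file's lists of famous artists
# and random words; the real script reads these from its PHP config.
ARTISTS = ["Madonna", "Metallica", "Eminem"]
WORDS = ["elephant", "wiggle", "pickle", "moonbeam", "sprocket"]

def fake_track_name(artist, rng=random):
    """Combine two random words with an artist's name into a
    silly-but-plausible MP3 filename."""
    title = "_".join(rng.sample(WORDS, 2))
    return f"{title}-{artist}.mp3"

print(fake_track_name("Madonna"))  # e.g. elephant_wiggle-Madonna.mp3
```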
    
    As with any web site, the action starts with a URL. Normally the first 
    part of the URL just signifies the server on which the site runs; however, 
    I have used a dynamic DNS service to encode the two key site parameters 
    into the hostname (a trick I learnt from this website). The first two 
    parts of the domain name tell the script how to build the page. If you visit:
    
    http://madonna.ricky.music.stodge.org
    
    It will show you `Ricky's' Madonna page. The script does not know anything 
    about Madonna or any of her songs - it just uses information provided at 
    run-time to set up the basic variables. Anything in the form of 
    a.b.music.stodge.org will get handled by the same server.
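    A minimal sketch of that hostname trick (Python rather than the original 
    PHP; the function name is mine):

```python
def parse_site_params(hostname):
    """Pull the two site parameters out of a wildcard hostname of the
    form a.b.music.stodge.org, returning (artist, owner) or None."""
    parts = hostname.lower().split(".")
    if len(parts) == 5 and parts[2:] == ["music", "stodge", "org"]:
        return parts[0], parts[1]  # (artist, owner)
    return None

print(parse_site_params("madonna.ricky.music.stodge.org"))
```

    Every request under the wildcard domain hits the same script; the 
    "different sites" exist only in the hostname.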
    
    Notice how slowly the page loads - that is because a configurable 
    `annoying delay' is built into each transaction. Assuming the spider 
    system has a fixed maximum number of threads, it makes sense to tie them 
    up for as long as possible - but not so long as to deter a person wishing 
    to verify that there are no pirated files on the site.
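    The delay itself needs nothing fancy - sketched here in Python, with an 
    invented default value (the real script makes this configurable):

```python
import time

def serve_page(build_page, delay_seconds=5.0):
    """Sleep before building the page, holding the connection (and,
    with luck, one of the spider's worker threads) open meanwhile.
    The 5-second default is an assumption, not the script's value."""
    time.sleep(delay_seconds)
    return build_page()
```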
    
    Next, it builds a list of randomly named MP3 links that include the 
    chosen artist's name in the title. If you click on one of the links, 
    instead of delivering a pirated file it sends a non-copyrighted music 
    file via a download manager that ensures the download takes a very long 
    time. The idea is to tie up as many threads as possible on whatever 
    system is doing the spidering.
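    A download manager with that behaviour can be sketched as a throttled 
    generator (again a Python illustration, with made-up chunk-size and 
    pause values):

```python
import time

def trickle(data, chunk_size=64, pause=1.0):
    """Yield a harmless file a few bytes at a time, sleeping between
    chunks so each download monopolises a spider thread for ages."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
        time.sleep(pause)
```

    A web server would stream these chunks as the response body; it is the 
    pause between chunks, not limited bandwidth, that ties the client up.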
    
    Finally, it makes links to a selection of other random sites produced by 
    the same system. The idea is to keep the spider in the tarpit for as 
    long as possible.
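    Generating those cross-links is just more of the same random machinery 
    (a Python sketch; the artist and owner lists are placeholders):

```python
import random

def random_site_links(n, artists, owners, rng=random):
    """Invent n links to other auto-generated 'sites' under the same
    wildcard domain, so the spider never runs out of pages to crawl."""
    return [
        f"http://{rng.choice(artists)}.{rng.choice(owners)}.music.stodge.org/"
        for _ in range(n)
    ]
```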
    
    Notes
    
    This is just my first attempt. No doubt more talented scripters can 
    already see weaknesses in my plan - which is why I intend to share the 
    source code of my project with anybody who wants it. If you want to help 
    out, please leave a message on this board and I will get back to ya!
    
    Full discussion: http://www.kuro5hin.org/story/2003/5/16/163447/493
    
    
    
    
    -------------------------------------------------------------------------
    POLITECH -- Declan McCullagh's politics and technology mailing list
    You may redistribute this message freely if you include this notice.
    -------------------------------------------------------------------------
    To subscribe to Politech: http://www.politechbot.com/info/subscribe.html
    This message is archived at http://www.politechbot.com/
    Declan McCullagh's photographs are at http://www.mccullagh.org/
    Like Politech? Make a donation here: http://www.politechbot.com/donate/
    -------------------------------------------------------------------------
    



    This archive was generated by hypermail 2b30 : Sat May 17 2003 - 09:46:31 PDT