How Google indexed a file with no external link

From: Kevin (kevinat_private)
Date: Mon Jul 09 2001 - 18:47:44 PDT

    I'm running a modest Apache 1.3.19 server on Mandrake 7.2, with a 2.4
    kernel.  No CGI or PHP support, though I do have server-info and
    server-status enabled for local reference only.
    I noticed some hits in the Apache access_log for two files, index.old
    and index.older, which were backups of index.html left in my docroot 
    directory. It wasn't hard to figure out that Google was directing 
    people to these files; what I couldn't understand was how Google knew 
    they were there.
    Looking a bit deeper, I saw googlebot (and later, some ordinary visitors)
    requesting my docroot URL with a /?M or /?S query string appended.

    ...and if you try this yourself in Internet Explorer, you'll find that 
    Apache is ignoring my index.html and is giving you a formatted directory 
    of the docroot directory as though there were no index page.
    The differences between the ?M and the ?S versions are not blatantly
    obvious, at least not to me.
    I'm writing to Bugtraq in frustration because I can't find this documented
    ANYWHERE, and it could be a nastier surprise to others than it was to me*.  
    What other little surprises like this exist, and can I do something in my 
    Apache config to take control of them?
    *Before you tell me about robots.txt, htaccess and so forth, let me
    note that I know about those; and before I put this site up I realized 
    that anything I leave in my docroot is fair game.  I'm only puzzled 
    because I can't find ANY information about these /?M or /?S thingamabobs.  
    I can't even RTFM, because I don't know what to call them!
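
    A hedged note on the question above: the /?M and /?S forms match the
    column-sort query arguments that mod_autoindex puts in the heading links
    of a FancyIndexing listing (N, M, S, D for name, last modified, size and
    description, each paired with A or D for ascending or descending).
    Assuming that is what is being requested here, and that the listing is
    available because Indexes is enabled on the docroot, a minimal sketch of
    a tighter configuration might look like the following.  It has not been
    tested against this server.

```apache
# Sketch only: drop Indexes (and MultiViews) for the docroot so that no
# directory listing is generated there.
<Directory "/home/http">
    Options FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

# Where listings must stay enabled, this removes the sortable links
# from the column headings of the generated page.
<IfModule mod_autoindex.c>
    IndexOptions FancyIndexing SuppressColumnSorting
</IfModule>
```

    With Indexes off, a directory that has no index.html should be refused
    rather than listed.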
    P.S.  I have since added .old, .older, .oldest to the list of file types
    to be served as html, and created new versions of all three files that 
    redirect visitors to index.html instead.
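
    An alternative to serving the backups as HTML, sketched here on the
    assumption that the files should simply not be reachable, is to refuse
    requests for those extensions outright.  The pattern is illustrative and
    follows the same form as the existing .ht rule in the config below.

```apache
# Sketch: deny any request for files ending in .old, .older or .oldest
<Files ~ "\.(old|older|oldest)$">
    Order allow,deny
    Deny from all
</Files>
```

    Adding *.old *.older *.oldest to the IndexIgnore line should also keep
    such files out of any generated listing.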
    Sanitized Apache httpd.conf appended at moderator's request -- standard
    Apache comments stripped out to reduce the size.
    8<------ snip here ----------
    ServerType standalone
    ServerRoot "/usr/local/apache"
    PidFile /var/log/
    ScoreBoardFile /var/log/httpd.scoreboard
    Timeout 300
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15
    MinSpareServers 2
    MaxSpareServers 4
    StartServers 3
    MaxClients 50
    MaxRequestsPerChild 0
    ExtendedStatus On
    Port 80
    User webby
    Group webby
    ServerAdmin kevinat_private
    DocumentRoot "/home/http"
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /home/http/bcc/images>
        Order Deny,Allow
        Deny from All
        AllowOverride AuthConfig
    </Directory>
    <Directory "/home/http">
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
    <IfModule mod_userdir.c>
        UserDir public_html
    </IfModule>
    <IfModule mod_dir.c>
        DirectoryIndex index.html
    </IfModule>
    AccessFileName .htaccess
    <Files ~ "^\.ht">
        Order allow,deny
        Deny from all
    </Files>
    UseCanonicalName On
    <IfModule mod_mime.c>
        TypesConfig /usr/local/apache/conf/mime.types
    </IfModule>
    DefaultType text/plain
    <IfModule mod_mime_magic.c>
        MIMEMagicFile /usr/local/apache/conf/magic
    </IfModule>
    HostnameLookups Off
    ErrorLog /var/log/error_log
    LogLevel warn
    LogFormat "%h %l %u %t %v \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" custom
    LogFormat "%{Referer}i -> %U" referer
    LogFormat "%{User-agent}i" agent
    CustomLog /var/log/access_log custom
    ServerSignature Off
    <IfModule mod_alias.c>
        Alias /icons/ "/usr/local/apache/icons/"
        <Directory "/usr/local/apache/icons">
            Options Indexes MultiViews
            AllowOverride None
            Order allow,deny
            Allow from all
        </Directory>
        ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
        <Directory "/usr/local/apache/cgi-bin">
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>
    </IfModule>
    <IfModule mod_autoindex.c>
        IndexOptions FancyIndexing
    # Bunch of defaults provided by Apache - snipped
        ReadmeName README
        HeaderName HEADER
        IndexIgnore .??* *~ *# HEADER* README* RCS CVS *,v *,t
    </IfModule>
    <IfModule mod_mime.c>
        AddEncoding x-compress Z
        AddEncoding x-gzip gz tgz
    # Bunch of defaults provided by Apache - snipped
        <IfModule mod_negotiation.c>
            LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
        </IfModule>
        AddType application/x-tar .tgz
    # Added by me AFTER seeing hits for these extensions:
        AddType text/html .old .older .oldest
    # This was NOT enabled:
        #AddHandler send-as-is asis
    </IfModule>
    <IfModule mod_setenvif.c>
        BrowserMatch "Mozilla/2" nokeepalive
        BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
        BrowserMatch "RealPlayer 4\.0" force-response-1.0
        BrowserMatch "Java/1\.0" force-response-1.0
        BrowserMatch "JDK/1\.0" force-response-1.0
    </IfModule>
    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        Allow from
    </Location>
    <Location /server-info>
        SetHandler server-info
        Order deny,allow
        Deny from all
        Allow from
    </Location>
    	DocumentRoot "/home/http"
    	DocumentRoot "/home/http/bcc/com"
    	DocumentRoot "/home/http/bcc/com"
    	DocumentRoot "/home/http/bcc/images"
    	DocumentRoot "/home/http/bcc/org"
    	DocumentRoot "/home/http/bcc/com"
    	DocumentRoot "/home/http/bcc/com"
    8<------ snip here ----------
