CGI.pm and the untrusted-URL problem

From: Kragen Sitaker (kragenat_private)
Date: Mon Feb 14 2000 - 11:01:48 PST

    Description of the Problem
    --------------------------
    
    CGI.pm contains a method self_url which returns the URL with which the
    script was called, including all of the data fields submitted ---
    except for the .submit= field added by CGI.pm.
    
    Normally, this is used something like this:
    
    	my $self = self_url;
    	print qq(<a href="$self#Section2">Section 2</a>\n);
    
    If CGI.pm is running under Apache 1.3.6 (and probably other versions
    of Apache, and possibly other Web servers), a client can cause
    self_url to include an arbitrary sequence of characters at its
    beginning, such as
    
    	"><script language="JavaScript">evil_code()</script><a href="
    
    which, if used in the manner described above, leads to the problem
    described in CERT Advisory CA-2000-02, "Malicious HTML Tags Embedded in
    Client Web Requests".
    
    Apparently, anything following an unencoded space in the URL used to
    invoke the script ends up being inserted, unencoded but converted to
    lower case, at the beginning of self_url's return value.
    
    Unencoded spaces are, of course, illegal in URLs.  Most web browsers
    accept them anyway in HREF attributes, and don't bother to %-encode
    them when they send them in a GET request.
    
    At least Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2
    allow HREF attribute values to be delimited by ' single-quotes instead
    of " double-quotes, which allows insertion of unencoded " double-quotes
    into the URL --- which is crucial to exploiting this problem.  Lynx
    2.8.1rel.2, however, strips the spaces from the URL found in HTML,
    preventing it from being exploited via <A HREF=''>.
    
    Diagnosis
    ---------
    
    It appears that this happens because the unencoded space is interpreted
    by the HTTP server (Apache 1.3.6 in my tests) as separating the URL
    from the protocol name.  So the environment variable SERVER_PROTOCOL
    gets set to everything following the space, followed by a space and the
    actual protocol, such as "HTTP/1.0".
    
    Three of the four tested browsers (Netscape 4.6, MSIE 3.0, and Mozilla
    M12) send the unencoded space in the request URL, which generates an
    illegal HTTP Request-Line.
    
    CGI.pm simply takes that environment variable, chops off everything
    from the first slash onwards, lowercases what remains, and returns the
    result as the URL scheme.
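
    The path from request to broken HTML can be traced with a minimal
    sketch.  It is written in Python purely for illustration (CGI.pm
    itself is Perl) and is not the code of Apache or CGI.pm.  The names
    lenient_parse and unpatched_scheme, the host www.example.com, the
    script path, and the way self_url is reassembled are all
    hypothetical; the parsing rule is the one inferred above from the
    Apache 1.3.6 tests.

```python
def lenient_parse(request_line):
    """Split a Request-Line leniently: the method is the first word, the
    URL is the second, and everything left over is taken as the protocol
    (the value that ends up in SERVER_PROTOCOL)."""
    method, _, rest = request_line.partition(" ")
    url, _, server_protocol = rest.partition(" ")
    return method, url, server_protocol


def unpatched_scheme(server_protocol):
    """Mimic the unpatched logic: keep what precedes the first slash
    and lowercase it."""
    return server_protocol.split("/")[0].lower()


# Hypothetical request whose URL has an unencoded space followed by markup.
line = 'GET /cgi-bin/demo.cgi?x=1 "><b>Injected HTTP/1.0'
method, url, server_protocol = lenient_parse(line)
scheme = unpatched_scheme(server_protocol)

# Hypothetical reconstruction of self_url: scheme, host, then the URL.
self_url = scheme + "://www.example.com" + url
html = '<a href="%s#Section2">Section 2</a>' % self_url

print(server_protocol)  # "><b>Injected HTTP/1.0
print(scheme)           # "><b>injected http
print(html)
```

    The last line printed starts with <a href=""><b>injected http://,
    so the injected text has closed the attribute and opened markup of
    its own.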
    
    Suggested fixes
    ---------------
    
    RFC 1738 and RFC 2068 say that only letters, digits, "+", ".", and "-"
    are allowed in scheme names (RFC 1738 lists only lower-case letters;
    RFC 2068 permits upper case as well).  Accordingly, I suggest the
    following change to CGI.pm:
    
    *** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
    --- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm      Mon Feb 14 12:07:37 2000
    ***************
    *** 2594,2600 ****
          return 'https' if $self->server_port == 443;
          my $prot = $self->server_protocol;
          my($protocol,$version) = split('/',$prot);
    !     return "\L$protocol\E";
      }
      END_OF_FUNC
    
    --- 2594,2602 ----
          return 'https' if $self->server_port == 443;
          my $prot = $self->server_protocol;
          my($protocol,$version) = split('/',$prot);
    !     $protocol = lc $protocol;
    !     $protocol =~ tr/-+.a-z0-9//cd;
    !     return $protocol;
      }
      END_OF_FUNC
    
    (Sorry --- I'm using Solaris diff, which doesn't have unified diff
    capability.)
    
    This prevents the exploit, but of course the resulting URL is
    incorrect.  It won't affect responses to well-formed HTTP requests,
    which should never have anything other than HTTP for the $protocol to
    begin with.
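
    The effect of the patched lines can be checked with a small Python
    sketch that mirrors them: lowercase the part before the first
    slash, then delete every character outside a-z, 0-9, "+", ".", and
    "-".  The function name patched_scheme and the sample input are
    hypothetical; the Perl in the diff above remains the actual
    proposal.

```python
import re


def patched_scheme(server_protocol):
    """Mirror the patched logic: take what precedes the first slash,
    lowercase it, and strip characters not allowed in a scheme name."""
    protocol = server_protocol.split("/")[0].lower()
    return re.sub(r"[^a-z0-9+.\-]", "", protocol)


print(patched_scheme("HTTP/1.0"))                # http
print(patched_scheme('"><b>Injected HTTP/1.0'))  # binjectedhttp
```

    The second result is harmless but, as noted, not a correct scheme.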
    
    It might be smarter to always return 'http' when not returning 'https';
    I'm not presently aware of any protocols other than HTTP and HTTP over
    SSL being used with CGI.  The current draft CGI spec says:
    
    	Note that the scheme and the protocol are not identical; for
    	instance, a resource accessed via an SSL mechanism may have a
    	Client-URI with a scheme of "https" rather than "http".
    	CGI/1.1 provides no means for the script to reconstruct this,
    	and therefore the Script-URI includes the base protocol used.
    
    . . . in other words, implementing self_url in a way that is guaranteed
    to be correct for future non-HTTP CGI implementations is not possible.
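
    The simpler alternative, which ignores SERVER_PROTOCOL altogether,
    would look roughly like this in Python.  The function name
    fixed_scheme is hypothetical, and the port test is the same
    heuristic the existing code already uses for 'https'.

```python
def fixed_scheme(server_port):
    """Choose the scheme from the port alone, so nothing supplied by the
    client in the Request-Line can reach the generated URL."""
    return "https" if server_port == 443 else "http"


print(fixed_scheme(443))  # https
print(fixed_scheme(80))   # http
```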
    
    The successful exploit requires a remarkable chain of extreme forgiveness:
    1- The web browser must accept an illegal URL from (possibly valid,
       although very unusual) HTML.
    2- The web browser must send an illegal HTTP request with the illegal
       URL, without %-encoding the URL to make it legal.
    3- The HTTP server must accept the illegal HTTP request.
    4- The HTTP server must invoke the CGI script with a nonsensical
       SERVER_PROTOCOL.
    5- The CGI script must accept the nonsensical SERVER_PROTOCOL and use it to
       produce an illegal URL, which it must then embed in HTML it outputs.
    6- The web browser must then trust the output of the CGI script in some
       fashion inappropriate to the supplier of the original URL.
    
    Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, I would guess, most Web
    browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and, I
    would guess, most Web servers) will happily perform steps 3 and 4; any
    program using CGI.pm and embedding self_url's return value in its
    output will perform step 5; and as CERT advisory CA-2000-02 documents,
    there are a wide variety of situations that can cause step 6 to
    happen.
    
    My patch above breaks the chain at step 5.  It would be nice to break
    it at other steps as well.
    
    The HTTP requests used in this exploit are broken: their Request-Line
    carries a protocol name that not only fails to be "HTTP", but fails to
    be a valid protocol name at all.  Perhaps Apache and other web servers
    should respond to such egregious protocol violations with error
    messages, rather than passing the bogus data on to CGI scripts.
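
    A strict check of this kind could be sketched in Python as below.
    The function name check_request_line and the pattern are
    hypothetical and deliberately simplified (any run of non-space
    characters is accepted as the method and as the URL); the only
    rule taken from RFC 2068 is that a Request-Line is method, space,
    URL, space, HTTP-Version.

```python
import re

# Exactly two single spaces, and an HTTP-Version of the form HTTP/<n>.<n>.
REQUEST_LINE = re.compile(r"(\S+) (\S+) (HTTP/[0-9]+\.[0-9]+)")


def check_request_line(line):
    """Return (method, url, version) for a well-formed Request-Line, or
    None when the server should answer 400 Bad Request instead."""
    match = REQUEST_LINE.fullmatch(line)
    return match.groups() if match else None


print(check_request_line("GET /cgi-bin/demo.cgi?x=1 HTTP/1.0"))
print(check_request_line('GET /cgi-bin/demo.cgi?x=1 "><b>Injected HTTP/1.0'))
```

    The first call returns the three fields; the second returns None,
    because the extra space leaves text that is not an HTTP-Version.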
    
    I have not sent copies of this mail to other web-server teams, because
    I do not have the facilities or inclination to properly verify that
    they are equally lenient.  Preliminary testing suggests that they are
    not:
    
    - IIS 5.0 responds, "The parameter is incorrect".
    - Netscape-Enterprise/3.6 responds, "Your browser sent a
      message this server could not understand."
    - Zeus 3.3 responds with a 400 Bad Request error.
    - thttpd 2.15 responds with a 400 Bad Request error.
    
    I also believe that Web browsers should take some steps to avoid
    sending illegal HTTP requests.  The problem here arises only when
    both the server and browser are trusted --- perhaps due to some earlier
    authentication exchange between them --- while the URL is untrusted,
    so the browser should validate the URL, at least to the point of not
    sending illegal requests to the server.
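
    On the client side, one way to avoid sending such a request is to
    %-encode illegal characters before the URL goes on the wire.  The
    Python sketch below uses the standard library's
    urllib.parse.quote.  The function name encode_for_request, the
    choice of characters to leave untouched, and the sample URL are
    assumptions made for illustration; this is not how any of the
    browsers named above behaves.

```python
from urllib.parse import quote

# Left untouched: URL delimiters, plus "%" so that existing escapes are not
# encoded twice.  (A stray "%" not followed by two hex digits is not repaired.)
SAFE = ":/?#[]@!$&'()*+,;=%"


def encode_for_request(url):
    """Percent-encode spaces, double quotes, angle brackets, and other
    characters that are not legal in a URL."""
    return quote(url, safe=SAFE)


print(encode_for_request('http://www.example.com/cgi-bin/demo.cgi?x=1 "><b>'))
# http://www.example.com/cgi-bin/demo.cgi?x=1%20%22%3E%3Cb%3E
```

    With the space encoded as %20, the Request-Line keeps its two
    spaces and the injected text stays inside the URL.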
    
    References
    ----------
    
    http://www.w3.org/CGI/ --- information about CGI
    http://Web.Golux.Com/coar/cgi/draft-coar-cgi-v11-03-clean.html --- current
    	draft specification for CGI
    http://www.cert.org/advisories/CA-2000-02.html --- CERT advisory CA-2000-02,
    	"Malicious HTML Tags Embedded in Client Web Requests"
    RFC 1738, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1738.txt ---
    	"Uniform Resource Locators (URL)" --- in particular, section 2.1,
    	which defines the syntax of scheme names
    RFC 2068, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt ---
    	"Hypertext Transfer Protocol -- HTTP/1.1"
    	--- in particular, section 3.2.1, which defines the syntax of
    	URI scheme names identically to RFC 1738, but including
    	uppercase US-ASCII letters.
    	--- and section 5.1, which defines the syntax of HTTP Request-Lines,
    	indicating (together with the sections defining URI syntax and
    	section 3.1, defining HTTP-Version syntax) that they must
    	contain exactly two spaces.
    http://stein.cshl.org/WWW/CGI/ --- documentation for CGI.pm
    http://www.apache.org/info/css-security/apache_specific.html --- changes made
    	to Apache in response to CA-2000-02
    http://www.netcraft.co.uk/survey/ --- Netcraft Web Server Survey,
    	which lists the most popular web server software
    
    --
    <kragenat_private>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
    The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
    <URL:http://www.pobox.com/~kragen/bubble.html>
    The power didn't go out on 2000-01-01 either.  :)
    


