Description of the Problem
--------------------------

CGI.pm contains a method self_url which returns the URL with which the
script was called, including all of the data fields submitted --- except
for the .submit= field added by CGI.pm.

Normally, this is used something like this:

    my $self = self_url;
    print qq(<a href="$self#Section2">Section 2</a>\n);

If CGI.pm is running on Apache 1.3.6, probably other versions of Apache,
and possibly other Web servers, it is possible for a client to cause
self_url to include arbitrary sequences of characters at its beginning,
such as

    "><script language="JavaScript">evil_code()</script><a href="

which, if used in the manner described above, leads to the problem
described in CERT Advisory CA-2000-02, "Malicious HTML Tags Embedded in
Client Web Requests".

Apparently, anything following an unencoded space in the URL used to
invoke the script ends up being inserted, unencoded but converted to
lower case, at the beginning of self_url's return value.

Unencoded spaces are, of course, illegal in URLs. Most web browsers
accept them anyway in HREF attributes, and don't bother to %-encode them
when they send them in a GET request. Netscape 4.6, MSIE 3.0, Mozilla
M12, and Lynx 2.8.1rel.2 at least, allow HREF attribute values to be
delimited by ' single-quotes instead of " double-quotes, which allows
insertion of unencoded " double-quotes into the URL --- which is crucial
to exploiting this problem. Lynx 2.8.1rel.2, however, strips the spaces
from the URL found in HTML, preventing it from being exploited via
<A HREF=''>.

Diagnosis
---------

It appears that this happens because the unencoded space is interpreted
by the HTTP server (Apache 1.3.6 in my tests) as separating the URL from
the protocol name. So the environment variable SERVER_PROTOCOL gets set
to everything following the space, followed by a space and the actual
protocol, such as "HTTP/1.0".
Three of the four tested browsers (Netscape 4.6, MSIE 3.0, and Mozilla
M12) send the unencoded space in the request URL, which generates an
illegal HTTP Request-Line.

CGI.pm simply takes that environment variable, chops off everything from
the slash onwards, lowercases it, and returns the result as the URL
scheme.

Suggested fixes
---------------

RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".", and "-" are
allowed in scheme names. Accordingly, I suggest the following change to
CGI.pm:

*** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
--- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm Mon Feb 14 12:07:37 2000
***************
*** 2594,2600 ****
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     return "\L$protocol\E";
  }
  END_OF_FUNC
--- 2594,2602 ----
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     $protocol = lc $protocol;
!     $protocol =~ tr/-+.a-z0-9//cd;
!     return $protocol;
  }
  END_OF_FUNC

(Sorry --- I'm using Solaris diff, which doesn't have unified diff
capability.)

This prevents the exploit, but of course the resulting URL is incorrect.
It won't affect responses to well-formed HTTP requests, which should
never have anything other than HTTP for the $protocol to begin with.

It might be smarter to always return 'http' when not returning 'https';
I'm not presently aware of any protocols other than HTTP and SSL HTTP
used with CGI. The current draft CGI spec says:

    Note that the scheme and the protocol are not identical; for
    instance, a resource accessed via an SSL mechanism may have a
    Client-URI with a scheme of "https" rather than "http". CGI/1.1
    provides no means for the script to reconstruct this, and therefore
    the Script-URI includes the base protocol used.

. . . in other words, implementing self_url in a way that is guaranteed
to be correct for future non-HTTP CGI implementations is not possible.
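
The effect of the patch above can be tried in isolation with a small
standalone script. This is only a sketch, not CGI.pm's own code: the sub
names scheme_unpatched, scheme_patched, and scheme_fixed are hypothetical,
the bogus SERVER_PROTOCOL value is an invented sample, and scheme_fixed
merely illustrates the "always return 'http'" alternative mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pre-patch logic: keep whatever precedes the first slash, lowercased.
sub scheme_unpatched {
    my ($server_protocol) = @_;
    my ($protocol) = split('/', $server_protocol);
    $protocol = '' unless defined $protocol;   # empty input yields no fields
    return lc $protocol;
}

# Patched logic: additionally delete every character that is not
# allowed in a scheme name (a-z, 0-9, "+", ".", "-").
sub scheme_patched {
    my ($server_protocol) = @_;
    my ($protocol) = split('/', $server_protocol);
    $protocol = '' unless defined $protocol;
    $protocol = lc $protocol;
    $protocol =~ tr/-+.a-z0-9//cd;
    return $protocol;
}

# Alternative (assumption: only HTTP and SSL HTTP are in use):
# ignore SERVER_PROTOCOL entirely and decide from the port.
sub scheme_fixed {
    my ($server_port) = @_;
    return $server_port == 443 ? 'https' : 'http';
}

# A well-formed value and an invented malicious one.
my @samples = (
    'HTTP/1.0',
    '"><img src=x onerror=evil_code()> HTTP/1.0',
);

for my $prot (@samples) {
    print "SERVER_PROTOCOL: $prot\n";
    print "  unpatched: ", scheme_unpatched($prot), "\n";
    print "  patched:   ", scheme_patched($prot), "\n";
}
print "fixed (port 80): ", scheme_fixed(80), "\n";
```

For the well-formed value both subs return "http". For the malicious
sample the unpatched sub hands the markup back unchanged apart from
lowercasing, while the patched sub returns a harmless but meaningless
run of letters, which is the "incorrect URL" trade-off noted above.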
The successful exploit requires a remarkable chain of extreme
forgiveness:

1- The web browser must accept an illegal URL from (possibly valid,
   although very unusual) HTML.
2- The web browser must send an illegal HTTP request with the illegal
   URL, without %-encoding the URL to make it legal.
3- The HTTP server must accept the illegal HTTP request.
4- The HTTP server must invoke the CGI script with a nonsensical
   SERVER_PROTOCOL.
5- The CGI script must accept the nonsensical SERVER_PROTOCOL and use it
   to produce an illegal URL, which it must then embed in HTML it
   outputs.
6- The web browser must then trust the output of the CGI script in some
   fashion inappropriate to the supplier of the original URL.

Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, I would guess, most Web
browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and, I would
guess, most Web servers) will happily perform steps 3 and 4; any program
using CGI.pm and embedding self_url's return value in its output will
perform step 5; and as CERT advisory CA-2000-02 documents, there are a
wide variety of situations that can cause step 6 to happen.

My patch above breaks the chain at step 5. It would be nice to break it
at other steps as well.

The HTTP requests used in this exploit are broken --- i.e. by having a
Request-Line that has a protocol name that not only fails to be "HTTP",
but actually fails to be a valid protocol name at all. Perhaps Apache
and other web servers should respond to such egregious protocol
violations with error messages, rather than passing the bogus data on to
CGI scripts.

I have not sent copies of this mail to other web-server teams, because I
do not have the facilities or inclination to properly verify that they
are equally lenient. Preliminary testing suggests that they are not:

- IIS 5.0 responds, "The parameter is incorrect".
- Netscape-Enterprise/3.6 responds, "Your browser sent a message this
  server could not understand."
- Zeus 3.3 responds with a 400 Bad Request error.
- thttpd 2.15 responds with a 400 Bad Request error.

I also believe that Web browsers should take some steps to avoid sending
illegal HTTP requests; since the problem here happens only when both the
server and browser are trusted --- perhaps due to some earlier
authentication exchange between them --- while the URL is untrusted, the
browser should validate the URL, at least to the point of not sending
illegal requests to the server.

References
----------

http://www.w3.org/CGI/ --- information about CGI

http://Web.Golux.Com/coar/cgi/draft-coar-cgi-v11-03-clean.html ---
current draft specification for CGI

http://www.cert.org/advisories/CA-2000-02.html --- CERT advisory
CA-2000-02, "Malicious HTML Tags Embedded in Client Web Requests"

RFC 1738, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1738.txt
--- "Uniform Resource Locators (URL)" --- in particular, section 2.1,
which defines the syntax of scheme names

RFC 2068, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt
--- "Hypertext Transfer Protocol -- HTTP/1.1" --- in particular, section
3.2.1, which defines the syntax of URI scheme names identically to RFC
1738, but including uppercase US-ASCII letters --- and section 5.1,
which defines the syntax of HTTP Request-Lines, indicating (together
with the sections defining URI syntax and section 3.1, defining
HTTP-Version syntax) that they must contain exactly two spaces.

http://stein.cshl.org/WWW/CGI/ --- documentation for CGI.pm

http://www.apache.org/info/css-security/apache_specific.html --- changes
made to Apache in response to CA-2000-02

http://www.netcraft.co.uk/survey/ --- Netcraft Web Server Survey, which
lists the most popular web server software

-- 
<kragenat_private> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either. :)