Re: CROSS SITE-SCRIPTING Protection with PHP

From: Sverre H. Huseby (shhat_private)
Date: Wed Oct 16 2002 - 13:48:52 PDT


    [b0iler]
    
    |   Also, you are just sending the inputted values of parameters.  What
    |   about the names of the parameters (the $key variables)?  They could
    |   contain potentially dangerous XSS which is often printed to the
    |   client.  Also, user input (GPC) is not the only tainted data in a
    |   script.  Any data that comes from an outside source is potentially
    |   dangerous.  Files, databases, ENV variables, etc. need to be
    |   treated as if they contain the most clever tricks to evade your
    |   filtering and protection schemes.
    
    Correct.  And I've tried to say the same quite a few times on several
    securityfocus lists over the last two years.
    
    We need to shift the focus away from _input_.  Input is never
    troublesome in itself.  It only becomes troublesome when put in a
    context in which it is interpreted in some way, and then only when
    parts of it are interpreted not as plain data, but as something else.
    As b0iler (whoever that is :)  ) correctly states above, data from the
    inside may cause  just as much trouble as data  from the outside.  And
    it may do so deep inside a multi-tier system, far from the web layer.
    
    It's when  data is  passed somewhere for  interpretation that  it gets
    troublesome.  We should thus pay attention to the format of the data
    whenever we _pass_it_along_,  rather than when we receive  it from the
    outside.  Web applications tend to pass data along all the time:
    
      * to database servers, often  by concatenating the data with strings
        containing  SQL constructs,  or  by using  some  kind of  prepared
        statement mechanism (much better).
    
      * to shell command interpreters (yikes!).
    
      * to the OS, by sending file names to file handling functions, host
        names to name resolution libraries and so on.  (There is a large
        amount of "so on" for the OS.)
    
      * to legacy systems written in some obscure language using some equ-
        ally obscure protocol.
    
      * to other web servers (B2B) using XML, URL parameters or whatever.
    
      * to other processes running on the same server, using some
        internally made protocol.
    
      * many, many, many more...
    
      * and last, but not least, to the web browser of the user.  Luckily,
        that is just another sub-system, covered by the same rule as
        the rest.
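    
    As an illustration of the prepared-statement mechanism mentioned in
    the first item of the list above, here is a minimal sketch.  It
    assumes Python's sqlite3 module and a hypothetical `users` table;
    neither comes from the original discussion, and other database
    drivers use different placeholder syntax.

```python
import sqlite3

def find_user(conn, name):
    # The "?" placeholder keeps `name` as plain data: the driver never
    # parses it as SQL, so quotes inside it need no manual escaping.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]

# Hypothetical in-memory database, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Sinead O'Connor",))
conn.commit()
```

    A name containing a quote is stored and found unchanged, and a value
    such as "x' OR '1'='1" is compared as an ordinary string rather than
    being interpreted as part of the query.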
    
    And to repeat: "Data" is not only user input.  It is anything, no mat-
    ter  the source.  Every  system we  pass data  to has  its own  way of
    interpreting  it, and  the  interpretation depends  on context.   Some
    examples:
    
      * when building strings  containing SQL queries, the quote character
        may cause trouble if it  appears prematurely in an SQL string con-
        stant.  _Any_  data passed  as part of  an SQL  statement _that_is
        _to_be_interpreted_as_a_string_constant_ will  need to have quotes
        escaped in some way.  (No,  we can't generally forbid quotes.  How
        would I  be able to write "can't"  a few words back  if you forbid
        the quote?)   And no,  we can't generally  escape quotes  at input
        time  either, because  then they  will look  rather funny  for the
        _other_  sub-systems,  in which  quotes  have  no special  meaning
        (e.g. a text file or the user's browser).
    
        For  more on  this, see  another vuln-dev-mail  of  mine available
        here:
    
          http://shh.thathost.com/text/passing-data-03.txt
    
      * when talking to the OS, null-bytes may create confusion when pass-
        ing strings, as the OS (written in C, normally) treats the '\0' as
        a string terminator.  Most  "modern" languages do not.  We'll gen-
        erally need  to pay attention  to null-bytes when talking  to sub-
        systems written  in C.  The reason  is generally that  our view of
        the string will differ from the view taken by the OS.
    
        But there are  other things as well.  If we  pass a _file_name_ to
        the OS, we may need to  pay attention to slashes (and for some ob-
        scure  OSes, backslashes) and  double-dots as  well, as  they will
        switch context from _file_ to _directory_.
    
        And there are hundreds of other examples of how talking to one
        particular sub-function (e.g. open()) of a sub-system (e.g. the
        OS) needs careful handling of a selected set of characters.
    
      * and then comes the browser again.  The HTML parser in the browser
        gives special meaning to < (tag start), > (tag end) and &
        (character entity).  And inside those < and >, " and ' (both
        attribute value delimiters) may suddenly have a special meaning
        too.  We'll need to escape them somehow, so that they are not
        treated as special characters, but rather as plain characters.
        The correct  way is to  use HTML encoding  (as most of  you know).
        The  wrong way (generally)  is to  replace the  special characters
        with nothing.  Imagine all the complaints you will get if you make
        a discussion forum for mathematicians, and disallow < and > ...
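    
    Two of the examples above can be sketched in a few lines.  This
    assumes Python's standard html module; the function names to_html
    and is_plain_file_name are hypothetical, and the file name check is
    deliberately strict and makes no claim to cover every OS.

```python
import html

def to_html(text):
    # HTML-encode <, >, & and both quote characters just before the
    # text is sent to the browser, instead of removing them.
    return html.escape(text, quote=True)

def is_plain_file_name(name):
    # Reject anything that could end the string early for a C-based OS
    # (null byte) or switch context from file to directory.
    if "\0" in name or "/" in name or "\\" in name:
        return False
    return name not in ("", ".", "..")
```

    With this, the mathematicians' "1 < 2" reaches the browser as
    "1 &lt; 2" and is displayed as written, while a name such as
    "../etc/passwd" is refused before it reaches open().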
    
    It  is generally  _not_possible_ to  fetch data  from the  request and
    start by doing  something to it that will match  all the possible sub-
    systems in one go.  Not  without giving severe restrictions as to what
    the data may contain.  ("Sorry, Sinead,  but your name will have to be
    OConnor for  now").  And  not without introducing  strange appearances
    for some of the sub-systems.  ("Welcome, Sinead O\'Connor").
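    
    The "strange appearance" can be reproduced with a small sketch.  The
    helper escape_for_sql_literal is hypothetical and only mimics one
    backslash-style quoting convention for illustration; it is not a
    recommended way to build SQL.

```python
import html

def escape_for_sql_literal(text):
    # Hypothetical input-time escaping: backslash-escape backslashes
    # and single quotes, as some SQL dialects expect in string constants.
    return text.replace("\\", "\\\\").replace("'", "\\'")

name = "Sinead O'Connor"

# Escaped once at input time, then reused for the browser:
escaped_at_input = escape_for_sql_literal(name)
greeting_wrong = "Welcome, " + html.escape(escaped_at_input)

# Kept raw, and encoded only for the sub-system it is passed to:
greeting_right = "Welcome, " + html.escape(name)
```

    Here greeting_wrong carries the stray backslash into the page, while
    greeting_right does not.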
    
    Input validation has been given _far_ too much focus.  It may be good
    as a first  measure, to be able to give users  nice feedback when data
    don't match the  business rules and other high  level rules ("the file
    name is not supposed to contain directory elements"), but it generally
    won't solve the low level problems.  In systems over toy size, data is
    passed between many different  sub-systems, which often have different
    meta-characters  that may be  abused.  People  who believe  that input
    validation at the web layer  will avoid security problems several lay-
    ers down below (or when data come back to the first layer again), have
    given the issue too little thought, IMNSHO.
    
    Focus on input validation, but focus even more on handling every poss-
    ible meta-character,  meta-byte, meta-word or  whatever before passing
    the data  along to  the next sub-system,  whatever that is.   And that
    rule goes for every layer of the application, not just the web layer.
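    
    As one more sketch of handling meta-characters at the point of
    passing data along, here is the shell case.  It assumes a POSIX
    shell and Python's shlex module; build_grep_command is a
    hypothetical helper, and avoiding the shell altogether is usually
    the safer choice.

```python
import shlex

def build_grep_command(pattern, path):
    # Quote each piece of data for the shell right where the command
    # line is assembled; "--" stops grep from reading the pattern as
    # an option.
    return "grep -- " + shlex.quote(pattern) + " " + shlex.quote(path)
```

    Splitting the resulting command line the way a POSIX shell would
    yields the original pattern and path as single, unmodified words.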
    
    
    Sverre - who feels this discussion  would fit better at webappsec than
             at vuln-dev.
    
    -- 
    shhat_private		Computer Geek?  Try my Nerd Quiz
    http://shh.thathost.com/	http://nerdquiz.thathost.com/
    


