Re: VNC authentication weakness

From: Kragen Sitaker (kragenat_private)
Date: Fri Jul 26 2002 - 02:15:40 PDT

    David Wagner writes:
    > Andreas Beck  wrote:
    > >DONT do this [use /dev/urandom to generate VNC challenges]. This will
    > >deplete the random pool for no good reason.
    > 
    > I believe your comment is incorrect or misleading.  I believe /dev/urandom
    > will not deplete the randomness pool, and I suspect using /dev/urandom
    > is a good idea.  Try it.
    > 
    > Don't confuse /dev/random with /dev/urandom.  /dev/random depletes
    > the randomness pool, but should rarely be used in applications, IMHO.
    > /dev/urandom is typically the right choice, and continues to give output
    > without blocking to wait for more entropy input.
    > 
    > (It is true that if you call /dev/urandom, other users of /dev/random
    > may find that there is no entropy left for them.  But this should only
    > be an issue for users of /dev/random, and the latter should be rare.
    
    So, for what it's worth, on my relatively idle Linux 2.4.13 system,
    /dev/random seems to start with about 2100 bytes of randomness and
    to accumulate new randomness at about 13 bytes per second.
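
    (If you want to repeat the measurement, one easy way is to watch
    /proc/sys/kernel/random/entropy_avail, which on kernels that expose
    it reports the kernel's running estimate of the pool contents in
    bits; a minimal sketch:)

    import time

    def entropy_bits():
        # the kernel's estimate of the entropy pool contents, in bits
        f = open('/proc/sys/kernel/random/entropy_avail')
        try:
            return int(f.read())
        finally:
            f.close()

    # watch the pool refill on an otherwise idle machine
    previous = entropy_bits()
    for ii in range(10):
        time.sleep(1)
        current = entropy_bits()
        print current - previous, 'bits gathered in the last second'
        previous = current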
    
    I know you know the following, but I think your explanation of it
    above is confusing, so I have undertaken to restate it:
    
    /dev/urandom *does* deplete the entropy pool; it's just that depletion
    of the entropy pool doesn't keep /dev/urandom from working, but it
    does keep /dev/random from working.
    
    My understanding is that /dev/random is only more secure than
    /dev/urandom if it turns out that MD5 has undiscovered weaknesses; is
    that correct?
    
    > >A challenge only needs to be _different_ each time.
    > 
    > Not so.  Challenges should be unpredictable.  A counter would not
    > make a very good challenge.  It can't hurt, and is probably a good
    > idea, to simply use crypto-quality random numbers in your challenge.
    
    Well, a challenge that is different each time will effectively foil
    replay attacks --- even a simple counter will do for that.  What kind
    of attack do you have in mind that would be easier if the challenge
    were a simple counter?  (Assuming, of course, that the counter never
    restarted from 0.)
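
    (For concreteness, generating an unpredictable challenge the way
    you suggest is only a couple of lines anyway; this sketch assumes
    /dev/urandom and VNC's 16-byte challenge size:)

    import binascii

    def make_challenge():
        # sixteen unpredictable bytes, the size of VNC's DES challenge
        f = open('/dev/urandom', 'rb')
        try:
            return f.read(16)
        finally:
            f.close()

    print binascii.hexlify(make_challenge())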
    
    
    I was actually working on a Zope project last night in which I wanted
    to avoid the authorized-user-taking-unintended-action problem --- you
    know, the one where the attacker tricks a web site administrator into
    clicking a button on the attacker's web site that makes the
    administrator's browser POST a form, with the administrator's
    credentials, to the administrator's web site?  I embedded a random
    string in the (legitimate) form, and when the form was posted, checked
    to see if the random string in the request was one the server had sent
    out.  Thus, for an attacker to produce a fake form, they must obtain
    the value of such a random string, either by sniffing the wire running
    to the administrator's browser, or by guessing the random string.
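
    (Roughly, the scheme looks like this; the helper names are made up
    for illustration, and the real thing lives inside Zope, but the
    idea is just to hand out a token with each form and refuse POSTs
    that don't echo a known one back:)

    import sha

    outstanding_tokens = {}   # random strings we've put into forms

    def new_form_token():
        # goes into a hidden field of the legitimate form; on Linux,
        # straight from /dev/urandom (elsewhere, random_hex_string()
        # below stands in)
        token = sha.sha(open('/dev/urandom', 'rb').read(20)).hexdigest()
        outstanding_tokens[token] = 1
        return token

    def check_form_token(token):
        # accept the POST only if it echoes a token we actually issued,
        # and only once
        if outstanding_tokens.has_key(token):
            del outstanding_tokens[token]
            return 1
        return 0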
    
    On Linux, I just use /dev/urandom, as you recommend.  On
    /dev/urandom-less systems, I settled on the following.  I don't know
    how safe it is, and I'd be interested in comments.
    
    import sys, os, sha, time
    counter = 0
    def random_hex_string():
        ...
            # try to capture some of the memory allocation state:
            thousandlists = [[] for ii in range(1000)]
            # and other interpreter state:
            counts = [sys.getrefcount(ii) for ii in range(-2, 100)]
            charcounts = [sys.getrefcount(chr(ii)) for ii in range(256)]
            # and something that's guaranteed to be different each time
            global counter
            counter += 1
            
            randomdata = ','.join(map(str,
                                      [os.getpid(),
                                       time.time(),
                                       time.clock(),
                                       map(id, thousandlists),
                                       counts, charcounts, counter,
                                       # some large chunk of application data here
                                       ]))
            return sha.sha(randomdata).hexdigest()
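
    (As a quick sanity check, assuming the elided bits are filled in:
    successive calls should never collide, thanks to the counter if
    nothing else.)

    seen = {}
    for ii in range(1000):
        s = random_hex_string()
        assert not seen.has_key(s), 'random_hex_string repeated itself!'
        seen[s] = 1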
    
    'counter' is used to prevent any repetition, no matter how frequently
    "random" strings are generated.
    
    os.getpid() provides no entropy to someone who can log into the
    machine, since they can find out the PID of the process with little
    difficulty, and someone with intimate knowledge of the inner workings
    of the machine may be able to narrow down the PID possibilities
    considerably.  (For example, if the program starts when the machine
    boots, it may always have the same PID.)  But it will often provide a
    few bits of entropy, at least four or five.
    
    time.time() provides a floating-point version of the current time,
    generally with subsecond resolution, but usually with much worse than
    millisecond resolution.  This probably provides a few bits of entropy
    (probably at least four or so, possibly as many as a dozen or so) for
    someone who can't log into the machine this code is running on,
    because it's relatively hard for them to determine what time the
    machine thinks it is --- its time may not be set very accurately.
    Note that this program takes several seconds to restart, so
    time.time() plus the counter prevents repetition, if nothing else.
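
    (If you're curious what resolution time.time() really gives you on
    a particular system, the smallest nonzero step between consecutive
    calls is a reasonable estimate; a small sketch:)

    import time

    # estimate time.time()'s resolution by its smallest observed step
    smallest = None
    for ii in xrange(100000):
        t1 = time.time()
        t2 = time.time()
        if t2 > t1 and (smallest is None or t2 - t1 < smallest):
            smallest = t2 - t1
    print 'smallest observed time.time() step:', smallest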
    
    time.clock() provides a floating-point number of CPU seconds this
    process has consumed; this is included because it may be harder to
    guess than time.time() --- it depends on how many requests this thread
    has handled and how much work they took --- and because it usually has
    higher resolution than time.time(), although it advances much more
    slowly.  It will provide very little entropy when the attacker knows
    the process has just started, but much more when the process has been
    running for a long time, or may have been running for a long time.
    So, to some extent, this counterbalances os.getpid() --- it will tend
    to provide more entropy when os.getpid() provides little, and
    os.getpid() will tend to provide more entropy when time.clock()
    provides little.  time.clock() can usually also be estimated by an
    attacker who is logged into the same machine.  In my application, I
    guess that time.clock() provides around ten bits of entropy.
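
    (The contrast between the two clocks is easy to see by burning a
    little CPU and printing both; a small sketch:)

    import time

    wall_before, cpu_before = time.time(), time.clock()
    for ii in xrange(1000000):
        pass                        # burn a little CPU
    wall_after, cpu_after = time.time(), time.clock()
    print 'wall-clock seconds:', wall_after - wall_before
    print 'CPU seconds spent: ', cpu_after - cpu_before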
    
    'counts' and 'charcounts' are the number of references to the numbers
    -2 through 100 and to the various single-character strings; these may
    depend on the history of the program.  Python optimizes a little bit
    by keeping only one copy of each number in the range [-1, 99] and of
    each single-character string, but it keeps a count of how many
    references there are to each.  In this program, I
    think this depends sensitively on the exact data stored in the
    program, which changes each day, as well as on the history of requests
    the program has handled.  I think this probably provides at least ten
    bits of entropy, possibly much more.
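
    (To make that concrete: sys.getrefcount reports how many references
    the interpreter currently holds to the shared object, and the
    number moves as the program's data changes.  A tiny example:)

    import sys

    print sys.getrefcount(1)        # references to the shared integer 1
    print sys.getrefcount('a')      # references to the shared string 'a'
    ones = [1] * 1000               # hold a thousand more references to 1
    print sys.getrefcount(1)        # noticeably larger now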
    
    'thousandlists' is a list of a thousand newly-created empty lists.  We
    take their memory addresses.  This depends on the state of the Python
    memory allocator.  In this program, I suspect that this changes
    considerably after every request, though I don't really have a good
    way of measuring that; I am guessing that at least the first couple of
    hundred of these lists are allocated at somewhat unpredictable places
    and therefore provide a couple of bits of entropy each.  But I'm not
    very sure of this, and obviously it depends on the program; that's why
    I included all the other stuff above.
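
    (You can watch the addresses move around as the allocator's state
    changes; whether a second batch of lists lands in the same places
    depends on what was allocated and freed in between.  A small
    illustration:)

    first = [[] for ii in range(5)]
    print map(id, first)            # addresses of five fresh lists
    del first                       # free them again
    junk = range(1000)              # perturb the allocator a little
    second = [[] for ii in range(5)]
    print map(id, second)           # may or may not match the first batch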
    
    Then we take all of these things, plus several megabytes of actual
    application data, join their string representations with commas, and
    hash them with SHA-1.
    
    Does anyone have a better solution that doesn't involve calling
    entropy-gathering routines from all over the program or running a
    continuous entropy-gathering thread?  Are there any big problems in
    this solution, other than that it only has (by my pessimistic
    estimates) about 28 bits of entropy if my "thousandlists" trick isn't
    really very effective?  28 bits is probably sufficient for my
    purposes.  Is there some much simpler solution I could have more
    confidence in?
    
    -- 
    <kragenat_private>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
    What we *need* is for some advanced off-world sentience to carpet nuke planet
    Earth from high orbit.  Call it Equal Opportunity Ethnic Cleansing.  I mean,
    racism is so petty.  Why play favorites?  -- RageBoy
    


