Code-Red: An analytic model of its spread

From: Stuart Staniford (stuartat_private)
Date: Fri Jul 20 2001 - 00:48:30 PDT
Next message: kawaii: "Re: Jetdirect card Attack???--Followup"
Previous message: Ryan Russell: "Re: CodeRed"
Next in thread: Stuart Staniford: "Re: Code-Red: An analytic model of its spread"
Reply: Stuart Staniford: "Re: Code-Red: An analytic model of its spread"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I have worked up a tentative quantitative theory of what happened with the
spread of the Code-Red worm on Thursday July 17th (yesterday, for purposes of
this email).  Although I can't prove it, my theory seems to explain the limited
data I have fairly well, and to support a slightly novel conclusion.  I'd really
appreciate getting more complete data from anyone who has it.  This is not
exactly peer-reviewed science at this point, but rather late night, hasty math
which hopefully won't turn out to be too wrong in the morning.

What I believe happened is that sometime yesterday morning (US time), someone
released a new version of Code-Red.  At a minimum, the new version had a
corrected random number generator and seeding algorithm.  The new version spread
very rapidly until almost all vulnerable IIS servers on the Internet were
compromised.  It stopped trying to spread at midnight UTC (5pm in California
where I am).  I do not know in what other ways the new version might have been
different from the earlier one.

Background.  

Code-Red was first seen in the wild on July 13th, according to Ryan Permeh and
Marc Maiffret of Eeye Digital Security who analyzed it at

http://www.eeye.com/html/Research/Advisories/AL20010717.html and 
http://www.eeye.com/html/advisories/codered.zip

According to Eeye, Code-Red spreads by compromising Microsoft IIS web servers
using the .ida vulnerability discovered also by Eeye and published June 18th:

http://www.eeye.com/html/Research/Advisories/AD20010618.html

Once it has infected a host, Code-Red spreads by launching 99 threads which
generate random IP addresses, and then try to compromise that IP address using
the same vulnerability.  A hundredth thread defaces the web server in some
cases.

However, the worm analyzed by Eeye has what seems like a bug.  The random number
generator is initialized with a fixed seed, so that all copies of the worm in
all threads, on all hosts, will attempt to generate and compromise exactly the
same sequence of IP addresses.

The spread data.

The first thing that caught my attention was that our web server at Silicon
Defense only showed Code-Red attempts against it on July 19th.  There were no
earlier attempts, but there were over 20 attempts on the 19th.  Several other
correspondents reported a very similar pattern.

Next, I found the following data from Ken Eichmann of cas.org 
at http://www.incidents.org/diary/diary.php.
 
		   Hour    # Code Red Worm Scans     # Scanning During the Hour
                    ------  ---------------------   -------------------------
                     00           12699                     2450
                     01           13059                     2577
                     02           13272                     2590
                     03           13056                     2564
                     04           13283                     2632
                     05           13229                     2612
                     06           13554                     2601
                     07           13517                     2608
                     08           13746                     2685
                     09           16819                     3325
                     10           36589                     7838
                     11          116083                    26823
                     12          295348                    68085
                     13          466542                   103522
                     14          520973                   113451
                     15          513513                   115124
                     16          513894                    90931 

These first column is the hour of the day (I believe in Columbus, Ohio).  The
second is the number of port 80 probes, and the third is the number of
independent source IPs giving rise to those probes.  In the Excel spreadsheet
attached to this mail, there is a graph which shows the probe count in dark
blue, and the IP count in pink. 

I will now develop two simple theories of the worm spread.  The first, which
I'll call the Random Spread (or RS) theory assumes that the worm had a proper
random number generator that was seeded with basically random information.  The
second, which I'll call the Constrained Spread (or CS) theory, assumes that the
seeding works as Eeye describes - every copy of the worm starts with the same
fixed seed and the same random number generation algorithm.  The next part of
the discussion assumes a little calculus knowledge, but you can probably skip
over that part with limited loss of comprehension.  I'll subsequently apply this
theory to the data above and argue that the RS theory can explain it, but the CS
theory cannot.

I'll start with the RS theory.  I define the following parameters:

N is the total number of vulnerable IIS servers which can be potentially
compromised from the Internet.  (I assume N is fixed - ignoring both patching of
systems during the worm spread and normal deploying and removing of systems.  I
also ignore any spread of the worm behind firewalls on private Intranets).

K is the initial compromise rate.  That is, the number of vulnerable hosts which
an infected host can find and compromise per hour at the start of the incident
when few other hosts are compromised.  I assume that K is a global constant, and
does not depend on the processor speed, network connection, location, etc of the
infected machine.  (Constant K is an approximation).  I assume that a
compromised machine picks other machines to attack completely at random.  I
assume that once a machine is compromised, it cannot be compromised again, or
that if it is, that does not increase the rate at which it can find and attack
new systems.  I assume that once it is compromised, it stays that way.

T is a time which fixes when the incident happens.

I then have the following variables:

a is the proportion of vulnerable machines which have been compromised.

t is the time (in hours).

Now, I analyze the problem by assuming that at some particular time t, a
proportion of the machines a have been compromised, and then asking how many
more machines N*da, will get compromised in the next amount of time dt.  The
answer is

N * da = a N K dt (1-a)

The reason is that the number of machines compromised in the next increment of
time is proportional to the number of machines already compromised (N*a), times
the rate at which a compromised machine can compromise new machines K (1-a),
times the increment of time dt.  (Note that machines can compromise K others per
unit time to begin with, but only K*(1-a) once a proportion of other machines
are compromised already.

This give us the differential equation

da/dt = K*a*(1-a)

Bringing to bear my extremely rusty mathematical skills, I found that the
solution to that equation is

a = Exp(K * (t-T))/ (1 + Exp(K * (t-T)))

where T is a constant of integration that fixes the time position of the
incident.

This is an interesting equation.  For early t (significantly before T), a grows
exponentially.  For large t (significantly after T), a goes to 1 (all machines
are compromised.  The rate at which this happens depends only on K (the rate at
which one machine can compromise others), and not at all on the number of
machines.  This is interesting because it tells us that a worm like this can
compromise all vulnerable machines on the Internet fairly fast.

In the attached spreadsheet, the yellow curve in the chart is a manually made
fit to the cas.org scan data using the RS model above.  (I assume that the
number of probes is directly proportional to the fraction of machines
compromised).  Inspection of the graph will show that the fit is really quite
nice.  So the RS model can explain the data well.  Basically, I fit it to fall
slightly below the data curve, since it seems there is a fixed background rate
of web probes that was going on before the rapid rise due to the worm spread.

It uses a K of 1.8 - at the outset, a compromised IIS server was able to find
and compromise 1.8 other servers per hour.

You will notice an important implication of using this model to explain the
data.  The probe level due to the theoretical worm was negligible in the early
hours of the morning (see column D of the spreadsheet).  Therefore, if the RS
model is reasonably valid, the worm that caused this rapid upsurge in traffic
*cannot* have been a worm that was running around for days.  It must have been
released in the morning of the 17th.

Ok, now turning to the CS theory (constrained spread, as according to the Eeye
disassembly of the worm).  If indeed it's the case that all copies of the worm
use the same seed and RNG, then all copies of the worm attempt to compromise
exactly the same sequence of IP addresses.  If we assume that all copies work at
the same rate (a simplifying assumption we made in the RS analysis), then no
copy will ever catch up with the first instance of the worm.  In practice, other
copies may get a little ahead of the first one, but basically all copies are
working through the exact same list in parallel so progress will be linear to
begin with.

The rate at which such a worm will be able to compromise systems will be K
(1-a), total.  Since K is likely to be very small indeed compared to N, a will
stay negligible for a very long time, and the rise in the number of systems will
be very slow and linear.  I did not attempt to fit such a model to the data,
because it seems obviously hopeless.  If Eeye has understood the worm they
analyzed correctly, and I have understood their analysis, I don't believe there
is any way that worm can have caused the rapid rise in probes we saw yesterday.

Conclusion:

If I'm rights, this was (and probably still is) a pretty scary worm.  Within a
few hours of launch, it gained control of almost all vulnerable IIS servers
(which was probably a sizeable fraction of IIS servers).  That degree of control
could have been used to do enormous damage in any large number of ways.  It also
illustrates the incredible power that worms have to rapidly gain control of huge
numbers of machines on the Internet.

Thanks:

My thanks to my coworker Roel Jonkman, who sat up late in the night with me and
pored over Eeye's dissassembled code.  Thanks to Laura Tinnel of Teknowledge,
and Russell Fulton of the University of Auckland for helpful reports of their
experience of the worm.

Stuart.

-- 
Stuart Staniford     ---     President     ---     Silicon Defense
         ** Silicon Defense: Technical Support for Snort **
mailto:stuartat_private  http://www.silicondefense.com/
(707) 445-4355 x 16                           (707) 445-4222 (FAX)


----------------------------------------------------------------------------


This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management 
and tracking system please see:

http://aris.securityfocus.com
Next message: kawaii: "Re: Jetdirect card Attack???--Followup"
Previous message: Ryan Russell: "Re: CodeRed"
Next in thread: Stuart Staniford: "Re: Code-Red: An analytic model of its spread"
Reply: Stuart Staniford: "Re: Code-Red: An analytic model of its spread"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
This archive was generated by hypermail 2b30 : Fri Jul 20 2001 - 09:48:17 PDT