I have worked up a tentative quantitative theory of what happened with the spread of the Code-Red worm on Thursday July 17th (yesterday, for purposes of this email). Although I can't prove it, my theory seems to explain the limited data I have fairly well, and to support a slightly novel conclusion. I'd really appreciate getting more complete data from anyone who has it. This is not exactly peer-reviewed science at this point, but rather late night, hasty math which hopefully won't turn out to be too wrong in the morning. What I believe happened is that sometime yesterday morning (US time), someone released a new version of Code-Red. At a minimum, the new version had a corrected random number generator and seeding algorithm. The new version spread very rapidly until almost all vulnerable IIS servers on the Internet were compromised. It stopped trying to spread at midnight UTC (5pm in California where I am). I do not know in what other ways the new version might have been different from the earlier one. Background. Code-Red was first seen in the wild on July 13th, according to Ryan Permeh and Marc Maiffret of Eeye Digital Security who analyzed it at http://www.eeye.com/html/Research/Advisories/AL20010717.html and http://www.eeye.com/html/advisories/codered.zip According to Eeye, Code-Red spreads by compromising Microsoft IIS web servers using the .ida vulnerability discovered also by Eeye and published June 18th: http://www.eeye.com/html/Research/Advisories/AD20010618.html Once it has infected a host, Code-Red spreads by launching 99 threads which generate random IP addresses, and then try to compromise that IP address using the same vulnerability. A hundredth thread defaces the web server in some cases. However, the worm analyzed by Eeye has what seems like a bug. The random number generator is initialized with a fixed seed, so that all copies of the worm in all threads, on all hosts, will attempt to generate and compromise exactly the same sequence of IP addresses. The spread data. The first thing that caught my attention was that our web server at Silicon Defense only showed Code-Red attempts against it on July 19th. There were no earlier attempts, but there were over 20 attempts on the 19th. Several other correspondents reported a very similar pattern. Next, I found the following data from Ken Eichmann of cas.org at http://www.incidents.org/diary/diary.php. Hour # Code Red Worm Scans # Scanning During the Hour ------ --------------------- ------------------------- 00 12699 2450 01 13059 2577 02 13272 2590 03 13056 2564 04 13283 2632 05 13229 2612 06 13554 2601 07 13517 2608 08 13746 2685 09 16819 3325 10 36589 7838 11 116083 26823 12 295348 68085 13 466542 103522 14 520973 113451 15 513513 115124 16 513894 90931 These first column is the hour of the day (I believe in Columbus, Ohio). The second is the number of port 80 probes, and the third is the number of independent source IPs giving rise to those probes. In the Excel spreadsheet attached to this mail, there is a graph which shows the probe count in dark blue, and the IP count in pink. I will now develop two simple theories of the worm spread. The first, which I'll call the Random Spread (or RS) theory assumes that the worm had a proper random number generator that was seeded with basically random information. The second, which I'll call the Constrained Spread (or CS) theory, assumes that the seeding works as Eeye describes - every copy of the worm starts with the same fixed seed and the same random number generation algorithm. The next part of the discussion assumes a little calculus knowledge, but you can probably skip over that part with limited loss of comprehension. I'll subsequently apply this theory to the data above and argue that the RS theory can explain it, but the CS theory cannot. I'll start with the RS theory. I define the following parameters: N is the total number of vulnerable IIS servers which can be potentially compromised from the Internet. (I assume N is fixed - ignoring both patching of systems during the worm spread and normal deploying and removing of systems. I also ignore any spread of the worm behind firewalls on private Intranets). K is the initial compromise rate. That is, the number of vulnerable hosts which an infected host can find and compromise per hour at the start of the incident when few other hosts are compromised. I assume that K is a global constant, and does not depend on the processor speed, network connection, location, etc of the infected machine. (Constant K is an approximation). I assume that a compromised machine picks other machines to attack completely at random. I assume that once a machine is compromised, it cannot be compromised again, or that if it is, that does not increase the rate at which it can find and attack new systems. I assume that once it is compromised, it stays that way. T is a time which fixes when the incident happens. I then have the following variables: a is the proportion of vulnerable machines which have been compromised. t is the time (in hours). Now, I analyze the problem by assuming that at some particular time t, a proportion of the machines a have been compromised, and then asking how many more machines N*da, will get compromised in the next amount of time dt. The answer is N * da = a N K dt (1-a) The reason is that the number of machines compromised in the next increment of time is proportional to the number of machines already compromised (N*a), times the rate at which a compromised machine can compromise new machines K (1-a), times the increment of time dt. (Note that machines can compromise K others per unit time to begin with, but only K*(1-a) once a proportion of other machines are compromised already. This give us the differential equation da/dt = K*a*(1-a) Bringing to bear my extremely rusty mathematical skills, I found that the solution to that equation is a = Exp(K * (t-T))/ (1 + Exp(K * (t-T))) where T is a constant of integration that fixes the time position of the incident. This is an interesting equation. For early t (significantly before T), a grows exponentially. For large t (significantly after T), a goes to 1 (all machines are compromised. The rate at which this happens depends only on K (the rate at which one machine can compromise others), and not at all on the number of machines. This is interesting because it tells us that a worm like this can compromise all vulnerable machines on the Internet fairly fast. In the attached spreadsheet, the yellow curve in the chart is a manually made fit to the cas.org scan data using the RS model above. (I assume that the number of probes is directly proportional to the fraction of machines compromised). Inspection of the graph will show that the fit is really quite nice. So the RS model can explain the data well. Basically, I fit it to fall slightly below the data curve, since it seems there is a fixed background rate of web probes that was going on before the rapid rise due to the worm spread. It uses a K of 1.8 - at the outset, a compromised IIS server was able to find and compromise 1.8 other servers per hour. You will notice an important implication of using this model to explain the data. The probe level due to the theoretical worm was negligible in the early hours of the morning (see column D of the spreadsheet). Therefore, if the RS model is reasonably valid, the worm that caused this rapid upsurge in traffic *cannot* have been a worm that was running around for days. It must have been released in the morning of the 17th. Ok, now turning to the CS theory (constrained spread, as according to the Eeye disassembly of the worm). If indeed it's the case that all copies of the worm use the same seed and RNG, then all copies of the worm attempt to compromise exactly the same sequence of IP addresses. If we assume that all copies work at the same rate (a simplifying assumption we made in the RS analysis), then no copy will ever catch up with the first instance of the worm. In practice, other copies may get a little ahead of the first one, but basically all copies are working through the exact same list in parallel so progress will be linear to begin with. The rate at which such a worm will be able to compromise systems will be K (1-a), total. Since K is likely to be very small indeed compared to N, a will stay negligible for a very long time, and the rise in the number of systems will be very slow and linear. I did not attempt to fit such a model to the data, because it seems obviously hopeless. If Eeye has understood the worm they analyzed correctly, and I have understood their analysis, I don't believe there is any way that worm can have caused the rapid rise in probes we saw yesterday. Conclusion: If I'm rights, this was (and probably still is) a pretty scary worm. Within a few hours of launch, it gained control of almost all vulnerable IIS servers (which was probably a sizeable fraction of IIS servers). That degree of control could have been used to do enormous damage in any large number of ways. It also illustrates the incredible power that worms have to rapidly gain control of huge numbers of machines on the Internet. Thanks: My thanks to my coworker Roel Jonkman, who sat up late in the night with me and pored over Eeye's dissassembled code. Thanks to Laura Tinnel of Teknowledge, and Russell Fulton of the University of Auckland for helpful reports of their experience of the worm. Stuart. -- Stuart Staniford --- President --- Silicon Defense ** Silicon Defense: Technical Support for Snort ** mailto:stuartat_private http://www.silicondefense.com/ (707) 445-4355 x 16 (707) 445-4222 (FAX) ---------------------------------------------------------------------------- This list is provided by the SecurityFocus ARIS analyzer service. For more information on this free incident handling, management and tracking system please see: http://aris.securityfocus.com
This archive was generated by hypermail 2b30 : Fri Jul 20 2001 - 09:48:17 PDT