Hello list[s],

My small, home-brew honeypot was hit by something pretty interesting today - an automated, unpublished, not widely used web reconnaissance tool. I do not have a better name for it. It appears to gather information about the structure of your web server by recursively downloading its content, querying external web crawlers (probably google.com) to include data not directly referenced on your web pages, and later trying to brute-force certain locations on the server (such as administration scripts, logs, misc files, pr0n). It is safe to assume that further client-side processing is done to classify the contents, aggregate them, and extract possibly sensitive information from the noise.

Its "behavioral patterns" are distinctive and pretty uncommon - this is not yet another "common cgi scripts" scanner. It seems to be designed to perform targeted attacks. I couldn't find any references to this tool, or any logs showing this kind of activity in the past. I guess many readers may find it interesting to examine their logs or analyze it further.

Such tools are relatively difficult to write (and this one is far from perfect, as you will see later), but are also very valuable for potential attackers or pen-testers. As far as I know, there are no comprehensive tools of this kind available publicly. I know that many people (including myself) have private code of this kind.

This is also very good proof that sufficiently challenging, customized honeypots can be used to capture targeted, smart attacks. I never thought that this really trivial installation would provide such results.

The tool is apparently launched by hand against a specific host. This can be inferred from the initial behavior - the attacker first made a few regular, slow connections to the server, one of them with a typo. Two minutes later, he/she launched the tool, which kept firing roughly 5-10 HEAD and GET requests per second, approximately 1000 requests in total.

The attack was apparently triggered by curiosity - the server I am referring to is running a minimal HTTP honeypot, providing bogus "secret" data to visitors. The "secret" URL was "leaked" to certain communities (e.g. IRC channels). This is the initial activity (to protect my honeypot, I've changed the "secret" URL slightly):

node-d-2425.a2000.nl - - [12/Mar/2002:14:15:59 +0100] "GET /privare HTTP/1.1" 404 788
node-d-2425.a2000.nl - - [12/Mar/2002:14:16:13 +0100] "GET /private%20stuff/ HTTP/1.1" 200 183
node-d-2425.a2000.nl - - [12/Mar/2002:14:16:44 +0100] "GET /privare HTTP/1.1" 404 788
node-d-2425.a2000.nl - - [12/Mar/2002:14:16:48 +0100] "GET /privare%20stuff/ HTTP/1.1" 404 788
node-d-2425.a2000.nl - - [12/Mar/2002:14:17:34 +0100] "GET /private%20stuff/pass.shtml?pass=blaat HTTP/1.1" 200 399
node-d-2425.a2000.nl - - [12/Mar/2002:14:17:43 +0100] "GET /private%20stuff/passwd.dat HTTP/1.1" 200 48938
node-d-2425.a2000.nl - - [12/Mar/2002:14:19:14 +0100] "GET /private%20stuff/index2.shtml HTTP/1.1" 200 23
node-d-2425.a2000.nl - - [12/Mar/2002:14:19:20 +0100] "GET /private%20stuff/index1.shtml HTTP/1.1" 200 23
node-d-2425.a2000.nl - - [12/Mar/2002:14:19:24 +0100] "GET /private%20stuff/index3.shtml HTTP/1.1" 404 788

As you can see, there's a gap between 14:17 and 14:19, the time the attacker used to examine the passwd.dat file he/she obtained from the system. Then, the scan started.
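
For readers who want to look for similar pauses in their own logs, here is a minimal Python sketch, not anything the tool or the honeypot actually runs. It assumes Apache-style Common Log Format lines like the ones above; the function names and the 60-second threshold are my own illustrative choices.

```python
import re
from datetime import datetime

# Common Log Format as in the excerpts: host ident user [time] "request" status size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

def parse_time(line):
    """Return the timestamp of a log line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    return datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")

def find_gaps(lines, min_gap_seconds=60):
    """Yield (seconds, previous_line, next_line) for every pause of at least
    min_gap_seconds between consecutive requests (threshold is an assumption)."""
    prev_time, prev_line = None, None
    for line in lines:
        t = parse_time(line)
        if t is None:
            continue
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap >= min_gap_seconds:
                yield gap, prev_line, line
        prev_time, prev_line = t, line

# Three of the entries quoted above.
sample = [
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:17:34 +0100] "GET /private%20stuff/pass.shtml?pass=blaat HTTP/1.1" 200 399',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:17:43 +0100] "GET /private%20stuff/passwd.dat HTTP/1.1" 200 48938',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:19:14 +0100] "GET /private%20stuff/index2.shtml HTTP/1.1" 200 23',
]
gaps = list(find_gaps(sample))
for seconds, before, after in gaps:
    print(f"{seconds:.0f}s pause after: {before}")
```

On these three entries the only pause over a minute is the one right after passwd.dat was fetched. A real log mixes many hosts, so you would group lines by host before looking for gaps.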

Phase 1 was a recursive, rapid suck of the contents:

node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET / HTTP/1.0" 200 17421
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "HEAD /head.jpg HTTP/1.1" 200 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "HEAD /lcam.jpg HTTP/1.1" 200 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET /prof.html HTTP/1.0" 200 20479
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET /soft/ HTTP/1.0" 200 7966
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "HEAD /mobp.jpg HTTP/1.1" 200 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET /mobp/ HTTP/1.0" 200 15305

Note that the fingerprint of this tool is quite distinctive - HTTP/1.0 GET on HTML files and directories, and HTTP/1.1 (different version!) HEAD on other file types. Interesting... All requests have the 'Referer' field set to the server name (http://myhost/), and 'User-Agent' set to 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)', which is, quite obviously, bogus. The remote system appears to run Windows right now, but I am not the administrator of this box, so I couldn't run p0f, tcpdump or such.

Interestingly, this crawler attempts to index every directory even if it is not explicitly referenced in the HTML code. For example, if I have a link to catspace/BIGLOG.txt on my webpage, the crawler will attempt to index the catspace/ directory too:

node-d-2425.a2000.nl - - [12/Mar/2002:14:20:59 +0100] "GET /catspace/ HTTP/1.0" 403 720

The crawler is rather poorly written - one of the URLs on my webpage refers to http://myhost:54321/. The crawler incorrectly parses this URL into this request:

node-d-2425.a2000.nl - - [12/Mar/2002:14:21:05 +0100] "GET /:54123/ HTTP/1.0" 404 748

Another bug - URLs taken from certain directory indexes have an extra '/' appended at the end:

node-d-2425.a2000.nl - - [12/Mar/2002:14:20:52 +0100] "GET /soft/uc.c/ HTTP/1.0" 404 748

This will keep certain files from being indexed, at least with Apache.
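
The mixed-version fingerprint described above (HTTP/1.0 GET for pages and directories, HTTP/1.1 HEAD for everything else) can be looked for mechanically. The following Python sketch is only one possible way to do it. It assumes Common Log Format input, and it assumes that "HTML files and directories" means paths ending in '/', '.html' or '.htm'. The function names and the browser.example entry are hypothetical.

```python
import re
from collections import defaultdict

# host ident user [time] "METHOD path HTTP/x.y" status size
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) (HTTP/\d\.\d)" \d{3} \S+'
)

def looks_like_page(path):
    """Assumption: a 'page' is a directory or an .html/.htm file."""
    path = path.split("?", 1)[0].lower()
    return path.endswith(("/", ".html", ".htm"))

def hosts_with_mixed_versions(lines):
    """Return hosts that sent BOTH HTTP/1.0 GETs for pages
    and HTTP/1.1 HEADs for other files."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        host, method, path, version = m.groups()
        if method == "GET" and version == "HTTP/1.0" and looks_like_page(path):
            seen[host].add("get10-page")
        elif method == "HEAD" and version == "HTTP/1.1" and not looks_like_page(path):
            seen[host].add("head11-file")
    return sorted(h for h, kinds in seen.items() if len(kinds) == 2)

sample = [
    # Entries quoted above.
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET / HTTP/1.0" 200 17421',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "HEAD /head.jpg HTTP/1.1" 200 0',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:20:49 +0100] "GET /prof.html HTTP/1.0" 200 20479',
    # Hypothetical ordinary browser, made up for contrast.
    'browser.example - - [12/Mar/2002:14:22:00 +0100] "GET /prof.html HTTP/1.1" 200 20479',
]
flagged = hosts_with_mixed_versions(sample)
print(flagged)
```

This only looks at method, protocol version and path; the Referer and User-Agent values mentioned above would need a combined-format log to check.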

Note that the extra '/' is appended only for certain directories (probably because I have different FancyIndexing settings for different directories). This seems to show that this code is not based on an existing crawler and is custom work.

Then, phase 2 is brute-forcing - this phase is partially interleaved with phase 1, which suggests a multithreaded application:

node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /2/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /8/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /5/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /4/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /123/ HTTP/1.0" 404 7
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:37 +0100] "GET /a/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:38 +0100] "HEAD /about HTTP/1.1" 404 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:38 +0100] "HEAD /account HTTP/1.1" 404 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /accounts/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /admin/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /adm/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /action.asp HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /ad/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "HEAD /accounts HTTP/1.1" 404 0

Our first guess is that this tool might be looking for PHP scripts to exploit the recent mod_php vulnerability. However, many of the requested locations are unlikely to contain scripts - it tries to find certificates, mails, source code, default HTML files, administrative services, or... pr0n.
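
Dictionary probing like this shows up in a log as one host collecting almost nothing but 404s. Below is a rough Python sketch for spotting that, assuming Common Log Format input. The function name, the thresholds (at least 5 requests, at least 90% of them 404) and the visitor.example entry are illustrative assumptions, not measured values.

```python
import re
from collections import Counter

# host ident user [time] "request" status size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+')

def probing_hosts(lines, min_requests=5, min_404_ratio=0.9):
    """Return hosts with at least min_requests requests of which at least
    min_404_ratio returned 404 (both thresholds are assumptions)."""
    total, missing = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        host, status = m.groups()
        total[host] += 1
        if status == "404":
            missing[host] += 1
    return sorted(
        host for host, n in total.items()
        if n >= min_requests and missing[host] / n >= min_404_ratio
    )

sample = [
    # Entries quoted above.
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /2/ HTTP/1.0" 404 748',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:21:36 +0100] "GET /8/ HTTP/1.0" 404 748',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:21:37 +0100] "GET /a/ HTTP/1.0" 404 748',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:21:38 +0100] "HEAD /about HTTP/1.1" 404 0',
    'node-d-2425.a2000.nl - - [12/Mar/2002:14:21:39 +0100] "GET /admin/ HTTP/1.0" 404 748',
    # Hypothetical ordinary visitor, made up for contrast.
    'visitor.example - - [12/Mar/2002:14:21:40 +0100] "GET / HTTP/1.1" 200 17421',
]
suspects = probing_hosts(sample)
print(suspects)
```

A ratio test like this is crude: a visitor following a page full of broken links would trip it too, so treat a hit as a reason to look at the requested paths, not as proof.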

node-d-2425.a2000.nl - - [12/Mar/2002:14:21:40 +0100] "GET /amateurs/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:40 +0100] "GET /amateur/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:40 +0100] "GET /apps/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:40 +0100] "GET /app/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:41 +0100] "GET /archives/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:41 +0100] "GET /arc/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:41 +0100] "GET /archive/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:41 +0100] "GET /asp/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:42 +0100] "GET /bank.asp HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:42 +0100] "GET /bin/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:42 +0100] "GET /binaries/ HTTP/1.0" 404 748
[...]
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:44 +0100] "GET /book/ HTTP/1.0" 404 748
[...]
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:45 +0100] "GET /certificates/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:46 +0100] "GET /code/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:47 +0100] "GET /controlpanel/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:47 +0100] "HEAD /codes HTTP/1.1" 404 0
[...]
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:49 +0100] "HEAD /data HTTP/1.1" 404 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:49 +0100] "HEAD /database HTTP/1.1" 404 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:49 +0100] "HEAD /debug HTTP/1.1" 404 0
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:49 +0100] "GET /Default.htm HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:50 +0100] "GET /dmr/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:50 +0100] "GET /doc/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:50 +0100] "GET /dhtml/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:50 +0100] "GET /door/ HTTP/1.0" 404 748
[...]
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:53 +0100] "GET /email/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:53 +0100] "GET /download/ HTTP/1.0" 404 748
node-d-2425.a2000.nl - - [12/Mar/2002:14:21:53 +0100] "GET /emails/ HTTP/1.0" 404 748
[...]

The overall list of checked resources that returned 404 code:

/1 /123 /2 /3 /4 /5 /6 /7 /8 /9 /a /abc /about /account /accounts /ad
/adm /admin /ads /al /amateur /amateurs /ani /ani1 /anime /app /apps
/appz /arc /archive /archives /asian /asians /asp /b /bin /binaries
/binary /bizarre /black /book /books /c /cat /catalog /catalogs /certif
/certificate /certificates /certified /certify /cgi /cgi- /cgibin
/cgi-bin /cgi-win /code /codes /coding /content /contents /controlpanel
/crack /cracks /ctc /d /data /database /debug /dhtml /dir /dirs /dmr
/dmr1 /doc /docs /door /double /download /downloads /downloadz /driver
/drivers /e /email /emails /entry /en_US /f /file /filez /final /food
/forum /free /freepic /freepics /front /ftp /fuck /fucks /g /gal
/galleries /gallery /galls /game /games /gamez /girl /girls /girlz
/graph /graphic /graphics /graphs /h /hardcore /help /hidden /hide /home
/htaccess /htdata /htdoc /htdocs /html /htpasswd /htpasswrd /i /id /ids
/image /images /images_dir /imagez /index /info /j /k /l /lancelot /les
/lesb /lesbian
/lesbians /lesbo /lez /link /links /linkz /list /log /logs /m /mail
/mails

...for some reason, the scan ended around the letter 'm', so I can't determine what else it would look for, or whether there are any later phases. And because the scan probably didn't provide the attacker with any useful data in this case, I can't tell how he/she would attempt to use any information obtained.

One last thing I noticed:

node-d-2425.a2000.nl - - [12/Mar/2002:14:20:51 +0100] "HEAD /soft/unicorns.tgz HTTP/1.1" 404 0

This file used to be on my server, but is no longer available there. This suggests that this tool crawls not only pages found directly, but also pages previously indexed and cached by other systems (such as google.com).

Well, ok, enough from me. I could probably write a few more pages, but I don't want to insult your intelligence or make blind guesses.

Have fun! Check your logs, post your hypotheses!

-- 
_____________________________________________________
Michal Zalewski [lcamtufat_private] [security]
[http://lcamtuf.coredump.cx] <=-=> bash$ :(){ :|:&};:
=-=> Did you know that clones never use mirrors? <=-=
          http://lcamtuf.coredump.cx/photo/
This archive was generated by hypermail 2b30 : Tue Mar 12 2002 - 10:28:38 PST