Dissecting E-Mail Malware

I’ve been receiving a lot of mails like this recently:

Malware mail

It reads:

Dear Sir or Madam,

attached you’ll find a new order from BSH Household Appliances LLC. Please include the order number on your invoice as a payment reference for us.

Kind regards,
BSH Order Center


This E-Mail has been checked for viruses by Avast Antivirus-Software

As you can see, it includes a Word file that I had no doubt was not legit, despite the nice claim that this mail had been checked by Avast snake-oil anti-virus.

Fun fact: At the time I got this, Avast would have given that Word file a clean bill of health according to VirusTotal. Out of 54 scanners, only one recognized it as malware, and only via a heuristic. That is, among other reasons, why I think that virus scanners are a bunch of crap causing more harm than good. But that is a completely different issue.

The one thing the fake “checked by Avast” line did for me: It piqued my curiosity. What was lurking in that file? What does it do? I wanted to find out, especially since I had never done a forensic analysis of something like this but always wanted to try. Here’s how I approached it:

A first look

The first thing I did was just look at the file using good ol’ “less”. It revealed a bunch of HTML, and one thing specifically piqued my interest:

Terminal screenshot

Part of the attached Word document as plain text

This looks very much like a Base64 encoded binary. So I wrote myself a small PHP script to decode that:
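A minimal sketch of that decoding step, written in Python rather than PHP and not the script actually used. It assumes the payload is the longest run of Base64 characters in the file; the function name and the `min_len` threshold are made up for illustration.

```python
import base64
import re


def decode_largest_base64_block(text: str, min_len: int = 40) -> bytes:
    """Decode the longest Base64-looking run found in `text`.

    Assumption: the embedded payload is the longest such run in the file.
    """
    # A run is a stretch of Base64 alphabet characters, padding and line breaks.
    pattern = r"[A-Za-z0-9+/=\r\n]{%d,}" % min_len
    runs = re.findall(pattern, text)
    if not runs:
        raise ValueError("no Base64-looking block found")
    # Take the longest run and drop the line breaks used for wrapping.
    blob = "".join(max(runs, key=len).split())
    # Normalise the padding so the length is a multiple of four.
    blob = blob.rstrip("=")
    blob += "=" * (-len(blob) % 4)
    return base64.b64decode(blob)
```

Reading the attachment as text, passing it to this function and writing the returned bytes to “decoded.bin” would give a file comparable to the one examined next.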

That was simple enough. Now, let’s explore that further. Calling “file decoded.bin” just revealed “decoded.bin: data”. Looking at the file with less just revealed garbage, and “strings decoded.bin” revealed only one interesting string: “ActiveMime”. Little to go on.

Bring the hex editor!

The next obvious step was to look at the file with the eyes of a computer. I know that the first few bytes of a file usually reveal what it is. This is also how the “file” command identifies files (as far as I know). So I launched emacs, which conveniently has a hex editor built in: hexl-mode. Here are the first few bytes of my decoded Base64 binary:
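For readers without emacs at hand, a comparable view can be produced with a few lines of Python. This is only a sketch of a generic hex dump (offset, hex bytes, printable ASCII); the layout is similar in spirit to hexl-mode but not identical, and the function name is an assumption.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render `data` as lines of offset, hex bytes and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as dots, as most hex editors do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)
```

Calling `hexdump` on the first 64 bytes or so of “decoded.bin” is usually enough to spot a signature near the start of a file.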

Since “file” did not reveal anything by looking at the first few bytes, I did not read much into that “ActiveMime” string there. It might serve a purpose in exploiting MS Word, but I ignored it for now. More interesting is the “789c”, because after it the file seems to really take off. I concluded that this must be the real start of the file because there are no more zeros from there on, just a lot of unreadable data.

Googling “0x789c” quickly revealed that this is a header for a zlib-compressed data stream. The next logical step is to decompress it. First, we need to strip the first 50 bytes so the file starts with “789c”. This can simply be done using dd:
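The same stripping step can be sketched in Python. Instead of hard-coding the 50-byte offset, this version searches for the first occurrence of the two header bytes. That is an assumption that happens to fit this file: the same byte pair could also occur by chance earlier in other data. The function names are illustrative.

```python
ZLIB_MAGIC = b"\x78\x9c"  # zlib header bytes for the common default settings


def find_zlib_offset(data: bytes) -> int:
    """Return the offset of the first zlib header candidate in `data`."""
    offset = data.find(ZLIB_MAGIC)
    if offset < 0:
        raise ValueError("no 78 9c byte pair found")
    return offset


def strip_to_zlib(data: bytes) -> bytes:
    """Drop everything in front of the first zlib header candidate."""
    return data[find_zlib_offset(data):]
```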

Now that the file is a valid zlib stream, we can decompress it. PHP can help here too:
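As a rough Python counterpart to that step (not the PHP script used here), the standard `zlib` module can inflate the stream. The sketch uses a decompression object, which stops at the end of the compressed stream and leaves any trailing bytes in its `unused_data` attribute; the function name is made up.

```python
import zlib


def inflate(stream: bytes) -> bytes:
    """Decompress a zlib stream, tolerating bytes after its end."""
    decompressor = zlib.decompressobj()
    payload = decompressor.decompress(stream)
    payload += decompressor.flush()
    return payload
```

Writing the returned bytes to “payload.raw” would produce the file inspected in the next step.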

Now let’s look at the file again. “file payload.raw” now reveals “payload.raw: CDF V2 Document, No summary info”. The first bytes of our payload look like this in the hex editor:

The file header and the file command indicate that this is a “Microsoft Compound Binary File”. As far as I understand it, this is a container used by the old binary “.doc”, “.xls” and “.ppt” formats that were common before the introduction of Office Open XML in MS Office 2007.

It is an interesting format containing a FAT-like filesystem pointing to multiple data streams for an application to process. You can read more about the binary file format here.
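To make the header a little more concrete, here is a small sketch that checks the eight-byte signature and reads a few fixed header fields. The field offsets follow the published compound file layout as I understand it (versions at byte 24, byte order at 28, sector shifts at 30 and 32). Treat the function and dictionary key names as illustrative, and note that it does not walk the FAT or the directory.

```python
import struct

# Signature found at the start of every compound file.
CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def parse_cfb_header(data: bytes) -> dict:
    """Read version and sector size fields from a compound file header."""
    if len(data) < 34 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not a Compound File Binary header")
    # Five little-endian 16-bit fields follow the signature and a 16-byte CLSID.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        # Sizes are stored as powers of two.
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }
```

The sector size matters because the FAT-like allocation table addresses the file in sector-sized units.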

The evil VBA macro

I’ve never processed binary files myself without the help of a library and really did not feel like starting to learn that now. So I opted to look for ready-made tools to use with that file format. The ForensicsWiki pointed me to the Python oletools package. A huge shoutout to the author of this brilliant tool! If I ever decide to go into the forensics business, I would probably add this to my toolbox.

With oletools, a single command is enough to display all VBA macros of that binary document:

Unfortunately, the source code is obfuscated. Variable and function names have been replaced by random strings like Ycra1FCZhO, making the source very difficult to read. It was clear, however, that the obfuscated source builds a URL using a bunch of loops. Then it downloads the contents of that URL, which happens to be an EXE file. Once that is done, it runs the EXE file, which in turn infects the PC.
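To give an idea of what “building a URL in loops” can look like, here is a toy illustration in Python. It is not the scheme used by this macro, which was never untangled here. It only assumes a simple, made-up obfuscation in which every character code is shifted by a constant, and the URL in the usage note is a placeholder.

```python
def rebuild_string(codes, shift):
    """Undo a toy obfuscation: each character code was shifted by `shift`.

    Hypothetical scheme for illustration only, not taken from the macro.
    """
    chars = []
    for code in codes:
        # Reverse the shift and turn the number back into a character.
        chars.append(chr(code - shift))
    return "".join(chars)
```

For example, the list `[111, 123, 123, 119]` with a shift of 7 decodes to `http`; a full placeholder address such as `http://example.invalid/a.exe` could be hidden the same way.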

Rather than messing with the source and trying to make it readable, I just installed a Windows 8.1 evaluation VM with an MS Office 2013 trial, Wireshark and Sysinternals Process Explorer. My plan was to simply infect the machine and see what happens.

This is part of the network traffic after opening the infected file:

Malware download attempt


As you can see, I was too slow. The file had already been taken down. I could observe, though, that my initial assumption was correct, as the macro virus did try to run an EXE file that presumably would have been downloaded from this server.

There is nothing more for me to find here. I’d love to find out what the actual virus would have done to the system. Anyway: It has been a fun little project between Christmas Eve and 32C3. I certainly learned a lot about computer forensics and analyzing files, and most importantly: it was fun!

If you are interested, you can download every file I’ve mentioned. But beware: This download contains the actual malicious document “invoice76940509.doc” which tries to infect your system. Handle with care. The password for the zip file is: “containsviruses”.