Dissecting E-Mail Malware

I’ve been receiving a lot of e-mails like this recently:

Malware mail

It reads:

Dear Sir or Madam,

attached you’ll find a new order from BSH Household Appliances LLC. Please include the order number on your invoice as a payment reference for us.

Kind regards,
BSH Order Center


This E-Mail has been checked for viruses by Avast Antivirus-Software

As you can see, it includes a Word file, and I had no doubt that it was not legit, despite the nice claim that this mail had been checked by Avast’s snake-oil anti-virus.

Fun fact: At the time I received it, Avast would have given that Word file a clean bill of health according to VirusTotal. Out of 54 scanners, only one recognized it as malware, via a heuristic. That is, among other reasons, why I think that virus scanners are a bunch of crap causing more harm than good. But that is a completely different issue.

The one thing the fake “checked by Avast” line did for me: It piqued my curiosity for some reason. What was lurking in that file? What does it do? I wanted to find out, especially since I had never done a forensic analysis of something like this but had always wanted to try. Here’s how I approached it:

A first look

The first thing I did was simply look at the file using good ol’ “less”. It revealed a bunch of HTML, and one thing in particular piqued my interest:

Part of the attached Word document as plain text (terminal screenshot)

This looks very much like a Base64-encoded binary. So I wrote myself a small PHP script to decode it:

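<?php
// The Base64 blob copied out of the attachment
$base64payload = "..."; // String omitted for brevity

// Turn the Base64 text back into raw bytes
$decoded = base64_decode($base64payload);
if ($decoded === false) {
    exit("Decoding failed\n");
}
file_put_contents("decoded.bin", $decoded);

echo "Done\n";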
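The post does not say how the Base64 string was cut out of the attachment. As a hypothetical alternative to copying it by hand, the Python sketch below assumes the payload is stored MIME-style, as many consecutive lines containing nothing but Base64 characters, and decodes the longest such run. The function name `extract_base64_blob` is made up for illustration.

```python
import base64
import re

# Assumption: the payload is a MIME-style Base64 body, i.e. many consecutive
# lines made up only of Base64 characters (with optional "=" padding at the end).
B64_LINE = re.compile(rb"[A-Za-z0-9+/]+={0,2}")


def extract_base64_blob(document: bytes) -> bytes:
    """Decode the longest run of consecutive Base64-only lines in `document`."""
    best = b""
    current = []
    # The trailing empty "line" acts as a sentinel that flushes the last run.
    for line in document.splitlines() + [b""]:
        line = line.strip()
        if line and B64_LINE.fullmatch(line):
            current.append(line)
            continue
        run = b"".join(current)
        if len(run) > len(best):
            best = run
        current = []
    return base64.b64decode(best)
```

Reading the attachment in binary mode and writing the result of `extract_base64_blob(...)` to decoded.bin would mirror what the PHP script does. Header lines such as `Content-Transfer-Encoding: base64` never match, because they contain characters outside the Base64 alphabet.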

That was simple enough. Now, let’s explore that further. Calling “file decoded.bin” just revealed “decoded.bin: data”. Looking at the file with less revealed only garbage, and “strings decoded.bin” revealed only one interesting string: “ActiveMime”. Little to go on.
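“strings” essentially prints every run of printable characters that is at least four bytes long. A rough Python equivalent (a simplification: GNU strings also treats tab as printable and has many more options) shows why compressed data yields almost nothing: its bytes rarely form long printable runs.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII bytes (0x20 to 0x7e)."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
```

Fed the first 16 bytes of decoded.bin, it returns only `b"ActiveMime"`, since the remaining bytes are zeros and other non-printable values.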

Bring the hex editor!

The next obvious step was to look at the file through the eyes of a computer. I knew that the first few bytes of a file (its “magic number”) usually reveal what it is. This is also how the “file” command identifies files (as far as I know). So I launched Emacs, which conveniently has a hex editor built in: hexl-mode. Here are the first few bytes of my decoded Base64 binary:

00000000: 4163 7469 7665 4d69 6d65 0000 01f0 0400  ActiveMime......
00000010: 0000 ffff ffff 3000 07f0 fa36 0000 0400  ......0....6....
00000020: 0000 0400 0000 0000 0000 0000 0000 00b4  ................
00000030: 0000 789c ed7d 0978 53c7 b5f0 dc2b 59de  ..x..}.xS....+Y.
00000040: 8d6c 2021 ac17 1382 09d8 e86a 17c1 c45a  .l !.......j...Z
00000050: 2d6f 58b6 8c17 4282 654b 5e65 4948 b2b1  -oX...B.eK^eIH..
00000060: 9d90 0802 8466 256b 6992 2624 af49 490a  .....f%ki.&$.II.
00000070: 294d d396 a64b 9496 b6bc 3669 48da b4b4  )M...K....6iH...
00000080: 7d6d 6849 daf4 b549 e97b 691f 7f5f 5ffd  }mhI...I.{i..__.
00000090: 9f99 7baf 3412 1218 c8fb dfff f818 fb5c  ..{.4..........\
000000a0: cd3d 77e6 9c33 db99 e59e 997b ec8d d213  .=w..3.....{....
000000b0: 4f7d 61ee af51 9a5b 8b64 e81f 53f9 4841  O}a..Q.[.d..S.HA
000000c0: e118 1188 5322 c4c2 8f0c e01f 5353 5312  ....S"......SSS.
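The dump above uses a common layout: an 8-digit hex offset, 16 bytes as hex digits in groups of two bytes, and the same bytes as text with non-printable ones replaced by dots. A short Python sketch (the name `hex_dump` is hypothetical) reproduces that layout, which makes it easy to dump arbitrary slices of a file from a script:

```python
def hex_dump(data: bytes) -> str:
    """Format `data` like the listing above: offset, hex groups, text column."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Hex digits in groups of two bytes, e.g. "4163 7469 ...".
        groups = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        # Printable ASCII is shown as-is, everything else as ".".
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # 8 groups of 4 hex digits plus 7 separators = 39 characters; padding
        # keeps the text column aligned on a short last line.
        lines.append(f"{offset:08x}: {groups:<39}  {text}")
    return "\n".join(lines)
```

For example, `hex_dump(data[:32])` on decoded.bin should print the first two lines of the listing above.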

Since “file” could not identify the data from its first few bytes, I did not read much into the “ActiveMime” string there. It might serve some purpose in exploiting MS Word, but I ignored it for now. More interesting is the “789c”, because after it the file really seems to take off. I concluded that this must be the real start of the embedded data, because from there on there are no more runs of zeros, just a lot of unreadable data.
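The “no more zeros, just unreadable data” observation can be put into numbers with Shannon entropy: a header padded with zeros scores low, while compressed data comes close to the maximum of 8 bits per byte. This is a generic sketch, not something the analysis above relied on.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Average information per byte in bits: 0.0 for constant data, at most 8.0."""
    if not block:
        return 0.0
    total = len(block)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(block).values())
```

Comparing `shannon_entropy(data[:50])` with `shannon_entropy(data[50:])` for decoded.bin should show a clear jump. Keep in mind that a block of n bytes can reach at most log2(min(n, 256)) bits, so very short slices always score lower.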

Googling “0x789c” quickly revealed that it is the header of a zlib-compressed data stream. The next logical step is to decompress it. First, we need to strip the first 50 bytes (0x32, the offset of “789c” in the dump above) so the file starts with “789c”. This can easily be done using dd:

dd if=decoded.bin of=decoded.zlib bs=50 skip=1
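Instead of reading the offset off the hex dump, a script can search for it. Per RFC 1950, the two header bytes CMF and FLG encode the compression method (8 = deflate) and window size, and together form a 16-bit number divisible by 31, which is why 78 9c is such a recognizable pattern. The sketch below (function names are hypothetical) checks each position for a plausible header and keeps the first one that inflates to a complete stream:

```python
import zlib


def looks_like_zlib_header(cmf: int, flg: int) -> bool:
    """Check the two-byte zlib header (RFC 1950) formed by CMF and FLG."""
    return (
        (cmf & 0x0F) == 8                  # CM: compression method 8 = deflate
        and (cmf >> 4) <= 7                # CINFO: window size of at most 32 KiB
        and not (flg & 0x20)               # FDICT: no preset dictionary
        and (cmf * 256 + flg) % 31 == 0    # FCHECK: header is a multiple of 31
    )


def find_and_inflate(data: bytes):
    """Return (offset, decompressed bytes) for the first complete zlib stream."""
    for offset in range(len(data) - 1):
        if not looks_like_zlib_header(data[offset], data[offset + 1]):
            continue
        inflater = zlib.decompressobj()
        try:
            result = inflater.decompress(data[offset:])
        except zlib.error:
            continue                       # header matched by chance; keep looking
        if inflater.eof:                   # a complete stream was decoded
            return offset, result
    raise ValueError("no complete zlib stream found")
```

On decoded.bin this should report offset 50, matching the dd command, since no byte pair before 0x32 passes the header check. Trailing bytes after the stream end up in the inflater’s `unused_data` rather than causing an error.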

Now that the file is a valid zlib stream, we can decompress it. PHP can help here too:

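<?php

// Read the stream that now starts with the zlib header (78 9c)
$compressedPayload = file_get_contents('decoded.zlib');
// Inflate it; zlib_decode() returns false on invalid data
$decompressedPayload = zlib_decode($compressedPayload);
if ($decompressedPayload === false) {
    exit("Decompression failed\n");
}

file_put_contents('payload.raw', $decompressedPayload);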

Now let’s look at the file again. “file payload.raw” now reveals “payload.raw: CDF V2 Document, No summary info”. The first bytes of our payload look like this in the hex editor:

00000000: d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 3e00 0300 feff 0900  ........>.......
00000020: 0600 0000 0000 0000 0000 0000 0100 0000  ................
00000030: 0100 0000 0000 0000 0010 0000 0200 0000  ................
00000040: 0300 0000 feff ffff 0000 0000 0000 0000  ................
00000050: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000060: ffff ffff ffff ffff ffff ffff ffff ffff  ................

The file header and the file command indicate that this is a “Microsoft Compound Binary File”. As far as I understand it, this is the container used by the old binary “.doc”, “.xls” and “.ppt” formats that were common before Office Open XML was introduced with MS Office 2007.

It contains a FAT-like file system that points to multiple data streams for an application to process. You can read more about the binary file format here. It is quite an interesting design.
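The fixed part of that header (the first 76 bytes in the dump above) can be decoded with Python’s `struct` module. The field layout below follows Microsoft’s [MS-CFB] specification as I understand it; the names are my own labels, and this is only a header sketch, not a replacement for a real parser such as the one in oletools.

```python
import struct
from collections import namedtuple

# Fixed, little-endian part of the Compound File Binary header (76 bytes).
CFB_HEADER = struct.Struct("<8s16s5H6s9I")
CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")

CfbHeader = namedtuple("CfbHeader", [
    "magic", "clsid", "minor_version", "major_version", "byte_order",
    "sector_shift", "mini_sector_shift", "reserved",
    "num_dir_sectors", "num_fat_sectors", "first_dir_sector",
    "transaction_signature", "mini_stream_cutoff", "first_minifat_sector",
    "num_minifat_sectors", "first_difat_sector", "num_difat_sectors",
])


def parse_cfb_header(data: bytes) -> CfbHeader:
    """Unpack the fixed header fields and sanity-check magic and byte order."""
    header = CfbHeader(*CFB_HEADER.unpack_from(data, 0))
    if header.magic != CFB_MAGIC:
        raise ValueError("not a Compound File Binary document")
    if header.byte_order != 0xFFFE:        # stored as the bytes fe ff
        raise ValueError("unexpected byte order mark")
    return header
```

For the header shown above, this yields major version 3, a sector shift of 9 (512-byte sectors), a mini stream cutoff of 4096 bytes, one FAT sector, and the directory starting at sector 1; the first DIFAT sector is 0xFFFFFFFE, the end-of-chain marker.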

The evil VBA macro

I had never parsed binary file formats myself without the help of a library and really did not feel like learning that now. So I looked for ready-made tools for that file format. The ForensicsWiki pointed me to the Python oletools package. A huge shout-out to the author of this brilliant tool! If I ever decide to go into the forensics business, I would probably add it to my toolbox.

With oletools, a single command is enough to display all VBA macros in that binary document:

./olevba.py --decode --reveal ../../payload.raw > ../../macro.vba

Unfortunately, the source code is obfuscated. Variable and function names have been replaced by random strings like Ycra1FCZhO, making the source very difficult to read. It was clear, however, that the obfuscated code builds a URL using a bunch of loops. It then downloads the contents of that URL, which happens to be an EXE file. Once that is done, it runs the EXE file, which in turn infects the PC.
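Even when every identifier is scrambled, a download-and-execute macro often still has to call certain built-ins and COM objects by name. olevba reports such suspicious keywords itself; the sketch below is only a much simplified, hypothetical illustration of that idea, with a tiny keyword list of my own choosing, and is not how olevba is implemented.

```python
# Hypothetical, heavily simplified keyword list for illustration only.
SUSPICIOUS_KEYWORDS = {
    "AutoOpen": "runs automatically when the document is opened",
    "Document_Open": "runs automatically when the document is opened",
    "CreateObject": "creates a COM object",
    "XMLHTTP": "may download data over HTTP",
    "ADODB.Stream": "may write downloaded bytes to disk",
    "Shell": "may start an executable",
}


def scan_macro(source: str) -> dict:
    """Return the keywords found in `source` (case-insensitive) with a reason."""
    lowered = source.lower()
    return {
        keyword: reason
        for keyword, reason in SUSPICIOUS_KEYWORDS.items()
        if keyword.lower() in lowered
    }
```

Running it over the text of macro.vba gives a quick first impression; obfuscators can still hide these names, for example by assembling strings character by character, so an empty result proves nothing.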

Rather than messing with the source and trying to make it readable, I just installed a Windows 8.1 evaluation VM with an MS Office 2013 trial, Wireshark and Sysinternals Process Explorer. My plan was simply to infect the machine and see what happens.

This is part of the network traffic after opening the infected file:

Malware download attempt

As you can see, I was too slow: the file had already been taken down. I could observe, though, that my initial assumption was correct, as the macro virus did try to run an EXE file that presumably would have been downloaded from this server.

There was nothing more for me to find here. I would have loved to find out what the actual virus would have done to the system. Anyway: It has been a fun little project between Christmas Eve and 32C3. I certainly learned a lot about computer forensics and analyzing files and, most importantly, it was fun!

If you are interested, you can download every file I’ve mentioned. But beware: This download contains the actual malicious document “invoice76940509.doc” which tries to infect your system. Handle with care. The password for the zip file is: “containsviruses”.