Analyzing Metasploit Payloads
Introduction
In the previous Metasploit blogpost our x86 emulator was introduced along with some basic detection and decoding of Metasploit’s payloads. This post will explain why we chose the emulator approach, giving a more in-depth look into the techniques payloads use to hinder static analysis.
(Skip to the end of the blogpost if you’re solely interested in the juicy analysis results of an in-the-wild Metasploit payload on tria.ge).
Anti-Analysis
Shellcode encoders have a few uses. From an exploit development standpoint they’re useful for circumventing bad characters. Exploit development is of course not the only use for encoders: as stated in our previous post, they’re also great for obfuscating the real payload and circumventing some basic AV detection. In other words, they’re useful for Red Teaming purposes. Besides that, they also make reverse engineering the payload a bit harder. We’re now going to take a look at the call4_dword_xor encoder.
Call4 Dword XOR
If we run this payload through a disassembler we get the following output:
$ ndisasm -u call4_dword.bin | head -n20
00000000 29C9 sub ecx,ecx
00000002 83E9AA sub ecx,byte -0x56
00000005 E8FFFFFFFF call 0x9
0000000A C05E8176 rcr byte [esi-0x7f],byte 0x76
0000000E 0E push cs
0000000F 6AD0 push byte -0x30
00000011 D807 fadd dword [edi]
00000013 83EEFC sub esi,byte -0x4
00000016 E2F4 loop 0xc
00000018 96 xchg eax,esi
00000019 385A07 cmp [edx+0x7],bl
0000001C 6AD0 push byte -0x30
0000001E B88E8FE118 mov eax,0x18e18f8e
00000023 63E1 arpl cx,sp
00000025 80E88C sub al,0x8c
00000028 38DC cmp ah,bl
0000002A 53 push ebx
0000002B 55 push ebp
0000002C 7E5B jng 0x89
0000002E AA stosb
So what are we looking at here? We’ve disassembled the payload, but we’re seeing strange things. The assembly looks off and we don’t really see it entering a decoding loop or anything similar. This is what Call4 Dword XOR does: it executes fine until it reaches the first call. It calls 0x9, but as you can see that address falls within the call instruction itself. So how does this work? It is a way to throw you off during static analysis. The call at address 0x5 targets 0x9, four bytes past its own start, hence the name call4. A linear disassembler carries on at 0xa and therefore misreads everything that follows. So how does this look from address 0x9 onward? Let’s run the file through the disassembler again, starting from offset 0x9:
$ ndisasm -u call4_dword_9.bin | head -n6
00000000 FFC0 inc eax
00000002 5E pop esi
00000003 81760E6AD0D807 xor dword [esi+0xe],0x7d8d06a
0000000A 83EEFC sub esi,byte -0x4
0000000D E2F4 loop 0x3
0000000F 96 xchg eax,esi
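Why 0x9? A near call (opcode E8) is five bytes long, and its 32-bit operand is a signed displacement relative to the address of the next instruction. The Python sketch below reproduces that arithmetic for the bytes E8 FF FF FF FF at address 0x5. The helper name call_target is hypothetical and only serves as an illustration; it is not part of our emulator.

```python
import struct

def call_target(address: int, instruction: bytes) -> int:
    """Return the destination of an x86 near call (opcode E8, rel32)."""
    if len(instruction) != 5 or instruction[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # The operand is a signed little-endian 32-bit displacement.
    (rel32,) = struct.unpack("<i", instruction[1:])
    # The displacement is relative to the instruction following the call.
    next_instruction = address + len(instruction)
    return (next_instruction + rel32) & 0xFFFFFFFF

target = call_target(0x5, bytes.fromhex("e8ffffffff"))
print(hex(target))  # 0x9
```

The call also pushes 0xa as its return address. That is the value the pop esi in the listing above picks up, so esi+0xe points at 0x18 in the original file, the first byte after the loop instruction.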
Looking at this disassembly, we can now see our decoding loop!
00000003 81760E6AD0D807 xor dword [esi+0xe],0x7d8d06a
0000000A 83EEFC sub esi,byte -0x4
0000000D E2F4 loop 0x3
As we can see, the decoding loop is a rather simple one: it XORs the dword at esi+0xe with the key 0x7d8d06a. After that instruction the payload subtracts -4 from the esi register, effectively increasing it by 4. Then the code loops back to 0x3 to start over again. The data after this loop is all garbled for now; it is decoded one dword at a time as the loop runs.
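As a rough illustration of what the loop does, the following Python sketch applies the same fixed-key dword XOR to the first eight encoded bytes (the bytes at 0x18 through 0x1f in the first listing). The function name dword_xor_decode is ours and purely hypothetical; the key and the bytes are taken from the disassembly above.

```python
import struct

def dword_xor_decode(data: bytes, key: int) -> bytes:
    """XOR every little-endian dword in data with a fixed 32-bit key."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", dword ^ key)
    return bytes(out)

KEY = 0x07D8D06A  # immediate operand of the xor instruction
encoded = bytes.fromhex("96385a07" "6ad0b88e")  # bytes at 0x18..0x1f
print(dword_xor_decode(encoded, KEY).hex())  # fce8820000006089
```

The decoded bytes start with fc (cld), followed by e8 82 00 00 00 (a call) and 60 (pushad), so this already looks like real code rather than garbage.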
Now this is an example of a simple encoder trying to throw you off, but after going through the simple XOR decoding loop you’ll get to the real payload.
Shikata Ga Nai
This is the only x86 encoder that isn’t named exactly after what it does. If we look at the list of available x86 encoders, we see that it is also the only one ranked excellent.
Name Rank Description
---- ---- -----------
x86/add_sub manual Add/Sub Encoder
x86/alpha_mixed low Alpha2 Alphanumeric Mixedcase Encoder
x86/alpha_upper low Alpha2 Alphanumeric Uppercase Encoder
x86/avoid_underscore_tolower manual Avoid underscore/tolower
x86/avoid_utf8_tolower manual Avoid UTF8/tolower
x86/bloxor manual BloXor - A Metamorphic Block Based XOR Encoder
x86/bmp_polyglot manual BMP Polyglot
x86/call4_dword_xor normal Call+4 Dword XOR Encoder
x86/context_cpuid manual CPUID-based Context Keyed Payload Encoder
x86/context_stat manual stat(2)-based Context Keyed Payload Encoder
x86/context_time manual time(2)-based Context Keyed Payload Encoder
x86/countdown normal Single-byte XOR Countdown Encoder
x86/fnstenv_mov normal Variable-length Fnstenv/mov Dword XOR Encoder
x86/jmp_call_additive normal Jump/Call XOR Additive Feedback Encoder
x86/nonalpha low Non-Alpha Encoder
x86/nonupper low Non-Upper Encoder
x86/opt_sub manual Sub Encoder (optimised)
x86/service manual Register Service
x86/shikata_ga_nai excellent Polymorphic XOR Additive Feedback Encoder
x86/single_static_bit manual Single Static Bit
x86/unicode_mixed manual Alpha2 Alphanumeric Unicode Mixedcase Encoder
x86/unicode_upper manual Alpha2 Alphanumeric Unicode Uppercase Encoder
x86/xor_dynamic normal Dynamic key XOR Encoder
You might wonder why that is. One of the reasons is that this encoder uses a so-called rotating key. This means the key changes with every round of decoding, so we cannot extract the key once and decode the whole payload with it. To see this in action, the source of the emulator was modified to print the key every time an XOR instruction is used:
$ x86emu revtcp86shik.bin | head -n15
XORing value: 1604174323 with key: 2741511143 resulting in: 4243972628
XORing value: 2690549523 with key: 2690516475 resulting in: 33512
XORing value: 1171777763 with key: 2690549987 resulting in: 3850985472
XORing value: 243476690 with key: 2246568163 resulting in: 2338635825
XORing value: 1137154372 with key: 290236692 resulting in: 1384853584
XORing value: 2005226088 with key: 1675090276 resulting in: 340953868
XORing value: 1996625659 with key: 2016044144 resulting in: 254309003
XORing value: 3061095500 with key: 2270353147 resulting in: 824593079
XORing value: 3645214029 with key: 3094946226 resulting in: 1631366399
XORing value: 966380749 with key: 431345329 resulting in: 539755132
XORing value: 954998508 with key: 971100461 resulting in: 17682369
XORing value: 1746747945 with key: 988782830 resulting in: 1391649479
XORing value: 2645559522 with key: 2380432309 resulting in: 273845079
XORing value: 352929159 with key: 2654277388 resulting in: 2335984267
XORing value: 3389606107 with key: 695294359 resulting in: 3816296780
Compared to call4_dword_xor, the difference becomes clear:
$ x86emu call4_dword.bin | head -n 15
XORing value: 123353238 with key: 131649642 resulting in: 8579324
XORing value: 2394476650 with key: 131649642 resulting in: 2304770048
XORing value: 1662574991 with key: 131649642 resulting in: 1690317285
XORing value: 2364047585 with key: 131649642 resulting in: 2335199371
XORing value: 1431559224 with key: 131649642 resulting in: 1384844370
XORing value: 799693694 with key: 131649642 resulting in: 678595348
XORing value: 563242853 with key: 131649642 resulting in: 642430735
XORing value: 997470043 with key: 131649642 resulting in: 1017970481
XORing value: 735751179 with key: 131649642 resulting in: 738360417
XORing value: 169283914 with key: 131649642 resulting in: 231719200
XORing value: 4114225003 with key: 131649642 resulting in: 4074948353
XORing value: 1431537464 with key: 131649642 resulting in: 1384863570
XORing value: 999447418 with key: 131649642 resulting in: 1011518224
XORing value: 2143919329 with key: 131649642 resulting in: 2014399627
XORing value: 3604584585 with key: 131649642 resulting in: 3506522339
In conclusion: shikata_ga_nai changes the XOR key every round, while most encoders (like call4_dword_xor) use the same key throughout the decoding process.
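The shikata_ga_nai trace also hints at how its key evolves. In the rows shown, each new key equals the previous key plus the previously decoded dword (modulo 2^32), which fits the “additive feedback” in the encoder’s description. The Python sketch below checks this against the first four rows of the trace above. ROWS and follows_additive_feedback are names made up for this illustration, and the check says nothing beyond the rows listed.

```python
MASK = 0xFFFFFFFF

# (value, key, result) copied from the first four rows of the
# shikata_ga_nai trace above.
ROWS = [
    (1604174323, 2741511143, 4243972628),
    (2690549523, 2690516475, 33512),
    (1171777763, 2690549987, 3850985472),
    (243476690, 2246568163, 2338635825),
]

def follows_additive_feedback(rows) -> bool:
    """Check result == value ^ key and next_key == key + result (mod 2**32)."""
    for value, key, result in rows:
        if value ^ key != result:
            return False
    for (_, key, result), (_, next_key, _) in zip(rows, rows[1:]):
        if (key + result) & MASK != next_key:
            return False
    return True

print(follows_additive_feedback(ROWS))  # True
```

Running the same check on the call4_dword_xor rows returns False, since there the key never moves.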
Another way to look at this process is to visualise every iteration.
Another way Shikata Ga Nai makes itself harder to detect is the sheer number of encodings it can have for the same functionality. It uses FPU instructions to figure out where exactly in memory it resides (getpc) so it can decode the payload. Looking at the source code that generates this encoder, we can see that the framework can choose from a cool 100 FPU instructions. These are chosen at random, so the final payload will rarely have the same binary representation twice.
Putting it to practice
Now of course all this talk is exactly what it is, but how does it work in practice? We noticed an excellent blogpost on manually deobfuscating a shikata ga nai payload. We turned the shellcode referred to at the end of that blogpost into an executable (since you can’t execute shellcode directly on Windows) and uploaded it to tria.ge.
Turning the shellcode into an executable with three (3) iterations of Shikata Ga Nai may look as follows. Note that ten (10) iterations would have worked as well; we picked three for no particular reason.
./msfvenom -p- --platform windows -a x86 -e x86/shikata_ga_nai -i3 -f exe < /tmp/9F88A4BBAFF1B8F530EE29F7226B3338 > shikata.exe
Following is the analysis: tria.ge/reports/200305-gbt2r8qgkn.
As you’ll notice, the behavioral analysis doesn’t report much of interest yet, although we’re working on signatures that trigger on the interesting situation of “nothing” happening at all except for one network IOC.
In the static analysis, however, our engine is going at full force, automatically extracting the shellcode, deobfuscating it, and extracting its one and only useful IOC (5.61.59.234:8080).
Conclusion
Shellcode and shellcode encoders are an interesting and effective way to bypass many security solutions. In our sandbox we do our very best to automatically and correctly extract as much information and malware configuration as possible.
In our next blog post we’ll be covering a number of CobaltStrike samples that work rather similarly to Metasploit payloads. In the meantime, feel free to send us Metasploit and CobaltStrike payloads and perhaps we’ll send some swag your way and/or cover them in the upcoming blogpost ;-)