Analyzing Metasploit Payloads
In the previous Metasploit blogpost our x86 emulator was introduced along with some basic detection and decoding of Metasploit’s payloads. This post will explain why we chose the emulator approach, giving a more in-depth look into the techniques payloads use to hinder static analysis.
(Skip to the end of the blogpost if you’re solely interested in the juicy analysis results of an in-the-wild Metasploit payload on tria.ge).
Shellcode encoders have a few uses - from an exploit development standpoint they’re useful for circumventing bad characters.
Exploit development is of course not the only use for encoders. As stated in our previous post, they're also great for obfuscating the real payload and circumventing some basic AV detection. In other words, they're useful for Red Teaming purposes. Besides that, they also make reverse engineering the payload a bit harder. We're now going to take a look at two of them.
Call4 Dword XOR
If we run this payload through a disassembler we get the following output:
$ ndisasm -u call4_dword.bin | head -n20
00000000  29C9              sub ecx,ecx
00000002  83E9AA            sub ecx,byte -0x56
00000005  E8FFFFFFFF        call 0x9
0000000A  C05E8176          rcr byte [esi-0x7f],byte 0x76
0000000E  0E                push cs
0000000F  6AD0              push byte -0x30
00000011  D807              fadd dword [edi]
00000013  83EEFC            sub esi,byte -0x4
00000016  E2F4              loop 0xc
00000018  96                xchg eax,esi
00000019  385A07            cmp [edx+0x7],bl
0000001C  6AD0              push byte -0x30
0000001E  B88E8FE118        mov eax,0x18e18f8e
00000023  63E1              arpl cx,sp
00000025  80E88C            sub al,0x8c
00000028  38DC              cmp ah,bl
0000002A  53                push ebx
0000002B  55                push ebp
0000002C  7E5B              jng 0x89
0000002E  AA                stosb
So what are we looking at here? We've disassembled the payload, but the output looks strange: the assembly seems off and we don't see it entering a decoding loop or anything similar. This is what Call4 Dword XOR does. It executes fine until it reaches the first call at address 0x5, which targets 0x9, and as you can see that address falls within the call instruction itself. So how does this work? It's a way to throw you off during static analysis: the call at 0x5 jumps 4 bytes ahead to 0x9, hence the name call4, so execution continues at the last byte of the call instruction, which a linear disassembler never treats as the start of an instruction. So what does the code look like from address 0x9 onward? Let's throw the file through the disassembler again, this time starting from offset 0x9:
$ ndisasm -u call4_dword_9.bin | head -n6
00000000  FFC0              inc eax
00000002  5E                pop esi
00000003  81760E6AD0D807    xor dword [esi+0xe],0x7d8d06a
0000000A  83EEFC            sub esi,byte -0x4
0000000D  E2F4              loop 0x3
0000000F  96                xchg eax,esi
Looking at this disassembly, we can see that we now have our decoding loop!
00000003  81760E6AD0D807    xor dword [esi+0xe],0x7d8d06a
0000000A  83EEFC            sub esi,byte -0x4
0000000D  E2F4              loop 0x3
As we can see, the decoding loop is a rather simple one: it XORs the dword at offset esi+0xe with the key 0x7d8d06a. After that instruction the payload subtracts -4 from the esi register, basically increasing it by 4. Then the code loops back to 0x3 to start over again. The data after this loop is all garbled for now, but it will be decoded as the loop runs.
Now this is an example of a simple encoder trying to throw you off, but after going through the simple XOR decoding loop you’ll get to the real payload.
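The arithmetic behind this can be checked by hand. The following is a minimal Python sketch, not our emulator: it assumes the twelve bytes at offsets 0x18 to 0x23 as they appear in the first ndisasm listing, and the names (`ENCODED`, `xor_dwords`, and so on) are made up for illustration. It computes where the call lands, where the first encoded dword sits, and what the first three dwords decode to.

```python
import struct

# Bytes 0x18..0x23 of call4_dword.bin, copied from the first ndisasm listing.
ENCODED = bytes.fromhex("96385a07" "6ad0b88e" "8fe11863")
KEY = 0x07D8D06A  # immediate operand of the xor instruction

CALL_ADDR = 0x5   # address of the E8 opcode
CALL_LEN = 5      # E8 + 4-byte relative displacement
REL32 = struct.unpack("<i", bytes.fromhex("ffffffff"))[0]  # displacement, -1

return_addr = CALL_ADDR + CALL_LEN   # pushed by the call, later popped into esi
call_target = return_addr + REL32    # where execution actually continues
first_dword = return_addr + 0xE      # xor dword [esi+0xe] on the first round


def xor_dwords(data: bytes, key: int) -> bytes:
    """XOR every little-endian dword in data with the same static key."""
    out = bytearray()
    for (dword,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", dword ^ key)
    return bytes(out)


decoded = xor_dwords(ENCODED, KEY)
print(hex(call_target), hex(first_dword), decoded.hex())
```

The first encoded dword sits at 0x18, directly after the loop instruction, which is why the disassembly from that point on looks like garbage. Interpreted as a little-endian integer, the first decoded dword is 0x0082e8fc, or 8579324 in decimal, the same value the emulator reports as the first XOR result for call4_dword.bin further down.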
Shikata Ga Nai
This is the only x86 encoder that isn't named exactly after what it does. If we look at the list of available x86 encoders, we see that this one is also the only one ranked excellent:
Name                          Rank       Description
----                          ----       -----------
x86/add_sub                   manual     Add/Sub Encoder
x86/alpha_mixed               low        Alpha2 Alphanumeric Mixedcase Encoder
x86/alpha_upper               low        Alpha2 Alphanumeric Uppercase Encoder
x86/avoid_underscore_tolower  manual     Avoid underscore/tolower
x86/avoid_utf8_tolower        manual     Avoid UTF8/tolower
x86/bloxor                    manual     BloXor - A Metamorphic Block Based XOR Encoder
x86/bmp_polyglot              manual     BMP Polyglot
x86/call4_dword_xor           normal     Call+4 Dword XOR Encoder
x86/context_cpuid             manual     CPUID-based Context Keyed Payload Encoder
x86/context_stat              manual     stat(2)-based Context Keyed Payload Encoder
x86/context_time              manual     time(2)-based Context Keyed Payload Encoder
x86/countdown                 normal     Single-byte XOR Countdown Encoder
x86/fnstenv_mov               normal     Variable-length Fnstenv/mov Dword XOR Encoder
x86/jmp_call_additive         normal     Jump/Call XOR Additive Feedback Encoder
x86/nonalpha                  low        Non-Alpha Encoder
x86/nonupper                  low        Non-Upper Encoder
x86/opt_sub                   manual     Sub Encoder (optimised)
x86/service                   manual     Register Service
x86/shikata_ga_nai            excellent  Polymorphic XOR Additive Feedback Encoder
x86/single_static_bit         manual     Single Static Bit
x86/unicode_mixed             manual     Alpha2 Alphanumeric Unicode Mixedcase Encoder
x86/unicode_upper             manual     Alpha2 Alphanumeric Unicode Uppercase Encoder
x86/xor_dynamic               normal     Dynamic key XOR Encoder
You might wonder why that is. One of the reasons is that this encoder uses a so-called rotating key. This means that the key changes with every round of decoding, so we can't extract the key once and decode the whole payload with it. To see this in action, the source of the emulator was modified to print the key every time an XOR instruction is executed:
$ x86emu revtcp86shik.bin | head -n15
XORing value: 1604174323 with key: 2741511143 resulting in: 4243972628
XORing value: 2690549523 with key: 2690516475 resulting in: 33512
XORing value: 1171777763 with key: 2690549987 resulting in: 3850985472
XORing value: 243476690 with key: 2246568163 resulting in: 2338635825
XORing value: 1137154372 with key: 290236692 resulting in: 1384853584
XORing value: 2005226088 with key: 1675090276 resulting in: 340953868
XORing value: 1996625659 with key: 2016044144 resulting in: 254309003
XORing value: 3061095500 with key: 2270353147 resulting in: 824593079
XORing value: 3645214029 with key: 3094946226 resulting in: 1631366399
XORing value: 966380749 with key: 431345329 resulting in: 539755132
XORing value: 954998508 with key: 971100461 resulting in: 17682369
XORing value: 1746747945 with key: 988782830 resulting in: 1391649479
XORing value: 2645559522 with key: 2380432309 resulting in: 273845079
XORing value: 352929159 with key: 2654277388 resulting in: 2335984267
XORing value: 3389606107 with key: 695294359 resulting in: 3816296780
Compared to call4_dword_xor, the difference becomes clear:
$ x86emu call4_dword.bin | head -n 15
XORing value: 123353238 with key: 131649642 resulting in: 8579324
XORing value: 2394476650 with key: 131649642 resulting in: 2304770048
XORing value: 1662574991 with key: 131649642 resulting in: 1690317285
XORing value: 2364047585 with key: 131649642 resulting in: 2335199371
XORing value: 1431559224 with key: 131649642 resulting in: 1384844370
XORing value: 799693694 with key: 131649642 resulting in: 678595348
XORing value: 563242853 with key: 131649642 resulting in: 642430735
XORing value: 997470043 with key: 131649642 resulting in: 1017970481
XORing value: 735751179 with key: 131649642 resulting in: 738360417
XORing value: 169283914 with key: 131649642 resulting in: 231719200
XORing value: 4114225003 with key: 131649642 resulting in: 4074948353
XORing value: 1431537464 with key: 131649642 resulting in: 1384863570
XORing value: 999447418 with key: 131649642 resulting in: 1011518224
XORing value: 2143919329 with key: 131649642 resulting in: 2014399627
XORing value: 3604584585 with key: 131649642 resulting in: 3506522339
shikata_ga_nai changes the XOR key every round, while most encoders (like
call4_dword_xor) use the same key throughout the decoding process.
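The key changes in the shikata_ga_nai trace are not random. Reading the numbers, each new key appears to be the previous key plus the dword that was just decoded, modulo 2^32, which fits the "Additive Feedback" in the encoder's description. The Python sketch below only checks that relationship against the first five rows printed above. The update rule is inferred from this one trace and is not a statement about every variant the encoder can generate, and `TRACE` and `next_key` are names made up for illustration.

```python
MASK = 0xFFFFFFFF  # keep arithmetic within 32 bits

# (value, key, result) for the first five XORs in the revtcp86shik.bin trace.
TRACE = [
    (1604174323, 2741511143, 4243972628),
    (2690549523, 2690516475, 33512),
    (1171777763, 2690549987, 3850985472),
    (243476690, 2246568163, 2338635825),
    (1137154372, 290236692, 1384853584),
]


def next_key(key: int, decoded: int) -> int:
    """Assumed feedback rule: add the decoded dword to the key, modulo 2**32."""
    return (key + decoded) & MASK


# Every row must be a plain XOR of value and key.
xor_ok = all(value ^ key == result for value, key, result in TRACE)

# Every key must follow from the previous row's key and decoded result.
feedback_ok = all(
    next_key(key, result) == following_key
    for (_, key, result), (_, following_key, _) in zip(TRACE, TRACE[1:])
)

print(xor_ok, feedback_ok)
```

Because each key depends on the plaintext of the previous round, you cannot decode a dword in the middle of the payload without first decoding everything before it, which is exactly what makes a single extracted key useless here.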
Another way to look at this process is to visualise every iteration of the decoding loop.
Another way Shikata Ga Nai makes itself harder to detect is the sheer number of encodings it can produce for the same functionality. It uses FPU instructions to figure out where exactly in memory it resides (getpc) so it can decode the payload. Looking at the source code that generates this encoder, we can see that the framework can choose from a cool 100 FPU instructions. These are chosen at random, so the final payload will rarely ever have the same binary representation twice.
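The FPU-based getpc trick is commonly described like this: an FPU instruction records its own address in the FPU environment, fnstenv [esp-0xc] writes that environment to the stack so that the saved instruction pointer ends up at the top of the stack, and a pop then loads it into a register. The Python sketch below is a crude byte-level heuristic for spotting that sequence. It is not how our engine works, and the sample bytes are invented for illustration rather than taken from the payload analysed in this post. `find_fnstenv_getpc` and its `window` parameter are hypothetical names.

```python
from typing import List

# Encoding of `fnstenv [esp-0xc]`; the FPU instruction in front of it can vary.
FNSTENV_ESP_M0C = bytes.fromhex("d97424f4")


def find_fnstenv_getpc(buf: bytes, window: int = 8) -> List[int]:
    """Return offsets of `fnstenv [esp-0xc]` that are followed, within
    `window` bytes, by a byte in the single-byte `pop r32` range (0x58..0x5f).

    This matches raw bytes without disassembling, so false positives are
    possible; treat a hit as a hint, not as proof.
    """
    hits = []
    pos = buf.find(FNSTENV_ESP_M0C)
    while pos != -1:
        tail = buf[pos + 4 : pos + 4 + window]
        if any(0x58 <= b <= 0x5F for b in tail):
            hits.append(pos)
        pos = buf.find(FNSTENV_ESP_M0C, pos + 1)
    return hits


# Invented sample: an FPU instruction (DA C1), fnstenv [esp-0xc], pop ebx.
sample = bytes.fromhex("dac1d97424f45b")
print(find_fnstenv_getpc(sample))
```

A static pattern like this only ever sees the outermost decoder stub. With multiple encoding iterations the inner stubs are still encoded on disk, which is one more reason we went for emulation instead of signatures alone.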
Putting it to practice
Of course, all of this is just talk so far; how does it work in practice? We noticed the following excellent blogpost on manually deobfuscating a Shikata Ga Nai payload. We turned the shellcode referred to at the end of that blogpost into an executable (since you can't execute raw shellcode directly on Windows) and uploaded it to tria.ge.
"Blog post complete https://t.co/fP1v4wow0n Tools re-uploaded with the current versions. Some were updated but not all." (David Ledbetter, @Ledtech3, February 25, 2020)
Turning the shellcode into an executable with three (3) iterations of Shikata Ga Nai may look as follows. Note that with ten (10) iterations it would have worked as well, but for no apparent reason we did just three.
./msfvenom -p- --platform windows -a x86 -e x86/shikata_ga_nai -i3 -f exe < /tmp/9F88A4BBAFF1B8F530EE29F7226B3338 > shikata.exe
Following is the analysis: tria.ge/reports/200305-gbt2r8qgkn.
As you'll notice, the behavioral analyses don't report much of interest yet, although we're working on signatures that trigger on the interesting situation of "nothing" happening at all except for one network IOC.
In the static analysis, however, our engine is going at full force, automatically extracting the shellcode, deobfuscating it, and extracting its one and only useful IOC.
Shellcode and shellcode encoders are an interesting and effective way to bypass many detection solutions. In our sandbox we do our very best to automatically and correctly extract as much information and malware configuration as possible.
In our next blog post we'll be covering a number of CobaltStrike samples that work rather similarly to Metasploit payloads. In the meantime, feel free to send us Metasploit and CobaltStrike payloads and perhaps we'll send some swag your way and/or cover them in the upcoming blogpost ;-)