In-Depth Metasploit Payloads Analysis

Analyzing Metasploit Payloads

Introduction

In the previous Metasploit blogpost our x86 emulator was introduced along with some basic detection and decoding of Metasploit’s payloads. This post will explain why we chose the emulator approach, giving a more in-depth look into the techniques payloads use to hinder static analysis.

(Skip to the end of the blogpost if you’re solely interested in the juicy analysis results of an in-the-wild Metasploit payload on tria.ge.)

Anti-Analysis

Shellcode encoders have a few uses. From an exploit development standpoint they’re useful for circumventing bad characters: bytes, such as null bytes, that the vulnerable program would truncate or mangle before the shellcode runs.

Exploit development is of course not the only use for encoders. As stated in our previous post, they’re also great for obfuscating the real payload and circumventing some basic AV detection. In other words, they’re useful for Red Teaming purposes. Besides that, they also make reverse engineering the payload a bit harder. We’re now going to take a look at the call4_dword_xor encoder.

Call4 Dword XOR

If we run this payload through a disassembler we get the following output:

$ ndisasm -u call4_dword.bin | head -n20
00000000  29C9              sub ecx,ecx
00000002  83E9AA            sub ecx,byte -0x56
00000005  E8FFFFFFFF        call 0x9
0000000A  C05E8176          rcr byte [esi-0x7f],byte 0x76
0000000E  0E                push cs
0000000F  6AD0              push byte -0x30
00000011  D807              fadd dword [edi]
00000013  83EEFC            sub esi,byte -0x4
00000016  E2F4              loop 0xc
00000018  96                xchg eax,esi
00000019  385A07            cmp [edx+0x7],bl
0000001C  6AD0              push byte -0x30
0000001E  B88E8FE118        mov eax,0x18e18f8e
00000023  63E1              arpl cx,sp
00000025  80E88C            sub al,0x8c
00000028  38DC              cmp ah,bl
0000002A  53                push ebx
0000002B  55                push ebp
0000002C  7E5B              jng 0x89
0000002E  AA                stosb

So what are we looking at here? We’ve disassembled the payload, but the output looks off: there is no recognisable decoding loop or anything similar. This is Call4 Dword XOR at work. The code executes normally until it reaches the first call. That call targets 0x9, an address that falls inside the call instruction itself (its final 0xFF byte). This is a way to throw you off during static analysis: a linear disassembler carries on at 0xA and decodes the bytes that follow out of alignment. The call at address 0x5 jumps to 0x5 + 4 = 0x9, hence the name call4. So what does the code look like from address 0x9 onward? Let’s run the file through the disassembler again, this time starting at offset 0x9:

$ ndisasm -u call4_dword_9.bin | head -n6
00000000  FFC0              inc eax
00000002  5E                pop esi
00000003  81760E6AD0D807    xor dword [esi+0xe],0x7d8d06a
0000000A  83EEFC            sub esi,byte -0x4
0000000D  E2F4              loop 0x3
0000000F  96                xchg eax,esi
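
The jump into the call instruction can be checked with a little arithmetic. The Python sketch below decodes the E8 call at offset 0x5 of the first listing. A near call carries a signed 32-bit displacement relative to the instruction that follows it. The helper name call_target is our own and only serves the illustration.

```python
import struct

def call_target(code: bytes, offset: int) -> int:
    """Return the destination of the E8 rel32 call located at `offset`."""
    assert code[offset] == 0xE8, "not a near relative call"
    # rel32 is a signed little-endian displacement from the *next* instruction,
    # and the call itself is 5 bytes long (opcode + 4 displacement bytes).
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    return offset + 5 + rel32

# First bytes of call4_dword.bin, copied from the first listing.
stub = bytes.fromhex("29c9" "83e9aa" "e8ffffffff" "c05e")

target = call_target(stub, 5)
print(hex(target))        # 0x9
print(hex(stub[target]))  # 0xff, the last byte of the call itself
```

The displacement 0xFFFFFFFF is -1, so execution resumes one byte before the end of the call. That 0xFF byte pairs with the following 0xC0 to form the inc eax seen at the top of the second listing.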

In the disassembly from offset 0x9 we can now see our decoding loop:

00000003  81760E6AD0D807    xor dword [esi+0xe],0x7d8d06a
0000000A  83EEFC            sub esi,byte -0x4
0000000D  E2F4              loop 0x3

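As we can see, the decoding loop is a rather simple one. The pop esi before it loads the return address that the call pushed onto the stack, so esi points into the payload itself. The loop XORs the dword at esi+0xe with the key 0x07d8d06a. The next instruction subtracts -4 from the esi register, which increases it by 4. Then the code loops back to 0x3 and starts over, once for each of the 0x56 dwords counted in ecx (set up by the first two instructions of the payload). The data after this loop is still garbled at this point and is decoded in place as the loop runs.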
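
To illustrate, here is a minimal Python sketch of the same loop applied to the first 16 encoded bytes, which sit at offsets 0x18 to 0x27 in the first listing. The key and the bytes are copied from the disassembly above. The function name xor_dwords is our own.

```python
import struct

KEY = 0x07D8D06A  # XOR key taken from the decoder stub

# Encoded bytes at offsets 0x18..0x27 of call4_dword.bin (first listing).
encoded = bytes.fromhex("96385a07" "6ad0b88e" "8fe11863" "e180e88c")

def xor_dwords(data: bytes, key: int) -> bytes:
    """XOR every little-endian dword in `data` with a fixed 32-bit key."""
    out = bytearray()
    for (value,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", value ^ key)
    return bytes(out)

decoded = xor_dwords(encoded, KEY)
print(decoded.hex())  # fce8820000006089e531c0648b50308b
```

The first decoded dword, 0x0082e8fc, is 8579324 in decimal.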

Now this is an example of a simple encoder trying to throw you off, but after going through the simple XOR decoding loop you’ll get to the real payload.

Shikata Ga Nai

This is the only x86 encoder that isn’t named after what it does. If we look at the list of available x86 encoders, we see that it is also the only one ranked excellent.

    Name                          Rank       Description
    ----                          ----       -----------
    x86/add_sub                   manual     Add/Sub Encoder
    x86/alpha_mixed               low        Alpha2 Alphanumeric Mixedcase Encoder
    x86/alpha_upper               low        Alpha2 Alphanumeric Uppercase Encoder
    x86/avoid_underscore_tolower  manual     Avoid underscore/tolower
    x86/avoid_utf8_tolower        manual     Avoid UTF8/tolower
    x86/bloxor                    manual     BloXor - A Metamorphic Block Based XOR Encoder
    x86/bmp_polyglot              manual     BMP Polyglot
    x86/call4_dword_xor           normal     Call+4 Dword XOR Encoder
    x86/context_cpuid             manual     CPUID-based Context Keyed Payload Encoder
    x86/context_stat              manual     stat(2)-based Context Keyed Payload Encoder
    x86/context_time              manual     time(2)-based Context Keyed Payload Encoder
    x86/countdown                 normal     Single-byte XOR Countdown Encoder
    x86/fnstenv_mov               normal     Variable-length Fnstenv/mov Dword XOR Encoder
    x86/jmp_call_additive         normal     Jump/Call XOR Additive Feedback Encoder
    x86/nonalpha                  low        Non-Alpha Encoder
    x86/nonupper                  low        Non-Upper Encoder
    x86/opt_sub                   manual     Sub Encoder (optimised)
    x86/service                   manual     Register Service
    x86/shikata_ga_nai            excellent  Polymorphic XOR Additive Feedback Encoder
    x86/single_static_bit         manual     Single Static Bit
    x86/unicode_mixed             manual     Alpha2 Alphanumeric Unicode Mixedcase Encoder
    x86/unicode_upper             manual     Alpha2 Alphanumeric Unicode Uppercase Encoder
    x86/xor_dynamic               normal     Dynamic key XOR Encoder

You might wonder why that is. One of the reasons is that this encoder uses a so-called rotating key, the “additive feedback” in its description. The key changes in every round of decoding, so we cannot extract the key once and decode the whole payload with it. To see this in action, we modified the source of the emulator to print the key every time an XOR instruction is executed:

$ x86emu revtcp86shik.bin | head -n15
XORing value: 1604174323  with key: 2741511143  resulting in: 4243972628
XORing value: 2690549523  with key: 2690516475  resulting in: 33512
XORing value: 1171777763  with key: 2690549987  resulting in: 3850985472
XORing value: 243476690   with key: 2246568163  resulting in: 2338635825
XORing value: 1137154372  with key: 290236692   resulting in: 1384853584
XORing value: 2005226088  with key: 1675090276  resulting in: 340953868
XORing value: 1996625659  with key: 2016044144  resulting in: 254309003
XORing value: 3061095500  with key: 2270353147  resulting in: 824593079
XORing value: 3645214029  with key: 3094946226  resulting in: 1631366399
XORing value: 966380749   with key: 431345329   resulting in: 539755132
XORing value: 954998508   with key: 971100461   resulting in: 17682369
XORing value: 1746747945  with key: 988782830   resulting in: 1391649479
XORing value: 2645559522  with key: 2380432309  resulting in: 273845079
XORing value: 352929159   with key: 2654277388  resulting in: 2335984267
XORing value: 3389606107  with key: 695294359   resulting in: 3816296780

Compared to call4_dword_xor, the difference becomes clear:

$ x86emu call4_dword.bin | head -n 15
XORing value: 123353238   with key: 131649642   resulting in: 8579324
XORing value: 2394476650  with key: 131649642   resulting in: 2304770048
XORing value: 1662574991  with key: 131649642   resulting in: 1690317285
XORing value: 2364047585  with key: 131649642   resulting in: 2335199371
XORing value: 1431559224  with key: 131649642   resulting in: 1384844370
XORing value: 799693694   with key: 131649642   resulting in: 678595348
XORing value: 563242853   with key: 131649642   resulting in: 642430735
XORing value: 997470043   with key: 131649642   resulting in: 1017970481
XORing value: 735751179   with key: 131649642   resulting in: 738360417
XORing value: 169283914   with key: 131649642   resulting in: 231719200
XORing value: 4114225003  with key: 131649642   resulting in: 4074948353
XORing value: 1431537464  with key: 131649642   resulting in: 1384863570
XORing value: 999447418   with key: 131649642   resulting in: 1011518224
XORing value: 2143919329  with key: 131649642   resulting in: 2014399627
XORing value: 3604584585  with key: 131649642   resulting in: 3506522339

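Concluding: shikata_ga_nai changes the XOR key every round (in the trace above, each new key equals the previous key plus the dword that was just decoded, modulo 2^32), while most encoders (like call4_dword_xor) use the same key throughout the decoding process.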
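
The feedback can be reproduced in a few lines of Python. The sketch below assumes the update rule that the first lines of the Shikata Ga Nai trace are consistent with: the next key is the current key plus the decoded dword, truncated to 32 bits. The function name feedback_decode is our own, and variants of the encoder may order the operations differently.

```python
MASK = 0xFFFFFFFF

def feedback_decode(values, key):
    """Decode dwords with XOR, feeding each decoded dword back into the key."""
    decoded = []
    for value in values:
        plain = value ^ key
        decoded.append(plain)
        # Assumed update rule, inferred from the emulator trace.
        key = (key + plain) & MASK
    return decoded

# Encoded values and initial key from the first four trace lines.
values = [1604174323, 2690549523, 1171777763, 243476690]
initial_key = 2741511143

print(feedback_decode(values, initial_key))
# [4243972628, 33512, 3850985472, 2338635825]
```

The output matches the "resulting in" column of the trace, which a single static key could not produce.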

Another way to look at this process is to visualise every iteration:

[GIF showing the iterations of Shikata Ga Nai]

Another way Shikata Ga Nai makes itself harder to detect is the sheer number of encodings it can have for the same functionality. It uses FPU instructions to figure out where exactly in memory it resides (getpc) so it can decode the payload. Looking at the source code that generates this encoder, we can see that the number of FPU instructions the framework can choose from is a cool 100. These are chosen at random, so the final payload will rarely have the same binary representation twice.
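
As a rough illustration of why byte signatures struggle here, the Python sketch below looks for one common shape of an FPU-based getpc: fnstenv [esp-0xc] (bytes D9 74 24 F4) immediately followed by a pop into a 32-bit register. This is a deliberately naive heuristic of our own, not how our engine works. Real Shikata Ga Nai output may interleave other instructions between the two, so this would miss many variants. The sample bytes are hypothetical.

```python
FNSTENV_ESP_MINUS_12 = bytes.fromhex("d97424f4")  # fnstenv [esp-0xc]

def find_fnstenv_getpc(code: bytes) -> list:
    """Return offsets where fnstenv [esp-0xc] is directly followed by pop r32."""
    hits = []
    start = code.find(FNSTENV_ESP_MINUS_12)
    while start != -1:
        nxt = start + len(FNSTENV_ESP_MINUS_12)
        # Opcodes 0x58..0x5F encode pop eax .. pop edi.
        if nxt < len(code) and 0x58 <= code[nxt] <= 0x5F:
            hits.append(start)
        start = code.find(FNSTENV_ESP_MINUS_12, start + 1)
    return hits

# Hypothetical sample: fldz; fnstenv [esp-0xc]; pop ebx
sample = bytes.fromhex("d9ee" "d97424f4" "5b")
print(find_fnstenv_getpc(sample))  # [2]
```

Because the FPU instruction in front of fnstenv is picked at random, only the fnstenv and pop part is stable enough to match on, and even that can be reordered. This is one reason emulation is the more robust approach.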

Putting it to practice

So much for the theory, but how does it work in practice? We came across an excellent blogpost on manually deobfuscating a Shikata Ga Nai payload. We turned the shellcode referred to at the end of that blogpost into an executable (since you can’t execute raw shellcode directly on Windows) and uploaded it to tria.ge.

Turning the shellcode into an executable with three iterations of Shikata Ga Nai may look as follows. Ten iterations would have worked just as well; we picked three for no particular reason.

./msfvenom -p- --platform windows -a x86 -e x86/shikata_ga_nai -i3 -f exe < /tmp/9F88A4BBAFF1B8F530EE29F7226B3338 > shikata.exe

Following is the analysis: tria.ge/reports/200305-gbt2r8qgkn.

As you’ll notice, the behavioral analysis doesn’t report much of interest yet, although we’re working on signatures that trigger on the interesting situation of “nothing” happening at all except for one network IOC.

In the static analysis, however, our engine is going at full force, automatically extracting the shellcode, deobfuscating it, and extracting its one and only useful IOC (5.61.59.234:8080).

Conclusion

Shellcode and shellcode encoders are an interesting and effective way to bypass many security solutions. In our sandbox we do our very best to automatically and correctly extract as much information and malware configuration as possible.

In our next blog post we’ll be covering a number of CobaltStrike samples that work rather similarly to Metasploit payloads. In the meantime, feel free to send us Metasploit and CobaltStrike payloads and perhaps we’ll send some swag your way and/or cover them in the upcoming blogpost ;-)
