Executing Dynamically Generated Machine Code: The Start of a JIT

30 Mar 2017

Just-in-time compilers (JITs) have to do something that most application programs never do: generate and execute machine code at runtime. In a previous post, I showed how to build an x86 assembler. Something like that can be used to construct machine code. But how can you execute that code? If you have an array of bytes, and you want the processor to execute them as machine code, how can you do that?

An example: 6 bytes of x86/x64 machine language

We need some machine code to execute. Let’s pick something simple.

The C function

uint32_t function() {
    return 0x12345678;
}

can be translated into assembly language as

mov eax, 12345678h
ret

which translates to the six-byte sequence

b8 78 56 34 12 c3

in machine code. (This is so simple it works on both x86 and x64 under Linux, Windows, and macOS.)
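
To make the byte layout concrete, here is a minimal sketch of a helper that emits the same six bytes for an arbitrary 32-bit constant. The name emit_return_constant is hypothetical; it is not taken from the assembler in the previous post. It shows that the opcode B8 is followed by the immediate value in little-endian order, which is why 0x12345678 appears as 78 56 34 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (an illustration, not part of the original program):
// writes the bytes for "mov eax, imm32" followed by "ret" into out and
// returns the number of bytes written.
size_t emit_return_constant(uint8_t out[6], uint32_t value) {
    assert(out != NULL);
    out[0] = 0xB8;                             // opcode: mov eax, imm32
    out[1] = (uint8_t)(value & 0xFF);          // immediate, lowest byte first
    out[2] = (uint8_t)((value >> 8) & 0xFF);
    out[3] = (uint8_t)((value >> 16) & 0xFF);
    out[4] = (uint8_t)((value >> 24) & 0xFF);  // immediate, highest byte last
    out[5] = 0xC3;                             // opcode: ret
    return 6;
}
```

Calling emit_return_constant(buf, 0x12345678) fills buf with b8 78 56 34 12 c3, the same sequence shown above.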

The right idea, which doesn’t quite work

Let’s say we have the six-byte array b8 78 56 34 12 c3 somewhere in memory. If we can treat those bytes of machine code as a function, we should be able to call it, and it should return 0x12345678.

In C, that’s not hard to do: take a pointer to the machine code, cast it as a function pointer, and call it.

#include <stdint.h>                      // ***** THIS PROGRAM WILL CRASH *****
#include <stdio.h>

uint8_t machine_code[] = { 0xB8, 0x78, 0x56, 0x34, 0x12, 0xC3 };

int main(int argc, char **argv) {
    uint32_t (*fn)() = (uint32_t (*)()) &machine_code;
    uint32_t result = fn(); // <--------------------------------- Segfault here
    printf("result = %u\n", result);
    return 0;
}

The only problem with this code is that it doesn’t work.

Segmentation fault (core dumped)

The operating system is protecting you from yourself

The code above has the right idea, but it doesn’t work because the operating system will try to prevent you from executing data. This is typically enforced by a hardware feature sometimes called an NX bit (no execute), which is enabled in macOS, Linux, and Windows. This feature first became available to consumers in 2004. At that time, buffer overflow vulnerabilities plagued the software industry. The NX bit was introduced to make it more difficult for attackers to execute arbitrary code after exploiting such vulnerabilities.

Of course, a JIT is one of the rare cases where a program wants to write data to memory and then execute it. To continue our small example, we need to convince the OS to let us do that.

Allocating memory and making it executable

The NX bit is a flag in each page table entry, which means that memory protections are set on a per-page basis. Memory protections for a particular page are changed using the mprotect(2) system call on Linux and macOS, and they’re changed using VirtualProtect on Windows.
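
Because protections apply to whole pages, it helps to know the page size and how to round addresses and lengths to page boundaries. The following is a rough sketch, assuming a POSIX system (Linux or macOS), that queries the page size with sysconf and does the rounding with bit masks. The helper names are hypothetical. On Windows, GetSystemInfo reports the page size instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>   // sysconf, _SC_PAGESIZE

// Hypothetical helper: the system page size in bytes, or 0 if it cannot be
// determined.
size_t system_page_size(void) {
    long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? (size_t)size : 0;
}

// Hypothetical helper: round addr down to the start of the page containing it.
// Assumes page_size is a power of two, which holds for x86 and x64 pages.
uintptr_t page_start(uintptr_t addr, size_t page_size) {
    return addr & ~((uintptr_t)page_size - 1);
}

// Hypothetical helper: round len up to a whole number of pages.
// Also assumes page_size is a power of two.
size_t round_up_to_pages(size_t len, size_t page_size) {
    return (len + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, a six-byte buffer still occupies one full page: round_up_to_pages(6, 4096) is 4096. Changing the protection of those six bytes therefore changes it for everything else on the same page.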

To store machine code in memory and then execute it:

1. Allocate a new page of memory, setting its protections to allow write access. This is done via the mmap(2) system call on Linux/macOS and VirtualAlloc on Windows. These system calls allocate full pages of memory, and the returned pointer is guaranteed to be page-aligned, suitable for passing to mprotect.
2. Copy the bytes of machine code into the newly allocated page.
3. Change the protections for the newly allocated page to read+execute. Generally speaking, a page should not be writable if it is executable.
4. On Windows, flush the instruction cache by invoking FlushInstructionCache. Your code will probably work without it, but the documentation for VirtualProtect dictates it.
5. Set a function pointer to an address in the newly allocated page, then invoke the function at that address.
6. When you are finished, free the page using munmap(2) or VirtualFree.

Ultimately, this results in the following code.

// x86/x64 Runtime Code Generation Demonstration (Linux/macOS/Windows)
// Copyright (C) 2017 Jeffrey L. Overbey.
// 
// Permission to use, copy, modify, and/or distribute this software for any
// purpose with or without fee is hereby granted.
// 
// THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
// WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
// SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
// WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
// OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
// CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

#if !defined(_WIN32) && !defined(_WIN64) ////////////////////////// Linux/macOS

#include <stddef.h>    // size_t
#include <stdint.h>    // uint8_t, uint32_t
#include <stdio.h>     // printf
#include <string.h>    // memcpy
#include <sys/mman.h>  // mmap, mprotect, munmap, MAP_FAILED

// Machine code for "mov eax, 12345678h" followed by "ret"
uint8_t machine_code[] = { 0xB8, 0x78, 0x56, 0x34, 0x12, 0xC3 };

int main(int argc, char **argv) {
    // Allocate a new page of memory, setting its protections to read+write
    void *mem = mmap(NULL, sizeof(machine_code), PROT_READ | PROT_WRITE,
        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Write the machine code into the newly allocated page
    memcpy(mem, machine_code, sizeof(machine_code));

    // Change the page protections to read+execute
    if (mprotect(mem, sizeof(machine_code), PROT_READ | PROT_EXEC) == -1) {
        perror("mprotect");
        return 2;
    }

    // Point a function pointer at the newly allocated page, then call it
    uint32_t(*fn)() = (uint32_t(*)()) mem;
    uint32_t result = fn();
    printf("result = 0x%x\n", result);

    // Free the memory
    if (munmap(mem, sizeof(machine_code)) == -1) {
        perror("munmap");
        return 3;
    }

    return 0;
}

#else // defined(_WIN32) || defined(_WIN64) /////////////////////////// Windows

#include <stddef.h>    // size_t
#include <stdint.h>    // uint8_t, uint32_t
#include <stdio.h>     // printf
#include <memory.h>    // memcpy_s
#include <tchar.h>     // Must be included before strsafe.h
#include <strsafe.h>
#include <windows.h>

// Display the error message corresponding to GetLastError() in a message box.
static void DisplayError(LPTSTR failedFunctionName);

uint8_t machine_code[] = { 0xB8, 0x78, 0x56, 0x34, 0x12, 0xC3 };

int _tmain(int argc, _TCHAR **argv) {
    // Allocate a new page of memory, setting its protections to read+write
    LPVOID mem = VirtualAlloc(NULL, sizeof(machine_code),
        MEM_COMMIT, PAGE_READWRITE);
    if (mem == NULL) {
        DisplayError(TEXT("VirtualAlloc"));
        return 1;
    }

    // Write the machine code into the newly allocated page
    if (memcpy_s(mem, sizeof(machine_code), machine_code, sizeof(machine_code))) {
        DisplayError(TEXT("memcpy_s"));
        return 2;
    }

    // Change the page protections to read+execute
    DWORD ignore;
    if (!VirtualProtect(mem, sizeof(machine_code), PAGE_EXECUTE_READ, &ignore)) {
        DisplayError(TEXT("VirtualProtect"));
        return 3;
    }

    // Flush the instruction cache
    if (!FlushInstructionCache(GetCurrentProcess(), mem, sizeof(machine_code))) {
        DisplayError(TEXT("FlushInstructionCache"));
        return 4;
    }

    // Point a function pointer at the newly allocated page, then call it
    uint32_t(*fn)() = (uint32_t(*)()) mem;
    uint32_t result = fn();
    _tprintf(TEXT("result = 0x%x\n"), result);

    // Free the memory
    if (!VirtualFree(mem, 0, MEM_RELEASE)) {
        DisplayError(TEXT("VirtualFree"));
        return 5;
    }

    return 0;
}

// from https://msdn.microsoft.com/en-us/library/windows/desktop/ms680582.aspx
static void DisplayError(LPTSTR failedFunctionName) {
    DWORD errorCode = GetLastError();
    LPVOID msgBufPtr;
    FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER
                      | FORMAT_MESSAGE_FROM_SYSTEM
                      | FORMAT_MESSAGE_IGNORE_INSERTS,
                  NULL,
                  errorCode,
                  MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                  (LPTSTR)&msgBufPtr,
                  0,
                  NULL);

    size_t size = sizeof(TCHAR) * (lstrlen((LPCTSTR)msgBufPtr)
                                   + lstrlen((LPCTSTR)failedFunctionName)
                                   + 40 /* Static text below */);
    LPVOID displayBufPtr = (LPVOID)LocalAlloc(LMEM_ZEROINIT, size);
    StringCchPrintf((LPTSTR)displayBufPtr,
                    LocalSize(displayBufPtr) / sizeof(TCHAR),
                    TEXT("%s failed with error %d: %s"),
                    failedFunctionName,
                    errorCode,
                    msgBufPtr);
    MessageBox(NULL, (LPCTSTR)displayBufPtr, TEXT("Error"), MB_ICONERROR);

    LocalFree(msgBufPtr);
    LocalFree(displayBufPtr);
}

#endif

Source Code:	execute.c
Makefiles:	GNUmakefile (GNU Make on Linux/macOS)
	Makefile (NMAKE on Windows)
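
The complete program above copies a fixed byte array into executable memory. As a small step toward an actual JIT, the following sketch builds the bytes at runtime for a caller-chosen constant and returns a callable function pointer. It is POSIX-only (Linux/macOS) and follows the same mmap, memcpy, mprotect sequence. The names const_fn, make_constant_fn, and free_constant_fn are hypothetical, and the function deliberately returns NULL on non-x86 targets, since the emitted bytes are x86/x64 machine code.

```c
#define _DEFAULT_SOURCE  // glibc: expose MAP_ANONYMOUS under strict -std=c99/c11
#include <stddef.h>
#include <stdint.h>
#include <string.h>    // memcpy
#include <sys/mman.h>  // mmap, mprotect, munmap, MAP_FAILED

// Hypothetical type: a generated function that takes no arguments and
// returns a 32-bit value in eax.
typedef uint32_t (*const_fn)(void);

enum { CONST_FN_SIZE = 6 };  // bytes in "mov eax, imm32" + "ret"

// Hypothetical helper: generate a function that returns `value`.
// Returns NULL if memory cannot be allocated or made executable, or if the
// program was not compiled for x86/x64.
const_fn make_constant_fn(uint32_t value) {
#if defined(__x86_64__) || defined(__i386__)
    uint8_t code[CONST_FN_SIZE] = {
        0xB8,                                 // mov eax, imm32
        (uint8_t)(value & 0xFF),              // immediate, little-endian
        (uint8_t)((value >> 8) & 0xFF),
        (uint8_t)((value >> 16) & 0xFF),
        (uint8_t)((value >> 24) & 0xFF),
        0xC3                                  // ret
    };

    // Allocate a read+write page and copy the generated bytes into it
    void *mem = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE,
        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED) {
        return NULL;
    }
    memcpy(mem, code, sizeof(code));

    // Switch the page to read+execute; it is never writable and executable
    // at the same time
    if (mprotect(mem, sizeof(code), PROT_READ | PROT_EXEC) == -1) {
        munmap(mem, sizeof(code));
        return NULL;
    }
    return (const_fn)mem;
#else
    (void)value;
    return NULL;  // The bytes above are only valid x86/x64 machine code
#endif
}

// Hypothetical helper: release a function created by make_constant_fn.
// Returns 0 on success and -1 on failure, like munmap.
int free_constant_fn(const_fn fn) {
    return munmap((void *)fn, CONST_FN_SIZE);
}
```

A caller would check the result for NULL, invoke it like any other function (for example, make_constant_fn(42) yields a function whose call returns 42), and pass it to free_constant_fn when done. Each generated function occupies a whole page here, which is wasteful; that choice just keeps the sketch short.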


Copyright © 2017 Jeffrey L. Overbey. All rights reserved. Except for source code where an explicit license is given, no part of this blog may be copied, reproduced, published, translated, or distributed, in whole or in part, without the written permission of the copyright owner.