Detecting Info Leaks from Uninitialised Memory by Seeding Memory Values (or a poor man's MSan)

NOTE: This explains explains one of the concepts used in LeakFuzzer ([source here] [paper here]) and NIFuzz ([source here]).

This post explains part of the approach taken in my work on finding information leaks with fuzzers. In particular, it explains how we can consistently detect information making its way to program output from uninitialised memory sources. These can be in-bounds reads of variables that have not been (fully) initialised, or out-of-bounds reads.

Information leaks can most obviously be concerning because they can reveal confidential data such as user passwords, encryption keys, or other personal data. But they can be also used to bypass security mechanisms such as ASLR, stack canaries, or PAC; making it possible to gain a foothold in a system for further exploitation.

Memory Reuse

In C and C++, memory is not zeroed (cleared) when you relinquish ownership; in C++ this is part of the 'zero-overhead principle' - i.e. "you don't pay for what you don't use". This is an issue when combined with uninitialised memory reads, as the reused memory could still contain sensitive information.

Heap Memory Reuse

To start off with let's look at how heap memory is allocated in this trivial example program:

int main(void) {
  char *myFirstArray = (char *)malloc(25);
  // populate myFirstArray = [0, 1, 2, 3, ..., 24]
  for (int i = 0; i < 25; i++) myFirstArray[i] = i;
  free(myFirstArray);
  char *mySecondArray = (char *)malloc(25);
  // Print out the memory addresses at the start of the arrays
  printf("myFirstArray: %p, mySecondArray: %p\n", myFirstArray, mySecondArray);
  printf("mySecondArray contents: [%hhu", mySecondArray[0]);
  for (int i = 1; i < 25; i++) printf(", %hhu", mySecondArray[i]);
  printf("]\n");
}

Running this program produces the output:

myFirstArray: 0x5e4f623512a0, mySecondArray: 0x5e4f623512a0
mySecondArray contents: [81, 35, 246, 228, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 17, 18, 19, 20, 21, 22, 23, 24]

Notice that the addresses of `myFirstArray` and `mySecondArray` are the same; this is because after freeing `myFirstArray`, the subsequent malloc call reuses that (now free) space. Check it out for yourself with the Godbolt Compiler Explorer (note that the addresses will change between executions due to ASLR). It is due to this memory reuse that uninitialised heap allocations can potentially hold sensitive information - notice how the final line of output still contains the values [16, 17, ...24] in tact. The first part of the data - [0, 1, ...15] - has been clobbered; probably due to some bookkeeping by the `malloc` implementation. Notably, the reuse of memory means that even if the memory either side of an allocation is currently unallocated, it could also still contain sensitive information from previous usage.

Stack Memory Reuse

Let's look at how stack memory is reused in this example program:

int main(void) {
  char superSecret[] = {'P','a','s','s','w','0','r','d','1'};
  mySecretHandler(superSecret, sizeof(superSecret));
  // (B)
  printf("%s\n", superSecret);
  myOtherFunction();
}

void mySecretHandler(char *secret, unsigned int length) {
  // Allocate a buffer on the stack
  char buffer[length];
  // (A)
  // reverse the secret, for some reason...
  for (int i = 0; i < length; i++) {
    buffer[i] = secret[length - i - 1];
  }
  // Copy it back into `secret`
  memcpy(secret, buffer, length);
}

void myOtherFunction() {
  char anotherBuffer[9];
  // (C)
  ... // some other stuff
}

The following diagram shows the stack layout at the lines marked with comments (A) and (B):

Note that the stack grows downwards from high memory addresses to low memory addresses; hence the `main` function appears at the top. The area indicated by each '{' or '}' symbol is called a 'stack frame', and there is one per currently called function. Calling a function pushes a new stack frame, and returning from a function pops a stack frame. The function return address (along with other information omitted from this diagram) is stored in the stack frame, followed by any local variables for the function. The dark green stack frame is the currently active one, the light green is owned but not active, and the red hatched stack frame is no longer in scope. Notice that the out-of-scope (red hatched) stack frame still contains the same data - i.e. it is not cleared.

The next diagram shows the stack layout at the lines marked with comments (B) and (C):

The key point here is that in the right hand side of the diagram (at the line marked (C)), the stack frame for `myOtherFunction` occupies the space previously used by `mySecretHandler`. In particular, the local variable `anotherBuffer` is at the same location as the `buffer` variable was in `mySecretHandler`. And guess what? It can still contain the reversed secret that we never bothered clearing out! Note, that this is not guaranteed; compilers have quite some freedom and are allowed to omit unused variables, and inline functions (copying and pasting the code into the call site) meaning that they get no stack frame of its own.

Struct Padding

In C, struct members are guaranteed¹ to be layed out in the order they're defined. Sometimes, in order to keep memory accesses aligned, the compiler will add padding between members. On some platforms this is done purely for performance reasons, whereas others won't allow unaligned accesses. The alignment is to some aribtrary boundary - generally multiples of 4, 8 or even 64 bytes (which is the size of a cache line). This padding is opaque to the developer from the source code (though IIRC you can get the information post-compile with pahole).

Let's take the following struct definition as an example:

struct Move {
  char direction; // 1 byte
  int distance; // 4 bytes
}

Here, despite storing just 5 bytes of actual information, the struct will take up 8 bytes in memory - with 3 bytes of padding sitting after the member `direction`, to align `distance` with a multiple of 4. We can confirm this by printing out the memory addresses using the following program:

int main(void) {
  const char UP = 0, LEFT = 1, DOWN = 2, RIGHT = 3;
  struct Move myMoves[] = {{ UP, 16 }, { LEFT, 8 }};
  printf("myMove[0]: direction: %p, distance: %p\n", 
         &myMoves[0].direction, &myMoves[0].distance);
  printf("myMove[1]: direction: %p, distance: %p\n", 
         &myMoves[1].direction, &myMoves[1].distance);
}

Even with the struct members swapped, we see that the real memory footprint of the struct is 8 bytes (as the compiler adds trailing padding).

This padding is an issue, as we've seen already that memory gets reused. Uninitialised data from these padding bytes could make its way to program output.

Struct Padding in the Wild

Let's take a look at a real example of an information leak from stack memory through struct padding from the Linux kernel - CVE-2009-2847:

typedef struct sigaltstack {
  void __user *ss_sp;
  int ss_flags;
  size_t ss_size;
} stack_t;

int
do_sigaltstack (const stack_t __user *uss,
                stack_t __user *uoss, 
                unsigned long sp)
{
  stack_t oss; // (A)
  ...
  if (uoss) {
    // (B)
    oss.ss_sp = (void __user *) current->sas_ss_sp;
    oss.ss_size = current->sas_ss_size;
    oss.ss_flags = sas_ss_flags(sp);
  }
  ...
  if (uoss) {
    // (C)
    if (copy_to_user(uoss, &oss, sizeof(oss)))
  ...
  }
}

As this is operating in kernel space, any memory values are potentially sensitive; and only variables marked with `__user` are accessible to user space (e.g. `const stack_t __user *uss`). We see that at (A) `stack_t oss` is declared as a local variable (thus on the stack in kernel space), and then copied to user space at (C) by the `copy_to_user` function (which functions similarly to `memcpy`). We also see that at (B), all 3 struct members are initialised. At first glance, it would be expected that the `copy_to_user` function thus only copies across initialised values. In fact, on 64-bit systems with 8-byte alignment, the struct is layed out like so:

+----------------+-----------+-----------+-----------+-----------+-----------+------------+
| Offset (bytes) |    0-3    |    4-7    |    8-11   |   12-15   |   16-19   |    20-23   |
+----------------+-----------+-----------+-----------+-----------+-----------+------------+
| Struct member  |         ss_sp         |  ss_flags | [padding] |        ss_size         |
+----------------+-----------+-----------+-----------+-----------+-----------+------------+

Note the padding from bytes 12-15. This padding is not cleared, and the data from the uninitialised memory is copied to user space. As the padding bytes are not directly accessible to us as a developer, we would need to use a call to `memset` in order to clear them.

So, Finding These Through Fuzzing is Easy?

As I discussed here; confirming an information leak can be done by finding a pair of tests with matching non-confidential (unprivileged) inputs and differing confidential (privileged) inputs that result in different non-confidential outputs.

For the above Linux kernel example, the non-confidential inputs are the pointers `uss` and `uoss`, which are provided as arguments to the `do_sigaltstack` function. The non-confidential output is the contents of `uoss`, which are updated by the call to `copy_to_user`.

We can consider anything apart from `uss` and `uoss` (and the contents of the structs they point to) as confidential - and given that this may contain values (due to memory reuse) that come from outside this function, we can consider it (confidential) input.

So, given that we know that it is possible to leak information from stack memory due to the struct padding, it should be possible to find a series of function calls that will populate this padding with some 'interesting' information. For example, if our goal was bypassing ASLR, this could be a pointer to a kernel object.

The issue here is that when using an off-the-shelf fuzzer that generates byte-array inputs, we would need to construct a complex fuzzing harness to parse the byte-array input into a series of function calls. Syzkaller is a specialised kernel fuzzer that generates sequences of syscalls and hence can do this; but I wanted a more general solution that kept fuzzing harnesses simple.

Assuming that we create a fuzzing harness which only calls the `do_sigaltstack` function, we expect the values contained in the uninitialised stack memory to be the same for each execution. If that is the case, then we can't find a pair of tests that differ in output, and hence demonstrate the leak.

The whole reason that information leaks through uninitialised memory are dangerous is that the reused memory contains information that has been built up over time. This is also true for information leaks from out-of-bounds memory reads - if nothing else has been allocated yet, then all you can possibly 'leak' is the zeroed memory that the OS gave you. The fact that fuzzing harnesses need to execute quickly, essentially means that none of this will happen; making finding these issues through fuzzing unlikely.

How do we Find These Information Leaks Through Fuzzing Then?

My approach is to literally treat the contents of uninitialised memory as an (confidential) input; then get the fuzzer to generate values to populate it. I call this prepoluation process 'seeding' the memory contents.

The following implementation details are specific to Unix-like systems. As the stack is a fixed size (which can configured with `ulimit -s`), we can get this size at the start - I get the top and bottom from `/proc/self/maps` with the `get_stack_top` and `get_min_stack_bottom` functions in here. I also wrap the `malloc`, `realloc` and `free` functions (in the same file - each wrapper is the target function name prefixed with `__wrap_`).

When landing in the fuzzing harness, I populate the stack with a repeated set of values provided by the fuzzer as part of the input - this is done with the 'FILL_STACK' macro here. The `malloc` (and family) wrappers use the same values to prepoulate sections of heap memory when they are allocated.

The following psuedocode indicates how such a fuzzing harness is constructed:

def fuzzer_harness(input):
  # parse out the memory seed value and public (non-confidential) inputs
  memory_seed, public_inputs = input.parse()
  # If there are any setup functions, they must be called here
  ...
  # Populate the stack with the memory seed value
  FILL_STACK(memory_seed)
  # Call the target function (i.e. the functionality we wish to fuzz)
  target_function(public_inputs)

Notice that the `FILL_STACK` macro is used after any setup functions, so that all of the stack memory below the `fuzzer_harness` is seeded. If it was used before the setup functions, then the seeded values would likely be clobbered by them.

Memory Seeding Values

As the memory seeding functions repeatedly paste the provided `memory_seed` value into the memory area (stack or heap allocation) until it is full, I believe that the two best values to start with in order to maximise the chance of detecting a leak are 0b10101010 (0xAA) and 0b01010101 (0x55). This is because every bit is flipped in one region compared to the other - meaning that even a single leaked bit will be discoverable in output. This is not the case for more recognisable values like 0xDEADBEEF or 0xBADF00D.

Example

Let's look this kernel example again, but simplified even further this time and with a fuzzing harness:

// Our target function requires this struct
typedef struct sigaltstack {
  void __user *ss_sp;
  int ss_flags;
  size_t ss_size;
} stack_t;

// Fuzzing harness
int LLVMFuzzerTestOneInput(const char *Data, unsigned int length) {
  // This is our 'secret' input
  char *memorySeed;
  unsigned int memorySeedLength;
  // `inputStack` and `inputSp` are the public inputs
  stack_t inputStack;
  unsigned long inputSp;
  // This `outputStack` will be populated by `do_sigaltstack`
  stack_t outputStack;

  // (A)
  // Parse the input into memory seed and public input
  parseInput(Data, length, &memorySeed, &memorySeedLength, &inputStack, &inputSp);

  // (B)
  // Populate the stack with the memory seed value (repeated)
  FILL_STACK(memorySeed, memorySeedLength);

  // (C)
  // Call the target function
  int res = do_sigaltstack(&inputStack, &outputStack, inputSp);

  // Output the results of the function call
  write(stdout, &res, sizeof(res));
  write(stdout, &outputStack, sizeof(outputStack));
   
  return 0;
}

// pretend that copy_to_user is a wrapper around memcpy
void copy_to_user(void *dst, void *src, size_t size) {
  memcpy(dst, src, size);
}

int do_sigaltstack (
  const stack_t __user *uss,
  stack_t __user *uoss, 
  unsigned long sp
) {
  stack_t oss;
  // (D)
  oss.ss_sp = uss->ss_sp;
  oss.ss_size = uss->ss_size;
  oss.ss_flags = uss->ss_flags;
  copy_to_user(uoss, &oss, sizeof(oss))
  // (E)
  return 0;
}

We will now walk through the stack memory contents at the different program points; to make things simpler, we consider only the local variables on the stack (pretending that there are no return address or arguments stored in each stack frame). At the program point marked (A), the stack memory is as follows:

Offset (bytes)	Variable	Value
0-7	memorySeed	0
8-11	memorySeedLength	0
12-27	inputStack	{ NULL, 0, 0 }
28-35	inputSp	0
36-53	outputStack	{ NULL, 0, 0 }
54-	[unallocated]	{ 0, 0, 0, 0, ... }

At the program point marked (B), the call to `parse_input` has overwritten the memory contents below the current local variables (marked as [unallocated]). Now, our value of `memorySeed` depends on the input. We have also populated the public inputs as follows: `inputStack` = { 0x11223344, 1, 20 }, `inputSp` = 0x55667788. The resulting stack is as follows:

Offset (bytes)	Variable	Value when seed is {0xAA}	Value when seed is {0x55}
0-7	memorySeed	Pointer to {0xAA}	Pointer to {0x55}
8-11	memorySeedLength	1	1
12-27	inputStack	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
28-35	inputSp	0x55667788	0x55667788
36-53	outputStack	{ NULL, 0, 0 }	{ NULL, 0, 0 }
54-	[unallocated]	{ 200, 181, 14, 17, ... }	{ 200, 181, 14, 17, ... }

At the program point marked (C), We have seeded the stack memory below our current stack frame (notice that [unallocated] has changed):

Offset (bytes)	Variable	Value when seed is {0xAA}	Value when seed is {0x55}
0-7	memorySeed	Pointer to {0xAA}	Pointer to {0x55}
8-11	memorySeedLength	1	1
12-27	inputStack	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
28-35	inputSp	0x55667788	0x55667788
36-53	outputStack	{ NULL, 0, 0 }	{ NULL, 0, 0 }
54-	[unallocated]	{ 0xAA, 0xAA, 0xAA, ... }	{ 0x55, 0x55, 0x55, ... }

Now at program point (D), the formerly [unallocated] memory is being used to store local variables for `do_sigaltstack` (in this case, just `oss`):

Offset (bytes)	Variable	Value when seed is {0xAA}	Value when seed is {0x55}
0-7	memorySeed	Pointer to {0xAA}	Pointer to {0x55}
8-11	memorySeedLength	1	1
12-27	inputStack	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
28-35	inputSp	0x55667788	0x55667788
36-53	outputStack	{ NULL, 0, 0 }	{ NULL, 0, 0 }
54-69	oss	{ 0xAAAA..., 0xAAAAAAAA, 0xAAAA...}	{ 0x5555..., 0x55555555, 0x5555... }
70-	[unallocated]	{ 0xAA, 0xAA, 0xAA, ... }	{ 0x55, 0x55, 0x55, ... }

At program point (E), `do_sigaltstack`s local variable `oss` has been populated, and copied back into `uoss` (aka `outputStack`):

Offset (bytes)	Variable	Value when seed is {0xAA}	Value when seed is {0x55}
0-7	memorySeed	Pointer to {0xAA}	Pointer to {0x55}
8-11	memorySeedLength	1	1
12-27	inputStack	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
28-35	inputSp	0x55667788	0x55667788
36-53	outputStack	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
54-69	oss	{ 0x11223344, 1, 20 }	{ 0x11223344, 1, 20 }
70-	[unallocated]	{ 0xAA, 0xAA, 0xAA, ... }	{ 0x55, 0x55, 0x55, ... }

Note that the member values of `oss` have been copied from `inputStack` (as pointed to by `uss`). The unseen 4 padding bytes between the second and third members (`ss_size` and `ss_flags`) have not been touched. They are still {0xAA, 0xAA, 0xAA, 0xAA} or {0x55, 0x55, 0x55, 0x55} depending on the memory seed value. As a result, the program outputs are as follows (assuming big-endian byte order for readability):

with seed {0xAA}: {0, 0, 0, 0, 0, 0x11, 0x22, 0x33, 0x44, 0, 0, 0, 1, 0xAA, 0xAA, 0xAA, 0xAA, 0, 0, 0, 0, 0, 0, 0, 20}
with seed {0x55}: {0, 0, 0, 0, 0, 0x11, 0x22, 0x33, 0x44, 0, 0, 0, 1, 0x55, 0x55, 0x55, 0x55, 0, 0, 0, 0, 0, 0, 0, 20}

Now we can see the difference in outputs due to the leaked information in the padding bytes, and we didn't have to construct a complex fuzzing harness to do so. Notice also that if we were to read from the [unallocated] memory (requiring a buffer underflow) and output it, then we'd also observe a difference in output that is dependent on the memory seed value. Finally, remember that we also seed heap memory allocations; and to make it possible to catch under- / over-reads, we over-allocate and seed either side of the returned buffer.

So, it's like MSan but worse?

MemorySanitizer (MSan) can detect certain uses of uninitialised memory; according to the docs it can detect the following: 'used in conditional branch', 'dereferenced uninitialised pointer', 'uninitialised value returned from function'. Unfortunately in the above example, none of these conditions are met. Additionally, due to their ability to weaken system defences such as ASLR and stack canaries, information leaks are of higher significance than some other uses of uninitialised memory. Thus, being aware of their presence can be useful in prioritising bug fixes.

Secondly, MSan uses a shadow memory to track the initialisation state. It is possible that in some ultra constrained systems, where the 2x memory overhead is not viable, the memory seeding approach could be justified in its use.

That being said, MSan offers much more fine grained detection of uninitialised memory uses; and I believe that it is preferable to run it when fuzzing in general. The memory seeding approach could be used in triaging uninitialised memory uses that have already been detected by MSan.

Conclusion

Memory seeding is an approach to populating memory with non-zero values, in order to make it possible to detect information leaks from uninitialised memory with simple fuzzing harnesses. It works by taking a 'seed' value from the fuzzer as input, and repeatedly copying this value into stack memory, and new heap allocations. To make it possible to detect under- and overflows in heap memory, the memory is over-allocated and seeded either side of the returned buffer.

¹ well, not all of the time ↩