Tut10: Symbolic Execution

In this tutorial, you will learn about symbolic execution, which is one of the most widely-used means for program analysis, and do some exercise with well-known symbolic execution engines, namely, KLEE and Angr.

1. Symbolic Execution

Generally, a program is "concretely" executed; it handles concrete values, e.g., an input value given by a user, and its behavior depends on this input.

Let's revisit crackme0x00 we encountered in lab01:

int main(int argc, char *argv[])
{
  int passwd;
  printf("IOLI Crackme Level 0x00\n");
  printf("Password: ");
  scanf("%d", &passwd);
  if (passwd == 3214)
    printf("Password OK :)\n");
  else
    printf("Invalid Password!\n");
  return 0;
}

When user gives an integer 3214 as input, it is stored in variable passwd. Then, the execution will follow the first if branch to print out "Password OK :)". Otherwise, it will take the other (i.e., else) branch, and print out "Invalid Password!". Then program was executed concretely, and the paths taken were determeined by the concrete value of passwd.

However, when we "symbolically" execute a program, a symbolic executor tracks symbolic states rather than concrete input by analyzing the program, and generates a set of test cases that can reach (theoretically) all paths existing in the program.

For example, if the same example is symbolically executed, the result would be two test cases: (1) passwd = 3214, that takes one branch, (2) passwd = 0, that takes the other branch.

Why do we do this? This technique comes in handy when we are trying to test a program for bug, because it helps us find which input, namely a test case, triggers which part (i.e., path) of the tested program by tracking symbolic expression and path constraints. Don't get lost! Here's an example.

1 | void buggy(int x, int y) {
2 |     int i = 10;
3 |     int z = y * 2;
4 |     if (z == x) {
5 |         if (x >= y + 10) {
6 |             z = z / (i - 10); /* Div-by-zero bug here */
7 |         }
8 |     }
9 | }

Have you spotted the division-by-zero bug in line 6? When x is 100 and y is 50, z becomes 100 in line 3. Thus, the first if branch is taken, and as x (100) >= y + 10 (60), the program reaches line 6. Here, z / (i - 10) triggers a division-by-zero bug, because i is 10.

As a program tester (or a bug hunter), you want to automatically find the pair of x and y that triggers the bug. Symbolic execution is a perfect match for this job.

Once we mark x and y as symbolic variables, two mappings are put in a symbolic store S: {x->x0, y->y0}, where a var->sym mapping indicates that "a variable var is represented by a symbolic expression sym." (Symbolic expressions are placeholders for unknown values.) Likewise, for z = y * 2, variable z is symbolically represented as z->2*y0, and this mapping is added to S. In addition, before encountering a branch, path constraint PC is true.

S: {x->x0, y->y0, z->2*y0}, PC: true

Program execution diverges at the first branch, if (z == x):

path1: skip if with PC: (x0 != 2*y0), S: {x->x0, y->y0, z->2*y0}
path2: step inside if with PC: (x0 == 2*y0), S: {x->x0, y->y0, z->2*y0}

Path p1 directly reaches line 8, and has nothing left to do.

(path1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0)

Path p2 then encounters another branch condition if (x >= y + 10), which renders two paths again:

path2-1: skip if with PC: (x0 != 2*y0) AND (x0 < y0+10)
path2-2: take if with PC: (x0 != 2*y0) AND (x0 >= y0+10)

Path p2-1 is done.

(path2-1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0) AND (x0 < y0+10)

Now, the only remaining path is path2-2. The executor proceeds to line 6, where z in the symbolic store S is updated:

(path2-2) S: {x->x0, y->y0, z->2*y0/0}, PC: (x0 != 2*y0) AND (x0 >= y0+10)

We ended up with three paths, with three sets of symbolic states that trigger each path: path1, path2-1, and path2-2. Now, a constraint solver jumps in to solve each path constraints and find concrete values that satisfy the constraint.

For example, Z3 constraint solver solves each path constraint to have

(path1) : x = -1, y = 0
(path2-1): x = 0, y = 0
(path2-2): x = 1,073,741,792, y = 5,368,870,896

We now have three automatically generated test cases, with which we can explore each and every execution path of the program. Providing the x and y of the third test case, the division by zero bug will be triggered.

2. Using KLEE for symbolic execution

So, how is symbolic execution done in practice? KLEE is a powerful symbolic execution engine built on top of LLVM compiler infrastructure, targeting C code.

KLEE exercise 1: crackme0x00

Let's open crackme0x00.c and check its contents. This program prints out "Password OK :)" when 3214 is provided as an input.

cd /tut/tut10-02-symexec/tut1-klee
vim crackme0x00.c

Step 1) Annotation

Originally, this program read user input through scanf("%d", &passwd);. For symbolic execution, we comment this line out, and make KLEE handle variable passwd symbolically, by explicitly marking passwd as a symbolic variable:

// scanf("%d", &passwd);
klee_make_symbolic(&passwd, sizeof(passwd), "passwd");

You need to specify symbolic variables like above, in order for KLEE to consider them as symbolic variables, and keep track of their states during a symbolic execution.

Step 2) Compiling target program to LLVM bitcode

KLEE operates on LLVM bitcode. With the symbolic variables annotated, we first need to compile our program to an LLVM bitcode:

$ clang-6.0 -I ./include -c -emit-llvm -g -O0 crackme0x00.c

crackme0x00.bc is the resulting bitcode, and we are ready to run KLEE on it.

Step 3) Running KLEE

KLEE is already installed on the server. You can start running an analysis by $ klee (options) [bitcode_file]:

$ klee crackme0x00.bc
KLEE: output directory is "klee-out-0"
KLEE: Using Z3 solver backend
KLEE: WARNING: undefined reference to function: printf
KLEE: WARNING ONCE: calling external: printf(93914628656128) at crackme0x00.c:12 3
IOLI Crackme Level 0x00
Password: Invalid Password!
Password OK :)

KLEE: done: total instructions = 23
KLEE: done: completed paths = 2
KLEE: done: generated tests = 2

Step 4) Interpreting the result and reproducing the bug

We've just symbolically executed our program, and KLEE reported that it could reach two paths through symbolic execution, and generate test case for each path:

KLEE: done: completed paths = 2
KLEE: done: generated tests = 2

The generated test cases and metadata is stored under the output directory. If you ran KLEE multiple times, the directory's name could be different, but the latest result can always be referenced by klee-last, which symbolically links to the latest output directory:

KLEE: output directory is "klee-out-0"

Now, let's check the actual test cases generated by KLEE, and try to reproduce each case against the binary.

$ ls klee-last | grep ktest
test000001.ktest
test000002.ktest

A ktest file is a serialized object of a generated test case. It can be analyzed through ktest-tool utility that comes with KLEE. Let's examine how the first test case looks like:

$ ktest-tool klee-last/test000001.ktest
ktest file : 'klee-last/test000001.ktest'
args       : ['crackme0x00.bc']
num objects: 1
object 0: name: 'passwd'
object 0: size: 4
object 0: data: b'\x8e\x0c\x00\x00'
object 0: hex : 0x8e0c0000
object 0: int : 3214
object 0: uint: 3214
object 0: text: ....

We can find that passwd is 3214:

object 0: name: 'passwd'
object 0: int : 3214

If we run the program with this concrete value, it will print "Password OK :)" as expected. We can compile the program and verify this by replaying the generated test case:

$ gcc -I ./include crackme0x00.c -lkleeRuntest -o crackme0x00
$ KTEST_FILE=klee-last/test000001.ktest ./crackme0x00
IOLI Crackme Level 0x00
Password: Password OK :)

As expected, the first test case printed "Password OK :)".

Now, let's investigate the second test case:

ktest file : 'klee-last/test000002.ktest'
args       : ['crackme0x00.bc']
num objects: 1
object 0: name: 'passwd'
object 0: size: 4
object 0: data: b'\x00\x00\x00\x00'
object 0: hex : 0x00000000
object 0: int : 0
object 0: uint: 0
object 0: text: ....

In this test case, passwd is 0. This test case will take the else branch, and print "Invalid Password!":

$ KTEST_FILE=klee-last/test000002.ktest ./crackme0x00
IOLI Crackme Level 0x00
Password: Invalid Password!

As shown, KLEE symbolically executed crackme0x00 by tracking the symbolic states of variable passwd, and found all (i.e., two) possible execution paths in the program.

Well, this is the basic workflow of KLEE. In the following section, we will utilize KLEE to crack other crackme challenges.

KLEE exercise 2: crackme0x01 - 0x03

In crackme0x01, our objective is to find an input that would make the binary print "Password OK :)". The steps are not different from what we did for crackme0x00.

First, remember to include klee header:

#include "klee/klee.h"

And then, mark the buffer to store our input symbolic:

klee_make_symbolic(&buf, sizeof(buf), "buf");
// scanf("%s", buf);

Now, compile and symbolically execute the program using KLEE:

$ clang-6.0 -I include -c -g -emit-llvm -O0 crackme0x01.c
$ klee --libc-uclibc crackme0x01.bc

Do you see that it indeed printed "Password OK :)" at the end?

IOLI Crackme Level 0x00
Password: Invalid Password!
Invalid Password!
Invalid Password!
Invalid Password!
Invalid Password!
Invalid Password!
Invalid Password!
Password OK :)

KLEE: done: total instructions = 12938
KLEE: done: completed paths = 8
KLEE: done: generated tests = 8

Let's check and replay the last (8th) test case to see if it really is the input that we are looking for:

$ ktest-tool klee-last/test000008.ktest
object 0: name: 'buf'
object 0: text: 250381.222222222

$ gcc -I ./include crackme0x01.c -lkleeRuntest -o crackme0x01
$ KTEST_FILE=klee-last/test000008.ktest ./crackme0x01
IOLI Crackme Level 0x01
Password: Password OK :)

Yes, KLEE is capable of handling symbolic variable that goes through a call to strcpy();, and find corresponding path. Take a look at test cases one to seven, and you would be able to imagine the steps KLEE took to find paths and corresponding test cases in this example.

Now, we have crackme0x02 and crackme0x03 left. crackme0x02 has a if statement, which checks if the input * 345 equals to 1190940. Would KLEE work in this case? crackme0x03 has a weird-looking shifting mechanism, but it will print "Password OK :)" if a certain condition is met. Your task is to further explore KLEE to find the inputs for those two binaries that makes them print "Password OK :)".

KLEE exercise 3: Finding buffer overflow

Our next target is bof.c. It has a classic buffer overflow bug, by which the buf variable in vuln() function can be overflown by an input string provided by a user.

Your task is to run KLEE on this target to find the buggy test case. These are the required steps (same as above): (1) Mark input as symbolic. (2) Remove the code that reads user input, because KLEE will auto-generate symbolic values for input. (3) Compile bof.c to LLVM bitcode, namely, bof.bc (refer to Exercise 1). (4) Run klee, and investigate the results. (5) Replay the buggy test case to confirm the bug.

Have you found the test case to trigger the bug? We have examined simple cases, but imagine you have many larger, complicated programs to analyze with limited amount of time. You could be assisted by this automated technique!

For further technical details, check the official paper published in OSDI'08.

KLEE paper

One caveat lying here is that KLEE requires a source code along with an LLVM compiler toolchain to conduct symbolic execution. Then, what if we only have a binary, but still want to do symbolic execution? Angr comes into play in this case.

3. Using Angr for symbolic execution

Angr is a user-friendly binary analysis framework. With its Python API, you can symbolically execute a program and do various analysis, without the existence of a source code.

In this tutorial, we will learn how to find a desired execution path and corresponding input through Angr framework.

Angr exercise 1: crackme0x00

Now, you only have a binary that asks you to input a password. Instead of brute-forcing, we can take advantage of Angr's symbolic execution that runs directly on binaries to find the desired input. Let's open crackme0x00.py, and follow its procedure.

cd /tut/tut10-02-symexec/tut2-angr
vim crackme0x00.py

Step 1) Importing Angr module and loading binary

Angr's analysis always begins with loading a binary into a Project object. If you want to analyze crackme0x00 binary, do:

import angr

proj = angr.Project("crackme0x00")

Step 2) Find and specify target addresses

With Angr, we can specify the target address in the binary that we want to reach, (preferrably a buggy basic block), and have the constraint solver find the corresponding test case by solving the collected path constraints. Let's run gdb and analyze the binary to find the target address:

$ gdb-pwndbg ./crackme0x00
pwndbg> disass main
...
   0x08049328 <+112>:   push   0x804a095
   0x0804932d <+117>:   call   0x80491f6 <print_key>

The main function is calling print_key function, and it seems that if we somehow reach there, it would print the flag for us.

Back to crackme0x00.py. Angr provides a loader, which helps you find symbols from the binary (like what pwntools does):

addr_main = proj.loader.find_symbol("main").rebased_addr
addr_target = addr_main + 112  # push 0x804a095

Step 3) Define an initial state and initiate simulation manager

Now that we have the address of main, where we want to start analysis, we can define an initial state as follows:

state = proj.factory.entry_state(addr=addr_main)

Simulation manager is a control interface for Angr's symbolic execution. With the defined state, we can initiate this module:

sm = proj.factory.simulation_manager(state)

Step 4) Run symbolically, and verify the test case

explore method of the simulation manager lets us symbolically execute the binary until it finds the state satisfyig the find parameter. In this case, addr_target will be given as a parameter. And until the simulation manager finds the path to the addr_target, we can keep stepping through the instructions:

sm.explore(find=addr_target)
while len(sm.found) == 0:
    sm.step()

If a path is found, it will dump the input and verify the test case:

if (len(sm.found) > 0):
    print("found!")
    found_input = sm.found[0].posix.dumps(0) # this is the stdin
    print(found_input)
    with open("input-crackme0x00", "wb") as fp:
        fp.write(found_input)

Now, let's run the script and check if Angr really finds the desired path.

$ ./crackme0x00.py
Finding input
...
found!
b'250381\x00\xd9\xd9..'

Starting from function main @0x080492b8, Angr symbolically executed the crackme0x00 binary to find the state that can reach the basic block at 0x08049328. It successfully found the block, and by solving the path constraints, emitted the test case as "250381\x00..."

Verifying this is straightforward, as we can now concretely execute the binary with the found test case:

$ ./crackme0x00 < input-crackme0x00
IOLI Crackme Level 0x00
Password: Password OK :)
FLAG

As expected, the test case printed the flag!

Angr exercise 2: crackme0x00-canary

Another interesting example is when the binary has canary implemented. Launch crackme0x00-canary and feed it with different inputs:

$ ./crackme0x00-canary
IOLI Crackme Level 0x00
Password:aaaabbbb
Invalid Password!

$ ./crackme0x00-canary
IOLI Crackme Level 0x00
Password:aaaabbbbccccddddeeee
Invalid Password!
crackme0x00-canary: *** stack smashing detected ***

In case we provided 20-byte input, the stack smashing seems to be detected through a canary, and because of that, we cannot not control the eip of this binary. In such case, could Angr help us find an input that can even bypass the canary check? (heads up: this binary implements a custom, weak canary, where the value is fixed.)

Let's take a look at crackme0x00-canary.py. The flow of symbolic execution is similar to that of the previous exercise, but note that we have to take advantage of an "unconstrained state" to solve this challenge.

Typically, when the size of a symbolic variable is known, a symbolic executor only considers values within the size. For example, if our character buffer of 16 bytes is marked symbolic, all the symbolic paths are reachable with an input that is shorter than 16 bytes, because the variable is constrained by its size. However, we know that the size of input to be stored in the buffer could be larger than the size, causing some troubles. To test such situation, unconstrained is used:

sm = proj.factory.simulation_manager(save_unconstrained=True)
while len(sm.unconstrained) == 0:
    sm.step()

This lets the simulation manager symbolically execute the target program until an effective unconstrained input (i.e., triggering buffer overflow in this case) is found. We can dump the stdin of this case, and see what happened:

unconstrained_state = sm.unconstrained[0]
crashing_input = unconstrained_state.posix.dumps(0)

print("found!")
print(repr(crashing_input))
with open("input-crackme0x00-canary", "wb") as fp:
    fp.write(crashing_input)

$ ./crackme0x00-canary.py
finding buffer overflow & bypassing static canary
...
found!
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x81\x14\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01'

Do you see that the input is long enough to overflow the buffer, overwrite the canary, and even the return address? Also, bytes 17-21 of the input are 0xdeadbeef. Guess what the static canary is?

We can verify this result by running the binary with the dumped input:

$ ./crackme0x00-canary < input-crackme0x00-canary
IOLI Crackme Level 0x00
Password:Invalid Password!
[1]    13740 segmentation fault (core dumped)  ./crackme0x00-canary < input-crackme0x00-canary

It indeed triggered a buffer overflow, and a segmentation fault, which means that Angr also found the canary!

Now you can start examining the core dump, and learn which part of the input should be changed to hijack the control flow:

$ gdb-pwndbg ./crackme0x00-canary core
pwndbg> info registers eip
eip            0x2148100        0x2148100
pwndbg> quit

$ xxd input-crackeme0x00-canary
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: efbe adde 0000 0000 0000 0000 0000 0000  ................
00000020: 0081 1402 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0101            ............

Now you know where to modify from the input to make the program jump to any place you want. Try making it jump to your shellcode, and print the flag!

Angr exercise 3: crackme0x01 - 0x03

Practice writing scripts for symbolic execution using Angr framework against the rest of the crackme binaries. Your task is to find the input that makes each binary print out "Password OK :)".

Angr exercise 4: Cracking password

Let's take a look at another example, pwd.

$ ./pwd
Enter the password: 12345678
Access denied.

$ ./pwd
./pwd
Enter the password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Access denied.
*** stack smashing detected ***: <unknown> terminated
[1]    18317 abort (core dumped)  ./pwd

This binary appears to be a password authenticator, and we want to crack it by finding the password string using Angr. Take a look at the given template, pwd_template.py, and fill in the required parts by analyzing the pwd binary.

Ultimately, we are looking for the input (i.e., valid password), which will make pwd binary print "Access granted!". FYI, once the path is found, you can print the stdin through:

print("input: {0}".format(sm.active[0].posix.dumps(sys.stdin.fileno())))

[TASK] Analyze the binary, complete and execute the Angr script to find the password, and verify it against the pwd binary.

CS6265: Information Security Lab