In this tutorial, you will learn about symbolic execution, which is one of the most widely-used means for program analysis, and do some exercise with well-known symbolic execution engines, namely, KLEE and Angr.
Generally, a program is "concretely" executed; it handles concrete values, e.g., an input value given by a user, and its behavior depends on this input.
Let's revisit crackme0x00 we encountered in lab01:
crackme0x00
int main(int argc, char *argv[]) { int passwd; printf("IOLI Crackme Level 0x00\n"); printf("Password: "); scanf("%d", &passwd); if (passwd == 3214) printf("Password OK :)\n"); else printf("Invalid Password!\n"); return 0; }
When user gives an integer 3214 as input, it is stored in variable passwd. Then, the execution will follow the first if branch to print out "Password OK :)". Otherwise, it will take the other (i.e., else) branch, and print out "Invalid Password!". Then program was executed concretely, and the paths taken were determeined by the concrete value of passwd.
3214
passwd
if
else
However, when we "symbolically" execute a program, a symbolic executor tracks symbolic states rather than concrete input by analyzing the program, and generates a set of test cases that can reach (theoretically) all paths existing in the program.
For example, if the same example is symbolically executed, the result would be two test cases: (1) passwd = 3214, that takes one branch, (2) passwd = 0, that takes the other branch.
passwd = 3214
passwd = 0
Why do we do this? This technique comes in handy when we are trying to test a program for bug, because it helps us find which input, namely a test case, triggers which part (i.e., path) of the tested program by tracking symbolic expression and path constraints. Don't get lost! Here's an example.
1 | void buggy(int x, int y) { 2 | int i = 10; 3 | int z = y * 2; 4 | if (z == x) { 5 | if (x >= y + 10) { 6 | z = z / (i - 10); /* Div-by-zero bug here */ 7 | } 8 | } 9 | }
Have you spotted the division-by-zero bug in line 6? When x is 100 and y is 50, z becomes 100 in line 3. Thus, the first if branch is taken, and as x (100) >= y + 10 (60), the program reaches line 6. Here, z / (i - 10) triggers a division-by-zero bug, because i is 10.
x
y
z
z / (i - 10)
i
As a program tester (or a bug hunter), you want to automatically find the pair of x and y that triggers the bug. Symbolic execution is a perfect match for this job.
Once we mark x and y as symbolic variables, two mappings are put in a symbolic store S: {x->x0, y->y0}, where a var->sym mapping indicates that "a variable var is represented by a symbolic expression sym." (Symbolic expressions are placeholders for unknown values.) Likewise, for z = y * 2, variable z is symbolically represented as z->2*y0, and this mapping is added to S. In addition, before encountering a branch, path constraint PC is true.
S: {x->x0, y->y0}
var->sym
var
sym
z = y * 2
z->2*y0
PC
true
S: {x->x0, y->y0, z->2*y0}, PC: true
S: {x->x0, y->y0, z->2*y0}
PC: true
Program execution diverges at the first branch, if (z == x):
if (z == x)
PC: (x0 != 2*y0)
PC: (x0 == 2*y0)
Path p1 directly reaches line 8, and has nothing left to do.
(path1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0)
Path p2 then encounters another branch condition if (x >= y + 10), which renders two paths again:
if (x >= y + 10)
PC: (x0 != 2*y0) AND (x0 < y0+10)
PC: (x0 != 2*y0) AND (x0 >= y0+10)
Path p2-1 is done.
(path2-1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0) AND (x0 < y0+10)
Now, the only remaining path is path2-2. The executor proceeds to line 6, where z in the symbolic store S is updated:
(path2-2) S: {x->x0, y->y0, z->2*y0/0}, PC: (x0 != 2*y0) AND (x0 >= y0+10)
S: {x->x0, y->y0, z->2*y0/0}
We ended up with three paths, with three sets of symbolic states that trigger each path: path1, path2-1, and path2-2. Now, a constraint solver jumps in to solve each path constraints and find concrete values that satisfy the constraint.
For example, Z3 constraint solver solves each path constraint to have
We now have three automatically generated test cases, with which we can explore each and every execution path of the program. Providing the x and y of the third test case, the division by zero bug will be triggered.
So, how is symbolic execution done in practice? KLEE is a powerful symbolic execution engine built on top of LLVM compiler infrastructure, targeting C code.
Let's open crackme0x00.c and check its contents. This program prints out "Password OK :)" when 3214 is provided as an input.
crackme0x00.c
cd /tut/tut10-02-symexec/tut1-klee vim crackme0x00.c
Originally, this program read user input through scanf("%d", &passwd);. For symbolic execution, we comment this line out, and make KLEE handle variable passwd symbolically, by explicitly marking passwd as a symbolic variable:
scanf("%d", &passwd);
// scanf("%d", &passwd); klee_make_symbolic(&passwd, sizeof(passwd), "passwd");
You need to specify symbolic variables like above, in order for KLEE to consider them as symbolic variables, and keep track of their states during a symbolic execution.
KLEE operates on LLVM bitcode. With the symbolic variables annotated, we first need to compile our program to an LLVM bitcode:
$ clang-6.0 -I ./include -c -emit-llvm -g -O0 crackme0x00.c
crackme0x00.bc is the resulting bitcode, and we are ready to run KLEE on it.
crackme0x00.bc
KLEE is already installed on the server. You can start running an analysis by $ klee (options) [bitcode_file]:
$ klee (options) [bitcode_file]
$ klee crackme0x00.bc KLEE: output directory is "klee-out-0" KLEE: Using Z3 solver backend KLEE: WARNING: undefined reference to function: printf KLEE: WARNING ONCE: calling external: printf(93914628656128) at crackme0x00.c:12 3 IOLI Crackme Level 0x00 Password: Invalid Password! Password OK :) KLEE: done: total instructions = 23 KLEE: done: completed paths = 2 KLEE: done: generated tests = 2
We've just symbolically executed our program, and KLEE reported that it could reach two paths through symbolic execution, and generate test case for each path:
KLEE: done: completed paths = 2 KLEE: done: generated tests = 2
The generated test cases and metadata is stored under the output directory. If you ran KLEE multiple times, the directory's name could be different, but the latest result can always be referenced by klee-last, which symbolically links to the latest output directory:
KLEE: output directory is "klee-out-0"
Now, let's check the actual test cases generated by KLEE, and try to reproduce each case against the binary.
$ ls klee-last | grep ktest test000001.ktest test000002.ktest
A ktest file is a serialized object of a generated test case. It can be analyzed through ktest-tool utility that comes with KLEE. Let's examine how the first test case looks like:
ktest-tool
$ ktest-tool klee-last/test000001.ktest ktest file : 'klee-last/test000001.ktest' args : ['crackme0x00.bc'] num objects: 1 object 0: name: 'passwd' object 0: size: 4 object 0: data: b'\x8e\x0c\x00\x00' object 0: hex : 0x8e0c0000 object 0: int : 3214 object 0: uint: 3214 object 0: text: ....
We can find that passwd is 3214:
object 0: name: 'passwd' object 0: int : 3214
If we run the program with this concrete value, it will print "Password OK :)" as expected. We can compile the program and verify this by replaying the generated test case:
$ gcc -I ./include crackme0x00.c -lkleeRuntest -o crackme0x00 $ KTEST_FILE=klee-last/test000001.ktest ./crackme0x00 IOLI Crackme Level 0x00 Password: Password OK :)
As expected, the first test case printed "Password OK :)".
Now, let's investigate the second test case:
ktest file : 'klee-last/test000002.ktest' args : ['crackme0x00.bc'] num objects: 1 object 0: name: 'passwd' object 0: size: 4 object 0: data: b'\x00\x00\x00\x00' object 0: hex : 0x00000000 object 0: int : 0 object 0: uint: 0 object 0: text: ....
In this test case, passwd is 0. This test case will take the else branch, and print "Invalid Password!":
$ KTEST_FILE=klee-last/test000002.ktest ./crackme0x00 IOLI Crackme Level 0x00 Password: Invalid Password!
As shown, KLEE symbolically executed crackme0x00 by tracking the symbolic states of variable passwd, and found all (i.e., two) possible execution paths in the program.
Well, this is the basic workflow of KLEE. In the following section, we will utilize KLEE to crack other crackme challenges.
In crackme0x01, our objective is to find an input that would make the binary print "Password OK :)". The steps are not different from what we did for crackme0x00.
crackme0x01
First, remember to include klee header:
#include "klee/klee.h"
And then, mark the buffer to store our input symbolic:
klee_make_symbolic(&buf, sizeof(buf), "buf"); // scanf("%s", buf);
Now, compile and symbolically execute the program using KLEE:
$ clang-6.0 -I include -c -g -emit-llvm -O0 crackme0x01.c $ klee --libc-uclibc crackme0x01.bc
Do you see that it indeed printed "Password OK :)" at the end?
IOLI Crackme Level 0x00 Password: Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Password OK :) KLEE: done: total instructions = 12938 KLEE: done: completed paths = 8 KLEE: done: generated tests = 8
Let's check and replay the last (8th) test case to see if it really is the input that we are looking for:
$ ktest-tool klee-last/test000008.ktest object 0: name: 'buf' object 0: text: 250381.222222222 $ gcc -I ./include crackme0x01.c -lkleeRuntest -o crackme0x01 $ KTEST_FILE=klee-last/test000008.ktest ./crackme0x01 IOLI Crackme Level 0x01 Password: Password OK :)
Yes, KLEE is capable of handling symbolic variable that goes through a call to strcpy();, and find corresponding path. Take a look at test cases one to seven, and you would be able to imagine the steps KLEE took to find paths and corresponding test cases in this example.
Now, we have crackme0x02 and crackme0x03 left. crackme0x02 has a if statement, which checks if the input * 345 equals to 1190940. Would KLEE work in this case? crackme0x03 has a weird-looking shifting mechanism, but it will print "Password OK :)" if a certain condition is met. Your task is to further explore KLEE to find the inputs for those two binaries that makes them print "Password OK :)".
crackme0x02
crackme0x03
Our next target is bof.c. It has a classic buffer overflow bug, by which the buf variable in vuln() function can be overflown by an input string provided by a user.
bof.c
buf
vuln()
Your task is to run KLEE on this target to find the buggy test case. These are the required steps (same as above): (1) Mark input as symbolic. (2) Remove the code that reads user input, because KLEE will auto-generate symbolic values for input. (3) Compile bof.c to LLVM bitcode, namely, bof.bc (refer to Exercise 1). (4) Run klee, and investigate the results. (5) Replay the buggy test case to confirm the bug.
input
bof.bc
Have you found the test case to trigger the bug? We have examined simple cases, but imagine you have many larger, complicated programs to analyze with limited amount of time. You could be assisted by this automated technique!
For further technical details, check the official paper published in OSDI'08.
One caveat lying here is that KLEE requires a source code along with an LLVM compiler toolchain to conduct symbolic execution. Then, what if we only have a binary, but still want to do symbolic execution? Angr comes into play in this case.
Angr is a user-friendly binary analysis framework. With its Python API, you can symbolically execute a program and do various analysis, without the existence of a source code.
In this tutorial, we will learn how to find a desired execution path and corresponding input through Angr framework.
Now, you only have a binary that asks you to input a password. Instead of brute-forcing, we can take advantage of Angr's symbolic execution that runs directly on binaries to find the desired input. Let's open crackme0x00.py, and follow its procedure.
crackme0x00.py
cd /tut/tut10-02-symexec/tut2-angr vim crackme0x00.py
Angr's analysis always begins with loading a binary into a Project object. If you want to analyze crackme0x00 binary, do:
import angr proj = angr.Project("crackme0x00")
With Angr, we can specify the target address in the binary that we want to reach, (preferrably a buggy basic block), and have the constraint solver find the corresponding test case by solving the collected path constraints. Let's run gdb and analyze the binary to find the target address:
$ gdb-pwndbg ./crackme0x00 pwndbg> disass main ... 0x08049328 <+112>: push 0x804a095 0x0804932d <+117>: call 0x80491f6 <print_key>
The main function is calling print_key function, and it seems that if we somehow reach there, it would print the flag for us.
print_key
Back to crackme0x00.py. Angr provides a loader, which helps you find symbols from the binary (like what pwntools does):
addr_main = proj.loader.find_symbol("main").rebased_addr addr_target = addr_main + 112 # push 0x804a095
Now that we have the address of main, where we want to start analysis, we can define an initial state as follows:
main
state = proj.factory.entry_state(addr=addr_main)
Simulation manager is a control interface for Angr's symbolic execution. With the defined state, we can initiate this module:
sm = proj.factory.simulation_manager(state)
explore method of the simulation manager lets us symbolically execute the binary until it finds the state satisfyig the find parameter. In this case, addr_target will be given as a parameter. And until the simulation manager finds the path to the addr_target, we can keep stepping through the instructions:
explore
find
addr_target
sm.explore(find=addr_target) while len(sm.found) == 0: sm.step()
If a path is found, it will dump the input and verify the test case:
if (len(sm.found) > 0): print("found!") found_input = sm.found[0].posix.dumps(0) # this is the stdin print(found_input) with open("input-crackme0x00", "wb") as fp: fp.write(found_input)
Now, let's run the script and check if Angr really finds the desired path.
$ ./crackme0x00.py Finding input ... found! b'250381\x00\xd9\xd9..'
Starting from function main @0x080492b8, Angr symbolically executed the crackme0x00 binary to find the state that can reach the basic block at 0x08049328. It successfully found the block, and by solving the path constraints, emitted the test case as "250381\x00..."
main @0x080492b8
0x08049328
Verifying this is straightforward, as we can now concretely execute the binary with the found test case:
$ ./crackme0x00 < input-crackme0x00 IOLI Crackme Level 0x00 Password: Password OK :) FLAG
As expected, the test case printed the flag!
Another interesting example is when the binary has canary implemented. Launch crackme0x00-canary and feed it with different inputs:
crackme0x00-canary
$ ./crackme0x00-canary IOLI Crackme Level 0x00 Password:aaaabbbb Invalid Password! $ ./crackme0x00-canary IOLI Crackme Level 0x00 Password:aaaabbbbccccddddeeee Invalid Password! crackme0x00-canary: *** stack smashing detected ***
In case we provided 20-byte input, the stack smashing seems to be detected through a canary, and because of that, we cannot not control the eip of this binary. In such case, could Angr help us find an input that can even bypass the canary check? (heads up: this binary implements a custom, weak canary, where the value is fixed.)
eip
Let's take a look at crackme0x00-canary.py. The flow of symbolic execution is similar to that of the previous exercise, but note that we have to take advantage of an "unconstrained state" to solve this challenge.
crackme0x00-canary.py
Typically, when the size of a symbolic variable is known, a symbolic executor only considers values within the size. For example, if our character buffer of 16 bytes is marked symbolic, all the symbolic paths are reachable with an input that is shorter than 16 bytes, because the variable is constrained by its size. However, we know that the size of input to be stored in the buffer could be larger than the size, causing some troubles. To test such situation, unconstrained is used:
unconstrained
sm = proj.factory.simulation_manager(save_unconstrained=True) while len(sm.unconstrained) == 0: sm.step()
This lets the simulation manager symbolically execute the target program until an effective unconstrained input (i.e., triggering buffer overflow in this case) is found. We can dump the stdin of this case, and see what happened:
unconstrained_state = sm.unconstrained[0] crashing_input = unconstrained_state.posix.dumps(0) print("found!") print(repr(crashing_input)) with open("input-crackme0x00-canary", "wb") as fp: fp.write(crashing_input)
$ ./crackme0x00-canary.py finding buffer overflow & bypassing static canary ... found! b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x81\x14\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01'
Do you see that the input is long enough to overflow the buffer, overwrite the canary, and even the return address? Also, bytes 17-21 of the input are 0xdeadbeef. Guess what the static canary is?
We can verify this result by running the binary with the dumped input:
$ ./crackme0x00-canary < input-crackme0x00-canary IOLI Crackme Level 0x00 Password:Invalid Password! [1] 13740 segmentation fault (core dumped) ./crackme0x00-canary < input-crackme0x00-canary
It indeed triggered a buffer overflow, and a segmentation fault, which means that Angr also found the canary!
Now you can start examining the core dump, and learn which part of the input should be changed to hijack the control flow:
$ gdb-pwndbg ./crackme0x00-canary core pwndbg> info registers eip eip 0x2148100 0x2148100 pwndbg> quit $ xxd input-crackeme0x00-canary 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000010: efbe adde 0000 0000 0000 0000 0000 0000 ................ 00000020: 0081 1402 0000 0000 0000 0000 0000 0000 ................ 00000030: 0000 0000 0000 0000 0000 0101 ............
Now you know where to modify from the input to make the program jump to any place you want. Try making it jump to your shellcode, and print the flag!
Practice writing scripts for symbolic execution using Angr framework against the rest of the crackme binaries. Your task is to find the input that makes each binary print out "Password OK :)".
Let's take a look at another example, pwd.
pwd
$ ./pwd Enter the password: 12345678 Access denied. $ ./pwd ./pwd Enter the password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Access denied. *** stack smashing detected ***: <unknown> terminated [1] 18317 abort (core dumped) ./pwd
This binary appears to be a password authenticator, and we want to crack it by finding the password string using Angr. Take a look at the given template, pwd_template.py, and fill in the required parts by analyzing the pwd binary.
pwd_template.py
Ultimately, we are looking for the input (i.e., valid password), which will make pwd binary print "Access granted!". FYI, once the path is found, you can print the stdin through:
print("input: {0}".format(sm.active[0].posix.dumps(sys.stdin.fileno())))
[TASK] Analyze the binary, complete and execute the Angr script to find the password, and verify it against the pwd binary.