# designing a memory allocator

- interfaces

    malloc(sz) -> ptr
    free(ptr)

    Q1. ptr = malloc(0); ptr == NULL?
    Q2. ptr = malloc(-1); ptr == NULL?
    Q3. ptr = malloc(size); *ptr?
    Q4. free(NULL)?
    Q5. free(ptr); ptr == NULL?
    Q6. free(ptr); *ptr?

- design goals

    1. performance
    2. fragmentation (or memory utilization)
    3. security

V0. simplest/secure malloc (w/ syscall)
=======================================

    malloc -> mmap
    free   -> munmap

- pros/cons
    -  P (invoking a syscall is slow)
    .  F (reuse, no external fragmentation; os handles it for you,
          yet page granularity wastes memory for small objs)
    ++ S (no UAF, no overflow)

V1. fastest/simple malloc
=========================

- introduce: gptr -> indicating the top of the heap
  (gptr' marks earlier values of gptr)

                                  |          |
                                  |          |
  e.g.,                    gptr'->+----------+ <- ptr1
    ptr1 = malloc(sz1)            |          |
    ptr2 = malloc(sz2)            |          |
    free(ptr2)                    |          |
    ptr3 = malloc(sz3)            |          |
                           gptr'->+----------+ <- ptr2
                                  |          |
                                  |          |
                                  |          |
                                  |          |
                           gptr'->+----------+ <- ptr3
                                  |          |
                                  |          |
                                  |          |
                                  |          |
                           gptr ->+----------+
                                  |          |
                                  |          |

- impl:

    def malloc(sz):
      ptr = gptr
      gptr += sz
      return ptr

    def free(ptr):
      pass

- pros/cons:
    + P (just add)
    - F (wasted: freed memory is never reused)
    + S (no UAF: freed memory is never handed out again)

V2. reuse freed memory (objs)
=============================

- (1) introduce: freelist -> a linked list for freed objs
      (e.g., freelist -> ptr2 -> ptr3 ...)

                                  |          |
                                  |          |
  e.g.,                    gptr'->+----------+ <- ptr1
    ptr1 = malloc(sz1)            |          |
    ptr2 = malloc(sz2)            |          |
    free(ptr2)                    |          |
    ptr3 = malloc(sz3)            |          |
    free(ptr3)             gptr'->+----------+ <- ptr2
                                  |  FREED   |
                                  |          |
                                  |          |
                                  |          |
                           gptr'->+----------+ <- ptr3
                                  |  FREED   |
                                  |          |
                                  |          |
                                  |          |
                           gptr ->+----------+
                                  |          |
                                  |          |

    def free(ptr):
      append(freelist, ptr)
      return

  P1. how to use memory efficiently?
      -> using freed region!
  P2. how to implement malloc()?
      -> missing size of freed objs

- (2) introduce: embedding size (in-meta, i.e., in-band metadata, vs. out-of-band (slow!))

                |          |
                +----------+
                |   SZ1    |
         gptr'->+----------+ <- ptr1
                |          |
                |          |
                |          |
                +----------+
                |   SZ2    |
         gptr'->+----------+ <- ptr2 <-- freelist
                | ptr3   --|------------+
                |          |            |
                |          |            |
                +----------+            |
                |   SZ3    |            |
         gptr'->+----------+ <- ptr3 <--+
                |   NULL   |
                |          |
                |          |
         gptr ->+----------+
                |          |
                |          |

  - freelist ---> ptr2:        +-> ptr3:
                  sz:[SZ2 ]    |   [SZ3 ]
                  fd:[ptr3]----+   [NULL]

- impl:

    fn malloc(sz):
      for f in freelist:
        # vs == trick to improve the hit ratio?
        if sz <= f->sz:
          unlink(f)
          return f
      ptr = gptr
      ptr->sz = sz        # record the size in the header right before ptr
      gptr += sz + 4      # 4: size of the SZ header
      return ptr

    fn free(ptr):
      ptr->fd = freelist  # push at the head of freelist
      freelist = ptr
      return

- pros/cons:
    - P: slow malloc (iterating freelist)
    + P: fast free (O(1) push)
    + F: reusing memory
         (==: no internal fragmentation)
         (<=: internal fragmentation)
         (yet, external fragmentation: total |freed regions| > sz,
          but no single freed region fits)
    - S: unlink? overflow?

V3. handling fragmentation better
=================================

- P1. internal fragmentation: e.g., requesting < sz -> splitting
- P2. external fragmentation: e.g., requesting >>sz -> merging
                              (aka, fw/bk consolidation)

    fn malloc(sz):
      for f in freelist:
        if sz <= f->sz:
          unlink(f)
          rsz = f->sz - sz - 4      # 4: header of the remainder
          if rsz > 0:
            f->sz = sz              # shrink f to the requested size
            newobj = f + sz + 4
            newobj->sz = rsz
            append(freelist, newobj)
          return f
      ptr = gptr
      ptr->sz = sz
      gptr += sz + 4
      return ptr

- P2-1. how to know prev/next objs?
    - next: ptr + sz
    - prev: ???
- P2-2. how to know prev/next objs are free?

                |   ...    |
                +----------+
                |   SZ1    |
         gptr'->+----------+ <- ptr1
                |          |
                |          |
                |          |
                +----------+
                |   PSZ1   |
                +----------+
                |   SZ2    |
         gptr'->+----------+ <- ptr2
                |  FREED   |
                |          |
                |          |
                +----------+
                |   PSZ2   |
                +----------+
                |   SZ3    |
         gptr'->+----------+ <- ptr3
                |  FREED   |
                |          |
                |          |
                |          |
         gptr ->+----------+
                |          |
                |          |

- introduce psz (size of the previous obj):
    - next: ptr + sz
    - prev: ptr - psz

- merging two freed objs (header bytes omitted, as in next/prev above)

    fn merge(ptr1, ptr2):     // ptr1 and ptr2 are contiguous objs, ptr1 first
      ptr1->sz += ptr2->sz
      nxt = ptr1 + ptr1->sz   // obj right after the merged one
      nxt->psz = ptr1->sz     // keep its psz in sync

- impl:

    fn free(ptr):
      prev = ptr - ptr->psz
      next = ptr + ptr->sz
      if freed?(prev) and freed?(next):
        pop next from freelist
        merge(ptr, next)
        merge(prev, ptr)      # prev is already in freelist
        return
      if freed?(prev):
        merge(prev, ptr)      # prev is already in freelist
        return
      if freed?(next):
        pop next from freelist
        merge(ptr, next)      # merged obj starts at ptr; insert it below
      ptr->fd = freelist
      freelist = ptr
      return

    # slow!
    fn freed?(ptr):
      for f in freelist:
        if f == ptr:
          return true
      return false

- pros/cons:
    -  P: slow malloc()
    -  P: slow! free()
    +  F: no wasted memory!
    +  F: handling fragmentation
    -- S: ...

V4. better fragmentation handling
=================================

- introduce In-Use flag (U), kept in the sz field, to indicate whether an obj is in use or freed

                |   ...    |
                +----------+
                |   SZ1   U| = 1
         gptr'->+----------+ <- ptr1
                |          |
                |          |
                |          |
                +----------+
                |   PSZ1   |
                +----------+
                |   SZ2   U| = 0
         gptr'->+----------+ <- ptr2
                |  FREED   |
                |          |
                |          |
                +----------+
                |   PSZ2   |
                +----------+
                |   SZ3   U| = 0
         gptr'->+----------+ <- ptr3
                |  FREED   |
                |          |
                |          |
                |          |
         gptr ->+----------+
                |          |
                |          |

- new in-use check:

    # fast!
    fn freed?(ptr):
      return !(ptr->sz & U)

- problem: too many memory accesses in checking
  (e.g., how to check prev is used or freed?)

    (ptr - ptr->psz)->sz & U?
           --------   --

- introduce: in-meta "prev" in-use bit (PU = is the previous obj in use?)

                |   ...    |
                +----------+
                |   SZ1  PU| = ?
         gptr'->+----------+ <- ptr1
                |          |
                |          |
                |          |
                +----------+
                |   PSZ1   |
                +----------+
                |   SZ2  PU| = 1 (ptr1 is in use)
         gptr'->+----------+ <- ptr2
                |  FREED   |
                |          |
                |          |
                +----------+
                |   PSZ2   |
                +----------+
                |   SZ3  PU| = 0 (ptr2 is freed)
         gptr'->+----------+ <- ptr3
                |  FREED   |
                |          |
                |          |
                |          |
         gptr ->+----------+
                |          |
                |          |

- new check: (ptr->sz & PU) is used in practice

- pros/cons:
    -  P: slow malloc()
    +  P: fast free()
    +  F: no wasted memory
    +  F: handling fragmentation
    -- S: ...

optimizations
=============

(for faster malloc)

- bins for free objs of known sizes

    bin[0]: >= 10
    bin[1]: >= 20
    bin[2]: >= 30
    ..

  pick the first obj fetched from the matching bin

  - what if bin[1] is empty for a malloc(20) request?
    1) keep checking the rest -> higher worst-case cost but better mem usage
    2) skip, then use gptr

- bitmap of non-empty bins (to skip empty bins quickly)
- fastbins for smaller objs (e.g., singly linked list)
- caching freed objs (i.e., unsorted bin)
- split freed objs (instead of using the whole freed obj)
- mmap() for larger objs

- pros/cons:
    +   P: fast malloc/free
    +   F: no wasted memory
    --- S: fastbin?

security bugs
=============

0. heap overflows (crafted in-band metadata)

1. use-after-free (recycled for memory utilization)

    int *ptr = malloc(size);
    free(ptr);
    *ptr; // BUG. use-after-free!

2. double free (binning)

    char *ptr = malloc(size);
    free(ptr);
    free(ptr); // BUG!
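
To make the double-free case concrete, here is a minimal sketch in C of a V2-style allocator (bump pointer, in-band size header, freelist pushed at the head). Everything in it is a hypothetical illustration, not how any real libc is implemented: the names `toy_malloc`, `toy_free`, `chunk_t`, and `double_free_aliases`, the 4096-byte arena, and the 16-byte rounding are all assumptions. The arena is borrowed once from the system allocator purely to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk layout: in-band size header, then user data.
 * While a chunk is free, the first word of user data holds fd. */
typedef struct chunk {
    size_t sz;            /* usable size of the user area */
    struct chunk *fd;     /* next free chunk (valid only while freed) */
} chunk_t;

#define HDR        offsetof(chunk_t, fd)   /* bytes before the user pointer */
#define HEAP_SIZE  4096

static unsigned char *heap;      /* arena, obtained once from the system */
static size_t gptr;              /* bump offset: top of the toy heap */
static chunk_t *freelist;        /* head of the list of freed chunks */

static size_t round_up(size_t n) { return (n + 15) & ~(size_t)15; }

void *toy_malloc(size_t sz) {
    if (!heap && !(heap = malloc(HEAP_SIZE)))
        return NULL;
    if (sz < sizeof(chunk_t *))
        sz = sizeof(chunk_t *);              /* room for fd once freed */

    /* First fit: walk the freelist and unlink the first chunk that fits. */
    for (chunk_t **pp = &freelist; *pp; pp = &(*pp)->fd) {
        if (sz <= (*pp)->sz) {
            chunk_t *c = *pp;
            *pp = c->fd;                     /* unlink(c) */
            return &c->fd;
        }
    }

    /* Nothing fits: carve a new chunk from the top of the heap. */
    size_t need = round_up(HDR + sz);
    if (need > HEAP_SIZE - gptr)
        return NULL;
    chunk_t *c = (chunk_t *)(heap + gptr);
    c->sz = need - HDR;
    gptr += need;
    return &c->fd;
}

void toy_free(void *ptr) {
    if (!ptr)
        return;
    chunk_t *c = (chunk_t *)((unsigned char *)ptr - HDR);
    c->fd = freelist;                        /* push at the head, no checks */
    freelist = c;
}

/* Returns 1 if a double free makes two later allocations alias. */
int double_free_aliases(void) {
    void *p = toy_malloc(32);
    assert(p != NULL);
    toy_free(p);
    toy_free(p);              /* BUG: p is pushed twice, so p->fd == p */
    void *a = toy_malloc(32); /* pops p, but p stays at the list head */
    void *b = toy_malloc(32); /* pops p again */
    return a == p && b == p;
}
```

In this sketch the second `toy_free(p)` turns the freelist into a one-element cycle, so two unrelated requests receive the same memory. Writing through `a` then also overwrites the `fd` field of a chunk that is still on the freelist, which is the kind of metadata corruption the list above refers to.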