Introduction
============

# administrivia
- MW 4:35-5:55 @ Instr Center 219
- https://tc.gtisc.gatech.edu/bss
- TA: Byoungyoung Lee (blee@gatech.edu)
- Piazza sign-up

# grading
- (20%) paper questions (check submission site)
  - answer a short hw question
  - ask your own questions (see link)
  - due: 10pm the night before
- (80%) project
  - topics: research idea or cool hacking
    - doesn't have to be a grand idea
    - try a risky one (and ask for advice!)
    - pick what makes you most excited
    - relate (apply) security perspectives to your own research!
  - 3min presentation (background, interests, seeking a team or not)
  - announce teams, groups of 2-3 (solo is fine)
  - (20%) team proposal & presentation
  - (30%) final demo (peer reviews)
  - (30%) write-up
  - NOTE. brainstorming session: Oct. 2-3
- informal class: interrupt, ask questions, point out mistakes
- introduction
  - quick survey:
    - phds? new phds? masters? undergraduates?
    - background: security, os, crypto, networking

# learning objectives
1. understand the state of the art of systems security research
2. propose, facilitate, and complete a project
   - research (for phd students)
   - cool hacking (for master/undergraduate students)

# materials
- papers in systems security
  - see online, but preliminary
  - send us email if you find an interesting one
- plus a series of guest lectures (speakers at top-tier security conferences)

# topics
- memory safety:
  - attacks: control-flow hijacking, sop
  - kernel bugs/exploits
  - language-based security: hail
  - mitigation:
    - ASLR &c (in modern OSes)
    - side-channel (workaround)
  - mechanisms:
    - attacks: buf overflow, use-after-free &c
    - CFI, SFI, capability systems &c
- hardware security:
  - Intel SGX
  - TPM
  - ARM TrustZone
- bug finding:
  - integer overflows (static analyzers)
  - symbolic execution
  - fuzzing &c
- web:
  - attacks: XSS, SQL injection &c
  - auditing
  - new designs for the web
- anonymous networks:
  - Tor (w/ attacks)

# what is security (or secure systems)?
- achieving some goals in the presence of an adversary
  (i.e., will the system work when there's an adversary?)
- high-level plan for thinking about security:
  1. policy: the goal you want to achieve
     (e.g., only Alice should read file F)
     - common goals: confidentiality, integrity, availability
  2. threat model: assumptions about what the attacker can do
     (e.g., can guess passwords, cannot physically grab the file server)
     - better to err on the side of assuming the attacker can do something
  3. mechanism: knobs that your system provides to help uphold the policy
     (e.g., user accounts, passwords, file permissions, encryption)
     - part of the trusted computing base (TCB)
- resulting goal: prevent the adversary from violating the policy
  within the threat model
  - note that the goal says nothing about mechanism

# why is security hard? negative goal
- anything (policy, threat model, mechanism) can go wrong
  - difficult to think of all possible ways an attacker might break in
  - the weakest link matters
  - "realistic threat models" are open-ended (almost negative models)
- contrast: easy to check whether a positive goal is upheld
  (e.g., Alice can actually read file F)

# running wikipedia
- policy:
  - only logged-in/approved users can edit
  - only admins can delete
  - version history of page contents
- threat model:
  - attacker can create accounts/log in (access to all special pages)
  - attacker can edit
  - attacker can update files (media)
  - attacker can guess legitimate users' passwords
  - attacker can send any (crafted) packets
- mechanism:
  - enforce login/edit policy
  - TCB: mediawiki, php, apache, mysql, kernel, cpu, ram ...
  - XXX. xss? browsers? (gotcha: now you are users of wikipedia)

# what do adversaries want?
- steal data (applicable to wikipedia?)
  - blocked IPs, users' credentials &c
- own the machine: root exploits (install backdoors, rootkits)
  - botnet: sending spam, mining bitcoins, mounting DDoS
- DoS
- attack other machines behind the firewall

# case study: buffer overflows in a webserver

    void read_req(void) {
        char buf[128];
        int i;
        gets(buf);
        i = atoi(buf);
    }

    push %ebp               ; save old frame
    mov %esp -> %ebp        ; new frame
  * sub 132, %esp           ; 128 (sizeof(buf)) + 4 (sizeof(i)) = 132
    lea -128(%ebp) -> %eax  ; eax = &buf
    push %eax
    call gets               ; get user's input and write to buf
    ...
    mov %ebp -> %esp        ; restore stack
    pop %ebp                ; restore old frame
    ret                     ; return to the caller

- what does the compiler generate in terms of memory layout?
  - x86 stack: the stack grows down
  - %esp points to the last (bottom-most) valid thing on the stack
  - %ebp points to the caller's %esp value

> up to read_req's prologue (*)

    0x00  ^
     ...  |
    +------------------+ <- esp
    | i                |
    +------------------+
    | buf[0:3]         |
    | ...              |
    |                  |
    +------------------+ <- ebp
    | saved ebp        |
    +------------------+
    | return addr      |
    +------------------+

> call gets(buf)

    0x00  ^
    ....  |
    +------------------+ <- esp
    | return addr.     |
    +------------------+
    | &buf             |
    +------------------+
    | i                |
    +------------------+          +----------+<--+
    | buf[0:3]         |          | shell    |   |
    | ...              |          | code     |   |
    |                  |          |          |   |
    +------------------+ <- ebp   |          |   |
    | saved ebp        |          |          |   |
    +------------------+          |          |   |
    | return addr      |          |//////////|---+
    +------------------+          +----------+

- how does the attacker take advantage of this?
  - supply a long input, overwrite data on the stack past the buffer,
    and change the return addr
  - set the return address to &buf[0]
- how do we guess the address of where the code is? or where our buffer is?
  - what happens when the same application is run on different machines?
  - how much do you have to know about the machine you're attacking?
  - one machine might have twice as much memory as another;
    does this change memory addresses? no, virtual memory helps us
  - addresses depend largely on software versions
- why would you write such bad code?
  - well, even if you don't, libc has plenty of unsafe functions:
    strcpy, gets, scanf
  - even the safe versions aren't always safe:
    strncpy leaves the buffer without null-termination
- how does the adversary know the address of the buffer?
  - luckily for the adversary, virtual memory makes things more deterministic
  - for a given OS and program, addresses will often be the same
- possible attacks & research

    jump to -+-> injected code --> stack (NX, ASLR)
             |                 --> heap (hard due to jit, nop sled)
             +-> libc
             |
             +-> existing code (remove gadgets)
             |
             +-> original caller

- how can we protect the return address?
  - shadow stack
  - terminator canary: null, nl, lf, -1 (0x00ff0a0d)
- what happens if the stack grows up, instead of down?

    +------------------+
    | return addr.     |
    +------------------+
    | saved ebp        |
    +------------------+ <---+      +----------+<--+
    | buf[0:3]         |    |       | shell    |   |
    | ...              |    |       | code     |   |
    |                  |    |       |          |   |
    +------------------+    |       |          |   |
    | i                |    |       |          |   |
    +------------------+    |       |          |   |
    | &buf             | ---+       |          |   |
    +------------------+            |          |   |
    | return addr.     |            |//////////|---+
    +------------------+            +----------+
    | saved ebp        |  |
    +------------------+  v
           ...

- why would programmers write such code?
  - legacy code, wasn't exposed to the internet
  - programmers were not thinking about security
  - many standard functions used to be unsafe (strcpy, gets, sprintf)
  - even safe versions have gotchas (strncpy does not null-terminate)
- any memory error can translate into a vulnerability
  - use-after-free: using memory after it has been deallocated
    - might use a corrupted function/data ptr later
    - e.g., see bug bounties of chrome
  - double-free: freeing the same memory twice
    - might cause malloc to later return the same memory twice
  - decrementing the stack ptr past the end of the stack, into some other memory
    - e.g., xorg (w/ root)
  - might not even need to overwrite a return address or function pointer
    - can suffice to read sensitive data like an encryption key
      - e.g., heartbleed (info leak)
    - can suffice to change some bits (e.g., int isLoggedIn, int isRoot)