UC Berkeley CS 161 Computer Security

居然有幸能来线下上课。

这不记个笔记。

Summer 2024


Lecture 1: Introduction and Security Principles

Security Principles

  1. Know your threat model: 谁会来攻击你,为什么要攻击你。Threat Model 指的就是对攻击者建立一个合适的模型(比如他们有多少资源,他们的动机)。You often just need to have “good enough” defense to make attackers turn somewhere else.
  2. Consider Human Factors:考虑用户习惯,别让用户太麻烦。
  3. Security is economics:搞 security 需要米,所以要考虑你保护的是啥。
  4. Detect if You Can’t Prevent:
    • Deterrence: Stop the attack before it happens.
    • Prevention: Stop the attack as it happens.
    • Detection: Learn that there was an attack (after it happened).
    • Response: Do something about the attack (after it happened). 对于 response,需要做的事 Mitigation and Recovery,就是比如在发现勒索软件的时候赶紧进行文件备份和转移。
  5. Defense in depth:多层防御,但是要考虑成本。
  6. Least Privilege:然鹅其实现在的操作系统做的都很遭糕。
  7. Ensure complete mediation:You should check every access to every object. 课上提出了一个叫做 Reference Monitor 的概念,就是一个任何 access 操作都必须经过的结点,比如说 network firewall。用这种方法来保证每个 access 都被检查过。
  8. Separation of Responsibility:例子是原子弹发射,要两个人同时操作才行。
  9. Shannon’s Maxim:你不能依赖于 Security Through Obscurity,也就是通过源代码的保密性来保护系统。想到 Kerckhoff’s Principle?
  10. Use Fail-Safe Defaults:当系统崩溃的时候,应该让系统保持最安全的状态。比如说一些防盗门在断电的时候要自动关上。
  11. Design in Security from the Start

The Trusted Computing Base (TCB)

TCB 指的是 The components of a system that security relies upon。

TCB Design Principles:

  1. Unbypassable (or completeness)
  2. Tamper-resistant (or security):不能改,必须保证 TCB 的完整性
  3. Verifiable (or correctness):TCB 应该设计得越小越好(KISS principle: Keep It Simple, Stupid)

这种让 TCB 和其他系统分离出来的方法让我们能够更方便的 focus on one thing。

TOCTTOU Vulnerabilities

在讲 Ensure Complete Mediation 的时候提到了一个例子挺有意思的,叫做 The time of check to time of use (TOCTTOU) vulnerability。

就是考虑一个 ATM 机提款的操作。假设现在你的银行账户里有 \(1000\) 块钱。一般情况下很直觉地会这样写提款程序:

\begin{algorithm}
\caption{Withdrawal}
\begin{algorithmic}
\PROCEDURE{Withdrawal}{$w$}
  \STATE $b =$ \CALL{GetBalance}{} \COMMENT{Step (1)}
    \IF{$b < w$}
      \STATE Abort
    \ENDIF
    \STATE \CALL{SetBalance}{$b - w$} \COMMENT{Step (2)}
    \STATE \CALL{DispenseCash}{$w$}
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

但这个程序事实上有一个巨大的漏洞,就是假设我现在开两个并行程序,两次都取 \(1000\) 块钱,当第一个程序运行到 (2) 之前时,我让第二个程序运行到 (1)。这时候我们会发现两个程序都能通过 (1) 进入 (2),钱就会被取两次。这样我们能从 \(1000\) 块钱的账户中取出 \(2000\) 块钱。


Lecture 2: x86 Assembly and Call Stack

Number Representation

  • nibble 是一个十六进制数的大小,1 nibble = 4 bits
  • 1 byte = 8 bits
  • word 是指针的大小,32 位下是 32 bits,64 位下是 64 bits

CALL: Compiler, Assembler, Linker, Loader

  • Compiler: 高级语言 -> Assembly Code
  • Assembler: Assembly Code -> Machine Code
  • Linker: Deals with dependencies and libraries
  • Loader: Sets up memory space and runs the machine code

C Memory Layout

讲课的时候考虑的是 32 位机,也就是 memory 是从 0x000000000xFFFFFFFF。其实可以把内存看作一个一维的数组,当然我们通常将其画成一张 \(n\times 4 \text{ bytes}\) 的表。

                          4 bytes
                      |<----------->|
     0xFFFFFFFF       +-------------+
--------------------> |             |
                      |             |
   Higher Address     |             |
         ^            |             |
         |            |             |
         |            |             |
                      |   Memory    |
                      |             |
         |            |             |
         |            |             |
         v            |             |
    Lower Address     |             |
                      |             |
--------------------> |             |
     0x00000000       +-------------+
                      -------------->
                           index

x86 中都是以 Little-endian 存储的,也就是说比如说一个东西是 0x0123456789abcdef,那么他在地址中应该存储为:

+---------------------------+
| 0x67 | 0x45 | 0x23 | 0x01 |
+------+------+------+------+
| 0xef | 0xcd | 0xab | 0x89 |
+---------------------------+
#include <stdio.h>
#include <stdint.h>

int main() {
    union {
        uint64_t num_int64;
        unsigned char num_char[8];
    } num;
    num.num_int64 = 0x0123456789abcdef;
    for (int i = 0; i < 8; i++) {
        printf("%02x ", num.num_char[i]);
    }
    printf("\n");
    return 0;
}

输出:

ef cd ab 89 67 45 23 01

但是数组和 struct 仍然是从小到大的顺序。

然后 Memory 按照一下几个块分:

+-------------------+
|       Stack       |
+~~~~~~~~~~~~~~~~~~~+
|         |         |
|         v         |
|                   |
|         ^         |
|         |         |
+~~~~~~~~~~~~~~~~~~~+
|       Heap        |
+-------------------+
|       Data        |
+-------------------+
|       Code        |
+-------------------+

Function Call

void foo(int, int);

int main() {
  foo(1, 2);
}

void foo(int a, int b) {
  return;
}
main:
  pushq $2
  pushq $1

  call  foo


foo:
  movq  %rsp, %rbp
  subq  $32, %rsp

leave 等价于

mov   %ebp, %esp
pop   %ebp

Lecture 3: Memory Safety Vulnerabilities

Buffer Overflow

void vulnerable(void) {
    char name[20];
    gets(name);
}

gets 时将 SHELLCODE 写入内存,然后覆盖 RIP 使其指向他。

+---------------+             +-------------------------+
|      ...      |             |          ...            |
+---------------+             +-------------------------+
|      RIP      |             |     (RIP)  &SHELLCODE   |------+
+---------------+             +-------------------------+      |
|      SFP      |             |     (SFP)  'AAAA'       |      |
+---------------+             +-------------------------+      |
|     name      |    gets     |     (name) 'AAAA'       |      |
+---------------+  -------->  +-------------------------+      |
|     name      |             |     (name) 'AAAA'       |      |
+---------------+             +-------------------------+      |
|     name      |             |     (name) SHELLCODE    |      |
+---------------+             +-------------------------+      |
|     name      |             |     (name) SHELLCODE    |      |
+---------------+             +-------------------------+      |
|     name      |             |     (name) SHELLCODE    |<-----+
+---------------+             +-------------------------+

当然 SHELLCODE 也可以写在其他地方比如说 RIP 上面这个随心情而定。

diff --git a/vulnerable.c b/vulnerable.c
--- a/vulnerable.c
+++ b/vulnerable.c
@@ -1,4 +1,4 @@
 void vulnerable(void) {
     char name[20];
-    gets(name);
+    fgets(name, 20, stdin);
 }

实操的时候,看 RIP 和 SFP:

(gdb) info frame
Stack level 0, frame at 0x7fffffffd9a0:
 rip = 0x555555555155 in foo (test.c:8); saved rip = 0x555555555140
 called by frame at 0x7fffffffd9b0
 source language c.
 Arglist at 0x7fffffffd990, args: a=1, b=2
 Locals at 0x7fffffffd990, Previous frame's sp is 0x7fffffffd9a0
 Saved registers:
  rbp at 0x7fffffffd990, rip at 0x7fffffffd998

其中 Saved registersrbp 指的是上课讲的 sfpriprip,不太一样的是显示的 rip 是因为机器是 64 位的,是 register 里的 rip,如果是 32 位会显示 ebpeip。这俩相差 $8$ bytes 正好一个 word。

一些 vulnerable 的函数:

  1. gets:改成 fgets
  2. strcpy:改成 strncpy(more compatible, less safe)或者 strlcpy(less compatible, more safe)
  3. strlen:改成 strnlen 或者 memchr

Off-by-One Exploit

这是很多初学者经常犯的错误就是

void vulnerable() {
  char s[32];
  for (int i = 0; i <= 32; i++) {
    scanf("%c", &s[i]);
  }
}

int main() {
  vulnerable();
  return 0;
}

这样允许了我们有一个额外的输入。然而这个输入最多只能允许我们部分控制 SFP

我们需要这样修改:

+--------------------------+                +---------------------------+
|       RIP of main        |                |        RIP of main        |
+--------------------------+                +---------------------------+
|       SFP of main        |<---+           |        SFP of main        |
+--------------------------+    |           +---------------------------+
|    RIP of vulnerable     |    |           |     RIP of vulnerable     |
+--------------------------+    |           +---------------------------+
|    SFP of vulnerable     |----+           |   Fake SFP of vulnerable  |----+
+--------------------------+                +---------------------------+    |
|          x[2]            |      gets      |       Garbage bytes       |    |
+--------------------------+  ----------->  +---------------------------+    |
|          x[1]            |                |      Fake RIP of main     |----+---> SHELLCODE
+--------------------------+                +---------------------------+    |
|          x[0]            |                |      Fake SFP of main     |<---+
+--------------------------+                +---------------------------+

Homework 1

也记录一下做 Homework 时学到的东西。

GDB

(gdb) x/4xw buf
0xbffffdf8:  0xbffffeac  0xb7ffc165  0x00000000  0x00000000

0xbffffdf8buf 的地址,4xw 显示了 $4$ 个 word 的内容。

(gdb) layout split

离开:Ctrl + X 再按 A

切换窗口:Ctrl + X 再按 O

看汇编代码

(gdb) disas main
Dump of assembler code for function main:
   0x00001629 <+0>:     lea    0x4(%esp),%ecx
   0x0000162d <+4>:     and    $0xfffffff0,%esp
   0x00001630 <+7>:     push   -0x4(%ecx)
   0x00001633 <+10>:    push   %ebp
   0x00001634 <+11>:    mov    %esp,%ebp
   0x00001636 <+13>:    push   %ebx
   0x00001637 <+14>:    push   %ecx
   0x00001638 <+15>:    sub    $0x20,%esp
   0x0000163b <+18>:    call   0x1142 <__x86.get_pc_thunk.bx>
   ...