Parallel Computing and Distributed Systems

Notes for Parallel Computing and Distributed Systems

Parallel Computing

Stanford CS149

A Modern Multi-Core Processor

Forms of Parallel Execution

  1. Superscalar: 找到不相关的语句然后并行执行
  2. SIMD:ALU 的并行,Single instruction, multiple data
  3. Multi-core:多个 core,所以想咋玩咋玩

SIMD 遇到 branchs?一部分 ALU 需要等待

Interleaved (temporal) multi-threading: 这个任务在 stall 了(比如在 fetch memory),先执行别的任务

Overcoming bandwidth limits is often the most important challenge facing software developers targeting modern throughput-optimized systems.


Distributed System

MIT 6.824

Stanford CS244B

忽然发现 CS244B 的 instructor 是 的一作。

Introduction

You should try everything else before you try distributed systems.

Why distributed systems?

Remote Procedure Call (RPC)

需要比普通的 procedure 多一个 “I don’t know” 的选项。

Interface Definition Languages (IDL): Specify RPC call and return types

External Data Representation Standard (XDR)

Consensus

在讨论 asynchronous systems 的时候,我们会保守地认为网络是很慢的。也就是说,we can’t distinguish failed agent from slow network.

有 \(n\) 个 agents,每个 agent input 一个数字,现在要各个 agent 通过互相交流达成一致,使得每个 agent 的 output 都相同,并且是其中一个 agent 的 input。

Safety: 所有 agent 的 output 都相同(agreement)并且 output 为其中某个 agent 的 input(validity)

Liveness: 所有 non-failed 的 agents 都有输出

Fault tolerance:

FLP impossibility result : No deterministic consensus protocol guarantees all three of safety, liveness, and fault tolerance in an asynchronous system.

Bivalent: An execution of a consensus protocol is in a bivalent state when the network can affect which value agents choose.

Univalent, Valent: An execution of a consensus protocol is in a univalent state when only one output value is possible. If that value is \(i\), call the state \(i\)-valent.

Stuck: An execution of a [broken] consensus protocol is in a stuck state when one or more non-faulty nodes can never output a value.