CAP theorem by Ali Ghodsi

CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance

• Examples
– Databases, 2PC, centralized algo (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)

CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution that violates one of C, A, P
– not possible to guarantee that an algorithm has
all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency

Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and Availability are provided by algo
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to tolerating omission failures

• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery of partitions!
• Can guarantee that conflict resolution eventually happens

How do we ensure consistency
• Main technique to be consistent
– Quorum principle
– Example: Majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
[Figure: 9 nodes, majority(9) = 5; WRITE(v) and READ v each go to a majority of the nodes]
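The majority rule above can be sketched as a tiny versioned register. This is a minimal illustration, not the talk's algorithm: the `Replica` class, the version counter, and the sequential single-client setting are all assumptions made for the example.

```python
import random

class Replica:
    """One storage node holding a (version, value) pair (hypothetical model)."""
    def __init__(self):
        self.version = 0
        self.value = None

def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

def write(replicas, value, rng):
    """Write to a random majority with a version higher than any seen there."""
    quorum = rng.sample(replicas, majority(len(replicas)))
    new_version = max(r.version for r in quorum) + 1
    for r in quorum:
        r.version, r.value = new_version, value

def read(replicas, rng):
    """Read a random majority and return the value with the highest version."""
    quorum = rng.sample(replicas, majority(len(replicas)))
    return max(quorum, key=lambda r: r.version).value

rng = random.Random(0)
nodes = [Replica() for _ in range(9)]
write(nodes, "a", rng)
write(nodes, "b", rng)
print(majority(9), read(nodes, rng))  # prints: 5 b
```

Because any two majorities of 9 nodes share at least one node, the read quorum always contains a replica that saw the latest write, whichever 5 nodes were picked.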

Quorum Principle
• Majority Quorum
– Pro: tolerate up to ⌈N/2⌉ - 1 crashes
– Con: Have to read/write ⌊N/2⌋ + 1 values

• Read/write quorums (Dynamo, ZooKeeper, Chain Repl)
– Read R nodes, write W nodes, s.t. R + W > N (W > N/2)
– Pro: adjust performance of reads/writes
– Con: availability can suffer
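The condition R + W > N can be checked by brute force on a small cluster. The sketch below is only an illustration; the function names are made up for this example, and N = 5 is an arbitrary choice.

```python
from itertools import combinations

def quorums_intersect(n, r, w):
    """Brute force: does every read set of size r meet every write set of size w?"""
    nodes = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(nodes, r)
               for wq in combinations(nodes, w))

def valid_rw(n, r, w):
    """The slide's condition: R + W > N and W > N/2."""
    return r + w > n and 2 * w > n

for r, w in [(1, 5), (2, 4), (3, 3), (2, 3)]:
    print(r, w, valid_rw(5, r, w), quorums_intersect(5, r, w))
```

With N = 5, the first three settings satisfy the condition and always overlap. R = 2, W = 3 does not: a read of two nodes can miss all three written nodes.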

• Maekawa Quorum
– Arrange nodes in an M×M grid
– Write to row+col, read cols (always overlap)
– Pro: Only need to read/write O( sqrt(N) ) nodes
– Con: Tolerate at most O( sqrt(N) ) crashes (reconfiguration)

[Figure: 3×3 grid of nodes P1 to P9]
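The grid construction can be sketched as follows. It assumes nodes are numbered 0..M·M-1 in row-major order; the helper name `grid_quorums` is hypothetical.

```python
def grid_quorums(m):
    """Return (write_quorums, read_quorums) for an m x m grid of nodes."""
    def row(i):
        return {i * m + j for j in range(m)}

    def col(j):
        return {i * m + j for i in range(m)}

    # One write quorum per grid cell: that cell's full row plus its full column.
    writes = [row(i) | col(j) for i in range(m) for j in range(m)]
    # One read quorum per column.
    reads = [col(j) for j in range(m)]
    return writes, reads

writes, reads = grid_quorums(3)
assert all(w & r for w in writes for r in reads)   # every read meets every write
assert all(a & b for a in writes for b in writes)  # writes meet each other
print(len(writes[0]), len(reads[0]))  # prints: 5 3
```

A write quorum contains a full row, and a row crosses every column, which is why the overlap always holds. For a 4×4 grid (N = 16) the write quorum has 4 + 4 - 1 = 7 nodes.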

Probabilistic Quorums
• Quorum size α√N (α > 1) intersects with probability at least 1 - exp(-α²)
– Example: N=16 nodes, quorum size 7, intersects 95%, tolerates 9 failures
– Maekawa: N=16 nodes, quorum size 7, intersects 100%, tolerates 4 failures

– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect, N usually large
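The numbers in the example can be reproduced with a few lines of arithmetic. The sketch assumes both quorums are drawn uniformly at random and independently; the function names are illustrative.

```python
import math

def intersection_bound(n, q):
    """Lower bound 1 - exp(-alpha^2), with alpha = q / sqrt(n)."""
    alpha = q / math.sqrt(n)
    return 1 - math.exp(-alpha ** 2)

def exact_intersection(n, q):
    """Exact probability that two uniformly random q-subsets of n nodes share a node."""
    return 1 - math.comb(n - q, q) / math.comb(n, q)

print(round(intersection_bound(16, 7), 3))  # 0.953
print(round(exact_intersection(16, 7), 3))  # 0.997
```

For N = 16 and quorum size 7, α = 7/4, so the bound gives about 95%. Under the uniform-choice assumption the exact intersection probability is higher still. Any 7 of the 16 nodes can form a quorum, so 9 failures are tolerated.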

Quorums and CAP
• With quorums we can get
– C & P: partition can make quorum unavailable
– C & A: no-partition ensures availability and atomicity

• Decision faced when failing to get a quorum [Brewer'11]
– Sacrifice availability by waiting for merger
– Sacrifice atomicity by ignoring the quorum
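The two choices can be made concrete with a toy write path. This is only a sketch: the function, its `on_failure` parameter, and the return labels are hypothetical, and "waiting for the merger" is modelled simply as reporting the write as unavailable.

```python
def quorum_write(reachable, n, value, on_failure="wait"):
    """Attempt a majority write when only `reachable` of `n` nodes respond."""
    needed = n // 2 + 1
    if reachable >= needed:
        return ("ok", value)
    if on_failure == "wait":
        # Sacrifice availability: refuse until the partition heals.
        return ("unavailable", None)
    # Sacrifice atomicity: accept the write on a minority; it may conflict later.
    return ("ok-unsafe", value)

print(quorum_write(5, 9, "v"))                        # ('ok', 'v')
print(quorum_write(4, 9, "v"))                        # ('unavailable', None)
print(quorum_write(4, 9, "v", on_failure="proceed"))  # ('ok-unsafe', 'v')
```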

• Can we get CAP for weaker consistency?

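What does atomicity really mean?
[Figure: timelines for P1, P2, P3 with reads (R) and writes W(5), W(6); each operation spans from invocation to response]

• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at
• some time between invocation and response

– Write ops appear as if they happened instantaneously at all nodes at
• some time between invocation and response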

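Definition of Atomicity
• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
– Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response

[Figure: P1 R:6, P2 R:5, writes W(5), W(6): atomic]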

Definition of Atomicity
[Figure: P1 R:6, P2 R:6, writes W(5), W(6): atomic]
[Figure: P1 R:5, P2 R:6, writes W(5), W(6): not atomic]
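For small histories, atomicity can be checked by brute force: look for a total order that respects real time and in which each read returns the latest write. The invocation and response times below are assumptions chosen to mimic the figures; the slides do not give exact timings.

```python
from itertools import permutations

# Each operation: (process, kind, value, invocation_time, response_time).
def is_atomic(history, initial=None):
    """True if some total order respects real time and register semantics."""
    for order in permutations(history):
        pos = {op: i for i, op in enumerate(order)}
        # Real-time rule: if a responds before b is invoked, a must come first.
        if any(a[4] < b[3] and pos[a] > pos[b] for a in history for b in history):
            continue
        current, ok = initial, True
        for _, kind, value, _, _ in order:
            if kind == "W":
                current = value
            elif value != current:  # a read must return the latest write
                ok = False
                break
        if ok:
            return True
    return False

writes = [("P3", "W", 5, 0, 2), ("P3", "W", 6, 3, 9)]  # assumed timings
both_read_6 = writes + [("P2", "R", 6, 4, 5), ("P1", "R", 6, 6, 8)]
stale_read = writes + [("P2", "R", 6, 4, 5), ("P1", "R", 5, 6, 8)]
print(is_atomic(both_read_6), is_atomic(stale_read))  # prints: True False
```

In the second history P2's read of 6 finishes before P1's read of 5 starts, so no linearization points can explain it.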

Atomicity too strong?
[Figure: P1 R:5, P2 R:6, writes W(5), W(6): not atomic]

• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: "If P2's operator phones P1 and tells her 'I just read 6'"

Atomicity too strong?
[Figure: P1 R:5, P2 R:6, writes W(5), W(6): not atomic, but sequentially consistent]

• Sequential consistency
– Weaker than atomicity
– Sequential consistency removes this "real-time" requirement
– Any global ordering is OK as long as it respects each process's local ordering
– Does Gilbert's proof fall apart for sequential consistency?
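Dropping the real-time requirement can be illustrated with a brute-force check that only enforces each process's local order. The history below mirrors the figure under the assumption that a single process, P3, issues W(5) and then W(6); the function names are illustrative.

```python
from itertools import permutations

def respects_program_order(order):
    """Each process's operations must appear in their local order 0, 1, 2, ..."""
    seen = {}
    for p, i in order:
        if i != seen.get(p, 0):
            return False
        seen[p] = i + 1
    return True

def is_sequentially_consistent(programs, initial=None):
    """programs: {process: [(kind, value), ...]} in local order."""
    ops = [(p, i) for p, prog in programs.items() for i in range(len(prog))]
    for order in permutations(ops):
        if not respects_program_order(order):
            continue
        current, ok = initial, True
        for p, i in order:
            kind, value = programs[p][i]
            if kind == "W":
                current = value
            elif value != current:  # a read must return the latest write
                ok = False
                break
        if ok:
            return True
    return False

history = {"P1": [("R", 5)], "P2": [("R", 6)], "P3": [("W", 5), ("W", 6)]}
print(is_sequentially_consistent(history))  # prints: True
```

The order W(5), R:5, W(6), R:6 is a valid witness: it ignores when the reads happened in real time but keeps P3's two writes in order.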

• Causal memory
– Weaker than sequential
– No need for a global view; each process may have a different view
– Local: reads/writes return immediately to the caller
– CAP theorem does not apply to causal memory

[Figure: P1: W(0) R:1; P2: W(1) R:0; causally consistent]

Going really weak
• Eventual consistency
– When the network is not partitioned, all nodes eventually have the same value
– I.e. don't be "consistent" at all times, but only after partitions heal!

• Based on powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Exchange must be constant-sized packets
– Set reconciliation, Merkle trees, etc.
– Use (clock, node_id) to break ties of events in log

• Properties of gossiping
– All nodes will have the same value in O(log N) time
– No positive-feedback cycles that congest the network
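A small simulation shows how gossip converges. It is a sketch under simplifying assumptions: each node's "log" is a single entry, exchanges are push-pull, nodes act one after another within a round, and the `gossip_rounds` helper is hypothetical.

```python
import random

def gossip_rounds(n, rng, max_rounds=1000):
    """Simulate push-pull gossip; return (rounds, winning entry) once all agree."""
    # Each node starts with its own event, tagged (clock, node_id).
    state = [((1, i), f"value-from-{i}") for i in range(n)]
    for rounds in range(1, max_rounds + 1):
        for i in range(n):
            j = rng.randrange(n)              # one random peer (possibly itself)
            winner = max(state[i], state[j])  # higher (clock, node_id) wins ties
            state[i] = state[j] = winner      # constant-sized exchange: one entry
        if len(set(state)) == 1:
            return rounds, state[0]
    raise RuntimeError("did not converge")

rounds, winner = gossip_rounds(64, random.Random(1))
print(rounds, winner)
```

All clocks are equal here, so the tie is broken by node_id and every node ends up with the entry from node 63. The number of rounds depends on the random seed, but it stays small compared with N.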

BASE
• Catch-all for any consistency model C' that enables C'-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency

• Main ingredients
– Stale data
– Soft state (regenerable state)
– Approximate answers

Summary
• No need to ensure CAP at all times
– Switch between algorithms or satisfy subset at different times

• Weaken consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP

– Only be consistent when network isn’t partitioned:
• Eventual consistency (very weak) works around CAP

• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some environments have recovery guarantees (partitions heal within X hours); perform conflict resolution

Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking

• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods

• Gossiping & set reconciliation
– Lots of related work
