Life after CAP
CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance

• Examples
– Databases, 2PC, centralized algo (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)
CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution that violates one of C, A, or P
– It is not possible to guarantee that an algorithm has all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency
Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and availability are provided by the algorithm
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to tolerating omission failures

• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery of partitions!
• Can guarantee that conflict resolution eventually happens
How do we ensure consistency
• Main technique to be consistent
– Quorum principle
– Example: Majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
[Figure: nine nodes, majority(9) = 5; WRITE(v) and READ v each go to a majority of the nodes]
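The majority rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Replica` class, the `version` counter, and the explicit `targets` lists are hypothetical names introduced here, and real protocols also handle concurrency and failures, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    version: int = 0      # hypothetical logical timestamp of the stored value
    value: object = None

def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1

def write(replicas, targets, version, value):
    """Store (version, value) on the chosen replicas; needs a majority."""
    if len(set(targets)) < majority(len(replicas)):
        raise RuntimeError("write quorum not reached")
    for i in targets:
        if version > replicas[i].version:
            replicas[i] = Replica(version, value)

def read(replicas, targets):
    """Return the newest value seen among a majority of replicas."""
    if len(set(targets)) < majority(len(replicas)):
        raise RuntimeError("read quorum not reached")
    newest = max((replicas[i] for i in targets), key=lambda r: r.version)
    return newest.value

replicas = [Replica() for _ in range(9)]
write(replicas, targets=[0, 1, 2, 3, 4], version=1, value="v")
# Any other majority shares at least one node with the write set (here node 4).
result = read(replicas, targets=[4, 5, 6, 7, 8])
print(majority(9), result)
```

The read succeeds even though it contacts mostly nodes that never saw the write, because two majorities of the same node set always share at least one node.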
Quorum Principle
• Majority Quorum
– Pro: tolerate up to ⌈N/2⌉ − 1 crashes
– Con: Have to read/write ⌊N/2⌋ + 1 values

• Read/write quorums (Dynamo, ZooKeeper, Chain Repl)
– Read R nodes, write W nodes, s.t. R + W > N (W > N/2)
– Pro: adjust performance of reads/writes
– Con: availability can suffer
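The condition R + W > N can be checked by brute force on a small cluster. The sketch below, with the hypothetical helper `always_intersect`, enumerates every read quorum of size R and every write quorum of size W and confirms that they always share a node exactly when R + W > N.

```python
from itertools import combinations

def always_intersect(n: int, r: int, w: int) -> bool:
    """True if every r-node read set overlaps every w-node write set."""
    nodes = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(nodes, r)
               for wq in combinations(nodes, w))

n = 5
# Compare the brute-force answer with the rule R + W > N for all sizes.
rule_matches = all(always_intersect(n, r, w) == (r + w > n)
                   for r in range(1, n + 1)
                   for w in range(1, n + 1))
print(rule_matches, always_intersect(n, 2, 4), always_intersect(n, 2, 3))
```

With N = 5, the pair R = 2, W = 4 gives cheap reads that still see the latest write, while R = 2, W = 3 can miss it. The extra condition W > N/2 plays the same role between two write quorums.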

• Maekawa Quorum
– Arrange nodes in an M×M grid
– Write to row+col, read cols (always overlap)
– Pro: Only need to read/write O( sqrt(N) ) nodes
– Con: Tolerate at most O( sqrt(N) ) crashes (reconfiguration)

[Figure: nodes P1 to P9 arranged in a 3×3 grid]
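A minimal sketch of the grid construction, assuming nodes are numbered row by row (the numbering and the helper names `grid`, `write_quorum`, and `read_quorum` are illustrative choices, not taken from the slides). It checks that every column read quorum meets every row+column write quorum.

```python
def grid(m):
    """Number the nodes 1..m*m row by row in an m-by-m grid (assumed layout)."""
    return [[r * m + c + 1 for c in range(m)] for r in range(m)]

def write_quorum(m, row, col):
    """One full row plus one full column: 2*m - 1 nodes."""
    g = grid(m)
    return set(g[row]) | {g[r][col] for r in range(m)}

def read_quorum(m, col):
    """One full column: m nodes."""
    g = grid(m)
    return {g[r][col] for r in range(m)}

m = 3
writes = [write_quorum(m, r, c) for r in range(m) for c in range(m)]
reads = [read_quorum(m, c) for c in range(m)]
# Every column crosses the row of any write quorum, so reads see writes.
reads_see_writes = all(rq & wq for rq in reads for wq in writes)
# Two write quorums meet where one's row crosses the other's column.
writes_overlap = all(a & b for a in writes for b in writes)
print(sorted(write_quorum(m, 0, 0)), reads_see_writes, writes_overlap)
```

For a 4×4 grid the same construction gives write quorums of 2·4 − 1 = 7 nodes out of 16.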
Probabilistic Quorums
• Quorum size α√N, (α > 1) intersects with probability at least 1 − exp(−α²)
– Example: N=16 nodes, quorum size 7, intersects 95%, tolerates 9 failures
– Maekawa: N=16 nodes, quorum size 7, intersects 100%, tolerates 4 failures
– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect, N usually large
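The 95% figure in the example can be reproduced from the bound 1 − exp(−α²) with α = 7/√16 = 1.75. The sketch below also computes the exact overlap probability under the assumption that both quorums are drawn uniformly and independently; the function names are hypothetical.

```python
import math

def intersection_lower_bound(n: int, q: int) -> float:
    """Bound 1 - exp(-alpha^2) for quorums of size q = alpha * sqrt(n)."""
    alpha = q / math.sqrt(n)
    return 1 - math.exp(-alpha ** 2)

def exact_intersection_probability(n: int, q: int) -> float:
    """Chance that two uniformly random q-node subsets of n nodes overlap."""
    # The second quorum misses the first only if it fits in the other n - q nodes.
    return 1 - math.comb(n - q, q) / math.comb(n, q)

bound = intersection_lower_bound(16, 7)
exact = exact_intersection_probability(16, 7)
print(round(bound, 3), round(exact, 3))
```

The bound comes out at roughly 0.95, matching the slide, and the exact probability under the uniform-choice assumption is higher still. A quorum of 7 out of 16 remains available as long as 7 nodes are up, which is where the "tolerates 9 failures" figure comes from.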
Quorums and CAP
• With quorums we can get
– C & P: partition can make quorum unavailable
– C & A: no-partition ensures availability and atomicity

• Decision faced when failing to get a quorum [Brewer '11]
– Sacrifice availability by waiting for merger
– Sacrifice atomicity by ignoring the quorum

• Can we get CAP for weaker consistency?
What does atomicity really mean?
[Figure: timelines for P1, P2, P3 showing two reads (R) and the writes W(5) and W(6), each operation spanning from invocation to response]

• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
– Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response
Definition of Atomicity
• Linearization Points
– Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
– Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response

[Figure: timelines for P1, P2, P3 with reads R:6 and R:5 and writes W(5), W(6); execution labeled "atomic"]
Definition of Atomicity
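[Figure, first execution: reads R:6 and R:6 with writes W(5), W(6); labeled "atomic"]
[Figure, second execution: reads R:5 (P1) and R:6 (P2) with writes W(5), W(6); labeled "not atomic"]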
Atomicity too strong?
[Figure: timelines for P1, P2, P3 with reads R:5 (P1) and R:6 (P2) and writes W(5), W(6); execution labeled "not atomic"]

• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: what if P2's operator phones P1 and tells her "I just read 6"?
Atomicity too strong?
[Figure: timelines for P1, P2, P3 with reads R:5 (P1) and R:6 (P2) and writes W(5), W(6); execution labeled "not atomic" but "sequentially consistent"]

• Sequential consistency
– Weaker than atomicity
– Sequential consistency removes this "real-time" requirement
– Any global ordering is OK as long as it respects each process's local ordering
– Does Gilbert's proof fall apart for sequential consistency?

• Causal memory
– Weaker than sequential consistency
– No need for a global view; each process may have a different view
– Local: reads/writes immediately return to caller
– CAP theorem does not apply to causal memory

[Figure: P1 performs W(0) then R:1; P2 performs W(1) then R:0; execution labeled "causally consistent"]
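The difference between "not atomic" and "sequentially consistent" can be checked by brute force on a tiny history. The sketch below is illustrative only: the `Op` record and the invocation/response times are hypothetical, chosen so that P2's read of 6 finishes before P1's read of 5 starts, and it assumes P3 issues both writes. It searches all orderings of the four operations for one that is a legal register history and respects either program order (sequential consistency) or real-time order (atomicity).

```python
from itertools import permutations
from typing import NamedTuple

class Op(NamedTuple):
    proc: str
    kind: str      # "W" for write, "R" for read
    value: int
    start: float   # invocation time (hypothetical)
    end: float     # response time (hypothetical)

def legal(seq):
    """Each read must return the value of the latest preceding write."""
    current = None
    for op in seq:
        if op.kind == "W":
            current = op.value
        elif op.value != current:
            return False
    return True

def respects(seq, must_precede):
    """True if the ordering keeps every required pair in order."""
    pos = {op: i for i, op in enumerate(seq)}
    return all(pos[a] < pos[b] for a in seq for b in seq if must_precede(a, b))

def program_order(a, b):
    return a.proc == b.proc and a.end <= b.start

def real_time_order(a, b):
    return a.end < b.start

def exists_order(history, must_precede):
    return any(legal(p) and respects(p, must_precede)
               for p in permutations(history))

history = [
    Op("P3", "W", 5, 0, 1),
    Op("P3", "W", 6, 2, 3),
    Op("P2", "R", 6, 4, 5),
    Op("P1", "R", 5, 6, 7),
]
sequentially_consistent = exists_order(history, program_order)
atomic = exists_order(history, real_time_order)
print(sequentially_consistent, atomic)
```

The ordering W(5), R:5, W(6), R:6 respects every process's local order, so the history is sequentially consistent. It moves P1's read ahead of operations that had already completed in real time, which atomicity forbids.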
Going really weak
• Eventual consistency
– When the network is not partitioned, all nodes eventually have the same value
– I.e. don't be "consistent" at all times, but only after partitions heal!

• Based on a powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Exchange must be constant-sized packets
– Set reconciliation, Merkle trees, etc.
– Use (clock, node_id) to break ties of events in log

• Properties of gossiping
– All nodes will have the same value in O(log N) time
– No positive-feedback cycles that congest the network
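A toy simulation of the gossip idea, under strong simplifying assumptions: each node holds a single (clock, node_id, value) entry instead of a log, rounds are synchronous, and the network never drops messages. The function names and the seed are arbitrary choices for illustration.

```python
import random

def gossip_round(states, rng):
    """Each node exchanges its entry with one random peer; both keep the newer."""
    n = len(states)
    for node in range(n):
        peer = rng.randrange(n - 1)
        if peer >= node:
            peer += 1  # skip over the node itself
        # Tuples compare by clock first, then node_id, which breaks ties.
        newest = max(states[node], states[peer])
        states[node] = states[peer] = newest

def rounds_to_converge(states, rng, max_rounds=1000):
    """Run gossip rounds until every node holds the same entry."""
    rounds = 0
    while len(set(states)) > 1:
        if rounds == max_rounds:
            raise RuntimeError("did not converge")
        gossip_round(states, rng)
        rounds += 1
    return rounds

n = 64
states = [(0, 0, "initial")] * n
states[3] = (1, 3, "a")   # two concurrent updates with the same clock
states[7] = (1, 7, "b")
rounds = rounds_to_converge(states, random.Random(42))
print(rounds, states[0])
```

Both updates carry clock 1, so the higher node_id decides and every node ends up with the entry written at node 7. Each exchange carries one entry, which keeps the packets constant-sized.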
BASE
• Catch-all for any consistency model C’ that enables C’-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency

• Main ingredients
– Stale data
– Soft state (regenerable state)
– Approximate answers
Summary
• No need to ensure CAP at all times
– Switch between algorithms or satisfy subset at different times

• Weaken consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP

– Only be consistent when network isn’t partitioned:
• Eventual consistency (very weak) works around CAP

• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some environments have recovery guarantees (partitions heal within X hours); perform conflict resolution
Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking

• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods

• Gossiping & set reconciliation
– Lots of related work

CAP theorem by Ali Ghodsi


Editor's notes

  1. Failed ops appear as completed at every node, XOR never occurred at any node