Designing Large-Scale Distributed Systems

Ashwani Priyedarshi
“The network is the computer.”
    John Gage, Sun Microsystems

“A distributed system is one in which the failure 
 of a computer you didn’t even know existed can 
 render your own computer unusable.”
    Leslie Lamport

“Of three properties of distributed data systems - 
 consistency, availability, partition-tolerance - 
 choose two.”
    Eric Brewer, CAP Theorem, PODC 2000
Agenda
●   Consistency Models
●   Transactions
●   Why distribute?
●   Decentralized Architecture
●   Design Techniques & Tradeoffs
●   A Few Real-World Examples
●   Conclusions
Consistency Model
• Restricts the possible values that a read operation on 
  an item can return
  – Some models are very restrictive, others less so
  – The less restrictive ones are easier to implement

• The most natural semantics for a storage system is: 
  "a read should return the last written value"
  – With concurrent accesses and multiple replicas, it is 
    not easy to identify what "last write" means
Strict Consistency
●   Assumes the existence of absolute global time
●   It is impossible to implement on a large distributed 
    system
●   No two operations (in different clients) are allowed at the 
    same time
●   Example: Sequence (a) satisfies strict consistency, but 
    sequence (b) does not
Sequential Consistency
●   The result of any execution is the same as if 
     ●   the read and write operations by all processes on the data 
         store were executed in some sequential order
     ●   the operations of each individual process appear in this 
         sequence in the order specified by its program
●   All processes see the same interleaving of operations
●   Many interleavings are valid
●   Different runs of a program might act differently
●   Example: Sequence (a) satisfies sequential consistency, 
    but sequence (b) does not
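
The definition above can be checked by brute force on small histories. The following is a minimal sketch, not taken from the slides: it assumes each process's history is a list of hypothetical `("w" | "r", variable, value)` tuples and searches for one interleaving that keeps every process's program order and lets every read return the latest written value. The two sample histories are illustrative and are not necessarily the sequences (a) and (b) from the slide's figure. The search is exponential, so it only suits toy inputs.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, int]  # ("w" or "r", variable name, value); illustrative encoding


def sequentially_consistent(history: List[List[Op]]) -> bool:
    """Return True if some single interleaving explains every read.

    The interleaving must keep each process's operations in program order,
    and every read must return the most recent write to that variable.
    """

    def search(positions: Tuple[int, ...], store: Dict[str, int]) -> bool:
        # All processes finished: a legal total order has been found.
        if all(pos == len(ops) for pos, ops in zip(positions, history)):
            return True
        for p, ops in enumerate(history):
            pos = positions[p]
            if pos == len(ops):
                continue
            kind, var, value = ops[pos]
            advanced = positions[:p] + (pos + 1,) + positions[p + 1:]
            if kind == "w":
                if search(advanced, {**store, var: value}):
                    return True
            elif store.get(var) == value:  # a read is legal only if it sees the latest write
                if search(advanced, store):
                    return True
        return False

    return search(tuple(0 for _ in history), {})


# Two writers and two readers on one variable x.
# Both readers agree on the order of the writes (2, then 1).
history_ok = [
    [("w", "x", 1)],
    [("w", "x", 2)],
    [("r", "x", 2), ("r", "x", 1)],
    [("r", "x", 2), ("r", "x", 1)],
]
# The readers disagree on the order of the writes, so no single order exists.
history_bad = [
    [("w", "x", 1)],
    [("w", "x", 2)],
    [("r", "x", 2), ("r", "x", 1)],
    [("r", "x", 1), ("r", "x", 2)],
]
```

Here `sequentially_consistent(history_ok)` is true and `sequentially_consistent(history_bad)` is false, because all processes must see the same interleaving.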
Consistency vs Availability
•   In large shared-data distributed systems, network 
    partitions are a given

•   During a partition, the system must choose between 
    consistency and availability

•   Both options require the client developer to be aware 
    of what the system is offering
Eventual Consistency
•   An eventually consistent storage system guarantees that 
    if no new updates are made to the object, eventually 
    all accesses will return the last updated value

•   If no failures occur, the maximum size of the 
    inconsistency window can be determined based on factors 
    such as:
    – load on the system
    – communication delays
    – number of replicas


•   The most popular system that implements eventual 
    consistency is DNS
Quorum-based Technique
•   Used to enforce consistent operation in a distributed 
    system
•   Consider the following parameters:
    – N = Total number of replicas
    – W = Replicas to wait for acknowledgement during writes
    – R = Replicas to access during reads
•   If W+R > N
    – the read set and the write set always overlap and one can 
      guarantee strong consistency
•   If W+R <= N
    – the read and write set might not overlap and consistency 
      cannot be guaranteed
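
The overlap claim can be verified exhaustively for small replica counts. This is a minimal sketch with hypothetical function names: it enumerates every possible write set of size W and read set of size R over N replicas and checks whether they always share at least one replica, which should agree with the rule W + R > N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def overlap_guaranteed_by_rule(n: int, w: int, r: int) -> bool:
    """The rule from the slide: overlap is guaranteed exactly when W + R > N."""
    return w + r > n
```

For example, with N = 3 the setting W = 2, R = 2 always overlaps, while W = 1, R = 2 admits a read set that misses the only replica that acknowledged the write.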
Transactions
●   Extended form of consistency across multiple operations
●   Example: Transfer money from A to B
    ●   Subtract from A
    ●   Add to B
●   What if something happens in between?
    ●   Another transaction on A or B
    ●   Machine Crashes
    ●   ...
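
On a single machine, the transfer example can be sketched with Python's built-in `sqlite3` module. The table layout, account names, and the `transfer` and `balances` helpers are assumptions made for illustration. The point is that both updates commit together or not at all: using the connection as a context manager commits on success and rolls back if an exception is raised.

```python
import sqlite3
from typing import Dict


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst atomically; any failure undoes both steps."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Invariant violated: abort so the subtraction is rolled back.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical starting state: A holds 100, B holds 50.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

A call such as `transfer(conn, "A", "B", 1000)` raises and leaves both balances untouched, which is the invariant a transaction is meant to enforce.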
Why Transactions?
●   Correctness
●   Consistency
●   Enforce Invariants
●   ACID
Why distribute?
●   Catastrophic Failures
●   Expected Failures
●   Routine Maintenance
●   Geolocality
    ●   CDN, edge caching
Why NOT to distribute?
●   Within a Datacenter
    ●   High bandwidth: 1-100 Gbps interconnects
    ●   Low latency: < 1 ms within a rack, < 5 ms across
    ●   Little to no cost
●   Between Datacenters
    ●   Low bandwidth: 10 Mbps-1 Gbps
    ●   High latency: expect 100s of ms
    ●   High Cost for fiber
Decentralized Architecture
●   Operating from multiple data centers simultaneously
●   Hard problem
●   Maintaining consistency? Harder
●   Transactions? Hardest
Option 1: Don't
●   Most common
    ●   Make sure the data center never goes down
●   Bad at catastrophic failure
    ●   Large scale data loss
●   Not great for serving
    ●   No geolocation
Option 2: Primary with hot failover(s)
●   Better, but not ideal
    ●   Mediocre at catastrophic failure
    ●   Window of lost data
    ●   Failover data may be inconsistent
●   Geolocated for reads, not for writes
Option 3: Truly Distributed
●   Simultaneous writes in different DCs, maintaining 
    consistency
●   Two-way: Hard
●   N-way: Harder
●   Handles catastrophic failure, geolocality
●   But high latency
Tradeoffs

               Backups   M/S   MM   2PC   Paxos
Consistency
Transactions
Latency
Throughput
Data Loss
Failover
Backups
●   Make a copy
●   Weak consistency
●   Usually no transactions
Tradeoffs – Backups

               Backups   M/S         MM           2PC          Paxos
Consistency    Weak
Transactions   No
Latency        Low
Throughput     High
Data Loss      High
Failover       Down
Master/slave replication
●   Usually asynchronous
    ●   Good for throughput, latency
●   Weak/eventual consistency
●   Supports transactions
Tradeoffs – Master/Slave

               Backups   M/S         MM           2PC          Paxos
Consistency    Weak      Eventual
Transactions   No        Full
Latency        Low       Low
Throughput     High      High
Data Loss      High      Some
Failover       Down      Read Only
Multi-master replication
●   Asynchronous, eventual consistency
●   Concurrent writes
●   Needs a serialization protocol
    ●   e.g. monotonically increasing timestamps
    ●   Obtained via master election or a distributed consensus protocol
●   No strong consistency
●   No global transactions
Tradeoffs – Multi-master

               Backups   M/S         MM           2PC          Paxos
Consistency    Weak      Eventual    Eventual
Transactions   No        Full        Local
Latency        Low       Low         Low
Throughput     High      High        High
Data Loss      High      Some        Some
Failover       Down      Read Only   Read/write
Two Phase Commit
●   Semi-distributed consensus protocol
    ●   Deterministic coordinator
●   Phase 1: request (vote); Phase 2: commit/abort
●   Heavyweight, synchronous, high latency
●   3PC: non-blocking variant (one extra round trip)
●   Poor Throughput
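
The two phases can be sketched in a few lines. This is a simplified in-memory illustration, not a production protocol: the `Participant` class and its `will_vote_yes` flag are hypothetical, calls are synchronous, and coordinator crashes, timeouts, and durable logging are all left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """A resource manager taking part in the commit (hypothetical model)."""
    name: str
    will_vote_yes: bool = True
    state: str = "init"  # init -> prepared -> committed, or -> aborted

    def prepare(self) -> bool:
        """Phase 1: vote on whether this participant is able to commit."""
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Run both phases from the coordinator's side; return the decision."""
    # Phase 1 (request): the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit/abort): the coordinator broadcasts the single decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote aborts the transaction everywhere, and the coordinator waits for every participant in each phase, which is one reason the protocol is synchronous and slow.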
Tradeoffs – 2PC

               Backups   M/S         MM           2PC          Paxos
Consistency    Weak      Eventual    Eventual     Strong
Transactions   No        Full        Local        Full
Latency        Low       Low         Low          High
Throughput     High      High        High         Low
Data Loss      High      Some        Some         None
Failover       Down      Read Only   Read/write   Read/write
Paxos
●   Decentralized, distributed consensus protocol
●   Protocol similar to 2PC/3PC
    ●   Lighter, but still high latency
●   Three classes of agents: proposers, acceptors, learners
●   Phase 1: a) prepare b) promise; Phase 2: a) accept b) accepted
●   Survives minority failure
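
The prepare/promise and accept/accepted exchange can be sketched for a single decision (single-decree Paxos). This is a minimal in-memory illustration under strong assumptions: calls are synchronous, there is no real network, learners are omitted, and the `Acceptor` class, its `alive` flag, and the `propose` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest proposal number promised so far
    accepted: Optional[Tuple[int, str]] = None  # (number, value) most recently accepted
    alive: bool = True                          # False simulates a failed acceptor

    def prepare(self, n: int) -> Optional[Tuple[str, Optional[Tuple[int, str]]]]:
        """Phase 1b: promise to ignore proposals numbered below n."""
        if not self.alive or n <= self.promised:
            return None
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n: int, value: str) -> bool:
        """Phase 2b: accept unless a higher-numbered prepare was already promised."""
        if not self.alive or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposer round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a/1b: send prepare(n) and collect promises.
    promises = [reply for reply in (a.prepare(n) for a in acceptors) if reply is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prior = [accepted for _, accepted in promises if accepted is not None]
    if prior:
        value = max(prior, key=lambda pair: pair[0])[1]
    # Phase 2a/2b: send accept(n, value) and count acknowledgements.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Once a value has been chosen, a later proposer with a higher number adopts it instead of its own value, and the round still succeeds as long as a majority of acceptors is alive.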
Tradeoffs

               Backups   M/S         MM           2PC          Paxos
Consistency    Weak      Eventual    Eventual     Strong       Strong
Transactions   No        Full        Local        Full         Full
Latency        Low       Low         Low          High         High
Throughput     High      High        High         Low          Medium
Data Loss      High      Some        Some         None         None
Failover       Down      Read Only   Read/write   Read/write   Read/write
Examples
●   Megastore
    ●   Google's Scalable, Highly Available Datastore
    ●   Strong Consistency, Paxos
    ●   Optimized for reads
●   Dynamo
    ●   Amazon’s Highly Available Key-value Store
    ●   Eventual Consistency, Consistent Hashing, Vector Clocks
    ●   Optimized for writes
●   PNUTS
    ●   Yahoo's Massively Parallel & Distributed Database System
    ●   Timeline Consistency 
    ●   Optimized for reads
Conclusions
●   No silver bullet
    ●   There are no simple solutions
●   Design systems based on application needs
The End
Backup Slides
Vector Clocks
• Used to capture causality between different 
  versions of the same object.
• A vector clock is a list of (node, counter) pairs.
• Every version of every object is associated with 
  one vector clock.
• If every counter in the first object’s clock is 
  less than or equal to the corresponding node’s counter 
  in the second clock, then the first is an ancestor of 
  the second and can be forgotten.
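
The ancestor rule can be written as a short comparison. This is a minimal sketch that represents a clock as a dictionary from node name to counter; the function names and the node names used in any example (`Sx`, `Sy`, `Sz`) are illustrative assumptions. A node missing from a clock is treated as having counter 0.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> counter


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def is_ancestor(a: VectorClock, b: VectorClock) -> bool:
    """True if every counter in a is <= the matching counter in b."""
    return all(count <= b.get(node, 0) for node, count in a.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: equal, ancestor, descendant, or concurrent."""
    a_le_b, b_le_a = is_ancestor(a, b), is_ancestor(b, a)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "ancestor"      # a can be forgotten in favour of b
    if b_le_a:
        return "descendant"
    return "concurrent"        # conflicting versions; both must be kept
```

Two versions written through different nodes from the same parent compare as `"concurrent"`, which signals a conflict that has to be reconciled rather than silently discarded.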
Vector Clock Example
Partitioning Algorithm

• Consistent hashing:
  – The output range of a hash function is treated as a 
    fixed circular space or “ring”.
• Virtual nodes
  – Each node can be responsible for more than one 
    virtual node.
  – When a new node is added, it is assigned multiple 
    positions on the ring.
  – Various advantages
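
A ring with virtual nodes can be sketched as a sorted list of hash positions. This is a minimal illustration under assumptions of its own: the `HashRing` class, the SHA-256 hash, the `node#i` naming of virtual nodes, and the default of 64 virtual nodes per physical node are all choices made for this example, not details from the slides.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> int:
    """Map a string onto the ring (here: the integer value of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, vnodes_per_node: int = 64) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._positions: List[int] = []  # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> physical node

    def add_node(self, node: str) -> None:
        """Place the node at several positions on the ring (its virtual nodes)."""
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        """Drop every virtual node that belongs to the given physical node."""
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

When a node joins, only the keys that fall just before its new positions change owner, and they all move to the new node; the other keys stay where they were.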

Contenu connexe

Tendances

Error in hadoop
Error in hadoopError in hadoop
Error in hadoopLen Bass
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replicationAbDul ThaYyal
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory SystemsArush Nagpal
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed SystemsDr Sandeep Kumar Poonia
 
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest SystemsBig Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systemsaaamase
 
Inerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine LearningInerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine LearningYash Diwakar
 
Distributed Processing
Distributed ProcessingDistributed Processing
Distributed ProcessingImtiaz Hussain
 
Database replication
Database replicationDatabase replication
Database replicationArslan111
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed SystemsRicha Singh
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory SystemsAnkit Gupta
 
Patterns For Parallel Computing
Patterns For Parallel ComputingPatterns For Parallel Computing
Patterns For Parallel ComputingDavid Chou
 
Database architecture
Database architectureDatabase architecture
Database architecture1Arun_Pandey
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel ComputingAkhila Prabhakaran
 
Nfr testing(performance)
Nfr testing(performance)Nfr testing(performance)
Nfr testing(performance)Dilip Sharma
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computingMehul Patel
 

Tendances (20)

Error in hadoop
Error in hadoopError in hadoop
Error in hadoop
 
Hbase hivepig
Hbase hivepigHbase hivepig
Hbase hivepig
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
Processes and Processors in Distributed Systems
Processes and Processors in Distributed SystemsProcesses and Processors in Distributed Systems
Processes and Processors in Distributed Systems
 
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest SystemsBig Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
Big Data Day LA 2015 - Lessons Learned Designing Data Ingest Systems
 
Inerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine LearningInerview Quesion on Data Mining and Machine Learning
Inerview Quesion on Data Mining and Machine Learning
 
Distributed Processing
Distributed ProcessingDistributed Processing
Distributed Processing
 
Database replication
Database replicationDatabase replication
Database replication
 
Chap 4
Chap 4Chap 4
Chap 4
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed Systems
 
Distributed Shared Memory Systems
Distributed Shared Memory SystemsDistributed Shared Memory Systems
Distributed Shared Memory Systems
 
Patterns For Parallel Computing
Patterns For Parallel ComputingPatterns For Parallel Computing
Patterns For Parallel Computing
 
Database architecture
Database architectureDatabase architecture
Database architecture
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Nfr testing(performance)
Nfr testing(performance)Nfr testing(performance)
Nfr testing(performance)
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computing
 

Similaire à Designing large scale distributed systems

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistencyseldo
 
Intro to distributed systems
Intro to distributed systemsIntro to distributed systems
Intro to distributed systemsAhmed Soliman
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Otávio Carvalho
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokesGagan Bajpai
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world dataAthira Mukundan
 
Sistemas Distribuidos
Sistemas DistribuidosSistemas Distribuidos
Sistemas DistribuidosLocaweb
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
LMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibraryLMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibrarySebastian Andrasoni
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerkuchinskaya
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
M|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for YouM|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for YouMariaDB plc
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...javier ramirez
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IOPiyush Katariya
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databasesguestdfd1ec
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Adrianos Dadis
 

Similaire à Designing large scale distributed systems (20)

Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
Intro to distributed systems
Intro to distributed systemsIntro to distributed systems
Intro to distributed systems
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Storing the real world data
Storing the real world dataStoring the real world data
Storing the real world data
 
Sistemas Distribuidos
Sistemas DistribuidosSistemas Distribuidos
Sistemas Distribuidos
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
LMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging LibraryLMAX Disruptor - High Performance Inter-Thread Messaging Library
LMAX Disruptor - High Performance Inter-Thread Messaging Library
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
M|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for YouM|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for You
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...Everything you always wanted to know about Distributed databases, at devoxx l...
Everything you always wanted to know about Distributed databases, at devoxx l...
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IO
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 

Dernier

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Dernier (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...

Designing large scale distributed systems

  • 5. Agenda
      ● Consistency Models
      ● Transactions
      ● Why to distribute?
      ● Decentralized Architecture
      ● Design Techniques & Tradeoffs
      ● Few Real World Examples
      ● Conclusions
  • 6. Consistency Model
      • Restricts the possible values that a read operation on an item can return
          – Some models are very restrictive, others less so
          – The less restrictive ones are easier to implement
      • The most natural semantic for a storage system is "a read should return the last written value"
          – With concurrent accesses and multiple replicas, it is not easy to identify what "last write" means
  • 7. Strict Consistency
      ● Assumes the existence of absolute global time
      ● It is impossible to implement on a large distributed system
      ● No two operations (in different clients) allowed at the same time
      ● Example: Sequence (a) satisfies strict consistency, but sequence (b) does not
  • 8. Sequential Consistency
      ● The result of any execution is the same as if
          ● the read and write operations by all processes on the data store were executed in some sequential order
          ● the operations of each individual process appear in this sequence in the order specified by its program
      ● All processes see the same interleaving of operations
      ● Many interleavings are valid
      ● Different runs of a program might act differently
      ● Example: Sequence (a) satisfies sequential consistency, but sequence (b) does not
  • 9. Consistency vs Availability
      • In large shared-data distributed systems, network partitions are a given
      • When a partition occurs, the system must choose between consistency and availability
      • Both options require the client developer to be aware of what the system is offering
  • 10. Eventual Consistency
      • An eventually consistent storage system guarantees that if no new updates are made to an object, eventually all accesses will return the last updated value
      • If no failures occur, the maximum size of the inconsistency window can be determined from factors such as:
          – load on the system
          – communication delays
          – number of replicas
      • The most popular system that implements eventual consistency is DNS
  • 11. Quorum-based Technique
      • Used to enforce consistent operation in a distributed system
      • Consider the following parameters:
          – N = total number of replicas
          – W = number of replicas that must acknowledge a write
          – R = number of replicas contacted on a read
      • If W + R > N
          – the read set and the write set always overlap, so strong consistency can be guaranteed
      • If W + R <= N
          – the read set and the write set might not overlap, so consistency cannot be guaranteed
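The overlap rule can be checked by brute force. The sketch below is illustrative only: the function names are made up for this example, and it simply enumerates every possible write set and read set to confirm that they always intersect exactly when W + R > N.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """The rule from the slide: read and write sets must intersect."""
    return w + r > n


def overlap_by_enumeration(n: int, w: int, r: int) -> bool:
    """Check the same property the slow way, over all possible sets."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 4)]:
        print(n, w, r, quorums_overlap(n, w, r), overlap_by_enumeration(n, w, r))
```

With N = 3, the common choice W = 2, R = 2 overlaps, while W = 1, R = 1 does not: a read may land on the one replica that missed the write.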
  • 13. Transactions
      ● Extended form of consistency across multiple operations
      ● Example: Transfer money from A to B
          ● Subtract from A
          ● Add to B
      ● What if something happens in between?
          ● Another transaction on A or B
          ● Machine Crashes
          ● ...
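The money-transfer example can be tried on a single machine with Python's built-in sqlite3 module. This is a minimal sketch, not part of the original deck: the table layout, the helper names, and the crash_midway flag (which simulates a failure between the two updates) are all assumptions made for illustration.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with two accounts, A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move money atomically: both updates apply, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if crash_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the simulated crash fires after the subtraction, the rollback restores A's balance, so the invariant "total money is constant" holds either way.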
  • 14. Why Transactions?
      ● Correctness
      ● Consistency
      ● Enforce Invariants
      ● ACID
  • 16. Why to distribute?
      ● Catastrophic Failures
      ● Expected Failures
      ● Routine Maintenance
      ● Geolocality
          ● CDN, edge caching
  • 17. Why NOT to distribute?
      ● Within a Datacenter
          ● High bandwidth: 1-100 Gbps interconnects
          ● Low latency: < 1 ms within a rack, < 5 ms across
          ● Little to no cost
      ● Between Datacenters
          ● Low bandwidth: 10 Mbps-1 Gbps
          ● High latency: expect 100s of ms
          ● High cost for fiber
  • 19. Decentralized Architecture
      ● Operating from multiple data centers simultaneously
      ● Hard problem
      ● Maintaining consistency? Harder
      ● Transactions? Hardest
  • 20. Option 1: Don't
      ● Most common
          ● Make sure the data center never goes down
      ● Bad at catastrophic failure
          ● Large-scale data loss
      ● Not great for serving
          ● No geolocation
  • 21. Option 2: Primary with hot failover(s)
      ● Better, but not ideal
      ● Mediocre at catastrophic failure
          ● Window of lost data
          ● Failover data may be inconsistent
      ● Geolocated for reads, not for writes
  • 22. Option 3: Truly Distributed
      ● Simultaneous writes in different DCs, maintaining consistency
          ● Two-way: Hard
          ● N-way: Harder
      ● Handles catastrophic failure, geolocality
      ● But high latency
  • 24. Tradeoffs
                      Backups   M/S         MM           2PC          Paxos
      Consistency
      Transactions
      Latency
      Throughput
      Data Loss
      Failover
  • 25. Backups
      ● Make a copy
      ● Weak consistency
      ● Usually no transactions
  • 26. Tradeoffs – Backups
                      Backups   M/S         MM           2PC          Paxos
      Consistency     Weak
      Transactions    No
      Latency         Low
      Throughput      High
      Data Loss       High
      Failover        Down
  • 27. Master/slave replication
      ● Usually asynchronous
      ● Good for throughput, latency
      ● Weak/eventual consistency
      ● Supports transactions
  • 28. Tradeoffs – Master/Slave
                      Backups   M/S         MM           2PC          Paxos
      Consistency     Weak      Eventual
      Transactions    No        Full
      Latency         Low       Low
      Throughput      High      High
      Data Loss       High      Some
      Failover        Down      Read Only
  • 29. Multi-master replication
      ● Asynchronous, eventual consistency
      ● Concurrent writes
      ● Need serialization protocol
          ● e.g. monotonically increasing timestamps
          ● Either with master election or distributed consensus protocol
      ● No strong consistency
      ● No global transactions
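One way to read "monotonically increasing timestamps" is a last-writer-wins register stamped with a Lamport-style logical clock. The sketch below assumes that interpretation; the Replica class and its methods are hypothetical names, and ties between equal counters are broken by node id so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node_id: str
    clock: int = 0            # logical clock, only ever increases
    value: object = None
    stamp: tuple = (0, "")    # (clock, node_id) of the winning write

    def local_write(self, value):
        """Accept a write locally and return the update to replicate."""
        self.clock += 1
        self.stamp = (self.clock, self.node_id)
        self.value = value
        return (self.stamp, value)

    def apply_remote(self, update):
        """Merge an update from another master; the highest stamp wins."""
        stamp, value = update
        self.clock = max(self.clock, stamp[0])
        if stamp > self.stamp:
            self.stamp, self.value = stamp, value
```

Two masters that accept concurrent writes converge once they exchange updates, but one of the writes is silently discarded, which illustrates why the slide lists no strong consistency and no global transactions.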
  • 30. Tradeoffs – Multi-master
                      Backups   M/S         MM           2PC          Paxos
      Consistency     Weak      Eventual    Eventual
      Transactions    No        Full        Local
      Latency         Low       Low         Low
      Throughput      High      High        High
      Data Loss       High      Some        Some
      Failover        Down      Read Only   Read/write
  • 31. Two Phase Commit
      ● Semi-distributed consensus protocol
          ● Deterministic coordinator
      ● Phase 1: Request; Phase 2: Commit/Abort
      ● Heavyweight, synchronous, high latency
      ● 3PC: non-blocking variant, at the cost of one extra round trip
      ● Poor throughput
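The two phases can be sketched as an in-process simulation. This is a toy model under stated assumptions: the Participant class and two_phase_commit function are hypothetical, calls are synchronous, and there are no timeouts, logs, or coordinator failures, which are the hard parts of a real implementation.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # how this node will vote in phase 1
        self.state = "init"

    def prepare(self) -> bool:
        """Phase 1: vote yes (and hold resources) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants) -> str:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: request
    if all(votes):
        for p in participants:                    # phase 2: commit
            p.commit()
        return "commit"
    for p in participants:                        # phase 2: abort
        p.abort()
    return "abort"
```

Every participant waits on the coordinator between the two phases, which is where the high latency and poor throughput come from.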
  • 32. Tradeoffs – 2PC
                      Backups   M/S         MM           2PC          Paxos
      Consistency     Weak      Eventual    Eventual     Strong
      Transactions    No        Full        Local        Full
      Latency         Low       Low         Low          High
      Throughput      High      High        High         Low
      Data Loss       High      Some        Some         None
      Failover        Down      Read Only   Read/write   Read/write
  • 33. Paxos
      ● Decentralized, distributed consensus protocol
      ● Protocol similar to 2PC/3PC
      ● Lighter, but still high latency
      ● Three classes of agents: proposers, acceptors, learners
      ● Phase 1: a) prepare, b) promise; Phase 2: a) accept, b) accepted
      ● Survives minority failure
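A single-decree round of the prepare/promise and accept/accepted exchange can be sketched as follows. This is a simplified, synchronous simulation with assumed names (Acceptor, propose): learners are omitted, messages never get lost or reordered, and failed nodes are modelled by leaving them out of the reachable list.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(reachable, cluster_size, n, value):
    """Run one proposer round; return the chosen value, or None on failure."""
    majority = cluster_size // 2 + 1
    replies = [a.prepare(n) for a in reachable]            # phase 1a
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < majority:
        return None
    prior = [p for p in promises if p is not None]
    if prior:
        # Must re-propose the value of the highest-numbered accepted proposal.
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in reachable)      # phase 2a
    return value if acks >= majority else None
```

With five acceptors, a round succeeds while any three are reachable, and a later proposer that overlaps the earlier majority is forced to keep the already chosen value.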
  • 34. Tradeoffs
                      Backups   M/S         MM           2PC          Paxos
      Consistency     Weak      Eventual    Eventual     Strong       Strong
      Transactions    No        Full        Local        Full         Full
      Latency         Low       Low         Low          High         High
      Throughput      High      High        High         Low          Medium
      Data Loss       High      Some        Some         None         None
      Failover        Down      Read Only   Read/write   Read/write   Read/write
  • 36. Examples
      ● Megastore
          ● Google's Scalable, Highly Available Datastore
          ● Strong Consistency, Paxos
          ● Optimized for reads
      ● Dynamo
          ● Amazon's Highly Available Key-value Store
          ● Eventual Consistency, Consistent Hashing, Vector Clocks
          ● Optimized for writes
      ● PNUTS
          ● Yahoo's Massively Parallel & Distributed Database System
          ● Timeline Consistency
          ● Optimized for reads
  • 37. Conclusions
      ● No silver bullet
      ● There are no simple solutions
      ● Design systems based on application needs
  • 41. Vector Clocks
      • Used to capture causality between different versions of the same object
      • A vector clock is a list of (node, counter) pairs
      • Every version of every object is associated with one vector clock
      • If every counter in the first object's clock is less than or equal to the corresponding counter in the second object's clock, then the first is an ancestor of the second and can be forgotten
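The ancestor test translates almost directly into code. In this sketch a clock is a plain dict from node name to counter, a missing node counts as 0, and the function names and the node names Sx, Sy, Sz are illustrative assumptions.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a new clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True when every counter in b is <= the matching counter in a."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify how version a relates to version b."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if b_ge:
        return "ancestor"      # a happened before b, so a can be forgotten
    if a_ge:
        return "descendant"    # b happened before a
    return "concurrent"        # conflicting versions, need reconciliation


if __name__ == "__main__":
    v1 = increment({}, "Sx")
    v2 = increment(v1, "Sx")
    v3 = increment(v2, "Sy")
    v4 = increment(v2, "Sz")
    print(compare(v1, v2), compare(v3, v4))
```

Versions written on different nodes from the same parent come out as "concurrent": neither clock dominates, so neither version can be dropped automatically.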
  • 43. Partitioning Algorithm
      • Consistent hashing:
          – The output range of a hash function is treated as a fixed circular space or "ring".
      • Virtual Nodes
          – Each node can be responsible for more than one virtual node.
          – When a new node is added, it is assigned multiple positions.
          – Various Advantages
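A ring with virtual nodes can be sketched in a few lines. The HashRing class, the choice of MD5 truncated to 64 bits, and the default of 64 virtual nodes per physical node are assumptions made for this example, not details from the slides.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> physical node

    def add_node(self, node: str):
        """Assign the node multiple positions (virtual nodes) on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

Adding a node only moves the keys that now fall just before one of its positions; every other key keeps its previous owner, which is the main reason to prefer this scheme over hashing modulo the node count.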