SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Fault tolerance in
                               Cassandra
                                       Richard Low

                                 richard@acunu.com
                                       @acunu
                                    @richardalow


Cassandra London Meetup, 5 Sept 2011

Tuesday, 6 September 2011
Menu

                   • Failure modes
                   • Maintaining availability
                   • Recovery


Tuesday, 6 September 2011
Failure modes




Tuesday, 6 September 2011
Failures are the norm


                   • With more than a few nodes, something
                            goes wrong all the time
                   • Don’t want to be down all the time


Tuesday, 6 September 2011
Failure causes

                   • Hardware failure
                   • Bug
                   • Power
                   • Natural disaster

Tuesday, 6 September 2011
Failure modes
                   • Data centre failure
                   • Node failure
                   • Disk failure



Tuesday, 6 September 2011
Failure modes
                   • Data centre failure
                   • Node failure
                   • Disk failure
                   • Temporary
                   • Permanent

Tuesday, 6 September 2011
Failure modes

                   • Network failure
                    • One node
                    • Network partition
                    • Whole data centre

Tuesday, 6 September 2011
Failure modes

       • Operator failure
        • Delete files
        • Delete entire database
        • Incorrect configuration

Tuesday, 6 September 2011
Failure modes

                   • Want a system that can tolerate all the
                            above failures
                   • Make assumptions about probabilities of
                            multiple events
                            • Be careful when assuming independence

Tuesday, 6 September 2011
Solutions

                   • Do nothing
                   • Make boxes bullet proof
                   • Replication


Tuesday, 6 September 2011
Availability
Tuesday, 6 September 2011
How maintain
                             availability in the
                            presence of failure?


Tuesday, 6 September 2011
Replication

                   • Buy cheap nodes and cheap disks
                   • Store multiple copies of the data
                   • Don’t care if some disappear


Tuesday, 6 September 2011
Replication

                   • What about consistency?
                   • What if I can’t tolerate out-of-date reads?
                   • How restore a replica?


Tuesday, 6 September 2011
RF and CL
                   • Replication factor
                    • How many copies
                    • How much failure can tolerate
                   • Consistency Level
                    • How many nodes must be contactable
                            for operation to succeed


Tuesday, 6 September 2011
Simple example
                   • Replication factor 3
                   • Uniform network topology
                   • Read and write at CL.QUORUM
                    • Strong consistency
                    • Available if any one node is down
                    • Can recover if any two nodes fail
Tuesday, 6 September 2011
In general

                   • RF N, reads and writes at CL.QUORUM
                   • Available if ceil(N/2)-1 nodes fail
                   • Can recover if N-1 nodes fail


Tuesday, 6 September 2011
Multi data centre

                   • Cassandra knows location of hosts
                    • Through the snitch
                   • Can ensure replicas in each DC
                    • NetworkTopologyStrategy
                   • => can cope with whole DC failure

Tuesday, 6 September 2011
Recovery
Tuesday, 6 September 2011
Recovery
                   • Want to maintain replication factor
                   • Ensures recovery guarantees

                   • Methods:
                    • Automatic
                    • Manual
Tuesday, 6 September 2011
Automatic


Tuesday, 6 September 2011
Automatic processes


                   • Eventually moves replicas towards
                            consistency
                   • The ‘eventual’ in ‘eventual consistency’


Tuesday, 6 September 2011
Hinted Handoff
                   • Hints
                    • Stored on any node
                    • When a node is temporarily unavailable
                    • Delivered when the node comes back
                   • Can use CL.ANY
                    • Writes not immediately readable
Tuesday, 6 September 2011
Read Repair


                   • Since done a read, might as well repair any
                            old copies
                   • Compare values, update any out of sync


Tuesday, 6 September 2011
Manual


Tuesday, 6 September 2011
Repair: method
                   • Ensures a node is up to date
                   • Run ‘nodetool -h <node> repair’
                   • Reads through entire data on the node
                   • Builds a Merkel tree
                   • Compares with replicas
                   • Streams differences
Tuesday, 6 September 2011
Repair: when

                   • After node has been down a long time
                   • After increasing replication factor
                   • Every 10 days to ensure tombstones are
                            propagated
                   • Can be used to restore a failed node

Tuesday, 6 September 2011
Replace a node: method

                   • Bootstrap new node with <old_token>-1
                   • Tell existing nodes old node is dead
                    • nodetool remove


Tuesday, 6 September 2011
Replace a node: when

                   • Complete node failure
                   • Cannot replace failed disk
                   • Corruption


Tuesday, 6 September 2011
Restore from backup:
                            method
                   • Stop Cassandra on the node
                   • Copy SSTables from backup
                   • Restart Cassandra
                    • Make take a while reading indexes

Tuesday, 6 September 2011
Restore from backup:
                             when
                   • Disk failure
                    • with no RAID rebuild available
                   • Operator error
                   • Corruption
                   • Hacker

Tuesday, 6 September 2011
Thanks :)
                              www.acunu.com
                            richard@acunu.com
                                  @acunu
                               @richardalow




Tuesday, 6 September 2011

Contenu connexe

En vedette

How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLanandology
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremGrisha Weintraub
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationSchubert Zhang
 

En vedette (8)

How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance Evaluation
 

Plus de Acunu

Acunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinAcunu
 
Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsAcunu
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu
 
All Your Base
All Your BaseAll Your Base
All Your BaseAcunu
 
Realtime Analytics with Apache Cassandra
Realtime Analytics with Apache CassandraRealtime Analytics with Apache Cassandra
Realtime Analytics with Apache CassandraAcunu
 
Realtime Analytics with Apache Cassandra - JAX London
Realtime Analytics with Apache Cassandra - JAX LondonRealtime Analytics with Apache Cassandra - JAX London
Realtime Analytics with Apache Cassandra - JAX LondonAcunu
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time CassandraAcunu
 
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...Acunu
 
Realtime Analytics with Cassandra
Realtime Analytics with CassandraRealtime Analytics with Cassandra
Realtime Analytics with CassandraAcunu
 
Acunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your businessAcunu
 
Realtime Analytics on the Twitter Firehose with Cassandra
Realtime Analytics on the Twitter Firehose with CassandraRealtime Analytics on the Twitter Firehose with Cassandra
Realtime Analytics on the Twitter Firehose with CassandraAcunu
 
Progressive NOSQL: Cassandra
Progressive NOSQL: CassandraProgressive NOSQL: Cassandra
Progressive NOSQL: CassandraAcunu
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Acunu
 
Cassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraCassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraAcunu
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsAcunu
 
Next Generation Cassandra
Next Generation CassandraNext Generation Cassandra
Next Generation CassandraAcunu
 
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Acunu
 

Plus de Acunu (20)

Acunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on CassandraAcunu and Hailo: a realtime analytics case study on Cassandra
Acunu and Hailo: a realtime analytics case study on Cassandra
 
Virtual nodes: Operational Aspirin
Virtual nodes: Operational AspirinVirtual nodes: Operational Aspirin
Virtual nodes: Operational Aspirin
 
Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013 Acunu Analytics and Cassandra at Hailo All Your Base 2013
Acunu Analytics and Cassandra at Hailo All Your Base 2013
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
Acunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra AppsAcunu Analytics: Simpler Real-Time Cassandra Apps
Acunu Analytics: Simpler Real-Time Cassandra Apps
 
All Your Base
All Your BaseAll Your Base
All Your Base
 
Realtime Analytics with Apache Cassandra
Realtime Analytics with Apache CassandraRealtime Analytics with Apache Cassandra
Realtime Analytics with Apache Cassandra
 
Realtime Analytics with Apache Cassandra - JAX London
Realtime Analytics with Apache Cassandra - JAX LondonRealtime Analytics with Apache Cassandra - JAX London
Realtime Analytics with Apache Cassandra - JAX London
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...
Realtime Analytics on the Twitter Firehose with Apache Cassandra - Denormaliz...
 
Realtime Analytics with Cassandra
Realtime Analytics with CassandraRealtime Analytics with Cassandra
Realtime Analytics with Cassandra
 
Acunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra LondonAcunu Analytics @ Cassandra London
Acunu Analytics @ Cassandra London
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your business
 
Realtime Analytics on the Twitter Firehose with Cassandra
Realtime Analytics on the Twitter Firehose with CassandraRealtime Analytics on the Twitter Firehose with Cassandra
Realtime Analytics on the Twitter Firehose with Cassandra
 
Progressive NOSQL: Cassandra
Progressive NOSQL: CassandraProgressive NOSQL: Cassandra
Progressive NOSQL: Cassandra
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
 
Cassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into CassandraCassandra EU 2012 - Putting the X Factor into Cassandra
Cassandra EU 2012 - Putting the X Factor into Cassandra
 
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source EffortsCassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
Cassandra EU 2012 - Netflix's Cassandra Architecture and Open Source Efforts
 
Next Generation Cassandra
Next Generation CassandraNext Generation Cassandra
Next Generation Cassandra
 
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
Cassandra EU 2012 - CQL: Then, Now and When by Eric Evans
 

Dernier

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Dernier (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Fault Tolerance in Cassandra

  • 1. Fault tolerance in Cassandra Richard Low richard@acunu.com @acunu @richardalow Cassandra London Meetup, 5 Sept 2011 Tuesday, 6 September 2011
  • 2. Menu • Failure modes • Maintaining availability • Recovery Tuesday, 6 September 2011
  • 3. Failure modes Tuesday, 6 September 2011
  • 4. Failures are the norm • With more than a few nodes, something goes wrong all the time • Don’t want to be down all the time Tuesday, 6 September 2011
  • 5. Failure causes • Hardware failure • Bug • Power • Natural disaster Tuesday, 6 September 2011
  • 6. Failure modes • Data centre failure • Node failure • Disk failure Tuesday, 6 September 2011
  • 7. Failure modes • Data centre failure • Node failure • Disk failure • Temporary • Permanent Tuesday, 6 September 2011
  • 8. Failure modes • Network failure • One node • Network partition • Whole data centre Tuesday, 6 September 2011
  • 9. Failure modes • Operator failure • Delete files • Delete entire database • Incorrect configuration Tuesday, 6 September 2011
  • 10. Failure modes • Want a system that can tolerate all the above failures • Make assumptions about probabilities of multiple events • Be careful when assuming independence Tuesday, 6 September 2011
  • 11. Solutions • Do nothing • Make boxes bullet proof • Replication Tuesday, 6 September 2011
  • 13. How maintain availability in the presence of failure? Tuesday, 6 September 2011
  • 14. Replication • Buy cheap nodes and cheap disks • Store multiple copies of the data • Don’t care if some disappear Tuesday, 6 September 2011
  • 15. Replication • What about consistency? • What if I can’t tolerate out-of-date reads? • How restore a replica? Tuesday, 6 September 2011
  • 16. RF and CL • Replication factor • How many copies • How much failure can tolerate • Consistency Level • How many nodes must be contactable for operation to succeed Tuesday, 6 September 2011
  • 17. Simple example • Replication factor 3 • Uniform network topology • Read and write at CL.QUORUM • Strong consistency • Available if any one node is down • Can recover if any two nodes fail Tuesday, 6 September 2011
  • 18. In general • RF N, reads and writes at CL.QUORUM • Available if ceil(N/2)-1 nodes fail • Can recover if N-1 nodes fail Tuesday, 6 September 2011
  • 19. Multi data centre • Cassandra knows location of hosts • Through the snitch • Can ensure replicas in each DC • NetworkTopologyStrategy • => can cope with whole DC failure Tuesday, 6 September 2011
  • 21. Recovery • Want to maintain replication factor • Ensures recovery guarantees • Methods: • Automatic • Manual Tuesday, 6 September 2011
  • 23. Automatic processes • Eventually moves replicas towards consistency • The ‘eventual’ in ‘eventual consistency’ Tuesday, 6 September 2011
  • 24. Hinted Handoff • Hints • Stored on any node • When a node is temporarily unavailable • Delivered when the node comes back • Can use CL.ANY • Writes not immediately readable Tuesday, 6 September 2011
  • 25. Read Repair • Since done a read, might as well repair any old copies • Compare values, update any out of sync Tuesday, 6 September 2011
  • 27. Repair: method • Ensures a node is up to date • Run ‘nodetool -h <node> repair’ • Reads through entire data on the node • Builds a Merkel tree • Compares with replicas • Streams differences Tuesday, 6 September 2011
  • 28. Repair: when • After node has been down a long time • After increasing replication factor • Every 10 days to ensure tombstones are propagated • Can be used to restore a failed node Tuesday, 6 September 2011
  • 29. Replace a node: method • Bootstrap new node with <old_token>-1 • Tell existing nodes old node is dead • nodetool remove Tuesday, 6 September 2011
  • 30. Replace a node: when • Complete node failure • Cannot replace failed disk • Corruption Tuesday, 6 September 2011
  • 31. Restore from backup: method • Stop Cassandra on the node • Copy SSTables from backup • Restart Cassandra • Make take a while reading indexes Tuesday, 6 September 2011
  • 32. Restore from backup: when • Disk failure • with no RAID rebuild available • Operator error • Corruption • Hacker Tuesday, 6 September 2011
  • 33. Thanks :) www.acunu.com richard@acunu.com @acunu @richardalow Tuesday, 6 September 2011