ABOUT NETFLIX
NETFLIX
ACTIVE-ACTIVE
WHAT IS ACTIVE-ACTIVE?
Also called dual active, the phrase describes a
network of independent processing nodes in which each node has
access to a replicated database. Traffic intended for a failed node
is either passed on to an existing node or load balanced across
the remaining nodes.
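The failover behaviour in this definition can be sketched in a few lines of Python. This is only an illustration under assumed names: `route`, the node labels, and the round-robin policy are hypothetical and do not come from the talk.

```python
import itertools


def route(requests, nodes, healthy):
    """Assign each request to a healthy node, round-robin.

    Requests that would have gone to a failed node are spread
    across whichever nodes remain healthy.
    """
    live = [node for node in nodes if node in healthy]
    if not live:
        raise RuntimeError("no healthy nodes left to serve traffic")
    targets = itertools.cycle(live)
    return {request: next(targets) for request in requests}


# Hypothetical two-node (two-region) deployment with one node down.
nodes = ["us-east-1", "us-west-2"]
assignments = route(["r1", "r2", "r3"], nodes, healthy={"us-west-2"})
```

With both nodes healthy the requests alternate between them. With one node marked unhealthy, all requests land on the survivor, so each node needs enough headroom to absorb the other's load.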
WHY ACTIVE-ACTIVE?
ENTERPRISE IT SOLUTIONS
WEB SCALE CLOUD SOLUTIONS
RAPID SCALING
HIGH AVAILABILITY
DOES AN INSTANCE FAIL?
• It can, plan for it
• Bad code / configuration pushes
• Latent issues
• Hardware failure
• Test with Chaos Monkey
DOES A ZONE FAIL?
• Rarely, but it has happened before
• Routing issues
• DC-specific issues
• App-specific issues within a zone
• Test with Chaos Gorilla
DOES A REGION FAIL?
• Full region: unlikely, very rare
• Individual services can fail region-wide
• Most likely cause: a region-wide configuration issue
• Test with Chaos Kong
EVERYTHING FAILS… EVENTUALLY
• Keep your services running by embracing isolation and redundancy
• Construct a highly agile and highly available service from ephemeral and assumed-broken components
ISOLATION
• Changes in one region should not affect others
• A regional outage should not affect others
• Network partitioning between regions should not affect functionality / operations
REDUNDANCY
• Make more than one (of pretty much everything)
• Specifically, distribute services across Availability Zones and regions
HISTORY: X-MAS EVE 2012
• Netflix multi-hour outage
• US-East-1 regional Elastic Load Balancing issue
• “...data was deleted by a maintenance process that was inadvertently run against the production ELB state data”
ACTIVE-ACTIVE ARCHITECTURE
THE PROCESS
IDENTIFYING CLUSTERS FOR AA
SNITCH CHANGES
• Ec2Snitch: uses private IPs
• Ec2MultiRegionSnitch: uses public IPs
PRIAM.MULTIREGION.ENABLE = TRUE
• storage_port: using private IPs
• ssl_storage_port: using public IPs
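One way to read the pairing above is that peers in the same region talk over private IPs on storage_port, while peers in different regions talk over public IPs on ssl_storage_port. The sketch below encodes that reading only; it is an assumption, not a description of Priam's or Cassandra's internals. The `Node` type, `peer_endpoint`, and the addresses are hypothetical, and the port numbers are Cassandra's defaults for the two settings.

```python
from collections import namedtuple

STORAGE_PORT = 7000      # Cassandra default for storage_port
SSL_STORAGE_PORT = 7001  # Cassandra default for ssl_storage_port

# Hypothetical description of a node; not a Priam or Cassandra type.
Node = namedtuple("Node", ["region", "private_ip", "public_ip"])


def peer_endpoint(local, peer):
    """Pick the (address, port) used to reach `peer` from `local`.

    Assumption: same region -> private IP on storage_port,
    different region -> public IP on ssl_storage_port.
    """
    if local.region == peer.region:
        return (peer.private_ip, STORAGE_PORT)
    return (peer.public_ip, SSL_STORAGE_PORT)


# Documentation-range addresses, purely illustrative.
east_a = Node("us-east-1", "10.0.0.10", "203.0.113.10")
east_b = Node("us-east-1", "10.0.0.11", "203.0.113.11")
west_a = Node("us-west-2", "10.1.0.10", "198.51.100.10")
```

Under this reading, only cross-region traffic leaves the private network, so the security groups in each region would need to admit the other region's public addresses on the SSL port.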
SPIN UP NODES IN NEW REGION
[Diagram: APP, us-east-1, us-west-2]
UPDATE KEYSPACE
update keyspace <keyspace> with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {us-east : 3, us-west-2 : 3};
• us-east : 3 is the existing region and its replication factor
• us-west-2 : 3 is the new region and its replication factor
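When many keyspaces have to be extended, the statement can be generated rather than typed by hand. The sketch below renders the cassandra-cli form shown on the slide and, as an assumption about what a CQL-based cluster would use instead, the equivalent `ALTER KEYSPACE` statement. The helper names and the `demo` keyspace are hypothetical.

```python
def cli_update_keyspace(keyspace, replication):
    """Render the cassandra-cli statement in the form used on the slide."""
    options = ", ".join(f"{dc} : {rf}" for dc, rf in replication.items())
    return (
        f"update keyspace {keyspace} with placement_strategy = "
        f"'NetworkTopologyStrategy' and strategy_options = {{{options}}};"
    )


def cql_alter_keyspace(keyspace, replication):
    """Render the CQL equivalent for clusters managed through cqlsh."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Existing region first, then the new region, both with RF 3.
replication = {"us-east": 3, "us-west-2": 3}
cli_statement = cli_update_keyspace("demo", replication)
cql_statement = cql_alter_keyspace("demo", replication)
```

The datacenter names must match what the snitch reports for each region, so it is worth checking them against `nodetool status` output before applying either statement.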
REBUILD NEW REGION
Run nodetool rebuild us-east-1 on all us-west-2 nodes
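Running the rebuild on every node in the new region is easy to script. The following is a dry-run sketch that only builds the command lines. The host names are hypothetical, and the source datacenter argument is passed through exactly as given, so it should be whatever name the cluster reports for the existing region.

```python
def rebuild_commands(new_region_hosts, source_dc):
    """Build one `nodetool rebuild` command per node in the new region.

    Nothing is executed here; the caller decides how and when to run
    each command (for example one node at a time to limit streaming load).
    """
    return [
        ["nodetool", "-h", host, "rebuild", source_dc]
        for host in new_region_hosts
    ]


# Hypothetical host names for the us-west-2 nodes.
west_hosts = ["cass-usw2-a1", "cass-usw2-b1", "cass-usw2-c1"]
commands = rebuild_commands(west_hosts, "us-east-1")
for command in commands:
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` once the dry-run output looks right.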
RUN NODETOOL REPAIR
VALIDATION
BENCHMARKING GLOBAL CASSANDRA
WRITE INTENSIVE TEST OF CROSS-REGION REPLICATION CAPACITY
• 16 x hi1.4xlarge SSD nodes per zone = 96 total
• 192 TB of SSD in six locations, up and running Cassandra in 20 minutes
• US-West-2 region (Oregon): Cassandra replicas in zones A, B and C
• US-East-1 region (Virginia): Cassandra replicas in zones A, B and C
• Test load: 1 million writes at CL.ONE (wait for one replica to ack)
• Validation load: 1 million reads after 500 ms at CL.ONE, with no data loss
• Interzone traffic within each region
• Interregional traffic: up to 9 Gbit/s, 83 ms
• 18 TB of backups from S3
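The capacity figures on this slide can be cross-checked with a little arithmetic. The node, zone, and region counts, the 192 TB total, and the 9 Gbit/s peak come from the slide. The per-node figure and the byte-rate conversion are derived here.

```python
nodes_per_zone = 16
zones_per_region = 3
regions = 2

locations = zones_per_region * regions       # six zones in total
total_nodes = nodes_per_zone * locations     # 96 nodes

total_ssd_tb = 192
ssd_per_node_tb = total_ssd_tb / total_nodes  # SSD capacity per node

peak_gbit_per_s = 9
peak_gbyte_per_s = peak_gbit_per_s / 8        # interregional peak in GB/s

print(locations, total_nodes, ssd_per_node_tb, peak_gbyte_per_s)
```

The numbers are consistent with each other: six locations, 96 nodes, and 2 TB of SSD per node.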
TEST FOR THUNDERING HERD
TEST FOR RETRIES
[Diagram: FAILURE, RETRY]
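The slides do not show how clients retry, so the following is only a generic sketch of one common way to keep retries from turning into a thundering herd: exponential backoff with random jitter. The function name, parameters, and defaults are all hypothetical.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying on failure with jittered backoff.

    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2**n), so clients that failed at the
    same moment do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

Passing `sleep` and `rng` as parameters keeps the function testable: a benchmark harness can substitute a recording stub and check how much extra load the retries generate during an injected failure.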
KEY METRICS USED
• 99th / 95th percentile read latency (client and C*)
• Dropped metrics on C*
• Exceptions on C*
• Heap usage on C*
• Threads pending on C*
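For readers less familiar with percentile latencies, here is a minimal sketch of how a 95th or 99th percentile can be computed from raw samples using the nearest-rank method. The talk does not say how its tooling computes percentiles, so the method and the sample data are assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical read latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Tracking both values matters because a healthy 95th percentile can hide a long tail that only shows up at the 99th.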
CONFIGURATION FOR TEST
• 24-node C* cluster on SSDs
• 220 client instances
• 70+ JMeter instances
C* IOPS
TOTAL READ IOPS 
TOTAL WRITE IOPS
95th LATENCY 
99th LATENCY
CHECK FOR CEILING
NETWORK PARTITION
us-east-1 us-west-2
TAKEAWAYS
REPAIRS AFTER EXTENSION ARE PAINFUL!
TIME TO REPAIR DEPENDS ON
• Number of regions
• Number of replicas
• Data size
• Amount of entropy
ADJUST GC_GRACE AFTER EXTENSION
• Column family setting
• Defined in seconds
• Default: 10 days (864,000 seconds)
• Tweak gc_grace settings to accommodate the time taken to repair
• BEWARE of deleted columns
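Because the setting is expressed in seconds while repair times are usually discussed in days, a small helper avoids unit mistakes. This is a sketch only: the two-day safety margin and the function name are assumptions, not recommendations from the talk.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_GC_GRACE_SECONDS = 10 * SECONDS_PER_DAY  # the 10-day default


def gc_grace_for_repair(repair_days, safety_margin_days=2):
    """Return a gc_grace value (in seconds) that covers a full repair.

    The value never drops below the default, and grows when a repair
    across all regions takes longer than the default window allows.
    """
    needed = (repair_days + safety_margin_days) * SECONDS_PER_DAY
    return max(DEFAULT_GC_GRACE_SECONDS, needed)


short_repair = gc_grace_for_repair(3)   # fits inside the default window
long_repair = gc_grace_for_repair(14)   # needs a longer window
```

The warning about deleted columns is the reason for the margin: if tombstones are collected before a repair has carried them to every replica, deleted data can reappear.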
RUNBOOK
PLAN FOR CAPACITY
CONSISTENCY LEVEL
• Check the client for its consistency level setting
• In a multi-region cluster, QUORUM is not the same as LOCAL_QUORUM
• Recommended consistency levels: LOCAL_ONE (CASSANDRA-6202) for reads and LOCAL_QUORUM for writes
• For region resiliency, avoid ALL or QUORUM calls
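The difference between QUORUM and LOCAL_QUORUM becomes concrete with the replication factors used earlier in the deck (3 replicas in each of two regions). The sketch below counts required acknowledgements and checks whether a request can succeed when one region is unreachable. The function names are hypothetical, and the model ignores everything except replica counts.

```python
def quorum(replicas):
    """Majority of a replica set."""
    return replicas // 2 + 1


def required_acks(level, rf, local_dc):
    """Replica acknowledgements needed for a consistency level."""
    if level == "ALL":
        return sum(rf.values())
    if level == "QUORUM":
        return quorum(sum(rf.values()))  # majority across all regions
    if level == "LOCAL_QUORUM":
        return quorum(rf[local_dc])      # majority in the local region
    if level == "LOCAL_ONE":
        return 1
    raise ValueError(f"unsupported level: {level}")


def is_available(level, rf, local_dc, live):
    """Can the request succeed given the live replicas per region?"""
    if level in ("LOCAL_QUORUM", "LOCAL_ONE"):
        reachable = live.get(local_dc, 0)
    else:
        reachable = sum(live.values())
    return reachable >= required_acks(level, rf, local_dc)


rf = {"us-east": 3, "us-west-2": 3}
west_down = {"us-east": 3, "us-west-2": 0}  # us-west-2 unreachable
```

With 6 replicas in total, QUORUM needs 4 acknowledgements, which the 3 replicas of a single region can never provide. LOCAL_QUORUM needs only 2 of the local 3, so it keeps working through a regional outage or partition.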
CREATE CHAOS!
HOW DO WE KNOW IT WORKS?
Benchmark …
Time consuming
But worth it!
Cassandra Summit 2014: Active-Active Cassandra Behind the Scenes
