Running Cassandra on Apache Mesos
across multiple datacenters at Uber
Abhishek Verma (verma@uber.com)
About me
● MS (2010) and PhD (2012) in Computer Science from University of Illinois
at Urbana-Champaign
● 2 years at Google, worked on Borg and Omega and first author of the
Borg paper
● ~ 1 year at TCS Research, Mumbai
● Currently at Uber working on running Cassandra on Mesos
© DataStax, All Rights Reserved. 2
“Transportation as reliable as running water, everywhere, for everyone”
99.99%
efficient
Cluster Management @ Uber
● Statically partitioned machines across different services
● Move from custom deployment system to everything running on Mesos
● Gain efficiency by increasing machine utilization
○ Co-locate services on the same machine
○ Can lead to 30% fewer machines [1]
● Build stateful service frameworks to run on Mesos
[1] “Large-scale cluster management at Google with Borg”, EuroSys 2015
Apache Mesos
● Mesos abstracts CPU, memory, storage away from machines
○ program like it’s a single pool of resources
● Linear scalability
● High availability
● Native support for launching containers
● Pluggable resource isolation
● Two level scheduling
Apache Cassandra
● Horizontal scalability
○ Scales reads and writes linearly as new nodes are added
● High availability
○ Fault tolerant with tunable consistency levels
● Low latency, solid performance
● Operational simplicity
○ Homogeneous cluster, no SPOF
● Rich data model
DC/OS Cassandra Service
https://github.com/mesosphere/dcos-cassandra-service
Uber
● Abhishek Verma
● Karthik Gandhi
● Matthias Eichstaedt
● Varun Gupta
● Zhitao Li
Mesosphere
● Chris Lambert
● Gabriel Hartmann
● Keith Chambers
● Kenneth Owens
● Mohit Soni
Cassandra service architecture
[Architecture diagram. Components shown:]
● Framework: dcos-cassandra-service, with a web interface and a control plane API
● Mesos master (leader) and Mesos master (standby)
● ZooKeeper quorum (three ZK nodes)
● Mesos agents hosting Cassandra nodes: C* Cluster 1 (nodes 1a, 1b, 1c) and C* Cluster 2 (nodes 2a, 2b)
● Deployment system: Aurora (DC1) and Aurora (DC2)
● Client app, which uses the CQL interface to reach the Cassandra nodes
● A second datacenter (DC2)
Cassandra Mesos primitives
● Mesos containerizer
● Override 5 ports in configuration (storage_port,
ssl_storage_port, native_transport_port, rpc_port, jmx_port)
● Use persistent volumes
○ Data stored outside of the sandbox directory
○ Offered to the same task if it crashes and restarts
● Use dynamic reservation
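The port override above can be sketched as follows. This is a minimal illustration, assuming the offer describes free ports as inclusive (begin, end) ranges; the helper name `assign_ports` and the assignment order are hypothetical and not taken from dcos-cassandra-service.

```python
# Hypothetical helper, for illustration only: it maps ports from a resource
# offer onto the five Cassandra settings named on the slide.
CASSANDRA_PORT_SETTINGS = (
    "storage_port",
    "ssl_storage_port",
    "native_transport_port",
    "rpc_port",
    "jmx_port",
)


def assign_ports(offered_ranges):
    """Pick one offered port per Cassandra setting, in order.

    offered_ranges is a list of inclusive (begin, end) tuples
    (assumed shape of the port resources in an offer).
    """
    assigned = {}
    settings = iter(CASSANDRA_PORT_SETTINGS)
    setting = next(settings)
    for begin, end in offered_ranges:
        for port in range(begin, end + 1):
            assigned[setting] = port
            setting = next(settings, None)
            if setting is None:
                # All five settings have a port.
                return assigned
    raise ValueError("offer does not contain five free ports")
```

Because every node takes its ports from the offer rather than from fixed defaults, nodes of different clusters can plausibly share one machine without port clashes.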
Custom seed provider
Each node queries the scheduler at http://scheduler/seeds to learn whether it is a seed and which seeds to contact.
Number of Nodes = 3
Number of Seeds = 2
● Node 1 (10.0.0.1): { isSeed: true, seeds: [ ] }
● Node 2 (10.0.0.2): { isSeed: true, seeds: [ 10.0.0.1 ] }
● Node 3 (10.0.0.3): { isSeed: false, seeds: [ 10.0.0.1, 10.0.0.2 ] }
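The seed responses on this slide are consistent with a simple rule: the first nodes to start, up to the configured seed count, become seeds, and every node is told about the seeds that started before it. The sketch below reproduces the slide's three responses under that assumption; `seeds_response` is a hypothetical name and the real scheduler logic may differ.

```python
def seeds_response(existing_nodes, seed_count):
    """Build the reply a newly starting node would receive.

    existing_nodes: IPs of nodes that registered earlier, in start order.
    seed_count: configured number of seeds for the cluster.
    (Both parameters are assumptions made for this illustration.)
    """
    return {
        # The node becomes a seed while fewer than seed_count nodes exist.
        "isSeed": len(existing_nodes) < seed_count,
        # Only the earliest seed_count nodes are ever handed out as seeds.
        "seeds": existing_nodes[:seed_count],
    }


# Replay the slide: three nodes start one after another, two seeds wanted.
started = []
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    print(ip, seeds_response(started, seed_count=2))
    started.append(ip)
```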
Cassandra Service: Features
● Custom seed provider
● Increasing cluster size
● Changing Cassandra configuration
● Replacing a dead node
● Backup/Restore
● Cleanup
● Repair
Plan, Phases and Blocks
● Plan
○ Phases
■ Reconciliation
■ Deployment
■ Backup
■ Restore
■ Cleanup
■ Repair
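One way to picture the plan, phase and block hierarchy is as nested lists that are executed in order. The sketch below is an assumption-heavy illustration: the slide does not define a block, so here a block is taken to be a single unit of work (for example, one node), and the status names and stop-on-error behaviour are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Block:
    # Assumed meaning: one unit of work inside a phase, e.g. one node.
    name: str
    action: Callable[[], bool]  # returns True on success
    status: str = "PENDING"


@dataclass
class Phase:
    name: str  # e.g. "Deployment", "Backup", "Repair"
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Plan:
    phases: List[Phase] = field(default_factory=list)

    def run(self) -> bool:
        """Run phases and their blocks in order; stop at the first failure."""
        for phase in self.phases:
            for block in phase.blocks:
                block.status = "COMPLETE" if block.action() else "ERROR"
                if block.status == "ERROR":
                    return False
        return True
```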
Spinning up a new Cassandra cluster
https://www.youtube.com/watch?v=gbYmjtDKSzs
Automate Cassandra operations
● Repair
○ Synchronize all data across replicas
■ Last write wins
○ Anti-entropy mechanism
○ Repair primary key range node-by-node
● Cleanup
○ Remove data whose ownership has changed
■ Because of addition or removal of nodes
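The "last write wins" rule used during repair can be illustrated with a toy model in which each replica is a dictionary from key to (value, write timestamp). This is only a sketch of the reconciliation rule: actual Cassandra repair compares Merkle trees of token ranges and streams the differences, which is not modeled here, and the function names are hypothetical.

```python
def last_write_wins(versions):
    """Return the (value, timestamp) pair with the newest timestamp."""
    return max(versions, key=lambda version: version[1])


def repair_key(replicas, key):
    """Bring every replica to the newest version of one key.

    replicas: list of dicts mapping key -> (value, write_timestamp).
    Replicas that missed the write receive the winning version.
    """
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return  # no replica has the key, nothing to synchronize
    winner = last_write_wins(versions)
    for replica in replicas:
        replica[key] = winner
```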
Cleanup operation
https://www.youtube.com/watch?v=VxRLSl8MpYI
Failure scenarios
● Executor failure
○ Restarted automatically
● Cassandra daemon failure
○ Restarted automatically
● Node failure
○ Manual REST endpoint to replace node
● Scheduling framework failure
○ Existing nodes keep running, new nodes cannot be added
Experiments
Cluster startup
For each node in the cluster:
1. Receive and accept offer
2. Launch task
3. Fetch executor, JRE, Cassandra binaries from S3/HDFS
4. Launch executor
5. Launch Cassandra daemon
6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
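The final step, waiting for the node to reach NORMAL, could be implemented as a polling loop. The sketch below assumes a caller-supplied `get_mode` function that returns the node's current mode as a string; how the mode is obtained (for example over JMX) is left out, and the function name, interval and timeout are hypothetical.

```python
import time


def wait_until_normal(get_mode, poll_interval=1.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_mode() until it reports NORMAL.

    Returns the distinct modes observed, in order, for example
    ["STARTING", "JOINING", "NORMAL"]. Raises TimeoutError otherwise.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    seen = []
    while clock() < deadline:
        mode = get_mode()
        if not seen or seen[-1] != mode:
            seen.append(mode)  # record each transition once
        if mode == "NORMAL":
            return seen
        sleep(poll_interval)
    raise TimeoutError("node did not reach NORMAL, last modes: %r" % seen)
```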
Cluster startup time
Framework can start ~ one new node per minute
Tuning JVM Garbage collection
Changed from CMS to G1 garbage collector
Left: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L213
Right: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_tune_jvm_c.html?scroll=concept_ds_sv5_k4w_dk__tuning-java-garbage-collection
Tuning JVM Garbage collection
Metric                           CMS        G1        Improvement factor (G1 vs. CMS)
op rate                          1951       13765     7.06
latency mean (ms)                3.6        0.4       9.00
latency median (ms)              0.3        0.3       1.00
latency 95th percentile (ms)     0.6        0.4       1.50
latency 99th percentile (ms)     1          0.5       2.00
latency 99.9th percentile (ms)   11.6       0.7       16.57
latency max (ms)                 13496.9    4626.9    2.92
G1 garbage collector is much better without any tuning
Using cassandra-stress, 32 threads client
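The factor column in the table above can be reproduced from the CMS and G1 columns: for op rate (higher is better) it is G1 divided by CMS, and for latencies (lower is better) it is CMS divided by G1. The short script below recomputes it; the function name is made up for this illustration.

```python
def improvement_factor(cms, g1, higher_is_better):
    """How many times better G1 is than CMS for one metric."""
    return g1 / cms if higher_is_better else cms / g1


# (metric, CMS value, G1 value, higher_is_better), copied from the table.
ROWS = [
    ("op rate", 1951, 13765, True),
    ("latency mean (ms)", 3.6, 0.4, False),
    ("latency median (ms)", 0.3, 0.3, False),
    ("latency 95th percentile (ms)", 0.6, 0.4, False),
    ("latency 99th percentile (ms)", 1, 0.5, False),
    ("latency 99.9th percentile (ms)", 11.6, 0.7, False),
    ("latency max (ms)", 13496.9, 4626.9, False),
]

for metric, cms, g1, higher_is_better in ROWS:
    factor = improvement_factor(cms, g1, higher_is_better)
    print(f"{metric}: {factor:.2f}")
```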
Cluster Setup
● 3 nodes
● Local DC
● 24 cores, 128 GB RAM, 2TB SAS drives
● Cassandra running on bare metal
● Cassandra running in a Mesos container
Read Latency
              Mean       P95        P99
Bare metal    0.38 ms    0.74 ms    0.91 ms
Mesos         0.44 ms    0.76 ms    0.98 ms
Read Throughput
[Figure: read throughput, bare metal vs. Mesos]
Write Latency
              Mean       P95        P99
Bare metal    0.43 ms    0.94 ms    1.05 ms
Mesos         0.48 ms    0.93 ms    1.26 ms
Write Throughput
[Figure: write throughput, bare metal vs. Mesos]
Running across datacenters
● Four datacenters
○ Each running dcos-cassandra-service instance
○ Sync datacenter phase
■ Periodically exchange seeds with external dcs
● Cassandra nodes gossip topology
○ Discover nodes in other datacenters
Asynchronous cross-dc replication latency
● Write a row to dc1 using consistency level LOCAL_ONE
○ Write timestamp to a file when operation completed
● Spin in a loop to read the same row using consistency LOCAL_ONE in dc2
○ Write timestamp to a file when operation completed
● Difference between the two gives asynchronous replication latency
○ p50: 44.69 ms, p95: 46.38 ms, p99: 47.44 ms
● Round trip ping latency
○ 77.8 ms
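The measurement procedure above can be sketched with two injected callables, one that performs the LOCAL_ONE write in dc1 and one that performs the LOCAL_ONE read in dc2. This is a simplified, single-process illustration: the slide writes timestamps to files on separate hosts (which relies on synchronized clocks), whereas the sketch uses one local clock. All function names are hypothetical, and the percentile uses the nearest-rank method, which may differ from how the reported numbers were computed.

```python
import math
import time


def measure_replication_latency(write_row, row_visible, clock=time.monotonic):
    """Return milliseconds between a completed write and the first read hit.

    write_row(): writes the row in dc1 (LOCAL_ONE in the talk).
    row_visible(): reads the row in dc2 and returns True once it is found.
    """
    write_row()
    written_at = clock()
    while not row_visible():
        pass  # spin until the row has replicated
    return (clock() - written_at) * 1000.0


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Repeating the measurement many times and passing the samples to `percentile` with 50, 95 and 99 yields figures of the same kind as those reported on the slide.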
Cassandra on Mesos in Production
● ~20 clusters replicating across two datacenters (west and east coast)
● ~300 machines across two datacenters
● Largest 2 clusters: more than a million writes/sec and ~100k reads/sec
● Mean read latency: 13 ms and write latency: 25 ms
● Mostly use LOCAL_QUORUM consistency level
Questions?
verma@uber.com
Cluster startup
For each node in the cluster:
1. Receive and accept offer
2. Launch task
3. Fetch executor, JRE, Cassandra binaries from S3/HDFS
4. Launch executor
5. Launch Cassandra daemon
6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
Aurora hogging offers
● Aurora is designed to be the only framework running on Mesos, controlling all the machines
● Holds on to all received offers
○ Does not accept or reject them
● Mesos waits for the --offer_timeout duration, then rescinds the offer
● --offer_timeout config
○ Duration of time before an offer is rescinded from a framework. This helps fairness when
running frameworks that hold on to offers, or frameworks that accidentally drop offers. If
not set, offers do not timeout.
Long term solution: dynamic reservations
● Dynamically reserve all the machines' resources to the “cassandra” role
● Resources are offered only to cassandra frameworks
● Improves node startup time: 30s/node
● Node failure replacement or updates are much faster
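As an illustration of reserving resources for the “cassandra” role, the sketch below builds the form fields for the Mesos master's /reserve operator endpoint in the resource format documented for older Mesos releases (newer releases describe reservations differently). It only constructs the payload and sends nothing. The agent ID, principal and resource amounts are placeholders, and the talk does not say whether reservations were made through this endpoint or through offer operations in the framework.

```python
import json


def reserve_request(agent_id, role, principal, cpus, mem_mb, disk_mb):
    """Build form fields for a POST to the master's /reserve endpoint.

    All argument values are supplied by the caller; nothing here is
    taken from Uber's actual configuration.
    """
    def scalar(name, value):
        return {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": value},
            "role": role,
            "reservation": {"principal": principal},
        }

    resources = [
        scalar("cpus", cpus),
        scalar("mem", mem_mb),
        scalar("disk", disk_mb),
    ]
    # The endpoint expects the resources as a JSON-encoded string field.
    return {"slaveId": agent_id, "resources": json.dumps(resources)}


# Placeholder values for illustration only.
fields = reserve_request("agent-id-placeholder", "cassandra",
                         "operator-placeholder", 4, 8192, 102400)
print(fields["resources"])
```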
Using the Cassandra cluster
https://www.youtube.com/watch?v=qgqO39DteHo

Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* Summit 2016

  • 1. Running Cassandra on Apache Mesos across multiple datacenters at Uber Abhishek Verma (verma@uber.com)
  • 2. About me ● MS (2010) and PhD (2012) in Computer Science from University of Illinois at Urbana-Champaign ● 2 years at Google, worked on Borg and Omega and first author of the Borg paper ● ~ 1 year at TCS Research, Mumbai ● Currently at Uber working on running Cassandra on Mesos © DataStax, All Rights Reserved. 2
  • 3. “Transportation as reliable as running water, everywhere, for everyone”
  • 4. “Transportation as reliable as running water, everywhere, for everyone” 99.99%
  • 5. “Transportation as reliable as running water, everywhere, for everyone” efficient
  • 6. Cluster Management @ Uber ● Statically partitioned machines across different services ● Move from custom deployment system to everything running on Mesos ● Gain efficiency by increasing machine utilization ○ Co-locate services on the same machine ○ Can lead to 30% fewer machines1 ● Build stateful service frameworks to run on Mesos © DataStax, All Rights Reserved. 6 “Large-scale cluster management at Google with Borg”, EuroSys 2015
  • 7. Apache Mesos ● Mesos abstracts CPU, memory, storage away from machines ○ program like it’s a single pool of resources ● Linear scalability ● High availability ● Native support for launching containers ● Pluggable resource isolation ● Two level scheduling
  • 8. Apache Cassandra ● Horizontal scalability ○ Scales reads and writes linearly as new nodes are added ● High availability ○ Fault tolerant with tunable consistency levels ● Low latency, solid performance ● Operational simplicity ○ Homogeneous cluster, no SPOF ● Rich data model
  • 9. DC/OS Cassandra Service ● Uber: Abhishek Verma, Karthik Gandhi, Matthias Eichstaedt, Varun Gupta, Zhitao Li ● Mesosphere: Chris Lambert, Gabriel Hartmann, Keith Chambers, Kenneth Owens, Mohit Soni ● https://github.com/mesosphere/dcos-cassandra-service
  • 10. Cassandra service architecture (diagram; labels only) ● Framework: dcos-cassandra-service, with web interface and control plane API ● Mesos master (Leader) and Mesos master (Standby) ● ZooKeeper quorum (three ZK nodes) ● Mesos agents running C* nodes 1a, 1b, 1c (C* Cluster 1) and 2a, 2b (C* Cluster 2) ● Aurora (DC1) and Aurora (DC2) ● Deployment system ● DC2 ● Client app uses the CQL interface to reach the C* nodes
  • 11. Cassandra Mesos primitives 11 ● Mesos containerizer ● Override 5 ports in configuration (storage_port, ssl_storage_port, native_transport_port, rpc_port, jmx_port) ● Use persistent volumes ○ Data stored outside of the sandbox directory ○ Offered to the same task if it crashes and restarts ● Use dynamic reservation
  • 12. Custom seed provider (diagram) ● Number of Nodes = 3, Number of Seeds = 2 ● Each node queries http://scheduler/seeds ● Node 1 (10.0.0.1) receives { isSeed: true, seeds: [ ] } ● Node 2 (10.0.0.2) receives { isSeed: true, seeds: [ 10.0.0.1 ] } ● Node 3 (10.0.0.3) receives { isSeed: false, seeds: [ 10.0.0.1, 10.0.0.2 ] }
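The seed assignment on slide 12 can be sketched as a small piece of scheduler-side logic. This is only an illustration under stated assumptions: `SeedScheduler` and `seeds_for` are hypothetical names, not part of dcos-cassandra-service, and the rule "the first `num_seeds` nodes to ask become seeds" is inferred from the three responses shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SeedScheduler:
    """Hypothetical stand-in for the scheduler behind http://scheduler/seeds."""

    num_seeds: int
    seeds: List[str] = field(default_factory=list)

    def seeds_for(self, node_ip: str) -> Dict[str, object]:
        # Seeds that were known before this node asked; a node never lists itself.
        known = [ip for ip in self.seeds if ip != node_ip]
        is_seed = node_ip in self.seeds
        # Assumption: nodes are promoted to seeds in arrival order until the quota is met.
        if not is_seed and len(self.seeds) < self.num_seeds:
            self.seeds.append(node_ip)
            is_seed = True
        return {"isSeed": is_seed, "seeds": known}


# Number of Nodes = 3, Number of Seeds = 2, as on the slide.
scheduler = SeedScheduler(num_seeds=2)
responses = [scheduler.seeds_for(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
for response in responses:
    print(response)
```

With this rule the first node bootstraps with an empty seed list, and every later node learns only about seeds that already exist.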
  • 13. Cassandra Service: Features 13 ● Custom seed provider ● Increasing cluster size ● Changing Cassandra configuration ● Replacing a dead node ● Backup/Restore ● Cleanup ● Repair
  • 14. Plan, Phases and Blocks 14 ● Plan ○ Phases ■ Reconciliation ■ Deployment ■ Backup ■ Restore ■ Cleanup ■ Repair
  • 15. Spinning up a new Cassandra cluster 15 https://www.youtube.com/watch?v=gbYmjtDKSzs
  • 16. Automate Cassandra operations 16 ● Repair ○ Synchronize all data across replicas ■ Last write wins ○ Anti-entropy mechanism ○ Repair primary key range node-by-node ● Cleanup ○ Remove data whose ownership has changed ■ Because of addition or removal of nodes
  • 18. Failure scenarios 18 ● Executor failure ○ Restarted automatically ● Cassandra daemon failure ○ Restarted automatically ● Node failure ○ Manual REST endpoint to replace node ● Scheduling framework failure ○ Existing nodes keep running, new nodes cannot be added
  • 20. Cluster startup For each node in the cluster: 1. Receive and accept offer 2. Launch task 3. Fetch executor, JRE, Cassandra binaries from S3/HDFS 4. Launch executor 5. Launch Cassandra daemon 6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
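Step 6 of the startup sequence, waiting for the node's mode to reach NORMAL, might look roughly like the polling loop below. It is a sketch, not the framework's code: `wait_until_normal` and the injected `poll_mode` callable are hypothetical, and how the mode is actually read from the Cassandra daemon is deliberately left out.

```python
import time
from typing import Callable, List

# Mode sequence named on the slide.
EXPECTED_ORDER = ["STARTING", "JOINING", "NORMAL"]


def wait_until_normal(
    poll_mode: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
) -> List[str]:
    """Poll a node's mode until it is NORMAL; return the distinct modes seen."""
    observed: List[str] = []
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        mode = poll_mode()
        if mode not in EXPECTED_ORDER:
            raise RuntimeError(f"unexpected mode: {mode}")
        if not observed or observed[-1] != mode:
            # The mode may only move forward through the expected sequence.
            if observed and EXPECTED_ORDER.index(mode) < EXPECTED_ORDER.index(observed[-1]):
                raise RuntimeError(f"mode went backwards: {observed[-1]} -> {mode}")
            observed.append(mode)
        if mode == "NORMAL":
            return observed
        time.sleep(interval_s)
    raise TimeoutError("node did not reach NORMAL before the timeout")


# Demo with a canned sequence of modes instead of a real daemon.
fake_modes = iter(["STARTING", "STARTING", "JOINING", "NORMAL"])
startup_trace = wait_until_normal(lambda: next(fake_modes), interval_s=0.0)
print(startup_trace)
```

Only once this returns would a scheduler move on to the next node, which is consistent with nodes being started one at a time.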
  • 21. Cluster startup time 21 Framework can start ~ one new node per minute
  • 22. Tuning JVM Garbage collection 22 Changed from CMS to G1 garbage collector Left: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L213 Right: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_tune_jvm_c.html?scroll=concept_ds_sv5_k4w_dk__tuning-java-garbage-collection
  • 23. Tuning JVM Garbage collection (using cassandra-stress, 32 client threads)
      Metric                           CMS        G1        G1 : CMS factor
      op rate                          1951       13765     7.06
      latency mean (ms)                3.6        0.4       9.00
      latency median (ms)              0.3        0.3       1.00
      latency 95th percentile (ms)     0.6        0.4       1.50
      latency 99th percentile (ms)     1          0.5       2.00
      latency 99.9th percentile (ms)   11.6       0.7       16.57
      latency max (ms)                 13496.9    4626.9    2.92
      G1 garbage collector is much better without any tuning
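The factor column on slide 23 can be reproduced from the CMS and G1 figures. The values below are copied from the slide; the helper name `improvement_factor` is made up for this sketch. The factor is G1 / CMS for op rate (higher is better) and CMS / G1 for the latency rows (lower is better).

```python
# (CMS, G1) measurements copied from slide 23.
results = {
    "op rate": (1951, 13765),
    "latency mean (ms)": (3.6, 0.4),
    "latency median (ms)": (0.3, 0.3),
    "latency 95th percentile (ms)": (0.6, 0.4),
    "latency 99th percentile (ms)": (1, 0.5),
    "latency 99.9th percentile (ms)": (11.6, 0.7),
    "latency max (ms)": (13496.9, 4626.9),
}


def improvement_factor(metric: str, cms: float, g1: float) -> float:
    """How many times better G1 is than CMS for the given metric."""
    # Throughput improves when it grows; latency improves when it shrinks.
    return g1 / cms if metric == "op rate" else cms / g1


factors = {
    metric: round(improvement_factor(metric, cms, g1), 2)
    for metric, (cms, g1) in results.items()
}
for metric, factor in factors.items():
    print(f"{metric:32s} {factor:6.2f}")
```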
  • 24. Cluster Setup 24 ● 3 nodes ● Local DC ● 24 cores, 128 GB RAM, 2TB SAS drives ● Cassandra running on bare metal ● Cassandra running in a Mesos container
  • 25. Read Latency ● Bare metal: Mean: 0.38 ms, P95: 0.74 ms, P99: 0.91 ms ● Mesos: Mean: 0.44 ms, P95: 0.76 ms, P99: 0.98 ms
  • 26. Read Throughput: bare metal vs. Mesos (chart)
  • 27. Write Latency ● Bare metal: Mean: 0.43 ms, P95: 0.94 ms, P99: 1.05 ms ● Mesos: Mean: 0.48 ms, P95: 0.93 ms, P99: 1.26 ms
  • 28. Write Throughput: bare metal vs. Mesos (chart)
  • 29. Running across datacenters 29 ● Four datacenters ○ Each running dcos-cassandra-service instance ○ Sync datacenter phase ■ Periodically exchange seeds with external dcs ● Cassandra nodes gossip topology ○ Discover nodes in other datacenters
  • 30. Asynchronous cross-dc replication latency ● Write a row to dc1 using consistency level LOCAL_ONE ○ Write timestamp to a file when operation completed ● Spin in a loop to read the same row using consistency LOCAL_ONE in dc2 ○ Write timestamp to a file when operation completed ● Difference between the two gives asynchronous replication latency ○ p50: 44.69 ms, p95: 46.38 ms, p99: 47.44 ms ● Round trip ping latency ○ 77.8 ms
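The measurement procedure on slide 30 can be sketched as follows. This is a simplified, single-process illustration: the `write_dc1` and `read_dc2` callables are hypothetical placeholders for LOCAL_ONE operations against each datacenter, and `FakeReplicatedStore` only imitates replication delay. The slide's version records timestamps to files on two hosts, which additionally depends on synchronized clocks.

```python
import math
import time
from typing import Callable, Dict, List, Optional


def measure_replication_latency_ms(
    write_dc1: Callable[[str], None],
    read_dc2: Callable[[str], Optional[str]],
    key: str,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Write in dc1, spin on reads in dc2, return the elapsed time in ms."""
    write_dc1(key)
    written_at = clock()  # timestamp taken when the write completed
    while read_dc2(key) is None:  # spin until the row is visible in dc2
        pass
    read_at = clock()  # timestamp taken when the read first succeeded
    return (read_at - written_at) * 1000.0


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class FakeReplicatedStore:
    """In-memory stand-in: a row becomes readable after a fixed number of polls."""

    def __init__(self, polls_until_visible: int) -> None:
        self.polls_until_visible = polls_until_visible
        self.rows: Dict[str, int] = {}

    def write(self, key: str) -> None:
        self.rows[key] = self.polls_until_visible

    def read(self, key: str) -> Optional[str]:
        remaining = self.rows.get(key)
        if remaining is None:
            return None
        if remaining > 0:
            self.rows[key] = remaining - 1
            return None
        return key


store = FakeReplicatedStore(polls_until_visible=1000)
samples = [
    measure_replication_latency_ms(store.write, store.read, f"row-{i}")
    for i in range(20)
]
print("p50:", percentile(samples, 50), "p95:", percentile(samples, 95), "p99:", percentile(samples, 99))
```

Note that the reported p50 of 44.69 ms is a little over half the 77.8 ms round trip ping time, which fits replication being a one-way transfer.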
  • 31. Cassandra on Mesos in Production 31 ● ~20 clusters replicating across two datacenters (west and east coast) ● ~300 machines across two datacenters ● Largest 2 clusters: more than a million writes/sec and ~100k reads/sec ● Mean read latency: 13ms and write latency: 25ms ● Mostly use LOCAL_QUORUM consistency level
  • 33. Cluster startup For each node in the cluster: 1. Receive and accept offer 2. Launch task 3. Fetch executor, JRE, Cassandra binaries from S3/HDFS 4. Launch executor 5. Launch Cassandra daemon 6. Wait for its mode to transition STARTING -> JOINING -> NORMAL (Annotation: Aurora hogging offers)
  • 34. Aurora hogs offers 34 ● Aurora designed to be the only framework running on Mesos and controlling all the machines ● Holds on to all received offers ○ Does not accept or reject them ● Mesos waits for --offer_timeout time duration and rescinds offer ● --offer_timeout config ○ Duration of time before an offer is rescinded from a framework. This helps fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers. If not set, offers do not timeout.
  • 35. Long term solution: dynamic reservations ● Dynamically reserve all of the machines' resources to the “cassandra” role ● Resources are offered only to cassandra frameworks ● Improves node startup time to 30s/node ● Node failure replacement or updates are much faster
  • 36. Using the Cassandra cluster 36 https://www.youtube.com/watch?v=qgqO39DteHo