Optimizing the Public Cloud for Cost
and Scalability with Cassandra
Charles Lamanna
Senior Development Lead
@clamanna
Ricardo Villalobos
Senior Cloud Architect
@ricvilla
MetricsHub
keep services up and running for the lowest possible cost
Live Status
Cost Awareness
Alerts and Notifications
Actions and Scaling
$
#CASSANDRA13
growth
2000+ customers in 6 months
[Chart: Number of MetricsHub Customers, 10/18/2012 to 6/25/2013 (y-axis 0 to 2,500)]
[Chart: Number of VMs Monitored by MetricsHub, 10/18/2012 to 6/25/2013 (y-axis 0 to 9,000)]
[Chart: Number of MetricsHub Employees, 10/18/2012 to 6/25/2013 (y-axis 0 to 8)]
storing data
200M data points per hour
Planning for huge data ingestion rates
• MetricsHub requires high scale, real-time data:
• 1,000 data points per minute per VM
• 12 data points per endpoint per minute
• 500+ data points per storage account per hour
• Need to aggregate, analyze and take actions based on
this data stream (in near real-time)
• Must be cheap, scalable and reliable
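As a rough illustration of how the per-resource rates above add up, here is a minimal sketch. The rates are the ones quoted on the slide; the function name and the fleet sizes in the example are hypothetical and only show the order of magnitude, not MetricsHub's actual resource counts.

```python
# Per-resource sampling rates quoted on the slide.
POINTS_PER_VM_PER_MINUTE = 1_000
POINTS_PER_ENDPOINT_PER_MINUTE = 12
POINTS_PER_STORAGE_ACCOUNT_PER_HOUR = 500  # the slide says "500+", so this is a floor


def hourly_data_points(vms: int, endpoints: int, storage_accounts: int) -> int:
    """Lower-bound estimate of the data points ingested per hour."""
    return (
        vms * POINTS_PER_VM_PER_MINUTE * 60
        + endpoints * POINTS_PER_ENDPOINT_PER_MINUTE * 60
        + storage_accounts * POINTS_PER_STORAGE_ACCOUNT_PER_HOUR
    )


# Hypothetical fleet sizes, chosen only for illustration.
estimate = hourly_data_points(vms=3_000, endpoints=5_000, storage_accounts=1_000)
print(f"{estimate:,} data points per hour")
```

Under these assumed counts the VM term dominates: a single VM alone contributes 60,000 points per hour.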
Looked at Redis…
• Perform aggregation in memory (using INCR and other native
operations)
• Flush aggregate data from Redis to persistent storage at a
regular interval
• Is fast and powerful, with a good OSS community
… but it was fragile, and expensive for this use
case
• RAM/Memory in the public cloud is *expensive* (but storage is
*cheap*)
• Flushing the data requires complex coordination
• If we did not flush quickly enough – out of memory!
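The failure mode described above can be sketched without Redis at all. In the example below a plain dict stands in for the in-memory store, `max_keys` is a hypothetical memory budget, and `flush` plays the role of the periodic write to persistent storage; none of these names come from the talk.

```python
from collections import defaultdict


class InMemoryAggregator:
    """Stand-in for INCR-style aggregation; a plain dict replaces Redis here."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys  # hypothetical memory budget, counted in keys
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def record(self, key: str, value: int) -> None:
        # A new key that does not fit in the budget models running out of memory.
        if key not in self.counts and len(self.counts) >= self.max_keys:
            raise MemoryError("buffer full: flush did not keep up with ingestion")
        self.sums[key] += value
        self.counts[key] += 1

    def flush(self, store: dict) -> None:
        """Merge buffered aggregates into `store`, then free the memory."""
        for key, count in self.counts.items():
            total, seen = store.get(key, (0, 0))
            store[key] = (total + self.sums[key], seen + count)
        self.sums.clear()
        self.counts.clear()


persistent = {}
agg = InMemoryAggregator(max_keys=2)
agg.record("cpu", 40)
agg.record("cpu", 60)
agg.record("mem", 10)
agg.flush(persistent)  # skipping this call makes the next new key fail
```

With several writers, the flush also has to be coordinated so that no sample is lost or counted twice, which is the complexity the slide refers to.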
Looked at SQL…
• Create tables for different time windows and granularities
• Roll over from table-to-table (and drop entire tables when
the data expires)
• Update in place (for counters, min, max, etc.) in a reliable
way
… but SQL did not fit
• Higher write than read volume pushed boundaries of the
servers
• Requires complex sharding after just a few dozen new
customers
• Is possible, but not worth the operational cost
Then we tried Cassandra (and
never went back)
• Scales fluidly
• Grows horizontally – double the nodes, double capacity
• Add / remove capacity / nodes with no downtime
• Highly available
• No single point of failure
• Replication factor (i.e. hot copies) is just a config switch
… and by the way
• Little-to-no operations cost
• New nodes take minutes to set up
• Nodes just keep running for months on end
• “Aggregate on write” – no jobs required!
• Atomic distributed counters make it easy to do aggregates on
write
• …and a nice kicker: has *great* perf / COGS in Azure
architecture
68 virtual machines (PAAS and IAAS)
Table Storage
Jobs Worker Role
(24 instances)
SQL Database
Blob storage
Portal Web Role
(3 instances)
Cassandra VM Cluster
(32 XL instances)
Web API Web Role
(8 instances)
End User Web Browsers
Monitored Customer Resources (e.g. websites; SQL databases)
Monitored Virtual Machines
Endpoints
Replicated data in multiple datacenters
Clients
PaaS
IaaS
Services
Avoiding state
• Application logic / code all
lives on stateless
machines
• Keeps it simple: decreases
human operations cost
• Use Azure PAAS offerings
(Web and Worker roles)
Windows Azure Cloud Services
(PAAS)
• Scale horizontally (grew from
1 to 30+ instances)
• Managed by the platform
(patched; coordinated
recycling; failover; etc.)
• 1 click deployment from
Visual Studio (with automatic
load balancer swaps)
Jobs Worker Role
Runs recurring tasks
to pull, generate and
analyze data
Jobs are
synchronized and
scheduled using
Windows Azure
Tables and Queues
Jobs Worker Role
(24 instances)
Web API Role
RESTful endpoint for
saving and reading
custom metrics.
Highly
concurrent, secure &
scalable.
Web API Web Role
(8 instances)
Portal Web Role
Interface for our
customers – shows
trends, charts and
issues.
Portal Web Role
(3 instances)
Cassandra Cluster
Maintains all
state for metrics /
time series data.
Cassandra VM Cluster
(32 XL instances)
Windows Azure Virtual Machines
(IaaS)
Starting Select Image and VM Size New Disk Persisted in Storage
32 nodes, 8 “pods” of 4 nodes
Exposing the pods
• Each pod of 4 nodes
has a single load
balanced endpoint
• Clients (on our
stateless roles) treat
the endpoints as a pool
• A client blacklists and
skips an endpoint if it
starts producing a lot
of errors
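The client-side behaviour described above could look roughly like the following sketch. The class name, the `max_errors` threshold, and the reset-on-success rule are assumptions made for illustration; the slides do not say how the real client decides to blacklist an endpoint or to bring it back.

```python
import itertools


class EndpointPool:
    """Round-robin over pod endpoints, skipping ones that keep failing."""

    def __init__(self, endpoints, max_errors=3):
        self.endpoints = list(endpoints)
        self.max_errors = max_errors  # hypothetical blacklist threshold
        self.errors = {endpoint: 0 for endpoint in self.endpoints}
        self._cycle = itertools.cycle(self.endpoints)

    def next_endpoint(self):
        # Try each endpoint at most once per call before giving up.
        for _ in range(len(self.endpoints)):
            candidate = next(self._cycle)
            if self.errors[candidate] < self.max_errors:
                return candidate
        raise RuntimeError("every endpoint is blacklisted")

    def report_error(self, endpoint):
        self.errors[endpoint] += 1

    def report_success(self, endpoint):
        # Assumption: one success clears the error count.
        self.errors[endpoint] = 0


pool = EndpointPool(["pod-1", "pod-2", "pod-3"], max_errors=2)
pool.report_error("pod-2")
pool.report_error("pod-2")
picks = [pool.next_endpoint() for _ in range(4)]  # pod-2 is skipped
```

A production client would probably also re-admit a blacklisted endpoint after a cool-down period; that part is left out to keep the sketch short.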
Where does the data go?
• Data files are on 8 mounted network
backed disks (*not* ephemeral disks)
• Data disks are geo-replicated (3
copies local; 1 remote) for “free” DR
• Azure data disks offer great
throughput (VMs end up CPU bound)
Our Column Families (CQL 3)
CREATE TABLE oneminute (
rk text,
ck text,
cnt counter,
sum counter,
PRIMARY KEY (rk, ck)
);
Updating values…
Realtime “average” values at any granularity, for any time window
update
oneminute/tenminute/oneday
set
sum = sum + {sample_value},
cnt = cnt + 1
where
rk = '{customer_name}' and
ck = '{metric_path}'
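To make the "aggregate on write" idea concrete, here is an in-memory model of the update above, with no Cassandra driver involved. The three table names come from the slide; the bucket widths, the `metric/timestamp` layout of `ck`, and the function names are assumptions for illustration. Cassandra counter columns hold integers, so the sample values here are integers too.

```python
from collections import defaultdict

TABLES = ("oneminute", "tenminute", "oneday")
BUCKET_SECONDS = {"oneminute": 60, "tenminute": 600, "oneday": 86_400}

# table -> (rk, ck) -> {"sum": int, "cnt": int}, mirroring the two counter columns
tables = {name: defaultdict(lambda: {"sum": 0, "cnt": 0}) for name in TABLES}


def clustering_key(table: str, metric: str, epoch_seconds: int) -> str:
    """Hypothetical ck layout: metric path plus zero-padded bucket start."""
    bucket_start = epoch_seconds - epoch_seconds % BUCKET_SECONDS[table]
    return f"{metric}/{bucket_start:010d}"


def record(customer: str, metric: str, epoch_seconds: int, value: int) -> None:
    """Apply 'sum = sum + value, cnt = cnt + 1' to every granularity."""
    for table in TABLES:
        row = tables[table][(customer, clustering_key(table, metric, epoch_seconds))]
        row["sum"] += value
        row["cnt"] += 1


def average(table: str, customer: str, ck: str):
    """Average for one bucket, or None if no sample was recorded."""
    row = tables[table].get((customer, ck))
    return row["sum"] / row["cnt"] if row else None


record("contoso", "vm1/cpu", 120, 40)
record("contoso", "vm1/cpu", 150, 60)
record("contoso", "vm1/cpu", 200, 20)
print(average("oneminute", "contoso", "vm1/cpu/0000000120"))
print(average("tenminute", "contoso", "vm1/cpu/0000000000"))
```

Because each write only increments counters, no background job has to recompute averages later; a reader divides `sum` by `cnt` for whichever granularity it needs.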
Reading values…
*ONE* round trip to fetch a metric over time (e.g. CPU over past
week)
select * from oneminute
where
rk = '{customer_name}' and
ck < '{metric_path_start}'
and
ck >= '{metric_path_end}'
order by ck desc;
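The range read above works because the rows of one partition are kept sorted by `ck`. The sketch below imitates that with a sorted Python list; the zero-padded timestamp suffix in the keys is an assumption (it makes string order match time order), and the function names are hypothetical.

```python
from bisect import bisect_left


def slice_desc(rows, upper, lower):
    """Rows with lower <= ck < upper, newest first.

    `rows` is a list of (ck, sum, cnt) tuples sorted ascending by ck,
    standing in for one customer's partition.
    """
    keys = [ck for ck, _, _ in rows]
    start = bisect_left(keys, lower)
    stop = bisect_left(keys, upper)
    return rows[start:stop][::-1]


def averages(rows):
    """Turn (ck, sum, cnt) rows into (ck, average) pairs."""
    return [(ck, total / cnt) for ck, total, cnt in rows if cnt]


partition = [
    ("cpu/0000000000", 100, 2),
    ("cpu/0000000060", 30, 1),
    ("cpu/0000000120", 90, 3),
    ("cpu/0000000180", 10, 1),
]
window = slice_desc(partition, upper="cpu/0000000180", lower="cpu/0000000060")
print(averages(window))
```

As in the query, the upper bound is exclusive and the lower bound inclusive, and the whole window comes back from a single contiguous slice.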
What’s next?
• Windows Azure Virtual Networks to connect /
secure all of our resources
(PAAS + IAAS + Services)
• Expand Cassandra cluster across datacenter
boundaries for improved availability
• Integrate with more off-the-shelf Azure
components to reduce operational overhead
Global Physical Infrastructure
servers/network/datacenters
REST API + OTHER SERVICES
compute data management networking
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cassandra - The MetricsHub Story by Charles Lamanna and Ricardo Villalobos


Editor's notes

  1. All state is maintained in Cassandra or SQL.
  2. Examples: ping customer endpoint; pull load balancer stats; identify if a VM set is overloaded. Huge-scale and highly reliable framework (tens of thousands of jobs; no downtime). All jobs are isolated by task (e.g. ping URL) and customer. Communicates with Cassandra using FluentCassandra (.NET). Requests are round-robin balanced over 8 endpoints. Data stream is massive (100k writes / sec) and needs to be resilient.
  3. Integrates with other partner services (e.g. Windows Azure store). Used by MetricsHub client agents (on customer machines). Based on .NET (C#) WebAPIs. Persists all customer data (writes) to Cassandra only.
  4. .NET based using MVC + IIS. Heavy use of jQuery / JavaScript on the client side. 15+ OSS components are used in the portal. Bundled & shipped. 1-click deployment. Updated our production portal several times a day.
  5. FluentCassandra. All reads / writes for metric data go to this cluster; no need for a cache. 40+ VMs connect to this cluster.