SlideShare une entreprise Scribd logo
1  sur  31
Cassandra & Puppet: Scaling data at $15/month Constant Contact March  2011 Dave Connors – VP Operations Jim Ancona – Systems Architect Mark Schena – Manager Systems Automation
Constant Contact Constant Contact 2000 – 2010   Market leader for Small Businesses ,[object Object]
Over 400k paying customers
No. 134 on the Deloitte Technology Fast 500 listingBusiness model ,[object Object]
~2 million database transactions per minute,[object Object]
Constant Contact  Small Businesses are looking to us for help with Social Media marketing ,[object Object]
Challenge with our business model,[object Object]
Cost = Low
Time to market = ?,[object Object]
Monitoring
Authentication
Logging
Risk profile
Roles & Responsibilities,[object Object]
Apache Cassandra Apache Cassandra ,[object Object]
Open sourced in 2008
Incubated at Apache
Became an Apache top-level project in 2010
http://cassandra.apache.org
In use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, …
Largest production cluster has over 100 TB of data in over 150 machines,[object Object]
Fault Tolerant
Elastic
Durable
Rich data model
Replicated data
Consistency options,[object Object]
Consistency LevelONE Consistency Level One Y
Consistency Level Quorum X
Risks and Mitigation Risks and Mitigation ,[object Object]
Developer unfamiliarity

Contenu connexe

Tendances

Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
RX-M Enterprises LLC
 
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per DayRedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
Redis Labs
 

Tendances (20)

How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 
Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
 
Apache Cassandra - Einführung
Apache Cassandra - EinführungApache Cassandra - Einführung
Apache Cassandra - Einführung
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
Practical Design Patterns in Docker Networking
Practical Design Patterns in Docker NetworkingPractical Design Patterns in Docker Networking
Practical Design Patterns in Docker Networking
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Microservices with Kafka Ecosystem
Microservices with Kafka EcosystemMicroservices with Kafka Ecosystem
Microservices with Kafka Ecosystem
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Scaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ssScaling paypal workloads with oracle rac ss
Scaling paypal workloads with oracle rac ss
 
Apache CloudStack from API to UI
Apache CloudStack from API to UIApache CloudStack from API to UI
Apache CloudStack from API to UI
 
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per DayRedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu Kasinathan
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 

En vedette

Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...
Aerospike, Inc.
 
Kai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s DynamoKai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s Dynamo
Takeru INOUE
 

En vedette (20)

NYC* 2013 — "Using Cassandra for DVR Scheduling at Comcast"
NYC* 2013 — "Using Cassandra for DVR Scheduling at Comcast"NYC* 2013 — "Using Cassandra for DVR Scheduling at Comcast"
NYC* 2013 — "Using Cassandra for DVR Scheduling at Comcast"
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
C* Summit 2013: Time for a New Relationship - Intuit's Journey from RDBMS to ...
 
Seattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / CassandraSeattle Scalability - GigaSpaces / Cassandra
Seattle Scalability - GigaSpaces / Cassandra
 
Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013
 
Best Practices for couchDB developers on Microsoft Azure
Best Practices for couchDB developers on Microsoft AzureBest Practices for couchDB developers on Microsoft Azure
Best Practices for couchDB developers on Microsoft Azure
 
Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWS
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
 
Kai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s DynamoKai – An Open Source Implementation of Amazon’s Dynamo
Kai – An Open Source Implementation of Amazon’s Dynamo
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraManaging (Schema) Migrations in Cassandra
Managing (Schema) Migrations in Cassandra
 
SpringPeople Introduction to Agile and Scrum
SpringPeople Introduction to Agile and ScrumSpringPeople Introduction to Agile and Scrum
SpringPeople Introduction to Agile and Scrum
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Sparksee Technology overview
Sparksee Technology overviewSparksee Technology overview
Sparksee Technology overview
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
The Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageThe Gremlin Graph Traversal Language
The Gremlin Graph Traversal Language
 
Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph Computing
 
Cassandra as an event sourced journal for big data analytics Cassandra Summit...
Cassandra as an event sourced journal for big data analytics Cassandra Summit...Cassandra as an event sourced journal for big data analytics Cassandra Summit...
Cassandra as an event sourced journal for big data analytics Cassandra Summit...
 
Sparksee overview
Sparksee overviewSparksee overview
Sparksee overview
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal Machinery
 

Similaire à Cassandra & puppet, scaling data at $15 per month

Questions On The Code And Core Module
Questions On The Code And Core ModuleQuestions On The Code And Core Module
Questions On The Code And Core Module
Katie Gulley
 
Nhibernate Part 1
Nhibernate   Part 1Nhibernate   Part 1
Nhibernate Part 1
guest075fec
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentation
renjan131
 

Similaire à Cassandra & puppet, scaling data at $15 per month (20)

Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Questions On The Code And Core Module
Questions On The Code And Core ModuleQuestions On The Code And Core Module
Questions On The Code And Core Module
 
Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data management
 
Advanced Web Development
Advanced Web DevelopmentAdvanced Web Development
Advanced Web Development
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!
 
The "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/OpsThe "Holy Grail" of Dev/Ops
The "Holy Grail" of Dev/Ops
 
Nhibernate Part 1
Nhibernate   Part 1Nhibernate   Part 1
Nhibernate Part 1
 
No more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and AzureNo more Three Tier - A path to a better code for Cloud and Azure
No more Three Tier - A path to a better code for Cloud and Azure
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and Salesforce
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019Introduction to Puppet Enterprise - Jan 30, 2019
Introduction to Puppet Enterprise - Jan 30, 2019
 
Orchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache MahoutOrchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache Mahout
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
Map Reduce amrp presentation
Map Reduce amrp presentationMap Reduce amrp presentation
Map Reduce amrp presentation
 
Ben ford intro
Ben ford introBen ford intro
Ben ford intro
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Cassandra & puppet, scaling data at $15 per month

Notes de l'éditeur

  1. “… a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable'sColumnFamily-based data model.”
  2. Operational attributesFault TolerantData is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.DecentralizedEvery node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.ElasticRead and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.DurableCassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.Not just key-value. Replication and Consistency are configurable and datacenter aware
  3. Can also be configured cross-datacenter.
  4. Consistency Level is tunableONEQUORUMALLAt level ONE, one copy makes it to disk synchronously, before caller returns success.Same with reads. One node is read, so can get old data
  5. At QUORUM, two of three (QUORUM) written before returningQUORUM Read: Quorum must AGREEFirst two don’t, so wait for the third node to resolve the tie
  6. RISKS0.7.x in beta when we started, multiple betas and RCsRDBMS best practices are understood, if they exist for Cassandra how to discover them?Complicated system, lots of knobs, how to tune themMITIGATIONSWe deployed 0.7.2 across 72 servers in 90 minutes two days before we went live. 0.7.3 any day nowMailing lists, read code, file bugsA little about DatastaxStarted with a small app, higher risk ones come later, Mark will talk about monitoring—we have hundreds of graphs
  7. Data model: no joins, referential integrity or fixed schema. If you want those things, you have to code them.Rows with millions of columnsThrift: driver-level interface, doesn’t do things an application-level client should do, e.g. failover, retry
  8. Contributed bug reports and patches to Cassandra and HectorIncorporated in follow-on releasesNo need to maintain our own fork
  9. Mirror modeShort timeoutsLog errors DB2 is still database of record
  10. Necessary for success
  11. Authentication and Authorization; test basic functionaiity; rapidly deploy new changes