SlideShare une entreprise Scribd logo
1  sur  10
Télécharger pour lire hors ligne
Apache Cassandra:
Sharding and Consistency
€ whoami
● Federico Razzoli
● Freelance consultant
● Writing SQL since MySQL 2.23
hello@federico-razzoli.com
● I love open source, sharing,
Collaboration, win-win, etc
● I love MariaDB, MySQL, Postgres, etc
○ Even Db2, somehow
What Cassandra is for
● Get data by a Partitioning Key, ordered by a Clustering Key (primary key)
SELECT * FROM conf
WHERE year = 2019
ORDER BY attendees
● Simple aggregations
SELECT COUNT(*) FROM conf WHERE year = 2019
● No JOINs
● No filtering by other columns (by default)
● Different queries on same data? Multiple tables with different primary keys
● Data is eventually mostly consistent
CREATE TABLE
-- a database by any other name would contain as sweet tables
CREATE KEYSPACE stats
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
-- london dc has 2 copies of each row on different nodes,
-- paris dc has 3
'datacenter_london': 2,
'datacenter_paris': 3
};
CREATE TABLE stats.conf (
name TEXT,
year SMALLINT,
attendees SMALLINT,
-- rows with same year are in the same nodes,
-- ordered by attendees
PRIMARY KEY (year, attendees)
) WITH CLUSTERING ORDER BY (attendees DESC) ;
Cassandra Cluster
● Clients can read from / write to any
node
● The contacted node acts as a
coordinator
● The Coord. Computes a hash of the
Partitioning Key, which determines
which node(s) own the rows
● The coordinator sends / requests data
to the closest owner(s)
● Replication is asynchronous
Consitency levels
● Consistency vs Speed
● Read / Write
● Default consistency level is ONE:
○ You write to a node and don’t wait for the change to be replicated
○ You read from one node possibly stale data
● TWO, THREE, QUORUM
○ Writes and results must be validated by 2, 3 or the majority of nodes
● With multiple datacenters, any of these levels can be very expensive
Consitency levels: local and paranoid
● Faster:
○ LOCAL_ONE
○ LOCAL_QUORUM
○ If connection between datacentres breaks, data will be stale
● More paranoid:
○ EACH_QUORUM
○ ALL
○ Avoid inconsistencies in any dc
Cost of strictness
● Stricter consistency levels mean:
○ More communications between nodes, higher latency
○ Query may fail if nodes crash or connection loss
Interesting inconsistencies
● 2 UPDATEs on the same row/column?
○ The latest wins
○ Who decides which is latest? Sometimes, network/node latency :)
● Rows are DELETEd on a the node and another is crashed?
○ That node may not delete those rows
● You DELETE a row and immediately after I UPDATE it on the same node?
○ Row may not be DELETEd
One second left, no time to explain :D
Thank you kindly!
https://federico-razzoli.com
info@federico-razzoli.com

Contenu connexe

Tendances

Tendances (20)

ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Tips on High Performance Server Programming
Tips on High Performance Server ProgrammingTips on High Performance Server Programming
Tips on High Performance Server Programming
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
Exactly once with spark streaming
Exactly once with spark streamingExactly once with spark streaming
Exactly once with spark streaming
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
 

Similaire à Cassandra sharding and consistency (lightning talk)

Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
Sean Murphy
 

Similaire à Cassandra sharding and consistency (lightning talk) (20)

Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra1.2
Cassandra1.2Cassandra1.2
Cassandra1.2
 
MariaDB MaxScale: an Intelligent Database Proxy
MariaDB MaxScale: an Intelligent Database ProxyMariaDB MaxScale: an Intelligent Database Proxy
MariaDB MaxScale: an Intelligent Database Proxy
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
M|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database InfrastructureM|18 Why Abstract Away the Underlying Database Infrastructure
M|18 Why Abstract Away the Underlying Database Infrastructure
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache Cassandra
 
MariaDB MaxScale: an Intelligent Database Proxy
MariaDB MaxScale:  an Intelligent Database ProxyMariaDB MaxScale:  an Intelligent Database Proxy
MariaDB MaxScale: an Intelligent Database Proxy
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
 
Cassandra
CassandraCassandra
Cassandra
 

Plus de Federico Razzoli

Plus de Federico Razzoli (20)

Webinar - Unleash AI power with MySQL and MindsDB
Webinar - Unleash AI power with MySQL and MindsDBWebinar - Unleash AI power with MySQL and MindsDB
Webinar - Unleash AI power with MySQL and MindsDB
 
MariaDB Security Best Practices
MariaDB Security Best PracticesMariaDB Security Best Practices
MariaDB Security Best Practices
 
A first look at MariaDB 11.x features and ideas on how to use them
A first look at MariaDB 11.x features and ideas on how to use themA first look at MariaDB 11.x features and ideas on how to use them
A first look at MariaDB 11.x features and ideas on how to use them
 
MariaDB stored procedures and why they should be improved
MariaDB stored procedures and why they should be improvedMariaDB stored procedures and why they should be improved
MariaDB stored procedures and why they should be improved
 
Webinar - MariaDB Temporal Tables: a demonstration
Webinar - MariaDB Temporal Tables: a demonstrationWebinar - MariaDB Temporal Tables: a demonstration
Webinar - MariaDB Temporal Tables: a demonstration
 
Webinar - Key Reasons to Upgrade to MySQL 8.0 or MariaDB 10.11
Webinar - Key Reasons to Upgrade to MySQL 8.0 or MariaDB 10.11Webinar - Key Reasons to Upgrade to MySQL 8.0 or MariaDB 10.11
Webinar - Key Reasons to Upgrade to MySQL 8.0 or MariaDB 10.11
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Recent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy lifeRecent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy life
 
Advanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdfAdvanced MariaDB features that developers love.pdf
Advanced MariaDB features that developers love.pdf
 
Automate MariaDB Galera clusters deployments with Ansible
Automate MariaDB Galera clusters deployments with AnsibleAutomate MariaDB Galera clusters deployments with Ansible
Automate MariaDB Galera clusters deployments with Ansible
 
Creating Vagrant development machines with MariaDB
Creating Vagrant development machines with MariaDBCreating Vagrant development machines with MariaDB
Creating Vagrant development machines with MariaDB
 
MariaDB, MySQL and Ansible: automating database infrastructures
MariaDB, MySQL and Ansible: automating database infrastructuresMariaDB, MySQL and Ansible: automating database infrastructures
MariaDB, MySQL and Ansible: automating database infrastructures
 
Playing with the CONNECT storage engine
Playing with the CONNECT storage enginePlaying with the CONNECT storage engine
Playing with the CONNECT storage engine
 
MariaDB Temporal Tables
MariaDB Temporal TablesMariaDB Temporal Tables
MariaDB Temporal Tables
 
Database Design most common pitfalls
Database Design most common pitfallsDatabase Design most common pitfalls
Database Design most common pitfalls
 
MySQL and MariaDB Backups
MySQL and MariaDB BackupsMySQL and MariaDB Backups
MySQL and MariaDB Backups
 
JSON in MySQL and MariaDB Databases
JSON in MySQL and MariaDB DatabasesJSON in MySQL and MariaDB Databases
JSON in MySQL and MariaDB Databases
 
How MySQL can boost (or kill) your application v2
How MySQL can boost (or kill) your application v2How MySQL can boost (or kill) your application v2
How MySQL can boost (or kill) your application v2
 
MySQL Transaction Isolation Levels (lightning talk)
MySQL Transaction Isolation Levels (lightning talk)MySQL Transaction Isolation Levels (lightning talk)
MySQL Transaction Isolation Levels (lightning talk)
 
MariaDB Temporal Tables
MariaDB Temporal TablesMariaDB Temporal Tables
MariaDB Temporal Tables
 

Dernier

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Dernier (20)

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

Cassandra sharding and consistency (lightning talk)

  • 2. € whoami ● Federico Razzoli ● Freelance consultant ● Writing SQL since MySQL 2.23 hello@federico-razzoli.com ● I love open source, sharing, Collaboration, win-win, etc ● I love MariaDB, MySQL, Postgres, etc ○ Even Db2, somehow
  • 3. What Cassandra is for ● Get data by a Partitioning Key, ordered by a Clustering Key (primary key) SELECT * FROM conf WHERE year = 2019 ORDER BY attendees ● Simple aggregations SELECT COUNT(*) FROM conf WHERE year = 2019 ● No JOINs ● No filtering by other columns (by default) ● Different queries on same data? Multiple tables with different primary keys ● Data is eventually mostly consistent
  • 4. CREATE TABLE -- a database by any other name would contain as sweet tables CREATE KEYSPACE stats WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', -- london dc has 2 copies of each row on different nodes, -- paris dc has 3 'datacenter_london': 2, 'datacenter_paris': 3 }; CREATE TABLE stats.conf ( name TEXT, year SMALLINT, attendees SMALLINT, -- rows with same year are in the same nodes, -- ordered by attendees PRIMARY KEY (year, attendees) ) WITH CLUSTERING ORDER BY (attendees DESC) ;
  • 5. Cassandra Cluster ● Clients can read from / write to any node ● The contacted node acts as a coordinator ● The Coord. Computes a hash of the Partitioning Key, which determines which node(s) own the rows ● The coordinator sends / requests data to the closest owner(s) ● Replication is asynchronous
  • 6. Consitency levels ● Consistency vs Speed ● Read / Write ● Default consistency level is ONE: ○ You write to a node and don’t wait for the change to be replicated ○ You read from one node possibly stale data ● TWO, THREE, QUORUM ○ Writes and results must be validated by 2, 3 or the majority of nodes ● With multiple datacenters, any of these levels can be very expensive
  • 7. Consitency levels: local and paranoid ● Faster: ○ LOCAL_ONE ○ LOCAL_QUORUM ○ If connection between datacentres breaks, data will be stale ● More paranoid: ○ EACH_QUORUM ○ ALL ○ Avoid inconsistencies in any dc
  • 8. Cost of strictness ● Stricter consistency levels mean: ○ More communications between nodes, higher latency ○ Query may fail if nodes crash or connection loss
  • 9. Interesting inconsistencies ● 2 UPDATEs on the same row/column? ○ The latest wins ○ Who decides which is latest? Sometimes, network/node latency :) ● Rows are DELETEd on a the node and another is crashed? ○ That node may not delete those rows ● You DELETE a row and immediately after I UPDATE it on the same node? ○ Row may not be DELETEd One second left, no time to explain :D