Cassandra 101
Introduction to Apache Cassandra
What is Cassandra?
● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open Sourced by Facebook in 2008
● Monolithic Kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace,
CloudKick and others
Etymology
● In Greek mythology Cassandra (Also known as Alexandra) was
the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her
so that no one would ever believe her predictions
Why Cassandra ?
● Minimal Administration
● No Single Point of Failure
● Scale Horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be updated
online
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, Rack and DC aware
Commercial Support
Data Model
A Column is the basic unit, consisting of a Key, a Value and a Timestamp
RDBMS vs Cassandra
Map<RowKey, SortedMap<ColumnKey,
ColumnValue>>
Cassandra is good at
Reading data from a row in
the order it is stored, i.e. by
Column Name!
Understand the queries your
application requires before
building the data model
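A minimal sketch of the Map<RowKey, SortedMap<ColumnKey, ColumnValue>> idea, showing why reading a range of columns in name order is cheap. The Row class, the slice method and the 'grain' column are hypothetical names used only for illustration; this is not Cassandra's storage code.

```python
from bisect import bisect_left, bisect_right

class Row:
    """One row: column names kept sorted, each mapped to (value, timestamp)."""

    def __init__(self):
        self._names = []   # sorted column names
        self._cells = {}   # column name -> (value, timestamp)

    def put(self, name, value, timestamp):
        if name not in self._cells:
            # keep the list of names sorted on insert
            self._names.insert(bisect_left(self._names, name), name)
        self._cells[name] = (value, timestamp)

    def slice(self, start, end):
        """Columns with start <= name <= end, in stored (name) order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._cells[n][0]) for n in self._names[lo:hi]]

# Outer map: row key -> Row
table = {}
row = table.setdefault("foo", Row())
row.put("vegetable", "cucumber", 10)
row.put("fruit", "apple", 10)
row.put("grain", "rice", 10)   # hypothetical extra column

columns = row.slice("a", "h")  # every column whose name sorts between "a" and "h"
```

A query that does not follow the stored column order would have to scan the whole row, which is why the queries should drive the model.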
Consistent Hashing
Load Balancing in a changing world ...
● Evenly map keys to nodes
● Minimize key movement when
nodes join or leave
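A rough sketch of the two goals above, assuming one token per node and MD5-derived tokens. The node names, the 2**32 ring size and the helper names are assumptions for illustration only. When a fifth node joins, the only keys that change owner are the ones the new node takes over.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # assumed ring size for this sketch

def token(name):
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def ring(nodes):
    """Sorted list of (token, node); here a node's token is a hash of its name."""
    return sorted((token(n), n) for n in nodes)

def owner(key, node_ring):
    """First node whose token is >= the key's token, wrapping around the ring."""
    tokens = [t for t, _ in node_ring]
    i = bisect_left(tokens, token(key)) % len(node_ring)
    return node_ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
before = ring(["node1", "node2", "node3", "node4"])
after = ring(["node1", "node2", "node3", "node4", "node5"])

# Keys whose owner changed when node5 joined
moved = [k for k in keys if owner(k, before) != owner(k, after)]
```

Every key in moved ends up on node5; no key moves between the four existing nodes.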
The Partitioner:
● RandomPartitioner transforms
Keys to Tokens using MD5
● Since C* 1.2 the default partitioner is
Murmur3Partitioner (MurmurHash3)
Keys and Tokens?
[Figure: a token line from 0 to 99 with the keys ‘fop’ and ‘foo’ marked at tokens 10 and 90]
MD5 hashing for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b
MD5 hashing for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
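A small sketch of turning a key into a token with MD5. The toy_token helper and the modulo-100 mapping are assumptions made to fit the 0 to 99 ring used on these slides; this is not RandomPartitioner's exact arithmetic, and it will not reproduce the illustrative tokens 10 and 90. Note that the digest quoted above for ‘foo’ matches the key followed by a newline, which is what `echo foo | md5sum` hashes.

```python
import hashlib

def md5_hex(key: bytes) -> str:
    """Hex MD5 digest of the raw key bytes."""
    return hashlib.md5(key).hexdigest()

def toy_token(key: bytes, ring_size: int = 100) -> int:
    """Map the 128-bit digest onto a small ring (assumed mapping)."""
    return int(md5_hex(key), 16) % ring_size

with_newline = md5_hex(b"foo\n")   # what `echo foo | md5sum` computes
without_newline = md5_hex(b"foo")  # digest of the three key bytes only
foo_token = toy_token(b"foo")
```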
Token Ring.
[Figure: ring of tokens running from 0 to 99]
‘fop’ token: 10
‘foo’ token: 90
Token Ranges (Pre 1.2)
[Figure: four-node token ring]
Node 1, token: 0, range 76-0
Node 2, token: 25, range 1-25
Node 3, token: 50, range 26-50
Node 4, token: 75, range 51-75
‘foo’, token 90, falls in range 76-0
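The lookup on this ring can be sketched as a binary search over the sorted node tokens. A node owns the range that ends at its own token, and tokens past the last node wrap around to the first. RING and primary_owner are illustrative names, not Cassandra identifiers.

```python
from bisect import bisect_left

# (token, node) pairs from the slide, sorted by token
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]

def primary_owner(token, ring=RING):
    """Node whose range (previous token, own token] contains the given token."""
    tokens = [t for t, _ in ring]
    # first node token >= token; past the end wraps to index 0
    i = bisect_left(tokens, token) % len(ring)
    return ring[i][1]

foo_owner = primary_owner(90)  # range 76-0
fop_owner = primary_owner(10)  # range 1-25
```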
Token Ranges With Virtual Nodes in 1.2
[Figure: token ring in which Node 1, Node 2 and Node 3 each own many small ranges]
● Easier to enlarge or
shrink the cluster
● The cluster can grow in
steps of 1 node
● Node recovery is much
faster
Replication Strategy
[Figure: four-node token ring. Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75); ‘foo’ at token 90]
Selects which nodes store a row; the Replication Factor sets how many.
Replication Strategy
[Figure: four-node token ring. Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75); ‘foo’ at token 90]
SimpleStrategy with RF 3
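A sketch of how SimpleStrategy picks replicas on the four-node ring: start at the node that owns the token, then keep walking clockwise until RF nodes are collected. The function name is hypothetical.

```python
from bisect import bisect_left

RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]

def simple_strategy_replicas(token, rf, ring=RING):
    """Owner of the token plus the next rf-1 nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than nodes
    return [ring[(start + step) % len(ring)][1] for step in range(count)]

foo_replicas = simple_strategy_replicas(90, rf=3)
```

For ‘foo’ (token 90) with RF 3 this yields Node 1, Node 2 and Node 3.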
Replication Strategy
NetworkTopologyStrategy uses a Replication Factor
per Data Center
[Figure: two four-node token rings, labelled EAST and WEST, each with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75) and ‘foo’ at token 90]
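A simplified sketch of per-data-center placement: walk the ring clockwise from the token and take nodes from each data center until that data center's replication factor is met. Rack awareness is deliberately left out, and the node names and the WEST tokens (12, 37, 62, 87) are assumptions chosen so that every node has a distinct token.

```python
from bisect import bisect_left

# (token, node, data center); WEST tokens are hypothetical
RING = sorted([
    (0, "east-1", "EAST"), (25, "east-2", "EAST"),
    (50, "east-3", "EAST"), (75, "east-4", "EAST"),
    (12, "west-1", "WEST"), (37, "west-2", "WEST"),
    (62, "west-3", "WEST"), (87, "west-4", "WEST"),
])

def nts_replicas(token, rf_per_dc, ring=RING):
    """Pick rf_per_dc[dc] nodes per data center, walking clockwise from the token."""
    tokens = [t for t, _, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen

foo_replicas = nts_replicas(90, {"EAST": 2, "WEST": 1})
```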
SimpleSnitch
Places all nodes in the same DC & RACK
(Default)
Ec2Snitch/Ec2MultiRegionSnitch
The DC is set to the AWS Region and the Rack to the
Availability Zone
PropertyFileSnitch
Each node's DC and Rack are maintained in a
property file
GossipingPropertyFileSnitch
Uses gossip as the first source for node info and,
if it is not available, falls back to the property file
The Client and the Coordinator
[Figure: a client and a four-node ring (Node 1, Node 2, Node 3, Node 4); ‘foo’ at token 90]
Multi DC Client and Coordinator
[Figure: a client and a four-node ring (Node 1, Node 2, Node 3, Node 4) plus Node 10 and Node 20; ‘foo’ at token 90]
Gossip
Nodes share information with a
small number of neighbours,
who share it with a small
number of other
neighbours …
● Used for intra-cluster
communication
● Routes client requests
● Detects node failure
● The initial peers (seeds) are listed in the
config file.
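A toy simulation of the "tell a few neighbours, who tell a few more" idea. It only models how fast one piece of information spreads; it is not Cassandra's gossip protocol, and gossip_rounds, fanout and the fixed random seed are assumptions for illustration.

```python
import random

def gossip_rounds(node_count, fanout=2, seed=0, max_rounds=1000):
    """Count rounds until every node has heard a rumour that starts at node 0."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    informed = {0}
    rounds = 0
    while len(informed) < node_count and rounds < max_rounds:
        rounds += 1
        # every node that already knows tells `fanout` randomly chosen peers
        for _ in list(informed):
            informed.update(rng.sample(range(node_count), fanout))
    return rounds, informed

rounds, informed = gossip_rounds(20)
```

Because the number of informed nodes can roughly double each round, the rumour reaches the whole cluster in far fewer rounds than there are nodes.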
Cassandra Objects
● CommitLog
● MemTable
● SSTable
● Index
● Bloom Filter
Consistency
● CAP theorem
○ Trade consistency for availability
○ Consistency is a choice
* It doesn't matter if you are good at something, as long as you are consistent.
[Figure: CAP trade-off. Under a Partition, choose Consistency OR Availability]
Consistency Level
● Specified for each request
● Number of nodes to wait for

WRITE
Level              Description
ZERO               Cross fingers
ANY                1st to respond (HH: a hinted handoff counts)
ONE, TWO, THREE    nth replica to respond
QUORUM             N/2+1 replicas
ALL                All replicas

READ
Level              Description
ZERO               N/A
ANY                N/A
ONE, TWO, THREE    nth replica to respond
QUORUM*            N/2+1 replicas
ALL                All replicas

N is the replication factor; N/2 is integer division.
* QUORUM, LOCAL_QUORUM, EACH_QUORUM
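The "number of nodes to wait for" can be sketched as a small function of the level and the replication factor. The function names are hypothetical, and only the levels with a simple replica count are covered.

```python
def quorum(replication_factor):
    """N/2+1 with integer division."""
    return replication_factor // 2 + 1

def required_responses(level, replication_factor):
    """How many replicas must respond before the request succeeds."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")

quorum_for_rf3 = required_responses("QUORUM", 3)
```

With RF 3 a quorum is 2 replicas, so one replica can be down without failing QUORUM requests.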
Write ‘foo’ at Quorum with Hinted Handoff
[Figure: four-node ring with a client writing ‘foo’ (token 90). Node 3 is down; Node 4 holds ‘foo’ for Node 3]
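A sketch of the write on this slide, assuming RF 3 with replicas Node 1, Node 2 and Node 3. Live replicas store the value and count towards the quorum; for a replica that is down, a hint is kept and replayed when it comes back. The function names, the stores and hints dictionaries and the value 'bar' are all hypothetical.

```python
def write_at_quorum(key, value, replicas, alive, stores, hints):
    """Return True if a quorum of replicas stored the write."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for node in replicas:
        if node in alive:
            stores.setdefault(node, {})[key] = value
            acks += 1
        else:
            # keep a hint for the down replica; it does not count as an ack here
            hints.setdefault(node, []).append((key, value))
    return acks >= needed

def replay_hints(node, stores, hints):
    """Deliver stored hints once the node is reachable again."""
    for key, value in hints.pop(node, []):
        stores.setdefault(node, {})[key] = value

stores, hints = {}, {}
ok = write_at_quorum(
    "foo", "bar",
    replicas=["Node 1", "Node 2", "Node 3"],
    alive={"Node 1", "Node 2", "Node 4"},  # Node 3 is down
    stores=stores, hints=hints,
)
pending = dict(hints)                  # snapshot before Node 3 returns
replay_hints("Node 3", stores, hints)  # Node 3 is back
```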
Read ‘foo’ at Quorum
[Figure: four-node ring with a client reading ‘foo’ (token 90). Node 3 is down; Node 4 holds ‘foo’ for Node 3]
Column Timestamps
Are used to resolve differences
● Stored for each Column Value
● 64-bit integers

Column      Node 1                      Node 2                      Node 3
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)   <missing>
Fruit       ‘Apple’ (timestamp 10)      ‘banana’ (timestamp 15)     ‘Apple’ (timestamp 10)
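Resolving the differences in the table above can be sketched as "highest timestamp wins". The reconcile function is a hypothetical name, a missing value is represented as None, and tie-breaking between equal timestamps is not modelled.

```python
def reconcile(replica_cells):
    """replica_cells: one (value, timestamp) per replica, or None if missing."""
    present = [cell for cell in replica_cells if cell is not None]
    if not present:
        return None
    return max(present, key=lambda cell: cell[1])  # newest timestamp wins

vegetable = reconcile([("cucumber", 10), ("cucumber", 10), None])
fruit = reconcile([("Apple", 10), ("banana", 15), ("Apple", 10)])
```

Here the resolved Fruit value is ‘banana’, because timestamp 15 is newer than 10.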
Strong Consistency
W + R > N
#Write Nodes + #Read Nodes > Replication Factor
● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
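The rule W + R > N can be checked mechanically: if the write set and the read set together are larger than the replication factor, they must share at least one replica. A minimal sketch with hypothetical helper names:

```python
def responses(level, rf):
    """Replicas that must respond for a given level (subset of levels only)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def is_strongly_consistent(write_level, read_level, rf):
    """True when W + R > N, so every read overlaps the latest write."""
    return responses(write_level, rf) + responses(read_level, rf) > rf

RF = 3
combos = [("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ALL", "ONE"), ("ONE", "ONE")]
results = {combo: is_strongly_consistent(*combo, RF) for combo in combos}
```

With RF 3, the three combinations listed above pass the check, while ONE write + ONE read does not (1 + 1 is not greater than 3).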
Achieving Consistency
● Consistency Level
● Hinted Handoff
● Read Repair
● Anti Entropy (User triggered Repairs)
Write Path
● Append to Commit Log File
● Merge Columns into Memtable
● Asynchronously flush the Memtable to a
new file (never update existing files)
● Data is stored in immutable files called
SSTables (Sorted String Tables)
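The steps above can be sketched with a toy store. ToyStore, its JSON file format, its file names and the flush threshold are assumptions for illustration; the flush here is synchronous for simplicity, whereas the real flush is asynchronous.

```python
import json
import os
import tempfile

class ToyStore:
    def __init__(self, directory, flush_threshold=2):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.memtable = {}  # column name -> (value, timestamp)
        self.sstable_count = 0
        self.commit_log = os.path.join(directory, "commitlog.txt")

    def write(self, name, value, timestamp):
        # 1. append to the commit log first, for durability
        with open(self.commit_log, "a") as log:
            log.write(json.dumps([name, value, timestamp]) + "\n")
        # 2. merge the column into the memtable, newest timestamp wins
        current = self.memtable.get(name)
        if current is None or timestamp >= current[1]:
            self.memtable[name] = (value, timestamp)
        # 3. flush once the memtable is "full"
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.sstable_count += 1
        path = os.path.join(self.directory, f"toy-{self.sstable_count}-Data.json")
        # mode "x" refuses to overwrite: existing files are never updated
        with open(path, "x") as out:
            json.dump(sorted(self.memtable.items()), out)
        self.memtable = {}
        return path

with tempfile.TemporaryDirectory() as directory:
    store = ToyStore(directory)
    store.write("fruit", "apple", 10)
    store.write("vegetable", "cucumber", 15)  # reaches the threshold, triggers a flush
    files = sorted(os.listdir(directory))
```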
SSTables Files
*-Data.db
*-Index.db
*-Filter.db
(And others)
Read Path
[Figure: read path across two SSTables]
Memory (per SSTable): Bloom Filter (cache), Index/Key Cache
Disk:
SSTable-1-Data.db
  foo: fruit (ts:10) apple, vegetable (ts:15) cucumber, …
SSTable-1-Index.db
SSTable-2-Data.db
  foo: fruit (ts:10) apple, vegetable (ts:10) Pepper, …
SSTable-2-Index.db
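A sketch of the read path for row ‘foo’ across the two SSTables shown: a Bloom filter says "definitely not here" or "maybe here", and the columns from every SSTable that may hold the row are merged by timestamp. The class names, the filter size and the MD5-based bit positions are assumptions; the memtable and the index/key cache are left out.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive
        return all((self.bits >> p) & 1 for p in self._positions(key))

class SSTable:
    def __init__(self, rows):
        self.rows = rows  # row key -> {column: (value, timestamp)}
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)

def read_row(key, sstables):
    merged = {}
    for table in sstables:
        if not table.bloom.might_contain(key):
            continue  # skip the disk read for this SSTable
        for column, cell in table.rows.get(key, {}).items():
            if column not in merged or cell[1] > merged[column][1]:
                merged[column] = cell  # newer timestamp wins
    return merged

sstable_1 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("cucumber", 15)}})
sstable_2 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("Pepper", 10)}})
foo = read_row("foo", [sstable_1, sstable_2])
```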
Compactions
Compaction merges the truth from multiple
SSTables into one SSTable with the same
truth
(Manual and continuous background process)

Column      SSTable 1                   SSTable 2                    New
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)    ‘cucumber’ (timestamp 10)
Fruit       ‘Apple’ (timestamp 10)      <tombstone> (timestamp 15)   <tombstone> (timestamp 15)
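The merge in the table above can be sketched as keeping, for each column, the cell with the newest timestamp, where a delete is just another cell (a tombstone). The compact function and the TOMBSTONE marker are hypothetical names; when a tombstone may finally be dropped is not modelled.

```python
TOMBSTONE = "<tombstone>"  # marker for a deleted column

def compact(*sstables):
    """Merge SSTables (column -> (value, timestamp)) into one, newest cell wins."""
    merged = {}
    for table in sstables:
        for column, (value, timestamp) in table.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return dict(sorted(merged.items()))  # keep columns in name order

sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}
new_sstable = compact(sstable_1, sstable_2)
```

The tombstone at timestamp 15 shadows ‘Apple’ at timestamp 10, so the new SSTable carries the delete forward.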
Writes and Reads
Managing Cassandra
● Single configuration file
/etc/cassandra/cassandra.yaml
● Single control command
/usr/bin/nodetool
● Monitoring done by DataStax OpsCenter
Troubleshooting Cassandra
Always inspect these files:
● /var/log/cassandra/cassandra.log (Startup)
● /var/log/cassandra/system.log (Normal work)
Backup
Use Cassandra snapshots...
And God said to Noah, Noah make me a backup ... 'cause I shall format
Client (API) Choices
● Thrift, the original and still fully supported API:
○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
○ Python: Pycassa, Telephus, …
○ Ruby: Fauna
○ PHP: PHP Client Library
○ C#
○ Node.js
○ Go
○ Simba ODBC
○ C++: LibQtCassandra
○ ORM
○ ….
● CQL3: a table-oriented, schema-driven data model with a syntax similar to SQL
CQL3 Create KeySpace
● Using CQL3 via the cqlsh command-line tool ($CASSANDRA_HOME/bin/cqlsh):
● Create a new Keyspace with a Replication Factor of 3 using NetworkTopologyStrategy
CREATE KEYSPACE
kenshoo_cass_fans
WITH replication =
{'class':'NetworkTopologyStrategy',
'us_east_dc':3};
CQL3 Working with Tables
● CQL3 Example
● A Table is a sparse collection of well-known, ordered columns
CREATE TABLE User
(
user_name text,
password text,
real_name text,
PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User
(user_name, password, real_name)
VALUES
('nader','sekr8t','MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';
 user_name | password | real_name
-----------+----------+-----------
 nader     | sekr8t   | MR NADER

More Related Content

What's hot

Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
Scaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane PaekScaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane PaekRedis Labs
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandraAaron Ploetz
 
Percona Live 2022 - MySQL Architectures
Percona Live 2022 - MySQL ArchitecturesPercona Live 2022 - MySQL Architectures
Percona Live 2022 - MySQL ArchitecturesFrederic Descamps
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres MonitoringDenish Patel
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query languageCourtney Robinson
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overviewPritamKathar
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayDataStax Academy
 
A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflixVinay Kumar Chella
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basicsnickmbailey
 

What's hot (20)

Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
Scaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane PaekScaling Redis To 1M Ops/Sec: Jane Paek
Scaling Redis To 1M Ops/Sec: Jane Paek
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Cassandra
CassandraCassandra
Cassandra
 
Percona Live 2022 - MySQL Architectures
Percona Live 2022 - MySQL ArchitecturesPercona Live 2022 - MySQL Architectures
Percona Live 2022 - MySQL Architectures
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query language
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
A glimpse of cassandra 4.0 features netflix
A glimpse of cassandra 4.0 features   netflixA glimpse of cassandra 4.0 features   netflix
A glimpse of cassandra 4.0 features netflix
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 

Similar to Cassandra 101

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraRobbie Strickland
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2aaronmorton
 
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2DataStax
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandraaaronmorton
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsJulien Anguenot
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013Randall Hunt
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010aaronmorton
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache CassandraAlex Thompson
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandrashimi_k
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupAdam Hutson
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for SysadminsNathan Milford
 

Similar to Cassandra 101 (20)

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
 
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User GroupCassandra & Python - Springfield MO User Group
Cassandra & Python - Springfield MO User Group
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 

Recently uploaded

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 

Recently uploaded (20)

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

Cassandra 101

  • 1. Cassandra 101 Introduction to Apache Cassandra
  • 2. What is Cassandra? ● A distributed, columnar database ● Data model inspired by Google BigTable (2006) ● Distribution model inspired by Amazon Dynamo (2007) ● Open Sourced by Facebook in 2008 ● Monolithic Kernel written in Java ● Used by Digg, Facebook, Twitter, Reddit, Rackspace, CloudKick and others
  • 3. Etymology ● In Greek mythology Cassandra (Also known as Alexandra) was the daughter of King Priam and Queen Hecuba of Troy ● Her beauty caused Apollo to grant her the gift of prophecy ● When she did not return his love, Apollo placed a curse on her so that no one would ever believe her predictions
  • 4. Why Cassandra ? ● Minimal Administration ● No Single Point of Failure ● Scale Horizontally ● Writes are durable ● Optimized for writes ● Consistency is flexible, can be updated online ● Schema is flexible, can be updated online ● Handles failure gracefully ● Replication is easy, Rack and DC aware
  • 6. Data Model A Column is the basic unit consisting Key, Value and Timestamp
  • 7. Data Model A Column is the basic unit consisting Key, Value and Timestamp
  • 8. RDBMS vs Cassandra Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
  • 9. Cassandra is good at Reading data from a row in the order it is stored, i.e. by Column Name! Understand the queries you application requires before building the data model
  • 10. Consistent Hashing Load Balancing in a changing world ... ● Evenly map keys to nodes ● Minimize key movement when nodes join or leave
  • 11. The Partitioner: ● RandomPartitioner transforms Keys to Tokens using MD5 ● In C* 1.2 the default hashing is Murmur3 algorithm
  • 12. Keys and Tokens? 0 999010 ‘fop’ ‘foo’ MD5 hashing for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b MD5 hashing for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
  • 13. Token Ring. 99 0 ‘fop’ token: 10‘foo’ token: 90
  • 14. Token Ranges (Pre 1.2) Node 1 token:0 76-0 1-25 26-5051-75 Node 2 token:25 Node 3 token:50 Node 4 token:75 ‘foo’ token 90
  • 15. Token Ranges With Virtual Nodes in 1.2 Node 1 Node 2 Node 3 ● Easier to Enlarge or shrink the cluster ● The cluster can grow in steps of 1 node ● Node Recovery is much more faster
  • 16. Replication Strategy Node 1 token:0 76-0 1-25 26-5051-75 Node 2 token:25 Node 3 token:50 Node 4 token:75 ‘foo’ token 90 Selects Replication Factor number of nodes for a row.
  • 17. Replication Strategy Node 1 token:0 76-0 1-25 26-5051-75 Node 2 token:25 Node 3 token:50 Node 4 token:75 ‘foo’ token 90 SimpleStrategy with RF 3
  • 18. Replication Strategy Node 1 token:0 76-0 1-25 26-5051-75 Node 2 token:25 Node 3 token:50 Node 4 token:75 ‘foo’ token 90 NetworkTopolgyStrategy Uses Replication Factor per Data Center Node 1 token:0 76-0 1-25 26-5051-75 Node 2 token:25 Node 3 token:50 Node 4 token:75 ‘foo’ token 90 EAST WEST
  • 19. SimpleSnitch Places all nodes in the same DC & RACK (Default)
  • 20. Ec2Snitch/Ec2MultiRegionSnitch DC is set to the AWS Region and Rack to the Availability Zone
  • 21. PropertyFileSnitch Each node's DC and Rack are maintained in a property file
  • 22. GossipingPropertyFileSnitch Uses gossip as the first source for node info and, if that is not available, falls back to the property file
  • 23. The Client and the Coordinator Node 1 Node 3 Node 4 Node 2 ‘foo’ token 90 Client
  • 24. Multi DC Client and Coordinator Node 1 Node 3 Node 4 Node 2 ‘foo’ token 90 Client Node 10 Node 20
  • 25. Gossip Nodes share information with a small number of neighbours, who share information with another small number of neighbours … ● Used for intra-cluster communication ● Routes client requests ● Detects node failure ● Initial peers are listed as seeds in the config file.
  • 26. Cassandra Objects ● CommitLog ● MemTable ● SSTable ● Index ● Bloom Filter
  • 27. Consistency ● CAP theorem ○ Trade consistency for availability ○ Consistency is a choice ● Under a Partition: Consistency OR Availability * It doesn't matter if you are good at something, as long as you are consistent.
  • 28. Consistency Level ● Specified for each request ● Number of nodes to wait for
    WRITE
    Level           | Description
    ----------------+---------------------
    ZERO            | Cross fingers
    ANY             | 1st to Respond (HH)
    ONE, TWO, THREE | 1st to Respond
    QUORUM          | N/2+1 replicas
    ALL             | All replicas
    READ
    Level           | Description
    ----------------+---------------------
    ZERO            | N/A
    ANY             | N/A
    ONE, TWO, THREE | nth to Respond
    QUORUM*         | N/2+1
    ALL             | All replicas
    * QUORUM, LOCAL_QUORUM, EACH_QUORUM
  • 29. Write ‘foo’ at Quorum with Hinted Handoff Node 1 Node 3 is Down Node 4 holds ‘foo’ for node 3 Node 2 ‘foo’ token 90 Client
  • 30. Read ‘foo’ at Quorum Node 1 Node 3 is Down Node 4 holds ‘foo’ for node 3 Node 2 ‘foo’ token 90 Client
  • 31. Column Timestamps ● Are used to resolve differences ● Stored for each Column Value ● 64-bit Integers
    Column    | Node 1                    | Node 2                    | Node 3
    ----------+---------------------------+---------------------------+------------------------
    Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | <missing>
    Fruit     | ‘Apple’ (timestamp 10)    | ‘banana’ (timestamp 15)   | ‘Apple’ (timestamp 10)
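Resolving the replica differences in this table by timestamp can be sketched as below. The `resolve` helper is a hypothetical name, a missing replica is modelled as `None`, and the sketch does not model how ties between equal timestamps are broken.

```python
def resolve(replica_values):
    """Return the (value, timestamp) pair with the highest timestamp.

    replica_values holds one entry per replica; None means the replica
    has no value for the column. Returns None if every replica is missing it.
    """
    present = [pair for pair in replica_values if pair is not None]
    if not present:
        return None
    return max(present, key=lambda pair: pair[1])


# Replica states taken from the slide: Node 1, Node 2, Node 3
vegetable = [("cucumber", 10), ("cucumber", 10), None]
fruit = [("Apple", 10), ("banana", 15), ("Apple", 10)]
```

Here `resolve(fruit)` picks ‘banana’ because timestamp 15 is newer than 10, and `resolve(vegetable)` returns ‘cucumber’ despite the missing copy on Node 3.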
  • 32. Strong Consistency W + R > N #Write Nodes + #Read Nodes > Replication Factor ● QUORUM Read + QUORUM Write ● ALL Read + ONE Write ● ONE Read + ALL Write
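The W + R > N rule is plain arithmetic and can be checked directly. In this sketch `quorum` and `is_strongly_consistent` are hypothetical helper names, and N is the Replication Factor as on the slide.

```python
def quorum(n: int) -> int:
    """Number of replicas in a quorum for replication factor n (N/2+1, integer division)."""
    return n // 2 + 1


def is_strongly_consistent(write_nodes: int, read_nodes: int, n: int) -> bool:
    """True when every read set must overlap every write set: W + R > N."""
    return write_nodes + read_nodes > n


N = 3
combos = {
    "QUORUM read + QUORUM write": is_strongly_consistent(quorum(N), quorum(N), N),
    "ALL read + ONE write": is_strongly_consistent(1, N, N),
    "ONE read + ALL write": is_strongly_consistent(N, 1, N),
    "ONE read + ONE write": is_strongly_consistent(1, 1, N),
}
```

With N = 3 a quorum is 2 replicas, so QUORUM reads and writes give 2 + 2 > 3, while ONE with ONE gives 1 + 1, which does not exceed 3.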
  • 33. Achieving Consistency ● Consistency Level ● Hinted Handoff ● Read Repair ● Anti Entropy (User triggered Repairs)
  • 34. Write Path ● Append to Commit Log File ● Merge Columns into Memtable ● Asynchronously flush Memtable to a new file (never update existing files) ● Data is stored in immutable files called SSTables (Sorted String Tables)
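The write path steps above can be sketched as a toy in-memory model. `ToyNode`, its fields and the `flush_threshold` parameter are hypothetical names for illustration; the flush here is synchronous and nothing touches disk, which simplifies what the slide describes.

```python
class ToyNode:
    """Toy model of the write path: commit log, memtable, immutable SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only list standing in for the commit log file
        self.memtable = {}    # (row_key, column) -> (value, timestamp)
        self.sstables = []    # each flush appends one immutable, sorted tuple
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Merge the column into the memtable, keeping the newest timestamp.
        current = self.memtable.get((row_key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(row_key, column)] = (value, timestamp)
        # 3. Flush once the memtable is "full" (synchronous in this toy).
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write a new sorted, immutable table; existing tables are never updated.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}


node = ToyNode(flush_threshold=2)
node.write("foo", "vegetable", "cucumber", 10)
node.write("foo", "fruit", "apple", 10)
```

After the second write the memtable is flushed into a single sorted tuple, and later writes can only produce additional tables rather than modify that one.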
  • 36. Read Path ● Memory: one Bloom Filter (cache) and one Index/Key Cache per SSTable ● Disk, SSTable 1: SStable-1.Data.db (foo: fruit (ts:10) apple, vegetable (ts:15) cucumber ….) and SSTable-1-Index.db ● Disk, SSTable 2: SStable-2.Data.db (foo: fruit (ts:10) apple, vegetable (ts:10) Pepper ….) and SSTable-2-Index.db
  • 37. Compactions Compaction merges truth from multiple SSTables into one SSTable with the same truth (both a manual and a continuous background process)
    Column    | SSTable 1                 | SSTable 2                 | New
    ----------+---------------------------+---------------------------+---------------------------
    Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10)
    Fruit     | ‘Apple’ (timestamp 10)    | <tombstone> (timestamp 15) | <tombstone> (timestamp 15)
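The merge shown in this table can be sketched as keeping, for each column, the entry with the newest timestamp. `compact` and the `TOMBSTONE` marker are hypothetical names; like the slide's "New" column, the sketch keeps the tombstone in the output and does not model when tombstones are eventually purged.

```python
TOMBSTONE = object()  # marker for a deleted column (illustrative)


def compact(*sstables):
    """Merge several column -> (value, timestamp) tables into one new table.

    For each column the entry with the highest timestamp wins; the inputs
    are left untouched, mirroring the immutability of SSTables.
    """
    merged = {}
    for sstable in sstables:
        for column, (value, timestamp) in sstable.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return dict(sorted(merged.items()))


# The two input tables from the slide
sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}
new_sstable = compact(sstable_1, sstable_2)
```

The tombstone at timestamp 15 shadows ‘Apple’ at timestamp 10, so the merged table records Fruit as deleted.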
  • 39. Managing Cassandra ● Single configuration file /etc/cassandra/cassandra.yaml ● Single control command /usr/bin/nodetool ● Monitoring done by DataStax OpsCenter
  • 40. Troubleshooting Cassandra Always inspect these files: ● /var/log/cassandra/cassandra.log (Startup) ● /var/log/cassandra/system.log (Normal work)
  • 41. Backup Use Cassandra snapshots... And God said to Noah, Noah make me a backup ... 'cause I shall format
  • 42. Client (API) Choices ● Thrift, the original and still fully supported API: ○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera… ○ Python: Pycassa, Telephus, … ○ Ruby: Fauna ○ PHP: PHP Client Library ○ C# ○ Node.js ○ Go ○ Simba ODBC ○ C++: LibQtCassandra ○ ORM ○ …. ● CQL3: a table-oriented, schema-driven data model, similar to SQL
  • 43. CQL3 Create KeySpace ● Using CQL3 via cqlsh command tool ($CASSANDRA_HOME/bin/cqlsh): ● Create a new Keyspace with a Replication Factor of 3 and NetworkTopologyStrategy CREATE KEYSPACE kenshoo_cass_fans WITH replication = {'class':'NetworkTopologyStrategy', 'us_east_dc':3};
  • 44. CQL3 Working with Tables ● CQL3 Example ● Table is a sparse collection of well known ordered columns
    CREATE TABLE User ( user_name text, password text, real_name text, PRIMARY KEY (user_name) );
    INSERT INTO User (user_name, password, real_name) VALUES ('nader','sekr8t','MR NADER');
    SELECT * FROM User WHERE user_name = 'nader';
     user_name | password | real_name
    -----------+----------+-----------
         nader |   sekr8t |  MR NADER