SlideShare une entreprise Scribd logo
1  sur  59
Télécharger pour lire hors ligne
BÂLE BERNE BRUGG DUSSELDORF FRANCFORT S.M. FRIBOURG E.BR. GENÈVE
HAMBOURG COPENHAGUE LAUSANNE MUNICH STUTTGART VIENNE ZURICH
#SDF16
Cassandra for DBAs
Ulises Fasoli – Senior Consultant Trivadis Lausanne AD
#SDF16
Programme
1. NoSQL Landscape
2. What is Apache Cassandra?
3. Cassandra Architecture
4. Data Distribution & Replication
5. Cassandra Data Model
6. Cassandra Write path
7. Cassandra Read path
8. Tools
9. Last thoughts and conclusion
#SDF16
NoSQL Landscape
#SDF16
NoSQL Landscape : Type of databases
Type Example Description
Key-Value
keys map to arbitrary values of any
data type
Wide Column
keys mapped to sets of n-number of
typed columns
Document
document sets (JSON) queryable in
whole or part
Graph
data elements each relate to n others
in a graph/network
#SDF16
Brewer's CAP Theorem
• Consistency
do you get identical results, regardless which node
is queried?
• Availability
can the cluster respond to very high write and read
volumes?
• Network Partition tolerance is the cluster still
available when part of it goes dark?
Availability
Consistency
Network
Partition
Tolerance
n/a
CA CP
AP
Any networked shared-data system can have at most two of the three desirable properties:
#SDF16
What is Apache Cassandra?
#SDF16
Cassandra
 Fully distributed, with no single point of failure
 Free and open source, with deep developer support
 Highly performant, with near-linear horizontal scaling in proper
use cases
Bigtable Dynamo
#SDF16
Use cases for Cassandra
 Product Catalog / Playlists
 Personalization
 Ads / Recommendations
 Fraud Detection
 Time Series
 IoT / Sensor Data
 Graph / Network data
#SDF16
Cassandra Architecture
#SDF16
Architecture overview
 Designed with the understanding that system/hardware failures can and do occur
 Peer-to-peer, distributed system
 All nodes are identical in the cluster
 Data partitioned among all nodes in the cluster
 Custom data replication to ensure fault tolerance
 Read/Write-anywhere design
#SDF16
What is a cluster?
 A peer to peer set of nodes
• Node – one Cassandra instance
• Rack – a logical set of nodes
• Data Center – a logical set of racks
• Cluster – the full set of nodes which map to a single complete
token ring
Node 4
Node 1
Node 2
Data Center - East
Node 1
Node 3
Node 4
Node 3
Data Center - West
Rack 1
Rack 2
Node 2
Rack 1
Rack 2
Cassandra
Cluster
- 263+ 263
Token Range
(Murmur3)
#SDF16
What is a cluster?
Node 1
Node 3
Node 2Node 4
127.0.0.1
127.0.0.2
127.0.0.3
127.0.0.4
Seed
 Nodes join a cluster based on the configuration of
their own conf/cassandra.yaml file
 Some key settings :
• cluster_name
• seeds
• listen_address
#SDF16
What is a coordinator?
 The node chosen by the client to receive a
particular read or write request to its cluster
 Any node can coordinate any request
 Each client request may be coordinated by a
different node
 No single point of failure
 Fundamental principle to Cassandra's architecture
Node 1
Node 3
Node 2Node 4
Client Driver
#SDF16
Data Distribution & Replication
#SDF16
Data Partitioning & distribution
 Nodes are logically structured in Ring Topology.
 Each node is responsible for a part of the overall database
 Data is assigned to a specific node based on a hashed value of key
 Lightly loaded nodes can move position to alleviate highly loaded nodes
#SDF16
Data Partitioning
01
1/2
F
E
D
C
B
A N=3
h(key2)
h(key1)
#SDF16
Data Replication
 Defined at keyspace level
o Replication factor : how many replicas to make
o Replication strategy : on which node should each replica be
placed
 All partitions are "replicas", there are no "originals"
 First replica : placed on the node owning its token's primary
range
#SDF16
Data Replication / Distribution
 Native data replication / distribution support
 Transparently handled by Cassandra
 Multi-data center capable
 Hybrid Cloud/On premise support
#SDF16
What is consistency ?
Node 1
Node 3
Node 2Node 4
Client Driver
Just one?
CL=ONE
Two?
CL=TWO
51%?
CL=QUORUM
 Partition key determines which nodes are
sent any given request
• Consistency Level : how many nodes must
acknowledge before response is sent
 The meaning varies by type
• Write request – how many nodes
must acknowledge the write?
• Read request – how many nodes
must acknowledge by sending their
most recent copy of the data?
#SDF16
What is immediate vs. eventual consistency?
 Immediate Consistency – reads always return the most recent data
• Immediate consistency guaranteed with Consistency Level ALL
• Highest latency (all replicas are checked and compared)
 Eventual Consistency – reads may return stale data
• Consistency Level ONE carries the highest risk of stale data
• Lowest latency (first replica is immediately returned)
ANY ALLONE TWO . . . .
0 Total Nodes (N)1 2
Available Replicas
Consistency Level
Read repairs are there to prevent entropy
#SDF16
Cassandra Data Model
#SDF16
Cassandra Data Model
 The Cassandra data model defines
1. Column family as a way to store and organize data
2. Table as a two-dimensional view of a multi-dimensional column family
3. Cassandra Query Language (CQL) : A language to perform operations
on tables
#SDF16
Cassandra Data Model
Keyspace
Column Family Column Family
#SDF16
What is a column family
row
key3
v3.a
cola
v3.b
colb
v3.c
colc
v1.a
cola
v1.b
colb
v1.c
colc
v2.a
cola
v2.b
colb
v2.c
colc
v3.d
cold
v1.d
cold
v2.d
cold
COLUMNS
ROWS
row
key1
row
key2
CELLS
 Column family – set of rows with a similar structure
• Sorted columns
• Multidimensional
• Distributed
• Sparse
#SDF16
What are row, row key, column key, and column value?
• Rows – individual rows constitute a column family
• Row key – uniquely identifies a row in a column family
• Row – stores pairs of column keys and column values
• Column key – uniquely identifies a column value in a row
• Column value – stores one value or a collection of values
row
key
va
cola
vb
colb
vc
colc
vd
cold
Column keys (or column names)Row
Column values (or cells)
#SDF16
What are row, row key, column key, and column value?
John
Lennon
1940
born
England
country
1980
died
Rock
style
artist
type
The Beatles
England
country
1957
founded
Rock
style
band
type
Row key Column keys
Column values
#SDF16
What is a wide row?
 Rows may be described as “skinny” or “wide”
• Skinny row –fixed, relatively small number of column keys
 Wide row –relatively large number of column keys (hundreds or thousands)
• For example, a row that stores all bands of the same style
• The number of such bands will increase as new bands are formed
Rock
The Animals The Beatles...
...
...
...
...
...
#SDF16
What are composite row key and composite column key?
 Composite row key – multiple components separated by colon
 Composite column key – multiple components separated by colon
• Composite column keys are sorted by each component
Revolver:1966
Rock
genre
The Beatles
performer
{1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
tracks
Revolver:1966
Taxman
1:title
Eleanor Rigby
2:title
Tomorrow Never Knows
14:title...
...
#SDF16
What are partition, partition key, row, column, and cell?
Column family
view
Table with single-row
partitions
#SDF16
What are composite partition key and clustering column?
Table with multi-row partitions
partitions
album_title year num
ber
track_title
Revolver 1966 1 Taxman
Revolver 1966 … …
Revolver 1966 14 Tomorrow Never Knows
Let It Be 1970 1 Two Of Us
Let It Be 1970 … …
Let It Be 1970 11 Get Back
Magical Mystery Tour 1967 1 Magical Mystery Tour
Magical Mystery Tour 1967 … …
Magical Mystery Tour 1967 11 All You Need Is Love
rows in a partition/table
columns
composite partition key
clustering column
cells
#SDF16
What are composite partition key and clustering column?
Revolver:1966
Taxman
1:title
Two Of Us
1:title
Let It Be:1970
Magical Mystery
Tour:1967
Magical Mystery Tour
1:title
Doctor Robert
11:title
Get Back
11:title
All You Need Is Love
11:title
Tomorrow Never
Knows
14:title...
...
...
...
...
...
...
...
Table with multi-row partitions : Column family view
#SDF16
What are static columns?
Table with multi-row partitions and static columns
album_title year num
ber
genre performer track_title
Revolver 1966 1 Rock The Beatles Taxman
Revolver 1966 … Rock The Beatles …
Revolver 1966 14 Rock The Beatles Tomorrow Never Knows
Let It Be 1970 1 Rock The Beatles Two Of Us
Let It Be 1970 … Rock The Beatles …
Let It Be 1970 11 Rock The Beatles Get Back
Static
columns
#SDF16
What is a primary key?
 Primary key uniquely identifies a row in a table
 Simple or composite partition key and all clustering columns (if present)
performer born country died founded style type
John Lennon 1940 England 1980 Rock artist
Paul McCartney 1942 England Rock artist
album_title year number track_title
Revolver 1966 1 Taxman
Revolver 1966 … …
Revolver 1966 14 Tomorrow Never
Let It Be 1970 1 Two Of Us
Let It Be 1970 … …
Let It Be 1970 11 Get Back
composite partition key
+
clustering column
Primary
key
Single partition key
#SDF16
What is a table or CQL Table?
 A CQL table is a column family
• CQL tables provide two-dimensional views of a column family, which contains
potentially multi-dimensional data, due to composite keys and collections
 CQL table and column family are largely interchangeable terms
 Supported by declarative language Cassandra Query Language (CQL)
#SDF16
Cassandra Query Language (CQL)
 Data Definition Language, subset of CQL
 SQL-like syntax, but with somewhat different semantics
#SDF16
Cassandra Data Model differences from RDBMS
Cassandra RDBMS
Cassandra deals with unstructured data. RDBMS deals with structured data.
Cassandra has a flexible schema. It has a fixed schema.
In Cassandra, a table is a list of “nested key-value
pairs”. (ROW x COLUMN key x COLUMN value)
In RDBMS, a table is an array of arrays. (ROW x
COLUMN)
keyspace is the outermost container that contains data
corresponding to an application.
Database is the outermost container that contains
data corresponding to an application.
Tables or column families are the entity of a keyspace. Tables are the entities of a database.
Row is a unit of replication in Cassandra. Row is an individual record in RDBMS.
Column is a unit of storage in Cassandra. Column represents the attributes of a relation.
Relationships are represented using collections. RDBMS supports the concepts of foreign keys, joins.
#SDF16
Cassandra Write path
#SDF16
Write path: how is data written
 Cassandra is a log-structured storage engine
 Data is sequentially appended, not placed in pre-set locations
RDBMS CASSANDRA
Continuously appends to a log
?
?
Seeks and writes values to
various pre-set locations
#SDF16
How does the write path flow on a node?
… … … …
… … … …
… … … …
… … … …
… … … …
… … … …
Node memory
Node file system
Client partition key1 first:Oscar last:Orange level:42
partition key2 first:martin last:Blue
Memtable (corresponds to a CQL table)Coordinator
CommitLog
AppendOnly … … … …
… … … …
Flush current state to SSTable
Each write request … Periodically …
Periodically …
… … … …
… … … …
… … … …
… … … …
… … … …
Compaction
Compact related SSTables
SSTables
partition key3 first:Ricky last:Red
#SDF16
What is the Commit Log?
 An append-only log used to automatically rebuild Memtables
on restart of a downed node.
 Memtables flush to disk when CommitLog size reaches total
allowed space
 Entries are marked as flushed, as corresponding Memtable
entries flush to disk as an SSTable
 CommitLog options are configured in the Cassandra.yaml file
CommitLog
#SDF16
What are Memtables and how are they flushed to disk?
 Memtables are in-memory representations of a CQL table :
• Each node has a Memtable for each CQL table in the keyspace
• Each Memtable accrues writes and provides reads for data not yet flushed
• Updates to Memtables mutate the in-memory partition
partition key1 first:Oscar last:Orange level:42
partition key2 first:Ricky last:Red
Memtable
#SDF16
What is a SSTable and what are its characteristics?
… … … …
… … … …
… … … …
… … … …
… … … …
… … … …
… … … …
… … … …
 A SSTable ("sorted string table") is
• an immutable file of sorted partitions
• written to disk through fast, sequential i/o
• contains the state of a Memtable when flushed
 The current data state of a CQL table is comprised of
• its corresponding Memtable plus
• all current SSTables flushed from that Memtable
 SSTables are periodically compacted from many to one
#SDF16
Cassandra Read path
#SDF16
How does the read path flow on each node?
MemTable (e.g., player)
Coordinator
SSTables (e.g., player)
… … … …
pk7 … … level:42
timestamp 1114
pk1 … … …
pk7 first:Betty
timestamp 541
last:Blue
timestamp 541
level:63
timestamp 541
pk2 … … …
pk7 first:Elizabeth
timestamp 994
pk7 first:Elizabeth last:Blue level:42
Row Cache (optional)
Read
<pk7>
Hit
pk1, pk2pk1, pk2, pk7
Node memory
Node file system
Off Heap On HeapRow cache hit
pk1 … … …
pk2 … … …
#SDF16
How does the read path flow on each node?
MemTable (e.g., player)
… … … …
pk7 … … level:42
timestamp 1114
pk1 … … …
pk7 first:Betty
timestamp 541
last:Blue
timestamp 541
level:63
timestamp 541
pk2 … … …
pk7 first:Elizabeth
timestamp 994
pk7 first:Elizabeth last:Blue level:42
Row Cache (optional)
pk1 … … …
pk2 … … …
Bloom Filter
Bloom Filter
Bloom Filter
Miss
pk1, pk2
Node memory
Node file system
Hit
Hit
Read
<pk7>
Off Heap On HeapKey cache hit
Coordinator
?
Key
Cache
pk7?
, pk7
SSTables (e.g., player)
#SDF16
How does the read path flow on each node?
MemTable (e.g., player)
… … … …
pk7 … … level:42
timestamp 1114
pk1 … … …
pk7 first:Betty
timestamp 541
last:Blue
timestamp 541
level:63
timestamp 541
Pk2 … … …
pk7 first:Elizabeth
timestamp 994
Pk7 first:Elizabeth last:Blue level:42
Row Cache (optional)
pk1 … … …
pk2 … … …
pk1, pk2
Bloom Filter
Bloom Filter
Bloom Filter
Miss
pk1, pk2, pk7
Node memory
Node file system
Miss Partition
Summary
Partition
Index
Miss Partition
Summary
Partition
Index
?
?
Key
Cache
pk7
Read
<pk7>
Off Heap On HeapRow and Key miss
Coordinator
SSTables (e.g., player)
#SDF16
Tools
#SDF16
Tools : CQLSH
 Interactive command line CQL utility
 Supports tab completion for commands
 Think of it as SQL*Plus for Cassandra
#SDF16
Tools : Cassandra Cluster Manager (CCM)
 Open source utility
 Creates and manages multi-node clusters on a local machine
 Not for production configuration
 Useful for :
• Testing failure scenarios
• Development / Prototyping without the hardware
• Version migrations
• …
#SDF16
Tools : Nodetool
 Command-line cluster management utility
 Supports over 40 commands like :
• Status
• Info
• ring
#SDF16
Tools : Datastax : DevCenter
 Visually Create and Navigate Database Objects
 View Query Results and Tune Queries for Faster Performance
#SDF16
Tools : Datastax OpsCenter
 Web-based visual management and monitoring solution
 Visual cluster management
 Point-and-Click Provisioning and Administration
 Secured administration
 Visual monitoring and tuning
 Customizable Dashboards
#SDF16
Tools : Datastax OpsCenter
#SDF16
Tools : Datastax OpsCenter
#SDF16
Last thoughts and conclusion
#SDF16
DBAs wanted
 NoSQL and Cassandra will not replace RDBMS
 Different tools for different jobs
 Current situation :
• Community largely driven by developers and sysadmins
• Community needs insight from DBAs to make the database evolve
• Get involved!
#SDF16
https://academy.datastax.com/
https://academy.datastax.com/courses
https://academy.datastax.com/courses/ds201-cassandra-core-concepts
https://academy.datastax.com/courses/ds210-operations-and-performance-tuning
Sources and additionnal information
#SDF16
Ulises Fasoli
Senior Consultant AD Trivadis
Tél. +41 21 321 47 00
ulises.fasoli@trivadis.com
#SDF16
Feedback
Confirmez votre présence et évaluez la session avec ce QRC.
Un vol en montgolfière à gagner !

Contenu connexe

Similaire à Le monde NOSQL pour les spécialistes du relationnel,

Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScyllaDB
 
Cassandra 2012 scandit
Cassandra 2012 scanditCassandra 2012 scandit
Cassandra 2012 scanditCharlie Zhu
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - DenverJon Haddad
 
Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraDataStax Academy
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandraaaronmorton
 
Conceptos básicos. Seminario web 1: Introducción a NoSQL
Conceptos básicos. Seminario web 1: Introducción a NoSQLConceptos básicos. Seminario web 1: Introducción a NoSQL
Conceptos básicos. Seminario web 1: Introducción a NoSQLMongoDB
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersLuke Tillman
 
NoSQL, SQL, NewSQL - methods of structuring data.
NoSQL, SQL, NewSQL - methods of structuring data.NoSQL, SQL, NewSQL - methods of structuring data.
NoSQL, SQL, NewSQL - methods of structuring data.Tony Rogerson
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf3operatordcslipiPeng
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010aaronmorton
 

Similaire à Le monde NOSQL pour les spécialistes du relationnel, (20)

Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
Cassandra 2012 scandit
Cassandra 2012 scanditCassandra 2012 scandit
Cassandra 2012 scandit
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - Denver
 
Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Conceptos básicos. Seminario web 1: Introducción a NoSQL
Conceptos básicos. Seminario web 1: Introducción a NoSQLConceptos básicos. Seminario web 1: Introducción a NoSQL
Conceptos básicos. Seminario web 1: Introducción a NoSQL
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET Developers
 
NoSQL, SQL, NewSQL - methods of structuring data.
NoSQL, SQL, NewSQL - methods of structuring data.NoSQL, SQL, NewSQL - methods of structuring data.
NoSQL, SQL, NewSQL - methods of structuring data.
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
1 DES.pdf
1 DES.pdf1 DES.pdf
1 DES.pdf
 
Des lecture
Des lectureDes lecture
Des lecture
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010
 
Editors l21 l24
Editors l21 l24Editors l21 l24
Editors l21 l24
 

Plus de Swiss Data Forum Swiss Data Forum

Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Swiss Data Forum Swiss Data Forum
 
Bigdata et datamining au service de la transition énergétique
Bigdata et datamining au service de la transition énergétiqueBigdata et datamining au service de la transition énergétique
Bigdata et datamining au service de la transition énergétiqueSwiss Data Forum Swiss Data Forum
 
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...Swiss Data Forum Swiss Data Forum
 
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom Business
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom BusinessLe Swiss Data Cloud, vu par l’opérateur UPC Cablecom Business
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom BusinessSwiss Data Forum Swiss Data Forum
 
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...Swiss Data Forum Swiss Data Forum
 

Plus de Swiss Data Forum Swiss Data Forum (19)

Cloud transition - The Trivadis approach
Cloud transition - The Trivadis approachCloud transition - The Trivadis approach
Cloud transition - The Trivadis approach
 
Internet of Things and Big Data
Internet of Things and Big DataInternet of Things and Big Data
Internet of Things and Big Data
 
Optimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec AzureOptimiser votre infrastructure SQL Server avec Azure
Optimiser votre infrastructure SQL Server avec Azure
 
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
 
Building High-scalable Enterprise Solutions,
Building High-scalable Enterprise Solutions, Building High-scalable Enterprise Solutions,
Building High-scalable Enterprise Solutions,
 
Customer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° viewCustomer Event Hub - the modern Customer 360° view
Customer Event Hub - the modern Customer 360° view
 
Bigdata et datamining au service de la transition énergétique
Bigdata et datamining au service de la transition énergétiqueBigdata et datamining au service de la transition énergétique
Bigdata et datamining au service de la transition énergétique
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
 
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...
Avec biGenius® sur Azure, oubliez la technique, concentrez vos efforts sur le...
 
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom Business
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom BusinessLe Swiss Data Cloud, vu par l’opérateur UPC Cablecom Business
Le Swiss Data Cloud, vu par l’opérateur UPC Cablecom Business
 
IoT – The reality of real world solutions
IoT – The reality of real world solutions IoT – The reality of real world solutions
IoT – The reality of real world solutions
 
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...
The Power of Mobile & Cloud: Building a Homesecurity-System with Microsoft Az...
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
IT-Analytics: Screen your IT processes with BI Technology
IT-Analytics: Screen your IT processes with BI TechnologyIT-Analytics: Screen your IT processes with BI Technology
IT-Analytics: Screen your IT processes with BI Technology
 
PoC Oracle Exadata - Retour d'expérience
PoC Oracle Exadata - Retour d'expériencePoC Oracle Exadata - Retour d'expérience
PoC Oracle Exadata - Retour d'expérience
 
A gentle introduction to Oracle R Enterprise
A gentle introduction to Oracle R EnterpriseA gentle introduction to Oracle R Enterprise
A gentle introduction to Oracle R Enterprise
 
Mobilité dans l'entreprise - Facts & Figures
Mobilité dans l'entreprise - Facts & FiguresMobilité dans l'entreprise - Facts & Figures
Mobilité dans l'entreprise - Facts & Figures
 
Information Life Cycle Management avec Oracle 12c
Information Life Cycle Management avec Oracle 12cInformation Life Cycle Management avec Oracle 12c
Information Life Cycle Management avec Oracle 12c
 
Data vault modeling et retour d'expérience
Data vault modeling et retour d'expérienceData vault modeling et retour d'expérience
Data vault modeling et retour d'expérience
 

Dernier

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Dernier (20)

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

Le monde NOSQL pour les spécialistes du relationnel,

  • 1. BÂLE BERNE BRUGG DUSSELDORF FRANCFORT S.M. FRIBOURG E.BR. GENÈVE HAMBOURG COPENHAGUE LAUSANNE MUNICH STUTTGART VIENNE ZURICH #SDF16 Cassandra for DBAs Ulises Fasoli – Senior Consultant Trivadis Lausanne AD
  • 2. #SDF16 Programme 1. NoSQL Landscape 2. What is Apache Cassandra? 3. Cassandra Architecture 4. Data Distribution & Replication 5. Cassandra Data Model 6. Cassandra Write path 7. Cassandra Read path 8. Tools 9. Last thoughts and conclusion
  • 4. #SDF16 NoSQL Landscape : Type of databases Type Example Description Key-Value keys map to arbitrary values of any data type Wide Column keys mapped to sets of n-number of typed columns Document document sets (JSON) queryable in whole or part Graph data elements each relate to n others in a graph/network
  • 5. #SDF16 Brewer's CAP Theorem • Consistency do you get identical results, regardless which node is queried? • Availability can the cluster respond to very high write and read volumes? • Network Partition tolerance is the cluster still available when part of it goes dark? Availability Consistency Network Partition Tolerance n/a CA CP AP Any networked shared-data system can have at most two of the three desirable properties:
  • 7. #SDF16 Cassandra  Fully distributed, with no single point of failure  Free and open source, with deep developer support  Highly performant, with near-linear horizontal scaling in proper use cases Bigtable Dynamo
  • 8. #SDF16 Use cases for Cassandra  Product Catalog / Playlists  Personalization  Ads / Recommendations  Fraud Detection  Time Series  IoT / Sensor Data  Graph / Network data
  • 10. #SDF16 Architecture overview  Designed with the understanding that system/hardware failures can and do occur  Peer-to-peer, distributed system  All nodes are identical in the cluster  Data partitioned among all nodes in the cluster  Custom data replication to ensure fault tolerance  Read/Write-anywhere design
  • 11. #SDF16 What is a cluster?  A peer to peer set of nodes • Node – one Cassandra instance • Rack – a logical set of nodes • Data Center – a logical set of racks • Cluster – the full set of nodes which map to a single complete token ring Node 4 Node 1 Node 2 Data Center - East Node 1 Node 3 Node 4 Node 3 Data Center - West Rack 1 Rack 2 Node 2 Rack 1 Rack 2 Cassandra Cluster - 263+ 263 Token Range (Murmur3)
  • 12. #SDF16 What is a cluster? Node 1 Node 3 Node 2Node 4 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4 Seed  Nodes join a cluster based on the configuration of their own conf/cassandra.yaml file  Some key settings : • cluster_name • seeds • listen_address
  • 13. #SDF16 What is a coordinator?  The node chosen by the client to receive a particular read or write request to its cluster  Any node can coordinate any request  Each client request may be coordinated by a different node  No single point of failure  Fundamental principle to Cassandra's architecture Node 1 Node 3 Node 2Node 4 Client Driver
  • 15. #SDF16 Data Partitioning & distribution  Nodes are logically structured in Ring Topology.  Each node is responsible for a part of the overall database  Data is assigned to a specific node based on a hashed value of key  Lightly loaded nodes can move position to alleviate highly loaded nodes
  • 17. #SDF16 Data Replication  Defined at keyspace level o Replication factor : how many replicas to make o Replication strategy : on which node should each replica be placed  All partitions are "replicas", there are no "originals"  First replica : placed on the node owning its token's primary range
  • 18. #SDF16 Data Replication / Distribution  Native data replication / distribution support  Transparently handled by Cassandra  Multi-data center capable  Hybrid Cloud/On premise support
  • 19. #SDF16 What is consistency ? Node 1 Node 3 Node 2Node 4 Client Driver Just one? CL=ONE Two? CL=TWO 51%? CL=QUORUM  Partition key determines which nodes are sent any given request • Consistency Level : how many nodes must acknowledge before response is sent  The meaning varies by type • Write request – how many nodes must acknowledge the write? • Read request – how many nodes must acknowledge by sending their most recent copy of the data?
  • 20. #SDF16 What is immediate vs. eventual consistency?  Immediate Consistency – reads always return the most recent data • Immediate consistency guaranteed with Consistency Level ALL • Highest latency (all replicas are checked and compared)  Eventual Consistency – reads may return stale data • Consistency Level ONE carries the highest risk of stale data • Lowest latency (first replica is immediately returned) ANY ALLONE TWO . . . . 0 Total Nodes (N)1 2 Available Replicas Consistency Level Read repairs are there to prevent entropy
  • 22. #SDF16 Cassandra Data Model  The Cassandra data model defines 1. Column family as a way to store and organize data 2. Table as a two-dimensional view of a multi-dimensional column family 3. Cassandra Query Language (CQL) : A language to perform operations on tables
  • 24. #SDF16 What is a column family row key3 v3.a cola v3.b colb v3.c colc v1.a cola v1.b colb v1.c colc v2.a cola v2.b colb v2.c colc v3.d cold v1.d cold v2.d cold COLUMNS ROWS row key1 row key2 CELLS  Column family – set of rows with a similar structure • Sorted columns • Multidimensional • Distributed • Sparse
  • 25. #SDF16 What are row, row key, column key, and column value? • Rows – individual rows constitute a column family • Row key – uniquely identifies a row in a column family • Row – stores pairs of column keys and column values • Column key – uniquely identifies a column value in a row • Column value – stores one value or a collection of values row key va cola vb colb vc colc vd cold Column keys (or column names)Row Column values (or cells)
  • 26. #SDF16 What are row, row key, column key, and column value? John Lennon 1940 born England country 1980 died Rock style artist type The Beatles England country 1957 founded Rock style band type Row key Column keys Column values
  • 27. #SDF16 What is a wide row?  Rows may be described as “skinny” or “wide” • Skinny row –fixed, relatively small number of column keys  Wide row –relatively large number of column keys (hundreds or thousands) • For example, a row that stores all bands of the same style • The number of such bands will increase as new bands are formed Rock The Animals The Beatles... ... ... ... ... ...
  • 28. #SDF16 What are composite row key and composite column key?  Composite row key – multiple components separated by colon  Composite column key – multiple components separated by colon • Composite column keys are sorted by each component Revolver:1966 Rock genre The Beatles performer {1: 'Taxman', ..., 14: 'Tomorrow Never Knows'} tracks Revolver:1966 Taxman 1:title Eleanor Rigby 2:title Tomorrow Never Knows 14:title... ...
  • 29. #SDF16 What are partition, partition key, row, column, and cell? Column family view Table with single-row partitions
  • 30. #SDF16 What are composite partition key and clustering column? Table with multi-row partitions partitions album_title year num ber track_title Revolver 1966 1 Taxman Revolver 1966 … … Revolver 1966 14 Tomorrow Never Knows Let It Be 1970 1 Two Of Us Let It Be 1970 … … Let It Be 1970 11 Get Back Magical Mystery Tour 1967 1 Magical Mystery Tour Magical Mystery Tour 1967 … … Magical Mystery Tour 1967 11 All You Need Is Love rows in a partition/table columns composite partition key clustering column cells
  • 31. #SDF16 What are composite partition key and clustering column? Revolver:1966 Taxman 1:title Two Of Us 1:title Let It Be:1970 Magical Mystery Tour:1967 Magical Mystery Tour 1:title Doctor Robert 11:title Get Back 11:title All You Need Is Love 11:title Tomorrow Never Knows 14:title... ... ... ... ... ... ... ... Table with multi-row partitions : Column family view
  • 32. #SDF16 What are static columns? Table with multi-row partitions and static columns album_title year num ber genre performer track_title Revolver 1966 1 Rock The Beatles Taxman Revolver 1966 … Rock The Beatles … Revolver 1966 14 Rock The Beatles Tomorrow Never Knows Let It Be 1970 1 Rock The Beatles Two Of Us Let It Be 1970 … Rock The Beatles … Let It Be 1970 11 Rock The Beatles Get Back Static columns
  • 33. #SDF16 What is a primary key?  Primary key uniquely identifies a row in a table  Simple or composite partition key and all clustering columns (if present) performer born country died founded style type John Lennon 1940 England 1980 Rock artist Paul McCartney 1942 England Rock artist album_title year number track_title Revolver 1966 1 Taxman Revolver 1966 … … Revolver 1966 14 Tomorrow Never Let It Be 1970 1 Two Of Us Let It Be 1970 … … Let It Be 1970 11 Get Back composite partition key + clustering column Primary key Single partition key
  • 34. #SDF16 What is a table or CQL Table?  A CQL table is a column family • CQL tables provide two-dimensional views of a column family, which contains potentially multi-dimensional data, due to composite keys and collections  CQL table and column family are largely interchangeable terms  Supported by declarative language Cassandra Query Language (CQL)
  • 35. #SDF16 Cassandra Query Language (CQL)  Data Definition Language, subset of CQL  SQL-like syntax, but with somewhat different semantics
  • 36. #SDF16 Cassandra Data Model differences from RDBMS Cassandra RDBMS Cassandra deals with unstructured data. RDBMS deals with structured data. Cassandra has a flexible schema. It has a fixed schema. In Cassandra, a table is a list of “nested key-value pairs”. (ROW x COLUMN key x COLUMN value) In RDBMS, a table is an array of arrays. (ROW x COLUMN) keyspace is the outermost container that contains data corresponding to an application. Database is the outermost container that contains data corresponding to an application. Tables or column families are the entity of a keyspace. Tables are the entities of a database. Row is a unit of replication in Cassandra. Row is an individual record in RDBMS. Column is a unit of storage in Cassandra. Column represents the attributes of a relation. Relationships are represented using collections. RDBMS supports the concepts of foreign keys, joins.
  • 38. #SDF16 Write path: how is data written  Cassandra is a log-structured storage engine  Data is sequentially appended, not placed in pre-set locations RDBMS CASSANDRA Continuously appends to a log ? ? Seeks and writes values to various pre-set locations
  • 39. #SDF16 How does the write path flow on a node? … … … … … … … … … … … … … … … … … … … … … … … … Node memory Node file system Client partition key1 first:Oscar last:Orange level:42 partition key2 first:martin last:Blue Memtable (corresponds to a CQL table)Coordinator CommitLog AppendOnly … … … … … … … … Flush current state to SSTable Each write request … Periodically … Periodically … … … … … … … … … … … … … … … … … … … … … Compaction Compact related SSTables SSTables partition key3 first:Ricky last:Red
  • 40. #SDF16 What is the Commit Log?  An append-only log used to automatically rebuild Memtables on restart of a downed node.  Memtables flush to disk when CommitLog size reaches total allowed space  Entries are marked as flushed, as corresponding Memtable entries flush to disk as an SSTable  CommitLog options are configured in the Cassandra.yaml file CommitLog
  • 41. #SDF16 What are Memtables and how are they flushed to disk?  Memtables are in-memory representations of a CQL table : • Each node has a Memtable for each CQL table in the keyspace • Each Memtable accrues writes and provides reads for data not yet flushed • Updates to Memtables mutate the in-memory partition partition key1 first:Oscar last:Orange level:42 partition key2 first:Ricky last:Red Memtable
  • 42. #SDF16 What is a SSTable and what are its characteristics? … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … …  A SSTable ("sorted string table") is • an immutable file of sorted partitions • written to disk through fast, sequential i/o • contains the state of a Memtable when flushed  The current data state of a CQL table is comprised of • its corresponding Memtable plus • all current SSTables flushed from that Memtable  SSTables are periodically compacted from many to one
  • 44. #SDF16 How does the read path flow on each node? MemTable (e.g., player) Coordinator SSTables (e.g., player) … … … … pk7 … … level:42 timestamp 1114 pk1 … … … pk7 first:Betty timestamp 541 last:Blue timestamp 541 level:63 timestamp 541 pk2 … … … pk7 first:Elizabeth timestamp 994 pk7 first:Elizabeth last:Blue level:42 Row Cache (optional) Read <pk7> Hit pk1, pk2pk1, pk2, pk7 Node memory Node file system Off Heap On HeapRow cache hit pk1 … … … pk2 … … …
  • 45. #SDF16 How does the read path flow on each node? MemTable (e.g., player) … … … … pk7 … … level:42 timestamp 1114 pk1 … … … pk7 first:Betty timestamp 541 last:Blue timestamp 541 level:63 timestamp 541 pk2 … … … pk7 first:Elizabeth timestamp 994 pk7 first:Elizabeth last:Blue level:42 Row Cache (optional) pk1 … … … pk2 … … … Bloom Filter Bloom Filter Bloom Filter Miss pk1, pk2 Node memory Node file system Hit Hit Read <pk7> Off Heap On HeapKey cache hit Coordinator ? Key Cache pk7? , pk7 SSTables (e.g., player)
  • 46. #SDF16 How does the read path flow on each node? MemTable (e.g., player) … … … … pk7 … … level:42 timestamp 1114 pk1 … … … pk7 first:Betty timestamp 541 last:Blue timestamp 541 level:63 timestamp 541 Pk2 … … … pk7 first:Elizabeth timestamp 994 Pk7 first:Elizabeth last:Blue level:42 Row Cache (optional) pk1 … … … pk2 … … … pk1, pk2 Bloom Filter Bloom Filter Bloom Filter Miss pk1, pk2, pk7 Node memory Node file system Miss Partition Summary Partition Index Miss Partition Summary Partition Index ? ? Key Cache pk7 Read <pk7> Off Heap On HeapRow and Key miss Coordinator SSTables (e.g., player)
  • 48. #SDF16 Tools : CQLSH  Interactive command line CQL utility  Supports tab completion for commands  Think of it as SQL*Plus for Cassandra
  • 49. #SDF16 Tools : Cassandra Cluster Manager (CCM)  Open source utility  Creates and manages multi-node clusters on a local machine  Not for production configuration  Useful for : • Testing failure scenarios • Development / Prototyping without the hardware • Version migrations • …
  • 50. #SDF16 Tools : Nodetool  Command-line cluster management utility  Supports over 40 commands like : • Status • Info • ring
  • 51. #SDF16 Tools : Datastax : DevCenter  Visually Create and Navigate Database Objects  View Query Results and Tune Queries for Faster Performance
  • 52. #SDF16 Tools : Datastax OpsCenter  Web-based visual management and monitoring solution  Visual cluster management  Point-and-Click Provisioning and Administration  Secured administration  Visual monitoring and tuning  Customizable Dashboards
  • 56. #SDF16 DBAs wanted  NoSQL and Cassandra will not replace RDBMS  Different tools for different jobs  Current situation : • Community largely driven by developers and sysadmins • Community needs insight from DBAs to make the database evolve • Get involved!
  • 58. #SDF16 Ulises Fasoli Senior Consultant AD Trivadis Tél. +41 21 321 47 00 ulises.fasoli@trivadis.com
  • 59. #SDF16 Feedback Confirmez votre présence et évaluez la session avec ce QRC. Un vol en montgolfière à gagner !