Introduction to Cassandra 
DuyHai DOAN, Technical Advocate 
@doanduyhai
Shameless self-promotion! 
Duy Hai DOAN 
Cassandra technical advocate 
• talks, meetups, confs 
• open-source devs (Achilles, …) 
• OSS Cassandra point of contact 
☞ duy_hai.doan@datastax.com 
• production troubleshooting
Datastax! 
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
Agenda! 
Architecture
• Cluster, Replication, Consistency
Data model
• Last Write Wins (LWW), CQL basics, From SQL to CQL
Dev Center Demo
DSE overview
CQL In Depth (time permitting)
Cassandra history! 
NoSQL database
• created at Facebook
• open-sourced in 2008
• current version = 2.1
• column-oriented ☞ distributed table
Cassandra 5 key facts! 
Linear scalability
• NetcoSports: 3 nodes, ≈3 GB
• Netflix: 1k+ nodes, PB+
• YOU
Cassandra 5 key facts! 
Continuous availability (≈100% up-time) 
• resilient architecture (Dynamo)
Rolling Upgrades! 
[Diagram, three frames: ring of nodes n1 to n8 in live production during a rolling upgrade]
Cassandra 5 key facts! 
Continuous availability (≈100% up-time) 
• resilient architecture (Dynamo) 
• rolling upgrades 
• data backward compatible n/n+1 versions
Cassandra 5 key facts! 
Multi-data centers 
• out-of-the-box (config only) 
• AWS conf for multi-region DCs 
• GCE/CloudStack support
Multi-DC usages!
Data-locality, disaster recovery
[Diagram: New York (DC1), a ring of nodes n1 to n8, linked by async replication to London (DC2), a ring of nodes n1 to n5]
Multi-DC usages!
Workload segregation/virtual DC
[Diagram: in the same DC, a Production (Live) ring of nodes n1 to n8 and an Analytics (Spark/Hadoop) ring of nodes n1 to n5]
Multi-DC usages!
Prod data copy for testing/benchmarking
• Use LOCAL consistency
[Diagram: production ring of nodes n1 to n8; data is copied to "My tiny test cluster" (nodes n1 to n3) and never read back (❌)]
Cassandra 5 key facts! 
Operational simplicity 
• 1 node = 1 process + 1 config file 
• deployment automation 
• OpsCenter for monitoring
Cassandra 5 key facts! 
Analytics combo 
• Cassandra + Spark = awesome ! 
• realtime streaming
Cassandra architecture! 
Cluster 
Replication 
Consistency
Cassandra architecture! 
Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google Bigtable paper
• Columns/column families
Cassandra architecture! 
[Diagram: a client request is sent to Node1; Node1 and Node2 each stack the same four layers, top to bottom]
• API (CQL & RPC)
• CLUSTER (DYNAMO)
• DATA STORE (BIGTABLE)
• DISKS
Data distribution! 
Random: hash of #partition → token = hash(#p)
Hash: ]-X, X]
X = huge number (2^64/2)
[Diagram: ring of nodes n1 to n8]
Token Ranges! 
A: ]0, X/8]
B: ]X/8, 2X/8]
C: ]2X/8, 3X/8]
D: ]3X/8, 4X/8]
E: ]4X/8, 5X/8]
F: ]5X/8, 6X/8]
G: ]6X/8, 7X/8]
H: ]7X/8, X]
[Diagram: ring of nodes n1 to n8, one token range (A to H) per node]
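The mapping from partition key to token range can be sketched as follows. This is only an illustration: it uses MD5 from the Python standard library as a stand-in hash (Cassandra's default partitioner is Murmur3), and it follows the simplified ]0, X] layout of the slide above rather than the full ]-X, X] space. The function names `token` and `token_range` are hypothetical.

```python
import hashlib

X = 2**63            # the slide's "huge number": 2^64 / 2
RANGES = "ABCDEFGH"  # A = ]0, X/8], B = ]X/8, 2X/8], ..., H = ]7X/8, X]


def token(partition_key: str) -> int:
    """Hash a partition key to a token in ]0, X] (MD5 is only a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % X + 1


def token_range(tok: int) -> str:
    """Return the label of the range ]k*X/8, (k+1)*X/8] that contains tok."""
    width = X // len(RANGES)
    return RANGES[(tok - 1) // width]


if __name__ == "__main__":
    for login in ("jdoe", "hsue"):
        tok = token(login)
        print(login, tok, token_range(tok))
```

Because the token depends only on the partition key, any node can compute which range (and therefore which replicas) a row belongs to without asking a master.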
Linear scalability! 
[Diagram: the ring grows from 8 nodes (n1 to n8) to 10 nodes (n1 to n10)]
Failure tolerance! 
Replication Factor (RF) = 3
[Diagram: ring of nodes n1 to n8 with token ranges A to H; each range is stored on 3 replicas (1, 2, 3)]
• ranges held by three consecutive nodes: {B, A, H}, {C, B, A}, {D, C, B}
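A minimal sketch of how RF = 3 spreads each token range over three nodes. It assumes, hypothetically, that the three sets {B, A, H}, {C, B, A}, {D, C, B} on the slide belong to n1, n2 and n3 (the transcript does not say which nodes they are attached to); `ranges_on`, `replicas_of` and `survives` are illustrative names.

```python
RANGES = "ABCDEFGH"
NODES = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
RF = 3


def ranges_on(node_index, rf=RF):
    """Ranges stored by NODES[node_index], e.g. n1 -> ['B', 'A', 'H'] (assumed layout)."""
    n = len(RANGES)
    return [RANGES[(node_index + 1 - k) % n] for k in range(rf)]


def replicas_of(range_label, rf=RF):
    """Nodes that hold a copy of the given token range."""
    return [NODES[i] for i in range(len(NODES)) if range_label in ranges_on(i, rf)]


def survives(down, required, rf=RF):
    """True if every range still has at least `required` live replicas."""
    return all(
        sum(node not in down for node in replicas_of(label, rf)) >= required
        for label in RANGES
    )


if __name__ == "__main__":
    print(ranges_on(0), replicas_of("A"))
    print(survives({"n1"}, required=2))        # one node down, 2 copies remain
    print(survives({"n1", "n2"}, required=2))  # two neighbours down, some range has 1 copy
```

With this layout, losing any single node still leaves two copies of every range, which is why a QUORUM of 2 stays reachable.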
Coordinator node! 
Incoming requests (read/write)
Coordinator node handles the request
Every node can be coordinator → masterless
[Diagram: a request reaches the coordinator node in the ring n1 to n8; replicas are numbered 1, 2, 3]
Consistency! 
Tunable at runtime 
• ONE 
• QUORUM (strict majority w.r.t. RF) 
• ALL 
Apply both to read & write
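"Strict majority w.r.t. RF" translates into a one-line formula. The helper names below (`quorum`, `required_acks`) are hypothetical and only cover the three levels listed on the slide.

```python
def quorum(rf):
    """Strict majority of the replicas: more than half of RF."""
    return rf // 2 + 1


def required_acks(level, rf):
    """Number of replicas that must answer for a given consistency level."""
    table = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    return table[level]


if __name__ == "__main__":
    for rf in (3, 4, 5):
        print(rf, [required_acks(level, rf) for level in ("ONE", "QUORUM", "ALL")])
```

Note that going from RF = 3 to RF = 4 raises the quorum from 2 to 3 without tolerating more failures, which is one reason odd replication factors are common.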
Write consistency!
Write ONE
• write request to all replicas in parallel
• wait for ONE ack before returning to client
• other acks later, asynchronously
[Diagram: the coordinator sends the write to replicas 1, 2, 3 in the ring n1 to n8; acks arrive after 5 μs, 10 μs and 120 μs]
Write consistency! 
Write QUORUM
• write request to all replicas in parallel
• wait for QUORUM acks before returning to client
• other acks later, asynchronously
[Diagram: the coordinator sends the write to replicas 1, 2, 3 in the ring n1 to n8; acks arrive after 5 μs, 10 μs and 120 μs]
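Since the write goes to all replicas in parallel, the latency seen by the client is the time of the k-th fastest ack, where k depends on the consistency level. A small sketch using the three ack times from the diagram (the function name `client_latency_us` is hypothetical):

```python
def client_latency_us(ack_latencies_us, required_acks):
    """Time until the coordinator has collected `required_acks` acks."""
    if not 1 <= required_acks <= len(ack_latencies_us):
        raise ValueError("required_acks must be between 1 and the number of replicas")
    return sorted(ack_latencies_us)[required_acks - 1]


if __name__ == "__main__":
    acks = [5, 10, 120]                 # ack times of the three replicas, in μs
    print(client_latency_us(acks, 1))   # ONE
    print(client_latency_us(acks, 2))   # QUORUM with RF = 3
    print(client_latency_us(acks, 3))   # ALL
```

With these numbers the slow replica (120 μs) only affects the client at consistency level ALL.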
Read consistency!
Read ONE
• read from one node among all replicas
• contact the fastest node (stats)
[Diagram: the coordinator reads from one of replicas 1, 2, 3 in the ring n1 to n8]
Read consistency!
Read QUORUM
• read from the fastest node
• AND request digest from other replicas to reach QUORUM
• return most up-to-date data to client
• repair if digest mismatch
[Diagram: the coordinator reads from replicas 1, 2, 3 in the ring n1 to n8]
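The QUORUM read steps above can be sketched as follows. This is a simplified model, not Cassandra's implementation: replicas are a plain dict, the digest is an MD5 of value and timestamp, and on mismatch the sketch looks at the contacted replicas directly instead of issuing a second round of full reads. `Cell`, `digest` and `read_quorum` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def digest(cell):
    """Cheap fingerprint sent instead of the full data."""
    return hashlib.md5(f"{cell.value}@{cell.timestamp}".encode("utf-8")).hexdigest()


def read_quorum(replicas, fastest, others):
    """Full read on `fastest`, digest on `others`; repair stale replicas on mismatch."""
    data = replicas[fastest]
    if all(digest(replicas[node]) == digest(data) for node in others):
        return data, []
    contacted = [fastest] + list(others)
    newest = max((replicas[node] for node in contacted), key=lambda c: c.timestamp)
    stale = [node for node in contacted if replicas[node].timestamp < newest.timestamp]
    for node in stale:
        replicas[node] = newest  # read repair of the contacted replicas only
    return newest, stale


if __name__ == "__main__":
    state = {"n1": Cell("A", 1), "n2": Cell("B", 2), "n3": Cell("A", 1)}
    print(read_quorum(state, fastest="n1", others=["n2"]))
    print(state)
```

Only the replicas that took part in the read are repaired here; the third replica catches up through normal replication or a later repair.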
Consistency trade-off! 
Consistency in action!
RF = 3, old value A, new value B, data replication in progress …
• Write ONE (replicas: B A A), Read ONE: A
• Write ONE (replicas: B A A), Read QUORUM: A
• Write ONE (replicas: B A A), Read ALL: B
• Write QUORUM (replicas: B B A), Read ONE: A
• Write QUORUM (replicas: B B A), Read QUORUM: B
Consistency level! 
• ONE: fast, may not read latest written value
• QUORUM: strict majority w.r.t. Replication Factor, good balance
• ALL: paranoid, slow, no high availability
Consistency summary! 
ONE read + ONE write
☞ available for read/write even with (N-1) replicas down
QUORUM read + QUORUM write
☞ available for read/write even with 1+ replica down
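The five "Consistency in action" scenarios and the summary above follow one rule of thumb: a read is guaranteed to see the latest write when the replicas written plus the replicas read exceed RF, because the two sets must then overlap. The sketch below checks the rule by brute force for RF = 3; the function names are hypothetical.

```python
from itertools import combinations

RF = 3
REQUIRED = {"ONE": 1, "QUORUM": RF // 2 + 1, "ALL": RF}


def reads_latest(write_level, read_level):
    """Rule of thumb: written replicas + read replicas > RF."""
    return REQUIRED[write_level] + REQUIRED[read_level] > RF


def overlap_always(write_level, read_level):
    """Brute force: does every possible write set intersect every possible read set?"""
    nodes = range(RF)
    w, r = REQUIRED[write_level], REQUIRED[read_level]
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


def tolerated_down(level):
    """Replicas that may be down while the level can still be satisfied."""
    return RF - REQUIRED[level]


if __name__ == "__main__":
    for w in REQUIRED:
        for r in REQUIRED:
            print(f"write {w:6} read {r:6} -> latest guaranteed: {reads_latest(w, r)}")
```

For RF = 3 this reproduces the slides: Write ONE + Read ALL and Write QUORUM + Read QUORUM return B, the other three combinations may still return A.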
Q & A
Data model! 
Cassandra Write Path! 
Last Write Wins!
CQL basics! 
From SQL to CQL!
Cassandra Write Path!
• 1 ☞ the write is appended to the commit log (Commit log1 … Commit logn)
• 2 ☞ the write goes to the MemTable of its table in memory (one MemTable per table: Table1 … TableN)
• 3 ☞ MemTables are flushed to disk as SSTables (SStable1, SStable2, SStable3, …)
[Diagram, five frames: commit logs and memory; MemTables filling up; flush to SSTables; new MemTables next to existing SSTables; several SSTables per table]
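A toy model of the three write-path steps, assuming a tiny flush threshold so the flush is visible. Everything lives in Python lists and dicts here, whereas a real node keeps the commit log and SSTables on disk and recycles commit log segments rather than filtering entries; the class and attribute names are hypothetical.

```python
class ToyNode:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # step 1: append-only log (on disk in a real node)
        self.memtables = {}    # step 2: table name -> {partition key: value}
        self.sstables = {}     # step 3: table name -> list of immutable snapshots
        self.flush_threshold = flush_threshold

    def write(self, table, key, value):
        self.commit_log.append((table, key, value))   # 1. durability first
        memtable = self.memtables.setdefault(table, {})
        memtable[key] = value                         # 2. then the in-memory table
        if len(memtable) >= self.flush_threshold:
            self.flush(table)                         # 3. flush when "full"

    def flush(self, table):
        memtable = self.memtables.pop(table, {})
        if memtable:
            # An SSTable is written once, sorted, and never modified afterwards.
            self.sstables.setdefault(table, []).append(dict(sorted(memtable.items())))
        # Flushed data no longer needs the commit log (simplified).
        self.commit_log = [entry for entry in self.commit_log if entry[0] != table]


if __name__ == "__main__":
    node = ToyNode()
    node.write("users", "jdoe", 33)
    node.write("users", "hsue", 28)
    print(node.sstables, node.memtables, node.commit_log)
```

Writes never touch existing SSTables, which is why updates and deletes end up as new cells in newer SSTables, as the next slides show.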
Last Write Wins (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
• #partition jdoe ☞ age (t1) = 33, name (t1) = John DOE
• t1 = auto-generated timestamp
Last Write Wins (LWW)!
UPDATE users SET age = 34 WHERE login = 'jdoe';
• SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe ☞ age (t2) = 34
Last Write Wins (LWW)!
DELETE age FROM users WHERE login = 'jdoe';
• SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe ☞ age (t2) = 34
• SSTable3: jdoe ☞ age (t3) = ✕ (tombstone)
Last Write Wins (LWW)!
SELECT age FROM users WHERE login = 'jdoe';
• SSTable1: age (t1) = 33 ✕
• SSTable2: age (t2) = 34 ✕
• SSTable3: age (t3) = tombstone ✓ (most recent timestamp wins)
Compaction! 
• SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe ☞ age (t2) = 34
• SSTable3: jdoe ☞ age (t3) = ✕ (tombstone)
• New SSTable: jdoe ☞ age (t3) = ✕ (tombstone), name (t1) = John DOE
Historical data! 
You want to keep data history?
• do not use the internally generated timestamp!
• ☞ time-series data modeling
[Diagram: table history, partition id]
• SSTable1: id ☞ date1 (t1), date2 (t2), …, date9 (t9)
• SSTable2: id ☞ date10 (t10), date11 (t11), …
CRUD operations! 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
Simple Table! 
CREATE TABLE users (
login text, // partition key (#partition)
name text,
age int,
…
PRIMARY KEY(login));
Clustered table (1 – N)! 
CREATE TABLE mailbox (
login text,
message_id timeuuid,
interlocutor text,
message text,
PRIMARY KEY((login), message_id));
• login: partition key
• message_id: clustering column (sorted)
• (login, message_id): uniqueness
Queries! 
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id = '2014-09-25 16:00:00';
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
Queries! 
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval (#partition not provided)
SELECT * FROM mailbox WHERE
message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
Queries! 
Get message by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';
Get message by user pattern (non exact match on #partition)
SELECT * FROM mailbox WHERE login like '%doe%';
WHERE clause restrictions! 
All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition
Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed
• ☞ they would require a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match (=)
WHERE clause only possible on columns defined in PRIMARY KEY
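The restrictions above can be expressed as a small checker. This is a sketch of the rules as stated in this deck (it also includes the rule that clustering columns must be restricted in order, which the clustered-table slides illustrate later); it ignores IN, secondary indexes and ALLOW FILTERING, and it is not how Cassandra validates queries internally. `where_errors` is a hypothetical name.

```python
RANGE_OPS = {"<", "<=", ">", ">="}


def where_errors(partition_key, clustering, restrictions):
    """Return the list of rule violations for a WHERE clause.

    restrictions is a list of (column, operator) pairs, e.g. [("login", "=")].
    """
    errors = []
    ops = {}
    for column, op in restrictions:
        ops.setdefault(column, set()).add(op)

    for column in ops:
        if column not in partition_key and column not in clustering:
            errors.append(f"{column}: not part of the PRIMARY KEY")

    for column in partition_key:
        if ops.get(column) != {"="}:
            errors.append(f"{column}: partition key needs an exact match (=)")

    closed = False  # True once a clustering column was skipped or range-restricted
    for column in clustering:
        column_ops = ops.get(column)
        if column_ops is None:
            closed = True
            continue
        if closed:
            errors.append(f"{column}: preceding clustering column is missing or range-restricted")
        if column_ops & RANGE_OPS:
            closed = True
    return errors


if __name__ == "__main__":
    print(where_errors(["login"], ["message_id"], [("login", "="), ("message_id", "<=")]))
    print(where_errors(["login"], ["message_id"], [("message_id", "=")]))
```

The first call returns an empty list (allowed), the second reports the missing partition key.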
WHERE clause restrictions!
What if I want to perform an « arbitrary » WHERE clause?
• search form scenario, dynamic search fields
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
Collections & maps! 
CREATE TABLE users ( 
login text, 
name text, 
age int, 
friends set<text>, 
hobbies list<text>, 
languages map<int, text>, 
… 
PRIMARY KEY(login));
User Defined Type (UDT)! 
Instead of 
CREATE TABLE users ( 
login text, 
… 
street_number int, 
street_name text, 
postcode int, 
country text, 
… 
PRIMARY KEY(login));
User Defined Type (UDT)! 
CREATE TYPE address ( 
street_number int, 
street_name text, 
postcode int, 
country text); 
CREATE TABLE users ( 
login text, 
… 
location frozen <address>, 
… 
PRIMARY KEY(login));
UDT insert! 
INSERT INTO users(login, name, location) VALUES (
'jdoe',
'John DOE',
{
street_number: 124,
street_name: 'Congress Avenue',
postcode: 95054,
country: 'USA'
});
UDT update! 
UPDATE users SET location =
{
street_number: 125,
street_name: 'Congress Avenue',
postcode: 95054,
country: 'USA'
}
WHERE login = 'jdoe';
Can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
From SQL to CQL!
Remember…
• CQL is not SQL
• there is no join (do you want to scale?)
• there is no integrity constraint (do you want to read-before-write?)
From SQL to CQL! 
Normalized: User (1) → Comment (n)
CREATE TABLE comments ( 
article_id uuid, 
comment_id timeuuid, 
author_id text, // typical join id 
content text, 
PRIMARY KEY((article_id), comment_id));
From SQL to CQL! 
De-normalized: User (1) → Comment (n)
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author frozen<person>, // person is UDT
content text,
PRIMARY KEY((article_id), comment_id));
Data modeling best practices!
Start by queries 
• identify core functional read paths 
• 1 read scenario ≈ 1 SELECT 
Denormalize 
• wisely, only duplicate necessary & immutable data 
• functional/technical trade-off
Data modeling best practices! 
Person UDT 
- firstname/lastname 
- date of birth 
- gender 
- mood 
- location
Data modeling best practices! 
John DOE, male 
birthdate: 21/02/1981 
subscribed since 03/06/2011 
☉ San Mateo, CA 
’’Impossible is not John DOE’’ 
Full detail read from 
User table on click
Q & A
DSE (Datastax Enterprise)! 
Security 
Analytics (Spark & Hadoop) 
Search (Solr)
Use Cases! 
• Messaging
• Collections/Playlists
• Fraud detection
• Recommendation/Personalization
• Internet of things/Sensor data
CQL In Depth! 
Simple Table! 
Clustered Table!
Storage Engine! 
#partition1 
#col1 #col2 #col3 #col4 
cell1 cell2 cell3 cell4 
#partition2 
#col1 #col2 #col3 
cell1 cell2 cell3 
#partition3 
#col1 #col2 
cell1 cell2 
#partition4 
#col1 #col2 #col3 #col4 … 
cell1 cell2 cell3 cell4 … 
Partition Key 
Column Name 
Cell
Data Model Abstraction!
Table ≈ Map<#p,SortedMap<#col,cell>>
• Map<#p,…> ☞ uniqueness of #p; partitions are ordered by token: SortedMap<token,…>
• SortedMap<#col,cell> ☞ sort by #col
Static Data Type! 
Table ≈ Map<#p,SortedMap<#col,cell>>
• #p: partition key type
• #col: column name type
• cell: cell type
Native types include: bigint, blob, counter, decimal, double, float, inet, int, timestamp, timeuuid, uuid.
Simple Table Mapping! 
CREATE TABLE users (
login text,
name text,
age int,
…
PRIMARY KEY(login));
Map<login,SortedMap<column_label,value>>
• login: text, column_label: text, value: blob
Simple Table Mapping! 
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 28);
RowKey: jdoe
=> (name=, value=, timestamp=1412419763515000)
=> (name=age, value=00000021, timestamp=1412419763515000)
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000)
RowKey: hsue
=> (name=, value=, timestamp=1412419776578000)
=> (name=age, value=0000001c, timestamp=1412419776578000)
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)
Reading the dump above:
• (name=, value=) ☞ marker column
• columns are sorted by column_label (age before name)
• values are stored as bytes
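The byte values in the dump can be decoded by hand to check what was stored. A short sketch, assuming an `int` is a 4-byte big-endian signed integer and `text` is UTF-8 (the helper names are hypothetical):

```python
def decode_int(hex_value):
    """Decode a CQL int cell: 4 bytes, big-endian, signed."""
    return int.from_bytes(bytes.fromhex(hex_value), "big", signed=True)


def decode_text(hex_value):
    """Decode a CQL text cell: UTF-8 bytes."""
    return bytes.fromhex(hex_value).decode("utf-8")


if __name__ == "__main__":
    print(decode_int("00000021"), decode_text("4a6f686e20444f45"))    # jdoe
    print(decode_int("0000001c"), decode_text("48656c656e20535545"))  # hsue
```

Decoding shows the jdoe row holds age 33 and name "John DOE", and the hsue row holds age 28 and name "Helen SUE".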
Clustered Table Mapping! 
CREATE TABLE daily_3g_quality_per_city (
operator text,
city text,
date int, //date as YYYYMMdd
latency_ms int,
power_watt double,
PRIMARY KEY((operator), city, date));
Map<operator,
SortedMap<city,
SortedMap<date,
SortedMap<column_label,value>>>>
Clustered Table Mapping! 
RowKey: verizon 
=> (name=Austin:20140910:, value=, timestamp=…) 
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) 
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) 
=> (name=Austin:20140911:, value=, timestamp=…) 
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) 
=> (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) 
=> (name=New York:20140913:, value=, timestamp=1412422893832000) 
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) 
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) 
=> (name=New York:20140917:, value=, timestamp=…) 
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…) 
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…)
Reading the dump above, columns are sorted:
• first by city
• … then by date
• … then by column_label
Query With Clustered Table! 
Select by operator and city for all dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin';
Select by operator and city range for all dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
Query With Clustered Table! 
Select by operator and city and date
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910;
Select by operator and city and range of date
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
AND date >= 20140910 AND date <= 20140913;
Query With Clustered Table! 
Select by operator and city and date tuples
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
AND date IN (20140910, 20140913);
Query With Clustered Table! 
Select by operator and date without city (rejected: city, the preceding clustering column, is not restricted)
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND date = 20140910;
Map<operator,
SortedMap<city,
SortedMap<date,
SortedMap<column_label,value>>>>
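The nested sorted map explains which queries are cheap. In the sketch below, one partition (operator = 'verizon') is modeled as a list of (city, date) keys kept in sorted order, with latency_ms values decoded from the dump shown earlier. Restricting city, or city plus a date range, selects one contiguous slice found by binary search; restricting date alone does not, so the sketch has to scan every key. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# (city, date) -> latency_ms, values decoded from the hex dump of the verizon row
partition = {
    ("Austin", 20140910): 230,
    ("Austin", 20140911): 212,
    ("New York", 20140913): 123,
    ("New York", 20140917): 103,
}
keys = sorted(partition)  # sorted first by city, then by date


def by_city(city):
    """Contiguous slice: all dates of one city."""
    lo = bisect_left(keys, (city,))
    hi = bisect_right(keys, (city, float("inf")))
    return keys[lo:hi]


def by_city_and_dates(city, first, last):
    """Contiguous slice: one city, dates in [first, last]."""
    lo = bisect_left(keys, (city, first))
    hi = bisect_right(keys, (city, last))
    return keys[lo:hi]


def by_date(date):
    """No contiguous slice exists: every key of the partition must be inspected."""
    return [key for key in keys if key[1] == date]


if __name__ == "__main__":
    print(by_city("Austin"))
    print(by_city_and_dates("Austin", 20140910, 20140913))
    print(by_date(20140913))
```

This is the intuition behind the restriction on the last query: matching rows for a given date are scattered across cities instead of sitting next to each other.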
Q & A
Thank You 
@doanduyhai 
duy_hai.doan@datastax.com 
https://academy.datastax.com/

Contenu connexe

Tendances

Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingViller Hsiao
 
A curious case of broken dns responses - RIPE75
A curious case of broken dns responses - RIPE75A curious case of broken dns responses - RIPE75
A curious case of broken dns responses - RIPE75Babak Farrokhi
 
Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Jonathan Battiato
 
Kernel Recipes 2016 - The kernel report
Kernel Recipes 2016 - The kernel reportKernel Recipes 2016 - The kernel report
Kernel Recipes 2016 - The kernel reportAnne Nicolas
 
Gbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGiuseppe Broccolo
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidFederico Campoli
 
The hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLThe hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLFederico Campoli
 

Tendances (10)

Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
A curious case of broken dns responses - RIPE75
A curious case of broken dns responses - RIPE75A curious case of broken dns responses - RIPE75
A curious case of broken dns responses - RIPE75
 
Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016Postgresql on NFS - J.Battiato, pgday2016
Postgresql on NFS - J.Battiato, pgday2016
 
Kernel Recipes 2016 - The kernel report
Kernel Recipes 2016 - The kernel reportKernel Recipes 2016 - The kernel report
Kernel Recipes 2016 - The kernel report
 
Life With Perl
Life With PerlLife With Perl
Life With Perl
 
Gbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfsGbroccolo pgconfeu2016 pgnfs
Gbroccolo pgconfeu2016 pgnfs
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) Acid
 
Submission
SubmissionSubmission
Submission
 
Tainted LOB
Tainted LOBTainted LOB
Tainted LOB
 
The hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLThe hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQL
 

En vedette

Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016Duyhai Doan
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChatDuyhai Doan
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGDuyhai Doan
 
KillrChat Data Modeling
KillrChat Data ModelingKillrChat Data Modeling
KillrChat Data ModelingDuyhai Doan
 
KillrChat presentation
KillrChat presentationKillrChat presentation
KillrChat presentationDuyhai Doan
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and librariesDuyhai Doan
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Duyhai Doan
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016Duyhai Doan
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandraDuyhai Doan
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Cassandra introduction at FinishJUG
Cassandra introduction at FinishJUGCassandra introduction at FinishJUG
Cassandra introduction at FinishJUGDuyhai Doan
 
Cassandra introduction 2016
Cassandra introduction 2016Cassandra introduction 2016
Cassandra introduction 2016Duyhai Doan
 
Data stax academy
Data stax academyData stax academy
Data stax academyDuyhai Doan
 
Libon cassandra summiteu2014
Libon cassandra summiteu2014Libon cassandra summiteu2014
Libon cassandra summiteu2014Duyhai Doan
 
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaCassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaDuyhai Doan
 
Cassandra 3 new features @ Geecon Krakow 2016
Cassandra 3 new features  @ Geecon Krakow 2016Cassandra 3 new features  @ Geecon Krakow 2016
Cassandra 3 new features @ Geecon Krakow 2016Duyhai Doan
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisDuyhai Doan
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
 

En vedette (20)

Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChat
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Cassandra introduction mars jug

  • 1. Introduction to Cassandra DuyHai DOAN, Technical Advocate @doanduyhai
  • 2. Shameless self-promotion! Duy Hai DOAN, Cassandra technical advocate • talks, meetups, confs • open-source devs (Achilles, …) • OSS Cassandra point of contact ☞ duy_hai.doan@datastax.com • production troubleshooting
  • 3. Datastax! • Founded in April 2010 • We contribute a lot to Apache Cassandra™ • 400+ customers (25 of the Fortune 100), 200+ employees • Headquarters in the San Francisco Bay area • EU headquarters in London, offices in France and Germany • Datastax Enterprise = OSS Cassandra + extra features
  • 4. Agenda! Architecture • Cluster, Replication, Consistency Data model • Last Write Wins (LWW), CQL basics, From SQL to CQL Dev Center Demo DSE overview CQL In Depth (time permitting)
  • 5. Cassandra history! NoSQL database • created at Facebook • open-sourced in 2008 • current version = 2.1 • column-oriented ☞ distributed table
  • 6. Cassandra 5 key facts! Linear scalability • NetcoSports: 3 nodes, ≈3GB • NetFlix: 1k+ nodes, PB+ • YOU
  • 7. Cassandra 5 key facts! Continuous availability (≈100% up-time) • resilient architecture (Dynamo)
  • 8-10. Rolling Upgrades! Nodes n1 to n8 are upgraded one at a time while the cluster stays in live production
  • 11. Cassandra 5 key facts! Continuous availability (≈100% up-time) • resilient architecture (Dynamo) • rolling upgrades • data backward compatible between versions n and n+1
  • 12. Cassandra 5 key facts! Multi-data centers • out-of-the-box (config only) • AWS conf for multi-region DCs • GCE/CloudStack support
  • 13. Multi-DC usages! Data locality, disaster recovery • New York (DC1): n1 to n8 • London (DC2): n1 to n5 • Async replication between the two DCs
  • 14. Multi-DC usages! Workload segregation/virtual DC • Production (Live): n1 to n8 • Analytics (Spark/Hadoop): n1 to n5 • Same DC
  • 15. Multi-DC usages! Prod data copy for testing/benchmarking • production cluster (n1 to n8) → data copy → my tiny test cluster (n1 to n3) • Use LOCAL consistency • ❌ Never read back
  • 16. Cassandra 5 key facts! Operational simplicity • 1 node = 1 process + 1 config file • deployment automation • OpsCenter for monitoring
  • 17. Cassandra 5 key facts!
  • 18. Cassandra 5 key facts! Analytics combo • Cassandra + Spark = awesome! • realtime streaming
  • 19. Cassandra architecture! Cluster • Replication • Consistency
  • 20. Cassandra architecture! Cluster layer • Amazon Dynamo paper • masterless architecture Data-store layer • Google Bigtable paper • Columns/column families
  • 21. Cassandra architecture! Client request → Node1: API (CQL & RPC) / CLUSTER (DYNAMO) / DATA STORE (BIGTABLE) / DISKS • Node2: same four layers
  • 22. Data distribution! Random: hash of #partition → token = hash(#p) • Hash: ]-X, X], X = huge number (2^64/2) • nodes n1 to n8 on the ring
  • 23. Token Ranges! A: ]0, X/8] B: ]X/8, 2X/8] C: ]2X/8, 3X/8] D: ]3X/8, 4X/8] E: ]4X/8, 5X/8] F: ]5X/8, 6X/8] G: ]6X/8, 7X/8] H: ]7X/8, X] • ranges A to H are owned by nodes n1 to n8 respectively
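The token-range slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the eight equal ranges over ]0, X] shown on the slide, MD5 is a stand-in hash (not necessarily the partitioner a real cluster uses), and the names `token` and `owner` are hypothetical helpers, not Cassandra APIs.

```python
import hashlib

X = 2**63  # upper bound of the simplified token space ]0, X]
NODES = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]  # own ranges A..H


def token(partition_key):
    """Map a partition key to a token in ]0, X] (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % X + 1


def owner(tok, nodes=NODES):
    """Return the node whose range ]k*X/n, (k+1)*X/n] contains the token."""
    n = len(nodes)
    k = ((tok - 1) * n) // X  # index of the range the token falls into
    return nodes[k]


if __name__ == "__main__":
    t = token("jdoe")
    print("jdoe ->", t, "->", owner(t))
```

Because the hash spreads keys roughly evenly over the token space, each node ends up with a similar share of the partitions, which is what makes adding nodes scale the cluster.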
  • 24. Linear scalability! 8 nodes (n1 to n8) → 10 nodes (n1 to n10)
  • 25. Failure tolerance! Replication Factor (RF) = 3 • token ranges A to H on nodes n1 to n8 • replicas 1, 2, 3 store ranges {B, A, H}, {C, B, A}, {D, C, B}
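A minimal sketch of the replica placement on the slide above, assuming node n1 owns range A, n2 owns B, and so on, and that each range is copied to the next RF-1 nodes clockwise on the ring. Both helper names are hypothetical, and real clusters may use other placement strategies.

```python
NODES = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
RANGES = ["A", "B", "C", "D", "E", "F", "G", "H"]


def replicas_for(range_index, rf=3, nodes=NODES):
    """Nodes storing a range: its owner plus the next rf-1 nodes on the ring."""
    return [nodes[(range_index + i) % len(nodes)] for i in range(rf)]


def ranges_on(node_index, rf=3, ranges=RANGES):
    """Ranges stored by a node: its own range plus the rf-1 preceding ones."""
    return [ranges[(node_index - i) % len(ranges)] for i in range(rf)]


if __name__ == "__main__":
    print(replicas_for(RANGES.index("B")))  # nodes holding range B
    print(ranges_on(NODES.index("n2")))     # ranges held by n2
```

Under these assumptions n2, n3 and n4 hold {B, A, H}, {C, B, A} and {D, C, B}, which matches the three sets on the slide.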
  • 26. Coordinator node! Incoming requests (read/write) • Coordinator node handles the request • Every node can be coordinator → masterless
  • 27. Consistency! Tunable at runtime • ONE • QUORUM (strict majority w.r.t. RF) • ALL • Applies to both reads and writes
  • 28. Write consistency! Write ONE • write request to all replicas in parallel
  • 29. Write consistency! Write ONE • write request to all replicas in parallel • wait for ONE ack before returning to client (first ack after 5 μs)
  • 30. Write consistency! Write ONE • write request to all replicas in parallel • wait for ONE ack before returning to client • other acks later, asynchronously (acks after 5 μs, 10 μs, 120 μs)
  • 31. Write consistency! Write QUORUM • write request to all replicas in parallel • wait for QUORUM acks before returning to client • other acks later, asynchronously (acks after 5 μs, 10 μs, 120 μs)
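The effect of the write consistency level on client latency can be sketched as follows. This is a toy model, not driver code: the coordinator sends the write to every replica and returns as soon as the required number of acks has arrived. The function name `write_latency` is hypothetical, and the latencies are the illustrative figures from the slides.

```python
def write_latency(ack_latencies_us, level):
    """Time the client waits: the latency of the k-th fastest ack,
    where k is the number of acks the consistency level requires."""
    rf = len(ack_latencies_us)
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]
    return sorted(ack_latencies_us)[needed - 1]


if __name__ == "__main__":
    acks = [5, 10, 120]  # microseconds, one ack per replica (RF = 3)
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, write_latency(acks, level), "us")
```

With these numbers ONE returns after 5 μs, QUORUM after 10 μs and ALL only after the slowest replica at 120 μs; the remaining acks still arrive, just asynchronously.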
  • 32. Read consistency! Read ONE • read from one node among all replicas
  • 33. Read consistency! Read ONE • read from one node among all replicas • contact the fastest node (stats)
  • 34. Read consistency! Read QUORUM • read from the fastest node
  • 35. Read consistency! Read QUORUM • read from the fastest node • AND request digests from other replicas to reach QUORUM
  • 36. Read consistency! Read QUORUM • read from the fastest node • AND request digests from other replicas to reach QUORUM • return most up-to-date data to client
  • 37. Read consistency! Read QUORUM • read from the fastest node • AND request digests from other replicas to reach QUORUM • return most up-to-date data to client • repair if digest mismatch
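The QUORUM read steps above can be sketched as a small simulation. Everything here is an assumption made for illustration: replicas are modeled as a list ordered fastest first, a digest is an MD5 of value and timestamp, and `quorum_read` is a hypothetical helper rather than Cassandra's actual read path.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int


def digest(cell):
    """Cheap fingerprint a replica can return instead of the full data."""
    return hashlib.md5(f"{cell.value}@{cell.timestamp}".encode()).hexdigest()


def quorum_read(replicas):
    """Read at QUORUM from replicas ordered fastest first.

    Returns (value, indices of replicas that were repaired)."""
    needed = len(replicas) // 2 + 1
    contacted = list(range(needed))
    data = replicas[contacted[0]]                            # full read
    digests = [digest(replicas[i]) for i in contacted[1:]]   # digest reads
    if all(d == digest(data) for d in digests):
        return data.value, []
    # Digest mismatch: compare full data, keep the newest, repair stale copies.
    newest = max((replicas[i] for i in contacted), key=lambda c: c.timestamp)
    repaired = []
    for i in contacted:
        if replicas[i].timestamp < newest.timestamp:
            replicas[i] = Cell(newest.value, newest.timestamp)
            repaired.append(i)
    return newest.value, repaired


if __name__ == "__main__":
    replicas = [Cell("A", 1), Cell("B", 2), Cell("A", 1)]
    print(quorum_read(replicas))
```

Only the contacted replicas are repaired in this sketch; the third replica stays stale until something else brings it up to date.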
  • 39. Consistency in action! RF = 3, Write ONE, Read ONE • Write ONE: B → replicas hold B, A, A (data replication in progress …) • Read ONE: A
  • 40. Consistency in action! RF = 3, Write ONE, Read QUORUM • Write ONE: B → replicas hold B, A, A (data replication in progress …) • Read QUORUM: A
  • 41. Consistency in action! RF = 3, Write ONE, Read ALL • Write ONE: B → replicas hold B, A, A (data replication in progress …) • Read ALL: B
  • 42. Consistency in action! RF = 3, Write QUORUM, Read ONE • Write QUORUM: B → replicas hold B, B, A (data replication in progress …) • Read ONE: A
  • 43. Consistency in action! RF = 3, Write QUORUM, Read QUORUM • Write QUORUM: B → replicas hold B, B, A (data replication in progress …) • Read QUORUM: B
  • 44. Consistency level! ONE • Fast, may not read latest written value
  • 45. Consistency level! QUORUM • Strict majority w.r.t. Replication Factor • Good balance
  • 46. Consistency level! ALL • Paranoid • Slow, no high availability
  • 47. Consistency summary! ONE read + ONE write ☞ available for read/write even with (N-1) replicas down • QUORUM read + QUORUM write ☞ available for read/write even with 1+ replica down
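The five "Consistency in action" cases follow one rule: a read is guaranteed to see the latest write only when the replicas acknowledged by the write and the replicas contacted by the read must overlap, that is when W + R > RF. A minimal check, with hypothetical helper names:

```python
def acks(level, rf):
    """Number of replicas a consistency level waits for."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def reads_latest_write(write_level, read_level, rf=3):
    """True when the write set and the read set are forced to overlap."""
    return acks(write_level, rf) + acks(read_level, rf) > rf


if __name__ == "__main__":
    cases = [("ONE", "ONE"), ("ONE", "QUORUM"), ("ONE", "ALL"),
             ("QUORUM", "ONE"), ("QUORUM", "QUORUM")]
    for w, r in cases:
        print(f"write {w:6} read {r:6} ->", reads_latest_write(w, r))
```

For RF = 3 this reproduces the slides: only Write ONE + Read ALL and Write QUORUM + Read QUORUM are certain to return B while replication is still in progress.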
  • 48. ! " ! Q & R
  • 49. Data model! Cassandra Write Path! Last Write Win! CQL basics! From SQL to CQL!
  • 50. Cassandra Write Path! Step 1: the write is appended to a commit log on disk (Commit log1, Commit log2, …, Commit logn)
  • 51. Cassandra Write Path! Step 2: the write is applied in memory to the MemTable of its table (MemTable Table1, MemTable Table2, …, MemTable TableN)
  • 52. Cassandra Write Path! Step 3: MemTables are flushed from memory to disk as SSTables (Table1 ☞ SSTable1, Table2 ☞ SSTable2, Table3 ☞ SSTable3)
  • 53. Cassandra Write Path! New writes fill fresh MemTables in memory (MemTable Table1, MemTable Table2, …, MemTable TableN) while the flushed SSTables (SSTable1, SSTable2, SSTable3) stay on disk
  • 54. Cassandra Write Path! Every further flush adds new SSTables, so each table accumulates several SSTables on disk (SSTable1, SSTable2, SSTable3, …)
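The write path slides can be summarized as three steps: append to the commit log, update the MemTable, flush to an SSTable. The following is a minimal single-node sketch of that sequence. It is a toy model under stated assumptions: the `Node` class, the `flush_threshold` knob and the way the commit log is trimmed after a flush are all hypothetical simplifications, not Cassandra internals.

```python
class Node:
    """Toy single-node model of the write path (not Cassandra's real code)."""

    def __init__(self, flush_threshold=3):
        # Hypothetical knob: number of cells a MemTable holds before a flush.
        self.flush_threshold = flush_threshold
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtables = {}    # table name -> {key: value}, lives in memory
        self.sstables = {}     # table name -> list of immutable sorted snapshots

    def write(self, table, key, value):
        self.commit_log.append((table, key, value))   # step 1: durability first
        memtable = self.memtables.setdefault(table, {})
        memtable[key] = value                         # step 2: in-memory update
        if len(memtable) >= self.flush_threshold:
            self.flush(table)                         # step 3: flush to disk

    def flush(self, table):
        memtable = self.memtables.pop(table, {})
        if not memtable:
            return
        # An SSTable is written once, sorted by key, and never modified.
        snapshot = tuple(sorted(memtable.items()))
        self.sstables.setdefault(table, []).append(snapshot)
        # Simplification: flushed writes no longer need the log for recovery.
        self.commit_log = [entry for entry in self.commit_log if entry[0] != table]
```

Because a flush never rewrites an earlier SSTable, repeated flushes naturally leave several SSTables per table, which is what the last write path slide shows.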
  • 55. Last Write Wins (LWW)! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); ☞ #partition jdoe: age = 33, name = John DOE
  • 56. Last Write Wins (LWW)! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); ☞ #partition jdoe: age (t1) = 33, name (t1) = John DOE, where t1 is an auto-generated timestamp
  • 57. Last Write Wins (LWW)! UPDATE users SET age = 34 WHERE login = 'jdoe'; • SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE • SSTable2: jdoe ☞ age (t2) = 34
  • 58. Last Write Wins (LWW)! DELETE age FROM users WHERE login = 'jdoe'; • SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE • SSTable2: jdoe ☞ age (t2) = 34 • SSTable3: jdoe ☞ age (t3) = ✕ (tombstone)
  • 59. Last Write Wins (LWW)! SELECT age FROM users WHERE login = 'jdoe'; Which SSTable wins? • SSTable1 (?): jdoe ☞ age (t1) = 33, name (t1) = John DOE • SSTable2 (?): jdoe ☞ age (t2) = 34 • SSTable3 (?): jdoe ☞ age (t3) = ✕ (tombstone)
  • 60. Last Write Wins (LWW)! SELECT age FROM users WHERE login = 'jdoe'; • SSTable1 ✕: jdoe ☞ age (t1) = 33, name (t1) = John DOE • SSTable2 ✕: jdoe ☞ age (t2) = 34 • SSTable3 ✓: jdoe ☞ age (t3) = ✕ (tombstone), the most recent timestamp wins
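The resolution rule in the slides above (the cell with the highest timestamp wins, and a tombstone counts as a value) can be sketched in a few lines. This is an illustrative model only: the dictionary layout, the `TOMBSTONE` sentinel and `read_column` are assumptions made for the example, with integers 1, 2, 3 standing in for t1, t2, t3.

```python
TOMBSTONE = object()  # sentinel marking a deleted cell


def read_column(sstables, partition, column):
    """Return the value with the highest write timestamp, or None if the
    column is absent or its most recent cell is a tombstone."""
    best = None  # (timestamp, value) of the most recent cell seen so far
    for sstable in sstables:
        cell = sstable.get(partition, {}).get(column)
        if cell is not None and (best is None or cell[0] > best[0]):
            best = cell
    if best is None or best[1] is TOMBSTONE:
        return None
    return best[1]


# The three SSTables from the slides; each cell is (timestamp, value).
sstable1 = {"jdoe": {"age": (1, 33), "name": (1, "John DOE")}}
sstable2 = {"jdoe": {"age": (2, 34)}}
sstable3 = {"jdoe": {"age": (3, TOMBSTONE)}}

if __name__ == "__main__":
    print(read_column([sstable1, sstable2], "jdoe", "age"))            # 34
    print(read_column([sstable1, sstable2, sstable3], "jdoe", "age"))  # None
    print(read_column([sstable1, sstable2, sstable3], "jdoe", "name")) # John DOE
```

The order in which the SSTables are visited does not matter, since only the timestamps decide the winner.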
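  • 61. Compaction! SSTable1, SSTable2 and SSTable3 are merged into a new SSTable that keeps only the most recent cell of each column • SSTable1: jdoe ☞ age (t1) = 33, name (t1) = John DOE • SSTable2: jdoe ☞ age (t2) = 34 • SSTable3: jdoe ☞ age (t3) = ✕ (tombstone) • New SSTable: jdoe ☞ age (t3) = ✕ (tombstone), name (t1) = John DOE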
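Compaction applies the same last-write-wins rule while merging files. The sketch below is a toy merge under stated assumptions: SSTables are modeled as nested dictionaries, `compact` is a hypothetical name, and tombstone purging (which real Cassandra delays according to the table's `gc_grace_seconds` setting) is deliberately not modeled, so the tombstone survives as in the slide.

```python
TOMBSTONE = "<tombstone>"  # marker for a deleted cell


def compact(sstables):
    """Merge SSTables, keeping only the most recent cell of each column.

    Each SSTable maps partition -> column -> (timestamp, value).
    """
    merged = {}
    for sstable in sstables:
        for partition, cells in sstable.items():
            target = merged.setdefault(partition, {})
            for column, (timestamp, value) in cells.items():
                # Keep the incoming cell only if it is newer than what we hold.
                if column not in target or timestamp > target[column][0]:
                    target[column] = (timestamp, value)
    return merged


sstable1 = {"jdoe": {"age": (1, 33), "name": (1, "John DOE")}}
sstable2 = {"jdoe": {"age": (2, 34)}}
sstable3 = {"jdoe": {"age": (3, TOMBSTONE)}}

new_sstable = compact([sstable1, sstable2, sstable3])
```

Here `new_sstable` holds one cell per column for `jdoe`: the tombstone written at t3 for `age` and the value written at t1 for `name`.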
  • 62. Historical data! Want to keep a history of your data? • do not rely on the internally generated timestamp!!! • ☞ use time-series data modeling [Diagram: table history, partition id • SSTable1: date1 (t1), date2 (t2), …, date9 (t9) • SSTable2: date10 (t10), date11 (t11), …]
  • 63. CRUD operations! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); UPDATE users SET age = 34 WHERE login = 'jdoe'; DELETE age FROM users WHERE login = 'jdoe'; SELECT age FROM users WHERE login = 'jdoe';
  • 64. Simple Table! CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login)); • login: partition key (#partition)
  • 65. Clustered table (1 – N)! CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id)); • login: partition key • message_id: clustering column (sorted) • (login, message_id): uniqueness
  • 66. Queries! Get message by user and message_id (date) SELECT * FROM mailbox WHERE login = 'jdoe' and message_id = '2014-09-25 16:00:00'; Get message by user and date interval SELECT * FROM mailbox WHERE login = 'jdoe' and message_id <= '2014-09-25 16:00:00' and message_id >= '2014-09-20 16:00:00';
  • 67. Queries! Get message by message_id only (#partition not provided) SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00'; Get message by date interval (#partition not provided) SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00' and message_id >= '2014-09-20 16:00:00';
  • 68. Queries! Get message by user range (range query on #partition) SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe'; Get message by user pattern (non exact match on #partition) SELECT * FROM mailbox WHERE login like '%doe%';
  • 69. WHERE clause restrictions! • All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition • Only exact match (=) is allowed on the #partition; range queries (<, ≤, >, ≥) are not allowed ☞ they would require a full cluster scan • On clustering columns, only range queries (<, ≤, >, ≥) and exact match are allowed • A WHERE clause is only possible on columns defined in the PRIMARY KEY
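The four restrictions above can be read as a checklist applied to each query. The sketch below encodes that checklist for illustration only: `TableSchema` and `check_where` are hypothetical names, the rules are exactly the ones listed on the slide, and the real CQL parser applies more rules than this.

```python
from dataclasses import dataclass

RANGE_OPS = {"<", "<=", ">", ">="}


@dataclass(frozen=True)
class TableSchema:
    partition_key: tuple  # partition key column names
    clustering: tuple     # clustering column names, in declaration order


def check_where(schema, restrictions):
    """Return the list of rule violations for (column, operator) pairs."""
    by_column = {}
    for column, operator in restrictions:
        by_column.setdefault(column, []).append(operator)

    problems = []
    for column in schema.partition_key:
        operators = by_column.get(column)
        if not operators:
            problems.append(f"partition key column '{column}' must be provided")
        elif any(op != "=" for op in operators):
            problems.append(f"only exact match (=) is allowed on '{column}'")

    for column, operators in by_column.items():
        if column in schema.partition_key:
            continue
        if column not in schema.clustering:
            problems.append(f"'{column}' is not part of the PRIMARY KEY")
        elif any(op != "=" and op not in RANGE_OPS for op in operators):
            problems.append(f"only = and range operators are allowed on '{column}'")
    return problems


# Schema of the mailbox table from the slides.
mailbox = TableSchema(partition_key=("login",), clustering=("message_id",))
```

For example, `check_where(mailbox, [("message_id", "=")])` reports the missing partition key, which is the case shown on the "#partition not provided" slide.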
  • 70. WHERE clause restrictions! What if I want to run an "arbitrary" WHERE clause? • search form scenario, dynamic search fields
  • 71. WHERE clause restrictions! What if I want to run an "arbitrary" WHERE clause? • search form scenario, dynamic search fields ☞ Apache Solr (Lucene) integration (Datastax Enterprise) SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male'; SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
  • 72. Collections & maps! CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, … PRIMARY KEY(login));
  • 73. User Defined Type (UDT)! Instead of: CREATE TABLE users ( login text, … street_number int, street_name text, postcode int, country text, … PRIMARY KEY(login));
  • 74. User Defined Type (UDT)! CREATE TYPE address ( street_number int, street_name text, postcode int, country text); CREATE TABLE users ( login text, … location frozen<address>, … PRIMARY KEY(login));
  • 75. UDT insert! INSERT INTO users(login, name, location) VALUES ( 'jdoe', 'John DOE', { street_number: 124, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' });
  • 76. UDT update! UPDATE users SET location = { street_number: 125, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' } WHERE login = 'jdoe'; UDTs can be nested ☞ store documents • but no dynamic fields (or use map<text, blob>)
  • 77. From SQL to CQL! Remember…
  • 78. From SQL to CQL! Remember… CQL is not SQL
  • 79. From SQL to CQL! Remember… there is no join (do you want to scale?)
  • 80. From SQL to CQL! Remember… there is no integrity constraint (do you want to read-before-write?)
  • 81. From SQL to CQL! Normalized • User 1:n Comment • CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_id text, // typical join id content text, PRIMARY KEY((article_id), comment_id));
  • 82. From SQL to CQL! De-normalized • User 1:n Comment • CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author frozen<person>, // person is a UDT content text, PRIMARY KEY((article_id), comment_id));
  • 83. Data modeling best practices! Start with the queries • identify the core functional read paths • 1 read scenario ≈ 1 SELECT
  • 84. Data modeling best practices! Start with the queries • identify the core functional read paths • 1 read scenario ≈ 1 SELECT Denormalize • wisely: only duplicate necessary & immutable data • functional/technical trade-off
  • 85. Data modeling best practices! Person UDT • firstname/lastname • date of birth • gender • mood • location
  • 86. Data modeling best practices! [Profile card built from the Person UDT] John DOE, male • birthdate: 21/02/1981 • subscribed since 03/06/2011 • ☉ San Mateo, CA • "Impossible is not John DOE" ☞ full details are read from the User table on click
  • 87. Q & A
  • 88. DSE (Datastax Enterprise)! • Security • Analytics (Spark & Hadoop) • Search (Solr)
  • 89. Use Cases! • Messaging • Collections/Playlists • Fraud detection • Recommendation/Personalization • Internet of things/Sensor data
  • 90. CQL In Depth! • Simple Table • Clustered Table
  • 91. Storage Engine! Each partition key points to a row of (column name ☞ cell) pairs • #partition1: #col1 ☞ cell1, #col2 ☞ cell2, #col3 ☞ cell3, #col4 ☞ cell4 • #partition2: #col1 ☞ cell1, #col2 ☞ cell2, #col3 ☞ cell3 • #partition3: #col1 ☞ cell1, #col2 ☞ cell2 • #partition4: #col1 ☞ cell1, #col2 ☞ cell2, #col3 ☞ cell3, #col4 ☞ cell4, …
  • 92. Data Model Abstraction! Table ≈ Map<#p,SortedMap<#col,cell>> • on disk, the outer map is a SortedMap<token,…>
  • 93. Data Model Abstraction! Table ≈ Map<#p,SortedMap<#col,cell>> • outer Map<#p,…> (on disk a SortedMap<token,…>) ☞ uniqueness • inner SortedMap<#col,cell> ☞ sort
  • 94. Static Data Type! Table ≈ Map<#p,SortedMap<#col,cell>> • #p ☞ partition key type • #col ☞ column name type • cell ☞ cell type • Native types: bigint, blob, counter, decimal, double, float, inet, int, timestamp, timeuuid, uuid.
  • 95. Simple Table Mapping! CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login)); ☞ Map<login,SortedMap<column_label,value>> • login: text • column_label: text • value: blob
  • 96. Simple Table Mapping! INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33); INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 26); RowKey: jdoe => (name=, value=, timestamp=1412419763515000) => (name=age, value=00000021, timestamp=1412419763515000) => (name=name, value=4a6f686e20444f45, timestamp=1412419763515000) RowKey: hsue => (name=, value=, timestamp=1412419776578000) => (name=age, value=0000001c, timestamp=1412419776578000) => (name=name, value=48656c656e20535545, timestamp=1412419776578000)
  • 97. Simple Table Mapping! Same output as slide 96 ☞ Marker column: the first cell of each row, with an empty name and an empty value (name=, value=)
  • 98. Simple Table Mapping! Same output as slide 96 ☞ Sorted column_label: within a row, cells are ordered by column label (age before name)
  • 99. Simple Table Mapping! Same output as slide 96 ☞ Values as bytes: every cell value is stored as raw bytes, shown in hexadecimal (00000021, 4a6f686e20444f45, …)
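To make the "values as bytes" point concrete, the hexadecimal cell values from the jdoe row can be decoded by hand. This sketch assumes the standard encodings: a CQL `int` is a 4-byte big-endian signed integer and `text` is UTF-8. The helper names are made up for the example.

```python
import struct


def decode_int(hex_value: str) -> int:
    """Decode a CQL int cell: 4 bytes, big-endian, signed."""
    return struct.unpack(">i", bytes.fromhex(hex_value))[0]


def decode_text(hex_value: str) -> str:
    """Decode a CQL text cell: UTF-8 bytes."""
    return bytes.fromhex(hex_value).decode("utf-8")


if __name__ == "__main__":
    print(decode_int("00000021"))           # 33
    print(decode_text("4a6f686e20444f45"))  # John DOE
```

The storage engine itself does not interpret these bytes; the column types declared in the table schema tell the client how to decode them.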
  • 100. Clustered Table Mapping! CREATE TABLE daily_3g_quality_per_city ( operator text, city text, date int, //date as YYYYMMdd latency_ms int, power_watt double, PRIMARY KEY((operator), city, date)); ☞ Map<operator, SortedMap<city, SortedMap<date, SortedMap<column_label,value>>>>
  • 101. Clustered Table Mapping! RowKey: verizon => (name=Austin:20140910:, value=, timestamp=…) => (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…) => (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…) => (name=Austin:20140911:, value=, timestamp=…) => (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…) => (name=Austin:20140911:power_watt, value=3ff6666666666666, timestamp=…) => (name=New York:20140913:, value=, timestamp=1412422893832000) => (name=New York:20140913:latency_ms, value=0000007b, timestamp=…) => (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…) => (name=New York:20140917:, value=, timestamp=…) => (name=New York:20140917:latency_ms, value=00000067, timestamp=…) => (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…)
  • 102. Clustered Table Mapping! Same output as slide 101 ☞ Sort first by city (Austin before New York)
  • 103. Clustered Table Mapping! Same output as slide 101 ☞ … then by date (20140910 before 20140911 within Austin)
  • 104. Clustered Table Mapping! Same output as slide 101 ☞ … then by column_label (latency_ms before power_watt)
  • 105. Query With Clustered Table! Select by operator and city for all dates SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND city = 'Austin'; Select by operator and city range for all dates SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
  • 106. Query With Clustered Table! Select by operator and city and date SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910; Select by operator and city and range of date SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND city = 'Austin' AND date >= 20140910 AND date <= 20140913;
  • 107. Query With Clustered Table! Select by operator and city and date tuples SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND city = 'Austin' AND date IN (20140910, 20140913);
  • 108. Query With Clustered Table! Select by operator and date without city SELECT * FROM daily_3g_quality_per_city WHERE operator = 'verizon' AND date = 20140910; ☞ not allowed: in Map<operator, SortedMap<city, SortedMap<date, SortedMap<column_label,value>>>>, date can only be reached through city
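The queries above are all slices over the sorted (city, date) keys of one partition, which is why they are cheap. The sketch below models the verizon partition as a sorted Python list and answers each allowed query with a binary search. It is an illustration under stated assumptions: `ROWS`, `slice_rows` and the other helpers are hypothetical names, and real Cassandra compares encoded composite column names rather than Python tuples.

```python
from bisect import bisect_left, bisect_right

# Sorted (city, date) clustering keys of the "verizon" partition, as in the dump.
ROWS = [
    ("Austin", 20140910),
    ("Austin", 20140911),
    ("New York", 20140913),
    ("New York", 20140917),
]

MAX_DATE = float("inf")  # sorts after every real date


def slice_rows(rows, start, end):
    """Contiguous slice of the sorted rows with start <= key <= end."""
    return rows[bisect_left(rows, start):bisect_right(rows, end)]


def by_city(rows, city):
    # (city,) sorts before every (city, date) because it is a shorter prefix.
    return slice_rows(rows, (city,), (city, MAX_DATE))


def by_city_range(rows, first_city, last_city):
    return slice_rows(rows, (first_city,), (last_city, MAX_DATE))


def by_city_and_dates(rows, city, first_date, last_date):
    return slice_rows(rows, (city, first_date), (city, last_date))


def by_date_only(rows, date):
    """No single (start, end) slice exists: every row has to be inspected."""
    return [row for row in rows if row[1] == date]
```

A date-only restriction cannot be turned into one contiguous slice, because matching rows may sit under every city; `by_date_only` has to scan the whole partition, which is the cost that the restriction on the last slide avoids.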
  • 109. Q & A
  • 110. Thank You @doanduyhai duy_hai.doan@datastax.com https://academy.datastax.com/