2. Shameless self-promotion!
@doanduyhai
2
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
• production troubleshooting
☞ duy_hai.doan@datastax.com
3. Datastax!
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
4. Agenda!
Architecture
• Cluster, Replication, Consistency
Data model
• Last Write Win (LWW), CQL basics, From SQL to CQL
DevCenter demo
DSE overview
CQL In Depth (time permitting)
5. Cassandra history!
NoSQL database
• created at Facebook
• open-sourced since 2008
• current version = 2.1
• column-oriented ☞ distributed table
6. Cassandra 5 key facts!
Linear scalability
• NetcoSports: 3 nodes, ≈3GB
• Netflix: 1k+ nodes, PB+
• YOU: somewhere in between
13. Multi-DC usage!
Data-locality, disaster recovery
• New York (DC1) ⇆ London (DC2)
• async replication between the two DCs
14. Multi-DC usage!
Workload segregation/virtual DC
• Production (Live) vs Analytics (Spark/Hadoop)
• both virtual DCs can live in the same physical DC
15. Multi-DC usage!
Prod data copy for testing/benchmarking
• async data copy from the prod cluster to a tiny test cluster
• use LOCAL consistency in prod
• ❌ prod never reads back from the test DC
20. Cassandra architecture!
Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google BigTable paper
• columns/column families
21. Cassandra architecture!
Every node runs the same layered stack, and a client request can hit any node:
• API (CQL & RPC)
• CLUSTER (DYNAMO)
• DATA STORE (BIG TABLES)
• DISKS
22. Data distribution!
Random: hash of #partition → token = hash(#p)
Hash range: ]-X, X]
X = huge number (2^64/2 = 2^63)
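The hash-based placement above can be sketched in a few lines. Cassandra's default partitioner is Murmur3; here `md5` merely stands in to illustrate the idea of mapping a partition key to a pseudo-random token in the `]-2^63, 2^63]` space:

```python
import hashlib

TOKEN_SPACE = 2 ** 63  # X: tokens live in ]-X, X]

def token(partition_key: str) -> int:
    """Map a partition key to a token (md5 stand-in for Murmur3)."""
    # First 8 bytes of the digest, read as a signed 64-bit integer:
    # the result already lies in [-2^63, 2^63 - 1]
    return int.from_bytes(
        hashlib.md5(partition_key.encode()).digest()[:8], "big", signed=True
    )

# Different keys land at unrelated positions on the ring
print(token("jdoe"))
print(token("hsue"))
```

The same key always hashes to the same token, so every node can locate data without a master directory.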
23. Token Ranges!
A: ]0, X/8]
B: ]X/8, 2X/8]
C: ]2X/8, 3X/8]
D: ]3X/8, 4X/8]
E: ]4X/8, 5X/8]
F: ]5X/8, 6X/8]
G: ]6X/8, 7X/8]
H: ]7X/8, X]
Each of the 8 nodes (n1 … n8) owns exactly one of these ranges on the ring.
25. Failure tolerance!
Replication Factor (RF) = 3
• each range is stored on its primary node plus the next 2 nodes clockwise
• e.g. n2 stores {B, A, H}, n3 stores {C, B, A}, n4 stores {D, C, B}
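A small sketch of that placement rule, assuming the ring layout from the previous slide (node nᵢ owns range i, replicas walk clockwise):

```python
NODES = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"]
RANGES = ["A", "B", "C", "D", "E", "F", "G", "H"]
RF = 3

def replicas(range_name: str) -> list:
    """Nodes storing a range: its owner plus the next RF-1 nodes clockwise."""
    i = RANGES.index(range_name)
    return [NODES[(i + k) % len(NODES)] for k in range(RF)]

def stored_ranges(node: str) -> set:
    """Inverse view: all ranges a given node holds."""
    return {r for r in RANGES if node in replicas(r)}

print(replicas("A"))        # → ['n1', 'n2', 'n3']
print(stored_ranges("n2"))  # n2 holds its own range B plus replicas of A and H
```

This reproduces the slide's sets: n2 ends up with {B, A, H}, n3 with {C, B, A}, and so on.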
26. Coordinator node!
Incoming requests (read/write)
Coordinator node handles the request
Every node can be coordinator → masterless
27. Consistency!
Tunable at runtime
• ONE
• QUORUM (strict majority w.r.t. RF)
• ALL
Applies to both reads & writes
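The arithmetic behind these levels is worth making explicit: QUORUM is a strict majority of the RF replicas, and a read is guaranteed to see the latest write whenever the read and write replica sets must overlap, i.e. R + W > RF:

```python
def quorum(rf: int) -> int:
    """Strict majority of RF replicas."""
    return rf // 2 + 1

def strongly_consistent(r: int, w: int, rf: int) -> bool:
    """Read and write sets overlap in at least one replica iff R + W > RF."""
    return r + w > rf

RF = 3
print(quorum(RF))                                       # → 2
print(strongly_consistent(quorum(RF), quorum(RF), RF))  # → True
print(strongly_consistent(1, 1, RF))                    # → False (ONE + ONE)
```

So QUORUM reads + QUORUM writes give strong consistency at RF = 3, while ONE + ONE trades that away for latency and availability.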
28. Write consistency!
Write ONE
• write request sent to all replicas in parallel
• wait for ONE ack before returning to client
• other acks arrive later, asynchronously
• e.g. first replica acks after 5 μs, the others after 10 μs and 120 μs
31. Write consistency!
Write QUORUM
• write request sent to all replicas in parallel
• wait for QUORUM acks before returning to client
• other acks arrive later, asynchronously
32. Read consistency!
Read ONE
• read from one node among all replicas
• contact the fastest node (based on latency stats)
34. Read consistency!
Read QUORUM
• read actual data from the fastest replica
• AND request digests from the other replicas to reach QUORUM
• return the most up-to-date data to the client
• repair stale replicas if digests mismatch
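A toy model of that digest-check-and-repair flow, assuming replicas hold (value, timestamp) cells and using Python's `hash` as a stand-in for the real digest:

```python
def quorum_read(replicas: list, key: str) -> str:
    """QUORUM read under Last Write Win: compare digests across replicas,
    return the newest value, and overwrite stale replicas (read repair)."""
    cells = [r[key] for r in replicas]       # each cell is (value, timestamp)
    digests = {hash(c) for c in cells}       # stand-in for per-replica digest
    winner = max(cells, key=lambda c: c[1])  # latest timestamp wins
    if len(digests) > 1:                     # mismatch → repair stale replicas
        for r in replicas:
            r[key] = winner
    return winner[0]

r1 = {"age": ("34", 2)}  # up to date
r2 = {"age": ("33", 1)}  # stale
print(quorum_read([r1, r2], "age"))  # → 34
print(r2["age"])                     # stale replica repaired → ('34', 2)
```

The client always gets the newest value; the repair of the stale replica happens as a side effect of the read.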
47. Consistency summary!
ONE read + ONE write
☞ available for read/write even with (N-1) replicas down
QUORUM read + QUORUM write
☞ available for read/write even with 1 replica down
55. Last Write Win (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);

#partition jdoe → age = 33, name = John DOE
56. Last Write Win (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);

jdoe → age(t1) = 33, name(t1) = John DOE
t1 = auto-generated write timestamp
57. Last Write Win (LWW)!
UPDATE users SET age = 34 WHERE login = 'jdoe';

SSTable1: jdoe → age(t1) = 33, name(t1) = John DOE
SSTable2: jdoe → age(t2) = 34
58. Last Write Win (LWW)!
DELETE age FROM users WHERE login = 'jdoe';

SSTable1: jdoe → age(t1) = 33, name(t1) = John DOE
SSTable2: jdoe → age(t2) = 34
SSTable3: jdoe → age(t3) = ✕ tombstone
59. Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';

Which age wins?
SSTable1: jdoe → age(t1) = 33, name(t1) = John DOE
SSTable2: jdoe → age(t2) = 34
SSTable3: jdoe → age(t3) = ✕ tombstone
60. Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';

✕ SSTable1: age(t1) = 33
✕ SSTable2: age(t2) = 34
✓ SSTable3: age(t3) = ✕ tombstone (latest timestamp wins)
61. Compaction!
SSTable1: jdoe → age(t1) = 33, name(t1) = John DOE
SSTable2: jdoe → age(t2) = 34
SSTable3: jdoe → age(t3) = ✕ tombstone

merged into a new SSTable:
jdoe → age(t3) = ✕ tombstone, name(t1) = John DOE
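The LWW merge that compaction performs can be sketched very compactly: per column, keep the cell with the highest timestamp, where a tombstone is just a cell whose value is `None`:

```python
TOMBSTONE = None  # a delete is a regular cell with a null value

def compact(sstables: list) -> dict:
    """Merge SSTables for one partition; per column, latest timestamp wins."""
    merged = {}
    for table in sstables:
        for column, cell in table.items():  # cell = (value, timestamp)
            if column not in merged or cell[1] > merged[column][1]:
                merged[column] = cell
    return merged

sstable1 = {"age": (33, 1), "name": ("John DOE", 1)}
sstable2 = {"age": (34, 2)}
sstable3 = {"age": (TOMBSTONE, 3)}

print(compact([sstable1, sstable2, sstable3]))
# → {'age': (None, 3), 'name': ('John DOE', 1)}
```

This is exactly the slide's result: `age` resolves to the tombstone at t3, `name` survives from t1.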
62. Historical data!
You want to keep data history?
• do not rely on the internally generated timestamp!
• ☞ use time-series data modeling instead

history table:
SSTable1: id → date1(t1) date2(t2) … date9(t9)
SSTable2: id → date10(t10) date11(t11) …
63. CRUD operations!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
66. Queries!
Get messages by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
  AND message_id = '2014-09-25 16:00:00';

Get messages by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
  AND message_id <= '2014-09-25 16:00:00'
  AND message_id >= '2014-09-20 16:00:00';
67. Queries!
Get messages by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';

Get messages by date interval (#partition not provided)
SELECT * FROM mailbox
  WHERE message_id <= '2014-09-25 16:00:00'
  AND message_id >= '2014-09-20 16:00:00';
68. Queries!
Get messages by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' AND login <= 'jdoe';

Get messages by user pattern (non-exact match on #partition)
SELECT * FROM mailbox WHERE login LIKE '%doe%';
69. WHERE clause restrictions!
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition
Only exact match (=) on #partition; range queries (<, ≤, >, ≥) not allowed
• ☞ they would require a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match
WHERE clause only possible on columns defined in the PRIMARY KEY
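These rules are mechanical enough to sketch as a toy validator. Assuming the mailbox table from earlier, with PRIMARY KEY ((login), message_id) (partition key `login`, clustering column `message_id`):

```python
PARTITION_KEY = "login"
CLUSTERING = {"message_id"}
RANGE_OPS = {"<", "<=", ">", ">="}

def valid_where(predicates: list) -> bool:
    """predicates: list of (column, operator) pairs from the WHERE clause."""
    cols = {c for c, _ in predicates}
    # WHERE only on PRIMARY KEY columns
    if any(c != PARTITION_KEY and c not in CLUSTERING for c in cols):
        return False
    for col, op in predicates:
        if col == PARTITION_KEY and op != "=":
            return False  # no range scan on #partition
        if col in CLUSTERING and op not in RANGE_OPS | {"="}:
            return False
    return PARTITION_KEY in cols  # #partition must be provided

print(valid_where([("login", "="), ("message_id", "<=")]))  # → True
print(valid_where([("message_id", "<=")]))                  # → False
print(valid_where([("login", ">=")]))                       # → False
```

The three calls mirror slides 66-68: a partition + clustering range is fine, a clustering-only query fails, and a range on the partition key fails.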
70. WHERE clause restrictions!
What if I want to perform an "arbitrary" WHERE clause?
• search form scenario, dynamic search fields
☞ Apache Solr (Lucene) integration (Datastax Enterprise)

SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
72. Collections & maps!
CREATE TABLE users (
login text,
name text,
age int,
friends set<text>,
hobbies list<text>,
languages map<int, text>,
…
PRIMARY KEY(login));
73. User Defined Type (UDT)!
Instead of
CREATE TABLE users (
login text,
…
street_number int,
street_name text,
postcode int,
country text,
…
PRIMARY KEY(login));
74. User Defined Type (UDT)!
CREATE TYPE address (
street_number int,
street_name text,
postcode int,
country text);
CREATE TABLE users (
login text,
…
location frozen <address>,
…
PRIMARY KEY(login));
76. UDT update!
UPDATE users SET location =
{
  street_number: 125,
  street_name: 'Congress Avenue',
  postcode: 95054,
  country: 'USA'
}
WHERE login = 'jdoe';

Can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
78. From SQL to CQL!
Remember…
CQL is not SQL
79. From SQL to CQL!
Remember…
there is no join
(do you want to scale?)
80. From SQL to CQL!
Remember…
there is no integrity constraint
(do you want to read-before-write?)
81. From SQL to CQL!
Normalized: User 1 ←→ n Comment
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author_id text, // typical join id
content text,
PRIMARY KEY((article_id), comment_id));
82. From SQL to CQL!
De-normalized: User 1 ←→ n Comment
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author person, // person is UDT
content text,
PRIMARY KEY((article_id), comment_id));
83. Data modeling best practices!
Start by queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
85. Data modeling best practices!
Person UDT
- firstname/lastname
- date of birth
- gender
- mood
- location
86. Data modeling best practices!
John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail read from User table on click
105. Query With Clustered Table!
Select by operator and city for all dates
Select by operator and city range for all dates
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin';

SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';
106. Query With Clustered Table!
Select by operator and city and date
Select by operator and city and range of date
SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910;

SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
  AND date >= 20140910 AND date <= 20140913;
107. Query With Clustered Table!
Select by operator and city and date tuples

SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND city = 'Austin'
  AND date IN (20140910, 20140913);
108. Query With Clustered Table!
Select by operator and date without city (clustering column city skipped)

SELECT * FROM daily_3g_quality_per_city
WHERE operator = 'verizon' AND date = 20140910;

Under the hood, the layout is nested sorted maps:
Map<operator,
  SortedMap<city,
    SortedMap<date,
      SortedMap<column_label, value>>>>
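That nested-sorted-map view explains both what works and what doesn't. A minimal sketch (column values are illustrative, not from the deck): once operator and city are fixed, a date range is a contiguous slice of one inner map, while skipping city would force a scan of every city's map.

```python
from collections import defaultdict

def nested():
    # operator → city → date → {column_label: value}
    return defaultdict(nested)

table = nested()
table["verizon"]["Austin"][20140910]["quality"] = 0.87
table["verizon"]["Austin"][20140913]["quality"] = 0.91
table["verizon"]["New York"][20140910]["quality"] = 0.78

# "city = 'Austin' AND date >= 20140910 AND date <= 20140913" is a
# contiguous slice of the innermost sorted map:
rows = {d: dict(cols)
        for d, cols in sorted(table["verizon"]["Austin"].items())
        if 20140910 <= d <= 20140913}
print(rows)
# → {20140910: {'quality': 0.87}, 20140913: {'quality': 0.91}}
```

A query that skips `city` has no single map to slice, which is why the partition key and the preceding clustering columns must be restricted first.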