2. @doanduyhai
Who Am I?
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ duy_hai.doan@datastax.com
☞ @doanduyhai
3. @doanduyhai
DataStax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• DataStax Enterprise = OSS Cassandra + extra features
11. @doanduyhai
Multi-DC usages
Prod data copy for testing/benchmarking
[Diagram: an 8-node ring (n1 to n8) and a 3-node ring (n1 to n3, "My tiny test cluster") linked by an arrow labeled "Data copy". Labels: "Use LOCAL consistency", "NEVER WRITE HERE !!!"]
34. @doanduyhai
Consistency summary
ONE read + ONE write
☞ available for read/write even with (N-1) replicas down
QUORUM read + QUORUM write
☞ available for read/write even with 1+ replicas down (a majority of the N replicas must stay up)
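The availability rules on this slide can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, it assumes a single data center with N replicas per partition, and it assumes QUORUM means a strict majority (N // 2 + 1).

```python
def quorum(replication_factor: int) -> int:
    """Smallest strict majority of the replicas (assumed definition of QUORUM)."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    raise ValueError(f"unsupported level in this sketch: {level}")


def is_available(level: str, replication_factor: int, replicas_down: int) -> bool:
    """True if enough replicas are still alive to serve the request."""
    alive = replication_factor - replicas_down
    return alive >= required_replicas(level, replication_factor)


def max_replicas_down(level: str, replication_factor: int) -> int:
    """How many replicas may fail before requests at this level start failing."""
    return replication_factor - required_replicas(level, replication_factor)
```

With N = 3, ONE tolerates 2 replicas down (N-1) while QUORUM tolerates 1; with N = 5, QUORUM tolerates 2.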
42. @doanduyhai
Last Write Wins (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
jdoe (#partition) → age = 33, name = John DOE
43. @doanduyhai
Last Write Wins (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
jdoe → age (t1) = 33, name (t1) = John DOE
t1 = auto-generated timestamp
44. @doanduyhai
Last Write Wins (LWW)
UPDATE users SET age = 34 WHERE login = 'jdoe';
SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
45. @doanduyhai
Last Write Wins (LWW)
DELETE age FROM users WHERE login = 'jdoe';
SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
SSTable3: jdoe → age (t3) = ☒ (tombstone)
46. @doanduyhai
Last Write Wins (LWW)
SELECT age FROM users WHERE login = 'jdoe';
???
SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
SSTable2: jdoe → age (t2) = 34
SSTable3: jdoe → age (t3) = ☒ (tombstone)
47. @doanduyhai
Last Write Wins (LWW)
SELECT age FROM users WHERE login = 'jdoe';
SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE ✕
SSTable2: jdoe → age (t2) = 34 ✕
SSTable3: jdoe → age (t3) = ☒ (tombstone) ✓
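The read path shown on the last few slides (three SSTables, timestamps t1 < t2 < t3, newest cell wins) can be sketched as follows. This is a toy model, not Cassandra's storage engine: `Cell`, `read_column` and the dictionary layout are hypothetical, a tombstone is represented by `None`, and tie-breaking between equal timestamps is ignored.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    value: Optional[object]  # None stands for a tombstone in this sketch
    timestamp: int           # write timestamp (t1, t2, t3, ...)


# One SSTable maps partition key -> column name -> cell.
SSTable = Dict[str, Dict[str, Cell]]


def read_column(sstables: List[SSTable], partition: str, column: str) -> Optional[object]:
    """Collect every version of the cell and keep the one with the highest timestamp."""
    cells = [
        table[partition][column]
        for table in sstables
        if column in table.get(partition, {})
    ]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell.timestamp)
    return newest.value  # None if the newest version is a tombstone


sstable1: SSTable = {"jdoe": {"age": Cell(33, 1), "name": Cell("John DOE", 1)}}
sstable2: SSTable = {"jdoe": {"age": Cell(34, 2)}}
sstable3: SSTable = {"jdoe": {"age": Cell(None, 3)}}  # DELETE age ... writes a tombstone
```

Reading `age` across all three SSTables returns nothing because the tombstone at t3 is the newest version, while `name` is still served from SSTable1.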
49. @doanduyhai
Historical data
You want to keep data history?
• do not use the internally generated timestamp !!!
• ☞ time-series data modeling
history table:
SSTable1: id → date1 (t1), date2 (t2), …, date9 (t9)
SSTable2: id → date10 (t10), date11 (t11), …
50. @doanduyhai
CRUD operations
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
52. @doanduyhai
What about joins?
How can I join data between tables?
How can I model 1 – N relationships?
How to model a mailbox?
User (1) → (n) Emails
55. @doanduyhai
Queries
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id = '2014-09-25 16:00:00';
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
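A rough way to picture why these two queries are cheap: the partition key (login) selects one partition, and inside it rows are kept sorted by the clustering column (message_id). The Python sketch below models that with a dictionary of sorted lists. The `Mailbox` class and its method names are hypothetical, and message ids are plain timestamp strings whose lexicographic order matches chronological order.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


class Mailbox:
    """Toy model: login -> rows sorted by message_id."""

    def __init__(self) -> None:
        self._keys: Dict[str, List[str]] = defaultdict(list)
        self._rows: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def insert(self, login: str, message_id: str, body: str) -> None:
        # Keep each partition sorted by the clustering column.
        keys = self._keys[login]
        position = bisect_right(keys, message_id)
        keys.insert(position, message_id)
        self._rows[login].insert(position, (message_id, body))

    def by_interval(self, login: str, start: str, end: str) -> List[Tuple[str, str]]:
        # Jump to the partition, then slice the sorted rows: start <= message_id <= end.
        keys = self._keys.get(login, [])
        low = bisect_left(keys, start)
        high = bisect_right(keys, end)
        return self._rows.get(login, [])[low:high]


mailbox = Mailbox()
mailbox.insert("jdoe", "2014-09-25 16:00:00", "third")
mailbox.insert("jdoe", "2014-09-19 08:00:00", "first")
mailbox.insert("jdoe", "2014-09-22 12:30:00", "second")
mailbox.insert("hsue", "2014-09-21 09:00:00", "other user")
```

A lookup without the login would have no partition to jump to, which is the situation the next slides discuss.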
56. @doanduyhai
Queries
Get message by message_id only ? ❓
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval only ? ❓
SELECT * FROM mailbox WHERE
message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
57. @doanduyhai
Queries
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval only (#partition not provided)
SELECT * FROM mailbox WHERE
message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
61. @doanduyhai
Queries
Get message by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';
Get message by user pattern (non exact match on #partition)
SELECT * FROM mailbox WHERE login like '%doe%';
62. @doanduyhai
WHERE clause restrictions
All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition
Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed
• ☞ they would require a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match (=)
WHERE clause only possible
• on columns defined in PRIMARY KEY
• on indexed columns ( )
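The rules on this slide can be encoded as a small checker. The sketch below is deliberately simplified and only mirrors the slide: `check_where` is a hypothetical helper, restrictions are passed as (column, operator) pairs, and real CQL has additional cases that are not modeled here.

```python
from typing import Iterable, List, Sequence, Tuple

RANGE_OPERATORS = {"<", "<=", ">", ">="}


def check_where(
    restrictions: Iterable[Tuple[str, str]],
    partition_key: str,
    clustering_columns: Sequence[str],
    indexed_columns: Sequence[str] = (),
) -> List[str]:
    """Return the list of violated rules; an empty list means the clause passes."""
    operators_by_column = {}
    for column, operator in restrictions:
        operators_by_column.setdefault(column, []).append(operator)

    problems = []
    if partition_key not in operators_by_column:
        problems.append("#partition not provided")
    elif any(op != "=" for op in operators_by_column[partition_key]):
        problems.append("only exact match (=) allowed on #partition")

    for column, operators in operators_by_column.items():
        if column == partition_key:
            continue
        if column in clustering_columns:
            if any(op != "=" and op not in RANGE_OPERATORS for op in operators):
                problems.append(f"only = and range operators allowed on {column}")
        elif column not in indexed_columns:
            problems.append(f"{column} is neither in the PRIMARY KEY nor indexed")
    return problems
```

For the mailbox table, `check_where([("login", "="), ("message_id", "<=")], "login", ["message_id"])` returns an empty list, while a clause on `message_id` alone reports that #partition is missing.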
65. @doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL !
☞ Apache Solr (Lucene) integration (DataStax Enterprise)
☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
66. @doanduyhai
Collections & maps
CREATE TABLE users (
    login text,
    name text,
    age int,
    friends set<text>,
    hobbies list<text>,
    languages map<int, text>,
    …
    PRIMARY KEY(login));
Keep the cardinality low ≈ 1000
67. @doanduyhai
User Defined Type (UDT)
Instead of
CREATE TABLE users (
    login text,
    …
    street_number int,
    street_name text,
    postcode int,
    country text,
    …
    PRIMARY KEY(login));
68. @doanduyhai
User Defined Type (UDT)
CREATE TYPE address (
    street_number int,
    street_name text,
    postcode int,
    country text);
CREATE TABLE users (
    login text,
    …
    location frozen<address>,
    …
    PRIMARY KEY(login));
70. @doanduyhai
UDT update
UPDATE users SET location =
{
    street_number: 125,
    street_name: 'Congress Avenue',
    postcode: 95054,
    country: 'USA'
}
WHERE login = 'jdoe';
Can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
71. @doanduyhai
From SQL to CQL
Normalized
User (1) → (n) Comment
CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author_login text, // typical join id
    content text,
    PRIMARY KEY((article_id), comment_id));
73. @doanduyhai
From SQL to CQL
User (1) → (n) Comment
1 SELECT
- 10 last comments
- 10 author_login
What to do with 10 author_login ???
10 extra SELECT → N+1 SELECT problem !
74. @doanduyhai
From SQL to CQL
De-normalized
User (1) → (n) Comment
CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author frozen<person>, // person is UDT
    content text,
    PRIMARY KEY((article_id), comment_id));
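To make the difference between the normalized and de-normalized comments tables concrete, the sketch below counts how many SELECTs each design needs to display the 10 last comments with their author names. It is a toy in-memory model: `CountingStore` and both render functions are hypothetical, and the sample data is made up for illustration.

```python
from typing import Dict, List, Tuple


class CountingStore:
    """In-memory stand-in for the database that counts SELECT calls."""

    def __init__(self, users: Dict[str, dict], comments: Dict[str, List[dict]]) -> None:
        self.users = users
        self.comments = comments
        self.selects = 0

    def select_last_comments(self, article_id: str, limit: int) -> List[dict]:
        self.selects += 1
        return self.comments[article_id][-limit:]

    def select_user(self, login: str) -> dict:
        self.selects += 1
        return self.users[login]


def render_normalized(store: CountingStore, article_id: str, limit: int = 10) -> List[Tuple[str, str]]:
    # Each comment only carries author_login, so every row costs one extra SELECT.
    rows = store.select_last_comments(article_id, limit)
    return [(store.select_user(row["author_login"])["name"], row["content"]) for row in rows]


def render_denormalized(store: CountingStore, article_id: str, limit: int = 10) -> List[Tuple[str, str]]:
    # Each comment embeds a copy of the author, so one SELECT is enough.
    rows = store.select_last_comments(article_id, limit)
    return [(row["author"]["name"], row["content"]) for row in rows]


users = {f"user{i}": {"name": f"User {i}"} for i in range(10)}
normalized = CountingStore(users, {
    "article-1": [{"author_login": f"user{i}", "content": f"comment {i}"} for i in range(10)],
})
denormalized = CountingStore(users, {
    "article-1": [{"author": dict(users[f"user{i}"]), "content": f"comment {i}"} for i in range(10)],
})
```

Both renderings produce the same output, but the normalized store ends up with 11 SELECTs (1 + 10) and the de-normalized one with a single SELECT.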
76. @doanduyhai
Data modeling best practices
Start with the queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
78. @doanduyhai
Data modeling best practices
John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail read from User table on click
82. @doanduyhai
Data modeling trade-off
2 strategies
• either accept normalizing some data (extra SELECT required)
• or de-normalize and update everywhere upon data mutation
But always keep those scenarios rare (5%-10% max), focus on the 90%
Example: Twitter tweet deletion
84. @doanduyhai
Lightweight Transaction (LWT)
What? ☞ make operations linearizable
Why? ☞ solve a class of race conditions in Cassandra that would otherwise require installing an external lock manager
85. @doanduyhai
Lightweight Transaction (LWT)
INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com');
SELECT * FROM account WHERE id = 'jdoe';
(0 rows)
SELECT * FROM account WHERE id = 'jdoe';
(0 rows)
INSERT INTO account (id, email) VALUES ('jdoe', 'jdoe@fiction.com');
winner
86. @doanduyhai
Lightweight Transaction (LWT)
How? ☞ implementing the Paxos protocol on Cassandra
Syntax?
INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com')
IF NOT EXISTS;
UPDATE account SET email = 'jdoe@fiction.com'
WHERE id = 'jdoe' IF email = 'john_doe@fiction.com';
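The semantics of the two conditional statements (insert only if absent, update only if the current value matches) can be illustrated with a compare-and-set sketch in Python. This is not how Cassandra implements it: the `Accounts` class is hypothetical, and a single in-process lock stands in for the Paxos round that makes the check and the write one linearizable step.

```python
import threading
from typing import Dict, Optional


class Accounts:
    """Toy account table with conditional writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()  # stand-in for Paxos, single process only

    def insert_if_not_exists(self, account_id: str, email: str) -> bool:
        # Check and write happen atomically, so only the first caller is applied.
        with self._lock:
            if account_id in self._rows:
                return False
            self._rows[account_id] = email
            return True

    def update_if(self, account_id: str, expected_email: str, new_email: str) -> bool:
        # Applied only when the stored value still equals the expected one.
        with self._lock:
            if self._rows.get(account_id) != expected_email:
                return False
            self._rows[account_id] = new_email
            return True

    def get(self, account_id: str) -> Optional[str]:
        with self._lock:
            return self._rows.get(account_id)
```

With this behavior, two clients racing to create 'jdoe' can no longer both succeed: the second conditional insert reports that it was not applied instead of silently overwriting the first.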