C*ollege Credit: What's New in Apache Cassandra 1.2
2. C* in a nutshell
• Massively scalable
• High performance
• Reliable/Available
©2012 DataStax 2
4. 1.2
• Concurrent schema changes
• Virtual nodes
• “Fat node” support
• JBOD improvements
• Off-heap bloom filters, compression metadata
• Parallel leveled compaction
• Atomic batches
• CQL3
• Collections
• Data dictionary
• Tracing
5. Concurrent Schema Changes
[Figure: two clients connected to one Cassandra cluster, each issuing its own schema changes]
Client: CREATE TABLE X; ... DROP TABLE X;
Client: CREATE TABLE Y; ... DROP TABLE Y;
6. Virtual nodes
[Figure: two rings side by side. Ring without vnodes: six ranges, A-F. Ring with vnodes: sixteen ranges, A-P.]
9. Node Rebuild without vnodes
[Figure: ring without vnodes, ranges A-F, with the ranges held by each node]
Node 1: A, F, E
Node 2: B, A, F
Node 3: C, B, A
Node 4: D, C, B
Node 5: E, D, C
Node 6: F, E, D
10. Node Rebuild with vnodes
[Figure: ring with vnodes, ranges A-P. Each of Nodes 1-6 holds a scattered mix of these ranges rather than one contiguous block.]
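The two rebuild figures can be approximated with a small model. The sketch below is a toy, not Cassandra's code: the MD5-based `token` function, the choice of 256 tokens per node, and the "next distinct nodes clockwise" replica placement are all assumptions made for illustration. It counts how many other nodes hold a copy of some range stored on node 0, which is the set of nodes a rebuild could stream from.

```python
import hashlib

RF = 3       # replication factor, matching the three ranges per node in the figure
NODES = 6    # six nodes, as in the figures

def token(label):
    """Hash a label to a ring position (illustrative, not Cassandra's partitioner)."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)

def build_ring(tokens_per_node):
    """Return a sorted list of (token, node) pairs."""
    ring = [(token(f"node{n}-vnode{v}"), n)
            for n in range(NODES) for v in range(tokens_per_node)]
    return sorted(ring)

def replicas(ring, i):
    """Owner of ring[i] plus the next RF-1 distinct nodes clockwise."""
    found = []
    j = i
    while len(found) < RF:
        node = ring[j % len(ring)][1]
        if node not in found:
            found.append(node)
        j += 1
    return found

def rebuild_sources(ring, node):
    """Other nodes holding a copy of at least one range stored on `node`."""
    sources = set()
    for i in range(len(ring)):
        group = replicas(ring, i)
        if node in group:
            sources.update(n for n in group if n != node)
    return sources

single = rebuild_sources(build_ring(1), 0)      # one token per node
vnodes = rebuild_sources(build_ring(256), 0)    # 256 tokens per node (assumed)
print(len(single), "source nodes without vnodes")
print(len(vnodes), "source nodes with vnodes")
```

With one token per node, only the two neighbors on each side share data with the node being rebuilt. With many tokens per node, the node's ranges are spread around the ring, so the rest of the cluster can share the streaming work.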
11. JBOD support
[Figure: one Cassandra instance on top of four disks: HDD1, HDD2, HDD3, HDD4]
12. JBOD support
[Figure: the same Cassandra instance with HDD1 marked failed (X); HDD2, HDD3, HDD4 still shown]
13. Moving O(n) structures off-heap
• Row (partition) bloom filter
• 1-2GB per billion rows
• Compression metadata
• ~20GB per TB compressed data
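The per-row and per-terabyte figures above translate into a quick sizing estimate. The sketch below only multiplies out the numbers quoted on this slide; the function name, its parameters, and the example workload (2 billion rows, 0.5 TB compressed) are hypothetical.

```python
def off_heap_estimate_gb(rows_billions, compressed_tb,
                         bloom_gb_per_billion=(1.0, 2.0),   # "1-2GB per billion rows"
                         metadata_gb_per_tb=20.0):          # "~20GB per TB compressed data"
    """Rough size of the O(n) structures that 1.2 moves off-heap."""
    low, high = bloom_gb_per_billion
    return {
        "bloom_filter_gb": (rows_billions * low, rows_billions * high),
        "compression_metadata_gb": compressed_tb * metadata_gb_per_tb,
    }

# Hypothetical node: 2 billion rows, 0.5 TB of compressed data.
est = off_heap_estimate_gb(rows_billions=2, compressed_tb=0.5)
print(est)
```

For this example the bloom filters come to 2-4 GB and the compression metadata to about 10 GB, memory that no longer has to fit in the garbage-collected Java heap.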
14. On-Heap/Off-Heap
On-Heap | Off-Heap
Managed by GC | Not managed by GC
JVM Java heap | Native memory
(both within the Java process)
15. Batches
[Figure: a client, a coordinator node, and three partition replicas]
19. Batches
[Figure: the same client and three partition replicas, with the coordinator node marked failed (X)]
20. Atomic batches
[Figure: a client, a coordinator node, and three partition replicas, plus a separate batchlog node]
24. Atomic batches
[Figure: the same client, batchlog node, and three partition replicas, with the coordinator node marked failed (X)]
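The batchlog diagrams can be read as a three-step protocol: record the whole batch on a batchlog node, apply it to the replicas, then clear the record. The sketch below is a toy in-memory model of that idea under simplifying assumptions (a single batchlog node, no timeouts, no networking); the `Node` class and both function names are hypothetical and are not Cassandra's API.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}       # partition key -> value
        self.batchlog = {}   # batch id -> list of (replica, key, value)

def write_atomic_batch(batch_id, mutations, batchlog_node, fail_after=None):
    """Coordinator logic: log the batch, apply it, then clear the log.

    `fail_after` simulates the coordinator dying after that many mutations.
    """
    batchlog_node.batchlog[batch_id] = list(mutations)   # step 1: record the whole batch
    for applied, (replica, key, value) in enumerate(mutations):
        if fail_after is not None and applied >= fail_after:
            return False                                  # coordinator failed mid-batch
        replica.data[key] = value                         # step 2: apply to each replica
    del batchlog_node.batchlog[batch_id]                  # step 3: batch is complete
    return True

def replay_batchlog(batchlog_node):
    """Batchlog node re-applies any batch its coordinator never finished."""
    for batch_id, mutations in list(batchlog_node.batchlog.items()):
        for replica, key, value in mutations:
            replica.data[key] = value                     # re-applying a write is harmless
        del batchlog_node.batchlog[batch_id]

r1, r2, r3, log = Node("r1"), Node("r2"), Node("r3"), Node("batchlog")
batch = [(r1, "a", 1), (r2, "b", 2), (r3, "c", 3)]

ok = write_atomic_batch("batch-1", batch, log, fail_after=1)
after_crash = [dict(n.data) for n in (r1, r2, r3)]

replay_batchlog(log)
after_replay = [dict(n.data) for n in (r1, r2, r3)]
print(after_crash, after_replay)
```

After the simulated failure only the first replica has its write. Once the batchlog is replayed, all three do, which is the "all or nothing, eventually" behavior the slides illustrate.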
26. CQL: You got SQL in my NoSQL!
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
27. Strictly “realtime” focused
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• Strictly limited ORDER BY
28. songs
create column family songs
with key_validation_class = UUIDType
and comparator = UTF8Type -- cell names are strings
and column_metadata = [{column_name: title, validation_class: UTF8Type},
{column_name: album, validation_class: UTF8Type},
{column_name: artist, validation_class: UTF8Type},
{column_name: data, validation_class: BytesType}];
a3e64f8f... title: La Grange artist: ZZ Top album: Tres Hombres
8a172618... title: Moving in Stereo artist: Fu Manchu album: We Must Obey
2b09185b... title: Outside Woman Blues artist: Back Door Slam album: Roll Away
29. CREATE TABLE songs (
id uuid PRIMARY KEY,
title text,
artist text,
album text,
data blob
);
id | title | artist | album
a3e64f8f... | La Grange | ZZ Top | Tres Hombres
8a172618... | Moving in Stereo | Fu Manchu | We Must Obey
2b09185b... | Outside Woman Blues | Back Door Slam | Roll Away
30. song_tags
create column family song_tags
with key_validation_class = UUIDType
and comparator = UTF8Type;
a3e64f8f... blues: 1973:
8a172618... covers: 2003:
31. CREATE TABLE song_tags (
id uuid,
tag_name text,
PRIMARY KEY (id, tag_name)
);
a3e64f8f... blues: 1973:
8a172618... covers: 2003:
id | tag_name
a3e64f8f... | blues
a3e64f8f... | 1973
8a172618... | covers
8a172618... | 2003
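The two views of song_tags above are the same data seen at different levels. As a rough sketch, assuming each CQL row becomes one cell whose name is the tag and whose value is empty, the mapping between the wide storage row and the CQL rows could look like this. The function names are hypothetical, and the ids are kept as the truncated strings shown on the slide.

```python
from collections import defaultdict

def to_storage_rows(cql_rows):
    """Group (id, tag_name) rows into one wide row per id, cells sorted by name."""
    storage = defaultdict(dict)
    for song_id, tag_name in cql_rows:
        storage[song_id][tag_name] = b""          # cell name = tag, empty value
    # A UTF8Type comparator keeps cell names in string order.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}

def to_cql_rows(storage):
    """Flatten each wide row back into one CQL row per cell."""
    return [(key, cell) for key, cells in storage.items() for cell in cells]

cql_rows = [
    ("a3e64f8f...", "blues"),
    ("a3e64f8f...", "1973"),
    ("8a172618...", "covers"),
    ("8a172618...", "2003"),
]
storage = to_storage_rows(cql_rows)
print(storage)
print(to_cql_rows(storage))
```

The first element of the PRIMARY KEY selects the storage row, and the remaining element becomes the cell name, so each id ends up with one wide row holding all of its tags.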
32. Easier way to add tags
ALTER TABLE songs ADD tags set<text>;
id | title | artist | album | tags
a3e64f8f... | La Grange | ZZ Top | Tres Hombres | {blues, 1973}
8a172618... | Moving in Stereo | Fu Manchu | We Must Obey | {covers, 2003}
2b09185b... | Outside Woman Blues | Back Door Slam | Roll Away |
33. playlists
create column family playlists
with key_validation_class = UUIDType
and comparator = 'CompositeType(UTF8Type, UTF8Type, UTF8Type)'
and default_validation_class = UUIDType;
62c36092...
La Grange, ZZ Top, Tres Hombres: a3e64f8f...
Moving in S..., Fu Manchu, We Must O...: 8a172618...
Outside Wo..., Back Door ..., Roll Away: 2b09185b...
35. CREATE TABLE playlists (
id uuid,
title text,
album text,
artist text,
song_id uuid,
PRIMARY KEY (id, title, album, artist)
);
62c36092...
La Grange, ZZ Top, Tres Hombres: a3e64f8f...
Moving in S..., Fu Manchu, We Must O...: 8a172618...
Outside Wo..., Back Door ..., Roll Away: 2b09185b...
id | title | artist | album | song_id
62c36092... | La Grange | ZZ Top | Tres Hombres | a3e64f8f...
62c36092... | Moving in Stereo | Fu Manchu | We Must Obey | 8a172618...
62c36092... | Outside Wo... | Back Door Slam | Roll Away | 2b09185b...
36. Clustering
CREATE TABLE timeline (
user_id uuid,
tweet_id timeuuid,
tweet_author uuid,
tweet_body text,
PRIMARY KEY (user_id, tweet_id)
);
SELECT * FROM timeline WHERE user_id = 'driftx';
user_id | tweet_id | _author | _body
jbellis | 3290f9da.. | rbranson | lorem
jbellis | 3895411a.. | tjake | ipsum
... | ... | ... | ...
driftx | 3290f9da.. | rbranson | lorem
driftx | 71b46a84.. | yzhang | dolor
... | ... | ... | ...
yukim | 3290f9da.. | rbranson | lorem
yukim | e451dd42.. | tjake | amet
... | ... | ... | ...
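One way to picture clustering: rows that share a user_id live together in one partition, kept in tweet_id order, so the SELECT on this slide reads a single contiguous slice. The sketch below is a minimal in-memory model of that layout. The `Timeline` class is hypothetical, and small integers stand in for timeuuid values.

```python
import bisect
from collections import defaultdict

class Timeline:
    """Toy model: one sorted list of rows per partition key."""

    def __init__(self):
        # user_id -> list of (tweet_id, author, body), kept sorted by tweet_id
        self.partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        bisect.insort(self.partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        """Equivalent of SELECT * FROM timeline WHERE user_id = ...;"""
        return [(user_id, *row) for row in self.partitions.get(user_id, [])]

t = Timeline()
t.insert("driftx", 2, "yzhang", "dolor")      # inserted out of order on purpose
t.insert("jbellis", 1, "rbranson", "lorem")
t.insert("driftx", 1, "rbranson", "lorem")

rows = t.select("driftx")
print(rows)
```

The query touches only the driftx partition and returns its rows already ordered by the clustering column, regardless of insertion order.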
38. Data dictionary
cqlsh:system> SELECT * FROM schema_keyspaces;
keyspace_name | durable_writes | strategy_class | strategy_options
---------------+----------------+----------------+----------------------------
keyspace1 | True | SimpleStrategy | {"replication_factor":"1"}
system | True | LocalStrategy | {}
system_traces | True | SimpleStrategy | {"replication_factor":"1"}
41. Data dictionary
cqlsh:system> SELECT * FROM schema_keyspaces;
keyspace_name | durable_writes | strategy_class | strategy_options
---------------+----------------+----------------+----------------------------
keyspace1 | True | SimpleStrategy | {"replication_factor":"1"}
system | True | LocalStrategy | {}
system_traces | True | SimpleStrategy | {"replication_factor":"1"}
cqlsh:system> SELECT * FROM schema_columnfamilies WHERE keyspace_name='keyspace1' AND
columnfamily_name='test';
cqlsh:system> SELECT * FROM schema_columns WHERE keyspace_name='keyspace1' AND
columnfamily_name='test';
42. Data dictionary
cqlsh:system> SELECT * FROM local;
key | bootstrapped | cluster_name | cql_version | data_center | gossip_generation | partitioner | rack | release_version | ring_id | thrift_version | tokens | truncated_at
-------+--------------+--------------+-------------+-------------+-------------------+---------------------------------------------+-------+----------------------+--------------------------------------+----------------+--------+--------------
local | COMPLETED | test | 3.0.0 | datacenter1 | 1352846064 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 | 1.2.0-beta2-SNAPSHOT | 224c55d5-21b4-42b0-8969-afc0cc04e812 | 19.35.0 | {0} | null
43. Data dictionary
cqlsh:system> SELECT * FROM peers LIMIT 1;
peer | data_center | rack | release_version | ring_id | rpc_address | schema_version | tokens
-----------+-------------+-------+----------------------+--------------------------------------+-------------+--------------------------------------+-----------------------
127.0.0.3 | datacenter1 | rack1 | 1.2.0-beta2-SNAPSHOT | f6782327-ef8e-41cf-87b9-2edc287b1ffe | 127.0.0.3 | 915ed888-ddd0-3448-860c-582f4eea1bc6 | {6148914691236517204}
44. Request tracing
cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity | timestamp | source | source_elapsed
-------------------------------------+--------------+-----------+----------------
Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540
Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779
Message received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63
Applying mutation | 00:02:37,016 | 127.0.0.2 | 220
Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250
Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277
Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378
Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710
Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888
Message received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334
Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
45. Tracing an antipattern
CREATE TABLE queues (
id text,
created_at timeuuid,
value blob,
PRIMARY KEY (id, created_at)
);
id | created_at | value
myqueue | 3092e86f | 9b0450d30de9
myqueue | 0867f47c | fc7aee5f6a66
myqueue | 5fc74be0 | 668fdb3a2196
48. cqlsh:foo> SELECT * FROM queues WHERE id = 'myqueue' ORDER BY created_at LIMIT 1;
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity | timestamp | source | source_elapsed
------------------------------------------+--------------+-----------+---------------
execute_cql3_query | 19:31:05,650 | 127.0.0.1 | 0
Sending message to /127.0.0.3 | 19:31:05,651 | 127.0.0.1 | 541
Message received from /127.0.0.1 | 19:31:05,651 | 127.0.0.3 | 39
Executing single-partition query | 19:31:05,652 | 127.0.0.3 | 943
Acquiring sstable references | 19:31:05,652 | 127.0.0.3 | 973
Merging memtable contents | 19:31:05,652 | 127.0.0.3 | 1020
Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081
Read 1 live cells and 100000 tombstoned | 19:31:05,686 | 127.0.0.3 | 35072
Enqueuing response to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35220
Sending message to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35314
Message received from /127.0.0.3 | 19:31:05,687 | 127.0.0.1 | 36908
Processing response from /127.0.0.3 | 19:31:05,688 | 127.0.0.1 | 37650
Request complete | 19:31:05,688 | 127.0.0.1 | 38047
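A trace like this is easiest to read as per-node deltas. The sketch below copies the rows of the trace above and, for each source, finds the step with the largest jump in source_elapsed (taken here to be microseconds). The parsing helper is hypothetical and assumes the pipe-separated layout shown on this slide.

```python
TRACE = """\
execute_cql3_query | 19:31:05,650 | 127.0.0.1 | 0
Sending message to /127.0.0.3 | 19:31:05,651 | 127.0.0.1 | 541
Message received from /127.0.0.1 | 19:31:05,651 | 127.0.0.3 | 39
Executing single-partition query | 19:31:05,652 | 127.0.0.3 | 943
Acquiring sstable references | 19:31:05,652 | 127.0.0.3 | 973
Merging memtable contents | 19:31:05,652 | 127.0.0.3 | 1020
Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081
Read 1 live cells and 100000 tombstoned | 19:31:05,686 | 127.0.0.3 | 35072
Enqueuing response to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35220
Sending message to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35314
Message received from /127.0.0.3 | 19:31:05,687 | 127.0.0.1 | 36908
Processing response from /127.0.0.3 | 19:31:05,688 | 127.0.0.1 | 37650
Request complete | 19:31:05,688 | 127.0.0.1 | 38047
"""

def slowest_step_per_source(trace_text):
    """Return {source: (activity, delta)} for the largest elapsed-time jump on each node."""
    last = {}     # source -> previous source_elapsed value
    worst = {}    # source -> (activity, delta)
    for line in trace_text.strip().splitlines():
        activity, _timestamp, source, elapsed = [f.strip() for f in line.split("|")]
        elapsed = int(elapsed)
        delta = elapsed - last.get(source, 0)
        last[source] = elapsed
        if source not in worst or delta > worst[source][1]:
            worst[source] = (activity, delta)
    return worst

per_source = slowest_step_per_source(TRACE)
for source, (activity, delta) in per_source.items():
    print(source, activity, delta)
```

On the replica (127.0.0.3) almost all of the time goes into the step that reads 1 live cell and 100000 tombstones. The coordinator's largest gap is simply the wait for that reply. This is the cost of using a table as a queue: deleted entries pile up as tombstones in front of the one live row.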