3. Why Cassandra?
- BigData!!!
- Volume (petabytes of data, trillions of entities)
- Velocity (real-time, streams, millions of transactions per second)
- Variety (un-, semi-, structured)
- writes are cheap, reads are ???
- near-linear horizontal scaling (for suitable use cases)
- fully distributed, with no single point of failure
- data replication by default
38. QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types
39. Physical storage
- Primary key
- Partition key
CREATE TABLE videos (
id int,
title text,
runtime int,
year int,
PRIMARY KEY (id)
);
id | title | runtime | year
----+---------------------+---------+------
1 | dzien swira | 93 | 2002
2 | chlopaki nie placza | 96 | 2000
3 | psy | 104 | 1992
4 | psy 2 | 96 | 1994
partition 1: title = dzien swira, runtime = 93, year = 2002
partition 2: title = chlopaki..., runtime = 96, year = 2000
partition 3: title = psy, runtime = 104, year = 1992
partition 4: title = psy 2, runtime = 96, year = 1994
SELECT * FROM videos
WHERE title = 'dzien swira'; -- rejected without ALLOW FILTERING: title is not part of the primary key
40. Physical storage
CREATE TABLE videos_with_clustering (
title text,
runtime int,
year int,
PRIMARY KEY ((title), year)
);
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
partition godzilla:
1954 -> runtime = 98
1998 -> runtime = 140
2014 -> runtime = 123
partition psy:
1992 -> runtime = 104
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla';
SELECT * FROM videos_with_clustering
WHERE title = 'godzilla' AND year > 1998;
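The layout above can be sketched in plain Python, assuming a toy in-memory model rather than real Cassandra storage: each partition key maps to rows kept sorted by the clustering column, so a range predicate on year becomes a slice of one partition. The names partitions, insert and select are hypothetical and exist only for this illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Toy model: partition key (title) -> rows sorted by clustering column (year).
partitions = defaultdict(list)

def insert(title, year, runtime):
    rows = partitions[title]
    # The primary key (title, year) is unique: a second write replaces the row.
    rows[:] = [r for r in rows if r[0] != year]
    rows.append((year, runtime))
    rows.sort()

def select(title, year_gt=None):
    # Only one partition is touched; other titles are never read.
    rows = partitions.get(title, [])
    if year_gt is None:
        return list(rows)
    # Rows are ordered by year, so "year > x" is a slice, not a scan.
    start = bisect_right(rows, (year_gt, float("inf")))
    return rows[start:]

for row in [("godzilla", 1954, 98), ("godzilla", 2014, 123),
            ("godzilla", 1998, 140), ("psy", 1992, 104)]:
    insert(*row)

all_godzilla = select("godzilla")
after_1998 = select("godzilla", year_gt=1998)
```

Here all_godzilla holds the three godzilla rows in year order and after_1998 holds only the 2014 row, mirroring the two SELECT statements.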
41. Physical storage
CREATE TABLE videos_with_composite_pk(
title text,
runtime int,
year int,
PRIMARY KEY ((title, year))
);
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)
title | year | runtime
-------------+------+---------
godzilla | 1954 | 98
godzilla | 1998 | 140
godzilla | 2014 | 123
psy | 1992 | 104
partition godzilla:1954 -> runtime = 98
partition godzilla:1998 -> runtime = 140
partition godzilla:2014 -> runtime = 123
partition psy:1992 -> runtime = 104
SELECT * FROM videos_with_composite_pk
WHERE title = 'godzilla'
AND year = 1954;
42. Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).
43. Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen<video_encoding>,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Q: Retrieve videos an actor has appeared in (newest first).
44. Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen<video_encoding>,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
45. Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen<video_encoding>,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date, video_id)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
46. Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
actor text,
added_date timestamp,
video_id timeuuid,
character_name text,
description text,
encoding frozen<video_encoding>,
tags set<text>,
title text,
user_id uuid,
PRIMARY KEY ((actor), added_date, video_id, character_name)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
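Why the primary key grows from slide 44 to slide 46 can be shown with a small sketch, assuming writes behave as upserts keyed by the full primary key. The sample rows, the actor name and the helper upsert_all are hypothetical and only serve the illustration.

```python
def upsert_all(rows, key_columns):
    """Apply rows as upserts; rows sharing a primary key overwrite each other."""
    table = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        table[key] = row  # a later write silently replaces the earlier one
    return table

# Hypothetical data: two videos added at the same moment,
# and one video in which the actor plays two characters.
rows = [
    {"actor": "actor A", "added_date": "2015-01-01", "video_id": "v1", "character_name": "hero"},
    {"actor": "actor A", "added_date": "2015-01-01", "video_id": "v2", "character_name": "villain"},
    {"actor": "actor A", "added_date": "2015-01-01", "video_id": "v2", "character_name": "narrator"},
]

kept_44 = len(upsert_all(rows, ["actor", "added_date"]))
kept_45 = len(upsert_all(rows, ["actor", "added_date", "video_id"]))
kept_46 = len(upsert_all(rows, ["actor", "added_date", "video_id", "character_name"]))
```

With these rows the key from slide 44 keeps one row, the key from slide 45 keeps two, and only the key from slide 46 keeps all three.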
47. Modeling - composite partition key
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Q: Retrieve the last 1000 measurements from a given day.
48. Modeling - composite partition key
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (date desc, event_time desc);
Q: Retrieve the last 1000 measurements from a given day.
49. Modeling - composite partition key
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (date desc, event_time desc);
Q: Retrieve the last 1000 measurements from a given day.
1 day = 86 400 rows
1 week = 604 800 rows
1 month = 2 592 000 rows
1 year = 31 536 000 rows
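The row counts above follow from simple arithmetic, assuming one measurement per second per weather station and a 30-day month (assumptions inferred from the figures, not stated on the slide). A minimal sketch, with hypothetical names:

```python
# Assumed rate: one measurement per second for a single weather station.
SECONDS_PER_DAY = 24 * 60 * 60

def rows_in_partition(days):
    """Rows accumulated in one station's partition after the given number of days."""
    return days * SECONDS_PER_DAY

growth = {
    label: rows_in_partition(days)
    for label, days in [("day", 1), ("week", 7), ("month", 30), ("year", 365)]
}
```

When the partition key is the station alone, the partition keeps growing at this rate without bound. Adding the day to the partition key caps each partition at the one-day figure.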
50. Modeling - composite partition key
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Q: Retrieve the last 1000 measurements from a given day.
51. Modeling - TTL
CREATE TABLE temperature_by_day (
weather_station_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Retention policy - keep only the last week of data.
INSERT INTO temperature_by_day … USING TTL 604800;
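The TTL value is the retention period expressed in seconds. A small sketch of where 604800 comes from and when a row written with that TTL stops being visible; the helper is_expired is hypothetical and only models the rule, it is not a driver API.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(weeks=1)
ttl_seconds = int(RETENTION.total_seconds())  # value passed to USING TTL

def is_expired(written_at, now):
    """True once the retention period has fully elapsed since the write."""
    return now >= written_at + RETENTION

written = datetime(2016, 1, 1, tzinfo=timezone.utc)
still_visible = not is_expired(written, written + timedelta(days=6))
gone = is_expired(written, written + timedelta(days=7))
```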
52. Modeling - bit map index
CREATE TABLE car (
year timestamp,
model text,
color text,
vehicle_id int,
//other columns
PRIMARY KEY ((year, model, color), vehicle_id)
);
Q: Find car by year and/or model and/or color.
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES (2000, '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year=2000 and model='' and color='blue';
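The seven INSERT statements are every combination of the three attributes in which at least one is kept, with an empty value standing in for "any". A minimal sketch of generating those partition keys, assuming the empty string as the placeholder as on the slide; index_rows is a hypothetical helper.

```python
from itertools import product

def index_rows(year, model, color, empty=""):
    """Return every (year, model, color) key with at least one attribute kept."""
    values = (year, model, color)
    rows = []
    for keep in product([True, False], repeat=len(values)):
        if not any(keep):
            continue  # a key with every attribute blank would match nothing useful
        rows.append(tuple(v if k else empty for v, k in zip(values, keep)))
    return rows

keys = index_rows(2000, "Multipla", "blue")
```

For n attributes this writes 2^n - 1 rows per vehicle, so the pattern trades write volume and storage for single-partition reads.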
53. Modeling - wide rows
CREATE TABLE user (
email text,
name text,
age int,
PRIMARY KEY (email)
);
Q: Find user by email.
54. Modeling - wide rows
CREATE TABLE user (
domain text,
user text,
name text,
age int,
PRIMARY KEY ((domain), user)
);
Q: Find user by email.
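With the table from slide 54, looking a user up by email means splitting the address into the partition key (domain) and the clustering column (user) on the client side. A minimal sketch; split_email is a hypothetical helper and the address is made up.

```python
def split_email(email):
    """Split an address into (domain, user) for the two key columns."""
    user, sep, domain = email.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    # Domains are case-insensitive, so normalise to keep one partition per domain.
    return domain.lower(), user

domain, user = split_email("andrzej@Example.com")
```

The two values would then be bound to a query filtering on domain and user. Note that all users of one popular domain land in the same partition, which is what makes the row wide.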
55. Modeling - versioning with lightweight transactions
CREATE TABLE document (
id text,
content text,
version int,
locked_by text,
PRIMARY KEY ((id))
);
INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1)
IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null
WHERE id = 'my doc' IF locked_by = 'andrzej';
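The statements above form a compare-and-set protocol: each write is applied only if its IF condition holds at that moment. A toy single-process sketch of that behaviour follows; DocumentStore and its methods are hypothetical stand-ins, and a real cluster has to agree on the condition across replicas, which this model ignores.

```python
class DocumentStore:
    """Toy stand-in for the document table with conditional writes."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, doc_id, content, version):
        # Models INSERT ... IF NOT EXISTS
        if doc_id in self.rows:
            return False
        self.rows[doc_id] = {"content": content, "version": version, "locked_by": None}
        return True

    def update_if_locked_by(self, doc_id, expected, **changes):
        # Models UPDATE ... IF locked_by = <expected>
        row = self.rows.get(doc_id)
        if row is None or row["locked_by"] != expected:
            return False
        row.update(changes)
        return True

store = DocumentStore()
created = store.insert_if_not_exists("my doc", "some content", 1)
locked = store.update_if_locked_by("my doc", None, locked_by="andrzej")
# A second editor tries to take the lock and is refused.
stolen = store.update_if_locked_by("my doc", None, locked_by="someone else")
saved = store.update_if_locked_by(
    "my doc", "andrzej", content="better content", version=2, locked_by=None
)
```

Only the lock holder can save the new version, and saving releases the lock in the same conditional write.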
56. Modeling - JSON with UDT and tuples
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": "andrzej",
    "lastName": "ludwikowski",
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "x_dimension": "1",
  "y_dimension": "2"
}
CREATE TYPE age (
description text,
type text,
minimum int
);
CREATE TYPE prop (
firstName text,
lastName text,
age frozen <age>
);
CREATE TABLE json (
title text,
type text,
properties list<frozen <prop>>,
dimensions tuple<int, int>,
PRIMARY KEY (title)
);
57. Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlists and collections (Spotify)
- Personalization and recommendation engines (eBay)
- Messaging (Instagram)
- Event Sourcing!