Introduction to Data Modeling with
Apache Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
1 Relational Modeling vs. Cassandra
2 The Basics
3 CQL Collections
4 Relationships
5 Time Series Use Case
Relational Modeling vs. Cassandra
The Good ol’ Relational Database
• Been around a long time (first proposed in 1970)
• Data modeling is well understood (typically 3NF or higher)
• ACID guarantees are easy for developers to reason about
• SQL is ubiquitous and allows flexible querying
– JOINs, Sub SELECTs, etc.
Relational Data Modeling
• Five normal forms
• Foreign Keys
• Joins at read time
– Example SQL: Get employee and department for user id 5 (Helena Edelson)
Employees
Id  First   Last     DeptId
1   Luke    Tillman  201
2   Jon     Haddad   201
5   Helena  Edelson  205
Departments
Id   Dept
201  Evangelists
205  Engineering
SELECT e.First, e.Last, d.Dept
FROM Employees e
JOIN Departments d
ON e.DeptId = d.Id
WHERE e.Id = 5
Relational Data Modeling Thought Process
Data → Models → Application
Cassandra Data Modeling Thought Process
Application → Models → Data
CQL vs SQL
• Similar syntax in many cases, but...
• No Joins
• No Aggregations
– The earlier relational example cannot be expressed in CQL because it relies on a JOIN:
SELECT e.First, e.Last, d.Dept
FROM Employees e
JOIN Departments d
ON e.DeptId = d.Id
WHERE e.Id = 5
Denormalization
• Combine table columns into single view at write time
• No joins necessary
Employees
Id  First   Last     Dept
1   Luke    Tillman  Evangelists
2   Jon     Haddad   Evangelists
5   Helena  Edelson  Engineering
SELECT First, Last, Dept
FROM Employees
WHERE Id = 5
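As a rough illustration of the trade-off, the sketch below uses Python's built-in sqlite3 module as a stand-in relational database (an assumption made purely for demonstration; it is not Cassandra). It resolves the department name once at write time and stores it with each employee, so the read needs no join. The table name EmployeesDenorm is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (Id INTEGER PRIMARY KEY, Dept TEXT);
    CREATE TABLE Employees (Id INTEGER PRIMARY KEY, First TEXT, Last TEXT, DeptId INTEGER);
    -- Denormalized view (hypothetical name): the department name is copied into every row.
    CREATE TABLE EmployeesDenorm (Id INTEGER PRIMARY KEY, First TEXT, Last TEXT, Dept TEXT);
""")

departments = {201: "Evangelists", 205: "Engineering"}
employees = [
    (1, "Luke", "Tillman", 201),
    (2, "Jon", "Haddad", 201),
    (5, "Helena", "Edelson", 205),
]

conn.executemany("INSERT INTO Departments VALUES (?, ?)", list(departments.items()))
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", employees)

# Write time: look up the department name once and store it with the employee.
conn.executemany(
    "INSERT INTO EmployeesDenorm VALUES (?, ?, ?, ?)",
    [(emp_id, first, last, departments[dept_id])
     for emp_id, first, last, dept_id in employees],
)

# Read time, normalized: a join is required.
joined = conn.execute(
    "SELECT e.First, e.Last, d.Dept FROM Employees e "
    "JOIN Departments d ON e.DeptId = d.Id WHERE e.Id = 5"
).fetchone()

# Read time, denormalized: a single-table lookup returns the same answer.
denormalized = conn.execute(
    "SELECT First, Last, Dept FROM EmployeesDenorm WHERE Id = 5"
).fetchone()

print(joined, denormalized)
```

The cost is that a department rename would have to touch every copied row, which is the kind of extra write the deck argues Cassandra handles well.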
Sequences and Auto-Incrementing Ids
• Great for letting the RDBMS handle auto-generating Ids
• Guaranteed to be unique
• Needs ACID to work (uh oh)
INSERT INTO Employees (Id, First, Last)
VALUES (seq.NEXTVAL, 'Patrick', 'McFadin')
No More Sequences
• Almost impossible in a distributed system like Cassandra
• Couple of great choices instead:
– Natural Keys: Unique values like Email
– Surrogate Key: UUID (or GUID for MS folks)
• UUID: Universally Unique Identifier
– 128-bit number represented in character form
– Can be generated easily on the client side
99051fe9-6a9c-46c2-b949-38ef78858dd0
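A minimal sketch of client-side generation, using Python's standard uuid module (the choice of Python and of a random version 4 UUID are assumptions for illustration; any client language with a UUID library would do):

```python
import uuid

# Version 4 UUIDs are random, so each client can create ids without
# coordinating with the database or with other clients.
user_id = uuid.uuid4()

print(user_id)              # hyphenated text form, same shape as the value above
print(len(user_id.bytes))   # 16 bytes, i.e. 128 bits
```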
The Basics
Cassandra Data Modeling Thought Process
• Start with your application and the queries it needs to run
• Then build models to satisfy those queries
Application → Models → Data
Entity Table
• Query: Find user by id
• Simple view of a single user
• UUID used for ID
• Simple primary key
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
SELECT firstname, lastname
FROM users
WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
Entity Table – A reminder on Partition Keys
• First part of Primary Key is the Partition Key
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
userid       firstname  ...
689d56e5-…   Luke       ...
93357d73-…   Jon        ...
d978b136-…   Patrick    ...
More Complicated Primary Keys
• Query: Find comments for a video (most recent first)
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
SELECT commentid, userid, comment
FROM comments_by_video
WHERE videoid = 0fe6ab76-cf17-4664-abcc-4e363cee273f
LIMIT 10
Let's Break This Down
• TimeUUID: a UUID with a timestamp component
• Ordering by a TimeUUID is like ordering by its timestamp
Example: eeaca440-c745-11e4-8830-0800200c9a66 → 03/10/2015 16:53:09 GMT
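To see where that timestamp comes from, here is a small sketch that decodes the time component of a version 1 UUID with Python's standard library. The helper name timeuuid_to_datetime is hypothetical; the offset constant is the number of 100-nanosecond intervals between the UUID epoch (15 October 1582) and the Unix epoch.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(value: str) -> datetime:
    """Return the UTC timestamp embedded in a version 1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; convert to microseconds.
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)

comment_time = timeuuid_to_datetime("eeaca440-c745-11e4-8830-0800200c9a66")
print(comment_time.isoformat())
```

Because the timestamp is embedded in the id itself, sorting rows by the TimeUUID column gives chronological order without a separate timestamp column.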
• The Primary Key uniquely identifies a row, so a comment is
uniquely identified by its videoid and commentid
• The first part of the Primary Key is the Partition Key, so
comments for a given video will be stored together in a partition
• When we query for a given videoid, we only need to talk to
one partition (and thus one node), which is fast
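The idea that a partition key determines where data lives can be sketched with a toy placement function. This is a deliberate simplification and an assumption for illustration: Cassandra itself places partitions on a token ring (Murmur3Partitioner by default) and replicates them to several nodes, whereas the sketch below just hashes the key with MD5 and takes a modulo over a hypothetical three-node list.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster

def node_for(partition_key: str) -> str:
    """Toy placement: hash the partition key and pick a node by modulo."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# The same videoid always maps to the same node, so all of its comments
# can be read from one place.
print(node_for("0fe6ab76-cf17-4664-abcc-4e363cee273f"))

# Many distinct keys spread across the nodes.
placement = Counter(node_for(f"video-{i}") for i in range(9000))
print(placement)
```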
• The second part of the Primary Key is the Clustering Column(s)
• Inside a partition, comments for a given video will be ordered
by commentid
• Remember ordering by TimeUUID is ordering by timestamp
• We can specify a default clustering order when creating the
table which will affect the ordering of the data stored on disk
• Since our query was to get the latest comments for a video, we
order by commentid descending
Partition videoid='0fe6a...'
  commentid='82be1...' (10/1/2014 9:36AM): userid='ac346...', comment='Awesome!'
  commentid='765ac...' (9/17/2014 7:55AM): userid='f89d3...', comment='Garbage!'
This query will be fast
SELECT commentid, userid, comment
FROM comments_by_video
WHERE videoid = 0fe6ab76-cf17-4664-abcc-4e363cee273f
LIMIT 10
1. Locate single partition
2. Single seek on disk
3. Slice 10 latest rows and return
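The three steps above can be imitated with an in-memory toy model. This is only a sketch under stated assumptions: a Python dict stands in for the set of partitions, plain datetime values stand in for the TimeUUID clustering column, and the function names are hypothetical. It is not how Cassandra stores data internally.

```python
from collections import defaultdict
from datetime import datetime

# partition key (videoid) -> rows of (commented_at, userid, comment),
# kept newest-first to mimic CLUSTERING ORDER BY (commentid DESC).
comments_by_video = defaultdict(list)

def add_comment(videoid, commented_at, userid, comment):
    rows = comments_by_video[videoid]
    rows.append((commented_at, userid, comment))
    rows.sort(key=lambda row: row[0], reverse=True)

def latest_comments(videoid, limit=10):
    # Step 1: one lookup finds the partition.
    # Steps 2 and 3: rows are already ordered, so a slice returns the newest ones.
    return comments_by_video.get(videoid, [])[:limit]

add_comment("0fe6a...", datetime(2014, 9, 17, 7, 55), "f89d3...", "Garbage!")
add_comment("0fe6a...", datetime(2014, 10, 1, 9, 36), "ac346...", "Awesome!")

print(latest_comments("0fe6a...", limit=1))
```

The sorting work happens when a comment is written, which is why the read reduces to a lookup and a slice.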
Getting the most from queries
• Queries on Partition Key are fast
– Querying inside a single partition should be the goal
– Always specify a value for partition key when querying
• Queries on Partition Key and one or more Clustering Column(s)
are fast
– Again, inside a single partition should be the goal
– Use default ordering when creating the table to optimize if applicable
• Cassandra will give you errors if you try to stray
More than one way to query the same data
• New Query: Find comments made by a user (most recent first)
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
SELECT commentid, videoid, comment
FROM comments_by_user
WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
LIMIT 10
More than one way to query the same data
• Two views of the same data
• Use a batch when inserting to both tables
• Denormalize at write time to do efficient queries at read time
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
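One way to keep the two views in step is to send both inserts as a single CQL batch. The sketch below only builds the statement text and its parameter tuple; it does not talk to a cluster. The helper name comment_batch is hypothetical, and the `?` placeholders assume a prepared-statement style, which varies between drivers.

```python
def comment_batch(videoid, commentid, userid, comment):
    """Return (cql, params) for one batch that writes the comment to both tables."""
    cql = (
        "BEGIN BATCH\n"
        "  INSERT INTO comments_by_video (videoid, commentid, userid, comment)\n"
        "  VALUES (?, ?, ?, ?);\n"
        "  INSERT INTO comments_by_user (userid, commentid, videoid, comment)\n"
        "  VALUES (?, ?, ?, ?);\n"
        "APPLY BATCH;"
    )
    # Parameter order follows the column order of each INSERT above.
    params = (videoid, commentid, userid, comment,
              userid, commentid, videoid, comment)
    return cql, params

cql, params = comment_batch(
    "0fe6ab76-cf17-4664-abcc-4e363cee273f",   # videoid from the earlier query
    "eeaca440-c745-11e4-8830-0800200c9a66",   # commentid (TimeUUID example)
    "99051fe9-6a9c-46c2-b949-38ef78858dd0",   # userid from the earlier query
    "Awesome!",
)
print(cql)
```

The same commentid is used in both rows, so either table can be used to find the matching row in the other.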
CQL Collections
CQL Collection Basics
• Store a collection of related things in a column
• Meant to be dynamic part of a table
• Update syntax is very different from insert
• Reads require all of the collection to be read
CQL Set
• No duplicates, sorted by CQL type's comparator
INSERT INTO collections_example (id, set_example)
VALUES (1, {'Patrick', 'Jon', 'Luke'});
Column definition: set_example set<text>
– set_example: collection name (column name)
– set: collection type
– text: CQL type
CQL Set
• Adding an element to a set
UPDATE collections_example
SET set_example = set_example + {'Rebecca'}
WHERE id = 1
• Removing an element from a set
UPDATE collections_example
SET set_example = set_example - {'Luke'}
WHERE id = 1
CQL List
• Allows duplicates, sorted by insertion order
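• Use with caution: appends and prepends are not idempotent, and removing elements by value or position incurs an internal read-before-write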
INSERT INTO collections_example (id, list_example)
VALUES (1, ['Patrick', 'Jon', 'Luke']);
Column definition: list_example list<text>
– list_example: collection name (column name)
– list: collection type
– text: CQL type
CQL List
• Adding an element to the end of a list
UPDATE collections_example
SET list_example = list_example + ['Rebecca']
WHERE id = 1
• Adding an element to the beginning of a list
UPDATE collections_example
SET list_example = ['Rebecca'] + list_example
WHERE id = 1
• Removing an element from a list
UPDATE collections_example
SET list_example = list_example - ['Luke']
WHERE id = 1
CQL Map
• Key and value, sorted by key's CQL type comparator
INSERT INTO collections_example (id, map_example)
VALUES (1, { 'Patrick' : 72, 'Jon' : 33, 'Luke' : 34 });
Column definition: map_example map<text, int>
– map_example: collection name (column name)
– map: collection type
– text: key CQL type
– int: value CQL type
CQL Map
• Adding an element to a map
UPDATE collections_example
SET map_example['Rebecca'] = 29
WHERE id = 1
• Updating an existing element in a map
UPDATE collections_example
SET map_example['Jon'] = 34
WHERE id = 1
• Removing an element from a map
DELETE map_example['Luke']
FROM collections_example
WHERE id = 1
Relationships
Revisiting our One-to-Many Relationship
Employees
Id        First   Last     DeptId
7bc7a...  Luke    Tillman  5078c...
d7463...  Jon     Haddad   5078c...
8c26b...  Helena  Edelson  1d0f3...
Departments
Id        Dept
5078c...  Evangelists
1d0f3...  Engineering
Relationship: Department (1) has Employee (n)
Revisiting our One-to-Many Relationship
• Query: Get an employee and his/her department by employee id
– Denormalize department data
Employees
Id        First   Last     Dept
7bc7a...  Luke    Tillman  Evangelists
d7463...  Jon     Haddad   Evangelists
8c26b...  Helena  Edelson  Engineering
CREATE TABLE employees (
id uuid,
first text,
last text,
dept text,
PRIMARY KEY (id)
);
SELECT first, last, dept
FROM employees
WHERE id = 7bc7a...
What about the other side of the relationship?
• Query: Get all the employees for a given department
CREATE TABLE employees_by_dept (
dept_id uuid,
emp_id uuid,
first text,
last text,
dept text,
PRIMARY KEY (dept_id, emp_id)
);
SELECT first, last, dept
FROM employees_by_dept
WHERE dept_id = 5078c...
Partition dept_id='5078c...'
  emp_id='7bc7a...': dept='Evangelists', first='Luke', last='Tillman'
  emp_id='d7463...': dept='Evangelists', first='Jon', last='Haddad'
Static Columns
• Department name (dept) will be the same across all rows in the partition
• This is a good candidate for a static column
Static Columns
• For data that is shared across all rows in a partition, use static columns
• Updates to the value will affect all rows in the partition
CREATE TABLE employees_by_dept (
dept_id uuid,
emp_id uuid,
first text,
last text,
dept text STATIC,
PRIMARY KEY (dept_id, emp_id)
);
Partition dept_id='5078c...' (static: dept='Evangelists')
  emp_id='7bc7a...': first='Luke', last='Tillman'
  emp_id='d7463...': first='Jon', last='Haddad'
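The effect of a static column can be pictured with a toy model in which the shared value is stored once per partition and attached to every row on read. This is an illustrative sketch only: the DeptPartition class is hypothetical, and the renamed department value is made up for the demonstration and does not come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class DeptPartition:
    """Toy stand-in for one employees_by_dept partition."""
    dept: str                                      # static: one value per partition
    employees: dict = field(default_factory=dict)  # emp_id -> (first, last)

    def rows(self):
        # Every row read from the partition carries the same static value.
        return [(emp_id, first, last, self.dept)
                for emp_id, (first, last) in self.employees.items()]

partition = DeptPartition(dept="Evangelists")
partition.employees["7bc7a..."] = ("Luke", "Tillman")
partition.employees["d7463..."] = ("Jon", "Haddad")

# A single update to the static value is visible on every row.
partition.dept = "Developer Relations"  # hypothetical new name
print(partition.rows())
```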
Time Series Use Case
Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Weather Station
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model for Queries
• Store data per weather station
• Store time series in order: first to last
Weather Station
• Weather station id and time are unique
• Store as many as needed
CREATE TABLE temperatures (
weather_station text,
year int,
month int,
day int,
hour int,
temperature double,
PRIMARY KEY (
weather_station, year, month, day, hour)
);
INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.6);
INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 8, -5.1);
INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 9, -4.9);
INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 10, -5.3);
Storage Model: Logical View
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
weather_station  hour  temperature
10010:99999      7     -5.6
10010:99999      8     -5.1
10010:99999      9     -4.9
10010:99999      10    -5.3
Storage Model: Disk Layout
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3
Storage Model: Disk Layout
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3 | 2005:12:1:11
Merged, Sorted, and Stored Sequentially
Query Patterns
• Range queries
• "Slice" operation on disk
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10
10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3 | 2005:12:1:11
Partition key for locality
Single seek on disk
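The slice idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: a sorted in-memory list stands in for one weather station's partition, tuples of (year, month, day, hour) stand in for the clustering columns, and the function name slice_hours is hypothetical. Only readings that appear in the slides are used.

```python
from bisect import bisect_left, bisect_right

# One partition (weather_station = '10010:99999'), rows sorted by the
# clustering columns (year, month, day, hour) as in the PRIMARY KEY.
partition = [
    ((2005, 12, 1, 7), -5.6),
    ((2005, 12, 1, 8), -5.1),
    ((2005, 12, 1, 9), -4.9),
    ((2005, 12, 1, 10), -5.3),
]
keys = [key for key, _ in partition]

def slice_hours(year, month, day, first_hour, last_hour):
    """Return the contiguous run of rows whose hour is in [first_hour, last_hour]."""
    lo = bisect_left(keys, (year, month, day, first_hour))
    hi = bisect_right(keys, (year, month, day, last_hour))
    return partition[lo:hi]

print(slice_hours(2005, 12, 1, 8, 9))
```

Because the rows are kept in clustering order, a range query only needs to find the start and end positions and read everything in between.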
Query Patterns
• Range queries
• "Slice" operation on disk
weather_station  hour  temperature
10010:99999      7     -5.6
10010:99999      8     -5.1
10010:99999      9     -4.9
10010:99999      10    -5.3
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10
Query Patterns
• Programmers like this
weather_station  hour  temperature
10010:99999      7     -5.6
10010:99999      8     -5.1
10010:99999      9     -4.9
10010:99999      10    -5.3
SELECT weather_station, hour, temperature
FROM temperatures
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10
Sorted in time order
Takeaway: Goals of Cassandra Data Modeling
• Spread data evenly around the cluster
– Choose a good Primary Key (particularly, the Partition Key portion)
• Minimize the number of partitions read for a given query
– Remember: Partitions are spread out around the cluster
• Do not worry about:
– Minimizing the number of writes: Cassandra is really fast at writes
– Minimizing data duplication: this is not 3NF from RDBMS, disk is cheap
Questions?
Follow me for updates or to ask questions later: @LukeTillman
53

Contenu connexe

Tendances

Cassandra Summit 2014: Real Data Models of Silicon Valley
Cassandra Summit 2014: Real Data Models of Silicon ValleyCassandra Summit 2014: Real Data Models of Silicon Valley
Cassandra Summit 2014: Real Data Models of Silicon ValleyDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Via forensics icloud-keychain_passwords_13
Via forensics icloud-keychain_passwords_13Via forensics icloud-keychain_passwords_13
Via forensics icloud-keychain_passwords_13viaForensics
 
Integrating OpenStack with Active Directory
Integrating OpenStack with Active DirectoryIntegrating OpenStack with Active Directory
Integrating OpenStack with Active Directorycjellick
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 
Walkthrough Neo4j 1.9 & 2.0
Walkthrough Neo4j 1.9 & 2.0Walkthrough Neo4j 1.9 & 2.0
Walkthrough Neo4j 1.9 & 2.0Neo4j
 
Keystone deep dive 1
Keystone deep dive 1Keystone deep dive 1
Keystone deep dive 1Jsonr4
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...All Things Open
 
Improving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsImproving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsTim Donohue
 
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 201510 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015Scott Sutherland
 
DataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with JavaDataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with Javacarolinedatastax
 
Capture, record, clip, embed and play, search: video from newbie to ninja
Capture, record, clip, embed and play, search: video from newbie to ninjaCapture, record, clip, embed and play, search: video from newbie to ninja
Capture, record, clip, embed and play, search: video from newbie to ninjaVito Flavio Lorusso
 
Lucene/Solr 8: The Next Major Release Steve Rowe, Lucidworks
Lucene/Solr 8: The Next Major Release Steve Rowe, LucidworksLucene/Solr 8: The Next Major Release Steve Rowe, Lucidworks
Lucene/Solr 8: The Next Major Release Steve Rowe, LucidworksLucidworks
 
DSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital LibraryDSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital Libraryrajivkumarmca
 
Async servers and clients in Rest.li
Async servers and clients in Rest.liAsync servers and clients in Rest.li
Async servers and clients in Rest.liKaran Parikh
 
DSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDuraSpace
 
Keystone - Openstack Identity Service
Keystone - Openstack Identity Service Keystone - Openstack Identity Service
Keystone - Openstack Identity Service Prasad Mukhedkar
 

Tendances (20)

Cassandra Summit 2014: Real Data Models of Silicon Valley
Cassandra Summit 2014: Real Data Models of Silicon ValleyCassandra Summit 2014: Real Data Models of Silicon Valley
Cassandra Summit 2014: Real Data Models of Silicon Valley
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Via forensics icloud-keychain_passwords_13
Via forensics icloud-keychain_passwords_13Via forensics icloud-keychain_passwords_13
Via forensics icloud-keychain_passwords_13
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Integrating OpenStack with Active Directory
Integrating OpenStack with Active DirectoryIntegrating OpenStack with Active Directory
Integrating OpenStack with Active Directory
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 
Walkthrough Neo4j 1.9 & 2.0
Walkthrough Neo4j 1.9 & 2.0Walkthrough Neo4j 1.9 & 2.0
Walkthrough Neo4j 1.9 & 2.0
 
Keystone deep dive 1
Keystone deep dive 1Keystone deep dive 1
Keystone deep dive 1
 
Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...Leveraging Open Source for Database Development: Database Version Control wit...
Leveraging Open Source for Database Development: Database Version Control wit...
 
Improving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & MigrationsImproving DSpace Backups, Restores & Migrations
Improving DSpace Backups, Restores & Migrations
 
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 201510 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015
10 Deadly Sins of SQL Server Configuration - APPSEC CALIFORNIA 2015
 
DataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with JavaDataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with Java
 
Hadoop Hive
Hadoop HiveHadoop Hive
Hadoop Hive
 
Introduction to DSpace
Introduction to DSpaceIntroduction to DSpace
Introduction to DSpace
 
Capture, record, clip, embed and play, search: video from newbie to ninja
Capture, record, clip, embed and play, search: video from newbie to ninjaCapture, record, clip, embed and play, search: video from newbie to ninja
Capture, record, clip, embed and play, search: video from newbie to ninja
 
Lucene/Solr 8: The Next Major Release Steve Rowe, Lucidworks
Lucene/Solr 8: The Next Major Release Steve Rowe, LucidworksLucene/Solr 8: The Next Major Release Steve Rowe, Lucidworks
Lucene/Solr 8: The Next Major Release Steve Rowe, Lucidworks
 
DSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital LibraryDSpace Tutorial : Open Source Digital Library
DSpace Tutorial : Open Source Digital Library
 
Async servers and clients in Rest.li
Async servers and clients in Rest.liAsync servers and clients in Rest.li
Async servers and clients in Rest.li
 
DSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & Configuration
 
Keystone - Openstack Identity Service
Keystone - Openstack Identity Service Keystone - Openstack Identity Service
Keystone - Openstack Identity Service
 

En vedette

Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraLuke Tillman
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Luke Tillman
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraLuke Tillman
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraLuke Tillman
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersLuke Tillman
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Luke Tillman
 
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...Luke Tillman
 

En vedette (7)

Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for Cassandra
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET Developers
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
 
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
 

Similaire à Introduction to Data Modeling with Apache Cassandra

Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101DataStax Academy
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101DataStax Academy
 
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101DataStax Academy
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckDataStax Academy
 
Cassandra Community Webinar | Become a Super Modeler
Cassandra Community Webinar | Become a Super ModelerCassandra Community Webinar | Become a Super Modeler
Cassandra Community Webinar | Become a Super ModelerDataStax
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceDataStax Academy
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Implementing Tables and Views.pptx
Implementing Tables and Views.pptxImplementing Tables and Views.pptx
Implementing Tables and Views.pptxLuisManuelUrbinaAmad
 
Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!Christopher Batey
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Manchester Hadoop User Group: Cassandra Intro
Manchester Hadoop User Group: Cassandra IntroManchester Hadoop User Group: Cassandra Intro
Manchester Hadoop User Group: Cassandra IntroChristopher Batey
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupChristopher Batey
 

Similaire à Introduction to Data Modeling with Apache Cassandra (20)

Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101
 
Cassandra Day Atlanta 2015: Data Modeling 101
Introduction to Data Modeling with Apache Cassandra

  • 1. Introduction to Data Modeling with Apache Cassandra
      Luke Tillman (@LukeTillman), Language Evangelist at DataStax
  • 2.
      1. Relational Modeling vs. Cassandra
      2. The Basics
      3. CQL Collections
      4. Relationships
      5. Time Series Use Case
  • 4. The Good ol’ Relational Database
      • Been around a long time (first proposed in 1970)
      • Data modeling is well understood (typically 3NF or higher)
      • ACID guarantees are easy for developers to reason about
      • SQL is ubiquitous and allows flexible querying
          – JOINs, Sub SELECTs, etc.
  • 5. Relational Data Modeling
      • Five normal forms
      • Foreign Keys
      • Joins at read time
          – Example SQL: Get employee and department for user id 5 (Helena Edelson)

      Employees
      Id  First   Last     DeptId
      1   Luke    Tillman  201
      2   Jon     Haddad   201
      5   Helena  Edelson  205

      Departments
      Id   Dept
      201  Evangelists
      205  Engineering

      SELECT e.First, e.Last, d.Dept
      FROM Employees e
      JOIN Departments d ON e.DeptId = d.Id
      WHERE e.Id = 5
  • 6. Relational Data Modeling Thought Process
      Data → Models → Application
  • 7. Cassandra Data Modeling Thought Process
      Application → Models → Data
  • 8. CQL vs SQL
      • Similar syntax in many cases, but...
      • No Joins
      • No Aggregations

      Employees
      Id  First   Last     DeptId
      1   Luke    Tillman  201
      2   Jon     Haddad   201
      5   Helena  Edelson  205

      Departments
      Id   Dept
      201  Evangelists
      205  Engineering

      SELECT e.First, e.Last, d.Dept
      FROM Employees e
      JOIN Departments d ON e.DeptId = d.Id
      WHERE e.Id = 5
  • 9. Denormalization
      • Combine table columns into single view at write time
      • No joins necessary

      Employees
      Id  First   Last     Dept
      1   Luke    Tillman  Evangelists
      2   Jon     Haddad   Evangelists
      5   Helena  Edelson  Engineering

      SELECT First, Last, Dept
      FROM Employees
      WHERE Id = 5
  • 10. Sequences and Auto-Incrementing Ids
      • Great for letting the RDBMS handle auto-generating Ids
      • Guaranteed to be unique
      • Needs ACID to work (uh oh)

      INSERT INTO Employees (Id, First, Last)
      VALUES (seq.nextVal(), 'Patrick', 'McFadin')
  • 11. No More Sequences
      • Almost impossible in a distributed system like Cassandra
      • Couple of great choices instead:
          – Natural Keys: Unique values like Email
          – Surrogate Key: UUID (or GUID for MS folks)
      • UUID: Universally Unique Identifier
          – 128-bit number represented in character form
          – Can be generated easily on the client side
          – Example: 99051fe9-6a9c-46c2-b949-38ef78858dd0
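As a rough illustration of "generated easily on the client side", here is a minimal Python sketch using only the standard library `uuid` module. The variable name `user_id` is an assumption made for the example; the deck itself does not show client code.

```python
import uuid

# Generate a random (version 4) UUID on the client.
# No round trip to the database and no shared sequence is required.
user_id = uuid.uuid4()

# The text form is 36 characters: 32 hex digits in 8-4-4-4-12 groups.
print(user_id)

# The example value from the slide parses as a version 4 UUID as well.
slide_example = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
print(slide_example.version)
```

Because each client draws from a 128-bit space, collisions are extremely unlikely in practice, which is what lets the id be assigned without coordination between nodes.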
  • 13. Cassandra Data Modeling Thought Process
      • Start with your application and the queries it needs to run
      • Then build models to satisfy those queries

      Application → Models → Data
  • 14. Entity Table
      • Query: Find user by id
      • Simple view of a single user
      • UUID used for ID
      • Simple primary key

      CREATE TABLE users (
        userid uuid,
        firstname text,
        lastname text,
        email text,
        created_date timestamp,
        PRIMARY KEY (userid)
      );

      SELECT firstname, lastname
      FROM users
      WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
  • 15. Entity Table – A reminder on Partition Keys
      • First part of Primary Key is the Partition Key

      CREATE TABLE users (
        userid uuid,
        firstname text,
        lastname text,
        email text,
        created_date timestamp,
        PRIMARY KEY (userid)
      );

      userid        firstname  ...
      689d56e5- …   Luke       ...
      93357d73- …   Jon        ...
      d978b136- …   Patrick    ...
  • 16. More Complicated Primary Keys
      • Query: Find comments for a video (most recent first)

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);

      SELECT commentid, userid, comment
      FROM comments_by_video
      WHERE videoid = 0fe6ab76-cf17-4664-abcc-4e363cee273f
      LIMIT 10
  • 17. Let's Break This Down
      • TimeUUID: a UUID with a timestamp component
      • Ordering by a TimeUUID is like ordering by its timestamp
      • Example: eeaca440-c745-11e4-8830-0800200c9a66 (timestamp: 03/10/2015 16:53:09 GMT)

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
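To make the "timestamp component" concrete, the following Python sketch decodes the time embedded in the slide's example TimeUUID. It assumes a TimeUUID is a version 1 UUID, whose timestamp counts 100-nanosecond intervals since 15 October 1582; the helper name `timeuuid_to_datetime` is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch used by version 1 UUID timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Return the UTC time embedded in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # UUID.time is a count of 100-nanosecond intervals since the epoch above.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)

comment_id = uuid.UUID("eeaca440-c745-11e4-8830-0800200c9a66")
comment_time = timeuuid_to_datetime(comment_id)
print(comment_time)
```

Note that sorting the raw UUID strings would not give time order, because the low bits of the timestamp come first in the text form; ordering has to use the decoded timestamp, which is what the slide means by "ordering by its timestamp".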
  • 18. Let's Break This Down
      • The Primary Key uniquely identifies a row, so a comment is uniquely identified by its videoid and commentid

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 19. Let's Break This Down
      • The first part of the Primary Key is the Partition Key, so comments for a given video will be stored together in a partition
      • When we query for a given videoid, we only need to talk to one partition (and thus one node), which is fast

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 20. Let's Break This Down
      • The second part of the Primary Key is the Clustering Column(s)
      • Inside a partition, comments for a given video will be ordered by commentid
      • Remember ordering by TimeUUID is ordering by timestamp

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 21. Let's Break This Down
      • We can specify a default clustering order when creating the table which will affect the ordering of the data stored on disk
      • Since our query was to get the latest comments for a video, we order by commentid descending

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 22. Let's Break This Down

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);

      Partition videoid='0fe6a...'
          commentid='82be1...' (10/1/2014 9:36AM)   userid='ac346...'  comment='Awesome!'
          commentid='765ac...' (9/17/2014 7:55AM)   userid='f89d3...'  comment='Garbage!'
  • 23. This query will be fast

      Partition videoid='0fe6a...'
          commentid='82be1...' (10/1/2014 9:36AM)   userid='ac346...'  comment='Awesome!'
          commentid='765ac...' (9/17/2014 7:55AM)   userid='f89d3...'  comment='Garbage!'

      SELECT commentid, userid, comment
      FROM comments_by_video
      WHERE videoid = 0fe6ab76-cf17-4664-abcc-4e363cee273f
      LIMIT 10

      1. Locate single partition
      2. Single seek on disk
      3. Slice 10 latest rows and return
  • 24. Getting the most from queries
      • Queries on Partition Key are fast
          – Querying inside a single partition should be the goal
          – Always specify a value for partition key when querying
      • Queries on Partition Key and one or more Clustering Column(s) are fast
          – Again, inside a single partition should be the goal
          – Use default ordering when creating the table to optimize if applicable
      • Cassandra will give you errors if you try to stray
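The partition-then-slice behaviour described in slides 19 to 24 can be modelled loosely in plain Python. This is only an in-memory sketch, not how Cassandra stores data: the dictionary, the function names `insert_comment` and `latest_comments`, and the use of ISO time strings in place of TimeUUIDs are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for comments_by_video:
# partition key (videoid) -> rows kept sorted by the clustering value, newest first.
comments_by_video = defaultdict(list)

def insert_comment(videoid, comment_time, userid, comment):
    rows = comments_by_video[videoid]
    rows.append((comment_time, userid, comment))
    # Mimic CLUSTERING ORDER BY (commentid DESC), using the comment time as the sort key.
    rows.sort(key=lambda row: row[0], reverse=True)

def latest_comments(videoid, limit=10):
    # One partition lookup, then a slice from the front of the already sorted rows.
    return comments_by_video[videoid][:limit]

insert_comment("0fe6a", "2014-09-17T07:55", "f89d3", "Garbage!")
insert_comment("0fe6a", "2014-10-01T09:36", "ac346", "Awesome!")
insert_comment("other", "2014-10-02T10:00", "ac346", "Different video")

print(latest_comments("0fe6a", limit=1))
```

The point of the sketch is that the read never scans other partitions and never sorts at query time; the ordering work was done when the row was written.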
  • 25. More than one way to query the same data
      • New Query: Find comments made by a user (most recent first)

      CREATE TABLE comments_by_user (
        userid uuid,
        commentid timeuuid,
        videoid uuid,
        comment text,
        PRIMARY KEY (userid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);

      SELECT commentid, videoid, comment
      FROM comments_by_user
      WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
      LIMIT 10
  • 26. More than one way to query the same data
      • Two views of the same data
      • Use a batch when inserting to both tables
      • Denormalize at write time to do efficient queries at read time

      CREATE TABLE comments_by_user (
        userid uuid,
        commentid timeuuid,
        videoid uuid,
        comment text,
        PRIMARY KEY (userid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);

      CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
      ) WITH CLUSTERING ORDER BY (commentid DESC);
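A small Python sketch of the "two views of the same data" idea: every comment is written twice, once under each partition key, so each query reads a single partition. The dictionaries and the `add_comment` helper are hypothetical stand-ins; a plain function call like this does not provide the guarantees of a Cassandra batch, it only shows the shape of the write.

```python
# Hypothetical in-memory stand-ins for the two tables, keyed by partition key.
comments_by_video = {}
comments_by_user = {}

def add_comment(videoid, userid, commentid, comment):
    """Write the same comment into both views (the role the batch plays in CQL)."""
    comments_by_video.setdefault(videoid, {})[commentid] = (userid, comment)
    comments_by_user.setdefault(userid, {})[commentid] = (videoid, comment)

add_comment("video-1", "user-a", "c1", "Awesome!")
add_comment("video-1", "user-b", "c2", "Garbage!")
add_comment("video-2", "user-a", "c3", "Nice")

# Each read touches exactly one key of one table.
print(sorted(comments_by_video["video-1"]))
print(sorted(comments_by_user["user-a"]))
```

The cost is an extra write and duplicated data; the benefit is that neither read needs a join or a scan.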
  • 28. CQL Collection Basics
      • Store a collection of related things in a column
      • Meant to be dynamic part of a table
      • Update syntax is very different from insert
      • Reads require all of the collection to be read
  • 29. CQL Set
      • No duplicates, sorted by CQL type's comparator
      • Column definition: set_example set<text>
          – set_example: collection name (column name)
          – set: collection type
          – text: CQL type

      INSERT INTO collections_example (id, set_example)
      VALUES (1, {'Patrick', 'Jon', 'Luke'});
  • 30. CQL Set
      • Adding an element to a set

      UPDATE collections_example
      SET set_example = set_example + {'Rebecca'}
      WHERE id = 1

      • Removing an element from a set

      UPDATE collections_example
      SET set_example = set_example - {'Luke'}
      WHERE id = 1
  • 31. CQL List
      • Allows duplicates, sorted by insertion order
      • Use with caution
      • Column definition: list_example list<text>
          – list_example: collection name (column name)
          – list: collection type
          – text: CQL type

      INSERT INTO collections_example (id, list_example)
      VALUES (1, ['Patrick', 'Jon', 'Luke']);
  • 32. CQL List
      • Adding an element to the end of a list

      UPDATE collections_example
      SET list_example = list_example + ['Rebecca']
      WHERE id = 1

      • Adding an element to the beginning of a list

      UPDATE collections_example
      SET list_example = ['Rebecca'] + list_example
      WHERE id = 1

      • Removing an element from a list

      UPDATE collections_example
      SET list_example = list_example - ['Luke']
      WHERE id = 1
  • 33. CQL Map
      • Key and value, sorted by key's CQL type comparator
      • Column definition: map_example map<text, int>
          – map_example: collection name (column name)
          – map: collection type
          – text: key CQL type
          – int: value CQL type

      INSERT INTO collections_example (id, map_example)
      VALUES (1, { 'Patrick' : 72, 'Jon' : 33, 'Luke' : 34 });
  • 34. CQL Map
      • Adding an element to a map

      UPDATE collections_example
      SET map_example['Rebecca'] = 29
      WHERE id = 1

      • Updating an existing element in a map

      UPDATE collections_example
      SET map_example['Jon'] = 34
      WHERE id = 1

      • Removing an element from a map

      DELETE map_example['Luke']
      FROM collections_example
      WHERE id = 1
  • 36. Revisiting our One-to-Many Relationship

      Department (1) has (n) Employee

      Employees
      Id        First   Last     DeptId
      7bc7a...  Luke    Tillman  5078c...
      d7463...  Jon     Haddad   5078c...
      8c26b...  Helena  Edelson  1d0f3...

      Departments
      Id        Dept
      5078c...  Evangelists
      1d0f3...  Engineering
  • 37. Revisiting our One-to-Many Relationship
      • Query: Get an employee and his/her department by employee id
          – Denormalize department data

      Employees
      Id        First   Last     Dept
      7bc7a...  Luke    Tillman  Evangelists
      d7463...  Jon     Haddad   Evangelists
      8c26b...  Helena  Edelson  Engineering

      CREATE TABLE employees (
        id uuid,
        first text,
        last text,
        dept text,
        PRIMARY KEY (id)
      );

      SELECT first, last, dept
      FROM employees
      WHERE id = 7bc7a...
  • 38. What about the other side of the relationship?
      • Query: Get all the employees for a given department

      CREATE TABLE employees_by_dept (
        dept_id uuid,
        emp_id uuid,
        first text,
        last text,
        dept text,
        PRIMARY KEY (dept_id, emp_id)
      );

      SELECT first, last, dept
      FROM employees_by_dept
      WHERE dept_id = 5078c...
  • 39. What about the other side of the relationship?

      CREATE TABLE employees_by_dept (
        dept_id uuid,
        emp_id uuid,
        first text,
        last text,
        dept text,
        PRIMARY KEY (dept_id, emp_id)
      );

      Partition dept_id='5078c...'
          emp_id='7bc7a...'   dept='Evangelists'  first='Luke'  last='Tillman'
          emp_id='d7463...'   dept='Evangelists'  first='Jon'   last='Haddad'
  • 40. Static Columns
      • Department name (dept) will be the same across all rows in the partition
      • This is a good candidate for a static column

      CREATE TABLE employees_by_dept (
        dept_id uuid,
        emp_id uuid,
        first text,
        last text,
        dept text,
        PRIMARY KEY (dept_id, emp_id)
      );

      Partition dept_id='5078c...'
          emp_id='7bc7a...'   dept='Evangelists'  first='Luke'  last='Tillman'
          emp_id='d7463...'   dept='Evangelists'  first='Jon'   last='Haddad'
  • 41. Static Columns
      • For data that is shared across all rows in a partition, use static columns
      • Updates to the value will affect all rows in the partition

      CREATE TABLE employees_by_dept (
        dept_id uuid,
        emp_id uuid,
        first text,
        last text,
        dept text STATIC,
        PRIMARY KEY (dept_id, emp_id)
      );

      Partition dept_id='5078c...'   dept='Evangelists' (static)
          emp_id='7bc7a...'   first='Luke'  last='Tillman'
          emp_id='d7463...'   first='Jon'   last='Haddad'
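The effect of a static column can be pictured with a small Python model in which the shared value lives once on the partition rather than on each row. The `Partition` class and the replacement department name "Developer Relations" are invented for this sketch and do not come from the deck.

```python
class Partition:
    """Hypothetical model of one employees_by_dept partition."""

    def __init__(self, dept_id, dept):
        self.dept_id = dept_id
        self.dept = dept   # static: stored once for the whole partition
        self.rows = {}     # emp_id -> (first, last)

    def add_employee(self, emp_id, first, last):
        self.rows[emp_id] = (first, last)

    def select(self):
        # Every returned row carries the single shared dept value.
        return [(first, last, self.dept) for first, last in self.rows.values()]

evangelists = Partition("5078c", "Evangelists")
evangelists.add_employee("7bc7a", "Luke", "Tillman")
evangelists.add_employee("d7463", "Jon", "Haddad")

# One write to the static value changes what every row in the partition returns.
evangelists.dept = "Developer Relations"
print(evangelists.select())
```

Without the static column, renaming the department would mean rewriting the `dept` value on every employee row in the partition.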
  • 42. Time Series Use Case
  • 43. Weather Station
      • Weather station collects data
      • Cassandra stores in sequence
      • Application reads in sequence
  • 44. Weather Station
      Needed Queries
      • Get all data for one weather station
      • Get data for a single date and time
      • Get data for a range of dates and times
      Data Model for Queries
      • Store data per weather station
      • Store time series in order: first to last
  • 45. Weather Station
      • Weather station id and time are unique
      • Store as many as needed

      CREATE TABLE temperatures (
        weather_station text,
        year int,
        month int,
        day int,
        hour int,
        temperature double,
        PRIMARY KEY (weather_station, year, month, day, hour)
      );

      INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
      VALUES ('10010:99999', 2005, 12, 1, 7, -5.6);
      INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
      VALUES ('10010:99999', 2005, 12, 1, 8, -5.1);
      INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
      VALUES ('10010:99999', 2005, 12, 1, 9, -4.9);
      INSERT INTO temperatures (weather_station, year, month, day, hour, temperature)
      VALUES ('10010:99999', 2005, 12, 1, 10, -5.3);
  • 46. Storage Model: Logical View

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'

      weather_station  hour  temperature
      10010:99999      7     -5.6
      10010:99999      8     -5.1
      10010:99999      9     -4.9
      10010:99999      10    -5.3
  • 47. Storage Model: Disk Layout

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'

      10010:99999 | 2005:12:1:7 | 2005:12:1:8 | 2005:12:1:9 | 2005:12:1:10
                  | -5.6        | -5.1        | -4.9        | -5.3
  • 48. Storage Model: Disk Layout
      • Merged, Sorted, and Stored Sequentially

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'

      10010:99999 | 2005:12:1:7 | 2005:12:1:8 | 2005:12:1:9 | 2005:12:1:10 | 2005:12:1:11
                  | -5.6        | -5.1        | -4.9        | -5.3         |
  • 49. Query Patterns
      • Range queries
      • "Slice" operation on disk
      • Partition key for locality
      • Single seek on disk

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'
        AND year = 2005 AND month = 12 AND day = 1
        AND hour >= 7 AND hour <= 10

      10010:99999 | 2005:12:1:7 | 2005:12:1:8 | 2005:12:1:9 | 2005:12:1:10 | 2005:12:1:11
                  | -5.6        | -5.1        | -4.9        | -5.3         |
  • 50. Query Patterns
      • Range queries
      • "Slice" operation on disk

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'
        AND year = 2005 AND month = 12 AND day = 1
        AND hour >= 7 AND hour <= 10

      weather_station  hour  temperature
      10010:99999      7     -5.6
      10010:99999      8     -5.1
      10010:99999      9     -4.9
      10010:99999      10    -5.3
  • 51. Query Patterns
      • Programmers like this
      • Sorted in time order

      SELECT weather_station, hour, temperature
      FROM temperatures
      WHERE weather_station = '10010:99999'
        AND year = 2005 AND month = 12 AND day = 1
        AND hour >= 7 AND hour <= 10

      weather_station  hour  temperature
      10010:99999      7     -5.6
      10010:99999      8     -5.1
      10010:99999      9     -4.9
      10010:99999      10    -5.3
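The "slice" on sorted clustering columns can be approximated in Python with a binary search over a sorted list. This is a sketch of the idea only, assuming one partition held in memory; the `slice_hours` helper and the list layout are hypothetical and do not reflect Cassandra's on-disk format.

```python
from bisect import bisect_left, bisect_right

# Rows of one partition ('10010:99999'), sorted by the clustering columns
# (year, month, day, hour), each paired with its temperature.
partition = [
    ((2005, 12, 1, 7), -5.6),
    ((2005, 12, 1, 8), -5.1),
    ((2005, 12, 1, 9), -4.9),
    ((2005, 12, 1, 10), -5.3),
]
keys = [key for key, _ in partition]

def slice_hours(year, month, day, first_hour, last_hour):
    """Return rows with first_hour <= hour <= last_hour on the given day."""
    start = bisect_left(keys, (year, month, day, first_hour))
    end = bisect_right(keys, (year, month, day, last_hour))
    # The result is one contiguous run of the sorted rows.
    return partition[start:end]

morning = slice_hours(2005, 12, 1, 8, 9)
print(morning)
```

Because the rows are already in time order, the range query reduces to finding two positions and returning everything between them, with no filtering or sorting at read time.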
  • 52. Takeaway: Goals of Cassandra Data Modeling
      • Spread data evenly around the cluster
          – Choose a good Primary Key (particularly, the Partition Key portion)
      • Minimize the number of partitions read for a given query
          – Remember: Partitions are spread out around the cluster
      • Do not worry about:
          – Minimizing the number of writes: Cassandra is really fast at writes
          – Minimizing data duplication: this is not 3NF from RDBMS, disk is cheap
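To see why the Partition Key choice drives how evenly data spreads, here is a deliberately simplified Python sketch that hashes partition keys onto a fixed number of nodes. It is an assumption-laden toy: Cassandra's partitioner assigns token ranges (Murmur3 by default) rather than taking a SHA-256 hash modulo the node count, and `NODES`, `node_for`, and the key names are invented for the example.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size

def node_for(partition_key: str) -> int:
    """Map a partition key to a node by hashing it (simplified placement)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES

# High-cardinality key: one partition per weather station.
by_station = Counter(node_for(f"station-{i}") for i in range(10_000))

# Low-cardinality key: every row shares one partition key value.
by_constant = Counter(node_for("all-stations") for _ in range(10_000))

print(sorted(by_station.values()))
print(sorted(by_constant.values()))
```

With many distinct keys the rows land on all nodes in roughly equal shares, while a single shared key sends every row to one node, which is the hot-spot the first takeaway warns against.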
  • 53. Questions?
      Follow me for updates or to ask questions later: @LukeTillman