SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
A Cassandra Data Model for Serving up 
Cat Videos 
Luke Tillman (@LukeTillman) 
Language Evangelist at DataStax
Who are you?! 
• Evangelist with a focus on the .NET Community 
• Long-time Developer 
• Recently presented at Cassandra Summit 2014 with Microsoft 
• Very Recent Denver Transplant 
2
1 What is this KillrVideo thing you speak of? 
2 Know Thy Data and Thy Queries 
3 Denormalize All the Things 
4 Building Healthy Relationships 
5 Knowing Your Limits 
3
What is this KillrVideo thing you speak of? 
4
KillrVideo, a Video Sharing Site 
• Think a YouTube competitor 
– Users add videos, rate them, comment on them, etc. 
– Can search for videos by tag
See the Live Demo, Get the Code 
• Live demo available at http://www.killrvideo.com 
– Written in C# 
– Live Demo running in Azure 
– Open source: https://github.com/luketillman/killrvideo-csharp 
• Interesting use case because of different data modeling 
challenges and the scale of something like YouTube 
– More than 1 billion unique users visit YouTube each month 
– 100 hours of video are uploaded to YouTube every minute 
6
Just How Popular are Cats on the Internet? 
7 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Just How Popular are Cats on the Internet? 
8 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Know Thy Data and Thy Queries
Getting to Know Your Data 
• What things do I have in the system? 
• What are the relationships between them? 
• This is your conceptual data model 
• You already do this in the RDBMS world
Some of the Entities and Relationships in KillrVideo 
11 
User 
id 
firstname 
lastname 
email 
password Video 
id 
name 
description 
location 
preview_image 
tags features 
Comment 
id 
comment 
adds 
timestamp 
posts 
timestamp 
1 
n 
n 
1 
1 
n 
n 
m 
rates 
rating
Getting to Know Your Queries 
• What are your application’s workflows? 
• How will I access the data? 
• Knowing your queries in advance is NOT optional 
• Different from RDBMS because I can’t just JOIN or create a new 
indexes to support new queries 
12
Some Application Workflows in KillrVideo 
13 
User Logs 
into site 
Show basic 
information 
about user 
Show videos 
added by a 
user 
Show 
comments 
posted by a 
user 
Search for a 
video by tag 
Show latest 
videos added 
to the site 
Show 
comments 
for a video 
Show ratings 
for a video 
Show video 
and its 
details
Some Queries in KillrVideo to Support Workflows 
14 
Users 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
Comments 
Show 
comments 
for a video 
Find comments by 
video (latest first) 
Show 
comments 
posted by a 
user 
Find comments by 
user (latest first) 
Ratings 
Show ratings 
for a video Find ratings by video
Some Queries in KillrVideo to Support Workflows 
15 
Videos 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first)
Denormalize All the Things
Breaking the Relational Mindset 
• Disk is cheap now 
• Writes in Cassandra are FAST 
• Many times we end up with a “table per query” 
• Take advantage of atomic batches to write to multiple tables 
• Similar to materialized views from the RDBMS world 
17
Users – The Relational Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
• Single Users table with all user data and an Id Primary Key 
• Add an index on email address to allow queries by email
Users – The Cassandra Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
CREATE 
TABLE 
user_credentials 
( 
email 
text, 
password 
text, 
userid 
uuid, 
PRIMARY 
KEY 
(email) 
); 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Videos Everywhere! 
20 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first) 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
user_videos 
( 
userid 
uuid, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(userid, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC);
Videos Everywhere! 
Considerations When Duplicating Data 
• Can the data change? 
• How likely is it to change or how frequently will it change? 
• Do I have all the information I need to update duplicates and 
maintain consistency? 
21 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first)
Building Healthy Relationships
Modeling Relationships – Collection Types 
• Cassandra doesn’t support JOINs, but your data will still have 
relationships (and you can still model that in Cassandra) 
• One tool available is CQL collection types 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
);
Modeling Relationships – Client Side Joins 
24 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
Currently requires query for video, 
followed by query for user by id based 
on results of first query 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Modeling Relationships – Client Side Joins 
• What is the cost? Might be OK in small situations 
• Do NOT scale 
• Avoid when possible 
25
Modeling Relationships – Client Side Joins 
26 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
user_firstname 
text, 
user_lastname 
text, 
user_email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
users_by_video 
( 
videoid 
uuid, 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
or
Modeling Relationships – Client Side Joins 
• Remember the considerations when you duplicate data 
• What happens if a user changes their name or email address? 
• Can I update the duplicated data? 
27
Knowing Your Limits
Cassandra Rules Can Impact Your Design 
• Video Ratings – use counters to track sum of all ratings and 
count of ratings 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
video_ratings 
( 
videoid 
uuid, 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
• Counters are a good example of something with special rules
Single Nodes Have Limits Too 
• Latest videos are bucketed by 
day 
• Means all reads/writes to latest 
videos are going to same 
partition (and thus the same 
nodes) 
• Could create a hotspot 
30 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(yyyymmdd, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC 
);
Single Nodes Have Limits Too 
• Mitigate by adding data to the 
Partition Key to spread load 
• Data that’s already naturally a 
part of the domain 
– Latest videos by category? 
• Arbitrary data, like a bucket 
number 
– Round robin at the app level 
31 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
bucket_number 
int, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
( 
(yyyymmdd, 
bucket_number) 
added_date, 
videoid) 
) 
...
Questions? 
32 
Follow me on Twitter for updates or to ask questions later: @LukeTillman

Contenu connexe

Similaire à Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Zimmertwins Presentation
Zimmertwins PresentationZimmertwins Presentation
Zimmertwins Presentation
Ashok Modi
 

Similaire à Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos (20)

Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
 
Database History From Codd to Brewer
Database History From Codd to BrewerDatabase History From Codd to Brewer
Database History From Codd to Brewer
 
Microsoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionMicrosoft Stream - The Video Evolution
Microsoft Stream - The Video Evolution
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
CBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxCBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBox
 
ITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven Development
 
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBig Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
Itproadd 01 60 minute version
Itproadd 01 60 minute versionItproadd 01 60 minute version
Itproadd 01 60 minute version
 
BUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesBUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media Services
 
Best Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoBest Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company Video
 
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101
 
Crossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationCrossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful Degradation
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like system
 
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
 
Zimmertwins Presentation
Zimmertwins PresentationZimmertwins Presentation
Zimmertwins Presentation
 

Plus de DataStax Academy

Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 

Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

  • 1. A Cassandra Data Model for Serving up Cat Videos Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time Developer • Recently presented at Cassandra Summit 2014 with Microsoft • Very Recent Denver Transplant 2
  • 3. 1 What is this KillrVideo thing you speak of? 2 Know Thy Data and Thy Queries 3 Denormalize All the Things 4 Building Healthy Relationships 5 Knowing Your Limits 3
  • 4. What is this KillrVideo thing you speak of? 4
  • 5. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 6. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 6
  • 7. Just How Popular are Cats on the Internet? 7 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 8. Just How Popular are Cats on the Internet? 8 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 9. Know Thy Data and Thy Queries
  • 10. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 11. Some of the Entities and Relationships in KillrVideo 11 User id firstname lastname email password Video id name description location preview_image tags features Comment id comment adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 12. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries 12
  • 13. Some Application Workflows in KillrVideo 13 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 14. Some Queries in KillrVideo to Support Workflows 14 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 15. Some Queries in KillrVideo to Support Workflows 15 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 17. Breaking the Relational Mindset • Disk is cheap now • Writes in Cassandra are FAST • Many times we end up with a “table per query” • Take advantage of atomic batches to write to multiple tables • Similar to materialized views from the RDBMS world 17
  • 18. Users – The Relational Way User Logs into site Find user by email address Show basic information about user Find user by id • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email
  • 19. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 20. Videos Everywhere! 20 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 21. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 21 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 23. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); Currently requires query for video, followed by query for user by id based on results of first query CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 25. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 25
  • 26. Modeling Relationships – Client Side Joins 26 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or
  • 27. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 27
  • 29. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); • Counters are a good example of something with special rules
  • 30. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos are going to same partition (and thus the same nodes) • Could create a hotspot 30 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC );
  • 31. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level 31 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ( (yyyymmdd, bucket_number) added_date, videoid) ) ...
  • 32. Questions? 32 Follow me on Twitter for updates or to ask questions later: @LukeTillman