SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
The data model is dead,
long live the data model!!
Patrick McFadin
Senior Solutions Architect
DataStax
Thursday, May 2, 13
The data model is dead,
long live the data model!!
Patrick McFadin
Senior Solutions Architect
DataStax
Thursday, May 2, 13
Bridging the divide
The era of relational everything is over
The era of Polyglot Persistence* has begun
* http://www.martinfowler.com/bliki/PolyglotPersistence.html
Thursday, May 2, 13
Coming from a relational world
Tradeoffs are hard
Feature RDBMS Cassandra
Single Point of
Failure
Cross Datacenter
Linear Scaling
Data modeling
Thursday, May 2, 13
Background -The data model
•The data model is alive and well
• Models define the business requirements
• Define of the structure of your data
• Relational is just one type (Network model anyone?)
4
Wait? I thought NoSQL meant no model?
Thursday, May 2, 13
Background - ACID vs CAP
5
ACID
CAP - Pick two
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
Consistency - All data on cluster
Availability - Cluster always accepts writes
Partition tolerance - Nodes in cluster can’t talk to each other
Thursday, May 2, 13
Background - ACID vs CAP
5
ACID
CAP - Pick two
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
Consistency - All data on cluster
Availability - Cluster always accepts writes
Partition tolerance - Nodes in cluster can’t talk to each other
Thursday, May 2, 13
Background - ACID vs CAP
5
ACID
CAP - Pick two
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
Consistency - All data on cluster
Availability - Cluster always accepts writes
Partition tolerance - Nodes in cluster can’t talk to each other
Cassandra let’s you tune this
Thursday, May 2, 13
Relational Background - Normal forms
•This IS the relational model
• 5 normal forms
• Need foreign keys
• Need joins
6
id First Last
1 Edgar Codd
2 Raymond Boyce
id Dept
1 Engineering
2 Math
Employees
Department
Thursday, May 2, 13
Background - How Cassandra Stores Data
• Model brought from big table*
• Row Key and a lot of columns
• Column names sorted (UTF8, Int,Timestamp, etc)
7
Column Name ... Column Name
ColumnValue ColumnValue
Timestamp Timestamp
TTL TTL
Row Key
1 2 Billion
* http://research.google.com/archive/bigtable.html
Thursday, May 2, 13
Background - How Cassandra Stores Data
• Rows belong to a node and are replicated
• Row lookups are fast
• Randomly distributed in cluster
8
RowKey1
RowKey2
RowKey3
RowKey4
RowKey5
RowKey6
RowKey7
RowKey8
RowKey9
RowKey10
RowKey11
RowKey12
Lookup5RowKey5
Thursday, May 2, 13
Relational Concept - Sequences
• Handy feature for auto-creation of Ids
• Guaranteed unique
• Depends on a single source of truth (one server)
9
INSERT INTO user (id, firstName, LastName)
VALUES (seq.nextVal(), ‘Ted’, ‘Codd’)
Thursday, May 2, 13
Cassandra Concept - No sequences
• Difficult in a distributed system
• Requires a lock (perf killer)
• What to do?
- Use part of the data to create a unique index, or...
- UUID to the rescue!
10
Thursday, May 2, 13
Concept - UUID
• Universal Unique ID
• 128 bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
11
99051fe9-6a9c-46c2-b949-38ef78858dd0
RFC 4122 if you want a reference
Thursday, May 2, 13
Cassandra Concept - Entity model
• User table (!!)
• Username is the unique key
• Static but can be changed dynamically without downtime
12
CREATE TABLE users (
username varchar,
firstname varchar,
lastname varchar,
email varchar,
password varchar,
created_date timestamp,
PRIMARY KEY (username)
);
ALTER TABLE users ADD city text;
Thursday, May 2, 13
Relational Concept - De-normalization
•To combine relations into a single row
• Used in relational modeling to avoid complex joins
13
id First Last
1 Edgar Codd
2 Raymond Boyce
id Dept
1 Engineering
2 Math
Employees
Department
SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE 1 = e.id
AND e.id = d.id
Take this and then...
Thursday, May 2, 13
Relational Concept - De-normalization
• Combine table columns into a single view
• No joins
• All in how you set the data for fast reads
14
SELECT First, Last, Dept
FROM employees
WHERE id = ‘1’
id First Last Dept
1 Edgar Codd Engineering
2 Raymond Boyce Math
Employees
Thursday, May 2, 13
Cassandra Concept - One-to-Many
• Relationship without being relational
• Users have many videos
• Wait? Where is the foreign key?
15
username firstname lastname email
tcodd Edgar Codd tcodd@relational.com
rboyce Raymond Boyce rboyce@relational.com
videoid videoname username description tags
99051fe9 My funny cat tcodd My cat plays the piano cats,piano,lol
b3a76c6b Math tcodd Now my dog plays dogs,piano,lol
Users
Videos
Thursday, May 2, 13
Cassandra Concept - One-to-many
• Static table to store videos
• UUID for unique video id
• Add username to denormalize
16
CREATE TABLE videos (
videoid uuid,
videoname varchar,
username varchar,
description varchar,
tags varchar,
upload_date timestamp,
PRIMARY KEY(videoid)
);
Thursday, May 2, 13
Cassandra Concept - One-to-Many
• Lookup video by username
• Write in two tables at once for fast lookups
17
CREATE TABLE username_video_index (
username varchar,
videoid uuid,
upload_date timestamp,
video_name varchar,
PRIMARY KEY (username, videoid)
);
SELECT video_name
FROM username_video_index
WHERE username = ‘ctodd’
AND videoid = ‘99051fe9’
Creates a wide row!
Thursday, May 2, 13
Cassandra concept - Many-to-many
• Users and videos have many comments
18
username firstname lastname email
tcodd Edgar Codd tcodd@relational.com
rboyce Raymond Boyce rboyce@relational.com
videoid videoname username description tags
99051fe9 My funny cat tcodd My cat plays the piano cats,piano,lol
b3a76c6b Math tcodd Now my dog plays dogs,piano,lol
Users
Videos
username videoid comment
tcodd 99051fe9 Sweet!
rboyce b3a76c6b Boring :(
Comments
Thursday, May 2, 13
Cassandra concept - Many-to-many
• Model both sides of the view
• Insert both when comment is created
•View from either side
19
CREATE TABLE comments_by_video (
videoid uuid,
username varchar,
comment_ts timestamp,
comment varchar,
PRIMARY KEY (videoid,username)
);
CREATE TABLE comments_by_user (
username varchar,
videoid uuid,
comment_ts timestamp,
comment varchar,
PRIMARY KEY (username,videoid)
);
Thursday, May 2, 13
Cassandra concept - Many-to-many
• Model both sides of the view
• Insert both when comment is created
•View from either side
19
CREATE TABLE comments_by_video (
videoid uuid,
username varchar,
comment_ts timestamp,
comment varchar,
PRIMARY KEY (videoid,username)
);
CREATE TABLE comments_by_user (
username varchar,
videoid uuid,
comment_ts timestamp,
comment varchar,
PRIMARY KEY (username,videoid)
);
Don’t be afraid of writes. Bring it!
Thursday, May 2, 13
Relational Concept -Transactions
• Built in and easy to use
• Can be slow and heavy so don’t use them all the time
• Normal forms force ACID writes into many tables
20
lock
-change table one
-change table two
-change table three
commit
-or-
lock
-change table one
-change table two
-change table three
rollback
Thursday, May 2, 13
Crazy Concept - Do you need a transaction?
• Since they were easy in RDBMS, was it just default?
• Read this article
• In a nutshell,
21
http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
Asynchronous transaction
Cashier takes your money
Barista makes your coffee
Error? Barista deals with it
Thursday, May 2, 13
Cassandra Concept -Transaction quality
• Requires a lock, which is costly in distributed systems
• Cassandra features can be used to advantage
- Row level isolation
- Atomic batches
22
Thursday, May 2, 13
Cassandra Concept -Transaction
•Track that something happened
• Use time stamps to preserve order
• Rectify when any doubt (just like banks do)
23
CREATE TABLE credit_transaction (
username varchar,
type varchar,
datetime timestamp,
credits int,
PRIMARY KEY (username,datetime,type)
) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
Create this table
Sort the columns in reverse order
so last action is first on the list
Thursday, May 2, 13
Cassandra Concept -Transaction
• All transactions are stored
•Think RPN calculator, latest first
24
ADD:2013-04-25
21:10:32.745
REMOVE:2013-04-25
15:45:22.813
ADD:2013-04-25
07:15:12.542
$20 $5 $100
tcodd
Rectify account: + $100
- $5
+ 20
---------
= $115 Current balance
Thursday, May 2, 13
Cassandra Concept -Transaction
25
Create credit_transaction record
with ADD +Timestamp
Read user record total_credits
and credit_timestamp
user credit_timestamp <
credit_transaction
timestamp?
Set back in user record
credit_timestamp and
incremented total_credits
Create credit_transaction record
with REMOVE +Timestamp
Read user record total_credits
and credit_timestamp
user credit_timestamp <
credit_transaction
timestamp?
Set back in user record
credit_timestamp and
decremented total_credits
Fail transaction
and rectify
Success
Add Credit Remove credit
Thursday, May 2, 13
And if that doesn’t work...
• Lightweight transactions coming soon.
• Cassandra 2.0
• See CASSANDRA-5062
26
Thursday, May 2, 13
But wait there is more!!
•The next in this series: May 16th
27
Become a super modeler
• Final will be at the Cassandra Summit: June 11th
The worlds next top data model
Thursday, May 2, 13
Be there!!!
28
Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
Thursday, May 2, 13
ThankYou
Q&A
Thursday, May 2, 13

Contenu connexe

Tendances

ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022René Cannaò
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basicsnickmbailey
 
AWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTAWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTI Goo Lee
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practiceEugene Fidelin
 
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법Ji-Woong Choi
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache PinotSiddharth Teotia
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Aleksandr Kuzminsky
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring MicroservicesWeaveworks
 
Comparison of ACFS and DBFS
Comparison of ACFS and DBFSComparison of ACFS and DBFS
Comparison of ACFS and DBFSDanielHillinger
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsMostafa Mokhtar
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...hoondong kim
 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?Ohyama Masanori
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling TwitterBlaine
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...Mydbops
 

Tendances (20)

Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022ProxySQL Cluster - Percona Live 2022
ProxySQL Cluster - Percona Live 2022
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 
AWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTAWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMT
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
 
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
[오픈소스컨설팅]Day #1 MySQL 엔진소개, 튜닝, 백업 및 복구, 업그레이드방법
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring Microservices
 
Comparison of ACFS and DBFS
Comparison of ACFS and DBFSComparison of ACFS and DBFS
Comparison of ACFS and DBFS
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...
 
PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?PostgreSQL Security. How Do We Think?
PostgreSQL Security. How Do We Think?
 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
 

Similaire à The data model is dead, long live the data model

Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckDataStax Academy
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraMichael Kjellman
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanDataStax Academy
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?Ivan Zoratti
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraPatrick McFadin
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)Nenad Bozic
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQLIvan Zoratti
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownDataStax
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterDatabricks
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraJesus Guzman
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at ZalandoLuis Mineiro
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...DataStax Academy
 

Similaire à The data model is dead, long live the data model (20)

Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at Zalando
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patr...
 

Plus de Patrick McFadin

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelinesPatrick McFadin
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Patrick McFadin
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced previewPatrick McFadin
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the firePatrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guidePatrick McFadin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strataPatrick McFadin
 

Plus de Patrick McFadin (20)

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guide
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

The data model is dead, long live the data model

  • 1. The data model is dead, long live the data model!! Patrick McFadin Senior Solutions Architect DataStax Thursday, May 2, 13
  • 2. The data model is dead, long live the data model!! Patrick McFadin Senior Solutions Architect DataStax Thursday, May 2, 13
  • 3. Bridging the divide The era of relational everything is over The era of Polyglot Persistence* has begun * http://www.martinfowler.com/bliki/PolyglotPersistence.html Thursday, May 2, 13
  • 4. Coming from a relational world Tradeoffs are hard Feature RDBMS Cassandra Single Point of Failure Cross Datacenter Linear Scaling Data modeling Thursday, May 2, 13
  • 5. Background -The data model •The data model is alive and well • Models define the business requirements • Define of the structure of your data • Relational is just one type (Network model anyone?) 4 Wait? I thought NoSQL meant no model? Thursday, May 2, 13
  • 6. Background - ACID vs CAP 5 ACID CAP - Pick two Atomic - All or none Consistency - Only valid data is written Isolation - One operation at a time Durability - Once committed, it stays that way Consistency - All data on cluster Availability - Cluster always accepts writes Partition tolerance - Nodes in cluster can’t talk to each other Thursday, May 2, 13
  • 7. Background - ACID vs CAP 5 ACID CAP - Pick two Atomic - All or none Consistency - Only valid data is written Isolation - One operation at a time Durability - Once committed, it stays that way Consistency - All data on cluster Availability - Cluster always accepts writes Partition tolerance - Nodes in cluster can’t talk to each other Thursday, May 2, 13
  • 8. Background - ACID vs CAP 5 ACID CAP - Pick two Atomic - All or none Consistency - Only valid data is written Isolation - One operation at a time Durability - Once committed, it stays that way Consistency - All data on cluster Availability - Cluster always accepts writes Partition tolerance - Nodes in cluster can’t talk to each other Cassandra let’s you tune this Thursday, May 2, 13
  • 9. Relational Background - Normal forms •This IS the relational model • 5 normal forms • Need foreign keys • Need joins 6 id First Last 1 Edgar Codd 2 Raymond Boyce id Dept 1 Engineering 2 Math Employees Department Thursday, May 2, 13
  • 10. Background - How Cassandra Stores Data • Model brought from big table* • Row Key and a lot of columns • Column names sorted (UTF8, Int,Timestamp, etc) 7 Column Name ... Column Name ColumnValue ColumnValue Timestamp Timestamp TTL TTL Row Key 1 2 Billion * http://research.google.com/archive/bigtable.html Thursday, May 2, 13
  • 11. Background - How Cassandra Stores Data • Rows belong to a node and are replicated • Row lookups are fast • Randomly distributed in cluster 8 RowKey1 RowKey2 RowKey3 RowKey4 RowKey5 RowKey6 RowKey7 RowKey8 RowKey9 RowKey10 RowKey11 RowKey12 Lookup5RowKey5 Thursday, May 2, 13
  • 12. Relational Concept - Sequences • Handy feature for auto-creation of Ids • Guaranteed unique • Depends on a single source of truth (one server) 9 INSERT INTO user (id, firstName, LastName) VALUES (seq.nextVal(), ‘Ted’, ‘Codd’) Thursday, May 2, 13
  • 13. Cassandra Concept - No sequences • Difficult in a distributed system • Requires a lock (perf killer) • What to do? - Use part of the data to create a unique index, or... - UUID to the rescue! 10 Thursday, May 2, 13
  • 14. Concept - UUID • Universal Unique ID • 128 bit number represented in character form • Easily generated on the client • Same as GUID for the MS folks 11 99051fe9-6a9c-46c2-b949-38ef78858dd0 RFC 4122 if you want a reference Thursday, May 2, 13
  • 15. Cassandra Concept - Entity model • User table (!!) • Username is the unique key • Static but can be changed dynamically without downtime 12 CREATE TABLE users ( username varchar, firstname varchar, lastname varchar, email varchar, password varchar, created_date timestamp, PRIMARY KEY (username) ); ALTER TABLE users ADD city text; Thursday, May 2, 13
  • 16. Relational Concept - De-normalization •To combine relations into a single row • Used in relational modeling to avoid complex joins 13 id First Last 1 Edgar Codd 2 Raymond Boyce id Dept 1 Engineering 2 Math Employees Department SELECT e.First, e.Last, d.Dept FROM Department d, Employees e WHERE 1 = e.id AND e.id = d.id Take this and then... Thursday, May 2, 13
  • 17. Relational Concept - De-normalization • Combine table columns into a single view • No joins • All in how you set the data for fast reads 14 SELECT First, Last, Dept FROM employees WHERE id = ‘1’ id First Last Dept 1 Edgar Codd Engineering 2 Raymond Boyce Math Employees Thursday, May 2, 13
  • 18. Cassandra Concept - One-to-Many • Relationship without being relational • Users have many videos • Wait? Where is the foreign key? 15 username firstname lastname email tcodd Edgar Codd tcodd@relational.com rboyce Raymond Boyce rboyce@relational.com videoid videoname username description tags 99051fe9 My funny cat tcodd My cat plays the piano cats,piano,lol b3a76c6b Math tcodd Now my dog plays dogs,piano,lol Users Videos Thursday, May 2, 13
  • 19. Cassandra Concept - One-to-many • Static table to store videos • UUID for unique video id • Add username to denormalize 16 CREATE TABLE videos ( videoid uuid, videoname varchar, username varchar, description varchar, tags varchar, upload_date timestamp, PRIMARY KEY(videoid) ); Thursday, May 2, 13
  • 20. Cassandra Concept - One-to-Many • Lookup video by username • Write in two tables at once for fast lookups 17 CREATE TABLE username_video_index ( username varchar, videoid uuid, upload_date timestamp, video_name varchar, PRIMARY KEY (username, videoid) ); SELECT video_name FROM username_video_index WHERE username = ‘ctodd’ AND videoid = ‘99051fe9’ Creates a wide row! Thursday, May 2, 13
  • 21. Cassandra concept - Many-to-many • Users and videos have many comments 18 username firstname lastname email tcodd Edgar Codd tcodd@relational.com rboyce Raymond Boyce rboyce@relational.com videoid videoname username description tags 99051fe9 My funny cat tcodd My cat plays the piano cats,piano,lol b3a76c6b Math tcodd Now my dog plays dogs,piano,lol Users Videos username videoid comment tcodd 99051fe9 Sweet! rboyce b3a76c6b Boring :( Comments Thursday, May 2, 13
  • 22. Cassandra concept - Many-to-many • Model both sides of the view • Insert both when comment is created •View from either side 19 CREATE TABLE comments_by_video ( videoid uuid, username varchar, comment_ts timestamp, comment varchar, PRIMARY KEY (videoid,username) ); CREATE TABLE comments_by_user ( username varchar, videoid uuid, comment_ts timestamp, comment varchar, PRIMARY KEY (username,videoid) ); Thursday, May 2, 13
  • 23. Cassandra concept - Many-to-many • Model both sides of the view • Insert both when comment is created •View from either side 19 CREATE TABLE comments_by_video ( videoid uuid, username varchar, comment_ts timestamp, comment varchar, PRIMARY KEY (videoid,username) ); CREATE TABLE comments_by_user ( username varchar, videoid uuid, comment_ts timestamp, comment varchar, PRIMARY KEY (username,videoid) ); Don’t be afraid of writes. Bring it! Thursday, May 2, 13
  • 24. Relational Concept -Transactions • Built in and easy to use • Can be slow and heavy so don’t use them all the time • Normal forms force ACID writes into many tables 20 lock -change table one -change table two -change table three commit -or- lock -change table one -change table two -change table three rollback Thursday, May 2, 13
  • 25. Crazy Concept - Do you need a transaction? • Since they were easy in RDBMS, was it just default? • Read this article • In a nutshell, 21 http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf Asynchronous transaction Cashier takes your money Barista makes your coffee Error? Barista deals with it Thursday, May 2, 13
  • 26. Cassandra Concept -Transaction quality • Requires a lock, which is costly in distributed systems • Cassandra features can be used to advantage - Row level isolation - Atomic batches 22 Thursday, May 2, 13
  • 27. Cassandra Concept -Transaction •Track that something happened • Use time stamps to preserve order • Rectify when any doubt (just like banks do) 23 CREATE TABLE credit_transaction ( username varchar, type varchar, datetime timestamp, credits int, PRIMARY KEY (username,datetime,type) ) WITH CLUSTERING ORDER BY (datetime DESC, type ASC); Create this table Sort the columns in reverse order so last action is first on the list Thursday, May 2, 13
  • 28. Cassandra Concept -Transaction • All transactions are stored •Think RPN calculator, latest first 24 ADD:2013-04-25 21:10:32.745 REMOVE:2013-04-25 15:45:22.813 ADD:2013-04-25 07:15:12.542 $20 $5 $100 tcodd Rectify account: + $100 - $5 + 20 --------- = $115 Current balance Thursday, May 2, 13
  • 29. Cassandra Concept -Transaction 25 Create credit_transaction record with ADD +Timestamp Read user record total_credits and credit_timestamp user credit_timestamp < credit_transaction timestamp? Set back in user record credit_timestamp and incremented total_credits Create credit_transaction record with REMOVE +Timestamp Read user record total_credits and credit_timestamp user credit_timestamp < credit_transaction timestamp? Set back in user record credit_timestamp and decremented total_credits Fail transaction and rectify Success Add Credit Remove credit Thursday, May 2, 13
  • 30. And if that doesn’t work... • Lightweight transactions coming soon. • Cassandra 2.0 • See CASSANDRA-5062 26 Thursday, May 2, 13
  • 31. But wait there is more!! •The next in this series: May 16th 27 Become a super modeler • Final will be at the Cassandra Summit: June 11th The worlds next top data model Thursday, May 2, 13
  • 32. Be there!!! 28 Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it. Thursday, May 2, 13