SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Getting to know
by Michelle Darling
mdarlingcmt@gmail.com
August 2013
Agenda:
● What is Cassandra?
● Installation, CQL3
● Data Modelling
● Summary
Only 15 min to cover these, so
please hold questions til the
end, or email me :-) and I’ll
summarize Q&A for everyone.
Unfortunately, no time for:
● DB Admin
○ Detailed Architecture
○ Partitioning /
Consistent Hashing
○ Consistency Tuning
○ Data Distribution &
Replication
○ System Tables
● App Development
○ Using Python, Ruby etc
to access Cassandra
○ Using Hadoop to
stream data into
Cassandra
What is Cassandra?
“Fortuneteller of Doom”
from Greek Mythology. Tried to
warn others about future disasters,
but no one listened. Unfortunately,
she was 100% accurate.
NoSQL Distributed DB
● Consistency - A__ID
● Availability - High
● Point of Failure - none
● Good for Event
Tracking & Analysis
○ Time series data
○ Sensor device data
○ Social media analytics
○ Risk Analysis
○ Failure Prediction
Rackspace: “Which servers
are under heavy load
and are about to crash?”
The Evolution of Cassandra
2008: Open-Source Release / 2013: Enterprise & Community Editions
Data Model
● Wide rows, sparse arrays
● High performance through very
fast write throughput.
Infrastructure
● Peer-Peer Gossip
● Key-Value Pairs
● Tunable Consistency
2006
2005
● Originally for Inbox Search
● But now used for Instagram
Other NoSQL vs. Cassandra
NoSQL Taxonomy:
● Key-Value Pairs
○ Dynamo, Riak, Redis
● Column-Based
○ BigTable, HBase,
Cassandra
● Document-Based
○ MongoDB, Couchbase
● Graph
○ Neo4J
Big Data Capable
C* Differentiators:
● Production-proven at
Netflix, eBay, Twitter,
20 of Fortune 100
● “Clear Winner” in
Scalability,
Performance,
Availability
-- DataStax
Architecture
● Cluster (ring)
● Nodes (circles)
● Peer-to-Peer Model
● Gossip Protocol
Partitioner:
Consistent Hashing
Netflix
Streaming Video
● Personalized
Recommendations per
family member
● Built on Amazon Web
Services (AWS) +
Cassandra
Cloud installation using
● Amazon Web Services (AWS)
● Elastic Compute Cloud (EC2)
○ Free for the 1st year! Then pay only for what you use.
○ Sign up for AWS EC2 account: Big Data University Video 4:34 minutes,
● Amazon Machine Image (AMI)
○ Preconfigured installation template
○ Choose: “DataStax AMI for Cassandra
Community Edition”
○ Follow these *very good* step-by-step
instructions from DataStax.
○ AMIs also available for CouchBase, MongoDB
(make sure you pick the free tier community versions to avoid
monthly charge$$!!!).
AWS EC2 Dashboard
DataStax AMI Setup
DataStax AMI Setup
--clustername Michelle
--totalnodes 1
--version community
“Roll your Own” Installation
DataStax Community Edition
● Install instructions
For Linux, Windows,
MacOS:
http://www.datastax.com/2012/01/getting-
started-with-cassandra
● Video: “Set up a 4-
node Cassandra
cluster in under 2
minutes”
http://www.screenr.com/5G6
Invoke CQLSH, CREATE KEYSPACE
./bin/cqlsh
cqlsh> CREATE KEYSPACE big_data
… with strategy_class = ‘org.apache.cassandra.
locator.SimpleStrategy’
… with strategy_options:replication_factor=‘1’;
cqlsh> use big_data;
cqlsh:big_data>
Tip: Skip Thrift -- use CQL3
Thrift RPC
// Your Column
Column col = new Column(ByteBuffer.wrap("name".
getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumnOrSuperColumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
Map mutations_map = new HashMap<ByteBuffer, Map<String,
List<Mutation>>>();
Map cf_map = new HashMap<String, List<Mutation>>();
cf_map.set("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()),
cf_map);
cassandra.batch_mutate(mutations_map,
consistency_level);
CQL3
- Uses cqlsh
- “SQL-like” language
- Runs on top of Thrift RPC
- Much more user-friendly.
Thrift code on left
equals this in CQL3:
INSERT INTO (id, name)
VALUES ('key',
'value');
CREATE TABLE
cqlsh:big_data> create table user_tags (
… user_id varchar,
… tag varchar,
… value counter,
… primary key (user_id, tag)
…):
● TABLE user_tags: “How many times has a user
mentioned a hashtag?”
● COUNTER datatype - Computes & stores counter value
at the time data is written. This optimizes query
performance.
UPDATE TABLE
SELECT FROM TABLE
cqlsh:big_data> UPDATE user_tags SET
value=value+1 WHERE user_id = ‘paul’ AND tag =
‘cassandra’
cqlsh:big_data> SELECT * FROM user_tags
user_id | tag | value
--------+-----------+----------
paul | cassandra | 1
DATA MODELING
A Major Paradigm Shift!
RDBMS Cassandra
Structured Data, Fixed Schema Unstructured Data, Flexible Schema
“Array of Arrays”
2D: ROW x COLUMN
“Nested Key-Value Pairs”
3D: ROW Key x COLUMN key x COLUMN values
DATABASE KEYSPACE
TABLE TABLE a.k.a COLUMN FAMILY
ROW ROW a.k.a PARTITION. Unit of replication.
COLUMN COLUMN [Name, Value, Timestamp]. a.k.a CLUSTER. Unit
of storage. Up to 2 billion columns per row.
FOREIGN KEYS, JOINS,
ACID Consistency
Referential Integrity not enforced, so A_CID.
BUT relationships represented using COLLECTIONS.
Cassandra
3D+: Nested Objects
RDBMS
2D: Rows
x columns
Example:
“Twissandra” Web App
Twitter-Inspired
sample application
written in Python +
Cassandra.
● Play with the app:
twissandra.com
● Examine & learn
from the code on
GitHub.
Features/Queries:
● Sign In, Sign Up
● Post Tweet
● Userline (User’s tweets)
● Timeline (All tweets)
● Following (Users being
followed by user)
● Followers (Users
following this user)
Twissandra.com vs Twitter.com
Twissandra - RDBMS Version
Entities
● USER, TWEET
● FOLLOWER, FOLLOWING
● FRIENDS
Relationships:
● USER has many TWEETs.
● USER is a FOLLOWER of many
USERs.
● Many USERs are FOLLOWING
USER.
Twissandra - Cassandra Version
Tip: Model tables to mirror queries.
TABLES or CFs
● TWEET
● USER, USERNAME
● FOLLOWERS, FOLLOWING
● USERLINE, TIMELINE
Notes:
● Extra tables mirror queries.
● Denormalized tables are
“pre-formed”for faster
performance.
TABLE
Tip: Remember,
Skip Thrift -- use CQL3
What does C* data look like?
TABLE Userline
“List all of user’s Tweets”
*************
Row Key: user_id
Columns
● Column Key: tweet_id
● “at” Timestamp
● TTL (Time to Live) -
seconds til expiration
date.
*************
Cassandra Data Model = LEGOs?
FlexibleSchema
Summary:
● Go straight from SQL
to CQL3; skip Thrift, Column
Families, SuperColumns, etc
● Denormalize tables to
mirror important queries.
Roughly 1 table per impt query.
● Choose wisely:
○ Partition Keys
○ Cluster Keys
○ Indexes
○ TTL
○ Counters
○ Collections
See DataStax Music Service
Example
● Consider hybrid
approach:
○ 20% - RDBMS for highly
structured, OLTP, ACID
requirements.
○ 80% - Scale Cassandra to
handle the rest of data.
Remember:
● Cheap: storage,
servers, OpenSource
software.
● Precious: User AND
Developer Happiness.
Resources
C* Summit 2013:
● Slides
● Cassandra at eBay Scale (slides)
● Data Modelers Still Have Jobs -
Adjusting For the NoSQL
Environment (Slides)
● Real-time Analytics using
Cassandra, Spark and Shark
slides
● Cassandra By Example: Data
Modelling with CQL3 Slides
● DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE
CASSANDRA slides
I wish I found these 1st:
● How do I Cassandra?
slides
● Mobile version of
DataStax web docs
(link)

Contenu connexe

Tendances

Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
ProxySQL on Kubernetes
ProxySQL on KubernetesProxySQL on Kubernetes
ProxySQL on KubernetesRené Cannaò
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)涛 吴
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureDataStax
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayDataStax Academy
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 

Tendances (20)

Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
ProxySQL on Kubernetes
ProxySQL on KubernetesProxySQL on Kubernetes
ProxySQL on Kubernetes
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
Pegasus: Designing a Distributed Key Value System (Arch summit beijing-2016)
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data StructureUnderstanding How CQL3 Maps to Cassandra's Internal Data Structure
Understanding How CQL3 Maps to Cassandra's Internal Data Structure
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 

Similaire à Cassandra NoSQL Tutorial

Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUGStu Hood
 
Introduction to cloud and openstack
Introduction to cloud and openstackIntroduction to cloud and openstack
Introduction to cloud and openstackShivaling Sannalli
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSDataStax Academy
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfCédrick Lunven
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache CassandraStu Hood
 
Apache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersApache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersAnant Corporation
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And BeyondRomain Hardouin
 
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkA Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkRed Hat Developers
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Akash Kant
 
Global Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasGlobal Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasMongoDB
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixRoopa Tangirala
 

Similaire à Cassandra NoSQL Tutorial (20)

Multi-cluster k8ssandra
Multi-cluster k8ssandraMulti-cluster k8ssandra
Multi-cluster k8ssandra
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
Introduction to cloud and openstack
Introduction to cloud and openstackIntroduction to cloud and openstack
Introduction to cloud and openstack
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdf
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Apache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersApache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET Developers
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And Beyond
 
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkA Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15
 
Global Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasGlobal Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB Atlas
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ Netflix
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 

Plus de Michelle Darling

Plus de Michelle Darling (8)

Family pics2august014
Family pics2august014Family pics2august014
Family pics2august014
 
Final pink panthers_03_31
Final pink panthers_03_31Final pink panthers_03_31
Final pink panthers_03_31
 
Final pink panthers_03_30
Final pink panthers_03_30Final pink panthers_03_30
Final pink panthers_03_30
 
Php summary
Php summaryPhp summary
Php summary
 
Rsplit apply combine
Rsplit apply combineRsplit apply combine
Rsplit apply combine
 
College day pressie
College day pressieCollege day pressie
College day pressie
 
R learning by examples
R learning by examplesR learning by examples
R learning by examples
 
V3 gamingcasestudy
V3 gamingcasestudyV3 gamingcasestudy
V3 gamingcasestudy
 

Dernier

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Dernier (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Cassandra NoSQL Tutorial

  • 1. Getting to know by Michelle Darling mdarlingcmt@gmail.com August 2013
  • 2. Agenda: ● What is Cassandra? ● Installation, CQL3 ● Data Modelling ● Summary Only 15 min to cover these, so please hold questions til the end, or email me :-) and I’ll summarize Q&A for everyone. Unfortunately, no time for: ● DB Admin ○ Detailed Architecture ○ Partitioning / Consistent Hashing ○ Consistency Tuning ○ Data Distribution & Replication ○ System Tables ● App Development ○ Using Python, Ruby etc to access Cassandra ○ Using Hadoop to stream data into Cassandra
  • 3. What is Cassandra? “Fortuneteller of Doom” from Greek Mythology. Tried to warn others about future disasters, but no one listened. Unfortunately, she was 100% accurate. NoSQL Distributed DB ● Consistency - A__ID ● Availability - High ● Point of Failure - none ● Good for Event Tracking & Analysis ○ Time series data ○ Sensor device data ○ Social media analytics ○ Risk Analysis ○ Failure Prediction Rackspace: “Which servers are under heavy load and are about to crash?”
  • 4. The Evolution of Cassandra 2008: Open-Source Release / 2013: Enterprise & Community Editions Data Model ● Wide rows, sparse arrays ● High performance through very fast write throughput. Infrastructure ● Peer-Peer Gossip ● Key-Value Pairs ● Tunable Consistency 2006 2005 ● Originally for Inbox Search ● But now used for Instagram
  • 5. Other NoSQL vs. Cassandra NoSQL Taxonomy: ● Key-Value Pairs ○ Dynamo, Riak, Redis ● Column-Based ○ BigTable, HBase, Cassandra ● Document-Based ○ MongoDB, Couchbase ● Graph ○ Neo4J Big Data Capable C* Differentiators: ● Production-proven at Netflix, eBay, Twitter, 20 of Fortune 100 ● “Clear Winner” in Scalability, Performance, Availability -- DataStax
  • 6. Architecture ● Cluster (ring) ● Nodes (circles) ● Peer-to-Peer Model ● Gossip Protocol Partitioner: Consistent Hashing
  • 7. Netflix Streaming Video ● Personalized Recommendations per family member ● Built on Amazon Web Services (AWS) + Cassandra
  • 8. Cloud installation using ● Amazon Web Services (AWS) ● Elastic Compute Cloud (EC2) ○ Free for the 1st year! Then pay only for what you use. ○ Sign up for AWS EC2 account: Big Data University Video 4:34 minutes, ● Amazon Machine Image (AMI) ○ Preconfigured installation template ○ Choose: “DataStax AMI for Cassandra Community Edition” ○ Follow these *very good* step-by-step instructions from DataStax. ○ AMIs also available for CouchBase, MongoDB (make sure you pick the free tier community versions to avoid monthly charge$$!!!).
  • 11. DataStax AMI Setup --clustername Michelle --totalnodes 1 --version community
  • 12. “Roll your Own” Installation DataStax Community Edition ● Install instructions For Linux, Windows, MacOS: http://www.datastax.com/2012/01/getting- started-with-cassandra ● Video: “Set up a 4- node Cassandra cluster in under 2 minutes” http://www.screenr.com/5G6
  • 13. Invoke CQLSH, CREATE KEYSPACE ./bin/cqlsh cqlsh> CREATE KEYSPACE big_data … with strategy_class = ‘org.apache.cassandra. locator.SimpleStrategy’ … with strategy_options:replication_factor=‘1’; cqlsh> use big_data; cqlsh:big_data>
  • 14. Tip: Skip Thrift -- use CQL3 Thrift RPC // Your Column Column col = new Column(ByteBuffer.wrap("name". getBytes())); col.setValue(ByteBuffer.wrap("value".getBytes())); col.setTimestamp(System.currentTimeMillis()); // Don't ask ColumnOrSuperColumn cosc = new ColumnOrSuperColumn(); cosc.setColumn(col); // Prepare to be amazed Mutation mutation = new Mutation(); mutation.setColumnOrSuperColumn(cosc); List<Mutation> mutations = new ArrayList<Mutation>(); mutations.add(mutation); Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>(); Map cf_map = new HashMap<String, List<Mutation>>(); cf_map.set("Standard1", mutations); mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map); cassandra.batch_mutate(mutations_map, consistency_level); CQL3 - Uses cqlsh - “SQL-like” language - Runs on top of Thrift RPC - Much more user-friendly. Thrift code on left equals this in CQL3: INSERT INTO (id, name) VALUES ('key', 'value');
  • 15. CREATE TABLE cqlsh:big_data> create table user_tags ( … user_id varchar, … tag varchar, … value counter, … primary key (user_id, tag) …): ● TABLE user_tags: “How many times has a user mentioned a hashtag?” ● COUNTER datatype - Computes & stores counter value at the time data is written. This optimizes query performance.
  • 16. UPDATE TABLE SELECT FROM TABLE cqlsh:big_data> UPDATE user_tags SET value=value+1 WHERE user_id = ‘paul’ AND tag = ‘cassandra’ cqlsh:big_data> SELECT * FROM user_tags user_id | tag | value --------+-----------+---------- paul | cassandra | 1
  • 17. DATA MODELING A Major Paradigm Shift! RDBMS Cassandra Structured Data, Fixed Schema Unstructured Data, Flexible Schema “Array of Arrays” 2D: ROW x COLUMN “Nested Key-Value Pairs” 3D: ROW Key x COLUMN key x COLUMN values DATABASE KEYSPACE TABLE TABLE a.k.a COLUMN FAMILY ROW ROW a.k.a PARTITION. Unit of replication. COLUMN COLUMN [Name, Value, Timestamp]. a.k.a CLUSTER. Unit of storage. Up to 2 billion columns per row. FOREIGN KEYS, JOINS, ACID Consistency Referential Integrity not enforced, so A_CID. BUT relationships represented using COLLECTIONS.
  • 19. Example: “Twissandra” Web App Twitter-Inspired sample application written in Python + Cassandra. ● Play with the app: twissandra.com ● Examine & learn from the code on GitHub. Features/Queries: ● Sign In, Sign Up ● Post Tweet ● Userline (User’s tweets) ● Timeline (All tweets) ● Following (Users being followed by user) ● Followers (Users following this user)
  • 21. Twissandra - RDBMS Version Entities ● USER, TWEET ● FOLLOWER, FOLLOWING ● FRIENDS Relationships: ● USER has many TWEETs. ● USER is a FOLLOWER of many USERs. ● Many USERs are FOLLOWING USER.
  • 22. Twissandra - Cassandra Version Tip: Model tables to mirror queries. TABLES or CFs ● TWEET ● USER, USERNAME ● FOLLOWERS, FOLLOWING ● USERLINE, TIMELINE Notes: ● Extra tables mirror queries. ● Denormalized tables are “pre-formed”for faster performance.
  • 24. What does C* data look like? TABLE Userline “List all of user’s Tweets” ************* Row Key: user_id Columns ● Column Key: tweet_id ● “at” Timestamp ● TTL (Time to Live) - seconds til expiration date. *************
  • 25. Cassandra Data Model = LEGOs? FlexibleSchema
  • 26. Summary: ● Go straight from SQL to CQL3; skip Thrift, Column Families, SuperColumns, etc ● Denormalize tables to mirror important queries. Roughly 1 table per impt query. ● Choose wisely: ○ Partition Keys ○ Cluster Keys ○ Indexes ○ TTL ○ Counters ○ Collections See DataStax Music Service Example ● Consider hybrid approach: ○ 20% - RDBMS for highly structured, OLTP, ACID requirements. ○ 80% - Scale Cassandra to handle the rest of data. Remember: ● Cheap: storage, servers, OpenSource software. ● Precious: User AND Developer Happiness.
  • 27. Resources C* Summit 2013: ● Slides ● Cassandra at eBay Scale (slides) ● Data Modelers Still Have Jobs - Adjusting For the NoSQL Environment (Slides) ● Real-time Analytics using Cassandra, Spark and Shark slides ● Cassandra By Example: Data Modelling with CQL3 Slides ● DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA slides I wish I found these 1st: ● How do I Cassandra? slides ● Mobile version of DataStax web docs (link)