Using Apache Cassandra for Big Data
What is this thing, and how do I use it?

Jeremiah Jordan
Lead Software Engineer/Support
@zanson

©2013 DataStax. Do not distribute without consent.
Monday, October 14, 13

Who I am
• Jeremiah Jordan
• Lead Software Engineer in Support at DataStax
• Previously Senior Architect at Morningstar, Inc.
• Using Cassandra since 0.6
• Before that, wrote code for the F-22

Cassandra - An introduction

Cassandra - Intro
• Based on Amazon Dynamo and Google BigTable papers
• Shared nothing
• Distributed
• Data as safe as possible
• Predictable scaling
Dynamo

BigTable
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server

• Each node owns a number of tokens
• Tokens denote a range of keys
• 4 nodes? -> Key range/4
• Each node owns 1/4 the data
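
The token arithmetic above can be sketched in a few lines of Python. This is a toy model only: it assumes one token per node and a made-up token space of 0..99 (a real partitioner hashes keys into a far larger space), and the names TOKEN_SPACE, assign_tokens and owner_of are hypothetical.

```python
from bisect import bisect_left

# Toy token space; an assumption for illustration, far smaller than a real ring.
TOKEN_SPACE = 100


def assign_tokens(node_names):
    """Give each node one evenly spaced token (hypothetical helper)."""
    step = TOKEN_SPACE // len(node_names)
    return {(i + 1) * step - 1: name for i, name in enumerate(node_names)}


def owner_of(key_token, ring):
    """Return the node whose token is the first one >= key_token, wrapping around."""
    tokens = sorted(ring)
    index = bisect_left(tokens, key_token) % len(tokens)
    return ring[tokens[index]]


ring = assign_tokens(["node1", "node2", "node3", "node4"])

# Count how many tokens each node owns: 4 nodes -> a quarter of the range each.
counts = {}
for token in range(TOKEN_SPACE):
    node = owner_of(token, ring)
    counts[node] = counts.get(node, 0) + 1
print(counts)
```

Adding a fifth name to the list shrinks every node's share, which is the "More capacity? Add a server" point in miniature.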
Cassandra - Locally Distributed

• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor (RF): How
many copies of your data?
• RF = 3 here

Each node stores 3/4 of the
cluster's total data.
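
The 3/4 figure follows from RF divided by the number of nodes. A minimal sketch of that arithmetic, assuming the 4-node cluster from the previous slide and evenly distributed tokens (the function name fraction_per_node is hypothetical):

```python
from fractions import Fraction


def fraction_per_node(replication_factor, node_count):
    """Share of the cluster's data held by each node, assuming even token ownership."""
    # A node never holds more than one full copy of the data.
    return min(Fraction(replication_factor, node_count), Fraction(1))


print(fraction_per_node(3, 4))  # RF = 3 on a 4-node cluster -> 3/4
```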
Cassandra - Geographically Distributed
• Client writes to its local DC
• Data syncs across WAN
• Replication Factor per DC

Single coordinator
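
A per-DC replication factor is set when the keyspace is created, using NetworkTopologyStrategy. The helper below is a sketch that only builds the CQL string. The function name create_keyspace_cql and the data center names DC1 and DC2 are assumptions for illustration; real DC names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per DC."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the DC names so the generated statement is deterministic.
    for dc_name, rf in sorted(replicas_per_dc.items()):
        options.append("'%s': %d" % (dc_name, rf))
    return ("CREATE KEYSPACE IF NOT EXISTS %s WITH replication = { %s }"
            % (keyspace, ", ".join(options)))


# Hypothetical layout: 3 replicas in DC1, 2 replicas in DC2.
statement = create_keyspace_cql("testkeyspace", {"DC1": 3, "DC2": 2})
print(statement)
```

The resulting string can be passed to session.execute() in the same way as the SimpleStrategy statement shown later in the deck.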

Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write

• ALL = All replicas ack
• QUORUM = a majority (> 50%) of replicas ack
• LOCAL_QUORUM = a majority (> 50%) of replicas in the local DC ack
• ONE = Only one replica acks
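
How many acknowledgements each level needs can be worked out from the replication factor. A majority is floor(RF / 2) + 1. The sketch below assumes two data centers with RF = 3 each; the function names and the use of plain strings for the levels are illustrative, not the driver's API.

```python
def quorum(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def required_acks(consistency_level, total_replicas, local_replicas):
    """Number of replica acks a request waits for at the given level."""
    if consistency_level == "ALL":
        return total_replicas
    if consistency_level == "QUORUM":
        return quorum(total_replicas)
    if consistency_level == "LOCAL_QUORUM":
        return quorum(local_replicas)
    if consistency_level == "ONE":
        return 1
    raise ValueError("unknown consistency level: %s" % consistency_level)


# Assumed layout: RF = 3 in each of two DCs, so 6 replicas in total.
for level in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(level, required_acks(level, total_replicas=6, local_replicas=3))
```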
Cassandra - Transparent to the application
• A single node failure shouldn’t bring failure
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM

2 of 3 replicas ack (> 50%), so we are good!
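
The same reasoning as a small check: a request succeeds as long as the replicas still alive can supply the acks the consistency level asks for. This is a sketch of the arithmetic only; can_succeed is a hypothetical name, and it ignores timeouts and hinted handoff.

```python
def can_succeed(replication_factor, replicas_down, acks_needed):
    """True if enough replicas remain up to satisfy the consistency level."""
    return replication_factor - replicas_down >= acks_needed


RF = 3
QUORUM = RF // 2 + 1  # 2 of 3 replicas

# Map "number of replicas down" to "does a QUORUM request still succeed?"
outcomes = {down: can_succeed(RF, down, QUORUM) for down in range(RF + 1)}
print(outcomes)
```

With RF = 3 and QUORUM, one replica can be down and requests still succeed; with two down they fail.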
Application Example - Layout
• Active-Active
• Service based DNS routing

Cassandra Replication

Application Example - Uptime
• Normal server maintenance
• Application is unaware

Cassandra Replication

Application Example - Failure
• Data center failure

Another happy user!

• Data is safe. Route traffic.

Five Years of Cassandra

0.1  Jul-08
...
0.3  Jul-09
0.6  May-10
0.7  Feb-11
1.0  Dec-11
DSE
1.2  Oct-12
2.0  Jul-13
Cassandra 2.0 - Big new features

Lightweight transactions: the problem
Session 1:
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]

Session 2:
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]

It’s a Race!

Session 1:
INSERT INTO users
(username, password)
VALUES ('jbellis', 'xdg44hh')

Session 2:
INSERT INTO users
(username, password)
VALUES ('jbellis', '8dhh43k')

Who wins?
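
The race can be reproduced with a toy in-memory model. This is not Cassandra: a plain dict stands in for the users table, and the helper names are hypothetical. The point is only that check-then-insert is not atomic, and that a plain INSERT overwrites whatever is there.

```python
users = {}  # toy stand-in for the users table


def select_user(username):
    """Return the stored password, or None for an empty result set."""
    return users.get(username)


def insert_user(username, password):
    """A plain INSERT behaves as an upsert: it overwrites any existing row."""
    users[username] = password


# Both sessions check first and both see an empty result set...
session1_sees = select_user("jbellis")
session2_sees = select_user("jbellis")

# ...so both go ahead and insert.
if session1_sees is None:
    insert_user("jbellis", "xdg44hh")
if session2_sees is None:
    insert_user("jbellis", "8dhh43k")

print(users)  # only one password survives; session 1's write is silently lost
```

In this toy the later write wins. In Cassandra the write with the higher timestamp wins, and neither session is told that it lost.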
LWT: details
• 4 round trips vs 1 for normal updates
• Paxos - Paxos made easy
• Immediate consistency with no leader election or failover
• For reads, ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

Using LWT
• Don’t overwrite an existing record
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', 'jbellis@datastax.com', ... )
IF NOT EXISTS;

• Only update record if condition is met
UPDATE USERS
SET email = 'jonathan@datastax.com', ...
WHERE username = 'jbellis'
IF email = 'jbellis@datastax.com';
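
From application code, the useful part of a conditional write is finding out whether it was applied. The sketch below wraps the IF NOT EXISTS insert in a function that takes an already connected session. It assumes a users table with username and email columns, and it assumes the result object exposes was_applied, as ResultSet does in recent releases of the DataStax Python driver; older driver versions return an [applied] column in the first row instead. The function name register_user is hypothetical.

```python
def register_user(session, username, email):
    """Insert the user only if the username is free.

    Returns True if the row was written, False if the username already existed.
    """
    result = session.execute(
        "INSERT INTO users (username, email) VALUES (%s, %s) IF NOT EXISTS",
        (username, email),
    )
    # Assumption: `was_applied` is available on the driver's result object.
    return result.was_applied
```

Calling register_user(session, 'jbellis', 'jbellis@datastax.com') twice should return True the first time and False the second, which settles the "Who wins?" question from the race slide.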

LWT: Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis

Installing Cassandra

Download Cassandra

Extract Cassandra

Setup Data and Log Directories

Start Cassandra

Installing Cassandra Python Driver

Python Cassandra Driver

Install Python Cassandra Driver

Connect and Create a Keyspace
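import logging

from cassandra.cluster import Cluster

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

# Connect through one local node; the driver discovers the rest of the cluster.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

log.info("creating keyspace...")
KEYSPACE = "testkeyspace"
# SimpleStrategy with RF 1 is only suitable for a single-node test setup.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS %s
    WITH replication = { 'class': 'SimpleStrategy',
                         'replication_factor': '1' }
    """ % KEYSPACE)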

Create a Table
log.info("setting keyspace...")
session.set_keyspace(KEYSPACE)

log.info("creating table...")
# thekey is the partition key; col1 is a clustering column.
session.execute("""
    CREATE TABLE IF NOT EXISTS mytable (
        thekey text,
        col1 text,
        col2 text,
        PRIMARY KEY (thekey, col1)
    )
    """)

Insert a Row
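from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# The consistency level is set per statement.
query = SimpleStatement("""
    INSERT INTO mytable (thekey, col1, col2)
    VALUES ('key1', 'a', 'b')
    """, consistency_level=ConsistencyLevel.ONE)
log.info("inserting row")
session.execute(query)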

Insert Rows (Prepared Statement)
prepared = session.prepare("""
    INSERT INTO mytable (thekey, col1, col2)
    VALUES (?, ?, ?)
    """)
for i in range(10):
    log.info("inserting row %d" % i)
    bound = prepared.bind(("key%d" % i,
                           "b%d" % i,
                           "c%d" % i))
    session.execute(bound)

Query Results
future = session.execute_async("""
    SELECT * FROM mytable WHERE thekey='key1'
    """)
# result() blocks until the asynchronous query completes.
rows = future.result()
log.info("key\tcol1\tcol2")
log.info("---\t----\t----")
for row in rows:
    log.info("\t".join(row))

Run It

Cassandra Applications - Drivers
• DataStax Drivers for Cassandra
• Java
• C#
• Python
• more on the way

Find Out More
Cassandra: http://cassandra.apache.org
DataStax Drivers: https://github.com/datastax
Documentation: http://www.datastax.com/docs
Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html
Developer Blog: http://www.datastax.com/dev/blog
Cassandra Community Site: http://planetcassandra.org
Download: http://planetcassandra.org/Download/DataStaxCommunityEdition
Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars
Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit

