This is the presentation I gave at the Reflections | Projections conference at UIUC (http://www.acm.uiuc.edu/conference/2013/). It introduces the basics of Apache Cassandra and shows how to get it up and running on your development machine. It then covers using the DataStax Python Driver and the Cassandra Query Language (CQL) to create tables, write data to them, and read it back out.
2. Who I am
• Jeremiah Jordan
• Lead Software Engineer in Support at DataStax
• Previously Senior Architect at Morningstar, Inc.
• Using Cassandra since 0.6
• Before that, wrote code for the F-22
Monday, October 14, 13
4. Cassandra - Intro
• Based on Amazon Dynamo and Google BigTable papers
• Shared nothing
• Distributed
• Data as safe as possible
• Predictable scaling
5. Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
• Each node owns a number of tokens
• Tokens denote a range of keys
• 4 nodes? -> Key range/4
• Each node owns 1/4 the data
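The token bullets above can be illustrated with a toy ring. This is a minimal sketch, not Cassandra's actual partitioner: the 32-bit token space, the MD5-based `token_for` helper, and the node names are all assumptions chosen for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed toy token space; real partitioners use a far larger one


def build_ring(nodes):
    """Give each node one token so the key range is split evenly (range / N)."""
    step = RING_SIZE // len(nodes)
    return [((i + 1) * step - 1, node) for i, node in enumerate(nodes)]


def token_for(key):
    """Hash a key to a token (hypothetical helper: first 4 bytes of MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(ring, key):
    """Return the node whose token is the first one >= the key's token."""
    tokens = [token for token, _ in ring]
    # The modulo wraps keys past the last token back to the first node.
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return ring[index][1]


ring = build_ring(["node1", "node2", "node3", "node4"])
counts = {}
for i in range(4000):
    node = owner(ring, "user%d" % i)
    counts[node] = counts.get(node, 0) + 1
print(counts)  # each of the 4 nodes ends up with roughly 1/4 of the keys
```

Adding a fifth name to the list passed to `build_ring` shrinks every node's share to about 1/5, which is the "More capacity? Add a server" idea in miniature.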
6. Cassandra - Locally Distributed
• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor (RF): How many copies of your data?
• RF = 3 here
Each node stores 3/4 of the cluster's total data.
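The 3/4 figure can be checked with a small sketch. It assumes a simple placement in which each range is stored on its owner plus the next RF - 1 nodes around the ring; the function names and node names are hypothetical, and `rf` is assumed to be no larger than the node count.

```python
def replicas_for(range_index, nodes, rf):
    """Nodes holding the range owned by nodes[range_index]: the owner plus
    the next rf - 1 nodes clockwise (an assumed simple placement)."""
    return [nodes[(range_index + i) % len(nodes)] for i in range(rf)]


def fraction_stored(nodes, rf):
    """Fraction of the cluster's ranges that each node holds a copy of."""
    held = {node: 0 for node in nodes}
    for range_index in range(len(nodes)):
        for node in replicas_for(range_index, nodes, rf):
            held[node] += 1
    return {node: count / len(nodes) for node, count in held.items()}


nodes = ["node1", "node2", "node3", "node4"]
print(replicas_for(0, nodes, rf=3))  # ['node1', 'node2', 'node3']
print(fraction_stored(nodes, rf=3))  # every node maps to 0.75
```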
7. Cassandra - Geographically Distributed
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
Single coordinator
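A per-data-center replication factor is set when the keyspace is created. The sketch below only builds the CQL text; the keyspace name `demo` and the data center names `DC1` and `DC2` are assumptions, and in a real cluster the data center names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, rf_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per DC."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(rf_per_dc.items()):
        options.append("'%s': %d" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (
        keyspace, ", ".join(options))


# Hypothetical layout: 3 copies in one data center, 2 in the other.
print(create_keyspace_cql("demo", {"DC1": 3, "DC2": 2}))
```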
8. Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write
• ALL = All replicas ack
• QUORUM = A majority of replicas ack (floor(RF/2) + 1)
• LOCAL_QUORUM = A majority of replicas in the local DC ack
• ONE = Only one replica acks
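The levels above translate into a number of required acknowledgements. A minimal sketch, assuming `rf` is the total number of replicas and `local_rf` the number in the coordinator's data center; the helper name `required_acks` is hypothetical, not a driver API.

```python
def required_acks(consistency_level, rf, local_rf=None):
    """How many replicas must acknowledge for the request to succeed."""
    if consistency_level == "ALL":
        return rf
    if consistency_level == "QUORUM":
        return rf // 2 + 1  # a strict majority of all replicas
    if consistency_level == "LOCAL_QUORUM":
        if local_rf is None:
            raise ValueError("LOCAL_QUORUM needs the local DC's replication factor")
        return local_rf // 2 + 1  # a strict majority within the local DC
    if consistency_level == "ONE":
        return 1
    raise ValueError("unknown consistency level: %r" % consistency_level)


for level in ("ALL", "QUORUM", "ONE"):
    print(level, required_acks(level, rf=3))
# ALL 3
# QUORUM 2
# ONE 1
```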
9. Cassandra - Transparent to the application
• A single node failure shouldn’t bring failure
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM
A majority (2 of 3) ack, so we are good!
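The example on this slide can be replayed as a small check. This is only an illustration of the arithmetic, with a hypothetical `request_succeeds` helper and one of the three replicas marked as down.

```python
def request_succeeds(replica_up, needed):
    """True if enough replicas are alive to send the required acks."""
    return sum(replica_up) >= needed


rf = 3
quorum = rf // 2 + 1               # 2 of 3 replicas
replica_up = [True, True, False]   # a single node has failed

print(request_succeeds(replica_up, quorum))  # True: QUORUM still succeeds
print(request_succeeds(replica_up, rf))      # False: CL = ALL would fail
```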
10. Application Example - Layout
• Active-Active
• Service based DNS routing
Cassandra Replication
11. Application Example - Uptime
• Normal server maintenance
• Application is unaware
Cassandra Replication
12. Application Example - Failure
• Data center failure
Another happy user!
• Data is safe. Route traffic.
13. Five Years of Cassandra
• 0.1: Jul-08
...
• 0.3: Jul-09
• 0.6: May-10
• 0.7: Feb-11
• 1.0: Dec-11
DSE
• 1.2: Oct-12
• 2.0: Jul-13
15. Lightweight transactions: the problem
Session 1
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users
(username, password)
VALUES ('jbellis', 'xdg44hh')
Session 2
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users
(username, password)
VALUES ('jbellis', '8dhh43k')
It’s a Race!
Who wins?
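The race can be replayed in a single process. In this sketch a plain dict stands in for the users table and the two sessions are interleaved by hand; the helper names are hypothetical. Here the later insert simply wins, while in a real cluster the surviving row depends on write timestamps.

```python
users = {}  # stands in for the users table, keyed by username


def select_user(username):
    """SELECT by username; None plays the role of an empty result set."""
    return users.get(username)


def insert_user(username, password):
    """Plain INSERT: an upsert, so an existing row is silently overwritten."""
    users[username] = password


# Both sessions check first, and both see an empty result set.
session1_sees = select_user("jbellis")
session2_sees = select_user("jbellis")

# Both then insert, each believing the name is free.
if session1_sees is None:
    insert_user("jbellis", "xdg44hh")
if session2_sees is None:
    insert_user("jbellis", "8dhh43k")

print(users["jbellis"])  # 8dhh43k: session 1's row was overwritten
```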
16. LWT: details
• 4 round trips vs 1 for normal updates
• Paxos - Paxos made easy
• Immediate consistency with no leader election or failover
• For reads, ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
17. Using LWT
• Don’t overwrite an existing record
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', 'jbellis@datastax.com', ...)
IF NOT EXISTS;
• Only update record if condition is met
UPDATE USERS
SET email = 'jonathan@datastax.com', ...
WHERE username = 'jbellis'
IF email = 'jbellis@datastax.com';
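From Python, the two statements above might be wrapped as follows. This is a sketch under stated assumptions: `session` is expected to behave like a DataStax Python driver session, with `execute(query, parameters)` using `%s` placeholders and returning a result that exposes `was_applied`; the table is assumed to have only the `username` and `email` columns, since the slide elides the rest; `register_user` and `change_email` are hypothetical names.

```python
def register_user(session, username, email):
    """Insert the user only if the username is free; return True if applied."""
    result = session.execute(
        "INSERT INTO users (username, email) VALUES (%s, %s) IF NOT EXISTS",
        (username, email),
    )
    return result.was_applied


def change_email(session, username, old_email, new_email):
    """Update the email only if it still equals old_email; True if applied."""
    result = session.execute(
        "UPDATE users SET email = %s WHERE username = %s IF email = %s",
        (new_email, username, old_email),
    )
    return result.was_applied
```

With a running cluster, the session would typically come from something like `Cluster(['127.0.0.1']).connect('demo')`, where the address and keyspace name are placeholders. A `False` return means the condition did not hold and nothing was written, so the caller can report "username taken" instead of overwriting.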
18. LWT: Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis