- The document is an introduction to Cassandra presented by Patrick McFadin, a Cassandra expert and chief evangelist at DataStax. It provides an overview of Cassandra's origins, architecture, data distribution, fault tolerance, and example applications.
- Cassandra is based on Amazon Dynamo and Google BigTable and allows for shared-nothing, predictable scaling across multiple servers through data replication and configurable consistency levels.
- Popular companies like Netflix, Spotify, and Instagram rely on Cassandra to handle high volumes of user data and queries in a highly available and resilient manner.
2. Who I am
• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more:
I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
@PatrickMcFadin
Dude.
Uptime == $$
3. Five Years of Cassandra
[Figure: release timeline over years 0 to 5, starting Jul-08: versions 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0, plus DSE]
5. Cassandra - Intro
• Based on the Amazon Dynamo and Google BigTable papers
• Shared nothing
• Data as safe as possible
• Predictable scaling
6. Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
7. Cassandra - Locally Distributed
• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor (RF): how many copies of your data?
• RF = 3 here
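A minimal sketch of what a replication factor means on a ring of nodes, assuming a toy integer token space and a "walk clockwise" placement rule. The class name ReplicaRingSketch, the tokens, and the node names are all hypothetical; Cassandra's real partitioner and replication strategies are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified illustration of ring-based replica placement.
// This is NOT Cassandra's actual implementation.
class ReplicaRingSketch {
    // token -> node name, sorted so the ring can be walked clockwise
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String name) {
        ring.put(token, name);
    }

    // Returns the nodes that hold a copy of the data stored at this token.
    List<String> replicasFor(int token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // There cannot be more copies than there are nodes.
        int copies = Math.min(replicationFactor, ring.size());
        // First replica: the node with the smallest token >= the data's token,
        // wrapping around to the start of the ring if none exists.
        Integer current = ring.ceilingKey(token);
        if (current == null) {
            current = ring.firstKey();
        }
        // Remaining replicas: the next nodes clockwise.
        while (replicas.size() < copies) {
            replicas.add(ring.get(current));
            current = ring.higherKey(current);
            if (current == null) {
                current = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        ReplicaRingSketch cluster = new ReplicaRingSketch();
        cluster.addNode(0, "node1");
        cluster.addNode(25, "node2");
        cluster.addNode(50, "node3");
        cluster.addNode(75, "node4");
        // With RF = 3, data at token 30 lands on three consecutive nodes.
        System.out.println(cluster.replicasFor(30, 3));
    }
}
```

With four nodes and RF = 3, the data at token 30 is stored on node3, node4, and node1, so any single node can be lost without losing the data.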
8. Cassandra - Geographically Distributed
• Client writes to its local data center
• Data syncs across WAN
• Replication Factor per DC
9. Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write
• ALL = All replicas ack
• QUORUM = a majority (> 50%) of replicas ack
• LOCAL_QUORUM = a majority (> 50%) of replicas in the local DC ack
• ONE = Only one replica acks
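The consistency levels above boil down to "how many replica acknowledgements are needed". A small sketch of that arithmetic follows; the class ConsistencySketch and its method names are hypothetical helpers, not part of any driver. A quorum is computed as floor(replicas / 2) + 1.

```java
// Illustrative arithmetic only; names here are not a driver API.
class ConsistencySketch {
    enum Level { ALL, QUORUM, LOCAL_QUORUM, ONE }

    // A quorum is a strict majority: floor(n / 2) + 1.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // totalReplicas: sum of the replication factors of all data centers.
    // localReplicas: replication factor of the client's local data center.
    static int requiredAcks(Level level, int totalReplicas, int localReplicas) {
        switch (level) {
            case ALL:
                return totalReplicas;
            case QUORUM:
                return quorum(totalReplicas);
            case LOCAL_QUORUM:
                return quorum(localReplicas);
            case ONE:
                return 1;
            default:
                throw new IllegalArgumentException("Unknown level: " + level);
        }
    }

    public static void main(String[] args) {
        // Example: two data centers with RF = 3 each (6 replicas in total).
        for (Level level : Level.values()) {
            System.out.println(level + " needs " + requiredAcks(level, 6, 3) + " acks");
        }
    }
}
```

For two data centers with RF = 3 each, QUORUM needs 4 of the 6 replicas, while LOCAL_QUORUM needs only 2 of the 3 local ones, which avoids waiting on the WAN.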
10. Cassandra - Transparent to the application
• A single node failure shouldn’t cause an application failure
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM
>50% ack, so we are good!
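To see why RF = 3 with CL = QUORUM survives one node failure, here is a short sketch that counts live replicas against the quorum size. The class AvailabilitySketch and its methods are hypothetical names used only for illustration.

```java
// Illustrative check of "Replication Factor + Consistency Level = Success".
class AvailabilitySketch {
    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True if enough replicas are up to satisfy a QUORUM request.
    static boolean quorumReachable(boolean[] replicaUp) {
        int alive = 0;
        for (boolean up : replicaUp) {
            if (up) {
                alive++;
            }
        }
        return alive >= quorum(replicaUp.length);
    }

    // How many replicas may be down while QUORUM still succeeds.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        // RF = 3, one replica down: 2 of 3 can still acknowledge.
        System.out.println(quorumReachable(new boolean[] {true, true, false}));
        // RF = 3, two replicas down: only 1 of 3 is left.
        System.out.println(quorumReachable(new boolean[] {true, false, false}));
    }
}
```

With RF = 3 the quorum is 2, so one replica can be down and the request still succeeds; with two replicas down it does not.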
11. Cassandra Applications - Drivers
• DataStax Drivers for Cassandra
• Java
• C#
• Python
• more on the way
12. Cassandra Applications - Connecting
• Create a pool of local servers
• Client just uses session to interact with Cassandra
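// Example values passed to the constructor
contactPoints = {"10.0.0.1", "10.0.0.2"}
keyspace = "videodb"

public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
    // Build the cluster handle from the initial contact points,
    // using the driver's default load balancing and retry policies
    cluster = Cluster
        .builder()
        .addContactPoints(contactPoints.toArray(new String[contactPoints.size()]))
        .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
        .withRetryPolicy(Policies.defaultRetryPolicy())
        .build();

    // Open a session bound to the keyspace; the application talks to Cassandra through it
    session = cluster.connect(keyspace);
}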
13. Cassandra Applications - Load balancing
• Token aware - Request sent to primary node with data
• Calls can be asynchronous and in parallel
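A rough sketch of token-aware routing with asynchronous calls, using only the Java standard library. Everything here is a stand-in: TokenAwareSketch, tokenFor, and the toy token space of 0 to 99 are assumptions for illustration, and the "request" is simulated rather than sent over the network. The real driver handles this internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Simulated token-aware routing; no real Cassandra driver is involved.
class TokenAwareSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    TokenAwareSketch(Map<Integer, String> nodesByToken) {
        if (nodesByToken.isEmpty()) {
            throw new IllegalArgumentException("At least one node is required");
        }
        ring.putAll(nodesByToken);
    }

    // Hypothetical hash into a toy token space of 0..99.
    static int tokenFor(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    // The primary node is the first node at or after the key's token (wrapping).
    String primaryFor(String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(tokenFor(key));
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // Simulates sending the request straight to the primary node, asynchronously.
    CompletableFuture<String> readAsync(String key) {
        String node = primaryFor(key);
        return CompletableFuture.supplyAsync(() -> key + " served by " + node);
    }

    // Starts all requests first, then waits for every result, preserving order.
    List<String> readAll(List<String> keys) {
        List<CompletableFuture<String>> pending =
            keys.stream().map(this::readAsync).collect(Collectors.toList());
        return pending.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, String> nodes = new TreeMap<>();
        nodes.put(0, "node1");
        nodes.put(25, "node2");
        nodes.put(50, "node3");
        nodes.put(75, "node4");
        TokenAwareSketch router = new TokenAwareSketch(nodes);
        router.readAll(Arrays.asList("user1", "user2", "user3")).forEach(System.out::println);
    }
}
```

Because each request goes directly to a node that owns the data, the extra hop through a coordinator is avoided, and issuing the calls before waiting on any of them lets them run in parallel.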
[Figure: client threads sending requests through the driver to the nodes; labels 1 to 6]
14. Cassandra Applications - Fault tolerance
• Try first with a Consistency Level of QUORUM
• If that fails, retry with Consistency Level ONE
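The "try QUORUM, then fall back to ONE" pattern can be sketched as follows. This is a simulation with hypothetical names (DowngradeRetrySketch, executeWithDowngrade, simulatedRead), not the driver's retry policy API; the simulated cluster simply fails when too few replicas are alive.

```java
import java.util.function.Function;

// Simulated downgrade-on-failure retry; not a real driver retry policy.
class DowngradeRetrySketch {
    enum Level { QUORUM, ONE }

    // Stand-in for the error raised when too few replicas can respond.
    static class UnavailableException extends RuntimeException {
        UnavailableException(String message) {
            super(message);
        }
    }

    // Try the request at QUORUM first; if it fails, retry once at ONE.
    static <T> T executeWithDowngrade(Function<Level, T> request) {
        try {
            return request.apply(Level.QUORUM);
        } catch (UnavailableException e) {
            return request.apply(Level.ONE);
        }
    }

    // Hypothetical cluster: succeeds only if enough replicas are alive.
    static Function<Level, String> simulatedRead(int aliveReplicas, int replicationFactor) {
        return level -> {
            int needed = (level == Level.QUORUM) ? replicationFactor / 2 + 1 : 1;
            if (aliveReplicas < needed) {
                throw new UnavailableException(
                    level + " needs " + needed + " replicas, only " + aliveReplicas + " alive");
            }
            return "ok at " + level;
        };
    }

    public static void main(String[] args) {
        // All three replicas up: QUORUM succeeds on the first try.
        System.out.println(executeWithDowngrade(simulatedRead(3, 3)));
        // Only one replica up: QUORUM fails, the retry at ONE succeeds.
        System.out.println(executeWithDowngrade(simulatedRead(1, 3)));
    }
}
```

Falling back to ONE trades consistency for availability: the retry may return data that has not yet reached a majority of replicas, so this fits reads and writes where a slightly stale answer is acceptable.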
[Figure: client connected to a ring of nodes, three of which are replicas]
15. Application Example - Layout
• Active-Active
• Service-based DNS routing
Cassandra Replication
16. Application Example - Uptime
• Normal server maintenance
• Application is unaware
Cassandra Replication
17. Application Example - Failure
• Data center failure
• Data is safe. Route traffic.
Another happy user!
19. Netflix!
• If you haven’t heard their story… where have you been?
• $18B market cap, and it runs on Cassandra
• User accounts
• Playlists
• Payments
• Statistics
20. Spotify
• Millions of songs. Millions of users.
• Playlists
• 1 billion playlists
• 30+ Cassandra clusters
• 50+ TB of data
• 40k req/sec peak
http://www.slideshare.net/noaresare/cassandra-nyc
21. Instagram (Facebook)
• Loads and loads of photos. (Probably yours)
• All in AWS
• Security audits
• News feed
• 20k writes/sec. 15k reads/sec.