Massively Scalable NoSQL with Apache Cassandra
- 2. Big data
Analytics (Hadoop) | ? | Realtime (“NoSQL”)
©2012 DataStax
- 4. eBay
Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases
Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support
©2012 DataStax ACE
- 9. Disney
Application/Use Case
• Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform
Why Cassandra?
• DataStax Enterprise can tackle real-time and search functions in the same cluster
• Scalability
• 24x7 uptime
©2012 DataStax NDI
- 13. SimpleReach
Application/Use Case
• SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.
Why Cassandra?
• Very high velocity data ingest rate and large data volumes
• Workload separation between realtime and batch applications
©2012 DataStax NDE
- 14. SourceNinja
Application/Use Case
• SourceNinja notifies you of performance, security, and bug fixes for the software you depend on
Why Cassandra?
• Previous database system could not handle load; HBase had too many points of failure and was too slow
• Fast real-time capabilities, batch analytics on that data, and enterprise search
©2012 DataStax RDE
- 15. Netflix
Application/Use Case
• General-purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming
Why Cassandra?
• Highly available, highly robust, and no schema change downtime
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement OSS feature set
• Supports local and wide area distributed operations, spanning US and Europe
©2012 DataStax RCE
- 18. Use case patterns
• Massively scalable
• High performance
• Reliable/Available
- 20. [Chart: reads/s and writes/s, Cassandra 0.6 vs. Cassandra 1.0; y-axis from 0 to 35000]
- 23. Availability
• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’”
-- Ben Coverston: DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.”
-- Rick Branson: Instagram
- 25. Partitioning
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12               gender: M
suzy    age: 10               gender: F
- 26. Partitioning
Primary key determines placement*
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12               gender: M
suzy    age: 10               gender: F
- 27. PK MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
MD5* hash operation yields a 128-bit number for keys of any size.
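The hashing step on this slide can be sketched in a few lines of Python. This is a minimal illustration using the standard `hashlib` module; the function name `md5_token` and the UTF-8 encoding of the key are assumptions for the example, and the exact way Cassandra's partitioner turns the digest into a token may differ.

```python
import hashlib

def md5_token(key: str) -> int:
    """Return the MD5 digest of a key as an unsigned 128-bit integer.

    Assumption: the key is encoded as UTF-8 before hashing.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")

# Keys of very different sizes all map to a fixed-width 128-bit number.
for key in ["jim", "carol", "johnny", "suzy", "a much longer key " * 100]:
    token = md5_token(key)
    print(f"{key[:12]!r:16} -> 0x{token:032x}")
```

The point of the sketch is only that the output width is fixed (32 hex digits) regardless of the key length.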
- 29. Node Start End
A  0xc000000000..1  0x0000000000..0
B  0x0000000000..1  0x4000000000..0
C  0x4000000000..1  0x8000000000..0
D  0x8000000000..1  0xc000000000..0
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
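The lookup from hash to node implied by this table can be sketched as a binary search over the range ends. This is an illustrative sketch, not Cassandra's implementation: the names `RING`, `owner`, and `token_from_prefix` are hypothetical, the range ends are the slide's values widened to 128 bits, and the truncated hashes from the slide are padded with zeros because the full digests are not shown.

```python
from bisect import bisect_left

# Range ends from the slide, as 128-bit integers.
# Each node owns tokens in (previous end, own end]; A's range wraps around.
RING = [
    (0x00 << 120, "A"),
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
ENDS = [end for end, _ in RING]

def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    idx = bisect_left(ENDS, token)       # first range end >= token
    return RING[idx % len(RING)][1]      # past the last end: wrap to A

def token_from_prefix(hex_prefix: str) -> int:
    """Hypothetical helper: pad a truncated hash with zeros to 32 hex digits."""
    return int(hex_prefix.ljust(32, "0"), 16)

# Truncated hashes as printed on the slide.
SLIDE_HASHES = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}

for name, prefix in SLIDE_HASHES.items():
    print(name, "->", owner(token_from_prefix(prefix)))
# jim -> C, carol -> D, johnny -> A, suzy -> C
```

Because only the leading digits decide which quarter of the ring a token falls in, the zero padding does not change the result for these four keys.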
- 34. Replication
Node A Node B
Node D Node C
carol a9a0198010...
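One way to picture the replica placement in this diagram is to start at the node that owns the key's range and walk clockwise around the ring. The sketch below assumes a replication factor of 3 and a plain clockwise walk; the slide states neither, so both are assumptions, and the function name `replicas` is hypothetical.

```python
from typing import List

NODES = ["A", "B", "C", "D"]  # ring order, as drawn on the slides

def replicas(primary: str, replication_factor: int) -> List[str]:
    """Return the primary node plus the next nodes clockwise on the ring.

    Assumption: replicas are simply the following distinct nodes.
    """
    start = NODES.index(primary)
    count = min(replication_factor, len(NODES))  # cannot exceed cluster size
    return [NODES[(start + i) % len(NODES)] for i in range(count)]

# carol's hash (a9a0198010...) falls in node D's range.
print(replicas("D", 3))  # ['D', 'A', 'B']
```

With this placement, losing any single node still leaves two copies of carol's row reachable.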
- 37. Highlights
• Adding capacity is application-transparent and requires no downtime
• No SPOF, not even temporarily
• No “primary” replica
• Configurable synchronous/asynchronous
• Tolerates node failure; never have to restart replication “from scratch”
• “Smart” replication avoids correlated failures
- 38. CQL: You got SQL in my NoSQL!
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
- 42. Clustering in CQL3
CREATE TABLE sblocks (
block_id uuid,
subblock_id uuid,
data blob,
PRIMARY KEY (block_id, subblock_id)
);

block_id  subblock_id  data
Block1    subblock A   data A
Block1    subblock B   data B
...       ...          ...
Block2    subblock C   data C
Block2    subblock D   data D
...       ...          ...
Block3    subblock E   data E
Block3    subblock F   data F
...       ...          ...
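The layout in the table above, where rows sharing a block_id sit together and are ordered by subblock_id, can be modelled with a toy in-memory structure. This is only a sketch of the idea, not how Cassandra stores data: the names `partitions`, `insert`, and `read_block` are hypothetical, plain strings stand in for uuids, and overwriting an existing subblock is not handled.

```python
from bisect import insort
from collections import defaultdict

# block_id -> list of (subblock_id, data), kept sorted by subblock_id
partitions = defaultdict(list)

def insert(block_id: str, subblock_id: str, data: bytes) -> None:
    """Add a row to its partition, keeping the partition sorted."""
    insort(partitions[block_id], (subblock_id, data))

def read_block(block_id: str):
    """Return all rows of one partition in clustering order."""
    return list(partitions.get(block_id, []))

# Rows arrive in arbitrary order.
insert("Block1", "subblock B", b"data B")
insert("Block2", "subblock C", b"data C")
insert("Block1", "subblock A", b"data A")

print(read_block("Block1"))
# [('subblock A', b'data A'), ('subblock B', b'data B')]
```

Reading a whole block is then a single lookup followed by a sequential scan, which is the access pattern the compound primary key is meant to serve.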
- 43. Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);
SELECT *
FROM users NATURAL JOIN users_addresses;
- 44. Collections
[The relational schema from slide 43, shown crossed out. Cassandra supports neither REFERENCES constraints nor joins.]
- 45. Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int,
email_addresses set<text>
);
UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com',
'jbellis@datastax.com'};
- 46. Big data
Analytics (Hadoop) | ? | Realtime (“NoSQL”)
- 50. Big data
Analytics (Hadoop) | DataStax Enterprise | Realtime (Cassandra)
- 52. Better Hadoop than Hadoop
• “Vanilla” Hadoop
  • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...)
  • Single points of failure
  • Can't separate online and offline processing
• DataStax Enterprise
  • Single, simplified component
  • Self-organizes based on workload
  • Peer to peer
  • JobTracker failover
- 53. Enterprise search with Solr
SELECT title FROM solr WHERE solr_query='title:natio*';
title
--------------------------------------------------------------------------
Bolivia national football team 2002
List of French born footballers who have played for other national teams
Lithuania national basketball team at Eurobasket 2009
Bolivia national football team 2000
Kenya national under-20 football team
Bolivia national football team 1999
Israel men's national inline hockey team
Bolivia national football team 2001
- 55. Questions?
• http://www.datastax.com/docs
• http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
• http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
• http://www.datastax.com/products/enterprise