Massively Scalable NoSQL with Apache Cassandra

Massively scalable NoSQL
with Apache Cassandra!
Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
@spyced

Big data

• Analytics (Hadoop)
• Realtime (“NoSQL”)
• ?

©2012 DataStax

Some Cassandra users


eBay
Application/Use Case
• Social Signals: like/want/own features for
eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support


Time series data


Multi-datacenter support


Distributed counters


Hadoop support


Disney
• Meet the data management needs of user-facing
applications across The Walt Disney
Company with a single platform

Why Cassandra?
• DataStax Enterprise can tackle real-time
and search functions in the same cluster
• Scalability
• 24x7 uptime


Multitenancy


Enterprise search


SimpleReach
• SimpleReach tracks social actions for
content creators, from Twitter and
Facebook to Pinterest and Reddit, to deliver
detailed insights and clear metrics around
social behavior.

Why Cassandra?
• Very high velocity data ingest rate and
large data volumes
• Workload separation between realtime
and batch applications


SourceNinja
• SourceNinja notifies you of performance,
security, and bug fixes for the software you
depend on

Why Cassandra?
• Previous database system could not
handle load; HBase has too many points
of failure and was too slow
• Fast real time capabilities, batch analytics
on that data, and enterprise search


Netflix
• General purpose backend for large scale
highly available cloud based web services
supporting Netflix Streaming

Why Cassandra?
• Highly available, highly robust and no
schema change downtime
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and
SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement
OSS feature set
• Supports local and wide area distributed
operations, spanning US and Europe


Optimized for SSD


Open source


Use case patterns
• Massively scalable
• High performance
• Reliable/Available


[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; vertical axis from 0 to 35,000]

Classic partitioning with SPOF
[Diagram: client -> router -> partition 1, partition 2, partition 3, partition 4]

Availability
• “High availability implies that a single fault will not bring
down your system. Not ‘we’ll recover quickly.’”
-- Ben Coverston: DataStax

• “The biggest problem with failover is that you're almost
never using it until it really hurts. It's like backups that
you never test.”
-- Rick Branson: Instagram


Fully distributed, no SPOF
[Diagram: client connected to a ring of nodes; node labels include p3, p6, and p1, with p1 appearing on three nodes]

Partitioning

jim age: 36 car: camaro gender: M

carol age: 37 car: subaru gender: F

johnny age: 12 gender: M

suzy age: 10 gender: F


Partitioning
Primary key determines placement*

jim age: 36 car: camaro gender: M

carol age: 37 car: subaru gender: F

johnny age: 12 gender: M

suzy age: 10 gender: F


PK       MD5 hash
jim      5e02739678...
carol    a9a0198010...
johnny   f4eb27cea7...
suzy     78b421309e...

MD5* hash operation yields a 128-bit number for keys of any size.
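
The hashing step can be sketched in a few lines of Python. This is only an illustration, assuming keys are UTF-8 strings and that the token is the raw MD5 digest read as a 128-bit integer; the helper name `token_for` is hypothetical, and Cassandra's own partitioner may post-process the digest differently.

```python
import hashlib


def token_for(key):
    """Hypothetical helper: map a primary key of any size to a 128-bit integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # MD5 always yields 16 bytes
    return int.from_bytes(digest, "big")


# Print each key next to its token as 32 hex digits (128 bits).
for key in ["jim", "carol", "johnny", "suzy"]:
    print(f"{key:8} {token_for(key):032x}")
```

Whatever the key length, the token has a fixed size, which is what lets the ring be divided into fixed ranges.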


The “token ring”

Node A Node B

Node D Node C


Start End
A 0xc000000000..1 0x0000000000..0

B 0x0000000000..1 0x4000000000..0

C 0x4000000000..1 0x8000000000..0

D 0x8000000000..1 0xc000000000..0

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...
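
The lookup implied by the table above can be sketched as follows. This is a simplified illustration, not Cassandra's implementation: the names `RING`, `owner`, and `token_from_prefix` are hypothetical, and because the slide truncates each hash, the remaining hex digits are padded with zeros (an assumption that does not change which range a token falls in here).

```python
from bisect import bisect_left

# Inclusive end token of each node's range, as in the table above.
RING = [
    (0x00 << 120, "A"),  # A owns the wrap-around range (0xc0..., 0x00...]
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
END_TOKENS = [end for end, _ in RING]


def owner(token):
    """Return the node whose range contains the token."""
    i = bisect_left(END_TOKENS, token)  # first range end that is >= token
    return RING[i % len(RING)][1]       # past the last end: wrap around to A


def token_from_prefix(hex_prefix):
    """Pad a truncated hash from the slide to 128 bits (assumption: zeros)."""
    return int(hex_prefix.ljust(32, "0"), 16)


ROWS = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}
for name, prefix in ROWS.items():
    print(name, "->", owner(token_from_prefix(prefix)))
# jim -> C, carol -> D, johnny -> A, suzy -> C
```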


Replication

Node A Node B

Node D Node C

carol a9a0198010...
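
A minimal sketch of replica placement, under stated assumptions: carol's token (a9a0198010...) falls in Node D's range in the earlier Start/End table, replicas are placed on the following nodes clockwise around the ring, and the replication factor is 3. The slide does not state a replication factor, and this simple walk ignores rack and datacenter awareness. The names `NODES` and `replicas` are hypothetical.

```python
NODES = ["A", "B", "C", "D"]  # nodes in ring order


def replicas(primary, replication_factor=3):
    """Return the primary's node plus the next nodes clockwise on the ring."""
    start = NODES.index(primary)
    count = min(replication_factor, len(NODES))  # cannot exceed the node count
    return [NODES[(start + i) % len(NODES)] for i in range(count)]


print(replicas("D"))  # ['D', 'A', 'B']
```

With this placement, losing any single node still leaves two copies of carol's row reachable.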

Highlights
• Adding capacity is application-transparent and requires
no downtime
• No SPOF, not even temporarily
• No “primary” replica

• Configurable synchronous/asynchronous
• Tolerates node failure; never have to restart replication
“from scratch”
• “Smart” replication avoids correlated failures


CQL: You got SQL in my NoSQL!
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;


Strictly “realtime” focused
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?


Clustering in CQL3
CREATE TABLE sblocks (
    block_id uuid,
    subblock_id uuid,
    data blob,
    PRIMARY KEY (block_id, subblock_id)
);

block_id   subblock_id   data
Block1     subblock A    data A
Block1     subblock B    data B
...        ...           ...
Block2     subblock C    data C
Block2     subblock D    data D
...        ...           ...
Block3     subblock E    data E
Block3     subblock F    data F
...        ...           ...
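
The layout in the table above can be modeled in memory as a rough sketch: rows sharing a block_id live together in one partition, ordered by subblock_id. This is an illustration only, not how Cassandra stores data; the class `SBlocks` and its methods are hypothetical, and plain strings stand in for the uuid columns.

```python
from bisect import insort
from collections import defaultdict


class SBlocks:
    """Toy model of PRIMARY KEY (block_id, subblock_id)."""

    def __init__(self):
        # block_id -> list of (subblock_id, data), kept sorted by subblock_id
        self._partitions = defaultdict(list)

    def insert(self, block_id, subblock_id, data):
        rows = self._partitions[block_id]
        # An insert with an existing primary key overwrites the old row.
        rows[:] = [row for row in rows if row[0] != subblock_id]
        insort(rows, (subblock_id, data))

    def select(self, block_id):
        """Return every row of one partition in clustering order."""
        return list(self._partitions.get(block_id, []))


table = SBlocks()
table.insert("Block1", "subblock B", b"data B")
table.insert("Block1", "subblock A", b"data A")
table.insert("Block2", "subblock C", b"data C")
print(table.select("Block1"))
# [('subblock A', b'data A'), ('subblock B', b'data B')]
```

Reading all subblocks of one block touches a single partition, and the rows come back already sorted.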

Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);

CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);

SELECT *
FROM users NATURAL JOIN users_addresses;


Collections
The relational approach above is crossed out on this slide: CQL has no foreign keys (REFERENCES) and no joins.

Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int,
email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com',
'jbellis@datastax.com'};


Better Hadoop than Hadoop
• “Vanilla” Hadoop
  • 8+ services to set up, monitor, back up, and recover
    (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker,
    Zookeeper, Region Server,...)
  • Single points of failure
  • Can't separate online and offline processing

• DataStax Enterprise
  • Single, simplified component
  • Self-organizes based on workload
  • Peer to peer
  • JobTracker failover

Enterprise search with Solr
SELECT title FROM solr WHERE solr_query='title:natio*';

title
--------------------------------------------------------------------------
Bolivia national football team 2002
List of French born footballers who have played for other national teams
Lithuania national basketball team at Eurobasket 2009
Kenya national under-20 football team
Israel men's national inline hockey team


Questions?
• http://www.datastax.com/docs
• http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
• http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
• http://www.datastax.com/products/enterprise
