Cassandra 2.0 and 2.1
Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
©2013 DataStax Confidential. Do not distribute without consent.

Five years of Cassandra

[Timeline: Cassandra releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, and 2.0, plus DSE, spanning July 2008 to March 2014.]

I’ve been working on Cassandra for five years now. Facebook open sourced it in July of 2008, and I started working on it at Rackspace in December. A year and a half later, I
started DataStax to commercialize it.
Core values
•Massive scalability
•High performance
•Reliability/Availability

For the first four years we focused on these three core values.

[Chart comparing Cassandra with MySQL, HBase, and Redis]
New core value
•Massive scalability
•High performance
•Reliability/Availability
•Ease of use

CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);

CREATE INDEX ON
users(state);
SELECT * FROM users
WHERE state = 'Texas'
AND birth_date > 1950;

2013 saw us focus on a fourth value, ease of use, starting with the introduction of CQL3 in January with Cassandra 1.2.
CQL (Cassandra Query Language) is a dialect of SQL optimized for Cassandra. All the statements on the right of this slide are valid in both CQL and SQL.
Native Drivers
•CQL native protocol: efficient, lightweight, asynchronous
•Java (GA): https://github.com/datastax/java-driver
•.NET (GA): https://github.com/datastax/csharp-driver
•Python (Beta): https://github.com/datastax/python-driver

•C++ (Beta): https://github.com/datastax/cpp-driver
•Coming soon: PHP, Ruby

We also introduced a native CQL protocol, cutting out the overhead and complexity of Thrift. DataStax has open sourced half a dozen native CQL drivers and is working on
more.
DataStax DevCenter

We’ve also released DevCenter, an interactive tool for exploring and querying your Cassandra
databases. DevCenter is the first tool of its kind for a NoSQL database.
Tracing
cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
 Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
 Message received from /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |             63
 Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
 Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
 Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
 Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
 Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
 Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888
 Message received from /127.0.0.2    | 00:02:37,017 | 127.0.0.1 |           2334
 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550

Perhaps the biggest problem people have after deploying Cassandra is understanding what goes on under the hood. We introduced query tracing to shed some light on this.
One of the challenges is gathering information from all the nodes that participate in processing a query; here, the coordinator (in blue) receives the query from the client and
forwards it to a replica (in green) which then responds back to the coordinator.
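In cqlsh, tracing is toggled per session; a minimal sketch of the workflow behind the trace above, assuming the keyspace and table from the slide already exist:

TRACING ON;
INSERT INTO bar (i, j) VALUES (6, 2);
-- cqlsh prints the per-node trace events shown above after each statement
TRACING OFF;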
Authentication
[cassandra.yaml]
authenticator: PasswordAuthenticator
# DSE offers KerberosAuthenticator
CREATE USER robin
WITH PASSWORD 'manager' SUPERUSER;
ALTER USER cassandra
WITH PASSWORD 'newpassword';
LIST USERS;
DROP USER cassandra;
We added authentication and authorization, following familiar patterns. Note that the default user and password is cassandra/cassandra, so good practice is to create a new
superuser and drop or change the password on the old one.
Apache Cassandra ships with password authentication built in; DSE (DataStax Enterprise) adds Kerberos single-sign-on integration.
Authorization
[cassandra.yaml]
authorizer: CassandraAuthorizer
GRANT select ON audit TO jonathan;
GRANT modify ON users TO robin;
GRANT all ON ALL KEYSPACES TO lara;

select and modify privileges may be granted separately or together to users on a per-table or per-keyspace basis.
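Permissions can also be revoked and inspected with the matching statements; a brief sketch, reusing the hypothetical users from this slide:

REVOKE modify ON users FROM robin;
LIST ALL PERMISSIONS OF lara;
LIST PERMISSIONS ON audit OF jonathan;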
Cassandra 2.0

Everything I’ve talked about so far is “ancient history” from Cassandra 1.2, but I wanted to cover it again as a refresher. Now let’s talk about what we added for Cassandra 2.0,
released in September.
Race condition

Client 1:
SELECT name
FROM users
WHERE username = 'pmcfadin';

(0 rows)

INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00');

Client 2:
SELECT name
FROM users
WHERE username = 'pmcfadin';

(0 rows)

INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01');  -- This one wins

The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where
readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others.
Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt
to create the account, resulting in corruption.
Lightweight transactions

Client 1:
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ba27e03fd9...',
'2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

Client 2:
INSERT INTO users
(username, name, email,
password, created_date)
VALUES ('pmcfadin',
'Patrick McFadin',
['patrick@datastax.com'],
'ea24e13ad9...',
'2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin

Lightweight transactions roll the “check” and “modify” stages into a single atomic operation, so we can guarantee that only one user will create a given account. The other will
get back the row that was created concurrently as an explanation.
UPDATE can similarly take an IF ... clause checking that no modifications have been made to a set of columns since they were read.
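As a sketch of that conditional UPDATE form, reusing the users table and the slide's placeholder password hashes:

UPDATE users
SET password = 'ea24e13ad9...'
WHERE username = 'pmcfadin'
IF password = 'ba27e03fd9...';

-- [applied] comes back False, along with the current column values,
-- if another client changed the row first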
Paxos
•All operations are quorum-based
•Each replica sends information about unfinished operations to the leader during prepare

•“Paxos Made Simple” (Lamport)

Under the hood, lightweight transactions are implemented with the Paxos consensus protocol.
Details
•Paxos state is durable
•Immediate consistency with no leader election or failover
•ConsistencyLevel.SERIAL
•http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

•4 round trips vs 1 for normal updates

Paxos has these implications for our implementation.
Use with caution
•Great for 1% of your application
•Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis

“4 round trips” is the big downside for Paxos. This makes lightweight transactions a big performance hit in single-datacenter deployments and wildly impractical for multi-datacenter clusters. They should only be used for targeted pieces of an application when the alternative is corruption, like our account creation example.
Cursors (before)
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);

SELECT *
FROM timeline
WHERE (user_id = :last_key
AND tweet_id > :last_tweet)
OR token(user_id) > token(:last_key)
LIMIT 100

Cassandra 2.0 introduced cursors to the native protocol. This makes paging through large resultsets much simpler. Note how we need one clause per component of the
primary key to fetch the next 100 rows here.
Cursors (after)
SELECT *
FROM timeline

Now Cassandra handles the details of getting extra results as you iterate through a resultset. In fact, our cursors are a little bit smarter than in your favorite RDBMS (relational
database management system) since they are failover-aware: if the coordinator in use fails, the cursor will pick up where it left off against a different node in the cluster.
Other CQL improvements
•SELECT DISTINCT pk
•CREATE TABLE IF NOT EXISTS table
•SELECT ... AS
  • SELECT event_id, dateOf(created_at) AS creation_date
•ALTER TABLE DROP column

We made some other miscellaneous improvements in CQL for 2.0 as well.
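A quick sketch of these against the users table from earlier (the events table here is hypothetical):

CREATE TABLE IF NOT EXISTS events (
  event_id uuid PRIMARY KEY,
  created_at timeuuid
);

SELECT event_id, dateOf(created_at) AS creation_date FROM events;

SELECT DISTINCT id FROM users;

ALTER TABLE users DROP birth_date;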
On-Heap/Off-Heap
On-Heap: managed by GC
Off-Heap: not managed by GC
[Diagram: both memory regions live inside the Java process]
We’ve put a lot of effort into improving how Cassandra manages its memory. You’re looking at a limit of about 8GB for a JVM heap, even though modern servers have much more RAM available. So we’re optimizing heap use, pushing internal structures into off-heap memory where possible.
Read path (per sstable)

[Diagram: the per-sstable read path. In memory: bloom filter, partition key cache, partition summary. On disk: partition index, compression offsets, data.]

To understand what we’ve done, I need to explain how a read works in Cassandra. For each sstable, the bloom filter is consulted first; if it says the partition may be present, the partition key cache, partition summary, and partition index locate the partition, the compression offsets map that location into the compressed file, and finally the data itself is read from disk.
Off heap in 2.0
•Partition key bloom filter: 1-2GB per billion partitions
•Compression metadata: ~1-3GB per TB compressed
•Partition index summary: size depends on rows per partition

[Diagram: the same read-path components, with the bloom filter, compression offsets, and partition summary highlighted as off-heap.]

These are the components that are allocated off-heap now. We use reference counting to deallocate them when the sstable (data file) they are associated with is obsoleted by compaction.
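The bloom filter's size, and thus its off-heap footprint, is governed by its target false-positive rate, which is an ordinary CQL table property; a minimal sketch:

-- Trade a little read performance for a smaller bloom filter
ALTER TABLE users WITH bloom_filter_fp_chance = 0.1;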
Compaction
•Single-pass, always
•LCS performs STCS in L0

LCS = leveled compaction strategy
STCS = size-tiered compaction strategy
Healthy leveled compaction

[Diagram: sstables arranged in levels L0 through L5]

The goal of leveled compaction is to provide a read performance guarantee. We divide the sstables up into levels, where each level has 10x as much data as the previous (so
the diagram here is not to scale!), and guarantee that any given row is only present in at most one sstable per level.
Newly flushed sstables start in level zero, which is not yet processed into the tiered levels, and the one-per-sstable rule does not apply there. So we need to check potentially
each sstable in L0.
Sad leveled compaction

[Diagram: levels L0 through L5, with L0 piling up faster than compaction can level it]

The problem is that we can fairly easily flush new sstables to L0 faster than compaction can level them. That results in poor read performance since we need to check so many
sstables for each row. This in turn results in even less i/o available for compaction and L0 will fall even further behind.
STCS in L0

[Diagram: levels L0 through L5, with size-tiered compaction applied within L0]

So what we do in 2.0 is perform size-tiered compaction when L0 falls behind. This doesn’t magically make LCS faster, since we still need to process these sstables into the
levels, but it does mean that we prevent read performance from going through the floor in the meantime.
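The compaction strategy is chosen per table; a sketch of switching the timeline table to leveled compaction (the sstable_size_in_mb value here is illustrative):

ALTER TABLE timeline
WITH compaction = {'class': 'LeveledCompactionStrategy',
                   'sstable_size_in_mb': 160};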
A closer look at reads

[Diagram: a client sends a query to a coordinator, which forwards it to the least-busy of three replicas (30%, 40%, and 90% busy) and returns the answer to the client.]

Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
A failure

[Diagram: the replica chosen by the coordinator fails before replying, and the client eventually receives a timeout.]

What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
Rapid read protection

[Diagram: the coordinator notices the slow replica, retries the read against another replica, and returns success to the client.]

In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
Rapid Read Protection

[Graph: read throughput over time for a four-node cluster; the “NONE” series shows behavior without rapid read protection.]

Here we have a graph of read performance over time in a small four-node cluster. One of the nodes is killed halfway through. You can see how rapid read protection results in a much lower impact on throughput. (There is still some drop since we need to repeat 25% of the queries against other replicas all at once.)
Latency (mid-compaction)

Rapid Read Protection can also reduce latency variance. Look at the 99.9th percentile numbers here. With no rapid read protection, the slowest 0.1% of reads took almost
50ms. Retrying the slowest 10% of queries brings that down to 14.5ms. If we only retry the slowest 1%, that’s 19.6ms. But note that issuing extra reads for all requests
actually results in a higher 99th percentile! Looking at the throughput number shows us why -- we’re running out of capacity in our cluster to absorb the extra requests.
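Rapid read protection is controlled by the speculative_retry table property, which accepts values like 'NONE', 'ALWAYS', a latency percentile, or a fixed latency; a sketch matching the “slowest 1%” case from the graph:

ALTER TABLE users WITH speculative_retry = '99percentile';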
Cassandra 2.1
User defined types
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
);

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}

We introduced collections in Cassandra 1.2, but they had a number of limitations. One is that collections could not contain other collections. User defined types in 2.1 allow
that. Here we have an address type, that holds a set of phone numbers. We can then use that address type in a map in the users table.
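A sketch of writing a row with the nested type, assuming the schema above (the uuid and field values are hypothetical; the released 2.1 syntax may additionally require frozen<address> inside the map):

INSERT INTO users (id, name, addresses)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
        'jbellis',
        {'home': {street: '123 Main St',
                  city: 'Austin',
                  zip_code: 78701,
                  phones: {'512-4567', '512-9999'}}});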
Collection indexing
CREATE TABLE songs (
id uuid PRIMARY KEY,
artist text,
album text,
title text,
data blob,
tags set<text>
);
CREATE INDEX song_tags_idx ON songs(tags);
SELECT * FROM songs WHERE tags CONTAINS 'blues';
 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind

2.1 also brings index support to collections.
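Collections can also be updated in place with add/remove syntax; a sketch against the songs table (the uuid here is hypothetical):

INSERT INTO songs (id, artist, album, title, tags)
VALUES (a3e64f8f-bd44-4f28-b8d9-6938726e34d4,
        'Lightnin'' Hopkins', 'Country Blues', 'Worrying My Mind',
        {'acoustic', 'blues'});

UPDATE songs SET tags = tags + {'texas'}
WHERE id = a3e64f8f-bd44-4f28-b8d9-6938726e34d4;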
Inefficient bloom filters

[Diagram: combining the bloom filters of two sstables]

HyperLogLog applied

HLL and compaction

[Diagram: HyperLogLog sketches used during compaction]
More-efficient repair

We’re making some big improvements to repair for 2.1. Repair is very network-efficient because we build a hash tree of the data to compare across different replicas. Then we only have to send actual rows across the network where the tree indicates an inconsistency.

The problem is that this tree is constructed at repair time, so when we add some new sstables and repair again, merkle tree (hash tree) construction has to start over. So repair ends up taking time proportional to the amount of data in the cluster, not because of network transfers but because of tree construction time.

So what we’re doing in 2.1 is allowing Cassandra to mark sstables as repaired and only build merkle trees from sstables that are new since the last repair. This means that as long as you run repair regularly, it will stay lightweight and performant even as your dataset grows.
Performance
•Memtable memory use cut by 85%
• larger sstables, less compaction
• ~50% better write performance

•Full results after beta1
Questions?

Contenu connexe

Tendances

Windows Azure Kick Start - Explore Storage and SQL Azure
Windows Azure Kick Start - Explore Storage and SQL AzureWindows Azure Kick Start - Explore Storage and SQL Azure
Windows Azure Kick Start - Explore Storage and SQL AzureEric D. Boyd
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
DDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVCDDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVCAndy Butland
 

Tendances (8)

Windows Azure Kick Start - Explore Storage and SQL Azure
Windows Azure Kick Start - Explore Storage and SQL AzureWindows Azure Kick Start - Explore Storage and SQL Azure
Windows Azure Kick Start - Explore Storage and SQL Azure
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C...
 
Farheen
Farheen Farheen
Farheen
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
REST
RESTREST
REST
 
DDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVCDDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVC
 

Similaire à Tokyo Cassandra Summit 2014: Apache Cassandra 2.0 + 2.1 by Jonathan Ellis

DataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with JavaDataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with Javacarolinedatastax
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Vincent Royer
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Major relational database platforms available at the moment microsoft
Major relational database platforms available at the moment microsoftMajor relational database platforms available at the moment microsoft
Major relational database platforms available at the moment microsoftMy-Writing-Expert.org
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
YaJug - Cassandra for Java Developers
YaJug - Cassandra for Java DevelopersYaJug - Cassandra for Java Developers
YaJug - Cassandra for Java DevelopersMichaël Figuière
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAijfcstjournal
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAijfcstjournal
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandraNavanit Katiyar
 
Accessing my sql_from_java
Accessing my sql_from_javaAccessing my sql_from_java
Accessing my sql_from_javaTran Rean
 
Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersoazabir
 
Introduction to NoSQL Database
Introduction to NoSQL DatabaseIntroduction to NoSQL Database
Introduction to NoSQL DatabaseMohammad Alghanem
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloudLiran Zelkha
 
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdf
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdfDBMS LAB FILE1 task 1 , task 2, task3 and many more.pdf
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdfAbhishekKumarPandit5
 
Paris Cassandra Meetup - Cassandra for Developers
Paris Cassandra Meetup - Cassandra for DevelopersParis Cassandra Meetup - Cassandra for Developers
Paris Cassandra Meetup - Cassandra for DevelopersMichaël Figuière
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScaleMariaDB plc
 
Confoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New FeaturesConfoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New FeaturesDave Stokes
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBJanos Geronimo
 
Introduction to Apache Cassandra and support within WSO2 Platform
Introduction to Apache Cassandra and support within WSO2 PlatformIntroduction to Apache Cassandra and support within WSO2 Platform
Introduction to Apache Cassandra and support within WSO2 PlatformSrinath Perera
 

Similaire à Tokyo Cassandra Summit 2014: Apache Cassandra 2.0 + 2.1 by Jonathan Ellis (20)

DataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with JavaDataStax NYC Java Meetup: Cassandra with Java
DataStax NYC Java Meetup: Cassandra with Java
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Major relational database platforms available at the moment microsoft
Major relational database platforms available at the moment microsoftMajor relational database platforms available at the moment microsoft
Major relational database platforms available at the moment microsoft
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
YaJug - Cassandra for Java Developers
YaJug - Cassandra for Java DevelopersYaJug - Cassandra for Java Developers
YaJug - Cassandra for Java Developers
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
 
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRAA NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
A NOVEL APPROACH FOR HOTEL MANAGEMENT SYSTEM USING CASSANDRA
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandra
 
Accessing my sql_from_java
Accessing my sql_from_javaAccessing my sql_from_java
Accessing my sql_from_java
 
Scaling asp.net websites to millions of users
Scaling asp.net websites to millions of usersScaling asp.net websites to millions of users
Scaling asp.net websites to millions of users
 
Introduction to NoSQL Database
Introduction to NoSQL DatabaseIntroduction to NoSQL Database
Introduction to NoSQL Database
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
 
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdf
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdfDBMS LAB FILE1 task 1 , task 2, task3 and many more.pdf
DBMS LAB FILE1 task 1 , task 2, task3 and many more.pdf
 
Paris Cassandra Meetup - Cassandra for Developers
Paris Cassandra Meetup - Cassandra for DevelopersParis Cassandra Meetup - Cassandra for Developers
Paris Cassandra Meetup - Cassandra for Developers
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 
Confoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New FeaturesConfoo 2021 -- MySQL New Features
Confoo 2021 -- MySQL New Features
 
Node.js with MySQL.pdf
Node.js with MySQL.pdfNode.js with MySQL.pdf
Node.js with MySQL.pdf
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 
Introduction to Apache Cassandra and support within WSO2 Platform
Introduction to Apache Cassandra and support within WSO2 PlatformIntroduction to Apache Cassandra and support within WSO2 Platform
Introduction to Apache Cassandra and support within WSO2 Platform
 

Plus de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph DatabasesDataStax Academy
 

Plus de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph Databases
 

Dernier

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Dernier (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Tokyo Cassandra Summit 2014: Apache Cassandra 2.0 + 2.1 by Jonathan Ellis

  • 1. Cassandra 2.0 and 2.1 Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax ©2013 DataStax Confidential. Do not distribute without consent. 1
  • 2. Five years of Cassandra 0.1 Jul-08 ... 0.3 Jul-09 0.6 Jun-10 0.7 1.0 May-11 1.2 Apr-12 Mar-13 2.0 Mar-14 DSE I’ve been working on Cassandra for five years now. Facebook open sourced it in July of 2008, and I started working on it at Rackspace in December. A year and a half later, I started DataStax to commercialize it.
  • 3. Core values •Massive scalability •High performance •Reliability/Availabilty For the first four years we focused on these three core values. Cassandra MySQL HBase Redis
  • 4. New core value •Massive scalability •High performance •Reliability/Availabilty •Ease of use CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950; 2013 saw us focus on a fourth value, ease of use, starting with the introduction of CQL3 in January with Cassandra 1.2. CQL (Cassandra Query Language) is a dialect of SQL optimized for Cassandra. All the statements on the right of this slide are valid in both CQL and SQL.
  • 5. Native Drivers •CQL native protocol: efficient, lightweight, asynchronous •Java (GA): https://github.com/datastax/java-driver •.NET (GA): https://github.com/datastax/csharp-driver •Python (Beta): https://github.com/datastax/pythondriver •C++ (Beta): https://github.com/datastax/cpp-driver •Coming soon: PHP, Ruby We also introduced a native CQL protocol, cutting out the overhead and complexity of Thrift. DataStax has open sourced half a dozen native CQL drivers and is working on more.
  • 6. DataStax DevCenter We’ve also released DevCenter, an interactive tool for exploring and querying your Cassandra databases. DevCenter is the first tool of its kind for a NoSQL database.
  • 7. Tracing cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2); Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9 activity | timestamp | source | source_elapsed -------------------------------------+--------------+-----------+---------------Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540 Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779 Message received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63 Applying mutation | 00:02:37,016 | 127.0.0.2 | 220 Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250 Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277 Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378 Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710 Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888 Message received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550 Perhaps the biggest problem people have after deploying Cassandra is understanding what goes on under the hood. We introduced query tracing to shed some light on this. One of the challenges is gathering information from all the nodes that participate in processing a query; here, the coordinator (in blue) receives the query from the client and forwards it to a replica (in green) which then responds back to the coordinator.
  • 8. Authentication [cassandra.yaml] authenticator: PasswordAuthenticator # DSE offers KerberosAuthenticator We added authentication and authorization, following familiar patterns. Note that the default user and password is cassandra/cassandra, so good practice is to create a new superuser and drop or change the password on the old one. Apache Cassandra ships with password authentication built in; DSE (DataStax Enterprise) adds Kerberos single-sign-on integration.
  • 9. Authentication [cassandra.yaml] authenticator: PasswordAuthenticator # DSE offers KerberosAuthenticator CREATE USER robin WITH PASSWORD 'manager' SUPERUSER; ALTER USER cassandra WITH PASSWORD 'newpassword'; LIST USERS; DROP USER cassandra; We added authentication and authorization, following familiar patterns. Note that the default user and password is cassandra/cassandra, so good practice is to create a new superuser and drop or change the password on the old one. Apache Cassandra ships with password authentication built in; DSE (DataStax Enterprise) adds Kerberos single-sign-on integration.
  • 10. Authorization [cassandra.yaml] authorizer: CassandraAuthorizer GRANT select ON audit TO jonathan; GRANT modify ON users TO robin; GRANT all ON ALL KEYSPACES TO lara; select and modify privileges may be granted separately or together to users on a per-table or per-keyspace basis.
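REVOKE and LIST PERMISSIONS round out the picture; a short sketch reusing the users and table names from the slide (illustrative rather than a complete security setup):

REVOKE modify ON users FROM robin;
LIST ALL PERMISSIONS OF lara;
LIST ALL PERMISSIONS ON users;

REVOKE withdraws a previously granted privilege, and LIST PERMISSIONS shows what has been granted, either per user or per resource.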
  • 11. Cassandra 2.0 Everything I’ve talked about so far is “ancient history” from Cassandra 1.2, but I wanted to cover it again as a refresher. Now let’s talk about what we added for Cassandra 2.0, released in September.
  • 12. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others. Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt to create the account, resulting in corruption.
  • 13. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others. Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt to create the account, resulting in corruption.
  • 14. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows) The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others. Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt to create the account, resulting in corruption.
  • 15. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows) INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01'); The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others. Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt to create the account, resulting in corruption.
  • 16. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows) This one wins INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01'); The first such feature is Lightweight Transactions. This is motivated by the fact that while Cassandra’s eventually consistent model can provide “strong consistency,” where readers always see the most recent writes, it cannot provide “linearizable consistency,” where some writes are guaranteed to happen sequentially with respect to others. Consider the case of user account creation. If two users attempt to create the same name simultaneously, they will both see that it does not yet exist and proceed to attempt to create the account, resulting in corruption.
  • 17. Lightweight transactions INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS; Lightweight transactions roll the “check” and “modify” stages into a single atomic operation, so we can guarantee that only one user will create a given account. The other will get back the row that was created concurrently as an explanation. UPDATE can similarly take an IF ... clause checking that no modifications have been made to a set of columns since they were read.
  • 18. Lightweight transactions
INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

Lightweight transactions roll the “check” and “modify” stages into a single atomic operation, so we can guarantee that only one user will create a given account. The other will get back the row that was created concurrently as an explanation. UPDATE can similarly take an IF ... clause checking that no modifications have been made to a set of columns since they were read.
  • 19. Lightweight transactions
INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin

Lightweight transactions roll the “check” and “modify” stages into a single atomic operation, so we can guarantee that only one user will create a given account. The other will get back the row that was created concurrently as an explanation. UPDATE can similarly take an IF ... clause checking that no modifications have been made to a set of columns since they were read.
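A conditional UPDATE looks like the sketch below; the reset_token column and its value are hypothetical, purely to illustrate the comparison:

UPDATE users
SET password = '8914e65f3d...'
WHERE username = 'pmcfadin'
IF reset_token = 'abc123';

If the current value of reset_token does not match, the update is not applied and Cassandra returns [applied] = False along with the actual current values, just as in the INSERT example above.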
  • 20. Paxos •All operations are quorum-based •Each replica sends information about unfinished operations to the leader during prepare •Paxos made Simple Under the hood, lightweight transactions are implemented with the Paxos consensus protocol.
  • 21. Details •Paxos state is durable •Immediate consistency with no leader election or failover •ConsistencyLevel.SERIAL •http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 •4 round trips vs 1 for normal updates Paxos has these implications for our implementation.
  • 22. Use with caution •Great for 1% of your application •Eventual consistency is your friend • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis “4 round trips” is the big downside for Paxos. This makes lightweight transactions a big performance hit in single-datacenter deployments and wildly impractical for multi-datacenter clusters. They should only be used for targeted pieces of an application when the alternative is corruption, like our account creation example.
  • 23. Cursors (before) CREATE TABLE timeline (   user_id uuid,   tweet_id timeuuid,   tweet_author uuid, tweet_body text,   PRIMARY KEY (user_id, tweet_id) ); SELECT * FROM timeline WHERE (user_id = :last_key AND tweet_id > :last_tweet) OR token(user_id) > token(:last_key) LIMIT 100 Cassandra 2.0 introduced cursors to the native protocol. This makes paging through large resultsets much simpler. Note how we need one clause per component of the primary key to fetch the next 100 rows here.
  • 24. Cursors (after) SELECT * FROM timeline Now Cassandra handles the details of getting extra results as you iterate through a resultset. In fact, our cursors are a little bit smarter than in your favorite RDBMS (relational database management system) since they are failover-aware: if the coordinator in use fails, the cursor will pick up where it left off against a different node in the cluster.
  • 25. Other CQL improvements We made some other miscellaneous improvements in CQL for 2.0 as well.
  • 26. Other CQL improvements •SELECT DISTINCT pk We made some other miscellaneous improvements in CQL for 2.0 as well.
  • 27. Other CQL improvements •SELECT DISTINCT pk •CREATE TABLE IF NOT EXISTS table We made some other miscellaneous improvements in CQL for 2.0 as well.
  • 28. Other CQL improvements •SELECT DISTINCT pk •CREATE TABLE IF NOT EXISTS table •SELECT ... AS • SELECT event_id, dateOf(created_at) AS creation_date We made some other miscellaneous improvements in CQL for 2.0 as well.
  • 29. Other CQL improvements •SELECT DISTINCT pk •CREATE TABLE IF NOT EXISTS table •SELECT ... AS • SELECT event_id, dateOf(created_at) AS creation_date •ALTER TABLE DROP column We made some other miscellaneous improvements in CQL for 2.0 as well.
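Taken together, a short sketch of the new statements (the events table and its columns are made up purely for illustration):

CREATE TABLE IF NOT EXISTS events (
  series text,
  created_at timeuuid,
  payload text,
  PRIMARY KEY (series, created_at)
);

-- distinct partition keys only
SELECT DISTINCT series FROM events;

-- column alias plus the dateOf() timeuuid function
SELECT created_at, dateOf(created_at) AS creation_date
FROM events
WHERE series = 'logins';

-- remove a column and its data
ALTER TABLE events DROP payload;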
  • 30. On-Heap/Off-Heap On-Heap Managed by GC Off-Heap Not managed by GC Java Process We’ve put a lot of effort into improving how Cassandra manages its memory. You’re looking at a limit of about 8GB for a JVM heap, even though modern servers have much more RAM available. So we’re optimizing heap use, pushing internal structures into off-heap memory where possible.
  • 31. Read path (per sstable) Bloom filter Memory Disk To understand what we’ve done, I need to explain how a read works in Cassandra.
  • 32. Read path (per sstable) Bloom filter Memory Disk To understand what we’ve done, I need to explain how a read works in Cassandra. Partition key cache
  • 33. Read path (per sstable) Bloom filter Partition summary Memory Disk To understand what we’ve done, I need to explain how a read works in Cassandra. 0X... 0X... 0X... Partition key cache
  • 34. Read path (per sstable) Bloom filter Partition summary 0X... 0X... 0X... Memory Disk 0X... 0X... 0X... 0X... Partition index To understand what we’ve done, I need to explain how a read works in Cassandra. Partition key cache
  • 35. Read path (per sstable) Bloom filter Compression offsets Partition summary 0X... 0X... 0X... Memory Disk 0X... 0X... 0X... 0X... Partition index To understand what we’ve done, I need to explain how a read works in Cassandra. Partition key cache
  • 36. Read path (per sstable) Bloom filter Compression offsets Partition summary 0X... 0X... 0X... Memory Disk 0X... 0X... 0X... 0X... Data Partition index To understand what we’ve done, I need to explain how a read works in Cassandra. Partition key cache
  • 37. Off heap in 2.0 Partition key bloom filter 1-2GB per billion partitions Bloom filter Compression offsets Partition summary 0X... 0X... 0X... Memory Disk Partition key cache 0X... 0X... 0X... 0X... Data Partition index These are the components that are allocated off-heap now. We use reference counting to deallocate them when the sstable (data file) they are associated with is obsoleted by compaction.
  • 38. Off heap in 2.0 Compression metadata ~1-3GB per TB compressed Bloom filter Compression offsets Partition summary 0X... 0X... 0X... Memory Disk 0X... 0X... 0X... 0X... Data Partition index Partition key cache
  • 39. Off heap in 2.0 Partition index summary (depends on rows per partition) Bloom filter Compression offsets Partition summary 0X... 0X... 0X... Memory Disk 0X... 0X... 0X... 0X... Data Partition index Partition key cache
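Of these components, the bloom filter is the one most directly tunable from CQL, via the table's false-positive chance; a hedged sketch (the timeline table is reused from the cursors example, and 0.1 is just an illustrative value):

ALTER TABLE timeline WITH bloom_filter_fp_chance = 0.1;

A higher false-positive chance shrinks the filter's off-heap footprint at the cost of more wasted sstable reads; a lower one does the opposite.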
  • 40. Compaction •Single-pass, always •LCS performs STCS in L0 LCS = leveled compaction strategy STCS = size-tiered compaction strategy
  • 41. Healthy leveled compaction L0 L1 L2 L3 L4 L5 The goal of leveled compaction is to provide a read performance guarantee. We divide the sstables up into levels, where each level has 10x as much data as the previous (so the diagram here is not to scale!), and guarantee that any given row is only present in at most one sstable per level. Newly flushed sstables start in level zero, which is not yet processed into the tiered levels, and the one-per-sstable rule does not apply there. So we need to check potentially each sstable in L0.
  • 42. Sad leveled compaction L0 L1 L2 L3 L4 L5 The problem is that we can fairly easily flush new sstables to L0 faster than compaction can level them. That results in poor read performance since we need to check so many sstables for each row. This in turn results in even less i/o available for compaction and L0 will fall even further behind.
  • 43. STCS in L0 L0 L1 L2 L3 L4 L5 So what we do in 2.0 is perform size-tiered compaction when L0 falls behind. This doesn’t magically make LCS faster, since we still need to process these sstables into the levels, but it does mean that we prevent read performance from going through the floor in the meantime.
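The compaction strategy itself is a per-table setting in CQL; a minimal sketch (the timeline table is reused from earlier, and 160 MB is only a commonly used sstable size for LCS):

ALTER TABLE timeline
WITH compaction = { 'class': 'LeveledCompactionStrategy',
                    'sstable_size_in_mb': 160 };

-- size-tiered remains the default
ALTER TABLE timeline
WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };

The STCS-in-L0 fallback described above kicks in automatically whenever leveled compaction falls behind; there is nothing extra to configure.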
  • 44. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
  • 45. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
  • 46. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
  • 47. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
  • 48. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy Now let’s look at reads from the perspective of the whole cluster. A client sends a query to a coordinator, which forwards it to the least-busy replica, and returns the answer to the client.
  • 49. A failure 90% busy Client Coordinator 30% busy 40% busy What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
  • 50. A failure 90% busy Client Coordinator 30% busy 40% busy What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
  • 51. A failure 90% busy Client Coordinator 30% busy 40% busy What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
  • 52. A failure 90% busy Client X Coordinator 30% busy 40% busy What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
  • 53. A failure 90% busy X Coordinator Client 30% busy timeout 40% busy What happens if that replica fails before replying? In earlier versions of Cassandra, we’d return a timeout error.
  • 54. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 55. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 56. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 57. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 58. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 59. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 60. Rapid read protection 90% busy X Coordinator Client 30% busy success 40% busy In Cassandra 2.0, the coordinator will detect slow responses and retry those queries to another replica to prevent timing out.
  • 61. Rapid Read Protection NONE Here we have a graph of read performance over time in a small four-node cluster. One of the nodes is killed halfway through. You can see how the rapid read protection results in a much lower impact on throughput. (There is still some drop since we need to repeat 25% of the queries against other replicas all at once.)
  • 62. Latency (mid-compaction) Rapid Read Protection can also reduce latency variance. Look at the 99.9th percentile numbers here. With no rapid read protection, the slowest 0.1% of reads took almost 50ms. Retrying the slowest 10% of queries brings that down to 14.5ms. If we only retry the slowest 1%, that’s 19.6ms. But note that issuing extra reads for all requests actually results in a higher 99th percentile! Looking at the throughput number shows us why -- we’re running out of capacity in our cluster to absorb the extra requests.
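Rapid read protection is controlled per table by the speculative_retry option, which is what the NONE / percentile / ALWAYS labels in these benchmarks refer to; a minimal sketch (the users table is reused from earlier slides):

-- retry against another replica if the first hasn't responded
-- within this table's 99th-percentile read latency
ALTER TABLE users WITH speculative_retry = '99percentile';

-- alternatives: a fixed delay, always send redundant reads, or disable
ALTER TABLE users WITH speculative_retry = '10ms';
ALTER TABLE users WITH speculative_retry = 'ALWAYS';
ALTER TABLE users WITH speculative_retry = 'NONE';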
  • 64. User defined types
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
)

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
)

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}

We introduced collections in Cassandra 1.2, but they had a number of limitations. One is that collections could not contain other collections. User defined types in 2.1 allow that. Here we have an address type that holds a set of phone numbers. We can then use that address type in a map in the users table.
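Writing a row with a UDT value uses a literal syntax much like collections; a rough sketch against the schema above (the uuid and all field values are invented):

INSERT INTO users (id, name, addresses)
VALUES (63bf691f-0000-1000-8000-000000000000,
        'jbellis',
        { 'home': { street: '123 Main St',
                    city: 'Austin',
                    zip_code: 78701,
                    phones: {'512-4567', '512-9999'} } });

Field names inside the UDT literal are unquoted, while the map key ('home') and the text values are quoted like any other strings. (In the final 2.1 release, a UDT nested inside a collection has to be declared frozen, i.e. map<text, frozen<address>>.)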
  • 65. Collection indexing
CREATE TABLE songs (
  id uuid PRIMARY KEY,
  artist text,
  album text,
  title text,
  data blob,
  tags set<text>
);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE tags CONTAINS 'blues';

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind

2.1 also brings index support to collections.
  • 74. More-efficient repair We’re making some big improvements to repair for 2.1. Repair is very network-efficient because we build a hash tree of the data to compare across different replicas. Then we only have to send actual rows across the network where the tree indicates an inconsistency.
  • 75. More-efficient repair We’re making some big improvements to repair for 2.1. Repair is very network-efficient because we build a hash tree of the data to compare across different replicas. Then we only have to send actual rows across the network where the tree indicates an inconsistency.
  • 76. More-efficient repair We’re making some big improvements to repair for 2.1. Repair is very network-efficient because we build a hash tree of the data to compare across different replicas. Then we only have to send actual rows across the network where the tree indicates an inconsistency.
  • 77. More-efficient repair The problem is that this tree is constructed at repair time, so when we add some new sstables and repair again, merkle tree (hash tree) construction has to start over. So repair ends up taking time proportional to the amount of data in the cluster, not because of network transfers but because of tree construction time.
  • 78. More-efficient repair The problem is that this tree is constructed at repair time, so when we add some new sstables and repair again, merkle tree (hash tree) construction has to start over. So repair ends up taking time proportional to the amount of data in the cluster, not because of network transfers but because of tree construction time.
  • 79. More-efficient repair The problem is that this tree is constructed at repair time, so when we add some new sstables and repair again, merkle tree (hash tree) construction has to start over. So repair ends up taking time proportional to the amount of data in the cluster, not because of network transfers but because of tree construction time.
  • 80. More-efficient repair So what we’re doing in 2.1 is allowing Cassandra to mark sstables as repaired and only build merkle trees from sstables that are new since the last repair. This means that as long as you run repair regularly, it will stay lightweight and performant even as your dataset grows.
  • 81. More-efficient repair So what we’re doing in 2.1 is allowing Cassandra to mark sstables as repaired and only build merkle trees from sstables that are new since the last repair. This means that as long as you run repair regularly, it will stay lightweight and performant even as your dataset grows.
  • 82. More-efficient repair So what we’re doing in 2.1 is allowing Cassandra to mark sstables as repaired and only build merkle trees from sstables that are new since the last repair. This means that as long as you run repair regularly, it will stay lightweight and performant even as your dataset grows.
  • 83. Performance •Memtable memory use cut by 85% • larger sstables, less compaction • ~50% better write performance •Full results after beta1