Cassandra Overview

What Is It?
● It is a persistent database, but not an
RDBMS – more on API later
● It can run as a single instance or as a part
of a cluster.
● All nodes are equal, no master, no slaves
● The cluster can be distributed within a
single DC or across multiple DCs.
● Multiple DCs can be Active-Active for
performance or Active-Passive for DR

Simple API
● Get, Put, Delete – all by key
● Batch put and delete – save wire time
● Range queries (iterate over sequence of
keys)
● Target individual columns within a row –
Get and Put
● Native integration available for Hadoop
MapReduce
● CQL – SQL like language

Consistent Hash Ring
● Conceptually all nodes in a cluster are on
a ring of hash values, “tokens”
● Each node is assigned a token range on
the ring
● A key's hash (token) places it on the ring,
within a specific node's token range
● The hash is consistent, meaning the
location of data is consistent and
predictable
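
A minimal sketch of the ring lookup described above. It assumes evenly spaced node tokens and uses MD5 reduced modulo 2^127 as a stand-in for the partitioner's hash. The names `token_for`, `Ring`, and `owner` are illustrative, not Cassandra APIs.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space (assumed, Random Partitioner style)


def token_for(key):
    """Hash a key to a token in [0, RING_SIZE). MD5 is an illustrative choice."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}. Sort by token for binary search.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner(self, token):
        """Return the first node whose token is >= the given token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[i][1]


# Eight hypothetical nodes, N1..N8, spaced evenly around the ring.
ring = Ring({"N%d" % (i + 1): i * RING_SIZE // 8 for i in range(8)})
primary = ring.owner(token_for("K1"))  # same key always maps to the same node
```

Because the lookup is a pure function of the key and the token assignments, any node or client can compute the owner locally.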

[Figure: ring of nodes N1–N8 with token ranges R1–R8; H1 falls in R4]
Token space: 0 => 2^127 (Random Partitioner)
K1 => H1 (token)
H1 => R4 (primary = N4)
N = 3
RS = N4, N5, N6

Replication
● Replication Factor (N) determines how
many replicas exist for each key
● Location of replicas is determined by
consistent hash ring and the “partitioner”
● Generally, N=3 means data will be placed
on the primary node and the next two nodes
on the ring (this can vary based on
placement strategy, but is predictable)
● Powerful because no query required to
find the node(s) containing a key
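
A minimal sketch of the simple "next nodes on the ring" placement, assuming the nodes are listed in ring order. `replica_set` is a hypothetical helper; other placement strategies would pick replicas differently.

```python
def replica_set(ring_nodes, primary_index, replication_factor):
    """Walk clockwise from the primary and take up to N distinct nodes."""
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(count)]


nodes = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"]  # ring order (assumed)

# Primary N4 with N = 3, as in the ring example
replicas = replica_set(nodes, nodes.index("N4"), 3)   # ["N4", "N5", "N6"]

# Placement wraps around the end of the ring
wrapped = replica_set(nodes, nodes.index("N8"), 3)    # ["N8", "N1", "N2"]
```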

Consistency
● Consistency is “eventual” in Cassandra –
it will always work to create N (Replication
Factor) replicas
● Write Consistency (W) defines how many
replicas must acknowledge a “put” request
before it is reported as successful
● Read Consistency (R) defines how many
replicas are consulted before responding
● W and R are tunable per request,
therefore consistency is tunable as well
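
A small sketch of the arithmetic behind tunable consistency. The overlap rule R + W > N comes from the Dynamo-style quorum model and is not stated on the slide. The function names are illustrative.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_overlaps_write(n, w, r):
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


N = 3
strong = read_overlaps_write(N, quorum(N), quorum(N))  # W=2, R=2: sets overlap
weak = read_overlaps_write(N, 1, 1)                    # W=1, R=1: a read may miss the write
```

Lower W and R favor latency and availability. Higher values favor reading the latest write.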

Schema Overview
● Keyspace (“database”) contains one or
more ColumnFamilies
● ColumnFamily (“table”) contains zero or
more rows
● A Row must contain one or more columns
● ColumnFamilies are indexed by row key
(“rows”, but each behaves more like a hash map)
● Rows within the same CF may have
different numbers of columns, and different
column names!!

Example
UserData (Keyspace)

UserAttributes (ColumnFamily, sort = UTF8)
Key    | Columns (name = value)
Ellie  | Age = 4, Sex = Female, Weight = 32
Sammy  | Age = 2, Sex = Male
Henry  | Age = 2, EyeColor = Blue, Height = 30, Sex = Male

UserAccessLog (ColumnFamily, sort = Long)
Key    | Column names (values not shown)
Sammy  | 7/20/2010, 7/22/2010
Henry  | 7/22/2010, 7/23/2010, 7/24/2010
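
One way to picture this example is as nested maps: keyspace, then ColumnFamily, then row key, then column name. This is a sketch in plain Python dicts, not how Cassandra stores data. The `None` values in UserAccessLog are placeholders, since the slide does not show those column values.

```python
user_data = {  # keyspace "UserData"
    "UserAttributes": {
        "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
        "Sammy": {"Age": "2", "Sex": "Male"},
        "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
    },
    "UserAccessLog": {
        # Column names are the access dates; values are placeholders (assumed).
        "Sammy": {"7/20/2010": None, "7/22/2010": None},
        "Henry": {"7/22/2010": None, "7/23/2010": None, "7/24/2010": None},
    },
}

# Rows in the same ColumnFamily need not share column names.
ellie_columns = sorted(user_data["UserAttributes"]["Ellie"])
henry_columns = sorted(user_data["UserAttributes"]["Henry"])
```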

Columns
● Column names (not values) are sorted,
per key
● 32-bit limit on the number of columns per
key (about 2 billion) – entire column must
fit in RAM, on one machine
● Can retrieve/update/delete all columns,
columns by name, or range of columns
● A key (or row) must contain at least one
Column, otherwise considered deleted
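
A toy model of a single row, to illustrate sorted column names, lookups by name, and range slices. The `Row` class and its methods are hypothetical and only mimic the behavior listed above.

```python
import bisect


class Row:
    def __init__(self):
        self._names = []    # column names, kept in sorted order
        self._values = {}   # column name -> value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # insert while preserving sort order
        self._values[name] = value

    def by_names(self, names):
        """Return only the requested columns that exist."""
        return {n: self._values[n] for n in names if n in self._values}

    def slice(self, start, finish):
        """Return columns whose names fall in [start, finish], in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

    def delete(self, name):
        if name in self._values:
            del self._values[name]
            self._names.remove(name)

    def is_deleted(self):
        """A row with no columns is treated as deleted."""
        return not self._names


henry = Row()
for name, value in [("Sex", "Male"), ("Age", "2"), ("Height", "30"), ("EyeColor", "Blue")]:
    henry.put(name, value)

a_to_f = henry.slice("A", "F")  # [("Age", "2"), ("EyeColor", "Blue")]
```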

Thrift Read Methods
● get – return a single column for a single
key
● get_slice – return multiple columns for a
single key
● multiget_slice – return multiple columns
for a list of keys
● get_range_slices – return multiple
columns for a “range” of keys
● Most applications use a “high level” client
(Hector, Pycassa, etc.)
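
A toy in-memory model of what the four read methods return. The signatures are simplified for illustration and are not the real Thrift signatures. Key ranges here follow sorted key order, which is an assumption; with the Random Partitioner, ranges follow token order instead.

```python
class ToyColumnFamily:
    def __init__(self, rows):
        self.rows = rows  # {key: {column_name: value}}

    def get(self, key, column):
        """Single column for a single key."""
        return self.rows[key][column]

    def get_slice(self, key, columns):
        """Multiple columns for a single key; missing columns are skipped."""
        row = self.rows.get(key, {})
        return {c: row[c] for c in sorted(columns) if c in row}

    def multiget_slice(self, keys, columns):
        """Multiple columns for a list of keys."""
        return {k: self.get_slice(k, columns) for k in keys}

    def get_range_slices(self, start_key, end_key, columns):
        """Multiple columns for every key in [start_key, end_key]."""
        keys = [k for k in sorted(self.rows) if start_key <= k <= end_key]
        return self.multiget_slice(keys, columns)


cf = ToyColumnFamily({
    "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
    "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
    "Sammy": {"Age": "2", "Sex": "Male"},
})

age = cf.get("Ellie", "Age")
some = cf.get_slice("Sammy", ["Age", "Weight"])
many = cf.multiget_slice(["Ellie", "Henry"], ["Sex"])
ranged = cf.get_range_slices("F", "Z", ["Age"])
```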

Thrift Write Methods
● insert – insert/update a single column for a
single key (most clients call this method “put”)
● batch_mutate – insert/update/remove
multiple columns for multiple keys in
multiple ColumnFamilies
● remove – remove a single column (or
entire row) for a single key
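
A toy in-memory model of the three write methods. The `ToyStore` class and the shape of the `mutations` argument are simplifications made up for this sketch, not the real Thrift structures.

```python
class ToyStore:
    def __init__(self):
        self.cfs = {}  # {column_family: {key: {column_name: value}}}

    def insert(self, cf, key, column, value):
        """Insert or update a single column for a single key."""
        self.cfs.setdefault(cf, {}).setdefault(key, {})[column] = value

    def remove(self, cf, key, column=None):
        """Remove one column, or the entire row when column is None."""
        rows = self.cfs.get(cf, {})
        if key not in rows:
            return
        if column is None:
            del rows[key]
            return
        rows[key].pop(column, None)
        if not rows[key]:
            del rows[key]  # a row with no columns is treated as deleted

    def batch_mutate(self, mutations):
        """mutations: {key: {cf: {"put": {col: val}, "remove": [col, ...]}}} (assumed shape)."""
        for key, by_cf in mutations.items():
            for cf, change in by_cf.items():
                for column, value in change.get("put", {}).items():
                    self.insert(cf, key, column, value)
                for column in change.get("remove", []):
                    self.remove(cf, key, column)


store = ToyStore()
store.insert("UserAttributes", "Sammy", "Age", "2")
store.batch_mutate({
    "Sammy": {
        "UserAttributes": {"put": {"Sex": "Male"}},
        "UserAccessLog": {"put": {"7/20/2010": None}},
    },
    "Ellie": {"UserAttributes": {"put": {"Age": "4"}}},
})
store.remove("UserAttributes", "Ellie", "Age")  # last column gone, so the row disappears
```

Batching several changes into one call is what saves wire time compared with one request per column.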

Useful References
● http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
● http://wiki.apache.org/cassandra/
   - "A description of the cassandra data model"
   - "Architecture Overview"
   - "Operations"
   - "Articles and Presentations"
