cassandra

Apache Cassandra

Vova Miguro
THE END
trnl.me@gmail.com

Thursday, September 22, 2011

What is Cassandra?

• key-value store with some structure

• fault-tolerant

• scalable

• eventually consistent

• tunable

- consistency level

- replication


Where did it come from?

• created at Facebook

- distribution architecture from Amazon's Dynamo

- data model from Google's BigTable

• open-sourced in 2008

• Apache incubator in early 2009

• graduated to an Apache top-level project in February 2010


Who uses it?

• Facebook (of course)

• Rackspace

• Twitter

• Digg

• Reddit

• IBM

• others...


What problems does it solve?

• reliability at scale

- no single point of failure (all nodes are
identical)

• simple scaling (linear)

• high write throughput

• large data sets


What problems can’t it solve?

• no flexible indexes (more on this later)

• not good for large binary data (>64 MB) unless you split it into chunks

• row contents must fit in available memory


Clustering: CAP

• CAP Theorem

- Consistency

- Availability

- Partition tolerance

• choose two

• Cassandra chooses A and P, but consistency is tunable, so you can trade some availability for more C


Clustering: Replication & Consistency

• replication factor

- how many nodes data is replicated on

• consistency level

- zero (async write)

- any

- one

- quorum (rf/2+1)

- all


Clustering: Consistency Level

level    waits for                             operations
zero     none (async write)                    write
any      1st response (hinted handoff counts)  write
one      1st response                          read/write
quorum   rf/2 + 1 responses                    read/write
all      all responses                         read/write
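The quorum formula is easiest to see with numbers. Below is a small Python sketch (the function names are hypothetical, not a Cassandra or client API) that computes how many replica responses ONE, QUORUM and ALL wait for, using integer division for rf/2 + 1, and checks the common rule that reads are guaranteed to see the latest write when read replicas + write replicas > rf. ZERO and ANY are left out because they do not promise that any replica holds the write yet.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replica responses a read or write at `level` waits for."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # integer division: rf=3 -> 2, rf=4 -> 3, rf=5 -> 3
    if level == "ALL":
        return rf
    raise ValueError(f"level not covered by this sketch: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > rf)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, replicas_required(level, rf=3))
    print(reads_see_latest_write("QUORUM", "QUORUM", rf=3))  # True: 2 + 2 > 3
    print(reads_see_latest_write("ONE", "ONE", rf=3))        # False: 1 + 1 <= 3
```

With rf = 3, QUORUM reads plus QUORUM writes give consistent reads while still tolerating one replica being down.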


Clustering: Ring

• every node gets a token

- defines its place in the ring

- and which keys (ranges) it is responsible for
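A minimal sketch of the token ring in Python, assuming MD5-derived tokens in the spirit of RandomPartitioner (simplified here to the MD5 value modulo 2**127) and the convention that each node owns the keys from the previous node's token (exclusive) up to its own token (inclusive). The `Ring` class, node names and token values are illustrative; `replicas()` puts the extra copies on the next nodes clockwise, similar to SimpleStrategy.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space, simplified from RandomPartitioner's 0..2**127


def token_for(key: str) -> int:
    """Hash a row key to a token: MD5 digest reduced into the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, nodes: dict):
        # (token, node) pairs sorted by token, so positions can be binary-searched.
        self.entries = sorted((token, name) for name, token in nodes.items())
        self.tokens = [token for token, _ in self.entries]

    def _position(self, token: int) -> int:
        # First node whose token is >= the key's token; past the end wraps to 0.
        return bisect.bisect_left(self.tokens, token) % len(self.tokens)

    def primary(self, token: int) -> str:
        return self.entries[self._position(token)][1]

    def replicas(self, key: str, rf: int) -> list:
        """Primary node plus the next rf - 1 nodes clockwise."""
        start = self._position(token_for(key))
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring({"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4})
    print(ring.primary(RING_SIZE // 4 + 1))  # C: the range (B, C] belongs to C
    print(ring.replicas("jsmith", rf=3))
```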


Clustering: Ring

• new node

- token assignment

- ranges adjusted

- bootstrap

- only neighboring nodes are affected
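The bootstrap bullets can be illustrated with toy integer tokens (hypothetical values, not real MD5 tokens): give a new node E a token between B and C, recompute the ranges, and only C's primary range shrinks. In a real cluster the data for the range E takes over is streamed to E during bootstrap; the sketch below only shows the range bookkeeping.

```python
def owned_ranges(tokens: dict) -> dict:
    """Map each node to the (previous token, own token] range it is responsible for."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    # For the first node, index -1 wraps to the last node, closing the ring.
    return {name: (ordered[i - 1][1], token) for i, (name, token) in enumerate(ordered)}


if __name__ == "__main__":
    before = {"A": 0, "B": 100, "C": 200, "D": 300}
    after = dict(before, E=150)  # new node E takes a token between B and C
    old, new = owned_ranges(before), owned_ranges(after)
    print(new["E"])  # (100, 150): split off from C's old range (100, 200]
    print(sorted(n for n in before if old[n] != new[n]))  # ['C']: only the neighbor changes
```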


Clustering: Ring

• node dies or becomes isolated

• hinted handoff: another node stores the writes meant for it as hints and replays them when it comes back
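A toy model of hinted handoff, with hypothetical names throughout: when a replica is down, the write it misses is queued as a hint, and the hint is replayed when the node comes back. In Cassandra the hint is held by a live node; here it simply sits in a dictionary on the cluster object.

```python
class ToyCluster:
    """Hypothetical model: each node is a dict; writes for down nodes become hints."""

    def __init__(self, names):
        self.data = {name: {} for name in names}
        self.up = set(names)
        self.hints = {}  # down node -> list of (key, value) writes it missed

    def fail(self, node):
        self.up.discard(node)

    def write(self, replicas, key, value):
        for node in replicas:
            if node in self.up:
                self.data[node][key] = value
            else:
                # Stand-in for a live node storing a hint on the dead node's behalf.
                self.hints.setdefault(node, []).append((key, value))

    def recover(self, node):
        """Bring the node back and replay the hints it missed, in order."""
        self.up.add(node)
        for key, value in self.hints.pop(node, []):
            self.data[node][key] = value


if __name__ == "__main__":
    cluster = ToyCluster(["A", "B", "C"])
    cluster.fail("C")
    cluster.write(["A", "B", "C"], "jsmith", "ch@ngem3a")
    print("jsmith" in cluster.data["C"])  # False: C was down, a hint was queued
    cluster.recover("C")
    print(cluster.data["C"]["jsmith"])    # ch@ngem3a: the hint was replayed
```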


Data Model

• keyspace: a set of column families

• column family: a set of rows

• row (indexed)

- key

- columns

• column

- name (sorted)

- value
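The hierarchy can be pictured as nested maps. The sketch below uses plain Python dicts (an assumption for illustration, not how Cassandra stores data on disk) and returns a row's columns sorted by name, as the comparator would; a super column family would add one more level between the row key and the columns.

```python
# keyspace -> column family -> row key -> column name -> value


def insert(keyspace: dict, cf: str, key: str, name: str, value: str) -> None:
    """Create the column family and row on demand, then set one column."""
    keyspace.setdefault(cf, {}).setdefault(key, {})[name] = value


def get_row(keyspace: dict, cf: str, key: str) -> list:
    """Return a row's columns as (name, value) pairs, sorted by column name."""
    return sorted(keyspace.get(cf, {}).get(key, {}).items())


if __name__ == "__main__":
    ks = {}
    insert(ks, "users", "jsmith", "password", "ch@ngem3a")
    insert(ks, "users", "jsmith", "gender", "m")
    print(get_row(ks, "users", "jsmith"))  # [('gender', 'm'), ('password', 'ch@ngem3a')]
```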


Data Model: ColumnFamily
[Diagram: column families]


Data Model: SuperColumnFamily
[Diagram: supercolumn families]


Easier to start from the bottom up


Data Model: Column


Data Model: Row


Data Model: Column comparators

• TimeUUID

• LexicalUUID

• UTF8

• Long

• Bytes

• ...
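Why the comparator matters: column names are stored as bytes, and the comparator decides how those bytes sort. This sketch assumes LongType names are 8-byte big-endian signed integers and compares them two ways; negative numbers land at the end under raw byte (BytesType-like) ordering but first under numeric (LongType-like) ordering. The helper names are hypothetical.

```python
import struct


def encode_long(n: int) -> bytes:
    """8-byte big-endian signed encoding, assumed to match LongType column names."""
    return struct.pack(">q", n)


def decode_long(b: bytes) -> int:
    return struct.unpack(">q", b)[0]


def bytes_order(names):
    """BytesType-style: compare raw bytes (unsigned, lexicographic)."""
    return sorted(names)


def long_order(names):
    """LongType-style: compare the decoded signed 64-bit integers."""
    return sorted(names, key=decode_long)


if __name__ == "__main__":
    names = [encode_long(n) for n in (10, -1, 2)]
    print([decode_long(b) for b in bytes_order(names)])  # [2, 10, -1]
    print([decode_long(b) for b in long_order(names)])   # [-1, 2, 10]
```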


Data Model: ColumnFamily


Writing
• simple: put(key,col,value)

• complex: put(key,[col,value,...col,value])

• batch: multi key
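The three write shapes can be sketched against a hypothetical in-memory column family (not a real client API). Real Cassandra also attaches a timestamp to every column and keeps the newest value on conflict, which this sketch omits. For comparison, the Pycassa client's `ColumnFamily` has `insert()` and `batch_insert()` methods covering similar cases.

```python
class ToyColumnFamily:
    """Hypothetical in-memory column family illustrating the three write shapes."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def put(self, key, name, value):
        """Simple write: one column in one row."""
        self.rows.setdefault(key, {})[name] = value

    def put_many(self, key, columns):
        """Complex write: several columns in the same row."""
        self.rows.setdefault(key, {}).update(columns)

    def batch(self, mutations):
        """Batch write: columns for several row keys, e.g. {key: {name: value}}."""
        for key, columns in mutations.items():
            self.put_many(key, columns)


if __name__ == "__main__":
    users = ToyColumnFamily()
    users.put("jsmith", "password", "ch@ngem3a")
    users.put_many("jsmith", {"gender": "m", "state": "TX"})
    users.batch({"alice": {"state": "CA"}, "bob": {"state": "NY"}})
    print(users.rows["jsmith"])
```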


Writing
[Diagram: writes]


Reading
• get(): retrieve column by name

• multiget(): a column by name for a set of keys

• get_slice(): columns by a list of names or by a range of names

- returning columns

- returning supercolumns

• multiget_slice(): a subset of columns for a set of keys

• get_count(): number of columns or subcolumns

• get_range_slices(): a subset of columns for a range of keys
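The slice semantics behind get_slice(), multiget_slice() and get_count() can be sketched over a plain dict row. The parameters mirror the Thrift `SliceRange` fields (start, finish, reversed, count, with count defaulting to 100), but these Python functions are hypothetical stand-ins, and plain string ordering stands in for the column comparator.

```python
def get_slice(row, start="", finish="", reversed_=False, count=100):
    """Columns of one row between start and finish, in comparator order.

    Empty start/finish mean "unbounded"; with reversed_=True the walk runs
    from high to low, so start is the upper bound and finish the lower one.
    """
    result = []
    for name in sorted(row, reverse=reversed_):
        before_start = start and (name > start if reversed_ else name < start)
        past_finish = finish and (name < finish if reversed_ else name > finish)
        if before_start:
            continue
        if past_finish or len(result) == count:
            break
        result.append((name, row[name]))
    return result


def multiget_slice(rows, keys, **slice_args):
    """The same slice applied to several row keys."""
    return {key: get_slice(rows.get(key, {}), **slice_args) for key in keys}


def get_count(row, **slice_args):
    """Number of columns the slice would return."""
    return len(get_slice(row, **slice_args))


if __name__ == "__main__":
    row = {"a": 1, "b": 2, "c": 3, "d": 4}
    print(get_slice(row, start="b", finish="c"))    # [('b', 2), ('c', 3)]
    print(get_slice(row, reversed_=True, count=2))  # [('d', 4), ('c', 3)]
```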


Reading
[Diagram: reads]


Clients
• Python:
- Pycassa: http://github.com/pycassa/pycassa
- Telephus: http://github.com/driftx/Telephus (Twisted)
• Java:
- Hector: http://github.com/rantav/hector
- Kundera: http://github.com/impetus-opensource/Kundera
- Pelops: http://github.com/s7/scale7-pelops
- Cassandrelle (Demoiselle Cassandra): http://demoiselle.sf.net/component/demoiselle-cassandra/
• .NET:
- Aquiles: http://aquiles.codeplex.com/
• Ruby:
- Cassandra: http://github.com/fauna/cassandra
• PHP:
- PHP Client Library: https://github.com/kallaspriit/Cassandra-PHP-Client-Library
- phpcassa: http://github.com/thobbs/phpcassa


CQL (from 0.8)
• USE

• SELECT

• INSERT/UPDATE

• DELETE

• TRUNCATE/DROP

• BATCH

• CREATE KEYSPACE

• CREATE COLUMNFAMILY

• CREATE INDEX


CQL: Example
CREATE COLUMNFAMILY users (
    KEY varchar PRIMARY KEY,
    password varchar,
    gender varchar,
    session_token varchar,
    state varchar,
    birth_year bigint);

INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');

SELECT * FROM users WHERE KEY='jsmith';
u'jsmith' | u'password',u'ch@ngem3a'

DROP COLUMNFAMILY users;


CQL: Example
CREATE INDEX birth_year_key ON users (birth_year);
CREATE INDEX state_key ON users (state);

SELECT * FROM users
    WHERE gender='f' AND
    state='TX' AND
    birth_year='1968';
u'user1' | u'birth_year',1968 | u'gender',u'f' |
u'password',u'ch@ngem3' | u'state',u'TX'

DROP COLUMNFAMILY users;


Indexing

• secondary indexes

- hashed

- equality predicates (where column x = y)

- specified on creation or later

- best when many rows share the same indexed values

• self-managed indexes (maintained by the application in its own column families)
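A rough sketch of a hashed secondary index, assuming a hypothetical in-memory column family: the index maps each (column, value) pair to the set of row keys that have it, so only equality lookups are possible. As in the CQL example, a query needs at least one equality predicate on an indexed column; the remaining predicates are then checked against the candidate rows.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Hypothetical column family with hashed equality indexes on chosen columns."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexed = set(indexed_columns)
        self.index = defaultdict(set)  # (column, value) -> {row keys}

    def insert(self, key, columns):
        row = self.rows.setdefault(key, {})
        for name, value in columns.items():
            if name in self.indexed and name in row:
                self.index[(name, row[name])].discard(key)  # drop the stale entry
            row[name] = value
            if name in self.indexed:
                self.index[(name, value)].add(key)

    def query(self, **predicates):
        """Equality-only query: one indexed predicate picks candidates, the rest filter."""
        indexed = [(n, v) for n, v in predicates.items() if n in self.indexed]
        if not indexed:
            raise ValueError("at least one equality predicate must use an indexed column")
        candidates = self.index.get(indexed[0], set())
        return sorted(k for k in candidates
                      if all(self.rows[k].get(n) == v for n, v in predicates.items()))


if __name__ == "__main__":
    users = IndexedColumnFamily(["birth_year", "state"])
    users.insert("user1", {"gender": "f", "state": "TX", "birth_year": 1968})
    users.insert("user2", {"gender": "m", "state": "TX", "birth_year": 1968})
    print(users.query(gender="f", state="TX", birth_year=1968))  # ['user1']
```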


Indexing: Self-managed: one-to-one

[Diagram: one row keyed by the index name; its columns map indexed value #1 and #2 to their related keys]
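The one-to-one layout can be sketched with plain dicts: one hypothetical `users_by_email` index row whose column names are indexed values (email addresses) and whose column values are the related row keys. The `email` column and both function names are assumptions for illustration. Because the index is self-managed, the application must remove the stale entry when the indexed value changes.

```python
def set_email(users, email_index, username, email):
    """Write a user's email and keep the hypothetical users_by_email index row in sync."""
    row = users.setdefault(username, {})
    old = row.get("email")
    if old is not None and old != email:
        # No automatic cleanup: the application deletes the stale index column itself.
        email_index.pop(old, None)
    row["email"] = email
    email_index[email] = username  # column name = indexed value, value = related key


def find_by_email(users, email_index, email):
    """One index lookup, then one read of the related row."""
    username = email_index.get(email)
    return None if username is None else (username, users[username])


if __name__ == "__main__":
    users, users_by_email = {}, {}
    set_email(users, users_by_email, "jsmith", "j@example.com")
    print(find_by_email(users, users_by_email, "j@example.com"))
```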


Indexing: Self-managed: one-to-several

[Diagram: one row keyed by the index name; indexed value #1 and #2 each map to several related keys]


Indexing: Self-managed: one-to-many

[Diagram: one row per indexed value (#1, #2); column names are the related keys, column values are empty (-)]


Indexing: Self-managed: one-to-many

[Diagram: one row per indexed value (#1, #2); columns named by ordering values]
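For the ordered one-to-many layout, here is a sketch with one row per indexed value whose columns are kept sorted by an ordering value (for example a timestamp; in Cassandra this is often a TimeUUID column name). Reading the newest entries is then a reversed slice. The `OrderedIndex` class is hypothetical.

```python
import bisect


class OrderedIndex:
    """Hypothetical: one row per indexed value, columns kept sorted by ordering value."""

    def __init__(self):
        self.rows = {}  # indexed value -> sorted list of (ordering value, related key)

    def add(self, indexed_value, ordering_value, related_key):
        # The (ordering value, related key) pair plays the role of a sorted column name;
        # including the key keeps two entries with the same ordering value distinct.
        bisect.insort(self.rows.setdefault(indexed_value, []), (ordering_value, related_key))

    def latest(self, indexed_value, count):
        """Newest first, like a reversed column slice limited to `count` columns."""
        return [key for _, key in self.rows.get(indexed_value, [])[::-1][:count]]


if __name__ == "__main__":
    idx = OrderedIndex()
    idx.add("jsmith", 3, "tweet-c")
    idx.add("jsmith", 1, "tweet-a")
    idx.add("jsmith", 2, "tweet-b")
    print(idx.latest("jsmith", 2))  # ['tweet-c', 'tweet-b']
```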


Let’s practice: Twitter
• Get a user record by username
• Get the friends of a username
• Get the followers of a username
• Get a timeline for a user
• Get a timeline of a specific user’s tweets
• Get a tweet from a tweet ID
• Create a tweet
• Create a user
• Add friends to a user
• Remove friends from a user
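One possible answer to the exercise, sketched with Python dicts standing in for column families (all names are hypothetical): separate `friends` and `followers` rows, a `tweets` family keyed by tweet ID, and `timeline`/`userline` rows whose columns are ordered by a counter that stands in for TimeUUIDs. Tweets are copied into followers' timelines at write time, trading extra writes for cheap reads.

```python
import itertools
from collections import defaultdict


class Twitter:
    """One possible denormalized layout; every dict stands in for a column family."""

    def __init__(self):
        self.users = {}                     # username -> {column: value}
        self.friends = defaultdict(dict)    # username -> {friend: timestamp}
        self.followers = defaultdict(dict)  # username -> {follower: timestamp}
        self.tweets = {}                    # tweet id -> {"username", "body"}
        self.timeline = defaultdict(dict)   # username -> {timestamp: tweet id}
        self.userline = defaultdict(dict)   # username -> {timestamp: tweet id}
        self._clock = itertools.count(1)    # stand-in for TimeUUIDs

    def create_user(self, username, password):
        self.users[username] = {"password": password}

    def get_user(self, username):
        return self.users.get(username)

    def add_friends(self, username, *friends):
        for friend in friends:
            ts = next(self._clock)
            self.friends[username][friend] = ts
            self.followers[friend][username] = ts

    def remove_friends(self, username, *friends):
        for friend in friends:
            self.friends[username].pop(friend, None)
            self.followers[friend].pop(username, None)

    def get_friends(self, username):
        return sorted(self.friends[username])

    def get_followers(self, username):
        return sorted(self.followers[username])

    def create_tweet(self, username, body):
        ts = next(self._clock)
        tweet_id = f"tweet-{ts}"
        self.tweets[tweet_id] = {"username": username, "body": body}
        # Fan out on write: the author's lines plus every follower's timeline.
        self.userline[username][ts] = tweet_id
        self.timeline[username][ts] = tweet_id
        for follower in self.followers[username]:
            self.timeline[follower][ts] = tweet_id
        return tweet_id

    def get_tweet(self, tweet_id):
        return self.tweets.get(tweet_id)

    def get_timeline(self, username, count=20):
        row = self.timeline[username]
        return [row[ts] for ts in sorted(row, reverse=True)[:count]]

    def get_userline(self, username, count=20):
        row = self.userline[username]
        return [row[ts] for ts in sorted(row, reverse=True)[:count]]


if __name__ == "__main__":
    t = Twitter()
    t.create_user("alice", "pw")
    t.create_user("bob", "pw")
    t.add_friends("bob", "alice")  # bob follows alice
    t.create_tweet("alice", "hello")
    print(t.get_timeline("bob"))
```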


Facebook messaging

