3. Main points
Structured log storage
Columns ordered by name inside key
Rows ordered by hash of row key *
Column family storage
Fully distributed peer-to-peer
Partitioned by row key
Dynamo consistency
4. Structured log storage
Writes never happen in place
Part of the JVM heap is reserved for memtables
Memtables are sorted
When memtables reach a configured size they are
flushed to disk
− Creates sstable file
− Bloom filter file
− Index file
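The write path above can be sketched in a few lines. This is a toy model, not Cassandra's actual classes: `Memtable` and its threshold are made-up names, and the "sstable" here is just an immutable, sorted tuple.

```python
# Toy sketch of log-structured storage: writes land in an in-memory sorted
# memtable; past a size threshold it is flushed as an immutable, sorted
# "sstable". All names are illustrative, not Cassandra's real code.

class Memtable:
    def __init__(self, flush_threshold=3):
        self.rows = {}                      # row_key -> {column_name: value}
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def should_flush(self):
        return len(self.rows) >= self.flush_threshold

    def flush(self):
        # The sstable is written once and never updated: a sorted,
        # immutable sequence of (row_key, sorted columns) pairs.
        sstable = tuple(
            (key, tuple(sorted(self.rows[key].items())))
            for key in sorted(self.rows)
        )
        self.rows = {}                      # memtable starts over after a flush
        return sstable
```

Note how sorting happens in memory, so the flushed file comes out ordered without any in-place rewriting on disk.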
6. Commit logs
Every write/delete operation goes to commit
log
If a node shuts down with un-flushed
memtables (effectively every shutdown)
Replay the commit logs on restart
7. Columns ordered inside key
Cassandra likes wide rows
− Up to 2 billion
− (but not really: that would be a 32GB row)
set mystuff['ecapriolo']['a']='1'
set mystuff['ecapriolo']['b']='2'
set mystuff['ecapriolo']['c']='3'
...
slice mystuff['ecapriolo'] ['b'] ['g']
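The slice above works because columns are kept sorted by name inside the row; a minimal sketch of that lookup (the `row` data here is made up):

```python
from bisect import bisect_left, bisect_right

def slice_columns(row, start, finish):
    """Return the columns of one wide row whose names fall in
    [start, finish], relying on column names being sorted."""
    names = sorted(row)                 # Cassandra keeps these sorted on disk
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]

# Hypothetical wide row, like mystuff['ecapriolo'] above
row = {'a': '1', 'b': '2', 'c': '3', 'g': '7', 'h': '8'}
```

Because the names are sorted, a slice is two binary searches plus a sequential read, which is why wide rows with ordered columns are cheap to range-scan.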
8. Rows ordered by hash of row key
All columns of row 'a1' live on the same node
But row 'a2' may land on a different node
Reduces hot spots
But there is no total ordering based on row
keys
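Hash placement can be sketched like this. RandomPartitioner does use MD5, but the modulo placement below is a simplification of real token ranges, and the node names are made up:

```python
import hashlib

# Toy sketch of hash partitioning: a row is placed by the hash of its key,
# so all columns of one row stay together while adjacent keys spread out.

def node_for(row_key, nodes):
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    return nodes[token % len(nodes)]

nodes = ['node1', 'node2', 'node3']   # hypothetical cluster
```

Keys 'a1' and 'a2' are adjacent lexically but their hashes are unrelated, which spreads load but forfeits ordered range scans over row keys.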
9. Peer to Peer
The node list and token ranges are gossiped
Each node is responsible for local storage and
for coordinating requests
When a new node joins, it takes some token
range away from other nodes
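Token-range takeover can be sketched with a sorted ring — tokens and node names below are invented for illustration:

```python
from bisect import bisect_left

# Toy sketch of a token ring: each node owns the keys whose token falls at
# or below its own token, wrapping around. A joining node takes over part
# of an existing node's range just by inserting its token.

def owner(token, ring):
    """ring: sorted list of (token, node). The owner is the node with the
    first ring token >= the key's token, wrapping to the lowest token."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][1]

ring = [(25, 'A'), (50, 'B'), (75, 'C'), (100, 'D')]
# a hypothetical node E joins at token 60 and takes the (50, 60] slice from C
ring_after_join = sorted(ring + [(60, 'E')])
```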
11. Dynamo consistency
Operations have a requested Consistency
Level
− ONE
− QUORUM
CL replicas must ack the operation before the
user receives an ack
If an operation fails it is safe to retry *
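The arithmetic behind QUORUM is just majority math: with replication factor RF, if reads and writes each touch a quorum, the two replica sets must overlap (R + W > RF). A sketch, not Cassandra code:

```python
# Consistency-level arithmetic: quorum is a strict majority of replicas.
# When read CL + write CL exceed RF, every read set intersects every
# write set, so a read sees the latest acknowledged write.

def quorum(rf):
    return rf // 2 + 1

def overlapping(read_cl, write_cl, rf):
    return read_cl + write_cl > rf
```

For RF=3, QUORUM is 2, and QUORUM reads plus QUORUM writes overlap (2 + 2 > 3), while ONE/ONE (1 + 1 = 2) can miss the latest update.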
15. Hadoop and Cassandra
ColumnFamilyInputFormat
− Takes a ColumnFamily as input
− map(ByteBuffer key, SortedMap<ByteBuffer, Column> columns)
ColumnFamilyOutputFormat
− Writes out to a column family
− OutputFormat<ByteBuffer, List<Mutation>>
16. Hadoop optimizations
Tasks run with locality if C* and Hadoop share the same nodes
InputFormat can leverage c* secondary
indexes
OutputFormat can use bulk loader
− C* writes are helluva fast anyway
17. Hive and Cassandra
Hive support is similar to the HBase handler
support
Create a Hive table specifying properties
similar to those in MapReduce
hive> CREATE EXTERNAL TABLE Users(userid string, name string, email string, phone string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH ...
18. Other support out there
github.com/edwardcapriolo/hive-cassandra-udfs
− Delete UDF
− Composite splitter/builder UDFs
Not very hard to roll your own input format
− OneRowInputFormat
− ListOfRowsInputFormat
19. Pig Cassandra
Nice support for pig/cassandra
Pygmalion library
But I don't use it
− Cause I use hive
− You should as well
− And get my book :)
20. Comparison between c*
and “other noSQL”
I know you're talking about HBase :)
Cassandra does not store multiple versions of
a column
− Last update wins
− Use UUID as part of column name instead
The row keys are not globally ordered *
− Unless you are using ByteOrderPartitioner (no one
should use this)
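The "UUID as part of the column name" trick above can be sketched like this — a hypothetical model where a time-based UUID in the name preserves history under last-update-wins (CPython's `uuid1` timestamps are strictly increasing within a process):

```python
import uuid

# Toy sketch: since the last update to a column wins, keep history by
# embedding a time-based UUID in the column name itself; the wide row's
# column ordering then yields versions in time order. Names are made up.

def write_versioned(row, base_name, value):
    # uuid1 carries a 100ns timestamp; each write gets a distinct name
    row[(base_name, uuid.uuid1())] = value

row = {}
write_versioned(row, 'email', 'old@example.com')
write_versioned(row, 'email', 'new@example.com')

# newest version = the column whose UUID timestamp is largest
latest = max(row.items(), key=lambda kv: kv[0][1].time)[1]
```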
21. Comparison between c*
and “other noSQL”
Each C* replica actively serves reads & writes
Cassandra directly manages its storage
Shards are pre-defined tokens (no auto-split)
Qualifier/column name can NOT be null
23. Know your data
Design for the long tail scenarios
− With design x our largest customer will have
10000000000000 columns in one row
How large will this column family be in 5
months?
What is the request rate?
How random is the read pattern?
24. Understanding write-once files
Deletes are writes that get compacted away
later
Can you optimize by using blind writes?
What percent of your application is
update/insert?
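"Deletes are writes that get compacted away later" can be sketched with a merge: a delete writes a tombstone, and compaction keeps only the newest entry per column, purging columns whose winner is a tombstone. A toy model with invented data:

```python
# Toy sketch of tombstones and compaction over write-once sstables:
# merging keeps the newest (timestamp, value) per column, and a winning
# tombstone removes the column entirely.

TOMBSTONE = object()

def compact(*sstables):
    """Each sstable maps column -> (timestamp, value). Newest wins;
    columns whose newest entry is a tombstone are dropped."""
    merged = {}
    for sstable in sstables:
        for col, (ts, val) in sstable.items():
            if col not in merged or ts > merged[col][0]:
                merged[col] = (ts, val)
    return {c: tv for c, tv in merged.items() if tv[1] is not TOMBSTONE}

old = {'a': (1, 'x'), 'b': (1, 'y')}            # earlier flush
new = {'a': (2, TOMBSTONE), 'c': (2, 'z')}      # later flush: 'a' deleted
```

This is why update-heavy and delete-heavy workloads cost more at compaction time: the old values still exist on disk until a merge resolves them.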
27. Hardware
Fast disk (you almost always want SSD)
RAM
− Caches, bloom filters, young gen
CPU
− The garbage collector, deserialization, and compaction
all need CPU
28. Anti patterns
Using one row key as a queue
Doing N reads to satisfy a request
Read before write
Using collection support in place of wide rows
Encoding