www.ExigenServices.com
Pros
1. Good balance between functionality and usability; powerful tool support.
2. SQL has a feature-rich syntax.
3. A set of widely accepted standards.
4. ACID transactions.
Scalability
RDBMSs were the mainstream for decades, until:
– requirements for scalability increased dramatically;
– the complexity of processed data structures increased dramatically.
Cons
Cost of distributed transactions:
a) Lower availability. Two databases with 99.9% availability each have a combined availability of 99.9% × 99.9% ≈ 99.8% (~86 min of downtime per month, versus ~43 min for a single database).
b) Additional synchronization overhead.
c) As slow as the slowest DB node plus network latency.
d) 2PC is a blocking protocol.
e) Resources can stay locked forever (e.g., if the 2PC coordinator fails after the participants have voted).
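The availability figure in a) follows from multiplying the availabilities of components that must all be up at once. A minimal sketch, assuming independent failures and a 30-day month (both assumptions, not from the slide):

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month


def combined_availability(*availabilities: float) -> float:
    """Availability of components in series: every one must be up at the same time."""
    return prod(availabilities)


def downtime_minutes_per_month(availability: float) -> float:
    """Expected minutes of downtime per month for a given availability."""
    return (1.0 - availability) * MINUTES_PER_MONTH


single = 0.999
both = combined_availability(single, single)  # two DBs in one distributed transaction
print(f"one DB:  {single:.4%}, ~{downtime_minutes_per_month(single):.0f} min/month down")
print(f"two DBs: {both:.4%}, ~{downtime_minutes_per_month(both):.0f} min/month down")
```

Each additional database in the transaction multiplies in another factor below 1, so availability keeps dropping as nodes are added.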
Cons
Master-slave replication:
– makes the write side (master) a performance bottleneck and requires additional CPU/IO resources;
– provides no partition tolerance.
Cassandra sharding
Cassandra uses hash-based load balancing: row keys are hashed to determine which node stores them.
Cassandra is a better fit for reporting than for business-logic processing.
Cassandra + Hadoop = an OLAP server with high performance and availability.
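Hash-based placement can be sketched as a token ring: each node holds a token, each key is hashed to a token, and the key goes to the first node whose token is greater than or equal to it. This is a simplified illustration, not Cassandra's code. MD5 stands in for the partitioner's hash, and the node names and evenly spaced single tokens per node (no virtual nodes) are hypothetical choices.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 128  # MD5 produces 128-bit values


def token(key: str) -> int:
    # Stand-in partitioner: MD5 of the key read as an unsigned 128-bit integer.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    def __init__(self, nodes: list[str]):
        # Hypothetical setup: one evenly spaced token per node.
        n = len(nodes)
        self._ring = [(i * RING_SIZE // n, node) for i, node in enumerate(nodes)]
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # A node owns the range (previous token, its token];
        # keys hashed past the last token wrap around to the first node.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
counts = Counter(ring.owner(f"user:{k}") for k in range(10_000))
print(counts)  # roughly a third of the keys per node
```

Because the hash spreads keys uniformly, sequential keys such as `user:1`, `user:2` land on different nodes, which balances load but makes ordered range scans over row keys expensive.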
DHT
Distributed hash table:
– a lookup service similar to a hash table, storing (key, value) pairs;
– any participating node can efficiently retrieve the value associated with a given key.
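A toy in-process model of the two DHT properties above: each node stores only its share of the pairs, yet a request made to any node reaches the owning node. The `Node` class, the full membership list and the hash-modulo placement are simplifying assumptions for illustration; real DHTs use consistent hashing and routing instead.

```python
import hashlib


def key_id(key: str, space: int) -> int:
    # Map a key to one of `space` slots using SHA-1 (illustrative choice).
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big") % space


class Node:
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, str] = {}  # this node's share of the table
        self.peers: list["Node"] = []    # simplification: every node knows all nodes

    def _owner(self, key: str) -> "Node":
        # Simplified placement: hash modulo the number of nodes.
        return self.peers[key_id(key, len(self.peers))]

    def put(self, key: str, value: str) -> None:
        self._owner(key).store[key] = value

    def get(self, key: str):
        return self._owner(key).store.get(key)


nodes = [Node(f"n{i}") for i in range(4)]
for n in nodes:
    n.peers = nodes

nodes[0].put("alice", "42")  # written through one node...
print(nodes[3].get("alice"))  # ...read through another: 42
```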
Overlay network
For any key k, each node either owns k or has a link to a node whose node ID is closer to k.
Greedy algorithm: at each step, forward the message to the neighbor whose ID is closest to k.
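The greedy rule can be sketched on a small circular ID space. Assumptions (not from the slide): a 6-bit ring, a fixed hypothetical set of node IDs, "closest" measured as circular distance with ties broken by the smaller ID, and Chord-style finger links plus predecessor/successor links. The predecessor/successor links are what guarantee the invariant above: a node that does not own k always has a neighbor closer to k.

```python
M = 6
SPACE = 2 ** M  # identifier space 0..63
NODE_IDS = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # hypothetical, sorted


def dist(a: int, b: int) -> int:
    # Circular distance on the ring.
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)


def successor(x: int) -> int:
    # First node ID >= x, wrapping around the ring.
    for n in NODE_IDS:
        if n >= x % SPACE:
            return n
    return NODE_IDS[0]


def neighbours(node: int) -> set[int]:
    i = NODE_IDS.index(node)
    links = {NODE_IDS[i - 1], NODE_IDS[(i + 1) % len(NODE_IDS)]}  # predecessor, successor
    links |= {successor(node + 2 ** j) for j in range(M)}          # finger links
    links.discard(node)
    return links


def route(start: int, key: int) -> list[int]:
    """Greedy routing: hop to the neighbor closest to `key` until no neighbor is closer."""
    path, current = [start], start
    while True:
        best = min(neighbours(current) | {current}, key=lambda n: (dist(n, key), n))
        if best == current:  # current node owns the key
            return path
        current = best
        path.append(current)


print(route(1, 45))  # [1, 38, 42]
```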
Tunable consistency
Replication factor (N): the number of copies of each piece of data
Consistency level: the number of replicas accessed on every read/write operation

Consistency level   Replicas accessed per read/write
ONE                 1
QUORUM              ⌊N/2⌋ + 1
ALL                 N
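The table above can be expressed as a small function. The `quorums_overlap` helper adds the widely used rule that when read replicas R plus write replicas W exceed N, every read set shares at least one replica with every write set; the helper names are hypothetical, not a Cassandra API.

```python
def replicas_for(level: str, n: int) -> int:
    """Replicas contacted per operation at a consistency level, for replication factor n."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unsupported level: {level}")


def quorums_overlap(read_level: str, write_level: str, n: int) -> bool:
    # R + W > N means any read replica set intersects any write replica set.
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n


N = 3
for lvl in ("ONE", "QUORUM", "ALL"):
    print(lvl, replicas_for(lvl, N))           # ONE 1, QUORUM 2, ALL 3
print(quorums_overlap("QUORUM", "QUORUM", N))  # True: 2 + 2 > 3
print(quorums_overlap("ONE", "ONE", N))        # False: 1 + 1 <= 3
```

Choosing lower levels trades this overlap guarantee for latency and availability, which is what makes the consistency "tunable".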
Hybrid orientation
Column orientation
– columns aren’t fixed
– columns can be sorted
– columns can be queried for a certain range
Row orientation
– each row is uniquely identifiable by key
– columns are grouped into rows
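A minimal sketch of the hybrid model: rows are found by unique key (row orientation), while each row keeps its own, unfixed set of columns sorted by name so that a column range can be read directly (column orientation). The `Row` class and sample data are hypothetical.

```python
import bisect


class Row:
    """One row: columns are not fixed and are kept sorted by name."""

    def __init__(self, key: str):
        self.key = key
        self._names: list[str] = []        # sorted column names
        self._values: dict[str, str] = {}  # column name -> value

    def insert(self, name: str, value: str) -> None:
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start: str, end: str) -> list[tuple[str, str]]:
        # Columns with start <= name <= end, in sorted order.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


table: dict[str, Row] = {}  # row key -> row
row = table.setdefault("user:1", Row("user:1"))
for name, value in [("email", "a@example.com"), ("age", "30"), ("city", "Oslo"), ("name", "Ann")]:
    row.insert(name, value)

print(row.slice("b", "f"))  # [('city', 'Oslo'), ('email', 'a@example.com')]
```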
Keyspace
A keyspace is roughly analogous to a relational database.
Basic attributes:
– replication factor
– replica placement strategy
– column families (analogous to tables in the relational model)
An application can use several keyspaces (for example, if it needs different replica placement strategies or replication factors).
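The keyspace attributes can be pictured as a plain data structure. This is an illustrative model only: the `Keyspace` class, the keyspace and column family names are hypothetical, and the strategy strings merely name Cassandra's SimpleStrategy and NetworkTopologyStrategy without configuring anything.

```python
from dataclasses import dataclass, field


@dataclass
class Keyspace:
    name: str
    replication_factor: int
    placement_strategy: str  # replica placement strategy name
    # column family name -> row key -> column name -> value
    column_families: dict[str, dict[str, dict[str, str]]] = field(default_factory=dict)


# Hypothetical application: two keyspaces because the data needs different replication.
app = {
    "sessions": Keyspace("sessions", replication_factor=1, placement_strategy="SimpleStrategy"),
    "billing": Keyspace("billing", replication_factor=3, placement_strategy="NetworkTopologyStrategy"),
}
app["billing"].column_families["Invoices"] = {"inv:1": {"amount": "100", "currency": "EUR"}}
```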
Column family
A container for a collection of rows.
A column family is roughly analogous to a table in the relational data model.
[Figure: a column family contains rows; each row is identified by a RowKey and holds columns Column1, Column2, Column3 with values Value1, Value2, Value3.]
Column family vs. Table
Columns are not strictly defined (different rows can have different columns).
A column family can hold columns or super columns (a super column is a collection of subcolumns).
Skinny and wide rows
Wide rows – a huge number of columns and few rows (used to store lists of things)
Skinny rows – a small number of columns and many different rows (close to the relational model)
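The two shapes can be contrasted with nested dictionaries standing in for column families. The data is hypothetical, and using zero-padded timestamps as column names so they sort chronologically is an assumed modelling choice.

```python
# Skinny rows: many rows, each with a few, mostly fixed columns.
users = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob", "email": "bob@example.com"},
}

# Wide row: one row holds a whole list; each list element becomes a column.
timeline = {"timeline:user:1": {}}
for ts, event in [(1700000300, "logout"), (1700000000, "login"), (1700000100, "view")]:
    # Zero-padded timestamp as column name, so name order equals time order.
    timeline["timeline:user:1"][f"{ts:012d}"] = event

row = timeline["timeline:user:1"]
events = [row[col] for col in sorted(row)]
print(events)  # ['login', 'view', 'logout']
```

The wide row only reads back as an ordered list because its column names are sorted, which is why column sorting matters mainly for this model.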
Disadvantages of wide rows
Work poorly with the row cache
With many rows and many columns, you end up with larger indexes (e.g., ~40 GB of data with a 10 GB index)
Column sorting
Column sorting typically matters only for the wide-row model.
Comparator – an attribute of a column family that specifies how column names are compared to determine sort order.
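Column names are stored as raw bytes, so the comparator decides what "sorted" means. The two key functions below loosely mirror the behavior of Cassandra's BytesType (raw byte order) and LongType (8-byte signed integer) comparators; they are an illustration, not Cassandra code.

```python
import struct


def bytes_type(name: bytes) -> bytes:
    # Raw lexicographic byte order.
    return name


def long_type(name: bytes) -> int:
    # Interpret the name as an 8-byte big-endian signed integer.
    return struct.unpack(">q", name)[0]


names = [struct.pack(">q", v) for v in (10, -1, 2)]

by_bytes = [long_type(n) for n in sorted(names, key=bytes_type)]
by_long = [long_type(n) for n in sorted(names, key=long_type)]
print(by_bytes)  # [2, 10, -1]: -1 is 0xFF...FF, so it sorts last as bytes
print(by_long)   # [-1, 2, 10]: numeric order
```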
Super column
A super column stores a map of subcolumns:
name: byte[]
cols: Map<byte[], Column>
• Cannot store a map of super columns (only one level deep)
• Five-dimensional hash:
[Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
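The five-dimensional hash can be modelled as nested dictionaries with exactly five levels of keys. The `put`/`get` helpers and the blog data are hypothetical; the fixed path length reflects that a super column holds plain subcolumns, never further super columns.

```python
DIMENSIONS = ("keyspace", "column_family", "row_key", "super_column", "sub_column")


def put(store: dict, path: tuple, value: str) -> None:
    # Exactly five keys: super columns nest only one level deep.
    if len(path) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} keys, got {len(path)}")
    *parents, last = path
    node = store
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


def get(store: dict, path: tuple) -> str:
    node = store
    for key in path:
        node = node[key]
    return node


store: dict = {}
put(store, ("Blog", "Posts", "post:1", "comments", "c1"), "Nice post")
put(store, ("Blog", "Posts", "post:1", "comments", "c2"), "Thanks")
print(get(store, ("Blog", "Posts", "post:1", "comments", "c1")))  # Nice post
```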
Super column family
Column families:
– Standard (default)
Can combine columns and super columns
– Super
Stricter schema constraints
Can store only super columns
A subcomparator can be specified for subcolumns