2. Data Store
Super Set
Relational Databases
Key Value Stores
Document Stores
Column Family Stores
Tuesday, September 21, 2010
3. Design This Schema
(Entity diagram: Student, Course, Address, Score, and the relationships among them)
4. Scalable huh??
Use Case: this schema has to serve the entire student community in the world
One big server? How big?
More than one server? How will that work?
5. WHY NOSQL ?
Scalability : Horizontal
Relational databases fare poorly when distributed
NOSQL : Distributed, Flexible Schema, Relaxing Consistency
6. Issues with Relational DB
Scalability
Replication : Scaling by duplication
Partitioning(Sharding) : Scaling by division
7. Replication
Master-Slave
1 logical write = N physical writes (N is the number of slaves)
Faster reads (can read from any of N nodes)
Critical reads go to the master (application must be aware)
Limited at high volumes of data
8. Replication
Multi-Master
Adding more masters
Conflict resolution cost grows fast with the number of masters: O(n^2) or even O(n^3)
9. Partitioning(Sharding)
Scales reads as well as writes
Application needs to be partition aware
Broken relationships: how do you take Cartesian products (joins) across shards?
Referential integrity is no more
Rebalancing
10. Consistent Hashing
Hash Ring (Or Clock Face)
Balanced Distribution After Adding a new Node
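The hash-ring idea can be sketched in a few lines of Python (node and key names here are made up for illustration): each node hashes to a point on the ring, each key is owned by the first node clockwise from it, and adding a node only moves the keys that land on the new node's arc.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a name to a point on the ring [0, 2^32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    """Minimal hash ring: each node owns the arc ending at its position."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (point, node)
        for n in nodes:
            self.add_node(n)

    def add_node(self, node: str):
        bisect.insort(self._ring, (_hash(node), node))

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's point (wrapping around).
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.get_node("student:42")
ring.add_node("node-d")  # only keys on node-d's arc change owners
```

The payoff is the last line: unlike `hash(key) % n`, growing the ring does not reshuffle every key.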
11. Common Sharding Schemes
Vertical Partitioning
Range Based Partitioning
Hash Based Partitioning
Directory Based Partitioning
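As a quick sketch of three of these schemes (shard names, id ranges, and the directory entry are all hypothetical):

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names

def range_shard(student_id: int) -> str:
    # Range-based: contiguous id ranges per shard.
    # Range scans are cheap, but sequential ids create hot spots.
    if student_id < 1_000_000:
        return SHARDS[0]
    if student_id < 2_000_000:
        return SHARDS[1]
    return SHARDS[2]

def hash_shard(student_id: int) -> str:
    # Hash-based: uniform spread, but a range query must hit every shard.
    h = int(hashlib.md5(str(student_id).encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Directory-based: an explicit lookup table owns the key -> shard mapping.
# Flexible, but the directory itself must be kept highly available.
directory = {"student:42": "shard-1"}
```

Vertical partitioning is the remaining scheme: splitting by column/feature (e.g. scores on one cluster, addresses on another) rather than by key.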
12. Can live without !!
UPDATE and DELETE
Loss of Information
Can be modeled as INSERT with versioning
Filter out inactive records
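A minimal sketch of this pattern (pure Python, names illustrative): every UPDATE and DELETE becomes an INSERT carrying a version, and reads filter down to the newest active record.

```python
import itertools

_version = itertools.count(1)
records = []  # append-only log of {key, version, value, active}

def upsert(key, value):
    # An "update" is just a new insert with a higher version.
    records.append({"key": key, "version": next(_version),
                    "value": value, "active": True})

def delete(key):
    # A "delete" is an insert of an inactive (tombstone) record.
    records.append({"key": key, "version": next(_version),
                    "value": None, "active": False})

def latest(key):
    # Filter to this key; the newest version wins; tombstones read as absent.
    versions = [r for r in records if r["key"] == key]
    if not versions:
        return None
    newest = max(versions, key=lambda r: r["version"])
    return newest["value"] if newest["active"] else None
```

Nothing is ever lost: the full history stays in `records`, and old versions can be compacted away later.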
13. Avoid JOINS
Expensive; breaks down across partitions
How to avoid?
De-normalize
Storage is cheap now
The burden of consistency shifts to the application
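A toy illustration of that trade (entity names are made up): course details get embedded into each student document so a single read needs no join, and the application code now carries the duplicate-update burden.

```python
# Normalized: separate "tables", joined at query time.
courses = {"CS101": {"title": "Intro to CS"}}

# De-normalized: course data duplicated into each student document,
# so "what is Ann enrolled in?" is a single-document read.
students = {
    1: {"name": "Ann",
        "courses": [{"id": "CS101", "title": "Intro to CS"}]},
}

def rename_course(course_id, new_title):
    # Consistency is now the application's job: the rename must be
    # propagated to every student document that embeds the course.
    courses[course_id]["title"] = new_title
    for student in students.values():
        for c in student["courses"]:
            if c["id"] == course_id:
                c["title"] = new_title
```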
14. Still need ACID ??
Atomicity: single-key atomicity is often enough
Consistency: the CAP theorem says you can only get two of
Consistency, Availability, Partition Tolerance
Isolation: no more than Read-Committed (single key)
Durability: survive node failures via peer replication
15. Fixed Schema
Schema comes before Data
Modifying Schema is essential
Adding new features
Modifying the schema is hard
Locking of rows (add/modify a column)
Locking of the table (add/remove an index)
16. Model this!!
Hierarchical Data
Graphs
17. Desired Characteristics
High Scalability
Add nodes incrementally
No Diminishing Returns
High Availability
No single point of failure
Agnostic to node failures
18. Desired Characteristics
High Performance
Fast operations
Non-Blocking Writes
Consistency
Strong consistency is not always needed
Eventual Consistency, Read-Your-Writes Consistency
19. Desired Characteristics
Deployment Flexibility
Add/Remove node automatically
NO DFS or shared storage
Should work with commodity, heterogeneous hardware
Modeling Flexibility
Key-Value Pairs, Hierarchical and Graph Data
20. Desired Characteristics
Query Flexibility
Multi Gets
Range Queries
Upserts
21. Inspiration
Memcached
In-memory Key Value
Blazing Fast
Infinite Horizontal Scalability
22. Key Value Stores
Simple Data Model
Amazon Dynamo
Amazon S3
Project Voldemort
Redis
Scalaris, and a lot of others
23. Amazon Dynamo
Internal to Amazon
Distributed K-V store
Opaque Values
Partitioning
A variant of consistent hashing
Hash Ring division
24. Amazon Dynamo
Partitioning
Mappings communicated via a gossip protocol
Eventually consistent view of the mappings
Replication
Each key is replicated on N nodes
Preference List
25. Amazon Dynamo
Replication
Read/Write through Coordinator nodes
Configurations
N = number of replicas
W = min. nodes that must ACK the receipt of a WRITE
R = min. nodes contacted for a READ
R+W > N will ensure Quorum
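The quorum condition is easy to verify by brute force; a small sketch comparing the R+W > N rule against an exhaustive check over all possible read/write sets:

```python
from itertools import combinations

def is_quorum(n: int, r: int, w: int) -> bool:
    # R + W > N guarantees every read set overlaps every write set,
    # so a read always touches at least one up-to-date replica.
    return r + w > n

def sets_always_intersect(n: int, r: int, w: int) -> bool:
    # Exhaustive check: every size-W write set shares at least one
    # replica with every size-R read set.
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))
```

With (N,R,W) = (3,2,2), any two acknowledging writers and any two contacted readers must overlap; with (3,1,1) they may miss each other entirely.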
26. Amazon Dynamo
Tuning (N,R,W)
Increased W means more nodes must acknowledge each write (more durable, slower writes)
Increased R means higher consistency, lower read performance
Typical values for Amazon apps: (N,R,W) = (3,2,2)
27. Amazon Dynamo
Consistency
Eventually consistent
Uses Object versioning via Vector Clocks
Consistency Protocol
Return all versions on read
Reconcile divergent versions
The reconciled version, superseding the ones it merges, is written back
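Vector clocks boil down to per-node counters; a minimal sketch (dict-of-counters, node names hypothetical) of the comparisons the protocol needs:

```python
def vc_descends(a: dict, b: dict) -> bool:
    # a supersedes b if a's counter is >= b's for every node.
    return all(a.get(node, 0) >= count for node, count in b.items())

def concurrent(a: dict, b: dict) -> bool:
    # Neither version supersedes the other: a conflict to reconcile.
    return not vc_descends(a, b) and not vc_descends(b, a)

def vc_merge(a: dict, b: dict) -> dict:
    # Element-wise max gives a clock that supersedes both inputs,
    # i.e. the clock of the reconciled version.
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in set(a) | set(b)}
```

Two replicas that each accepted a write independently produce concurrent clocks; the reconciled version's merged clock then supersedes both.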
28. Amazon Dynamo
Handling Temporary Failures
Hinted Handoff
Handling Permanent Failures
Node Sync
29. Amazon Dynamo
Ring membership
Add/Remove node needs rebalancing
Failure Detection
Gossip about failures
Check periodically about availability and gossip
30. Other K-V Stores
Check out the others too; worth a read and a try:
S3, Voldemort, Redis, Scalaris
31. Document Stores
A step further from K-V stores
The value is a full-blown record (document)
The document is not opaque (it exposes a structure to perform operations on)
Each document can have a different schema, e.g. JSON
Relations are possible
One-to-Many and Many-to-Many
32. Document Stores
Mostly similar to a relational DB (except no upfront schema)
Amazon Simple DB
Apache CouchDB
Riak
Mongo DB
33. Mongo DB
We use Mongo in a large automated-translation system
Data Model
Key-Value, the value being binary-serialized JSON (BSON)
4 MB limit per BSON document
For larger objects use GridFS
Collections: more or less like a table
B-trees used for indexes
34. Mongo DB
Storage
Uses memory-mapped files (cache controlled by the OS VMM)
Writes
In place updates
partial updates
Single Document Atomic updates
35. Mongo DB
Queries
JSON-style query syntax (powered by a JS engine)
Support for conditional operators, regex, etc.
Cursor support
Query optimizers
Map-Reduce over a collection
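A real query would go through a MongoDB driver against a running server; as a driver-free sketch of the JSON-style conditional-operator syntax the slide refers to, here is a tiny matcher for a subset of it ($gt, $lt, $in; the field names and documents are made up):

```python
def matches(doc: dict, query: dict) -> bool:
    """Tiny subset of Mongo-style queries: plain equality plus the
    $gt / $lt / $in conditional operators."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # conditional-operator clause
            for op, arg in cond.items():
                if op == "$gt" and not (value is not None and value > arg):
                    return False
                elif op == "$lt" and not (value is not None and value < arg):
                    return False
                elif op == "$in" and value not in arg:
                    return False
        elif value != cond:  # plain equality match
            return False
    return True

students = [{"name": "Ann", "score": 91}, {"name": "Bob", "score": 68}]
passed = [d for d in students if matches(d, {"score": {"$gt": 70}})]
```

In the real system the same `{"score": {"$gt": 70}}` document is what you hand to the driver's `find` call.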
36. Mongo DB
Replication
Master Slave
Replica Pairs
Master - Master
37. Mongo DB
Partitioning
Auto-sharding done through chunks (50 MB max)
Easy node addition
Auto balancing
ZERO single point of failure
Automatic Failover
38. Column Family Stores
Sparse, Distributed, Persistent, Multi-Dimensional sorted Map
Column Keys are grouped into sets called column-families
BigTable
HBase
Cassandra
40. Cassandra
Combines the distributed architecture of Dynamo with the column-family data model of BigTable
41. Cassandra
Data Model: a multi-dimensional map indexed by a key
Each app has its own keyspace
The key can be an arbitrary string, indexed by Cassandra
Column: an attribute of a record, timestamped
Column Family: a grouping of columns, similar to a relational table
Super Columns: a list of columns
42. Cassandra
Data Model
A column family can contain either columns or super columns
KeySpace.ColumnFamily.Key.[SuperColumn].Column
Sorting
Data is sorted at write time
Columns are sorted within their row by column name (pluggable sorting providers)
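As a toy picture of this addressing scheme, nested Python dicts line up with KeySpace.ColumnFamily.Key.Column (the names "School", "Students", and the timestamps are invented for illustration):

```python
keyspace = {
    "School": {                                # keyspace (one per app)
        "Students": {                          # column family (like a table)
            "student:42": {                    # row key
                "name": ("Ann", 1285027200),   # column -> (value, timestamp)
                "grade": ("A", 1285027200),
            }
        }
    }
}

def get_column(ks: str, cf: str, key: str, col: str):
    # Walk the four levels of the address; drop the timestamp.
    value, _ts = keyspace[ks][cf][key][col]
    return value

def row_columns(ks: str, cf: str, key: str):
    # Cassandra keeps columns sorted by name within a row.
    return sorted(keyspace[ks][cf][key])
```

A super column would add one more dict level between the row key and the columns.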
43. Cassandra
Partitioning : Mostly Like Dynamo
Consistent hashing with an order-preserving hash function
Uses the Chord approach to load balancing (Dynamo used virtual nodes)
44. Cassandra
Replication
Coordinator nodes and a preference list, as in Dynamo
Data-center aware, rack aware, or rack-unaware placement
Rack-aware placement uses ZooKeeper
Membership based on Scuttlebutt, an anti-entropy gossip protocol
45. Cassandra
Failure Detection
A modified version of the Accrual failure detector
Failure Handling
Same as hinted handoff in Dynamo
46. Cassandra
Write
Write to the commit log, followed by an update to the memtable
A dedicated disk for the commit log (makes writes sequential)
No seeks, always sequential, so blazing fast
Atomic within a column family
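The log-then-memtable write path can be sketched in a few lines (row keys and timestamps are illustrative; on-disk SSTables and flushing are omitted): every write is a sequential append to the commit log, the sorted in-memory memtable takes the newest-timestamped value, and crash recovery replays the log.

```python
commit_log = []  # append-only, sequential: the durable record on disk
memtable = {}    # in memory: row key -> {column: (value, timestamp)}

def write(row_key, column, value, ts):
    commit_log.append((row_key, column, value, ts))  # sequential append first
    row = memtable.setdefault(row_key, {})
    current = row.get(column)
    if current is None or ts >= current[1]:
        row[column] = (value, ts)  # newest timestamp wins

def recover():
    # After a crash the memtable is rebuilt by replaying the commit log.
    rebuilt = {}
    for row_key, column, value, ts in commit_log:
        row = rebuilt.setdefault(row_key, {})
        current = row.get(column)
        if current is None or ts >= current[1]:
            row[column] = (value, ts)
    return rebuilt
```

Because the only disk I/O on the write path is the sequential append, there are no seeks, which is where the speed comes from.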
47. Cassandra
Read
Similar to Dynamo for figuring out which nodes will serve the read
Similar to BigTable at the storage level
48. Thanks!!!
Due regards to Reddy Raja for this invite.