2. Agenda
Background
What’s NoSQL
Why NoSQL
How to select a NoSQL system
Data type
Data model
Architecture
Key technologies
Summary
3. What is NoSQL
Definition
NoSQL, sometimes expanded to "Not Only SQL", is a broad class of database management systems that differ from classic relational database management systems (RDBMSs).
These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
Academia typically refers to these databases as structured storage, a term that would include classic relational databases as a subset.
Refer to the Wikipedia page: http://en.wikipedia.org/wiki/NoSQL
4. SQL vs. NoSQL
Aspect | SQL | NoSQL
Transactional semantics | ACID | Restricted ACID
Query model | Complex and feature-rich | Simple and application-oriented
Data model | Relational, row storage | Key-value, column-oriented, document-oriented, graph
Schemas | Fixed | Schema-free/schemaless
Data storage | Limited and costly | Massive, horizontally scalable
Failure tolerance | Slow failure recovery | Native and fast recovery
Hardware | Reliable and expensive | Commodity and inexpensive
6. Driven by Requirements
Fast growth and development: all about INCREASING
Increasing number of servers
• Scale out
• Inexpensive and unreliable servers
Increasing data volume
• Big Data
• Scalability
Increasing number of users
• High throughput
• High workload
7. Driven by Requirements
Different applications and ecosystem
Rapid change
• Always beta
• Flexible data schema
Abundant web applications
• Complex data
• Larger record size
• Typically more reads than writes
• Low transaction and consistency requirements
Online services
• Failure tolerance
• Fast recovery
• High availability
10. Data type Classification
What kind of data should be stored?
Unstructured data
• Does not have a predefined data model
• And/or does not fit well into relational tables
• Examples: documents, videos, audio, images
• Systems: Dynamo, Voldemort, Berkeley DB, MemcacheDB, Tokyo Cabinet, Redis
Structured data
• Entities that belong to the same class should have the same attributes in the same order
• The data structure should be predefined and cannot be changed
• Examples: CRM, ERP
• Systems: MySQL, Oracle
Semi-structured data
• Is a form of structured data
• Contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
• Entities that belong to the same class may have different attributes even though they are grouped together, and the attribute order is not important
• Is also known as schemaless or self-describing structure
• Examples: logs, mail, web pages, blogs
• Systems: BigTable, HBase, Cassandra, Hypertable, MongoDB, CouchDB
11. Summary
[Figure: unstructured, structured, and semi-structured data compared along five dimensions: flexibility, record size, efficiency, scalability, and transactional support]
13. Key-Value pair based
Simple read and write: a data item is uniquely identified by a key.
Key-value stores allow the application to store its data in a schemaless way. The data could be stored in a data type of a programming language or an object.
A key identifies a unique value.
Anything can be stored in a value: an image, a document, even a complex data structure (array, list, …).
•Advantages
– Efficiency
– Easy to use
– Flexible data storage
•Disadvantages
– Simple query model
Many cloud-based databases can be classified as key-value stores, such as most column-oriented databases.
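A minimal sketch of the key-value idea described above, assuming an in-memory dictionary as the backing store. The class name `KeyValueStore` and the use of `pickle` for serialization are illustrative choices, not taken from any system named in this deck.

```python
import pickle
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only).

    The store treats every value as an opaque blob, so it needs no schema.
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: Any) -> None:
        # Any picklable object can be a value: a string, a list, a nested dict.
        self._data[key] = pickle.dumps(value)

    def get(self, key: str) -> Optional[Any]:
        raw = self._data.get(key)
        return None if raw is None else pickle.loads(raw)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Bob", "hobbies": ["sailing"]})
store.put("avatar:42", b"raw image bytes")
```

The only access path is the key, which is why the query model is listed as the main disadvantage: finding "all users who sail" would require scanning every value in application code.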
14. Column Oriented store
A simple comparison: column store vs. row store
[Figure: the same example table (columns Name, Language, Notes; rows Neo4j/Java, OrientDB/Java, FlockDB/Scala, Sones GraphDB/C#) laid out once as a row store and once as a column store, with Query 1 and Query 2 marked on each layout. In the row store, empty cells are stored; in the column store, null is free.]
15. Column Oriented store
BigTable data model
• Cell contents are addressed by (Row, Column, Timestamp)
• Column Families: Content, Anchor
• Sorted row keys: pages from the same domain are stored near each other
• Cells are versioned by timestamp
[Figure: row key "com.cnn.www" with the columns Content: ("<HTML>…"), Anchor:cnnsi.com ("CNN"), and Anchor:my.look.ca ("CNN.COM"), each cell carrying timestamped versions (t3 to t9); a second row key "com.cnn.www/index.htm" sorts next to it]
16. Column Oriented store
One-to-many relationship
RDBMS model (vertical extension): two tables in a 1 to 0…n relationship, combined with a JOIN
Row Key | Content
com.cnn.www | <HTML>…
… | …
Row Key | Anchor | Reference text
com.cnn.www | cnnsi.com | CNN
com.cnn.www | my.look.ca | CNN.COM
com.cnn.www | … | …
BigTable model (horizontal extension): one row, with one column per anchor
Row Key | content: | anchor:cnnsi.com | anchor:my.look.ca | anchor:…
com.cnn.www | <HTML>… | CNN | CNN.COM | …
17. Column Oriented store
BigTable-like data model
Stores content by column rather than by row.
A key identifies a row, which contains data stored in one or more Column Families (CF).
Within a CF, each row can contain multiple columns.
Columns can be added dynamically.
Distributed multi-dimensional sparse map: (row, column, timestamp) → cell contents
•Advantages
– Versioned
– Query oriented
– Good for OLAP applications
– Null is free
– Compression efficient
– Dynamic columns
•Disadvantages
– Reading an entire row is not efficient
– Contains tags or other markers to separate semantic elements
– Not well-suited for OLTP-like workloads
– Simple query model
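The "(row, column, timestamp) → cell contents" map can be sketched with nested dictionaries. This is a single-process toy, assuming integer timestamps and string values; the class `SparseTable` and its methods are hypothetical names and do not correspond to the BigTable or HBase APIs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class SparseTable:
    """Toy sparse map: (row, column, timestamp) -> cell contents."""

    def __init__(self) -> None:
        # row key -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep the newest version at the front

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[str]:
        """Return the newest value, or the newest value at or before `timestamp`."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None  # absent cells cost nothing: "null is free"

    def columns(self, row: str) -> List[str]:
        return sorted(self._rows.get(row, {}))


table = SparseTable()
# Timestamps 3, 5, and 9 are illustrative values.
table.put("com.cnn.www", "content:", 3, "<HTML>v1")
table.put("com.cnn.www", "content:", 5, "<HTML>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

Adding a new column is just another `put` with a new column name, which is what "dynamic columns" means here; a row that never receives a given column stores nothing for it.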
18. Document Oriented store
The idea is to replace the concept of a "row" with a more flexible model, the "document."
By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
Documents have some similar information and some different information.
Documents are usually stored in a JSON or JSON-like format.
•Advantages
– Rich RDBMS-like functions
– Freedom in modeling documents
•Disadvantages
– Query logic is complex
– Documents are limited in size
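As a small illustration, the two sample documents from the notes at the end of this deck can be written as Python dictionaries and serialized to JSON. The list used as a "collection" and the helper `find` are assumptions made for this sketch, not the API of any document database.

```python
import json
from typing import Callable, Dict, List

# Two documents with some shared fields and some different ones.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    # An embedded array of sub-documents replaces a separate "children" table.
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

collection: List[Dict] = [bob, jonathan]


def find(docs: List[Dict], predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Hypothetical query helper: return the documents matching a predicate."""
    return [doc for doc in docs if predicate(doc)]


with_children = find(collection, lambda doc: "Children" in doc)
youngest = min(jonathan["Children"], key=lambda child: child["Age"])
serialized = json.dumps(jonathan)  # the JSON form a document store would keep
```

The one-to-many relationship between a person and their children lives inside a single record, so no join is needed to read it back.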
21. Summary
Aspect | Key-Value | Column oriented | Document oriented | Graph
Schema | Schemaless | Dynamic columns | Complex and hierarchical data model, JSON-like format | Graph
Query model | Key-value pair | Key-value | Rich and complex | -
Data type | Unstructured | Semi-structured | Semi-structured | -
Advantage | Efficiency, easy to use | Query oriented, null is free | Functionality and freedom in modeling | -
Disadvantage | Simple query model | Simple query model | Complex | -
Systems | - | - | - | -
23. Master-Slave architecture
An example: HBase architecture
• One master and many slaves
• The master manages metadata, is in charge of all slaves, dispatches tasks, does load balancing, and so on
• Slaves report status to the master and take over the real data management
• Usually with data flow and control flow detached
• Typically with a global storage system (e.g. DFS) for data durability and fast recovery
• Some systems especially add a distributed coordination mechanism for master election, configuration maintenance, failure detection, and synchronization
[Figure: HBase architecture with Zookeeper, HMaster, three Region Servers, and HDFS; control flow and data flow are drawn as separate paths]
24. Master-Slave architecture
Is a model of communication where one device or process has unidirectional control over one or more other devices. In some systems a master is elected from a group of eligible devices, with the other devices acting in the role of slaves.
•Advantages
– Clear architecture
– Easy to provide strong consistency
– Easy to manage
– Easy to scale
•Disadvantages
– Risk of a single point of failure
– Hotspot problems
25. P2P Architecture
An example: Cassandra
• Peers are equally privileged
• Node replicas are configured as a replica factor
• Gossip protocol for failure detection and for maintaining cluster membership (nodes in/out)
• Every member acts as a proxy for one-hop routing
[Figure: a client connected to a ring of eight nodes, numbered 1 to 8]
26. P2P architecture
In computing or networking, P2P is a distributed application architecture.
Peers are equally privileged, equipotent participants in the application.
Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts.
Usually used in conjunction with consistent hashing.
•Advantages
– High availability
– Efficient for random read/write
– Natural data distribution
– Usually one-hop lookup
– Minimal administration
•Disadvantages
– Weak global state
– More network communication to maintain the cluster (log(n))
27. Hierarchy architecture
An example: MongoDB architecture
• Clients send queries to mongos servers
• Mongos servers act as routing servers; queries are automatically routed to the appropriate shard
• Each shard consists of multiple replicated servers to ensure availability and automated failover. The set of servers within the shard comprises a replica set.
• The config servers store the cluster's metadata. Each config server has a complete copy of all metadata, and if metadata is changed, it is sent to the mongos servers to update their routing information.
[Figure: clients connect to mongos routers, which consult config servers 1 to 3 and route to shard1, shard2, and shard3; each shard is a replica set made of a mongod primary, a mongod secondary, and a mongod arbiter]
28. Hierarchy architecture
An example: MongoDB architecture (2)
• Distinct hierarchy dependency: client, routing servers, metadata storage, data storage
• Data storage layer: grouped into replica sets, which act not only as data serving but also as the data and service availability mechanism
• Routing servers: scalable, not a single point of failure, and store nothing; they can be deployed up toward the client/application or down toward the data storage
• Metadata storage: two-phase commit is used, and the responsibilities of the metadata servers decrease
[Figure: clients connect to several routing servers, which read from several metadata storage servers and route to data storage replica sets (mongod primary, mongod secondary, arbiter)]
29. Hierarchy architecture
Distinct hierarchy dependency
Especially with a routing layer
Less responsibility for the client
No clear data flow and control flow
•Advantages
– High availability
– No single point of failure
– Each layer scalable alone
– Flexible routing layer
•Disadvantages
– Lower efficiency
– Complex administration
32. What about the performance with the system?
What about the key features of the system?
33. CAP Classification
• Consistency means all nodes see the same data at the same time
• Availability is a guarantee that every request receives a response about whether it was successful or failed
• Partition tolerance means the system continues to operate despite arbitrary message loss
34. All about Redundancy
Where do the problems come from?
Redundancy is everywhere in distributed systems, especially with commodity hardware.
Concerns that follow from it: consistency, availability, partitioning, reliability, concurrency, throughput
[Figure: three requests, each handled by one of three redundant Service instances backed by three redundant Data storage instances]
35. Consistency mechanism
Mechanism | Consistency provided
Two-phase commit | Strong consistency
Master-slave | Eventual consistency or strong consistency
Quorum | Eventual consistency or strong consistency
Paxos | Strong consistency
• Consistency is traded off against performance and availability
• Master-slave architecture systems (such as HBase, BigTable) adopt lower availability and strong consistency
• Hierarchy and P2P systems can choose strong consistency at the expense of reduced read performance
36. Two-phase commit
An example: GFS lease implementation
•The commit-request phase:
the client pushes all data to the replicas (step 3) and sends a commit request to the primary replica (step 4)
•The commit phase:
the primary replica asks replica A and replica B to commit the data (step 5), replica A and replica B respond "yes" (step 6), and the commit is successful (step 7).
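The two phases can be sketched generically as follows. This is a simplified single-process model of two-phase commit, not the GFS lease protocol itself; the `Participant` class, its `healthy` flag, and the function `two_phase_commit` are hypothetical names for illustration.

```python
from typing import Dict, List


class Participant:
    """A replica that can stage a write and later commit or abort it."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy          # assumption: an unhealthy node votes "no"
        self.staged: Dict[str, str] = {}
        self.committed: Dict[str, str] = {}

    def prepare(self, key: str, value: str) -> bool:
        """Phase 1: stage the data and vote yes, or vote no."""
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key: str) -> None:
        """Phase 2 (success): make the staged data visible."""
        self.committed[key] = self.staged.pop(key)

    def abort(self, key: str) -> None:
        """Phase 2 (failure): discard the staged data."""
        self.staged.pop(key, None)


def two_phase_commit(participants: List[Participant], key: str, value: str) -> bool:
    # Commit-request phase: every participant must vote yes.
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        # Commit phase: all voted yes, so everyone commits.
        for p in participants:
            p.commit(key)
        return True
    # Any "no" vote aborts the write everywhere.
    for p in participants:
        p.abort(key)
    return False
```

A real implementation also has to handle a coordinator that crashes between the phases, which this sketch leaves out.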
37. Master-slave
An example: MongoDB replica sets
Configuration 1: eventual consistency, but higher performance and availability
• The master can be read and written
• Replicas/slaves are read only
Configuration 2: strong consistency
• Only the master can be read and written
• Replicas/slaves are only for backup
[Figure: in both configurations the master syncs to two replicas. In the first, clients write to the master and read from the master or from the read-only replicas; in the second, clients read and write only through the master.]
38. Quorum
• Configurable consistency
N: number of replicas
R: minimum number of replicas that must respond for a read to succeed
W: minimum number of replicas that must acknowledge for a write to succeed
• Usually combined with anti-entropy using Merkle trees for replica synchronization and with read repair to keep consistency
• (N, R, W): tradeoff between consistency and performance
– Typical configuration: R(2) + W(2) > N(3)
– R + W > N yields a quorum-like system and ensures an application can always read the newest data
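Why R + W > N guarantees that a read sees the newest write can be checked by brute force: every possible read set must share at least one replica with every possible write set. The function names below are hypothetical; N, R, and W are the symbols defined above.

```python
from itertools import combinations


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """The condition from the slide: R + W > N."""
    return r + w > n


def read_always_overlaps_write(n: int, r: int, w: int) -> bool:
    """Enumerate every W-sized write set and R-sized read set over N replicas
    and check that each pair shares at least one replica."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For the typical configuration N=3, R=2, W=2 both functions return `True`. With N=3, R=1, W=1 both return `False`: a read may land on a replica the last write never reached, which is the eventual-consistency end of the tradeoff.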
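39. Quorum
An example: Cassandra read repair
[Figure: the client sends a query to the Cassandra cluster and receives the result. The closest replica returns the full result, digest queries are sent to the other replicas (Replica A, Replica B, and Replica C are shown), which return digest responses, and read repair is performed if the digests differ.]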
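The read-repair flow in the figure can be sketched as below. This is a simplified model, not Cassandra's implementation: it assumes each replica is a dictionary of key → (timestamp, value), that the newest timestamp wins, and that a SHA-256 hash serves as the digest. `read_with_repair` and `digest` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

Versioned = Tuple[int, str]  # (timestamp, value); timestamps are an assumption


def digest(record: Versioned) -> str:
    """A compact fingerprint, so replicas need not ship the full value."""
    return hashlib.sha256(repr(record).encode("utf-8")).hexdigest()


def read_with_repair(replicas: List[Dict[str, Versioned]], key: str) -> str:
    """Read from the first ("closest") replica and compare digests with the rest.

    If any digest differs, pick the newest version and write it back to every
    stale replica before returning it.
    """
    closest, others = replicas[0], replicas[1:]
    result = closest[key]
    if all(digest(replica[key]) == digest(result) for replica in others):
        return result[1]  # all replicas agree; nothing to repair
    newest = max(replica[key] for replica in replicas)  # highest timestamp wins
    for replica in replicas:
        if replica[key] != newest:
            replica[key] = newest  # the repair step
    return newest[1]
```

For example, with replicas holding `(1, "old")`, `(2, "new")`, and `(2, "new")` for the same key, the call returns `"new"` and the stale replica is updated as a side effect of the read.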
40. Availability mechanism
Routing mechanism
• Typically used in hierarchy architecture
• See the MongoDB mongos implementation, which hides back-end server changes
Failure detection
• Distributed coordination: usually used in master-slave architecture, such as Zookeeper in HBase and Chubby in BigTable
• Gossip protocol: usually used in P2P architecture, e.g. Dynamo and Cassandra
Master election
Hinted handoff
41. Availability mechanism
Master election
Is used for failover.
A cluster consists of a group of n nodes, and one of them acts as the master/primary node. If that node fails, the cluster will elect a new master/primary node.
An example: MongoDB replica set
•Each node can be primary
•Secondary nodes can only act as data nodes or arbiters
[Figure: a MongoDB replica set with a mongod primary, a mongod secondary, and a mongod arbiter. When the primary goes down, the remaining nodes negotiate and the secondary becomes the new master (primary); the failed node is shown as recovering.]
42. HBase Master election
•Zookeeper acts as an arbiter and keeps a "token" for the HBase master. The node that gets the "token" acts as master.
•If the HMaster fails, the "token" that it took from Zookeeper is released, and the secondary HMaster acts as HMaster.
•Then, Zookeeper sends the change to every node in the cluster.
[Figure: Zookeeper, HMaster, secondary HMaster, and a Region Server]
43. Hinted Handoff
For temporary failure
Writes are performed on the first N healthy nodes found by the coordinator.
If a node is down, data will be sent to the next node in the ring.
This node will keep track of the intended recipient and send the data later.
Replicas are stored at multiple data centers to handle the failure of a whole data center.
• So-called "always writeable" in Cassandra
[Figure: a ring of nodes A to G, with Hash(k) mapping a key onto the ring]
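The steps above can be sketched with a small ring model. This is an illustrative single-process toy, assuming nodes in a fixed ring order and an explicit `up` flag in place of real failure detection; `Node`, `write`, and `replay_hints` are hypothetical names.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}
        # Hints held for other nodes: (intended recipient, key, value)
        self.hints: List[Tuple[str, str, str]] = []


def write(ring: List[Node], start: int, n: int, key: str, value: str) -> List[str]:
    """Write to the first n healthy nodes, walking the ring from `start`.

    A healthy node that stands in for a down node records a hint naming
    the intended recipient.
    """
    intended = [ring[(start + i) % len(ring)] for i in range(n)]
    skipped = [node for node in intended if not node.up]
    written: List[str] = []
    step = 0
    while len(written) < n and step < len(ring):
        node = ring[(start + step) % len(ring)]
        step += 1
        if not node.up:
            continue
        node.data[key] = value
        if node not in intended and skipped:
            node.hints.append((skipped.pop(0).name, key, value))
        written.append(node.name)
    return written


def replay_hints(ring: List[Node]) -> None:
    """Deliver stored hints to recipients that have come back up."""
    by_name = {node.name: node for node in ring}
    for node in ring:
        if not node.up:
            continue
        remaining = []
        for target, key, value in node.hints:
            if by_name[target].up:
                by_name[target].data[key] = value
            else:
                remaining.append((target, key, value))
        node.hints = remaining
```

With a ring A to G, N=3, and B down, a write starting at A lands on A, C, and D; D keeps a hint for B and hands the value over once B is back.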
44. Data partitioning & Scalability mechanism
Hierarchical structure
Multi-level hierarchy organization
• 3 levels in BigTable, HBase and Hypertable (root -> meta -> user)
• 2 levels in MongoDB (meta -> user)
Key range split/auto sharding for data partitioning
•Advantages
– Automatic balancing for changes in data distribution
– High performance in range queries
– Nearly unlimited data storage
•Disadvantages
– Sequential writes are not efficient
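Key range splitting can be sketched as a sorted list of region start keys plus one dictionary per region. This is a single-process illustration under assumed names (`RangePartitionedTable`, `max_keys_per_region`); real systems split by region size in bytes and keep the region index in separate metadata tables.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class RangePartitionedTable:
    """Toy key-range partitioning with automatic splits (non-empty string keys)."""

    def __init__(self, max_keys_per_region: int = 4) -> None:
        self.max_keys = max_keys_per_region
        # Region i covers keys in [start_keys[i], start_keys[i + 1]).
        self.start_keys: List[str] = [""]
        self.regions: List[Dict[str, str]] = [{}]

    def _locate(self, key: str) -> int:
        """Binary search in the region index (the "meta" level)."""
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key: str, value: str) -> None:
        index = self._locate(key)
        self.regions[index][key] = value
        if len(self.regions[index]) > self.max_keys:
            self._split(index)

    def _split(self, index: int) -> None:
        keys = sorted(self.regions[index])
        middle = keys[len(keys) // 2]
        old = self.regions[index]
        self.regions[index] = {k: v for k, v in old.items() if k < middle}
        self.start_keys.insert(index + 1, middle)
        self.regions.insert(index + 1, {k: v for k, v in old.items() if k >= middle})

    def get(self, key: str) -> Optional[str]:
        return self.regions[self._locate(key)].get(key)

    def scan(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Range query: visit only the regions that overlap [start, end)."""
        result: List[Tuple[str, str]] = []
        for i in range(self._locate(start), len(self.regions)):
            if self.start_keys[i] >= end:
                break
            result.extend(
                (k, v) for k, v in sorted(self.regions[i].items()) if start <= k < end
            )
        return result
```

Because neighbouring keys live in neighbouring regions, a range scan touches few regions. The same property explains the listed disadvantage: keys written in increasing order always land in the last region, so that region takes all the write load until it splits.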
45. Scalability mechanism
Consistent hash
•Advantages
– Natural balancing for data partitioning and distribution
– High performance in random operations
•Disadvantages
– Non-uniform data/load distribution
– Disregards the heterogeneity of node performance
– Data must be moved when nodes join or leave
– Not good for sequential operations and range queries
[Figure: a hash ring over the interval 0 to 1 with nodes A to F; the keys h(key1) and h(key2) are placed on the ring and each is replicated to N=3 nodes]
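A minimal consistent-hash ring, assuming SHA-256 as the hash function and one ring position per node (no virtual nodes). The class `ConsistentHashRing` and its methods are hypothetical names; the `replicas` parameter plays the role of N=3 in the figure.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(name: str) -> int:
    """Map a node name or a key onto the ring (SHA-256 is an assumption)."""
    return int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], replicas: int = 3) -> None:
        self.replicas = replicas
        self._ring: List[Tuple[int, str]] = sorted((ring_position(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (ring_position(node), node))

    def remove(self, node: str) -> None:
        self._ring.remove((ring_position(node), node))

    def nodes_for(self, key: str) -> List[str]:
        """Return the `replicas` distinct nodes found clockwise from the key."""
        positions = [position for position, _ in self._ring]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = ConsistentHashRing(list("ABCDEF"), replicas=3)
owners = ring.nodes_for("key1")  # three distinct nodes, first one is the primary
```

When a node joins, only the keys that fall between it and its predecessor change their primary owner; every other key stays where it was. With a single position per node the arcs can differ a lot in length, which is the non-uniform load distribution listed above.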
46. Data Durability mechanism
Write-ahead log (WAL)
Is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.
In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.
Data replica
• DFS (HBase, Hypertable, BigTable)
• Embedded redundancy (Cassandra, MongoDB)
47. Data Durability mechanism
An example: HBase WAL
• Log flushing
Data streams are written to a file system.
• Log rolling
Check database persistence and the logs, then remove all the logs older than the last database persistence operation.
• Log replaying
Replaying a log is simply done by reading a log and adding its entries to the database, and then flushing the data to disk. It can be used for fault recovery.
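The three operations can be sketched on local files. This is a simplified model, not HBase's implementation: it assumes a JSON-lines log with redo information only, a single snapshot file standing in for persisted data, and hypothetical names (`WalStore`, `wal.log`, `snapshot.json`).

```python
import json
import os
from typing import Dict


class WalStore:
    """Toy store that logs every write before applying it in memory."""

    def __init__(self, directory: str) -> None:
        self.log_path = os.path.join(directory, "wal.log")
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.memstore: Dict[str, str] = {}
        self._recover()

    def put(self, key: str, value: str) -> None:
        # Log flushing: the entry reaches the file system before it is applied.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memstore[key] = value

    def roll(self) -> None:
        # Log rolling: persist the current state, then drop the older log entries.
        with open(self.snapshot_path, "w", encoding="utf-8") as snapshot:
            json.dump(self.memstore, snapshot)
            snapshot.flush()
            os.fsync(snapshot.fileno())
        open(self.log_path, "w", encoding="utf-8").close()

    def _recover(self) -> None:
        # Log replaying: load the last persisted state, then re-apply the log.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as snapshot:
                self.memstore = json.load(snapshot)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        entry = json.loads(line)
                        self.memstore[entry["key"]] = entry["value"]
```

Creating a second `WalStore` on the same directory simulates a restart after a crash: everything that was acknowledged by `put` is rebuilt from the snapshot plus the log.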
48. Summary
Consistency: two-phase commit, master-slave, quorum
Availability: routing mechanism, failure detection, master election, hinted handoff
Data partitioning: table split/auto sharding, consistent hash
Data durability: DFS, data redundancy
Scalability: hierarchical structure, consistent hash
Failover: reassign, master election, multi-routing process, hinted handoff, replica set/group, replica factor
Editor's notes
The data could be stored in a data type of a programming language or an object.
Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data. Column-oriented systems are also more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
For example, here's a document: FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
Another document could be: FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].
•Advantages
– High availability
– Efficient for random read/write
– Natural data distribution
– Usually one-hop lookup
•Disadvantages
– Weak global state
– Eventual consistency
– Need to move data when a node joins or quits
– Heavy bandwidth usage to maintain the cluster
– Not efficient for sequential read/write