No sql introduction_v1.1.1

Technical overview of cloud storage
NoSQL

Agenda
 Background
 What’s NoSQL
 Why NoSQL
 How to make a selection of NoSQL
 Data type
 Data model
 Architecture
 Key technologies
 Summary

What is NoSQL (Not Only SQL)
 Definition
 NoSQL, sometimes expanded to "not only SQL", is a broad class of database management systems that differ from classic relational database management systems (RDBMSs).
 These data stores may NOT require FIXED table schemas, usually avoid join operations, and typically scale horizontally.
 Academia typically refers to these databases as structured storage, a term that would include classic relational databases as a subset.
Refer to the Wikipedia page: http://en.wikipedia.org/wiki/NoSQL

SQL Vs. NoSQL

Aspect | SQL | NoSQL
Transactional semantics | ACID | Restricted ACID
Query model | Complex & feature-rich | Simple & app-oriented
Data model | Relational & row storage | Key-value, column-oriented, document-oriented & graph
Schemas | Fixed | Schema-free / schemaless
Data storage | Limited & costly | Horizontally scalable & massive
Failure tolerance | Slow failure recovery | Native & fast recovery
Hardware | Reliable & expensive | Commodity & inexpensive

Come From Requirement
Fast Increasing & Development (all about INCREASING)
 Increasing number of servers
 Scale out
 Inexpensive & unreliable servers
 Increasing data volume
 Big Data
 Scalability
 Increasing user number
 High throughput
 High workload

Come From Requirement
Different application & Ecosystem
 Rapid change
 Always beta
 Flexible data schema
 Abundant web applications
 Complex data
 Larger record size
 Typically read more and write less
 Low transaction and consistency requirements
 Online services
 Failure tolerance
 Fast recovery
 High availability

How to select a NoSQL system?

What kinds of data can I store with?

Data type Classification
What kind of data should be stored
Unstructured data (e.g. documents, videos, audio, images)
• Does not have a pre-defined data model
• And/or does not fit well into relational tables
• Example systems: Dynamo, Voldemort, Berkeley DB, MemcacheDB, Tokyo Cabinet, Redis
Structured data (e.g. CRM, ERP)
• Entities that belong to the same class should have the same attributes and the same attribute order
• The data structure should be predefined and cannot be changed
• Example systems: MySQL, Oracle
Semi-structured data (e.g. logs, mails, web pages, blogs)
• Is a form of structured data
• Entities that belong to the same class may have different attributes even though they are grouped together, and the attribute order is not important
• Contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
• Is also known as schemaless or self-describing structure
• Example systems: BigTable, HBase, Cassandra, Hypertable, MongoDB, CouchDB

Summary
[Figure: radar chart comparing unstructured, structured, and semi-structured data along five axes: flexibility, record size, efficiency, scalability, and transactional support.]

How can I express my business model?

Key-Value pair based
Simple read and write: a data item is uniquely identified by a key
 Key-value stores allow the application to store its data in a schemaless way. The data could be stored in a data type of a programming language or an object.
 A key identifies a unique value
 Anything can be stored in a value: an image, a document, even a complex data structure (array, list, …)
•Advantages
– Efficiency
– Easy to use
– Flexible data storage
•Disadvantages
– Simple query model
Many cloud-based databases can be classified as key-value stores, such as most column-oriented databases.
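
The key-value idea can be sketched in a few lines. This is a toy in-memory illustration only, not the API of any product named in these slides; the class name `KeyValueStore` and its methods are hypothetical.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory key-value store: one key identifies one opaque value."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store never inspects the value, so any type is accepted.
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # The only query supported is a lookup by key.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "joe", "langs": ["en", "fr"]})  # nested structure
store.put("logo.png", b"\x89PNG")                             # raw bytes
```

The lookup-by-key-only interface is what makes such stores efficient and easy to use, and it is also the "simple query model" listed as the disadvantage.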

Column Oriented store
A simple example: column store vs. row store
[Figure: the same sample table (columns Name, Language, Notes; rows Neo4j, OrientDB, FlockDB, Sones GraphDB) laid out as a row store, where empty cells are stored, and as a column store, where null is free. Query 1 and Query 2 mark the two access paths being compared.]

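BigTable data model
• Cell contents are addressed by (Row, Column, Timestamp)
• Sorted row keys: pages from the same domain are stored near each other
• Columns are grouped into column families (Content, Anchor)
• Cells are versioned by timestamp
[Figure: row "com.cnn.www" with columns Content: ("<HTML>…"), Anchor:cnnsi.com ("CNN"), and Anchor:my.look.ca ("CNN.COM"); each cell carries timestamped versions (t3 … t9). A second row key, "com.cnn.www/index.htm", sorts next to it.]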

One to Many relationship

RDBMS model (vertical extension): two tables joined on Row Key, 1 to 0…n

Row Key | Content
com.cnn.www | <HTML>…
… | …

Row Key | Anchor | Reference text
com.cnn.www | cnnsi.com | CNN
com.cnn.www | my.look.ca | CNN.COM
com.cnn.www | … | …

BigTable model (horizontal extension): one row, one column added per anchor

Row Key | content: | anchor:cnnsi.com | anchor:my.look.ca | anchor:…
com.cnn.www | <HTML>… | CNN | CNN.COM | …

BigTable-like data model

 Stores content by column rather than by row.
 A key identifies a row, which contains data stored in one or more Column Families (CF)
 Within a CF, each row can contain multiple columns
 Columns can be added dynamically
 Distributed multi-dimensional sparse map
 (row, column, timestamp) → cell contents
•Advantages
– Versioned
– Query oriented
– Good for OLAP applications
– Null is free
– Compression efficient
– Dynamic columns
•Disadvantages
– Reading an entire row is not efficient
– Contains tags or other markers to separate semantic elements
– Not well-suited for OLTP-like workloads
– Simple query model
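
The (row, column, timestamp) → cell contents mapping can be sketched as a nested dictionary. This is a minimal single-process illustration, not how BigTable or HBase store data; the class `SparseTable` and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


class SparseTable:
    """Toy sparse map: (row, "family:qualifier", timestamp) -> value."""

    def __init__(self) -> None:
        # row key -> column name -> list of (timestamp, value), newest first.
        # Absent cells simply have no entry, which is why "null is free".
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep the newest version first

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[str]:
        # Return the newest version at or before `timestamp` (latest if None).
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start: str, stop: str) -> Iterator[str]:
        # Row keys are visited in sorted order, so keys sharing a prefix
        # (for example the same reversed domain) come out next to each other.
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row


table = SparseTable()
table.put("com.cnn.www", "content:", 3, "<HTML>v1")
table.put("com.cnn.www", "content:", 5, "<HTML>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")       # column added dynamically
table.put("com.cnn.www/index.htm", "content:", 6, "<HTML>index")
```

Note that a new column such as `anchor:cnnsi.com` needs no schema change: it exists only in the rows that wrote it.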

Document Oriented store
 The idea is to replace the concept of a "row" with a more flexible model: the "document."
 By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
 Documents have some similar information and some different
 Usually store documents in a JSON or JSON-like format
•Advantages
– Rich RDBMS-like functions
– Freedom in modeling documents
•Disadvantages
– Query logic complex.
– Documents are limited in size

Document Oriented store
Examples

RDBMS tables

Row Key | Content
com.cnn.www | <HTML>…
… | …

Row Key | Anchor | Reference text
com.cnn.www | cnnsi.com | CNN
com.cnn.www | my.look.ca | CNN.COM
com.cnn.www | … | …

Document 1
{
  "Rowkey": "com.cnn.www",
  "content": "<HTML>…",
  "Anchor": {
    "cnnsi.com": "CNN",
    "my.look.ca": "CNN.COM"
  }
}

Queries
// rowkey == "com.cnn.www"
find({"Rowkey" : "com.cnn.www"})
// 20 < age < 30
find({"age" : {"$lt" : 30, "$gt" : 20}})
// id_num % 5 == 1
find({"id_num" : {"$mod" : [5, 1]}})
// id_num % 5 != 1
find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})
// regular expression: name matches joe, case insensitive
find({"name" : /joe/i})
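
To make the semantics of these query documents concrete, here is a small pure-Python matcher for the operators used on this slide (`$lt`, `$gt`, `$mod`, `$not`, and a regular expression). It is a simplified sketch and not MongoDB's implementation: the function `matches` is hypothetical, and a dict condition is always treated as a set of operators, whereas MongoDB also allows exact matching on embedded documents.

```python
import re
from typing import Any, Dict


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if `doc` satisfies every field condition in `query`."""
    return all(_match_field(doc.get(field), cond) for field, cond in query.items())


def _match_field(value: Any, cond: Any) -> bool:
    if isinstance(cond, re.Pattern):
        # Stands in for the shell literal /joe/i.
        return isinstance(value, str) and cond.search(value) is not None
    if isinstance(cond, dict):
        # All operators in the condition must hold (implicit AND).
        return all(_match_op(value, op, arg) for op, arg in cond.items())
    return value == cond  # plain equality


def _match_op(value: Any, op: str, arg: Any) -> bool:
    if op == "$not":
        return not _match_field(value, arg)
    if value is None:
        return False  # a missing field fails every comparison operator
    if op == "$lt":
        return value < arg
    if op == "$gt":
        return value > arg
    if op == "$mod":
        divisor, remainder = arg
        return value % divisor == remainder
    raise ValueError(f"unsupported operator: {op}")


people = [
    {"name": "Joe Smith", "age": 25, "id_num": 6},
    {"name": "Ann Lee", "age": 31, "id_num": 7},
]
in_their_twenties = [p["name"] for p in people
                     if matches(p, {"age": {"$lt": 30, "$gt": 20}})]
```

With the sample data, `in_their_twenties` contains only "Joe Smith", since 31 fails the `$lt` condition.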

Summary

Aspect | Key-Value | Column oriented | Document oriented | Graph
Schema | Schemaless | Dynamic columns | Complex and hierarchical data model, JSON-like format | Graph
Query model | Key-value pair | Key-value | Rich and complex |
Data type | Unstructured | Semi-structured | Semi-structured |
Advantage | Efficiency, easy to use | Query oriented, null is free | Functionality and freedom in modeling |
Disadvantage | Simple query model | Simple query model | Complex |

How can I deploy and administer the system?

Master-Slave architecture
An example: HBase Architecture
[Figure: HBase architecture with Zookeeper, HMaster, several Region Servers, and HDFS; control flow and data flow are drawn separately.]
• One master and many slaves
• The master manages metadata, is in charge of all slaves, dispatches tasks, does load balancing, and so on
• Slaves report status to the master and take over the real data management
• Usually the data flow and the control flow are separated
• Typically with a global storage system (e.g. a DFS) for data durability and fast recovery
• Some also use a distributed coordination mechanism for master election, configuration maintenance, failure detection, and synchronization

 Master-slave is a model of communication where one device or process has unidirectional control over one or more other devices. In some systems a master is elected from a group of eligible devices, with the other devices acting in the role of slaves.
•Advantages
– Clear architecture
– Easy to provide strong consistency
– Easy to manage
– Easy to scale
•Disadvantages
– Single point of failure risk
– Hotspot problems

P2P Architecture
An example: Cassandra
[Figure: a client connected to a ring of eight nodes, numbered 1 to 8.]
• Peers are equally privileged
• The number of node replicas is a configurable factor
• Gossip protocol for failure detection and for maintaining cluster membership (nodes joining and leaving)
• Every member acts as a proxy for one-hop routing

P2P architecture
 In computing or networking, P2P is a distributed application architecture
 Peers are equally privileged, equipotent participants in the application.
 Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts.
 Usually used in conjunction with consistent hashing
•Advantages
– High availability
– Efficient for random reads/writes
– Natural data distribution
– Usually one-hop lookup
– Minimal administration
•Disadvantages
– Weak view of global status
– More network communication to maintain the cluster (log(n))

Hierarchy architecture
An example: MongoDB Architecture
[Figure: three shards (shard1, shard2, shard3), each a replica set of mongod processes (primary, secondary, arbiter); three config servers; several mongos routers; clients connect to the mongos routers.]
• Clients send queries to mongos servers
• Mongos processes act as routing servers; queries are automatically routed to the appropriate shard
• Each shard consists of multiple replicated servers to ensure availability and automated failover. The set of servers within the shard comprises a replica set.
• The config servers store the cluster's metadata. Each config server has a complete copy of all metadata, and if metadata is changed, it will be sent to mongos to update routing information.

An example: MongoDB Architecture (2)
[Figure: clients connect to routing servers, which sit above the data storage (mongod primary, secondary, arbiter), with metadata storage alongside.]
• Data storage layer: grouped into replica sets; acts not only as data serving but also as the data and service availability mechanism
• Routing servers: scalable, not a single point of failure, store nothing; can be deployed up at the client/app or down at the data storage
• Metadata storage: two-phase commit is used, and the responsibilities of the metadata servers decrease
• Distinct hierarchy dependency

 Distinct hierarchy dependency
 Especially with a routing layer
 Less responsibility of client
 No clear data flow and control flow
•Advantages
– High availability
– No single point of failure
– Each layer scalable alone
– Flexible routing layer
•Disadvantages
– Lower efficiency
– Complex administration

Summary
[Figure: radar chart comparing the Master-Slave, P2P, and Hierarchy architectures along six axes: availability, scalability, efficiency, conciseness, administration, and functionality.]

Summary
Failover

 Master-slave architecture
 Master fails -> Master election
 Slave fails -> Reassign by Master

 P2P Architecture
 Replica factor
 Hinted Handoff

 Hierarchy Architecture
 Master election & Hinted Handoff
 Multi-routing process

What about the performance with the system?
What about the key features of the system?

CAP Classification
• Consistency: all nodes see the same data at the same time
• Availability: a guarantee that every request receives a response about whether it was successful or failed
• Partition tolerance: the system continues to operate despite arbitrary message loss

All about Redundancy
Where do the problems come from?
 Redundancy is everywhere in distributed systems, especially with commodity hardware
 Consistency
 Availability
 Partitioning
 Reliability
 Concurrency
 Throughput
[Figure: requests fan out to redundant services, each backed by redundant data storage.]

Consistency mechanism

 Two-phase commit
 Strong consistency
 Master-slave
 Eventual consistency
 Strong consistency
 Quorum
 Eventual consistency
 Strong consistency
 Paxos
 Strong consistency
• Consistency trades off against performance and availability
• Master-slave systems (such as HBase, BigTable) accept lower availability in exchange for strong consistency
• Hierarchy & P2P systems choose to provide strong consistency at the expense of read performance

Two-phase commit
An example: GFS lease implementation
• The commit-request phase: the client pushes all data to the replicas (step 3) and sends a commit request to the primary replica (step 4)
• The commit phase: the primary replica requests replica A and replica B to commit the data (step 5); replica A and replica B respond "yes" (step 6); the commit is successful (step 7).
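
The two phases can be sketched generically as follows. This is a textbook-style illustration with an in-process coordinator, not the GFS lease protocol itself; the `Replica` class and `two_phase_commit` function are hypothetical names, and the `healthy` flag merely stands in for a replica that cannot vote "yes".

```python
from typing import Dict, List


class Replica:
    """Toy participant that stages data first and commits only when told to."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.staged: Dict[str, bytes] = {}
        self.committed: Dict[str, bytes] = {}

    def stage(self, key: str, data: bytes) -> None:
        # Data is pushed to the replica but is not yet visible.
        self.staged[key] = data

    def prepare(self, key: str) -> bool:
        # Vote "yes" only if this replica can apply the staged data.
        return self.healthy and key in self.staged

    def commit(self, key: str) -> None:
        self.committed[key] = self.staged.pop(key)

    def abort(self, key: str) -> None:
        self.staged.pop(key, None)


def two_phase_commit(key: str, data: bytes, replicas: List[Replica]) -> bool:
    # Phase 1 (commit request): push the data and collect the votes.
    for replica in replicas:
        replica.stage(key, data)
    votes = [replica.prepare(key) for replica in replicas]

    # Phase 2 (commit): apply everywhere only if every vote was "yes".
    if all(votes):
        for replica in replicas:
            replica.commit(key)
        return True
    for replica in replicas:
        replica.abort(key)
    return False


ok_group = [Replica("primary"), Replica("A"), Replica("B")]
bad_group = [Replica("primary"), Replica("A"), Replica("B", healthy=False)]
committed = two_phase_commit("chunk-1", b"payload", ok_group)
aborted = two_phase_commit("chunk-1", b"payload", bad_group)
```

A single "no" vote aborts the write on every replica, which is what gives strong consistency and also why one slow or failed participant hurts availability.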

Master-slave
An Example: MongoDB replica sets

Configuration 1: eventual consistency, but higher performance and availability
• The master can be read and written
• Replicas/slaves are read only
[Figure: clients write to and read from the master; the master syncs to two read-only replicas.]

Configuration 2: strong consistency
• Only the master can be read and written
• Replicas/slaves are only for backup
[Figure: clients write to and read from the master only; the master syncs to two replicas.]

Quorum

• Configurable consistency
N: number of replicas
R: minimum number of successful reads
W: minimum number of successful writes

• Usually with anti-entropy using Merkle trees for replica synchronization, and read repair to keep replicas consistent
• (N, R, W): tradeoff between consistency and performance
– Typical configuration: R(2) + W(2) > N(3)
– R + W > N yields a quorum-like system and ensures an application can always read the newest data
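
Why R + W > N guarantees a fresh read can be shown with a toy model: any R replicas that answer a read must overlap with the W replicas that acknowledged the latest write. The sketch below is illustrative only; `QuorumStore`, `is_strong`, and the explicit `reachable` lists (which stand in for whichever replicas happen to respond) are assumptions of this example.

```python
from typing import List, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value)


def is_strong(n: int, r: int, w: int) -> bool:
    # Read and write sets must overlap in at least one replica.
    return r + w > n


class QuorumStore:
    """Toy N-replica register with configurable read and write quorums."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("R and W must be between 1 and N")
        self.n, self.r, self.w = n, r, w
        self.replicas: List[Optional[Versioned]] = [None] * n

    def write(self, version: int, value: str, reachable: List[int]) -> bool:
        # The write succeeds only if at least W replicas acknowledge it.
        if len(reachable) < self.w:
            return False
        for i in reachable:
            self.replicas[i] = (version, value)
        return True

    def read(self, reachable: List[int]) -> Optional[str]:
        # Ask R replicas and return the value with the highest version.
        if len(reachable) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        answers = [self.replicas[i] for i in reachable[: self.r]
                   if self.replicas[i] is not None]
        return max(answers)[1] if answers else None


strong = QuorumStore(n=3, r=2, w=2)
strong.write(1, "v1", reachable=[0, 1, 2])
strong.write(2, "v2", reachable=[0, 1])      # replica 2 missed this write

weak = QuorumStore(n=3, r=1, w=1)
weak.write(1, "v1", reachable=[0, 1, 2])
weak.write(2, "v2", reachable=[0])           # only one replica has v2
```

With (3, 2, 2) every pair of replicas contains at least one holder of `v2`; with (3, 1, 1) a read that lands on replica 2 returns the stale `v1`.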

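Quorum
An example: Cassandra Read repair
[Figure: the client sends a query to the Cassandra cluster. The closest replica (Replica A) returns the result, while digest queries go to Replica B and Replica C. Their digest responses are compared with the result, and read repair runs if the digests differ. The result is then returned to the client.]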
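
The digest comparison in this flow can be sketched as below. It is a simplified single-process model, not Cassandra's code: the `Replica` dataclass, the SHA-256 digest, and the last-write-wins rule based on an integer timestamp are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    timestamp: int  # write time of the stored value
    value: str


def digest(value: str) -> str:
    # A digest lets replicas be compared without shipping the full value.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def read_with_repair(closest: Replica, others: List[Replica]) -> str:
    """Read from the closest replica and repair the group if digests differ."""
    expected = digest(closest.value)
    if all(digest(r.value) == expected for r in others):
        return closest.value  # everyone agrees, nothing to repair

    # Digests differ: pick the newest version and push it to all replicas.
    everyone = [closest, *others]
    newest = max(everyone, key=lambda r: r.timestamp)
    for replica in everyone:
        replica.timestamp, replica.value = newest.timestamp, newest.value
    return newest.value


a = Replica("A", timestamp=1, value="old")
b = Replica("B", timestamp=2, value="new")
c = Replica("C", timestamp=1, value="old")
result = read_with_repair(closest=a, others=[b, c])
```

After the call, all three replicas hold the value written at timestamp 2, so the read itself has brought the stale replicas up to date.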

Availability mechanism
 Routing mechanism
 Typically used in hierarchy architecture
 See the MongoDB mongos implementation, which hides back-end server changes from clients

 Failure detection
 Distributed coordination.
 Usually used in master-slave architecture, such as ZooKeeper in HBase and Chubby in BigTable
 Gossip protocol
 Usually used in P2P architecture, e.g. Dynamo & Cassandra
 Master election
 Hinted handoff

Availability mechanism
Master election

 Is used for failover
 A cluster consists of a group of n nodes, and one of them acts as the master/primary node.
 If that node fails, the cluster will elect a new master/primary node.
[Figure: MongoDB replica set. Before: mongod primary, secondary, and arbiter. After the primary goes down: the remaining members negotiate a new master, the secondary becomes primary, and the old primary is down or recovering.]
• Each node can be primary
• Secondary nodes can act as arbiter only, or as both data node and arbiter

HBase Master election
[Figure: Zookeeper, HMaster, Secondary HMaster, and Region Server.]
• Zookeeper acts as an arbiter and keeps a "token" for the HBase master. The node that gets the "token" will act as master.
• If the HMaster fails, the "token" that it took from Zookeeper will be released, and the secondary HMaster will act as HMaster.
• Then Zookeeper will send the change to every node in the cluster.

Hinted Handoff
For temporary failure
 Writes are performed on the first N healthy nodes found by the coordinator.
 If a node is down, data will be sent to the next node in the ring.
 This node will keep track of the intended recipient and send the data later.
 Replicas are stored at multiple data centers for handling the failure of a whole data center
• So-called "always writeable" in Cassandra
[Figure: ring of nodes A to G; Hash(k) maps a key onto the ring.]
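
A minimal sketch of the hinted handoff steps, assuming a ring modeled as a plain list and a coordinator that walks it clockwise. The names `Node`, `write`, and `deliver_hints` are hypothetical, and for simplicity the stand-in node keeps the hinted copy in its normal data as well.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}
        # Each hint records (intended recipient, key, value).
        self.hints: List[Tuple["Node", str, str]] = []


def write(ring: List[Node], start: int, n: int, key: str, value: str) -> List[str]:
    """Write to the first n healthy nodes clockwise from position `start`."""
    missed: List[Node] = []   # preferred nodes that were down
    written: List[str] = []
    for offset in range(len(ring)):
        if len(written) == n:
            break
        node = ring[(start + offset) % len(ring)]
        if not node.up:
            if offset < n:
                missed.append(node)
            continue
        node.data[key] = value
        if offset >= n and missed:
            # This node stands in for a down node and remembers who for.
            node.hints.append((missed.pop(0), key, value))
        written.append(node.name)
    return written


def deliver_hints(node: Node) -> None:
    """Hand hinted writes back to recipients that have recovered."""
    remaining = []
    for target, key, value in node.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    node.hints = remaining


ring = [Node(name) for name in "ABCDE"]
ring[1].up = False                              # B is temporarily down
written = write(ring, start=0, n=3, key="k", value="v")
ring[1].up = True                               # B recovers
deliver_hints(ring[3])                          # D hands the write back to B
```

In this run the write lands on A, C, and D; D holds a hint for B and replays it once B is reachable again, so the write never had to be rejected.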

Data partitioning & Scalability mechanism
Hierarchical structure
 Multi-level hierarchy organization
 3 levels in BigTable, HBase and Hypertable (root->meta->user)
 2 levels in MongoDB (meta->user)
 Key range split/auto sharding for data partitioning
•Advantages
– Automatic balancing for changes in data distribution
– High performance in range queries
– Nearly unlimited data storage
•Disadvantages
– Sequential writes are not efficient
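
Key range splitting can be illustrated with a sorted list of range start keys and a binary search. This is a single-level toy, not the multi-level root/meta/user lookup of the systems above; the class `RangePartitioner` and the `max_keys` split threshold are assumptions of this sketch.

```python
import bisect
from typing import List


class RangePartitioner:
    """Toy key-range partitioning with automatic splits."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.starts: List[str] = [""]          # sorted start key of each range
        self.ranges: List[List[str]] = [[]]    # sorted keys held by each range

    def _index(self, key: str) -> int:
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def insert(self, key: str) -> None:
        i = self._index(key)
        keys = self.ranges[i]
        pos = bisect.bisect_left(keys, key)
        if pos < len(keys) and keys[pos] == key:
            return  # key already present
        keys.insert(pos, key)
        if len(keys) > self.max_keys:
            # Split the overfull range in the middle into two ranges.
            mid = len(keys) // 2
            self.starts.insert(i + 1, keys[mid])
            self.ranges.insert(i + 1, keys[mid:])
            self.ranges[i] = keys[:mid]

    def range_query(self, start: str, stop: str) -> List[str]:
        # Adjacent ranges hold adjacent keys, so a range query only touches
        # the ranges that overlap [start, stop).
        result: List[str] = []
        for keys in self.ranges[self._index(start):]:
            if keys and keys[0] >= stop:
                break
            result.extend(k for k in keys if start <= k < stop)
        return result


partitioner = RangePartitioner(max_keys=2)
for k in ["a", "b", "c", "d"]:
    partitioner.insert(k)
```

Because the keys arrive in sorted order, every insert lands in the last range, which illustrates why sequential writes are listed as the weak spot of this scheme.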

Scalability mechanism
Consistent hash
[Figure: hash ring over the interval from 0 to 1 with nodes A to F; h(key1) and h(key2) map keys onto the ring; N=3 replicas.]
•Advantages
– Natural balancing for data partitioning & distribution
– High performance in random operations
•Disadvantages
– Non-uniform data/load distribution
– Disregard of the heterogeneity of node performance
– Moving data when nodes join or leave
– Not good for sequential operations and range queries
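
A consistent hash ring with N replicas per key can be sketched as follows. This is a bare-bones illustration without virtual nodes; the class `ConsistentHashRing`, the use of SHA-256 as the ring hash, and the node names are assumptions of the example.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    # Any stable hash works; SHA-256 is used here only for illustration.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), replicas: int = 3) -> None:
        self.replicas = replicas
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (_hash(node), node))

    def remove(self, node: str) -> None:
        self._ring.remove((_hash(node), node))

    def nodes_for(self, key: str) -> List[str]:
        """Return the replica nodes for `key`, walking clockwise on the ring."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = ConsistentHashRing(["A", "B", "C", "D", "E", "F"], replicas=3)
keys = [f"key{i}" for i in range(100)]
before = {k: ring.nodes_for(k) for k in keys}
ring.remove("C")                                 # node C leaves the cluster
after = {k: ring.nodes_for(k) for k in keys}
```

Only keys that had C among their replicas get a new placement; every other key keeps exactly the same nodes, which is the property that makes node joins and departures cheap compared with rehashing everything.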

Data Durability mechanism
 Write-ahead log (WAL)
 Is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.
 In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log.

 Data replica
 DFS (HBase, Hypertable, BigTable)
 Embedded redundancy (Cassandra, MongoDB)

Data Durability mechanism
An example: HBase WAL
• Log flushing: data streams are written to a file system
• Log rolling: check database persistence and the logs, then remove all logs older than the last database persistence operation
• Log replaying: replaying a log is simply done by reading the log, adding its entries to the database, and then flushing the data to disk. It can be used for fault recovery.
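
The log-then-apply order and replay-on-restart can be sketched with a small file-backed store. This is a redo-only toy, not HBase's WAL format; the class `WalStore`, the JSON-lines log layout, and the `demo` helper are assumptions of this example, and log rolling is left out.

```python
import json
import os
import tempfile
from typing import Dict


class WalStore:
    """Toy store that logs every change to disk before applying it in memory."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data: Dict[str, str] = {}
        self._replay()  # recover state left by an earlier run
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        # Log replaying: read the log and re-apply each entry in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk (log flushing).
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value

    def close(self) -> None:
        self._log.close()


def demo() -> Dict[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = WalStore(path)
        store.put("row1", "a")
        store.put("row1", "b")
        store.put("row2", "c")
        store.close()                 # stand-in for a crash: memory is lost
        recovered = WalStore(path)    # a new process replays the log
        state = dict(recovered.data)
        recovered.close()
        return state
```

Calling `demo()` returns the state rebuilt purely from the log, with the later write to `row1` winning because entries are replayed in order.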

Summary
• Consistency: two-phase commit, master-slave, quorum
• Availability: routing mechanism, failure detection, master election, hinted handoff
• Data partitioning: table split/auto sharding, consistent hash
• Data durability: DFS, data redundancy
• Scalability: hierarchical structure, consistent hash
• Failover: reassign, master election, multi-routing process, hinted handoff, replica set/group, replica factor
