Data models in NoSQL
1. Prepared by Dr. Dipali Meher
Source: NoSQL Distilled
Dr. Dipali P. Meher
MCS, M.Phil, NET, Ph.D
Modern College of Arts, Science and Commerce,
Ganeshkhind, Pune 16
mailtomeher@gmail.com/dipalimeher@moderncollegegk.org
DATA MODELS
NoSQL databases are able to run on large clusters.
As the size of the data increases, scaling up becomes difficult: we always need to buy a bigger server as the data grows.
One solution is to run the database on a cluster of servers.
Running a database on a cluster increases the complexity of the database.
Two Paths for Data Distribution
Replication and sharding
Replication
Data Distribution
Sharding
Replication
Replication takes the same data and copies it over multiple nodes (servers), so each piece of data can be found in multiple places.
Replication: Two Forms
Master-slave and peer-to-peer
Replication: Master-Slave
Data is replicated across multiple nodes.
One node is designated as the master, or primary. The other nodes are slaves, or secondaries.
MASTER
It is the authoritative source for the data.
It is usually responsible for processing any updates to that data.
It can be appointed manually or automatically.
SLAVE
A replication process synchronizes the slaves with the master.
After a failure of the master, a slave can be appointed as the new master very quickly.
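The roles described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Node` and `MasterSlaveCluster` classes are hypothetical names, data lives in in-memory dictionaries, and replication is an explicit `replicate()` call rather than a background process.

```python
class Node:
    """A hypothetical in-memory replica holding a key-value copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def write(self, key, value):
        # All updates go through the master, the authoritative copy.
        self.master.data[key] = value

    def replicate(self):
        # The replication process synchronizes every slave with the master.
        for slave in self.slaves:
            slave.data = dict(self.master.data)

    def read(self, key, replica=0):
        # Reads are served by a slave, so they may lag behind the master.
        return self.slaves[replica].data.get(key)

    def promote(self, index=0):
        # After a master failure, one slave is appointed as the new master.
        self.master = self.slaves.pop(index)


cluster = MasterSlaveCluster(Node("m"), [Node("s1"), Node("s2")])
cluster.write("user:1", "Asha")
stale = cluster.read("user:1")   # the slave has not been synchronized yet
cluster.replicate()
fresh = cluster.read("user:1")   # now the slave holds the master's value
```

The gap between `stale` and `fresh` is the read inconsistency that slow propagation can cause; `promote()` shows why recovery is quick, since the slave already holds a copy of the data.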
MASTER
The master can be appointed manually or automatically.
Manually: when you configure your cluster, you configure one node as the master.
Automatically: you create a cluster of nodes and they elect one of themselves to be the master.
Automatic appointment also means that the cluster can appoint a new master when the master fails, reducing downtime.
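Automatic appointment can be pictured with a toy election rule. The sketch below assumes a deliberately simple rule (the live node with the lowest name wins); this rule and the `elect_master` function are illustrative assumptions, and real clusters rely on proper election or consensus protocols.

```python
def elect_master(nodes, alive):
    """Return the live node with the lowest name (a stand-in election rule)."""
    candidates = sorted(node for node in nodes if node in alive)
    if not candidates:
        raise RuntimeError("no live node available to become master")
    return candidates[0]


nodes = ["node-a", "node-b", "node-c"]
alive = {"node-a", "node-b", "node-c"}

master = elect_master(nodes, alive)       # initial election
alive.discard(master)                     # simulate the master failing
new_master = elect_master(nodes, alive)   # the cluster appoints a replacement
```

Because the surviving nodes can run the same rule without an operator, the cluster recovers its write path without manual reconfiguration.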
[Figure: master-slave replication]
Pros and Cons of Master-Slave Replication
PROS
Handles more read requests: add more slave nodes and ensure that all read requests are routed to the slaves.
Read resilience: should the master fail, the slaves can still handle read requests.
Good for read-intensive datasets.
CONS
The master is a bottleneck: it is limited by its ability to process updates and to pass those updates on.
Its failure eliminates the ability to handle writes until the master is restored or a new master is appointed.
Inconsistency due to slow propagation of changes to the slaves.
Bad for datasets with heavy write traffic.
Read Resilience
A read-intensive workload issues more and more read requests.
To get read resilience, you have to ensure that the read and write paths in your application are different.
A failure in the write path can then be handled separately while reads continue.
Read Path
Write Path
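A minimal sketch of separate read and write paths, assuming a hypothetical `ReplicatedStore` class with one master and one slave held in memory, and with replication simplified to an immediate copy:

```python
class MasterUnavailable(Exception):
    """Raised when the write path cannot reach the master."""


class ReplicatedStore:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.master_up = True

    def write(self, key, value):
        # Write path: only the master accepts updates.
        if not self.master_up:
            raise MasterUnavailable("master is down; writes are rejected")
        self.master[key] = value
        # Replication is simplified to an immediate copy in this sketch.
        self.slave[key] = value

    def read(self, key):
        # Read path: served by the slave, independent of the master.
        return self.slave.get(key)


store = ReplicatedStore()
store.write("order:7", "shipped")
store.master_up = False            # simulate a master failure

still_readable = store.read("order:7")
try:
    store.write("order:8", "new")
    write_failed = False
except MasterUnavailable:
    write_failed = True
```

Because `read` never touches the master, the application can keep serving reads and handle the failed write on its own terms, for example by retrying later.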
Master-slave replication is good for read resilience.
It does not provide write resilience and does not scale writes well.
The master is also a bottleneck for updates (write requests).
Peer-to-peer replication addresses these issues.
Replication: Peer-to-Peer
All the replicas have equal weight.
All replicas can accept write requests.
The loss of any one replica does not prevent access to the data store.
If any node fails, work continues with the other nodes, i.e. users can ride over node failures without losing access to data.
Nodes can easily be added to improve performance (though complications may increase).
Complications in peer-to-peer
Biggest complication: consistency
When you can write to two different places
you run the risk that two people will
attempt to update the same record at the
same time—a write-write conflict.
Inconsistencies on read lead to problems
but at least they are relatively transient.
Inconsistent writes are forever.
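A write-write conflict can be illustrated with two peers that each accept an update to the same record before they synchronize. The sketch assumes each replica keeps a simple version counter; the counter and the `local_write` and `detect_conflict` helpers are illustrative assumptions, and real systems typically use richer schemes such as version vectors.

```python
def local_write(replica, value):
    # A peer accepts the write locally and bumps its version counter.
    replica["value"] = value
    replica["version"] += 1


def detect_conflict(a, b):
    """Two replicas conflict if they hold different values at the same version."""
    return a["version"] == b["version"] and a["value"] != b["value"]


peer_a = {"value": "room free", "version": 1}
peer_b = dict(peer_a)                      # both peers start in sync

local_write(peer_a, "booked by Asha")      # first client writes to peer A
local_write(peer_b, "booked by Ravi")      # second client writes to peer B

conflict = detect_conflict(peer_a, peer_b)
```

Both peers now claim version 2 with different contents, and neither write is obviously the right one to keep. That is why an inconsistent write does not simply heal once replication catches up.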
[Figure: peer-to-peer replication]
Peer-to-peer replication
There are two broad ways to deal with inconsistent writes.
First, we can ensure that whenever we write data, the replicas coordinate to avoid a conflict.
We don't need all the replicas to agree on the write, just a majority, so we can still survive losing a minority of the replica nodes.
Second, we can decide to cope with an inconsistent write.
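The majority rule is easy to check with a little arithmetic. The helper names below are assumptions made for illustration; the rule itself is just "more than half of the replicas must acknowledge the write".

```python
def majority(replicas):
    """Smallest number of replicas that is more than half of the total."""
    return replicas // 2 + 1


def write_accepted(acks, replicas):
    # A write succeeds once a majority of replicas has acknowledged it.
    return acks >= majority(replicas)


def tolerable_failures(replicas):
    # Replicas that can be lost while a majority is still reachable.
    return replicas - majority(replicas)


needed_of_3 = majority(3)               # 2
needed_of_5 = majority(5)               # 3
survives_one_loss = write_accepted(acks=2, replicas=3)
```

With three replicas a write needs two acknowledgements, so the cluster keeps accepting writes after losing one node; with five replicas it can lose two.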
Sharding
Often, a busy data store is busy because different people are accessing different parts of the dataset.
In these circumstances we can support horizontal scalability by putting different parts of the data onto different servers, a technique called sharding.
Sharding
Sharding puts different data on different nodes
Sharding: Ideal Case
We have different users all talking to different
server nodes.
Each user only has to talk to one server, so
gets rapid responses from that server.
The load is balanced out nicely between
servers—for example, if we have ten servers,
each one only has to handle 10% of the load.
We have to ensure that data that is accessed together is clumped together on the same node, and that these clumps are arranged on the nodes to provide the best data access.
How to clump the data? Use aggregates: aggregates are designed to combine data that is commonly accessed together, so they leap out as an obvious unit of distribution.
To improve performance when arranging aggregates on the nodes:
1) Physical location: where the aggregates are stored matters, so place data close to where it is accessed.
2) Even loading: try to arrange aggregates so they are evenly distributed across the nodes, so that all nodes get equal amounts of the load.
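One common way to aim for even loading is to hash each aggregate's key and use the result to pick a shard. The sketch below is an assumption-laden illustration, not a prescribed method: the `shard_for` function, the `customer:<id>` key format, and the choice of SHA-256 are all made up for the example.

```python
import hashlib
from collections import Counter


def shard_for(aggregate_key, num_shards):
    """Map an aggregate key to a shard number using a stable hash."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each aggregate (here, one customer) lives entirely on one shard.
keys = [f"customer:{i}" for i in range(1000)]
load = Counter(shard_for(key, 10) for key in keys)

# The same key always maps to the same shard.
same_shard = shard_for("customer:42", 10) == shard_for("customer:42", 10)
```

Inspecting `load` shows how many aggregates landed on each of the ten shards. Hashing spreads keys roughly evenly, but it ignores physical location, so it addresses only the second of the two factors listed above.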
Replication and sharding are orthogonal techniques: you can use either or both of them.
Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of allocating data to shards and ensuring that data access goes to the right shard.
This can make it much easier to use sharding in an application.
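From the application's point of view, auto-sharding looks like an ordinary key-value interface with the routing hidden inside the store. The `AutoShardedStore` class below is a hypothetical sketch of that idea, using in-memory dictionaries as shards and a CRC32 checksum as an assumed routing rule.

```python
import zlib


class AutoShardedStore:
    """Hypothetical store that picks the shard for each key on its own."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, key):
        # The store, not the application, decides where a key lives.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key).get(key)


store = AutoShardedStore(num_shards=4)
store.put("customer:42", {"name": "Asha"})
found = store.get("customer:42")

# The record is stored on exactly one shard.
holders = sum(1 for shard in store.shards if "customer:42" in shard)
```

The calling code never mentions a shard number, which is what makes sharding easier to adopt when the database handles it.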
Sharding is particularly valuable for performance.
It can improve both read and write performance.
It provides a way to horizontally scale writes.
Combining Sharding and Replication
If we use both master-slave replication and sharding, we have multiple masters, but each data item has only a single master.
Depending on your configuration, you may choose a node to be a master for some data and a slave for other data, or you may dedicate nodes to master or slave duties.
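One way to picture this is a layout table that names a master and slaves for each shard. The three-node, three-shard layout below is an invented example, and `roles_of` is a hypothetical helper for reading it.

```python
# Assumed layout: every shard has exactly one master, and each node
# is master for one shard while acting as a slave for another.
layout = {
    "shard-0": {"master": "node-1", "slaves": ["node-2"]},
    "shard-1": {"master": "node-2", "slaves": ["node-3"]},
    "shard-2": {"master": "node-3", "slaves": ["node-1"]},
}


def roles_of(node, layout):
    """List the shards a node masters and the shards it serves as a slave."""
    roles = {"master": [], "slave": []}
    for shard, assignment in sorted(layout.items()):
        if assignment["master"] == node:
            roles["master"].append(shard)
        if node in assignment["slaves"]:
            roles["slave"].append(shard)
    return roles


node_1_roles = roles_of("node-1", layout)
masters = [assignment["master"] for assignment in layout.values()]
```

Here `node-1` is the master for `shard-0` and a slave for `shard-2`. Dedicating nodes to one duty would simply mean a layout in which no node appears in both positions.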
[Figure: master-slave replication combined with sharding]
Combining Sharding and Replication
Using peer-to-peer replication together with sharding is a common strategy for column-family databases.
Example: a cluster may have tens or hundreds of nodes with data sharded over them.
A good starting point for peer-to-peer replication is a replication factor of 3, so each shard is present on three nodes.
Should a node fail, the shards on that node will be rebuilt on the other nodes.
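A replication factor of 3 and the rebuild after a node failure can be sketched as follows. The round-robin placement and the "first live node that lacks the shard" repair rule are assumptions chosen to keep the example short; actual databases use their own placement strategies.

```python
def place_shards(shards, nodes, replication_factor=3):
    """Assign each shard to `replication_factor` consecutive nodes."""
    placement = {}
    for i, shard in enumerate(shards):
        placement[shard] = [
            nodes[(i + j) % len(nodes)] for j in range(replication_factor)
        ]
    return placement


def rebuild_after_failure(placement, failed, nodes):
    """Re-create the failed node's shard copies on other live nodes."""
    live = [node for node in nodes if node != failed]
    repaired = {}
    for shard, owners in placement.items():
        remaining = [node for node in owners if node != failed]
        if len(remaining) < len(owners):
            # Pick the first live node that does not already hold this shard.
            remaining.append(next(n for n in live if n not in remaining))
        repaired[shard] = remaining
    return repaired


nodes = ["n1", "n2", "n3", "n4", "n5"]
shards = ["s0", "s1", "s2", "s3", "s4"]

placement = place_shards(shards, nodes)
repaired = rebuild_after_failure(placement, failed="n1", nodes=nodes)
```

After the repair every shard is again present on three distinct live nodes, so the cluster returns to its intended replication factor without the failed node.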
[Figure: peer-to-peer replication with sharding]
Thank You
Editor's Notes
Read intensive: more and more read requests
Write intensive: more and more write requests
Read resilience: should the master fail, the slaves can still handle read requests.
Again, this is useful if most of your data access is reads.
The failure of the master does eliminate the ability to handle writes until either the master is restored or a new master is appointed.
However, having slaves as replicas of the master does speed up recovery after a failure of the master, since a slave can be appointed as the new master very quickly.
Meaning of resilience: the capacity to recover quickly from difficulties (toughness).
A transient database object exists only as long as an application has an open connection to the database.
All transient objects disappear when the application shuts down the database.
This means that a persistent database does not need to be re-indexed after re-opening.
A "data clump" is a name given to any group of variables which are passed around together (in a clump) throughout various parts of the program.
A clump is a grouping.