MongoDB – Roma
12 July 2012
Replication and Sharding: Hands on

Guglielmo Incisa
Replication
• What it is
   – Data is replicated (cloned) onto at least two nodes
   – Updates are sent to one node (the Primary) and automatically propagated
     to the others (the Secondaries)
   – Connections can go through a router or directly to the Primary (the
     Secondaries are read only)
       • If we connect our app server directly to the Primary we must deal with its failure and
         reconnect to the new Primary
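For a hands-on feel of the difference, one way to open a shell either directly to a Primary or through the router, using the ports from the Setup slides later on (a minimal sketch, not part of the original deck):

     # directly to the Primary of shard A: the client must handle failover itself
     ./mongodb-linux-x86_64-2.0.4/bin/mongo hostname:27018/test2
     # through the router (mongos, default port 27017): failover is hidden from the client
     ./mongodb-linux-x86_64-2.0.4/bin/mongo hostname:27017/test2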



[Diagram: App server → Router → Primary and Secondary DB nodes]
Replication
• Why we need it
   – If one node fails, the application server can keep working without any
     impact
   – The router automatically redirects connections to the remaining nodes
     (though the router itself can be a point of failure)


[Diagram: App server → Router → Primary and Secondary DB nodes]
Replication
• Why we need it
   – More and more IT departments are moving from
       • Big, proprietary, reliable and expensive servers
   – To
       • Commodity Hardware (smaller, less reliable, inexpensive servers: PC)
   – Commodity hardware is less reliable, but our users demand that our
     applications be always available: replication can help.
   – Example: how many servers do I need to reach 99.999% availability?
       • If, for example, a PC has 98% availability (roughly a week of downtime a year, i.e. a 2%
         probability of being down at any given moment)
       • -> Two replicated PCs give 99.96% availability
       • -> Three replicated PCs give more than 99.999% (Telecom Grade / Core Network).
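A quick way to sanity-check these figures, assuming the replicas fail independently (a hypothetical helper, not on the original slides; it runs in any mongo/JavaScript shell):

     // availability of n independent replicas, each available a fraction `a` of the time
     availability = function(a, n) { return 1 - Math.pow(1 - a, n); }
     availability(0.98, 2)   // 0.9996    -> 99.96%
     availability(0.98, 3)   // 0.999992  -> better than 99.999%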
Sharding
• What is it
   – Data is partitioned and distributed across different nodes
       • Some records are in node 1, others in node 2, etc.
   – MongoDB sharding: the partitioning is based on a field (the shard key).
       • Database: test2
            – Table: testSchema1
            – Fields:
                  » owner: owner of the file, key and shard key (string)
                  » date (string)
                  » tags (list of string)
                  » keywords: words in the document, created by java code below (list of string)
                  » fileName (string)
                  » content: the file (binary)
                  » ascii: the file (string)
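To make the schema concrete, a document in test2.testSchema1 could look roughly like the following (field values are invented for illustration; the binary payload is elided):

     {
       owner    : "alice",                    // shard key
       date     : "2012-07-12",
       tags     : [ "demo", "mongodb" ],
       keywords : [ "replication", "sharding" ],
       fileName : "notes.txt",
       content  : BinData(0, ""),             // the raw file (elided here)
       ascii    : "replication and sharding"  // the file as text
     }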
Sharding
• Why we need it
   – Servers with smaller storage
   – To increase responsiveness by increasing parallelism

[Diagram: Router distributing documents to three shards by owner range: A-H, I-O, P-Z]
Replication and Sharding
• Can we have both?
   – MongoDB: yes!
• Our example:


[Diagram: Router (mongos) and a Config process in front of three shards, Shard A, Shard B and Shard C, each a replica set of 2 nodes + an arbiter]
Replication and Sharding
• Replication:
     – Two nodes and an arbiter
           •   The arbiter is needed when an even number of data nodes is used: it breaks ties in the election of
               the Primary, and allows the Secondary to be promoted when the other node is down

•   Sharding
     – Three sets: A, B, C
     – Config Process:
          •   <<The config servers store the cluster's metadata, which includes basic information on each shard server and
              the chunks contained therein.>>
     – Routing Process:
          •   <<The mongos process can be thought of as a routing and coordination process that makes the various
              components of the cluster look like a single system. When receiving client requests, the mongos process routes
              the request to the appropriate server(s) and merges any results to be sent back to the client.>>
Setup 1
•   Start Servers and arbiters
      –   Create /data/db, db2, db3, db4, db5, db6, db7, db8, db9, configdb (a mkdir one-liner is sketched after this list)
      –   --nojournal speeds up the startup (journalling is the default in 64 bit)
•   Replica set A
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --nojournal
          (defaults: dbpath /data/db, port 27018)
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db2
          --port 27021 --nojournal
      –   Arbiter:
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSA --dbpath /data/db7
          --port 27031 --nojournal

•   Replica set B
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db3
          --port 27023 --nojournal
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db4
          --port 27025 --nojournal
      –   Arbiter:
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSB --dbpath /data/db8
          --port 27035 --nojournal

•   Replica set C
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db5
          --port 27027 --nojournal
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db6
          --port 27029 --nojournal
      –   Arbiter:
      –   ./mongodb-linux-x86_64-2.0.4/bin/mongod --shardsvr --replSet DSSC --dbpath /data/db9
          --port 27039 --nojournal
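The data directories listed in the first step can be created up front; a minimal sketch (the paths are the ones above, run as a user allowed to write under /data):

     mkdir -p /data/db /data/db2 /data/db3 /data/db4 /data/db5 /data/db6 /data/db7 /data/db8 /data/db9 /data/configdb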
Setup 2
•   Set up the replica sets: connect to the first member of each set and apply the configuration
•   Set replica A
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     cfg = {
        _id : "DSSA",
        members : [
          {_id : 0, host : "hostname:27018"},
          {_id : 1, host : "hostname:27021"},
          {_id : 2, host : "hostname:27031", arbiterOnly : true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()

•   Set replica B
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
     cfg = {
        _id : "DSSB",
        members : [
          {_id : 0, host : "hostname:27023"},
          {_id : 1, host : "hostname:27025"},
          {_id : 2, host : "hostname:27035", arbiterOnly : true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()

•   Set replica C
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
     cfg = {
        _id : "DSSC",
        members : [
          {_id : 0, host : "hostname:27027"},
          {_id : 1, host : "hostname:27029"},
          {_id : 2, host : "hostname:27039", arbiterOnly : true}
        ]
     }
     rs.initiate(cfg)
     db.getMongo().setSlaveOk()
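After rs.initiate(cfg) the election can take a few seconds; a quick way to verify each set (not shown on the original slides) is, from the same shell:

     rs.status()   // lists the members and their states (PRIMARY / SECONDARY / ARBITER)
     rs.conf()     // echoes back the configuration that was applied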
Setup 3
•   Start config server
     ./mongodb-linux-x86_64-2.0.4/bin/mongod --configsvr --nojournal



•   Start router
     ./mongodb-linux-x86_64-2.0.4/bin/mongos --configdb grog:27019 --chunkSize 1


•   Configure Shards
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
      db.runCommand( { addshard : "DSSA/hostname:27018,hostname:27021"})
      db.runCommand( { addshard : "DSSB/hostname:27023,hostname:27025"})
      db.runCommand( { addshard : "DSSC/hostname:27027,hostname:27029"})
     db.runCommand( { enablesharding : "test2"})
     db.runCommand( { shardcollection : "test2.testSchema1",key : { owner : 1}})

•   Load data…

      – We load 11 documents; the sharding is done over the "owner" field (a minimal check is sketched below)
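The load itself is not shown here (the slides mention Java code for it); once the documents are in, one way to eyeball how they were split across the shards, from the mongos shell (a sketch, not part of the original deck):

     mongos> use test2
     mongos> db.printShardingStatus()   // chunks of test2.testSchema1 per shard (DSSA, DSSB, DSSC)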
MapReduce
•   "Map" step: The master node takes the input, divides it into smaller sub-
    problems, and distributes them to worker nodes. A worker node may do
    this again in turn, leading to a multi-level tree structure. The worker node
    processes the smaller problem, and passes the answer back to its master
    node.
•   "Reduce" step: The master node then collects the answers to all the sub-
    problems and combines them in some way to form the output – the
    answer to the problem it was originally trying to solve.
•   Source: Wikipedia
MapReduce
•   map = function() {
        // emit each keyword of this document with a count of 1
        if (!this.keywords) {
            return;
        }
        for (index in this.keywords) {
            emit(this.keywords[index], 1);
        }
    }
•   reduce = function(key, values) {
        // sum the 1s emitted for the same keyword
        var count = 0;
        for (index in values) {
            count += values[index];
        }
        return count;
    }
•   result = db.runCommand({
        "mapreduce" : "testSchema1",
        "map"       : map,
        "reduce"    : reduce,
        "out"       : "keywords"})
    db.keywords.find()
    mongos> db.keywords.find({_id : "hello"})
Check Sharding
•   Connect to the router and count the records:
     ./mongodb-linux-x86_64-2.0.4/bin/mongo admin
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Connect to each primary (and see the number of records in each shard):
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27018
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27023
     mongo> use test2
     mongo> db.testSchema1.count()
     4
     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27027
     mongo> use test2
     mongo> db.testSchema1.count()
     3
Check Replication
•   Kill Server 1 (= the Primary of A)
•   Connect to the router and count the records:
     mongos> use test2
     mongos> db.testSchema1.count()
     11
•   Check that (Server 2), the Secondary of A, is now Primary
•   Load a new chunk of data (11 more documents)
•   The count will now be 22
•   Restart the killed server (Server 1), wait
•   Kill the other one (Server 2), which is now the Primary of A
•   Check that Server 1 is Primary again
•   The count will still be 22
•   Restart Server 2
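A quick way to confirm which member is Primary at each step (not on the original slides) is to connect to any member of the set and ask:

     ./mongodb-linux-x86_64-2.0.4/bin/mongo --port 27021
     > rs.status()     // shows PRIMARY / SECONDARY / ARBITER for every member of DSSA
     > db.isMaster()   // tells whether the node we are connected to is currently Primary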
