2. What is replication?
• Replication is a way of keeping identical copies
of data on multiple servers and it is
recommended for all production deployments.
• Primary server: the standalone instance becomes the primary, on port 27017.
• Secondary servers: two servers started on ports 27020 and 27021.
6. Best practices for replication
Replica sets are MongoDB's mechanism to provide redundancy, high availability, and higher read throughput
under the right conditions.
Replication in MongoDB is easy to configure and light in operational terms:
Always use replica sets: Even if your dataset is small at the moment and you don't expect it to grow
exponentially, you never know when that might happen. Also, having a replica set of at least three servers
helps you design for redundancy, separates workloads between real time and analytics (using the
secondaries), and builds in data redundancy from day one.
Use a replica set to your advantage: A replica set is not just for data replication. In most cases we can
and should use the primary server for writes and prefer reads from one of the secondaries to offload the
primary server. This can be done by setting the read preference for reads, together with the correct
write concern to ensure writes propagate as needed.
Use an odd number of replicas in a MongoDB replica set: If a server goes down or loses connectivity with the
rest of them (network partitioning), the rest have to vote as to which one will be elected as the primary
server. If we have an odd number of replica set members, we can guarantee that each subset of servers
knows if it belongs to the majority or the minority of the replica set members. If we can't have an odd
number of replicas, we need to have one extra host set as an arbiter with the sole purpose of voting in the
election process. Even a micro instance in EC2 could serve this purpose.
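The voting rule behind the odd-number advice can be sketched in a few lines of plain JavaScript. This is only an illustration: electionMath is a hypothetical helper, not a MongoDB API, and it assumes every member has one vote.

```javascript
// Hypothetical helper (not a MongoDB API): for a given number of voting
// members, compute the votes needed to elect a primary and how many members
// can be lost while a majority is still reachable.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return {
    votingMembers,
    majority,
    faultTolerance: votingMembers - majority,
  };
}

// Compare even and odd member counts side by side.
for (const n of [2, 3, 4, 5]) {
  console.log(electionMath(n));
}
```

Under these assumptions a four-member set tolerates the loss of one member, exactly like a three-member set, so the extra data-bearing server adds no election safety. That is the reason for adding an arbiter instead when the member count is even.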
7. Step by Step MongoDB Replication setup on Windows
Important note: Before setting up MongoDB replication, please back up all important data.
1. Start standalone server as shown below.
mongod --dbpath "C:\Program Files\MongoDB\Server\4.0\data" --logpath "C:\Program Files\MongoDB\Server\4.0\log\mongod.log" --port 27017 --storageEngine=wiredTiger --journal --replSet testdb
2. Connect to the server with port number 27017
mongo --port 27017
3. Then, create variable rsconf.
rsconf={_id:"testdb",members:[{_id:0,host:"localhost:27017"}]}
rs.initiate(rsconf)
Note: Here I am configuring replication on a single Windows machine. If you have three different machines, replace localhost with each machine's host name or IP address and port number.
4. Start the first secondary server on port 27020.
mongod --dbpath "C:\data1\db" --logpath "C:\data1\log\mongod.log" --port 27020 --storageEngine=wiredTiger --journal --replSet testdb
5. Log on to this secondary server.
mongo --port 27020
6. Start the second secondary server on port 27021.
mongod --dbpath "C:\data2\db" --logpath "C:\data2\log\mongod.log" --port 27021 --storageEngine=wiredTiger --journal --replSet testdb
7. Log on to this secondary server.
mongo --port 27021
8. Run the following commands on the primary server.
rs.add("localhost:27020")
rs.add("localhost:27021")
9. Now go to the secondary servers and run the command below on both of them so that they accept reads from the shell.
rs.slaveOk()
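As an alternative to adding members one by one, the configuration document can list all three members up front. The sketch below only builds that document in plain JavaScript; buildReplSetConfig is a hypothetical helper, and it assumes all three mongod processes have already been started with --replSet testdb.

```javascript
// Hypothetical helper: builds a replica set configuration document of the
// same shape as the rsconf variable used in step 3.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    // Member _id values are simply assigned in list order: 0, 1, 2, ...
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const rsconf = buildReplSetConfig("testdb", [
  "localhost:27017",
  "localhost:27020",
  "localhost:27021",
]);

console.log(JSON.stringify(rsconf, null, 2));
// In the mongo shell, rs.initiate(rsconf) would then take the place of the
// separate rs.add() calls in step 8.
```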
8. Replication Setup verification
Create a collection on the primary server and verify that the change is reflected on the secondary servers.
1. Connect to primary server.
use ecom
2. Create a collection in Primary
db.test.insert({name:"MongoDB"})
3. Now connect to the secondary servers and check the list of databases by running the command
show dbs
4. Switch to the newly created database.
use ecom
5. Run the following command against the ecom database.
db.test.find().pretty()
9. MongoDB replication setup on Linux:
1. Back up /etc/hosts and /etc/mongod.conf
2. Configure /etc/hosts
3. Configure firewall
4. Configure MongoDB Replica Set
5. Initiate Replication
6. Test the replication
In this step-by-step MongoDB replication setup on Linux, these are my IP addresses and
their host names, respectively.
• 192.168.152.135 mongodb1
• 192.168.152.141 mongodb2
• 192.168.152.142 mongodb3
10. Step 1: Back up /etc/hosts and /etc/mongod.conf
cp /etc/hosts hosts_before_mod
cp /etc/mongod.conf mongod_before_mod
Step 2: Configure /etc/hosts on all servers
Edit the /etc/hosts file on all replica servers.
vi /etc/hosts
Then add the lines below and save the hosts file.
192.168.152.135 mongodb1
192.168.152.141 mongodb2
192.168.152.142 mongodb3
Use the OS commands hostname and ifconfig to look up each server's host name and IP address.
Step 3: Configure the firewall on all nodes in the replica set
Install ufw (Uncomplicated Firewall) with the commands below, if it is not already installed, and enable it.
sudo apt install ufw
sudo ufw enable
Now enable the port 27017 on all replica nodes.
sudo ufw allow 27017
Verify that the firewall allows port 27017:
sudo ufw status verbose
11. Step 4: Configure MongoDB Replica Set
This is done by modifying the /etc/mongod.conf configuration file. In this step, we set bindIp and the replica set name.
vi /etc/mongod.conf
Then set the bindIp field to the IP address of your host.
Before modification of bindIp
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1
After modification of bindIp
# network interfaces
net:
  port: 27017
  bindIp: 192.168.152.135 #127.0.0.1
Now enable replication and add the replica set name:
Before replica set name added:
#replication:
After replica set name added: remove the hash mark and add the field replSetName. The replSetName value is case sensitive.
replication:
  replSetName: "testdb"
Important note: There must be a single space after the colon (:) and two spaces before replSetName.
Repeat Step 4 on all replica nodes, using each node's own IP address for bindIp.
12. Step 5: Initiate replication on three nodes
Restart the MongoDB service on all three nodes and check that it is running:
systemctl restart mongod.service
systemctl status mongod.service
Connect with the mongo shell to the node that should become the primary (mongodb1 here).
Then run the following command on that node only:
rs.initiate()
Press Enter twice; the prompt should change to PRIMARY.
After this, add the other nodes to the replica set.
rs.add("192.168.152.141")
rs.add("192.168.152.142")
Now verify the replication status:
rs.status()
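The rs.status() document is long, so it can help to reduce it to one line per member. The sketch below is plain JavaScript over a hand-written sample, not real server output; summarizeStatus and hasSinglePrimary are hypothetical helpers, and only the name, stateStr and health fields of each member are assumed.

```javascript
// Hypothetical helper: one summary line per member of an rs.status()-shaped document.
function summarizeStatus(status) {
  return status.members.map(
    (m) => `${m.name}: ${m.stateStr} (${m.health === 1 ? "up" : "down"})`
  );
}

// Hypothetical helper: a healthy replica set reports exactly one PRIMARY.
function hasSinglePrimary(status) {
  return status.members.filter((m) => m.stateStr === "PRIMARY").length === 1;
}

// Hand-written sample with only the fields used above; real output has many more.
const sampleStatus = {
  set: "testdb",
  members: [
    { name: "mongodb1:27017", stateStr: "PRIMARY", health: 1 },
    { name: "192.168.152.141:27017", stateStr: "SECONDARY", health: 1 },
    { name: "192.168.152.142:27017", stateStr: "SECONDARY", health: 1 },
  ],
};

summarizeStatus(sampleStatus).forEach((line) => console.log(line));
console.log("single primary:", hasSinglePrimary(sampleStatus));
```

In the mongo shell the same functions could be applied to the live result, for example summarizeStatus(rs.status()).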
13. Now create a database on the primary server. Then create a collection in that database and check whether
the changes reach the secondary servers.
Go to the primary node.
a) Create a database
testdb:PRIMARY> use test
switched to db test
testdb:PRIMARY> db
test
testdb:PRIMARY>
b) Create a collection in the above database:
db.eclhur.insert({"name":"Elchuru"})
Then run the command
show dbs
c) Now switch to the secondary nodes and type show dbs. If the shell refuses the read because the node is
not the primary, run rs.slaveOk() first. If the new database appears on the secondary nodes, the replication
setup is successful; otherwise something went wrong and needs to be troubleshot.
testdb:SECONDARY> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
test 0.000GB
The new database has been replicated to the secondary node, so the replication test is successful.
14. Check the status of replication: run the commands below from the primary replica node to get complete information about the replica
set.
rs.conf()
rs.status()
rs0:PRIMARY> rs.conf()
{
    "_id" : "rs0",
    "version" : 2,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : true,
    "members" : [
        {
            "_id" : 0,
            "host" : "localhost:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {
            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "localhost:27018",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {
            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "localhost:27019",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {
16. Add new MongoDB instance to a replica set
Connect to the primary with the mongo shell and run the command below.
Syntax: rs.add("hostname:port")
Example:
rs0:PRIMARY> rs.add("localhost:27016")
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1591094195, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1591094195, 1)
}
17. Remove existing MongoDB instance from the replica set
The command below removes the given secondary host from the replica set.
Syntax: rs.remove("hostname:port")
Example:
rs0:PRIMARY> rs.remove("localhost:27016")
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1591095681, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1591095681, 1)
}
rs0:PRIMARY>
18. Make the primary step down to a secondary
MongoDB provides a command that instructs the primary to step down and become a secondary.
Syntax: rs.stepDown( stepDownSecs , secondaryCatchupSecs )
stepDownSecs is how long the node stays ineligible to become primary again; secondaryCatchupSecs is how long it waits for a secondary to catch up before stepping down.
Example:
rs0:PRIMARY> rs.stepDown(12)
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1591096055, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1591096055, 1)
}
rs0:SECONDARY>
19. Check the replication lag between primary and secondaries
The command below, run from the primary, reports the replication lag of every secondary in the replica set.
Syntax: rs.printSlaveReplicationInfo()
Example:
rs0:PRIMARY> rs.printSlaveReplicationInfo()
source: localhost:27018
syncedTo: Tue Jun 02 2020 16:14:04 GMT+0530 (India Standard Time)
0 secs (0 hrs) behind the primary
source: localhost:27019
syncedTo: Thu Jan 01 1970 05:30:00 GMT+0530 (India Standard Time)
1591094644 secs (441970.73 hrs) behind the primary
source: localhost:27016
syncedTo: Tue Jun 02 2020 16:14:04 GMT+0530 (India Standard Time)
0 secs (0 hrs) behind the primary
rs0:PRIMARY>
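The lag figures above are just the difference between the primary's latest operation time and each secondary's. A minimal sketch of that arithmetic in plain JavaScript follows; replicationLag is a hypothetical helper, and the sample members are hand-written from the timestamps in the output above (the stateStr values are assumptions).

```javascript
// Hypothetical helper: seconds each non-primary member is behind the primary,
// using the optimeDate field that rs.status() reports for every member.
function replicationLag(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in member list");
  return members
    .filter((m) => m !== primary)
    .map((m) => ({
      name: m.name,
      lagSecs: Math.round((primary.optimeDate - m.optimeDate) / 1000),
    }));
}

// 1591094644 is the Unix time of Tue Jun 02 2020 16:14:04 GMT+0530.
const lastOp = new Date(1591094644 * 1000);
const sampleMembers = [
  { name: "localhost:27017", stateStr: "PRIMARY", optimeDate: lastOp },
  { name: "localhost:27018", stateStr: "SECONDARY", optimeDate: lastOp },
  // A syncedTo of Jan 01 1970 means this member has not applied any operation yet.
  { name: "localhost:27019", stateStr: "SECONDARY", optimeDate: new Date(0) },
];

for (const { name, lagSecs } of replicationLag(sampleMembers)) {
  console.log(`${name}: ${lagSecs} secs (${(lagSecs / 3600).toFixed(2)} hrs) behind the primary`);
}
```

Read this way, the 1591094644 secs reported for localhost:27019 is not a real backlog of that size. It is the distance from the Unix epoch, which suggests the member has not synced at all and should be checked.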
20. Monitoring replication
• Replication lag. Replication lag refers to delays in copying data from the
primary node to a secondary node.
• Replica state. The replica state tracks whether secondary nodes
have died and whether a new primary node has been elected.
• Locking state. The locking state shows what data locks are set, and the
length of time they have been in place.
• Disk utilization. Disk utilization refers to how heavily the disks are being accessed.
• Memory usage. Memory usage refers to how much memory is being
used, and how it is being used.
• Number of connections. The number of connections the database has
open in order to serve requests as quickly as possible.
22. MongoDB Sharding
• Sharding is an architecture for storing big data across distributed servers.
• In MongoDB, sharding handles very large data sets and is mostly used when
space requirements grow massively. Big applications are now
based on end-to-end transactional data, which grows day by
day, so the space requirement increases rapidly.
• As the amount of stored information grows, a single machine
can no longer provide the required storage capacity. We have to split
the information into chunks across different servers.
• In MongoDB, sharding provides a horizontally scalable (scale-out) application
architecture in which we divide information across different
servers.
• With the help of sharding, we can add servers to the
current database deployment to support growing information
easily. This architecture balances the load of information
automatically across the connected servers.
• A single shard acts as a single instance of the database, and
collectively the shards form one logical database. As the cluster
grows by adding shards, the share of data and load that
each shard is responsible for becomes smaller.
• For example, suppose we have to store 1GB of information in MongoDB. In
the sharding architecture, if we have four shards, each will hold about
256MB, and if we have two shards, each will hold about 512MB.
23. MongoDB Sharding Key
While implementing sharding in MongoDB we have to
define the key, called the shard key, that decides which
shard instance each document is stored on.
For example, suppose we have a collection of student
information for a particular class of 14
students, and we have two shard
instances.
Then the collection is divided between these
shards, with 7 documents on each. The common key that
ties these two shard instances together, and that determines
where each document belongs, is
known as the shard key. It may be numeric, compound,
or hashed.
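The 7/7 split in the student example can be illustrated with a toy range-based model in plain JavaScript. Everything here is an assumption made for illustration: the studentId field, the shard names shardA and shardB, and the split point at 8. It is not how MongoDB itself is invoked.

```javascript
// Toy model of range-based sharding. Each chunk owns a half-open range
// [min, max) of the assumed shard key, studentId.
const chunks = [
  { shard: "shardA", min: -Infinity, max: 8 }, // studentId < 8
  { shard: "shardB", min: 8, max: Infinity },  // studentId >= 8
];

// Find the shard whose range contains the given shard key value.
function shardFor(keyValue) {
  return chunks.find((c) => keyValue >= c.min && keyValue < c.max).shard;
}

// 14 students with studentId 1..14, as in the example above.
const students = Array.from({ length: 14 }, (_, i) => ({ studentId: i + 1 }));

const counts = {};
for (const s of students) {
  const shard = shardFor(s.studentId);
  counts[shard] = (counts[shard] || 0) + 1;
}

console.log(counts); // { shardA: 7, shardB: 7 }
```

With a hashed shard key the ranges would be defined over the hash of the value rather than the value itself, which tends to spread sequential ids more evenly as the collection grows.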
24. What is Sharding in MongoDB?
Sharding is a concept in MongoDB that splits
large data sets into smaller data sets across
multiple MongoDB instances.
Sometimes the data within MongoDB is so
large that queries against it
cause high CPU utilization on the server. To
tackle this situation, MongoDB splits the data
set across multiple MongoDB instances.
A collection that could grow large is
split into multiple parts, called shards, each
stored on a different instance. Logically, all the shards work as
one collection.
25. How to Implement Sharding
Sharding is implemented with a cluster, which is a group of MongoDB instances.
The components of a sharded cluster include:
1. A shard – This is the basic unit: a MongoDB instance that holds a subset of the data. In production environments, all
shards need to be part of replica sets.
2. Config server – This is a MongoDB instance that holds metadata about the cluster, basically information about which MongoDB instances
hold which shard data.
3. A router – This is a mongos instance that is responsible for redirecting the commands sent by the client to the right servers.
Step by Step Sharding Cluster Example
Step 1) Create a separate data directory for the config server.
mkdir -p /data/configdb
Step 2) Start the MongoDB instance in configuration mode. Suppose we have a server named Server D that will be our configuration server; we
would run the command below on it. Since MongoDB 3.4 a config server must run as a replica set, so it also needs a replica set name (configReplSet is only an example name) and a one-time rs.initiate() from a mongo shell connected to port 27019.
mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019
Step 3) Start the mongos instance by specifying the configuration server
mongos --configdb configReplSet/ServerD:27019
Step 4) From the mongo shell, connect to the mongos instance
mongo --host ServerD --port 27017
Step 5) If you have Server A and Server B which need to be added to the cluster, issue the commands below. On MongoDB 3.6 and later, shards must be replica sets, so the argument takes the form "replSetName/ServerA:27017".
sh.addShard("ServerA:27017")
sh.addShard("ServerB:27017")
Step 6) Enable sharding for the database. So if we need to shard the Employeedb database, issue the command below
sh.enableSharding("Employeedb")
Step 7) Enable sharding for the collection. So if we need to shard the Employee collection, issue the command below
sh.shardCollection("Employeedb.Employee" , { "Employeeid" : 1 , "EmployeeName" : 1})
Summary:
• As explained in this tutorial, sharding is a concept in MongoDB that splits large data sets into smaller data sets across multiple MongoDB instances.
•https://www.guru99.com/mongodb-vs-mysql.html
27. Why Sharding?
• In replication, all writes go to the primary node
• Latency-sensitive queries still go to the primary
• A single replica set is limited to 50 members (12 before MongoDB 3.0), only 7 of which can vote
• Memory can't be large enough when the active dataset is big
• Local disk is not big enough
• Vertical scaling is too expensive
28. Sharding in MongoDB
The following diagram shows sharding in MongoDB using a sharded cluster.
30. Sharding in MongoDB
• The diagram shows three main components −
• Shards − Shards are used to store data. They provide high availability and
data consistency. In a production environment, each shard is a separate
replica set.
• Config Servers − Config servers store the cluster's metadata. This data
contains a mapping of the cluster's data set to the shards. The query router
uses this metadata to target operations to specific shards. In a production
environment, config servers are deployed as a replica set, typically with three members.
• Query Routers − Query routers are mongos instances that interface
with client applications and direct operations to the appropriate shard. The
query router processes and targets the operations to shards and then
returns results to the clients. A sharded cluster can contain more than one
query router to divide the client request load. A client sends requests to
one query router. Generally, a sharded cluster has many query routers.
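How a query router uses the config metadata can be sketched with a toy model in plain JavaScript. This is not MongoDB code: chunkMap stands in for the mapping held by the config servers, Employeeid is assumed to be the shard key, and the shard names and the split point at 1000 are made up for illustration.

```javascript
// Toy stand-in for the config server metadata: which shard owns which
// half-open range [min, max) of the assumed shard key, Employeeid.
const chunkMap = [
  { shard: "shardA", min: -Infinity, max: 1000 },
  { shard: "shardB", min: 1000, max: Infinity },
];

// Returns the list of shards a query filter has to be sent to.
function targetShards(query) {
  if (!("Employeeid" in query)) {
    // The filter does not contain the shard key, so the router cannot tell
    // where matching documents live and must ask every shard.
    return [...new Set(chunkMap.map((c) => c.shard))];
  }
  const id = query.Employeeid;
  return chunkMap
    .filter((c) => id >= c.min && id < c.max)
    .map((c) => c.shard);
}

console.log(targetShards({ Employeeid: 42 }));        // [ 'shardA' ]
console.log(targetShards({ EmployeeName: "Smith" })); // [ 'shardA', 'shardB' ]
```

The design point this illustrates is that queries which include the shard key can be answered by a single shard, while queries without it are broadcast to all shards and merged by the router.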