MongoDB – Sharded Cluster
Tutorial
Antonios Giannopoulos and Jason Terpko
DBAs @ Rackspace/ObjectRocket
linkedin.com/in/antonis/ | linkedin.com/in/jterpko/
Introduction
Antonios Giannopoulos Jason Terpko
Overview
• Cluster Components
• Shard Key
• Shard Key Selection
• Balancing
• Chunk Management
• Troubleshooting
• Zones
• Backups & Disaster Recovery
• Use Cases
• Labs
Labs
1. Unzip the provided .vdi file
2. Install and/or open VirtualBox
3. Select New
4. Enter a name
5. Select Type: Linux
6. Select Version: Red Hat (64-bit)
7. Set Memory to 4096 (if possible)
8. Select "Use an existing ... disk file", select the provided .vdi file.
9. Select Create
10. Select Start
11. CentOS will boot and auto-login
12. Navigate to /Percona2017/Lab01/exercises for the first lab.
https://goo.gl/ddaVdS
Key Terms
Replication:
Replica-set: A group of mongod processes that maintain the same data set
Primary: Endpoint for writes
Secondary: Pulls and replicates changes from the Primary (via the oplog)
Election: The process that changes the Primary on a replica-set
Partitioning:
Partition: A logical separation of a data set (may also be physical) on a server
Partition key: A field (or a combination of fields) used to distinguish partitions
Sharding:
Similar to partitioning, with the difference that partitions (chunks) may be located on different servers
(shards)
Shard key: A field (or a combination of fields) used to distinguish chunks
Sharded Cluster
[Diagram: a sharded cluster with shards s1 and s2]
Sharded Cluster Deployment
● Deploying
○ Shards
○ Configuration Servers
○ Mongos
● Major / Minor Upgrade
● Major / Minor Downgrade
Deploying - Recommended setup
Minimum requirements: 1 mongos, 1 config server, 1 shard (1 mongod process)
Recommended setup: 2+ mongos, 3 config servers, 1 shard (3 node replica-set)
Ensure connectivity between machines involved in the sharded cluster
Ensure that each process has a DNS name assigned - Not a good idea to use IPs
If you are running a firewall you must allow traffic to and from MongoDB instances
Example using iptables:
iptables -A INPUT -s <ip-address> -p tcp --destination-port <port> -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -d <ip-address> -p tcp --source-port <port> -m state --state ESTABLISHED -j ACCEPT
Deploying - Config Servers
Create a keyfile:
● keyfile must be between 6 and 1024 characters long
● Generate a 1024-character key: openssl rand -base64 756 > <path-to-keyfile>
● Secure the keyfile: chmod 400 <path-to-keyfile>
● Copy the keyfile to every machine involved in the sharded cluster
Create the config servers:
- Before 3.2, config servers could only run with the SCCC topology
- In 3.2, config servers can run as either SCCC or CSRS
- In 3.4, only the CSRS topology is supported
- CSRS mode requires WiredTiger as the underlying storage engine
- SCCC mode may run on either WiredTiger or MMAPv1
Minimum configuration for CSRS
mongod --keyFile <path-to-keyfile> --configsvr --replSet <setname> --dbpath <path>
Deploying - Config Servers
Baseline configuration for a config server (CSRS)
net:
  port: '<port>'
processManagement:
  fork: true
security:
  authorization: enabled
  keyFile: <keyfile location>
sharding:
  clusterRole: configsvr
replication:
  replSetName: <replica set name>
storage:
  dbPath: <data directory>
systemLog:
  destination: syslog
Deploying - Config Servers
Login on one of the config servers using the localhost exception.
Initiate the replica set:
rs.initiate({
  _id: "<replSetName>",
  configsvr: true,
  members: [
    { _id: 0, host: "<host:port>" },
    { _id: 1, host: "<host:port>" },
    { _id: 2, host: "<host:port>" }
  ]
})
Check the status of the replica-set using rs.status()
Deploying - Shards
Create shard(s):
- For production environments use a replica set with at least three members
- For test environments replication is not mandatory
- Start three (or more) mongod process with the same keyfile and replSetName
- 'sharding.clusterRole': shardsvr is mandatory in MongoDB 3.4
- Note: Default port for mongod instances with the shardsvr role is 27018
Minimum configuration for shard
mongod --keyFile <path-to-keyfile> --shardsvr --replSet <replSetname> --dbpath <path>
- Default storage engine is WiredTiger
- In a production environment you have to set more configuration options, like the oplog size
Deploying - Shards
Baseline configuration for a shard
net:
  port: '<port>'
processManagement:
  fork: true
security:
  authorization: enabled
  keyFile: <keyfile location>
sharding:
  clusterRole: shardsvr
replication:
  replSetName: <replica set name>
storage:
  dbPath: <data directory>
systemLog:
  destination: syslog
Deploying - Shards
Login on one of the shard members using the localhost exception.
Initiate the replica set:
rs.initiate({
  _id: "<replSetName>",
  members: [
    { _id: 0, host: "<host:port>" },
    { _id: 1, host: "<host:port>" },
    { _id: 2, host: "<host:port>" }
  ]
})
Check the status of the replica-set using rs.status()
Create a local user administrator (shard scope): { role: "userAdminAnyDatabase", db: "admin" }
Create a local cluster administrator (shard scope): { "role" : "clusterAdmin", "db" : "admin" }
Or be greedy with a custom "role": [ { "resource" : { "anyResource" : true }, "actions" : [ "anyAction" ] } ]
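A minimal sketch of creating these two users via the localhost exception (user names and passwords are placeholders):
use admin
db.createUser({
  user: "useradmin",                                       // placeholder name
  pwd: "<secret>",
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})
db.auth("useradmin", "<secret>")                           // authenticate before adding more users
db.createUser({
  user: "clusteradmin",                                    // placeholder name
  pwd: "<secret>",
  roles: [ { role: "clusterAdmin", db: "admin" } ]
})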
Deploying - Mongos
Deploy mongos:
- For production environments use more than one mongos
- For test environments a single mongos is fine
- Start three (or more) mongos processes with the same keyfile
Minimum configuration for mongos
mongos --keyFile <path-to-keyfile> --config <path-to-config>
net:
  port: '50001'
processManagement:
  fork: true
security:
  keyFile: <path-to-keyfile>
sharding:
  configDB: <configReplSetName>/<host1:port1>,<host2:port2>,<host3:port3>
systemLog:
  destination: syslog
Deploying - Mongos
Login on one of the mongos using the localhost exception.
- Create a user administrator (cluster scope): { role: "userAdminAnyDatabase", db: "admin" }
- Create a cluster administrator (cluster scope): { "role" : "clusterAdmin", "db" : "admin" }
Or be greedy with a custom "role": [ { "resource" : { "anyResource" : true }, "actions" : [ "anyAction" ] } ]
What about config server user creation?
- All users created against the mongos are saved on the config server’s admin database
- The same users may be used to login directly on the config servers
- In general (with few exceptions), config database should only be accessed through the mongos
Deploying - Sharded Cluster
Login on one of the mongos using the cluster administrator
- sh.status() prints the status of the cluster
- At this point shards: should be empty
- Check connectivity to your shards
Add a shard to the sharded cluster:
- sh.addShard("<replSetName>/<host:port>")
- You don’t have to define all replica set members
- sh.status() should now display the newly added shard
- Hidden replica-set members do not appear in the sh.status() output
You are now ready to add databases and shard collections!!!
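For example, a minimal sketch that shards a collection on a ranged key (database, collection, and key are placeholders):
sh.enableSharding("mydb")                        // enable sharding on the database
sh.shardCollection("mydb.mycoll", { uuid: 1 })   // shard the collection on { uuid: 1 }
sh.status()                                      // verify the collection appears under the database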
Deploying - Recommended Setup
- Use Replica Set for shards
- Use at least three data nodes for the Replica Sets
- Use three config servers; prefer the CSRS topology (SCCC applies only to versions before 3.4)
- Use more than one mongos
- Use DNS names instead of IPs
- Use a consistent network topology
- Make sure you have redundant NTP servers
- Always use authentication
- Always use authorization and give only the necessary privileges to users
Upgrading - Categories
Upgrade minor versions
- Without changing config servers topology
- For example: 3.2.15 to 3.2.16 or 3.4.6 to 3.4.7
- Change config servers topology from SCCC to CSRS
- For example: 3.2.15 to 3.2.16
Upgrade major versions
- Without changing config servers topology
- For example: 3.0.14 to 3.2.12
- Change config servers topology from SCCC to CSRS
- For example: 3.2.12 to 3.4.2
Upgrading – Best practices
Best Practices (not mandatory)
- Upgrades heavily involve binary swaps
- Keep all MongoDB releases under /opt
- Create a symbolic link /opt/mongodb pointing to your desired version
- The binary swap can then be implemented by changing the symbolic link
> ll
lrwxrwxrwx. 1 mongod mongod 34 Mar 24 14:16 mongodb -> mongodb-linux-x86_64-rhel70-3.4.1/
drwxr-xr-x. 3 mongod mongod 4096 Mar 24 12:06 mongodb-linux-x86_64-rhel70-3.4.1
drwxr-xr-x. 3 mongod mongod 4096 Mar 21 14:12 mongodb-linux-x86_64-rhel70-3.4.2
> unlink mongodb; ln -s mongodb-linux-x86_64-rhel70-3.4.2/ mongodb
> ll
lrwxrwxrwx. 1 root root 34 Mar 24 14:19 mongodb -> mongodb-linux-x86_64-rhel70-3.4.2/
drwxr-xr-x. 3 root root 4096 Mar 24 12:06 mongodb-linux-x86_64-rhel70-3.4.1
drwxr-xr-x. 3 root root 4096 Mar 21 14:12 mongodb-linux-x86_64-rhel70-3.4.2
> echo 'pathmunge /opt/mongodb/bin' > /etc/profile.d/mongo.sh; chmod +x /etc/profile.d/mongo.sh
Upgrading - Checklist
- Configuration Options changes: For example, sharding.chunkSize configuration file setting and --chunkSize
command-line option removed from the mongos
- Deprecated Operations: For example, Aggregate command without cursor deprecated in 3.4
- Topology Changes: For example, Removal of Support for SCCC Config Servers
- Connectivity changes: For example, Version 3.4 mongos instances cannot connect to earlier versions of
mongod instances
- Tool removals: For example, In MongoDB 3.4, mongosniff is replaced by mongoreplay
- Authentication and User management changes: For example, dbAdminAnyDatabase can't be applied to
local and config databases
- Driver Compatibility Changes: Please refer to the compatibility matrix for your driver version and language
version.
Upgrading – Minor Versions
Upgrade minor versions - Without changing config servers topology
Upgrade 3.2.x to 3.2.y or 3.4.x to 3.4.y
1) Stop the balancer with sh.stopBalancer() and check that it is not running
2) Upgrade the shards. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start
3) Upgrade the shards. Perform a stepdown and upgrade the ex-Primaries by stop;replace binaries;start
4) Upgrade the config servers. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start
5) Upgrade the config servers. Perform a stepdown and upgrade the ex-Primary by stop;replace binaries;start
6) Upgrade the mongos in a rolling fashion by stop;replace binaries;start
Rollback 3.2.y to 3.2.x or 3.4.y to 3.4.x
1) Stop the balancer with sh.stopBalancer() and check that it is not running
2) Downgrade the mongos in a rolling fashion by stop;replace binaries;start
3) Downgrade the config servers. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start
4) Downgrade the config servers. Perform a stepdown and downgrade the ex-Primary by stop;replace binaries;start
5) Downgrade the shards. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start
6) Downgrade the shards. Perform a stepdown and downgrade the ex-Primaries by stop;replace binaries;start
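The balancer handling in step 1 can be done from a mongos; a minimal sketch:
sh.stopBalancer()                                // disable balancing rounds
sh.getBalancerState()                            // should now return false
while (sh.isBalancerRunning()) { sleep(1000) }   // wait for any in-flight round to finish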
Minor Versions Upgrade (3.4.x to 3.4.y)
[Diagram] Stop the balancer → 1. Upgrade shards (s1, s2) → 2. Upgrade config servers → 3. Upgrade mongos → Start the balancer
Minor Versions Downgrade (3.4.y to 3.4.x)
[Diagram] Stop the balancer → 1. Downgrade mongos → 2. Downgrade config servers → 3. Downgrade shards (s1, s2) → Start the balancer
Upgrading - Major Versions
Upgrade major versions - Without changing config servers topology
Upgrade 3.2.x to 3.4.y
1) Stop the balancer with sh.stopBalancer() and check that it is not running
2) Upgrade the config servers. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start
3) Upgrade the config servers. Perform a stepdown and upgrade the ex-Primary by stop;replace binaries;start
4) Upgrade the shards. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start
5) Upgrade the shards. Perform a stepdown and upgrade the ex-Primaries by stop;replace binaries;start
6) Upgrade the mongos in a rolling fashion by stop;replace binaries;start
Enable backwards-incompatible 3.4 features: db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )
It is recommended to wait for a short period of time before enabling the backwards-incompatible features.
Includes:
- Views
- Collation option on collection and indexes
- Data type decimal
- New version indexes (v:2 indexes)
Major Versions Upgrade (3.2.x to 3.4.y)
[Diagram] Stop the balancer → 1. Upgrade config servers → 2. Upgrade shards (s1, s2) → 3. Upgrade mongos → Start the balancer
Downgrade - Major Versions
Downgrade major versions - Without changing config servers topology
Rollback 3.4.y to 3.2.x
1) Stop the balancer with sh.stopBalancer() and check that it is not running
2) Downgrade the mongos in a rolling fashion by stop;replace binaries;start
3) Downgrade the shards. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start
4) Downgrade the shards. Perform a stepdown and downgrade the ex-Primaries by stop;replace binaries;start
5) Downgrade the config servers. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start
6) Downgrade the config servers. Perform a stepdown and downgrade the ex-Primary by stop;replace binaries;start
If backwards-incompatible 3.4 features are enabled before downgrade:
- db.adminCommand({setFeatureCompatibilityVersion: "3.2"})
- Remove Views
- Remove Collation Option from Collections and Indexes
- Convert Data of Type Decimal
- Downgrade Index Versions
Major Versions Downgrade (3.4.x to 3.2.y)
[Diagram] Stop the balancer → 1. Downgrade mongos → 2. Downgrade shards (s1, s2) → 3. Downgrade config servers → Start the balancer
Downgrading - Major Versions
Rollback 3.4.y to 3.2.x
Remove Views
Find views on a sharded cluster:
db.adminCommand("listDatabases").databases.forEach(function(d){
let mdb = db.getSiblingDB(d.name);
mdb.getCollectionInfos({type: "view"}).forEach(function(c){
print(mdb[c.name]);
});
});
In each database that contains views, drop the system.views collection to drop all views in that database.
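For example, assuming a database named mydb contains views:
db.getSiblingDB("mydb").system.views.drop()      // drops all views defined in mydb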
Downgrading - Major Versions
Rollback 3.4.y to 3.2.x
Remove Collation Option from Collections and Indexes
Find collections with non-simple collations
db.adminCommand("listDatabases").databases.forEach(function(d){
let mdb = db.getSiblingDB(d.name);
mdb.getCollectionInfos( { "options.collation": { $exists: true } } ).forEach(function(c){
print(mdb[c.name]);});});
Find indexes with collation specification
db.adminCommand("listDatabases").databases.forEach(function(d){
let mdb = db.getSiblingDB(d.name);
mdb.getCollectionInfos().forEach(function(c){
let currentCollection = mdb.getCollection(c.name);
currentCollection.getIndexes().forEach(function(i){
if (i.collation){
printjson(i);}});});});
Downgrading - Major Versions
Rollback 3.4.y to 3.2.x
Remove Collation Option from Collections and Indexes
Indexes: Drop and recreate the indexes
Collection: Rewrite the entire collection:
- Stream the collection using $out
- Stream the collection using dump/restore
- Change the definition on one of the secondaries
Downgrading - Major Versions
Rollback 3.4.y to 3.2.x
Convert Data of Type Decimal
Diagnostic utility: db.collection.validate(true)
Could be impactful as it performs a full scan.
"errmsg" : "Index version does not support NumberDecimal"
How you are going to convert the Type Decimal depends on your application.
- Change it to string
- Change to a two fields format (using a multiplier)
- Change to a two fields format (store each part of the decimal on different fields)
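A minimal sketch of the string approach, assuming a decimal field named price (the exact string handling depends on your shell version):
db.getSiblingDB("mydb").mycoll.find({ price: { $type: "decimal" } }).forEach(function(doc) {
  var s = doc.price.toString();                               // e.g. NumberDecimal("9.99")
  s = s.substring(s.indexOf('"') + 1, s.lastIndexOf('"'));    // keep only the numeric part
  db.getSiblingDB("mydb").mycoll.update({ _id: doc._id }, { $set: { price: s } });
});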
Downgrading - Major Versions
Rollback 3.4.y to 3.2.x
Find version 2 indexes
db.adminCommand("listDatabases").databases.forEach(function(d){
let mdb = db.getSiblingDB(d.name);
mdb.getCollectionInfos().forEach(function(c){
let currentCollection = mdb.getCollection(c.name);
currentCollection.getIndexes().forEach(function(i){
if (i.v === 2){
printjson(i);}});});});
Must be performed on both shards and config servers
Re-index the collections after changing setFeatureCompatibilityVersion to 3.2
- The _id index builds in the foreground
- Must be executed on both Primaries and Secondaries
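One way to rebuild a collection's indexes as v:1, assuming featureCompatibilityVersion has already been set back to "3.2" (namespace is a placeholder):
db.getSiblingDB("mydb").mycoll.reIndex()   // drops and recreates all indexes, including _id, in the foreground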
Upgrading - Major Versions
Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y
1) Use MongoDB version 3.2.4 or higher
2) Disable the balancer
3) Connect a mongo shell to the first config server listed in the configDB setting of the mongos and run rs.initiate()
rs.initiate( {
_id: "csReplSet",
configsvr: true,
version: 1,
members: [ { _id: 0, host: "<host>:<port>" } ]
} )
4) Restart this config server as a single-member replica set:
mongod --configsvr --replSet csReplSet --configsvrMode=sccc --storageEngine <storageEngine> --port <port> --dbpath <path>
or with the equivalent config file settings
Upgrading - Major Versions
Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y
5) Start the new mongod instances to add to the replica set:
- must use the WiredTiger storage engine
- Do not add existing config servers to the replica set
- Use new dbpaths for the new instances
- If the config server is using MMAPv1, start 3 new mongod instances
- If the config server is using WiredTiger, start 2 new mongod instances
6) Connect to the replica set config server and add the new mongod instances as non-voting, priority 0 members:
- rs.add( { host: <host:port>, priority: 0, votes: 0 } )
- Wait for the initial sync to complete (SECONDARY state)
7) Shut down one of the other non-replica set config servers (2nd or 3rd)
8) Reconfigure the replica set to allow all members to vote and have default priority of 1
Upgrading - Major Versions
Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y
9) Step down the first config server and restart without the sccc flag
10) Restart mongos instances with updated --configdb or sharding.configDB setting
11) Verify that the restarted mongos instances are aware of the protocol change
12) Cleanup:
- Remove the first config server using rs.remove()
- Shutdown 2nd and 3rd config servers
Next steps are similar to slide Upgrade major versions - Without changing config servers topology
Change from SCCC to CSRS
1) rs.initiate
2) Restart with --replSet
3) Start 3 new cfgsrv
4) Add new cfgsrv to RS
5) Shutdown a cfgsrv
6) rs.reconfig
7) rs.stepdown and restart
8) Restart mongos
9) Cleanup
Downgrading - Major Versions
Downgrade major versions - Change the config servers topology (3.2.x to 3.0.y)
1) Disable the balancer
2) Remove or replace incompatible indexes:
- Partial indexes
- Text indexes version 3
- Geo Indexes version 3
3) Check the minOpTimeUpdaters value on every shard
- Must be zero
- If it is non-zero without any active migration ongoing, a stepdown is needed
4) Keep only two config server secondaries and set their votes and priority equal to zero, using rs.reconfig()
5) Stepdown config server primary
db.adminCommand( { replSetStepDown: 360, secondaryCatchUpPeriodSecs: 300 })
6) Stop the world - shut-down all mongos/shards/config servers
Downgrading - Major Versions
Downgrade major versions - Change the config servers topology(3.2.x to 3.0.y)
7) Restart each config server as standalone
8) Start all shards and change the protocolVersion=0
9) Downgrade the mongos to 3.0 and change the configsrv to SCCC
10) Downgrade the config servers to 3.0
11) Downgrade each shard - one at a time
- Remove the minOpTimeRecovery document from the admin.system.version
- Downgrade the secondaries and then issue a stepdown
- Downgrade former primaries
Change from CSRS to SCCC
1) Remove minOpTimeUpdaters
2) rs.reconfig() set priority=0
3) rs.stepdown()
4) Stop the world
5) Start cfgsrv as standalone
6) Start all shards
7) Start the mongos with SCCC
Lab 01
1. Unzip the provided .vdi file
2. Install and/or open VirtualBox
3. Select New
4. Enter a name
5. Select Type: Linux
6. Select Version: Red Hat (64-bit)
7. Set Memory to 4096 (if possible)
8. Select "Use an existing ... disk file", select the provided .vdi file.
9. Select Create
10. Select Start
11. CentOS will boot and auto-login
12. Navigate to /Percona2017/Lab01/exercises for the first lab.
https://goo.gl/ddaVdS
Setup A Sharded Cluster
Shard Key
● What is it?
● Limitations
● Chunks
● Metadata
○ Change Log
○ Collections
○ Chunks
Shard Key
This is how MongoDB distributes and finds documents for a sharded collection.
Considerations of a good shard key:
• Fields exist on every document
• Indexed field or fields for fast lookup
• High cardinality for good distribution of data across the cluster
• Random to allow for increased throughput
• Usable by the application for targeted operations
Shard Key Limitations
There are requirements when defining a shard key that can affect both the cluster and application.
• Key is immutable
• Value is also immutable
• For collections containing data an index must be present
• Prefix of a compound index is usable
• Ascending order is required
• Update and findAndModify operations must contain shard key
• Unique constraints must be maintained by shard key or prefix of shard key
• Hashed indexes can not enforce uniqueness, therefore are not allowed
• A non-hashed indexed can be added with the unique option
• A shard key cannot contain special index types (i.e. text)
Shard Key Limitations
A shard key value must not exceed 512 bytes
The following script will reveal documents with long shard keys:
db.<collection>.find({},{<shard_key>:1}).forEach(function(shardkey){
size = Object.bsonsize(shardkey) ;
if (size>512){print(shardkey._id)}
})
MongoDB will allow you to shard the collection even if existing documents have shard keys over 512 bytes, but on
the next insert with a shard key > 512 bytes you will get:
"code" : 13334, "errmsg" : "shard keys must be less than 512 bytes, but key <shard key> is ... bytes"
Shard Key Limitations
Shard Key Index Type
A shard key index can be an ascending index on the shard key, a compound index that starts with
the shard key and specifies ascending order for the shard key, or a hashed index.
A shard key index cannot be an index that specifies a multi-key index, a text index or a geospatial
index on the shard key fields.
If you try to shard with a -1 index you will get an error:
“Unsupported shard key pattern. Pattern must either be a single hashed field, or a list of ascending fields”
If you try to shard with “text”, “multi-key” or “geo” you will get an error:
“couldn't find valid index for shard key”
Shard Key Limitations
Unique Indexes
Sharded collections may support up to one unique index. The shard key MUST be a prefix of the
unique index
If you attempt to shard a collection with more than one unique index, or using a field different from
the unique index, an error will be produced:
"Can't shard collection <col name> with unique index on <field1:1> and proposed shard key <field2:2>.
Uniqueness can't be maintained unless shard key is a prefix"
To maintain more than one unique index, use a proxy collection (see the sketch below).
A client-generated _id is unique by design; if you are using a custom _id you must preserve uniqueness
at the application tier.
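A minimal sketch of the proxy-collection idea, assuming a second unique constraint is needed on an email field (collection and field names are hypothetical):
var id = ObjectId();
var res = db.users_email.insert({ _id: "jdoe@example.com", docId: id });  // _id is unique by design
if (res.nInserted === 1) {
  db.users.insert({ _id: id, email: "jdoe@example.com" });                // safe: email proven unique
}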
Shard key Limitations
Field(s) must exist on every document
If you try to shard a collection with null on shard key an exception will be produced:
"found missing value in key { : null } for doc: { _id: <value>}"
On compound shard keys, none of the fields is allowed to have null values.
A handy script to identify NULL values is the following. You need to execute it for each of the shard
key fields:
db.<collection_name>.find({<shard_key_element>:{$exists:false}})
A potential solution is to replace NULL with a dummy value that your application will read as NULL.
Be careful because “dummy NULL” might create a hotspot.
Shard key Limitations
Updates and findAndModify must use the shard key
Updates and findAndModify must use the shard key in their query predicates.
If an update or FAM is executed on a field other than the shard key or _id, the following error is
produced:
"A single update on a sharded collection must contain an exact match on _id (and have the collection default collation) or
contain the shard key (and have the simple collation). Update request: { q: { <field1> }, u: { $set: { <field2> } }, multi: false,
upsert: false }, shard key pattern: { <shard_key>: 1 }"
For update operations the workaround is to use the {multi:true} flag (see the sketch below).
For findAndModify the {multi:true} flag doesn't exist.
For upserts, _id instead of the <shard key> is not applicable.
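For example, an update whose predicate lacks the shard key can be rewritten with {multi:true} (field names are placeholders):
db.mycoll.update(
  { status: "pending" },            // predicate without the shard key
  { $set: { status: "done" } },
  { multi: true }                   // broadcast update, allowed without the shard key
)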
Shard key Limitations
Sharding Existing Collection Data Size
You can't shard collections whose size violates the maxCollectionSize, as defined below:
maxSplits = 16777216 (bytes) / <average size of shard key values in bytes>
maxCollectionSize (MB) = maxSplits * (chunkSize / 2)
Maximum Number of Documents Per Chunk to Migrate
MongoDB cannot move a chunk if the number of documents in the chunk exceeds:
• 250000 documents or 1.3 times the result of dividing the configured chunk size by the
avgObjSize
For example, with an avgObjSize of 512 bytes and chunk size of 64MB a chunk is considered Jumbo
with 170394 documents.
Shard key Limitations
Operation not allowed on a sharded collection
• group collection method Deprecated since version 3.4 - Use mapReduce or
aggregate instead
• db.eval() Deprecated since version 3.0
• $where does not permit references to the db object from the $where function. This
is uncommon in un-sharded collections.
• $isolated update modifier
• $snapshot queries
• The geoSearch command
Chunks
The purpose of the shard key is to create chunks: the logical partitions your collection is divided
into, which also determine how data is distributed across the cluster.
• Maximum size is defined in config.settings
• Default 64MB
• Hardcoded maximum document count of 250,000
• Chunk map is stored in config.chunks
• Continuous range from MinKey to MaxKey
• Chunk map is cached at both the mongos and mongod
• Query Routing
• Sharding Filter
• Chunks distributed by the Balancer
• Using moveChunk
• Up to maxSize
Chunk
{
  "_id" : "mydb.mycoll-uuid_\"00005cf6-1217-4414-935b-cf1bde09cc77\"",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : {
    "uuid" : "00005cf6-1217-4414-935b-cf1bde09cc77"
  },
  "max" : {
    "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e"
  },
  "shard" : "s1"
}
config.chunks
Additional Sharding Metadata
Collections
• Historical Sharding and Balancing Activity
• config.changelog
• config.actionlog
• Additional Balancing Settings
• config.settings
• All databases sharded (and non-sharded)
• config.databases
• Balancer Activity
• config.locks
• Shards
• config.shards
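A few illustrative queries against this metadata, run through a mongos:
db.getSiblingDB("config").chunks.aggregate([{ $group: { _id: "$shard", chunks: { $sum: 1 } } }])  // chunks per shard
db.getSiblingDB("config").changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(5)       // recent migrations
db.getSiblingDB("config").databases.find({}, { _id: 1, primary: 1, partitioned: 1 })              // primary shard per database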
Shard Key Selection
● Profiling
● Cardinality
● Throughput
○ Targeted Operations
○ Broadcast Operations
● Anti-Patterns
● Data Distribution
Shard key selection
Famous quotes about sharding:
“Sharding is an art”
“There is no such thing as the perfect shard key”
“Friends don't let friends use sharding”
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s)
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s) (current step: Profiling)
Shard key selection–Profiling
Profiling will help you identify your workload.
Enable statement profiling on level 2 (collects profiling data for all database operations)
db.getSiblingDB(<database>).setProfilingLevel(2)
To collect a representative sample you might have to increase profiler size
db.getSiblingDB(<database>).setProfilingLevel(0)
db.getSiblingDB(<database>).system.profile.drop()
db.getSiblingDB(<database>).createCollection( "system.profile", { capped: true, size: <size>} )
db.getSiblingDB(<database>).setProfilingLevel(2)
*size is in bytes
Shard key selection–Profiling
If you don’t have enough space you may either:
- Periodically dump the profiler and restore to a different instance
- Use a tailable cursor and save the output on a different instance
Profiler adds overhead to the instance due to extra inserts
When analyzing the output it is recommended to:
- Dump the profiler to another instance using a non-capped collection
- Keep only the useful dataset
- Create the necessary indexes
Shard key selection–Profiling
{op:"query" , ns:<db.col>} , reveals find operations
Example: { "op" : "query", "ns" : "foo.foo", "query" : { "find" : "foo", "filter" : { "x" : 1 } ...
{op:"insert" , ns:<db.col>} , reveals insert operations
Example: { "op" : "insert", "ns" : "foo.foo", "query" : { "insert" : "foo", "documents" : [ { "_id" :
ObjectId("58dce9730fe5025baa0e7dcd"), "x" : 1} ] ...
{op:"remove" , ns:<db.col>} , reveals remove operations
Example: {"op" : "remove", "ns" : "foo.foo", "query" : { "x" : 1} ...
{op:"update" , ns:<db.col>} , reveals update operations
Example: { "op" : "update", "ns" : "foo.foo", "query" : { "x" : 1 }, "updateobj" : { "$set" : { "y" : 2 } } ...
Shard key selection–Profiling
{ "op" : "command", "ns" : <db.col>, "command.findAndModify" : <col>} reveals FAMs
Example: { "op" : "command", "ns" : "foo.foo", "command" : { "findAndModify" : "foo", "query" : { "x" : "1" },
"sort" : { "y" : 1 }, "update" : { "$inc" : { "z" : 1 } } }, "updateobj" : { "$inc" : { "z" : 1 } } ...
{"op" : "command", "ns" : <db.col>, "command.aggregate" : <col>}: reveals aggregations
Example: { "op" : "command", "ns" : "foo.foo", "command" : { "aggregate" : "foo", "pipeline" : [ { "$match" : {
"x" : 1} } ] ...
{ "op" : "command", "ns" : <db.col>, "command.count" : <col>} : reveals counts
Example: { "op" : "command", "ns" : "foo.foo", "command" : { "count" : "foo", "query" : { "x" : 1} } ...
More commands (mapReduce, distinct, geoNear) follow the same pattern
Be aware that the profiler format may differ between major versions
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s) (current step: Identify Patterns)
Shard key selection–Patterns
Identify the workload nature (type of statements)
db.system.profile.aggregate([{$match:{ns:<db.col>}},{$group: {_id:"$op",number : {$sum:1}}},{$sort:{number:-1}}])
var cmdArray = ["aggregate", "count", "distinct", "group", "mapReduce", "geoNear", "geoSearch", "find", "insert", "update", "delete",
"findAndModify", "getMore", "eval"];
cmdArray.forEach(function(cmd) {
var c = "<col>";
var y = "command." + cmd;
var z = '{"' + y + '": "' + c + '"}';
var obj = JSON.parse(z);
var x = db.system.profile.find(obj).count();
if (x>0) {
printjson(obj);
print("Number of occurrences: "+x);}
});
Shard key selection–Patterns
Identify statement patterns
var tSummary = {}
db.system.profile.find( { op: "query", ns: { $in: ['<ns>'] } }, { ns: 1, "query.filter": 1 } ).forEach(
  function(doc) {
    tKeys = [];
    if (doc.query.filter === undefined) {
      // older profiler formats keep the predicate directly under doc.query
      for (key in doc.query) {
        tKeys.push(key)
      }
    } else {
      for (key in doc.query.filter) {
        tKeys.push(key)
      }
    }
    sKeys = tKeys.join(',')
    if (tSummary[sKeys] === undefined) {
      tSummary[sKeys] = 1
      print("Found new pattern of : " + sKeys)
    } else {
      tSummary[sKeys] += 1
      print("Incremented " + sKeys)
    }
    print(tSummary[sKeys])
  }
)
Shard key selection–Patterns
At this stage you will be able to create a report similar to:
Collection <Collection Name> - Profiling period <Start time> , <End time> - Total statements: <num>
Number of Inserts: <num>
Number of Queries: <num>
Query Patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num>
Number of Updates: <num>
Update patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num>
Number of Removes: <num>
Remove patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num>
Number of FindAndModify: <num>
FindandModify patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num>
Shard key selection–Patterns
Using the statement’s patterns you may identify shard key candidates
Shard key candidates may be a single field ({_id:1}) or a group of fields ({name:1, age:1}).
Note down the amount and percentage of operations that each shard key candidate can “scale”
Note down the important operations on your cluster
Order the shard key candidates by descending <importance>,<scale%>
Identify exceptions and constraints as described on shard key limitations
If a shard key candidate doesn’t satisfy a constraint you may either:
- Remove it from the list, OR
- Make the necessary changes on your schema
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s) (current step: Constraints/Best practices)
Shard key selection- Constraints
Shard key must not have NULL values:
db.<collection>.find({<shard_key>:{$exists:false}}) , in the case of a compound key each element must
be checked for NULL
Shard key is immutable:
Check if collected updates or FindAndModify have any of the shard key components on “updateobj”:
db.system.profile.find({"ns" : <col>,"op" : "update" },{"updateobj":1}).forEach(function(op) { if
(op.updateobj.$set.<shard_key> != undefined ) printjson(op.updateobj);})
db.system.profile.find({"ns" : <col>,"op" : "update" },{"updateobj":1}).forEach(function(op) { if
(op.updateobj.<shard_key> != undefined ) printjson(op.updateobj);})
Assuming you are running a replica set, the oplog may also prove useful
Shard key selection- Constraints
Updates must use the shard key or _id on the query predicates:
We can identify the query predicates using the exported patterns
Shard key must have good cardinality
db.<col>.distinct(<shard_key>).length OR
db.<col>.aggregate([{$group: { _id:"$<shard_key>", n: {$sum:1}}},{$count:"n"}])
We define cardinality as <distinct values> / <total docs>
The closer to 1 the better; for example, a unique constraint has a cardinality of 1
Cardinality is important for splits. Fields with cardinality close to 0 may create indivisible jumbo chunks
Shard key selection- Constraints
Check for data hotspots
db.<col>.aggregate([{$group: {_id: "$<shard_key>", number: {$sum: 1}}}, {$sort: {number: -1}}, {$limit: 100}], {allowDiskUse: true})
We don't want a single range to exceed the 250K document limit
We have to try and predict the future based on the above values
Check for operational hotspots
db.system.profile.aggregate([{$group: {_id: "$query.filter.<shard_key>", number: {$sum: 1}}}, {$sort: {number: -1}}, {$limit: 100}], {allowDiskUse: true})
We want as much uniformity as possible. Hotspotted ranges may affect scaling
Shard key selection- Constraints
Check for monotonically increasing fields:
Monotonically increasing fields may affect scaling, as all new inserts are directed to the maxKey chunk
Typical examples: _id, timestamp fields, client-side generated auto-increment numbers
db.<col>.find({},{<shard_key>:1}).sort({$natural:1})
Workarounds: a hashed shard key, application-level hashing, or a compound shard key (see the sketch below)
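For example, a hashed shard key spreads monotonic inserts across chunks (namespace is a placeholder):
sh.shardCollection("mydb.events", { _id: "hashed" })   // inserts no longer pile up on the maxKey chunk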
Shard key selection- Constraints
Throughput:
Targeted operations
- Operations that use the shard key and will access only one shard
- With targeted operations, adding shards increases throughput linearly (or almost linearly)
- Ideal type of operation on a sharded cluster
Scatter-gather operations
- Operations that aren't using the shard key, or use only a part of the shard key
- Will access more than one shard - sometimes all shards
- Adding shards won’t increase throughput linearly
- Mongos opens a connection to all shards / higher connection count
- Mongos fetches a larger amount of data on each iteration
- A merge phase is required
- Unpredictable statement execution behavior
- Queries, Aggregations, Updates, Removes (Inserts and FAMs are always targeted)
Shard key selection-Constraints
Throughput:
Scatter-gather operations are not always “evil’’
- Sacrifice reads to scale writes on a write intensive workload
- Certain read workloads like Geo or Text searches are scatter-gather anyway
- Maybe is faster to divide an operation into N pieces and execute them in parallel
- Smaller indexes & dataset
- Other datastores like Elastic are using scatter-gather for queries
Shard key selection-Constraints
Data distribution
Factors that may cause an imbalance:
- Poor cardinality
- Good cardinality with data hotspots
- TTL indexes
- Data pruning
- Orphans
Always plan ahead
- Try to simulate the collection dataset at 3, 6, and 12 months
- Consider your deletion strategy before capacity becomes an issue
Shard key selection-Anti-Patterns
- Monotonically increased fields
- _id or timestamps
- Use hashed or compound keys instead
- Poor cardinality field(s)
- country or city
- Eventually leads to jumbos
- Use Compound keys instead
- Operational hotspots
- client_id or user_id
- Affect scaling
- Use Compound keys instead
- Data hotspots
- client_id or user_id
- Affects data distribution and may lead to jumbo chunks
- Use compound keys instead
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s) (current step: Code changes)
Shard key selection-Code Changes
Remove unique constraints
- Sharded collections can only support one unique index
- Use a satellite collection for multiple unique indexes
- Change your updates to use the unique index rather than the “_id”
- Careful with custom _id values
Change statements
- Make sure FAMs are using the shard key
- Make sure updates are using the shard key or {multi:true}
- Remove geoSearch, eval(), $isolated, $snapshot, group and $where
Schema changes
- Replace null in the shard key with "dummy" values
- Implement insert+delete functionality for changing shard key values
Shard key selection
[Workflow] Profiling → Identify Patterns → Constraints/Best practices → Code changes → Apply shard key(s) (current step: Apply shard keys)
Lab 02
Lab 02. Navigate to /Percona2017/Lab02/exercises
https://goo.gl/ddaVdS
Sharding Analysis
Balancing & Chunk Management
● Sizing and Limits
● moveChunk
● Pre-Splitting
● Splitting
● Merging
Sizing and Limits
Under normal circumstances the default size is sufficient for most workloads.
• Maximum size is defined in config.settings
• Default 64MB
• Hardcoded maximum document count of 250,000
• Chunks that exceed either limit are referred to as Jumbo
• Can not be moved unless split into smaller chunks
• Ideal data size per chunk is (chunk_size/2)
• Chunks can have a zero size with zero documents
• Jumbo’s can grow unbounded
• Unique shard key guarantees no Jumbos (i.e. max document size is 16MB) post
splitting
dataSize
Estimated Size and Count (Recommended)
db.adminCommand({
dataSize: "mydb.mycoll",
keyPattern: { "uuid" : 1 },
min: { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
max: { "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982" }, estimate=true });
Actual Size and Count
db.adminCommand({
dataSize: "mydb.mycoll",
keyPattern: { "uuid" : 1 },
min: { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
max: { "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982" } });
Start moveChunk (>=3.4)
[Diagram: moveChunk issued, migrating a chunk from s1 to s2]
Build Indexes
[Diagram: createIndex running on the destination shard s2]
The destination shard builds all non-existing indexes for the collection.
Clone Collection
[Diagram: insert() on the destination shard s2]
The destination shard requests documents from the source and inserts them into the collection.
Sync Collection
[Diagram: s1 → s2, synchronize _id's]
Operations executed on source shard after the start of the migration are logged for synchronizing.
Commit
Finalize the chunk migration by committing metadata changes.
Clean Up - RangeDeleter
This process is responsible for removing the chunk ranges that were moved by
the balancer (i.e. moveChunk).
● This thread only runs on the primary
● Is not persistent in the event of an election
● Can be blocked by open cursors
● Can have multiple ranges queued to delete (_waitForDelete: false)
● If a queue is present, the shard can not be the destination for a new chunk
Pre-splitting (Hashed)
Shard keys using MongoDB’s hashed index allow the use of numInitialChunks.
Hashing Mechanism
Value: jdoe@example.com → 64 bits of MD5: 694ea0904ceaf766c6738166ed89bafb → 64-bit Integer: NumberLong("7588178963792066406")
Estimation
Size = Collection size (in MB)/32, e.g. 1,600 = 51,200/32
Count = Number of documents/125,000, e.g. 800 = 100,000,000/125,000
Limit = Number of shards*8192, e.g. 32,768 = 4*8192
numInitialChunks = Min(Max(Size, Count), Limit), e.g. 1,600 = Min(Max(1600, 800), 32768)
Command
db.runCommand( { shardCollection: "mydb.mycoll", key: { "uid": "hashed" }, numInitialChunks : 5510 } );
Pre-splitting - splitAt()
When the shard key is not hashed but throughput requires pre-splitting, use sh.splitAt().
1. In a testing environment shard the collection to generate split points.
a) Alternative: Calculate split points manually by analyzing the shard key field(s)
db.runCommand( { shardCollection: "mydb.mycoll", key: { "u_id": 1, "c": 1 } } )
db.getSiblingDB("config").chunks.find({ns: "mydb.mycoll"}).sort({min: 1}).forEach(function(doc) {
  print("sh.splitAt('" + doc.ns + "', { \"u_id\": ObjectId(\"" + doc.min.u_id.str + "\"), \"c\": ISODate(\"" + doc.min.c.toISOString() + "\") });");
});
splitAt() - cont.
2. Identify enough split points to create N chunks which is at least 2x the number of shards and build a
JavaScript file containing your split points
3. Shard the collection and then execute the splitAt() commands to create chunks
db.runCommand( { shardCollection: "mydb.mycoll", key: { "u_id": 1, "c": 1 } } )
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("581939bd1b0e4d73dd3d72db"), "c": ISODate("2017-01-31T12:02:45.532Z") });
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("5800ed895c5a687c99c2f2ae"), "c": ISODate("2017-01-31T11:02:46.522Z") });
….
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("5823859e5c5a6829cc3b09d0"), "c": ISODate("2017-01-31T13:22:48.522Z") });
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("582f53315c5a68669f569c21"), "c": ISODate("2017-01-31T13:12:49.532Z") });
….
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("581939bd1b0e4d73dd3d72db"), "c": ISODate("2017-01-31T17:32:41.552Z") });
sh.splitAt('mydb.mycoll', { "u_id": ObjectId("58402d255c5a68436b40602e"), "c": ISODate("2017-01-31T19:32:42.562Z") });
splitAt() - cont.
4. Using moveChunk distribute the chunks evenly.
Optional: Now that the chunks are distributed evenly run more splits to prevent immediate splitting.
db.runCommand({
"moveChunk": "mydb.mycoll",
"bounds": [{
"u_id": ObjectId("52375761e697aecddc000026"), "c": ISODate("2017-01-31T00:00:00Z")
}, {
"u_id": ObjectId("533c83f6a25cf7b59900005a"), "c": ISODate("2017-01-31T00:00:00Z")
}
],
"to": "rs1",
"_secondaryThrottle": true
});
Shard Management
● Adding Shards
● Removing Shards
● Replacing Shards
○ Virtual
○ Physical
Shard Management-Operations
Add shard(s):
- Increase capacity
- Increase throughput or Re-plan
Remove shard(s):
- Reduce capacity
- Re-plan
Replace shard(s):
- Change Hardware specs
- Hardware maintenance
- Move to different underlying platform
Shard Mgmt – Add shard(s)
Command for adding shards: sh.addShard(host)
Host is either a standalone or replica set instance
Alternatively db.getSiblingDB('admin').runCommand( { addShard: host} )
Optional parameters with runCommand:
maxSize (int): The maximum size in megabytes of the shard
name (string): A unique name for the shard
Localhost addresses and hidden members can't be used in the host parameter
Shard Mgmt-Remove shard(s)
Command for removing shards:
db.getSiblingDB('admin').runCommand( { removeShard: host } )
The host's data (sharded and unsharded) MUST be migrated to other shards in the cluster
1) Ensure that the balancer is running
2) Execute db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } )
MongoDB will print:
"msg" : "draining started successfully",
"state" : "started",
"shard" : "<shard>",
"note" : "you need to drop or movePrimary these databases",
"dbsToMove" : [ ],
"ok" : 1
Shard Mgmt-Remove shard(s)
3) The balancer will now start moving chunks from <shard_name> to all other shards
4) Check the status using db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ),
MongoDB will print:
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : <num_of_chunks_remaining>,
"dbs" : 1
},
"ok" : 1
}
Shard Mgmt-Remove shard(s)
5) Run db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ) one last time
MongoDB will print:
{
"msg" : "removeshard completed successfully",
"state" : "completed",
"shard" : "<shard_name>",
"ok" : 1
}
If the shard is the primary shard for one or more databases it may host unsharded
collections
Removal will only complete once the unsharded data is moved to a different shard
Shard Mgmt-Remove shard(s)
6) Check the status using db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ),
MongoDB will print:
{
"msg" : "draining ongoing",
"state" : "ongoing",
"remaining" : {
"chunks" : NumberLong(0),
"dbs" : NumberLong(1) },
"note" : "you need to drop or movePrimary these databases",
"dbsToMove" : ["<database name>"],
"ok" : 1
}
Shard Mgmt-Remove shard(s)
7) Use the following command to movePrimary:
db.getSiblingDB('admin').runCommand( { movePrimary: <db name>, to: "<shard name>" })
MongoDB will print: {"primary" : "<shard_name>:<host>","ok" : 1}
8) After you move all databases you will be able to remove the shard:
db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } )
MongoDB will print:
{"msg" : "removeshard completed successfully",
"state" : "completed",
"shard" : "<shard_name>",
"ok" : 1}
Shard Mgmt-Remove shard(s)
movePrimary:
- Requires write downtime for the unsharded collections
- Without write downtime you may encounter data loss
- May impact performance (massive writes & foreground index builds)
Write downtime options:
- Application layer
- Database layer:
  - Stop all mongos
  - Drop database users (restart the mongos)
flushRouterConfig is mandatory after movePrimary: db.adminCommand({flushRouterConfig: 1})
Shard Mgmt-Remove shard(s)
Calculate the average chunk move time:
db.getSiblingDB("config").changelog.aggregate([{$match:{"what" : "moveChunk.from"}},
{$project:{"time1":"$details.step 1 of 7", "time2":"$details.step 2 of 7","time3":"$details.step 3 of 7","time4":"$details.step
4 of 7","time5":"$details.step 5 of 7","time6":"$details.step 6 of 7","time7":"$details.step 7 of 7", ns:1}},
{$group:{_id: "$ns","avgtime": { $avg: {$sum:["$time",
"$time1","$time2","$time3","$time4","$time5","$time6","$time7"]}}}},
{$sort:{avgtime:-1}},
{ $project:{collection:"$_id", "avgt":{$divide:["$avgtime",1000]}}}])
Calculate the number of chunks to be moved:
db.getSiblingDB("config").chunks.aggregate({$match:{"shard" : <shard>}},{$group: {_id:"$ns",number : {$sum:1}}})
Shard Mgmt-Remove shard(s)
Calculate non-sharded collection size:
function FindCostToMovePrimary(shard){
moveCostMB = 0; DBmoveCostMB = 0;
db.getSiblingDB('config').databases.find({primary:shard,}).forEach(function(d){
db.getSiblingDB(d._id).getCollectionNames().forEach(function(c){
if ( db.getSiblingDB('config').collections.find({_id : d._id+"."+c, key: {$exists : true} }).count() < 1){
x=db.getSiblingDB(d._id).getCollection(c).stats();
collectionSize = Math.round((x.size+x.totalIndexSize)/1024/1024*100)/100;
moveCostMB += collectionSize;
DBmoveCostMB += collectionSize;
}
else if (! /system/.test(c)) { }
})
print(d._id);
print("Cost to move database :t"+ DBmoveCostMB+"M");
DBmoveCostMB = 0;
});
print("Cost to move:t"+ moveCostMB+"M");};
Shard Mgmt-Remove shard(s)
[Diagram] 1) {removeShard: 's2'} → 2) Drain chunks from s2 to s1 → 3) movePrimary for databases on s2 → 4) flushRouterConfig → 5) {removeShard: 's2'}
Shard Mgmt-Replace shard(s)
Drain Method using JavaScript
1) Stop the balancer
2) Add the target shard <target>
3) Create and execute chunk moves from <source> to <target>
db.chunks.find({shard:"<source>"}).forEach(function(chunk){print("db.adminCommand({moveChunk: '" + chunk.ns + "', bounds: [" + tojson(chunk.min) + ", " + tojson(chunk.max) + "], to: '<target>'})");})
mongo <host:port> drain.js | tail -n +1 | sed 's/{ "$maxKey" : 1 }/MaxKey/' | sed 's/{ "$minKey" : 1 }/MinKey/' > run_drain.js
4) Remove the <source> shard (movePrimary may be required)
Shard Mgmt-Replace shard(s)
Drain Method using the Balancer
1) Stop the balancer
2) Add the target shard <target>
3) Run the remove command for <source> shard
4) Optional: Set the maxSize=1 for all shards but <source> and <target>
5) Start the balancer - it will move chunks from <source> to <target>
6) Finalize the <source> removal (movePrimary may be required)
Shard Mgmt-Replace shard(s)
Replica-set extension
1) Stop the balancer
2) Add additional mongods on replica set <source>
3) Wait for the new nodes to become fully synced
4) Perform a stepdown and promote one of the newly added nodes as the new Primary
5) Remove <source> nodes from the replica set
6) Start the balancer
Troubleshooting
● Hotspots
● Imbalanced Data
● Configuration Servers
● RangeDeleter
● Orphans
● Collection Resharding
Hotspotting
[Diagram: s1 serving 80% of operations, s2 serving 20%]
Possible Causes:
• Shard Key
• Unbalanced Collections
• Unsharded Collections
• Data Imbalance
• Randomness
Hotspotting - Profiling
Knowing the workload this is how you can identify your top u_id’s.
db.system.profile.aggregate([
{$match: { $and: [ {op:"update"}, {ns : "mydb.mycoll"} ] }},
{$group: { "_id":"$query.u_id", count:{$sum:1}}},
{$sort: {"count": -1}}, {$limit : 5 } ]);
{
"result" : [
{
"_id" : ObjectId('53fcc281a25cf72b59000013',
"count" : 28672
},
..........
], "ok" : 1 }
Hotspotting - SplitAt()
{
  "_id" : "mydb.mycoll-u_id_ObjectId('533974d30daac135a9000049')",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : { "u_id" : ObjectId('533974d30daac135a9000049') },
  "max" : { "u_id" : ObjectId('545d29e74d32da4e290000cf') },
  "shard" : "s1"
}
Hot u_id values that all fall into this one chunk cause unbalanced operations:
{ "u_id" : ObjectId('53fcc281a25cf72b59000013'), … }
{ "u_id" : ObjectId('54062bd2a25cf7e051000017'), … }
{ "u_id" : ObjectId('5432ada44d32dadcc400007b'), … }
{ "u_id" : ObjectId('5433f7254d32daa1e3000198'), … }
Use sh.splitAt() at these values to break the hot chunk apart.
Data Imbalance – Disk Usage
[Diagram: disk usage on s1 vs s2 – unbalanced resources]
Data Imbalance
Common Causes
● Balancing has been disabled or the window is too small
○ sh.getBalancerState();
○ db.getSiblingDB("config").settings.find({"_id" : "balancer"});
● maxSize has been reached or misconfigured across the shards
○ db.getSiblingDB("config").shards.find({},{maxSize:1});
● Configuration servers in an inconsistent state
○ db.getSiblingDB("config").runCommand( {dbHash: 1} );
● Jumbo Chunks
○ Previously covered, chunks that exceed chunksize or 250,000 documents
● Empty Chunks
○ Chunks that have no size and contain no documents
● Unsharded Collections
○ Data isolated to primary shard for the database
● Orphaned Documents
Data Imbalance
[Diagram] Unbalanced operations (s1: 80%, s2: 20%) and unbalanced resources caused by empty chunks:
{ "uuid" : "00005cf6-....." } -->> { "uuid" : "7fe55637-....." } on : s1 (100K docs @ 45MB)
{ "uuid" : "7fe55637-....." } -->> { "uuid" : "7fe55742-....." } on : s2 (0 docs @ 0MB)
Data Imbalance - Empty Chunks
Unevenly distributed remove or TTL operations
[Diagram: across shards s1–s4, ranges that only see insert() & split keep their data, while ranges that see insert() → split → remove() are left as empty chunks that moveChunk later redistributes]
Data Imbalance - Merging
Using JavaScript the following process can resolve the imbalance.
1. Check balancer state, set to false if not already.
2. Identify empty chunks and their current shard.
a. dataSize command covered earlier in the presentation, record this output
3. Locate the adjacent chunk and its current shard, this is done by comparing max and min
a. The chunk map is continuous, there are no gaps between max and min
4. If shard is not the same move empty chunks to the shard containing the adjacent
chunk
a. Moving the empty chunk is cheap but will cause frequent metadata changes
5. Merge empty chunks into their adjacent chunks
a. Merging chunks is also non-blocking but will cause frequent metadata changes
Consideration: how frequently is this type of maintenance required?
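Steps 4 and 5 use moveChunk and mergeChunks; a minimal sketch of a merge (the bounds are placeholders and must cover contiguous chunks residing on the same shard):
db.adminCommand({
  mergeChunks: "mydb.mycoll",
  bounds: [ { "uuid" : "00005cf6-....." },    // min bound of the first chunk
            { "uuid" : "7fe55742-....." } ]   // max bound of the last chunk
})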
Configuration Servers - SCCC
Sync Cluster Connection Configuration
At times a configuration server can become corrupted; the following methods can be used to
fix it. This is applicable to versions <= 3.2 MMAP and 3.2 WiredTiger.
1. As pre-caution back up all three config servers via mongodump
2. If only one of the three servers is not functioning or out of sync
a. Stop one of the available configuration servers
b. Rsync the data directory to the broken configuration server and start mongod
3. If only the first configuration server is healthy or all three are out of sync
a. Open a Mongo and SSH session to the first configuration server
b. From the Mongo session use db.fsyncLock() to lock the mongod
c. From the SSH session make a copy of the data directory
d. In the existing Mongo session use db.fsyncUnlock() to unlock the mongod
e. Rsync the copy to the broken configuration servers and start mongod
Configuration Servers - CSRS
Config Servers as Replica Sets
With CSRS an initial sync from another member of the cluster is the preferred method to
repair the health of the replica set. If for any reason the majority is lost the cluster metadata
will still be available but read-only.
CSRS became available in version 3.2 and mandatory in version 3.4.
RangeDeleter
This process is responsible for removing the chunk ranges that were moved by the
balancer (i.e. moveChunk).
● This thread only runs on the primary
● Is not persistent in the event of an election
● Can be blocked by open cursors
● Can have multiple ranges queued to delete
● If a queue is present, the shard can not be the destination for a new chunk
A queue being blocked by open cursors can create two potential problems:
● Duplicate results for secondary and secondaryPreferred read preferences
● Permanent Orphans
Range Deleter - Open Cursors
Log Line:
[RangeDeleter] waiting for open cursors before removing range [{ _id: -869707922059464413 }, {
_id: -869408809113996381 }) in mydb.mycoll, elapsed secs: 16747, cursor ids: [74167011554]
MongoDB does not provide a server side command to close cursors, in this example
74167011554 must be closed to allow RangeDeleter to proceed.
Solution using Python:
from pymongo import MongoClient
c = MongoClient('<host>:<port>')
c.the_database.authenticate('<user>','<pass>',source='admin')
c.kill_cursors([74167011554])
Orphans
moveChunk started and documents are being inserted into s2 from s1.
[Diagram: s1 → s2; range: [{ u_id: 100 }, { u_id: 200 }]]
Orphans
s2 synchronizes the changed documents and s1 commits the metadata change.
[Diagram: commit of range [{ u_id: 100 }, { u_id: 200 }]]
Orphans
The delete process is asynchronous; at this time both shards contain the documents. Primary
read preference filters them out, but secondary and secondaryPreferred do not.
[Diagram: RangeDeleter on s1 waiting on an open cursor]
Orphans
An unplanned election or step down occurs on s1.
[Diagram: RangeDeleter still waiting on an open cursor]
Orphans
The documents are now orphaned on s1.
Range: [{ u_id: 100 }, { u_id: 200 }] now present on both s1 and s2
Orphans
www.objectrocket.com
127
moveChunk reads and writes to and from primary members. Primary members cache a
copy of the chunk map via the ChunkManager process.
To resolve the issue:
1. Set the balancer state to false.
2. Add a new shard to the cluster.
3. Using moveChunk move all chunks from s1 to sN.
4. When all chunks have been moved and the RangeDeleter queues are complete, connect to
each primary:
a. Run a remove() operation to remove the remaining “orphaned” documents
b. Or: drop and recreate the empty collection with all indexes
Alternatively Use:
https://docs.mongodb.com/manual/reference/command/cleanupOrphaned/
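A minimal sketch of that alternative, run directly against the primary of the shard holding the
orphans (the namespace and key are illustrative):
use admin
db.runCommand({ cleanupOrphaned: "mydb.mycoll", startingFromKey: { u_id: MinKey } })
// Repeat, passing the returned "stoppedAtKey" as the next startingFromKey,
// until the command no longer returns a stoppedAtKey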
Collection Resharding
www.objectrocket.com
128
Over time, query patterns and writes can make a shard key suboptimal, or the wrong shard key
was implemented in the first place.
Dump And Restore
● Typically most time consuming
● Requires write outage if changing collection names
● Requires read and write outage if keeping the same collection name
Forked Writes
● Fork all writes to separate collection, database, or cluster
● Reduces or eliminates downtime completely using code deploys
● Increases complexity from a development perspective
● Allows for rollback and read testing
Collection Resharding
www.objectrocket.com
129
Incremental Dump and Restores (Append Only)
● For insert-only collections begin incremental dump and restores (see the sketch below)
● Requires a different namespace but reduces the cut-over downtime
● Requires a read and write outage if keeping the same collection name
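A sketch of one incremental pass, assuming an insert-only collection with an increasing ts field
(hosts, paths, and the checkpoint value are illustrative):
# Initial full dump of the source collection
mongodump --host <host:port> -d mydb -c events -o /backups/full
# Incremental pass: only documents inserted after the last checkpoint
mongodump --host <host:port> -d mydb -c events \
  --query '{ "ts": { "$gt": { "$date": "2017-09-01T00:00:00Z" } } }' -o /backups/incr01
# Restore each pass into the newly sharded namespace
mongorestore --host <host:port> -d mydb -c events_resharded /backups/incr01/mydb/events.bson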
Mongo to Mongo Connector
● Connectors tail the oplog for namespace changes
● These changes can be filtered and applied to another namespace or cluster
● Similar to the forked write approach but handled outside of the application
● Allows for a rollback and read testing
www.objectrocket.com
130
Lab 03
Lab 04
Lab 03. Navigate to /Percona2017/Lab03/exercises
Lab 04. Navigate to /Percona2017/Lab04/exercises
https://goo.gl/ddaVdS
Resharding and
Chunk Management
Miscellaneous
● User Management
● Zones
● Read Operations
● Connecting
www.objectrocket.com
131
User Management
● Users and Roles
○ Application Users
○ Admin Users
○ Monitoring
www.objectrocket.com
132
● Replica Set Access
Users And Roles
www.objectrocket.com
133
● Role-based authentication framework with collection-level granularity
○ Starting in version 3.4 views allow for finer granularity beyond roles
● To enable authentication for all components
○ Set security.authorization to enabled
○ Set security.keyFile to a file path containing a 6 - 1024 character random string
○ RBAC documents stored in admin.system.users, admin.system.roles, and
$external.users
● Once enabled, the localhost exception can be used to create the initial admin user
○ For replica sets the admin database is stored and replicated to each member
○ For sharded clusters the admin data is stored on the configuration replica set
● Alternatively, the user(s) can be created prior to enabling authentication
● External authentication is also available in the Enterprise* and Percona* distributions
○ SASL / LDAP*
○ Kerberos*
Cluster Authentication
www.objectrocket.com
134
s1 s2
Client
Replica Authentication
www.objectrocket.com
135
s1 s2
Client
Two Tier Authentication
www.objectrocket.com
136
s1 s2
Client
Discovery
Creating Users
www.objectrocket.com
137
use mydb
db.createUser({
user: "<user>",
pwd: "<secret>",
roles: [{ role: "readWrite", db: "mydb" }]
});
use admin
db.createUser({
user: "<user>",
pwd: "<secret>",
roles: [
{ role: "readWrite", db: "mydb1" },
{ role: "readWrite", db: "mydb2" },
{ role: "clusterMonitor", db: "admin" }
]
});
Application User:
● Read/Write access to a single database
Engineer:
● Read/Write access to multiple databases
● Authentication scoped to admin
● Read-only cluster role
Built-In Roles
www.objectrocket.com
138
● Database
○ read
○ readWrite
● Database Administration
○ dbAdmin
○ dbOwner
○ userAdmin
● Backup And Restore
○ backup
○ restore
● Cluster Administration
○ clusterAdmin
○ clusterManager
○ clusterMonitor
○ hostManager
● All Databases
○ readAnyDatabase
○ readWriteAnyDatabase
○ userAdminAnyDatabase
○ dbAdminAnyDatabase
● Superuser
○ root
Creating Custom Roles
www.objectrocket.com
139
use admin
db.runCommand({ createRole: "<role>",
privileges: [
{ resource: { db: "local", collection: "oplog.rs" }, actions: [ "find" ] },
{ resource: { cluster: true }, actions: [ "listDatabases" ] }
],
roles: [ { role: "read", db: "mydb" } ]
});
Role Creation
• find() privilege on local.oplog.rs
• List all databases on the replica set
• Read only access on mydb
Granting Access
use admin
db.createUser( { user: "<user>", pwd: "<secret>",
roles: [ "<role>" ]
} );
Or
use admin
db.grantRolesToUser(
"<user>",
[ { role: "<role>", db: "admin" }]
);
Additional Methods
www.objectrocket.com
140
• db.updateUser()
• db.changeUserPassword()
• db.dropUser()
• db.grantRolesToUser()
• db.revokeRolesFromUser()
• db.getUser()
• db.getUsers()
User Management
• db.updateRole()
• db.dropRole()
• db.grantPrivilegesToRole()
• db.revokePrivilegesFromRole()
• db.grantRolesToRole()
• db.revokeRolesFromRole()
• db.getRole()
• db.getRoles()
Role Management
Zones
● Geographic Distribution
● Mixed Engine Topology
● Tiered Performance
www.objectrocket.com
141
Zones
www.objectrocket.com
142
A Zone is a range based on a prefix of the shard key; pre-3.4 this feature was called Tag Aware
Sharding
A zone can be associated with one or more shards
A shard can be associated with any number of non-conflicting zones
Chunks covered by a zone will be migrated to associated shards.
We use zones for:
- Data isolation
- Geographically distributed clusters (keep data close to application)
- Heterogeneous hardware clusters
Zones
www.objectrocket.com
143
Zone : [“A”] Zone : [“B”] Zone : []
Zones:
[A] KEY: 1-10
[B] KEY: 10-20
{min , 10}
{10, 21}
{21, max}
Zones
www.objectrocket.com
144
Zone : [“A”] Zone : [“A”,“B”] Zone : [“C”]
Zones:
[A] KEY: 1-10
[B] KEY: 10-20
[C] KEY: 50-60
{min, 7}
{7, 30}
{31,max}
Zones
www.objectrocket.com
145
Define a zone:
sh.addTagRange(<collection>, { <shardkey>: min }, { <shardkey>: max }, <zone_name>)
Zone ranges are always inclusive of the lower boundary and exclusive of the upper boundary
The fields in ({ <shardkey>: min },{ <shardkey>: max }) must be the shard key or a prefix of the shard key
Hashed shard keys are also supported - the ranges apply to the hashed key values
Remove a zone:
There is no helper available yet - Manually remove from config.tags
db.getSiblingDB('config').tags.remove({_id:{ns:<collection>, min:{<shardkey>:min}},
tag:<zone_name>})
Zones
www.objectrocket.com
146
Add shards to a zone:
sh.addShardTag(<shard_name>, <zone_name>)
Remove zones from a shard:
sh.removeShardTag(<shard_name>, <zone_name>)
Manipulating zones may or may not cause chunk moves
Manipulating zones may be a long-running operation
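A short end-to-end sketch, with hypothetical shard, namespace, and zone names:
sh.addShardTag("shard0000", "US")
sh.addTagRange("mydb.users", { user_id: MinKey }, { user_id: 100 }, "US")
// Verify the association in the config database
db.getSiblingDB('config').tags.find({ tag: "US" })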
Zones
www.objectrocket.com
147
View existing zones/shards associations:
sh.status() lists the tags next to the shards, or
query the config database: db.getSiblingDB('config').shards.find({tags:<zone>})
View existing zones:
Query the config database: db.getSiblingDB('config').tags.find({tag:<zone>})
Zones- Balancing
www.objectrocket.com
148
The balancer aims for an even chunk distribution
During a chunk migration:
- The balancer checks possible destinations based on the configured zones
- If the chunk range falls into a zone, it migrates the chunk to a shard within that zone
- Chunks that do not fall into a zone may exist on any shard
During balancing rounds the balancer moves chunks that violate the configured zones
Read Operations
● Covering Queries
● Sharding Filter
● Secondary Reads
www.objectrocket.com
149
Read Operations – Covered Queries
www.objectrocket.com
150
A covered query is a query that can be satisfied entirely using an index and does not have
to examine any documents
On an unsharded collection, a query db.foo.find({x:1},{x:1,_id:0}) is covered by the {x:1}
index
On a sharded collection the query is covered if:
- {x:1} is the shard key, or
- a compound index on {x:1,<shard_key>} exists
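A minimal sketch on an unsharded collection (names are illustrative):
db.foo.createIndex({ x: 1 })
db.foo.find({ x: 1 }, { x: 1, _id: 0 }).explain("executionStats")
// A covered plan reports "totalDocsExamined" : 0 and contains no FETCH stage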
Read Operations – Covered Queries
www.objectrocket.com
151
Reading from a primary, db.foo.find({x:1},{x:1,_id:0}) using the {x:1} index explain() reports:
"totalKeysExamined" : N
"totalDocsExamined" : N
Reading from a secondary db.foo.find({x:1},{x:1,_id:0}) is covered by the {x:1} index.
explain() reports:
"totalKeysExamined" : N,
"totalDocsExamined" : 0,
Read Operations – Sharding filter
www.objectrocket.com
152
Reading from Primaries:
- Queries must fetch the shard key to filter out the orphans
- The cost of the sharding filter is reported on the explain() plan as "stage" : "SHARDING_FILTER"
- For operations that return a high number of results, the sharding filter can be expensive
Secondaries don’t apply the sharding filter
But… reads on a sharded collection may return orphans or “on move” documents
You can’t fetch orphans when:
- Read preference is set to primary
- Read preference is set to secondary and the shard key exists on the read predicates
Read Operations – Sharding filter
www.objectrocket.com
153
Distinct and Count commands return orphans even with primary read preference
Always use the aggregation framework instead of Distinct and Count on sharded collections
Reveal Orphans:
Compare "nReturned" of the SHARDING_FILTER stage with "nReturned" of the EXECUTION stage
When SHARDING_FILTER < EXECUTION orphans may exist
Compare count() with itcount()
When itcount() < count() orphans may exist
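For example, on a suspect collection:
db.mycoll.count()            // server-side count per shard, may include orphans
db.mycoll.find().itcount()   // iterates the cursor, applying the sharding filter
// If itcount() < count(), orphans may exist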
Read Operations – Sharding filter
www.objectrocket.com
154
A handy JavaScript that reveals orphans:
db.getMongo().getDBNames().forEach(function(database) {
  var re = new RegExp('config|admin');
  if (!database.match(re)) {
    print("Checking database: " + database);
    print("");
    db.getSiblingDB(database).getCollectionNames().forEach(function(collection) {
      print("Checking collection: " + collection);
      db.getSiblingDB(database).getCollection(collection).find().explain(true)
        .executionStats.executionStages.shards.forEach(function(orphans) {
          if (orphans.executionStages.chunkSkips > 0) {
            print("Shard " + orphans.shardName + " has " +
              orphans.executionStages.chunkSkips + " orphans");
          }
        });
    });
  }
});
Read Operations
www.objectrocket.com
155
Find: Mongos merges the results from each shard
Aggregate: Runs on each shard and a random shard performs the merge
In older versions, only the mongos or the primary shard was responsible for the merge
Sorting: Each shard sorts its own results and mongos performs a merge sort
Limits: Each shard applies the limit and then mongos re-applies it on the merged results
Skips: Each shard returns the full result set (un-skipped) and mongos applies the skip
Explain() output contains the individual shard execution plans and the merge phase
It's not uncommon for each shard to follow a different execution plan
Connecting
● URI parameters
● Load Balancing
● New parameters
www.objectrocket.com
156
Connecting – Server discovery
www.objectrocket.com
157
URI Format mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
Server Discovery:
- The client measures the duration of an ismaster call (RTT)
- The variable "localThresholdMS" defines the acceptable RTT; the default is 15 millis
- The driver load-balances randomly across mongos that fall within the localThresholdMS
serverSelectionTimeoutMS: how long to block for server selection before throwing an exception. The
default is 30 seconds.
heartbeatFrequencyMS: The interval between server checks, counted from the end of the previous
check until the beginning of the next one, default 10 seconds.
A client waits between checks at least minHeartbeatFrequencyMS. Default 500 millis.
Important: Each driver may use its own algorithm to measure average RTT
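For example, a URI connecting to two hypothetical mongos with these options tuned (option
support varies by driver):
mongodb://appuser:secret@mongos1:27017,mongos2:27017/mydb?localThresholdMS=15&serverSelectionTimeoutMS=30000&heartbeatFrequencyMS=10000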
Connecting – Load Balancing
www.objectrocket.com
158
Driver:
- Increases heartbeat traffic/availability
Driver with virtual mongos pools:
- N number of mongos
- Each application server uses a portion, for example N/2
- Reduces heartbeat traffic/availability
Use a Load Balancer:
- Hash-based routing (avoid round-robin)
- Minimizes heartbeat traffic
- Use more than one LB to avoid a single point of failure
- Be aware of hot app servers and client-side NAT
Connecting – Connection Pool
www.objectrocket.com
159
TaskExecutorPoolSize (SERVER-25027)
The number of connection pools to use for a given mongos. Defaults to NUM_CORES,
clamped between 4 and 64.
Decreasing the value will decrease the number of open connections
ShardingTaskExecutorPoolHostTimeoutMS (SERVER-25027)
How long mongos waits before dropping all connections to a host it hasn’t
communicated with. Default 5 minutes
Increasing the value will smooth connection spikes
Connecting – Connection Pool
www.objectrocket.com
160
ShardingTaskExecutorPoolMaxSize (SERVER-25027)
The maximum number of connections to hold to a given host at any time. Default is
unlimited
Controls simultaneous operations from a mongos to a host (Poolsize*PoolMaxSize)
ShardingTaskExecutorPoolMinSize (SERVER-25027)
The minimum number of connections to hold to a given host at any time. Default is 1
Increasing the value will increase the number of open connections per mongod
(PoolMinSize*PoolSize*MONGOS)
Connecting – Connection Pool
www.objectrocket.com
161
ShardingTaskExecutorPoolRefreshRequirementMS (SERVER-25027)
How long mongos will wait before attempting to heartbeat an idle connection in the pool.
Default 1 minute
Increasing the value will reduce the heartbeat traffic
Lowering the value may decrease connection failures
ShardingTaskExecutorPoolRefreshTimeoutMS (SERVER-25027)
How long mongos will wait before timing out a heartbeat. Default 20 seconds
Tuning this value up may help if heartbeats are taking too long.
Connecting – Connection Pool
www.objectrocket.com
162
ShardingTaskExecutorPoolMaxConnecting (SERVER-29237)
Limits the rate at which mongos nodes add connections to connection pools
Only N connections can be in the processing state at the same time (setup/refresh state)
Avoids overwhelming mongod nodes with connection requests
Per connection pool, C*N can be in setup/refresh
Our patch for SERVER-26654
setParameter:
taskExecutorPoolSize: 8
ShardingTaskExecutorPoolRefreshTimeoutMS: 600000
ShardingTaskExecutorPoolHostTimeoutMS: 86400000
ShardingTaskExecutorPoolMinSize: 100
ShardingTaskExecutorPoolMaxConnecting: 25
Backup & Disaster
Recovery ● Methods
● Topologies
www.objectrocket.com
163
mongodump
www.objectrocket.com
164
This utility is provided as part of the MongoDB binaries that creates a binary backup of
your database(s) or collection(s).
● Preserves data integrity when compared to mongoexport for all BSON types
● Recommended for standalone mongod and replica sets
○ It does work with sharded clusters but be cautious of cluster and data
consistency
● When dumping a database (--db) or collection (--collection) you have the
option of passing a query (--query) and a read preference (--readPreference) for
more granular control
● Because mongodump can be time consuming --oplog is recommended so the
operations for the duration of the dump are also captured
● To restore the output from mongodump you use mongorestore, not mongoimport
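A minimal sketch, with illustrative host and path values:
mongodump --host rs0/db1:27017,db2:27017 \
  -u backup -p <secret> --authenticationDatabase admin \
  --oplog -o /backups/$(date +%F)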
mongorestore
www.objectrocket.com
165
This utility is provided as part of the MongoDB binaries that restores a binary backup of
your database(s) or collection(s).
● Preserves data integrity when compared to mongoimport for all BSON types
● When restoring a database (--db) or collection (--collection) you have the
option of removing (--drop) collections contained in the backup and the option to
replay the oplog events (--oplogReplay) to a point in time
● Be mindful of the destination of the restore; restoring (i.e. inserting) adds
work to the already existing workload, in addition to the number of
collections being restored in parallel (--numParallelCollections)
● MongoDB will also restore indexes in the foreground (blocking), indexes can be
created in advance or skipped (--noIndexRestore) depending on the scenario
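And a matching restore that replays the captured oplog and skips foreground index builds
(flags and paths are illustrative):
mongorestore --host rs0/db1:27017 \
  -u restore -p <secret> --authenticationDatabase admin \
  --oplogReplay --noIndexRestore /backups/2017-09-20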
Data Directory Backup
www.objectrocket.com
166
While a mongod process is stopped a filesystem copy or snapshot can be performed to
make a consistent copy of the replica set member.
● Replica set required to prevent interruption
● Works for both WiredTiger and MMAPv1 engines
● Optionally you can use a hidden secondary with no votes to perform this process to
avoid affecting quorum
● For MMAPv1 this will copy fragmentation which increases the size of the backup
Alternatively fsyncLock() can be used to flush all pending writes and lock the mongod
process. At this time a file system copy can be performed, after the copy has completed
fsyncUnlock() can be used to return to normal operation.
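A minimal sketch of the fsyncLock() variant (the archive path is illustrative):
db.fsyncLock()       // mongo session: flush pending writes and block the mongod
// in a separate SSH session, while the lock is held:
//   tar -czf /backups/dbpath.tgz /data/db
db.fsyncUnlock()     // mongo session: return to normal operation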
Percona Hot Backup
www.objectrocket.com
167
For Percona Server running WiredTiger or RocksDB backups can be taken using an
administrative backup command.
● Can be executed on a mongod as a non-blocking operation
● Creates a backup of the entire dbPath
○ Similar to data directory backup extra storage capacity required
> use admin
switched to db admin
> db.runCommand({createBackup: 1, backupDir: "/tmp/backup"})
{ "ok" : 1 }
Replicated DR
www.objectrocket.com
168
s1 s2
s1 s2
DC or AZ
Interconnect
Preferred Zone
Secondary Zone
● Primary zone replicas are configured with a higher priority and more votes for election purposes
● All secondary zone replica members remain secondaries until the secondary zone is utilized
● An rs.reconfig() is required to remove a preferred zone member and level votes to elect a primary in a failover event
www.objectrocket.com
169
Lab 05
Lab 06
https://goo.gl/ddaVdS
Zone Management and
Backup / Restore
Lab 05. Navigate to /Percona2017/Lab05/exercises
Lab 06. Navigate to /Percona2017/Lab06/exercises
Use Cases
● Social Application
● Aggregation
● Caching
● Retail
● Time Series
www.objectrocket.com
170
Use Cases - Caching
www.objectrocket.com
171
Use Case: A url/content cache
Schema example:
{
_id: ObjectId,
url: <string>,
response_body: <string>,
ts: <timestamp>
}
Operations:
- Insert urls into cache
- Invalidate cache (for example TTL index)
- Read from cache (Read the latest url entry based on ts)
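For example, cache invalidation could be a single TTL index on ts (the 3600-second lifetime is
an assumption):
db.cache.createIndex({ ts: 1 }, { expireAfterSeconds: 3600 })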
Use Cases - Caching
www.objectrocket.com
172
Shard key Candidates:
{url:1}
- Advantages: Scale Reads/Writes
- Disadvantages: Key length, may create empty chunks
{url:"hashed"} (see the sketch below)
- Advantages: Scale Reads/Writes
- Disadvantages: CPU cycles to convert url to hashed, orderBy is not supported, may create
empty chunks
Hash the {url:1} on the application
- Advantages: Scale Reads/Writes, one less index
- Disadvantages: numInitialChunks is not supported, may create empty chunks
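A sketch of the {url:"hashed"} option with pre-created chunks (the namespace and chunk count
are illustrative); numInitialChunks is only supported for hashed shard keys, which is why the
application-side hash loses it:
sh.enableSharding("mydb")
db.adminCommand({ shardCollection: "mydb.cache", key: { url: "hashed" }, numInitialChunks: 128 })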
Use Cases – Social app
www.objectrocket.com
173
Use Case: A social app, where users “interact” with other users
Schema example:
{
_id: ObjectId,
toUser: <string> or <ObjectId> or <number>,
fromUser: <string> or <ObjectId> or <number>,
interaction: <string>,
ts: <timestamp>
}
Operations:
- Insert interaction
- Remove interaction
- Read interaction (fromUser and toUser)
Use Cases – Social app
www.objectrocket.com
174
Shard key Candidates:
{toUser:1} or {fromUser:1}
- Advantages: Scale a portion of Reads
- Disadvantages: Write Hotspots, Jumbo Chunks, Empty Chunks
{toUser:1, fromUser:1}
- Advantages: Scale a portion of Reads, Improved Write distribution but not perfect
- Disadvantages: Frequent splits & moves, Empty Chunks
{fromUser:1, toUser:1}
- Advantages: Scale a portion of Reads, Improved Write distribution but not perfect
- Disadvantages: Frequent splits & moves, Empty Chunks
Use Cases – Social app
www.objectrocket.com
175
Shard key Candidates(cont.):
{toUser+fromUser:”hashed”}
- Advantages: Scale writes, Avoid Empty and Jumbo Chunks
- Disadvantages: All query types but one are scatter-gather
- Similar to _id:”hashed”
A perfect example that there is no perfect shard key:
- Application behavior will point us to the shard key
- Pre-defining user_ids and pre-splitting might not be effective
Use Cases – Email app
www.objectrocket.com
176
Use Case: Provide inbox access to users
Schema example:
{
_id: ObjectId,
recipient: <string> or <ObjectId> or <number>,
sender: <string>,
title: <string>,
body: <string>,
ts: <timestamp>
}
Operations:
- Read emails
- Insert emails
- Delete emails
Use Cases – Email app
www.objectrocket.com
177
Shard key Candidates:
{recipient:1}
- Advantages: All Reads will access one shard
- Disadvantages: Write Hotspots, Jumbo Chunks, Empty Chunks
{recipient:1, sender:1}
- Advantages: Some Reads will access one shard
- Disadvantages: Write Hotspots, Empty Chunks
Introduce buckets (an auto-increment number per user)
{recipient:1, bucket:1}
- Advantages: Reads will access one shard for every iteration
- Disadvantages: Write Hotspots, Empty Chunks
Use Cases – Time-series data
www.objectrocket.com
178
Use Case: A server-log collector
Schema example:
{
_id: ObjectId,
host: <string> or <ObjectId> or <number>,
type: <string> or <number>,
message: <string>,
ts: <timestamp>
}
Operations:
- Insert logs
- Archive logs
- Read logs (host,type,<range date>)
Use Cases – Time-series data
www.objectrocket.com
179
Shard key Candidates:
{host:1}
- Advantages: Targets Reads
- Disadvantages: Write Hotspots, Jumbo Chunks, Empty Chunks
{host:hashed}
- Advantages: Targets Reads
- Disadvantages: Write Hotspots, Jumbo Chunks, Empty Chunks
{host:1, type:1, ts:1}
- Advantages: Decent cardinality, may target Reads
- Disadvantages: Monotonically increasing per host/type, Write Hotspots, Empty Chunks
Use Cases – Time-series data
www.objectrocket.com
180
Shard key Candidates (cont.):
{host:1, type:1, date_bucket:1}
- Advantages: Decent cardinality, Read targeting
- Disadvantages: Monotonically increasing per host, Write Hotspots, Empty Chunks
Pre-create and distribute chunks easily
Merge host, type, and date_bucket on the application layer and use a hashed key - queries
become {shard_key, ts:<range>}
Use Cases – Retail app
www.objectrocket.com
181
Use Case: A product catalog
{
_id: ObjectId,
product: <string>,
description: <string>,
category: <string>,
dimensions: <subdocument>,
weight: <float>,
price: <float>,
rating: <float>,
shipping: <string>,
released_date: <ISODate>,
related_products: <array>
}
Use Cases – Retail app
www.objectrocket.com
182
Operations:
- Insert Products
- Update Products
- Delete Products
- Read from Products - various query patterns
Shard key Candidates:
{category:1}
- Advantages: Read targeting for most of the queries
- Disadvantages: Jumbo Chunks, Read hotspots
Use Cases – Retail app
www.objectrocket.com
183
Shard key Candidates (cont.):
{category:1, price:1}
- Advantages: Read targeting for most of the queries
- Disadvantages: Jumbo Chunks, Read hotspots
Tag documents with price ranges 0-100, 101-200, etc.
Use a shard key of the type {category:1, price_range:1, release_date:1}
Try to involve as much of the query criteria in your shard key as possible, but be careful of read hotspots
For example rating, as users tend to search for 5-star products
Questions?
www.objectrocket.com
184
or…
www.objectrocket.com
185
We’re Hiring!
Looking to join a dynamic & innovative
team?
https://www.objectrocket.com/careers
Reach out to us directly at careers@objectrocket.com
Thank you!
Address:
100 Congress Ave
Suite 400
Austin, TX 78701
Support:
1-800-961-4454
Sales:
1-888-440-3242
www.objectrocket.com
186

Contenu connexe

Tendances

Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDBJason Terpko
 
MongoDB - External Authentication
MongoDB - External AuthenticationMongoDB - External Authentication
MongoDB - External AuthenticationJason Terpko
 
MongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingMongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingJason Terpko
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisJason Terpko
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
MongoDB Scalability Best Practices
MongoDB Scalability Best PracticesMongoDB Scalability Best Practices
MongoDB Scalability Best PracticesJason Terpko
 
MongoDB – Sharded cluster tutorial - Percona Europe 2017
MongoDB – Sharded cluster tutorial - Percona Europe 2017MongoDB – Sharded cluster tutorial - Percona Europe 2017
MongoDB – Sharded cluster tutorial - Percona Europe 2017Antonios Giannopoulos
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com琛琳 饶
 
StackExchange.redis
StackExchange.redisStackExchange.redis
StackExchange.redisLarry Nung
 
Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101Dvir Volk
 
Extend Redis with Modules
Extend Redis with ModulesExtend Redis with Modules
Extend Redis with ModulesItamar Haber
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Marco Pas
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2Itamar Haber
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오PgDay.Seoul
 
Sessionization with Spark streaming
Sessionization with Spark streamingSessionization with Spark streaming
Sessionization with Spark streamingRamūnas Urbonas
 

Tendances (20)

Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDB
 
MongoDB - External Authentication
MongoDB - External AuthenticationMongoDB - External Authentication
MongoDB - External Authentication
 
MongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and MergingMongoDB Chunks - Distribution, Splitting, and Merging
MongoDB Chunks - Distribution, Splitting, and Merging
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
MongoDB Scalability Best Practices
MongoDB Scalability Best PracticesMongoDB Scalability Best Practices
MongoDB Scalability Best Practices
 
MongoDB – Sharded cluster tutorial - Percona Europe 2017
MongoDB – Sharded cluster tutorial - Percona Europe 2017MongoDB – Sharded cluster tutorial - Percona Europe 2017
MongoDB – Sharded cluster tutorial - Percona Europe 2017
 
Unqlite
UnqliteUnqlite
Unqlite
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 
StackExchange.redis
StackExchange.redisStackExchange.redis
StackExchange.redis
 
Redis modules 101
Redis modules 101Redis modules 101
Redis modules 101
 
Tag based sharding presentation
Tag based sharding presentationTag based sharding presentation
Tag based sharding presentation
 
Extend Redis with Modules
Extend Redis with ModulesExtend Redis with Modules
Extend Redis with Modules
 
Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)Collect distributed application logging using fluentd (EFK stack)
Collect distributed application logging using fluentd (EFK stack)
 
Prometheus Storage
Prometheus StoragePrometheus Storage
Prometheus Storage
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오
[Pgday.Seoul 2017] 3. PostgreSQL WAL Buffers, Clog Buffers Deep Dive - 이근오
 
Sessionization with Spark streaming
Sessionization with Spark streamingSessionization with Spark streaming
Sessionization with Spark streaming
 

Similaire à Sharded cluster tutorial

Percona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorialPercona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorialAntonios Giannopoulos
 
Setting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSetting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSudheer Kondla
 
Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018Alex Cercel
 
Oracle 11g R2 RAC setup on rhel 5.0
Oracle 11g R2 RAC setup on rhel 5.0Oracle 11g R2 RAC setup on rhel 5.0
Oracle 11g R2 RAC setup on rhel 5.0Santosh Kangane
 
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context ConstraintsAlessandro Arrichiello
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardwayDave Pitts
 
Getting started with replica set in MongoDB
Getting started with replica set in MongoDBGetting started with replica set in MongoDB
Getting started with replica set in MongoDBKishor Parkhe
 
Oracle cluster installation with grid and iscsi
Oracle cluster  installation with grid and iscsiOracle cluster  installation with grid and iscsi
Oracle cluster installation with grid and iscsiChanaka Lasantha
 
Docker 1.11 Presentation
Docker 1.11 PresentationDocker 1.11 Presentation
Docker 1.11 PresentationSreenivas Makam
 
Real World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionReal World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionBen Hall
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica setSudheer Kondla
 
Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Bertrand Delacretaz
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Praguetomasbart
 
Oracle cluster installation with grid and nfs
Oracle cluster  installation with grid and nfsOracle cluster  installation with grid and nfs
Oracle cluster installation with grid and nfsChanaka Lasantha
 
TrinityCore server install guide
TrinityCore server install guideTrinityCore server install guide
TrinityCore server install guideSeungmin Shin
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in productionParis Data Engineers !
 
Bare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefBare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefMatt Ray
 

Similaire à Sharded cluster tutorial (20)

Percona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorialPercona Live 2017 ­- Sharded cluster tutorial
Percona Live 2017 ­- Sharded cluster tutorial
 
Setting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSetting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutes
 
Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018Elastic101tutorial Percona Live Europe 2018
Elastic101tutorial Percona Live Europe 2018
 
Oracle 11g R2 RAC setup on rhel 5.0
Oracle 11g R2 RAC setup on rhel 5.0Oracle 11g R2 RAC setup on rhel 5.0
Oracle 11g R2 RAC setup on rhel 5.0
 
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
[Devconf.cz][2017] Understanding OpenShift Security Context Constraints
 
MongoDB Shard Cluster
MongoDB Shard ClusterMongoDB Shard Cluster
MongoDB Shard Cluster
 
Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardway
 
Getting started with replica set in MongoDB
Getting started with replica set in MongoDBGetting started with replica set in MongoDB
Getting started with replica set in MongoDB
 
Oracle cluster installation with grid and iscsi
Oracle cluster  installation with grid and iscsiOracle cluster  installation with grid and iscsi
Oracle cluster installation with grid and iscsi
 
Docker 1.11 Presentation
Docker 1.11 PresentationDocker 1.11 Presentation
Docker 1.11 Presentation
 
Real World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionReal World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and Production
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica set
 
Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?
 
Docker.io
Docker.ioDocker.io
Docker.io
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
Oracle cluster installation with grid and nfs
Oracle cluster  installation with grid and nfsOracle cluster  installation with grid and nfs
Oracle cluster installation with grid and nfs
 
TrinityCore server install guide
TrinityCore server install guideTrinityCore server install guide
TrinityCore server install guide
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
Rac on NFS
Rac on NFSRac on NFS
Rac on NFS
 
Bare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and ChefBare Metal to OpenStack with Razor and Chef
Bare Metal to OpenStack with Razor and Chef
 

Dernier

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Dernier (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Sharded cluster tutorial

  • 1. MongoDB – Sharded Cluster Tutorial Antonios Giannopoulos and Jason Terpko DBA’s @ Rackspace/ObjectRocket linkedin.com/in/antonis/ | linkedin.com/in/jterpko/ 1
  • 3. Overview • Cluster Components • Shard Key • Shard Key Selection • Balancing • Chunk Management • Troubleshooting • Zones • Backups & Disaster Recovery • Use Cases • Labs www.objectrocket.com 3
  • 4. www.objectrocket.com 4 Labs 1. Unzip the provided .vdi file 2. Install and or Open VirtualBox 3. Select New 4. Enter A Name 5. Select Type: Linux 6. Select Version: Red Hat (64-bit) 7. Set Memory to 4096 (if possible) 8. Select "Use an existing ... disk file", select the provided .vdi file. 9. Select Create 10. Select Start 11. CentOS will boot and auto-login 12. Navigate to /Percona2017/Lab01/exercises for the first lab. https://goo.gl/ddaVdS
  • 5. Key Terms www.objectrocket.com 5 Replication: Replica-set: A group of mongod processes that maintain the same data set Primary: Endpoint for writes Secondary: Pull and Replicates changes from Primary (oplog) Election: The process that changes the Primary on a replica-set Partitioning: Partition: A logical separation of a data set (may also be physical) on a server Partition key: A field (or a combination of fields) used to distinct partition Sharding: Similar to partitioning with the difference that partitions(chunks) may located on different servers (shards) Sharding: A field (or a combination of fields) used to distinct chunks
  • 7. Sharded Cluster Deployment ● Deploying ○ Shards ○ Configuration Servers ○ Mongos ● Major / Minor Upgrade ● Major / Minor Downgrade www.objectrocket.com 7
  • 8. Deploying - Recommended setup www.objectrocket.com 8 Minimum requirements: 1 mongos, 1 config server , 1 shard (1 mongod process) Recommended setup: 2+ mongos, 3 config servers, 1 shard (3 node replica-set) Ensure connectivity between machines involved in the sharded cluster Ensure that each process has a DNS name assigned - Not a good idea to use IPs If you are running a firewall you must allow traffic to and from mongoDB instances Example using iptables: iptables -A INPUT -s <ip-address> -p tcp --destination-port <port> -m state --state NEW,ESTABLISHED -j ACCEPT iptables -A OUTPUT -d <ip-address> -p tcp --source-port <port> -m state --state ESTABLISHED -j ACCEPT
  • 9. Deploying - Config Servers www.objectrocket.com 9 Create a keyfile: ● keyfile must be between 6 and 1024 characters long ● Generate a 1024 characters key openssl rand -base64 756 > <path-to-keyfile> ● Secure the keyfile chmod 400 <path-to-keyfile> ● Copy the keyfile to every machine involves to the sharded cluster Create the config servers: - Before 3.2 config servers could only run on SCCC topology - On 3.2 config servers could run on either SCCC or CSRS - On 3.4 only CSRS topology supported - CSRS mode requires WiredTiger as the underlying storage engine - SCCC mode may run on either WiredTiger or MMAPv1 Minimum configuration for CSRS mongod --keyFile <path-to-keyfile> --configsvr --replSet <setname> --dbpath <path>
  • 10. Deploying - Config Servers www.objectrocket.com 10 Baseline configuration for config server (CSRS) net: port: '<port>' processManagement: fork: true security: authorization: enabled keyFile: <keyfile location> sharding: clusterRole: configsvr replication: replSetName: <replicaset name> storage: dbPath: <data directory> systemLog: destination: syslog
  • 11. Deploying - Config Servers www.objectrocket.com 11 Login on one of the config servers using the localhost exception. Initiate the replica set: rs.initiate( { _id: "<replSetName>", configsvr: true, members: [ { _id : 0, host : "<host:port>" }, { _id : 1, host : "<host:port>" }, { _id : 2, host : "<host:port>" } ]}) Check the status of the replica-set using rs.status()
  • 12. Deploying - Shards www.objectrocket.com 12 Create shard(s): - For production environments use a replica set with at least three members - For test environments replication is not mandatory - Start three (or more) mongod process with the same keyfile and replSetName - ‘sharding.clusterRole’: shardsrv is mandatory in MongoDB 3.4 - Note: Default port for mongod instances with the shardsvr role is 27018 Minimum configuration for shard mongod --keyFile <path-to-keyfile> --shardsvr --replSet <replSetname> --dbpath <path> - Default storage engine is WiredTiger - On a production environment you have to populate more configuration variables , like oplog size
  • 13. Deploying - Shards www.objectrocket.com 13 Baseline configuration for shard net: port: '<port>' processManagement: fork: true security: authorization: enabled keyFile: <keyfile location> sharding: clusterRole: shardsrv replication: replSetName: <replicaset name> storage: dbPath: <data directory> systemLog: destination: syslog
  • 14. Deploying - Shards www.objectrocket.com 14 Login on one of the shard members using the localhost exception. Initiate the replica set: rs.initiate( { _id: "<replSetName>", members: [ { _id : 0, host : "<host:port>" }, { _id : 1, host : "<host:port>" }, { _id : 2, host : "<host:port>" } ]}) Check the status of the replica-set using rs.status() Create local user administrator (shard scope): { role: "userAdminAnyDatabase", db: "admin" } Create local cluster administrator (shard scope): roles: { "role" : "clusterAdmin", "db" : "admin" } Be greedy with "role" [ { "resource" : { "anyResource" : true }, "actions" : [ "anyAction" ] }]
  • 15. Deploying - Mongos www.objectrocket.com 15 Deploy mongos: - For production environments use more than one mongos - For test environments a single mongos is fine - Start three (or more) mongos process with the same keyfile Minimum configuration for mongos mongos --keyFile <path-to-keyfile> --config <path-to-config> net: port: '50001' processManagement: fork: true security: keyFile: <path-to-keyfile> sharding: configDB: <path-to-config> systemLog: destination: syslog
  • 16. Deploying - Mongos www.objectrocket.com 16 Login on one of the mongos using the localhost exception. - Create user administrator (shard scope): { role: "userAdminAnyDatabase", db: "admin" } - Create cluster administrator (shard scope): roles: { "role" : "clusterAdmin", "db" : "admin" } Be greedy with "role" [ { "resource" : { "anyResource" : true }, "actions" : [ "anyAction" ] }] What about config server user creation? - All users created against the mongos are saved on the config server’s admin database - The same users may be used to login directly on the config servers - In general (with few exceptions), config database should only be accessed through the mongos
  • 17. Deploying - Sharded Cluster www.objectrocket.com 17 Login on one of the mongos using the cluster administrator - sh.status() prints the status of the cluster - At this point shards: should be empty - Check connectivity to your shards Add a shard to the sharded cluster: - sh.addShard("<replSetName>/<host:port>") - You don’t have to define all replica set members - sh.status() should now display the newly added shard - Hidden replica-set members are not apear on the sh.status() output You are now ready to add databases and shard collections!!!
  • 18. Deploying - Recommended Setup www.objectrocket.com 18 - Use Replica Set for shards - Use at least three data nodes for the Replica Sets - Use three config servers on SCCC topology - Use more than one mongos - Use DNS names instead of IP - Use a consistent network topology - Make sure you have redundant NTP servers - Always use authentication - Always use authorization and give only the necessary privileges to users
  • 19. Upgrading - Categories www.objectrocket.com 19 Upgrade minor versions - Without changing config servers topology - For example: 3.2.15 to 3.2.16 or 3.4.6 to 3.4.7 - Change config servers topology from SCCC to CSRS - For example: 3.2.15 to 3.2.16 Upgrade major versions - Without changing config servers topology - For example: 3.0.14 to 3.2.12 - Change config servers topology from SCCC to CSRS - For example: 3.2.12 to 3.4.2
  • 20. Upgrading – Best practices www.objectrocket.com 20 Best Practices (not mandatory) - Upgrades heavily involve binary swaps - Keep all mongoDB releases under /opt - Create a symbolic link of /opt/mongodb point on your desired version - Binary swap may be implemented by changing the symbolic link > ll lrwxrwxrwx. 1 mongod mongod 34 Mar 24 14:16 mongodb -> mongodb-linux-x86_64-rhel70-3.4.1/ drwxr-xr-x. 3 mongod mongod 4096 Mar 24 12:06 mongodb-linux-x86_64-rhel70-3.4.1 drwxr-xr-x. 3 mongod mongod 4096 Mar 21 14:12 mongodb-linux-x86_64-rhel70-3.4.2 > unlink mongodb;ln -s mongodb-linux-x86_64-rhel70-3.4.2/ mongodb >ll lrwxrwxrwx. 1 root root 34 Mar 24 14:19 mongodb -> mongodb-linux-x86_64-rhel70-3.4.2/ drwxr-xr-x. 3 root root 4096 Mar 24 12:06 mongodb-linux-x86_64-rhel70-3.4.1 drwxr-xr-x. 3 root root 4096 Mar 21 14:12 mongodb-linux-x86_64-rhel70-3.4.2 >echo 'pathmunge /opt/mongodb/bin' > /etc/profile.d/mongo.sh; chmod +x /etc/profile.d/mongo.sh
  • 21. Upgrading - Checklist www.objectrocket.com 21 - Configuration Options changes: For example, sharding.chunkSize configuration file setting and --chunkSize command-line option removed from the mongos - Deprecated Operations: For example, Aggregate command without cursor deprecated in 3.4 - Topology Changes: For example, Removal of Support for SCCC Config Servers - Connectivity changes: For example, Version 3.4 mongos instances cannot connect to earlier versions of mongod instances - Tool removals: For example, In MongoDB 3.4, mongosniff is replaced by mongoreplay - Authentication and User management changes: For example, dbAdminAnyDatabase can't be applied to local and config databases - Driver Compatibility Changes: Please refer to the compatibility matrix for your driver version and language version.
  • 22. Upgrading – Minor Versions www.objectrocket.com 22 Upgrade minor versions - Without changing config servers topology Upgrade 3.2.x to 3.2.y or 3.4.x to 3.4.y 1) Stop the balancer sh.stopBalancer() and check that is not running 2) Upgrade the shards. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start 3) Upgrade the shards. Perform a stepdown and upgrade the ex-Primaries by stop;replace binaries;start 4) Upgrade the config servers. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start 5) Upgrade the config servers. Perform a stepdown and upgrade the ex-Primary by stop;replace binaries;start 6) Upgrade the mongos in a rolling fashion by stop;replace binaries;start Rollback 3.2.y to 3.2.x or 3.4.y to 3.4.x 1) Stop the balancer sh.stopBalancer() and check that is not running 2) Downgrade the mongos in a rolling fashion by stop;replace binaries;start 3) Downgrade the config servers. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start 4) Downgrade the config servers. Perform a stepdown and downgrade the ex-Primary by stop;replace binaries;start 5) Downgrade the shards. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start 6) Downgrade the shards. Perform a stepdown and Downgrade the ex-Primaries by stop;replace binaries;start
  • 23. Minor Versions Upgrade (3.4.x to 3.4.y) www.objectrocket.com 23 s1 s2 3. Upgrade mongos 2. Upgrade config servers 1. Upgrade shards Stop the balancer Start the balancer
  • 24. Minor Versions Downgrade (3.4.x to 3.4.y) www.objectrocket.com 24 s1 s2 1. Downgrade mongos 2. Downgrade config servers 3. Downgrade shards Stop the balancer Start the balancer
  • 25. Upgrading - Major Versions www.objectrocket.com 25 Upgrade major versions - Without changing config servers topology Upgrade 3.2.x to 3.4.y 1) Stop the balancer sh.stopBalancer() and check that is not running 2) Upgrade the config servers. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start 3) Upgrade the config servers. Perform a stepdown and upgrade the ex-Primary by stop;replace binaries;start 4) Upgrade the shards. Upgrade the secondaries in a rolling fashion by stop;replace binaries;start 5) Upgrade the shards. Perform a stepdown and upgrade the ex-Primaries by stop;replace binaries;start 6) Upgrade the mongos in a rolling fashion by stop;replace binaries;start Enable backwards-incompatible 3.4 features db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } ) It is recommended to wait for a small period of time before enable the backwards-incompatible feautures. Includes: - Views - Collation option on collection and indexes - Data type decimal - New version indexes (v:2 indexes)
  • 26. Major Versions Upgrade (3.2.x to 3.4.y) www.objectrocket.com 26 s1 s2 3. Upgrade mongos 1. Upgrade config servers 2. Upgrade shards Stop the balancer Start the balancer
• 27. Downgrade - Major Versions www.objectrocket.com 27 Downgrade major versions - Without changing config servers topology Rollback 3.4.y to 3.2.x 1) Stop the balancer sh.stopBalancer() and check that it is not running 2) Downgrade the mongos in a rolling fashion by stop;replace binaries;start 3) Downgrade the shards. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start 4) Downgrade the shards. Perform a stepdown and downgrade the ex-Primaries by stop;replace binaries;start 5) Downgrade the config servers. Downgrade the secondaries in a rolling fashion by stop;replace binaries;start 6) Downgrade the config servers. Perform a stepdown and downgrade the ex-Primary by stop;replace binaries;start If backwards-incompatible 3.4 features are enabled before downgrade: - db.adminCommand({setFeatureCompatibilityVersion: "3.2"}) - Remove Views - Remove Collation Option from Collections and Indexes - Convert Data of Type Decimal - Downgrade Index Versions
  • 28. Major Versions Downgrade (3.4.x to 3.2.y) www.objectrocket.com 28 s1 s2 1. Downgrade mongos 3. Downgrade config servers 2. Downgrade shards Stop the balancer Start the balancer
• 29. Downgrading - Major Versions www.objectrocket.com 29 Rollback 3.4.y to 3.2.x Remove Views Find views on a sharded cluster:
db.adminCommand("listDatabases").databases.forEach(function(d){
  let mdb = db.getSiblingDB(d.name);
  mdb.getCollectionInfos({type: "view"}).forEach(function(c){
    print(mdb[c.name]);
  });
});
In each database that contains views, drop the system.views collection to drop all views in that database.
• 30. Downgrading - Major Versions www.objectrocket.com 30 Rollback 3.4.y to 3.2.x Remove Collation Option from Collections and Indexes Find collections with non-simple collations:
db.adminCommand("listDatabases").databases.forEach(function(d){
  let mdb = db.getSiblingDB(d.name);
  mdb.getCollectionInfos( { "options.collation": { $exists: true } } ).forEach(function(c){
    print(mdb[c.name]);
  });
});
Find indexes with a collation specification:
db.adminCommand("listDatabases").databases.forEach(function(d){
  let mdb = db.getSiblingDB(d.name);
  mdb.getCollectionInfos().forEach(function(c){
    let currentCollection = mdb.getCollection(c.name);
    currentCollection.getIndexes().forEach(function(i){
      if (i.collation){ printjson(i); }
    });
  });
});
  • 31. Downgrading - Major Versions www.objectrocket.com 31 Rollback 3.4.y to 3.2.x Remove Collation Option from Collections and Indexes Indexes: Drop and recreate the indexes Collection: Rewrite the entire collection: - Stream the collection using $out - Stream the collection using dump/restore - Change the definition on one of the secondaries
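A minimal sketch of the $out approach, using hypothetical names (mydb.mycoll rewritten into mycoll_simple); the target collection is created fresh without the collation option, after which indexes are recreated and the collections swapped:
db.getSiblingDB("mydb").mycoll.aggregate([
  { $match: {} },              // stream every document
  { $out: "mycoll_simple" }    // new collection, created without the collation option
])
Recreate the indexes on mycoll_simple without collation, then rename it over the original during a write outage.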
• 32. Downgrading - Major Versions www.objectrocket.com 32 Rollback 3.4.y to 3.2.x Convert Data of Type Decimal Diagnostic utility: db.<collection>.validate(true) Can be impactful as it performs a full scan. "errmsg" : "Index version does not support NumberDecimal" How you convert the decimal type depends on your application: - Change it to string - Change to a two-field format (using a multiplier) - Change to a two-field format (store each part of the decimal in a different field)
• 33. Downgrading - Major Versions www.objectrocket.com 33 Rollback 3.4.y to 3.2.x Find version 2 indexes:
db.adminCommand("listDatabases").databases.forEach(function(d){
  let mdb = db.getSiblingDB(d.name);
  mdb.getCollectionInfos().forEach(function(c){
    let currentCollection = mdb.getCollection(c.name);
    currentCollection.getIndexes().forEach(function(i){
      if (i.v === 2){ printjson(i); }
    });
  });
});
Must be performed on both shards and config servers. Re-index the collection after changing the setFeatureCompatibilityVersion to 3.2: - _id index in the foreground - Must be executed on both Primary and Secondaries
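A minimal sketch of the re-index step, assuming a hypothetical mydb.mycoll (per the note above, run it against each Primary and Secondary):
db.adminCommand({ setFeatureCompatibilityVersion: "3.2" })
db.getSiblingDB("mydb").mycoll.reIndex()   // foreground rebuild of all indexes, including _id, as v:1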
• 34. Upgrading - Major Versions www.objectrocket.com 34 Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y 1) Use MongoDB version 3.2.4 or higher 2) Disable the balancer 3) Connect a mongo shell to the first config server listed in the configDB setting of the mongos and run rs.initiate() rs.initiate( { _id: "csReplSet", configsvr: true, version: 1, members: [ { _id: 0, host: "<host>:<port>" } ] } ) 4) Restart this config server as a single member replica set with: mongod --configsvr --replSet csReplSet --configsvrMode=sccc --storageEngine <storageEngine> --port <port> --dbpath <path> or the equivalent config file settings
• 35. Upgrading - Major Versions www.objectrocket.com 35 Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y 5) Start the new mongod instances to add to the replica set: - must use the WiredTiger storage engine - Do not add existing config servers to the replica set - Use new dbpaths for the new instances - If the config server is using MMAPv1, start 3 new mongod instances - If the config server is using WiredTiger, start 2 new mongod instances 6) Connect to the replica set config server and add the new mongod instances as non-voting, priority 0 members: - rs.add( { host: <host:port>, priority: 0, votes: 0 } ) - Wait for the initial sync to complete (SECONDARY state) 7) Shut down one of the other non-replica set config servers (2nd or 3rd) 8) Reconfigure the replica set to allow all members to vote and have default priority of 1
  • 36. Upgrading - Major Versions www.objectrocket.com 36 Upgrade major versions - Change the config servers topology - 3.2.x to 3.4.y 9) Step down the first config server and restart without the sccc flag 10) Restart mongos instances with updated --configdb or sharding.configDB setting 11) Verify that the restarted mongos instances are aware of the protocol change 12) Cleanup: - Remove the first config server using rs.remove() - Shutdown 2nd and 3rd config servers Next steps are similar to slide Upgrade major versions - Without changing config servers topology
  • 37. Change from SCCC to CSRS www.objectrocket.com 37 1) rs.initiate 2) Restart with --replSet 3) Start 3 new cfgsrv 4) Add new cfgsrv to RS 5) Shutdown a cfgsrv 6) rs.reconfig 7) rs.stepdown and restart 8) Restart mongos 9) Cleanup
• 38. Downgrading - Major Versions www.objectrocket.com 38 Downgrade major versions - Change the config servers topology (3.2.x to 3.0.y) 1) Disable the balancer 2) Remove or replace incompatible indexes: - Partial indexes - Text indexes version 3 - Geo indexes version 3 3) Check the minOpTimeUpdaters value on every shard - Must be zero - If it's non-zero without any active migration ongoing, a stepdown is needed 4) Keep only two config server secondaries and set their votes and priority equal to zero, using rs.reconfig() 5) Stepdown the config server primary db.adminCommand( { replSetStepDown: 360, secondaryCatchUpPeriodSecs: 300 }) 6) Stop the world - shut down all mongos/shards/config servers
  • 39. Downgrading - Major Versions www.objectrocket.com 39 Downgrade major versions - Change the config servers topology(3.2.x to 3.0.y) 7) Restart each config server as standalone 8) Start all shards and change the protocolVersion=0 9) Downgrade the mongos to 3.0 and change the configsrv to SCCC 10) Downgrade the config servers to 3.0 11) Downgrade each shard - one at a time - Remove the minOpTimeRecovery document from the admin.system.version - Downgrade the secondaries and then issue a stepdown - Downgrade former primaries
• 40. Change from CSRS to SCCC www.objectrocket.com 40 1) Check minOpTimeUpdaters 2) rs.reconfig() set priority=0 3) rs.stepdown() 4) Stop the world 5) Start cfgsrv as standalone 6) Start all shards 7) Start the mongos with SCCC
  • 41. www.objectrocket.com 41 Lab 01 1. Unzip the provided .vdi file 2. Install and or Open VirtualBox 3. Select New 4. Enter A Name 5. Select Type: Linux 6. Select Version: Red Hat (64-bit) 7. Set Memory to 4096 (if possible) 8. Select "Use an existing ... disk file", select the provided .vdi file. 9. Select Create 10. Select Start 11. CentOS will boot and auto-login 12. Navigate to /Percona2017/Lab01/exercises for the first lab. https://goo.gl/ddaVdS Setup A Sharded Cluster
  • 42. Shard Key ● What is it? ● Limitations ● Chunks ● Metadata ○ Change Log ○ Collections ○ Chunks www.objectrocket.com 42
  • 43. Shard Key www.objectrocket.com 43 This is how MongoDB distributes and finds documents for a sharded collection. Considerations of a good shard key: • Fields exist on every document • Indexed field or fields for fast lookup • High cardinality for good distribution of data across the cluster • Random to allow for increased throughput • Usable by the application for targeted operations
• 44. Shard Key Limitations www.objectrocket.com 44 There are requirements when defining a shard key that can affect both the cluster and application. • Key is immutable • Value is also immutable • For collections containing data an index must be present • Prefix of a compound index is usable • Ascending order is required • Update and findAndModify operations must contain the shard key • Unique constraints must be maintained by the shard key or a prefix of the shard key • Hashed indexes can not enforce uniqueness, therefore are not allowed • A non-hashed index can be added with the unique option • A shard key cannot contain special index types (i.e. text)
• 45. Shard Key Limitations www.objectrocket.com 45 The shard key must not exceed 512 bytes. The following script will reveal documents with long shard keys:
db.<collection>.find({},{<shard_key>:1}).forEach(function(shardkey){
  var size = Object.bsonsize(shardkey);
  if (size > 512){ print(shardkey._id); }
})
Mongo will allow you to shard the collection even if you have existing shard keys over 512 bytes, but the next insert with a shard key > 512 bytes fails with: "code" : 13334, "errmsg" : "shard keys must be less than 512 bytes, but key <shard key> is ... bytes"
• 46. Shard Key Limitations www.objectrocket.com 46 Shard Key Index Type A shard key index can be an ascending index on the shard key, a compound index that starts with the shard key and specifies ascending order for the shard key, or a hashed index. A shard key index cannot be a multi-key index, a text index or a geospatial index on the shard key fields. If you try to shard with a -1 index you will get an error: “Unsupported shard key pattern. Pattern must either be a single hashed field, or a list of ascending fields” If you try to shard with a “text”, “multi-key” or “geo” index you will get an error: “couldn't find valid index for shard key”
• 47. Shard Key Limitations www.objectrocket.com 47 Unique Indexes Sharded collections support at most one unique index. The shard key MUST be a prefix of the unique index. If you attempt to shard a collection with more than one unique index, or using a field different than the unique index, an error is produced: ”Can't shard collection <col name> with unique index on <field1:1> and proposed shard key <field2:2>. Uniqueness can't be maintained unless shard key is a prefix" To maintain more than one unique index use a proxy collection. A client-generated _id is unique by design; if you are using a custom _id you must preserve uniqueness from the application tier.
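A minimal sketch of the proxy-collection idea, using hypothetical names (mydb.users sharded on uid, with email uniqueness enforced through a separate proxy collection): reserve the unique value in the proxy first, and only insert the real document if the reservation succeeds.
// proxy collection holding only the value that must stay unique
db.getSiblingDB("mydb").users_email_proxy.createIndex({ email: 1 }, { unique: true })
function insertUser(user) {
  var res = db.getSiblingDB("mydb").users_email_proxy.insert({ _id: user.email, email: user.email });
  if (res.nInserted === 1) {
    // reservation succeeded, safe to insert into the sharded collection
    db.getSiblingDB("mydb").users.insert(user);
  } else {
    print("duplicate email: " + user.email);
  }
}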
• 48. Shard key Limitations www.objectrocket.com 48 Field(s) must exist on every document If you try to shard a collection with null values in the shard key an exception is produced: "found missing value in key { : null } for doc: { _id: <value>}" On compound shard keys none of the fields is allowed to have null values. A handy script to identify NULL values is the following. You need to execute it for each of the shard key fields: db.<collection_name>.find({<shard_key_element>:{$exists:false}}) A potential solution is to replace NULL with a dummy value that your application will read as NULL. Be careful because a “dummy NULL” might create a hotspot.
• 49. Shard key Limitations www.objectrocket.com 49 Updates and findAndModify must use the shard key Updates and findAndModify must use the shard key on the query predicates. If an update or FAM is executed on a field different than the shard key or _id, the following error is produced: "A single update on a sharded collection must contain an exact match on _id (and have the collection default collation) or contain the shard key (and have the simple collation). Update request: { q: { <field1> }, u: { $set: { <field2> } }, multi: false, upsert: false }, shard key pattern: { <shard_key>: 1 }” For update operations the workaround is to use the {multi:true} flag. For findAndModify a {multi:true} flag doesn’t exist. For upserts, _id instead of <shard key> is not applicable.
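A minimal sketch of the failure and the workarounds, assuming a hypothetical collection sharded on {u_id: 1}:
// fails: predicate contains neither the shard key nor _id
db.mycoll.update({ email: "jdoe@example.com" }, { $set: { active: false } })
// works: multi-updates are broadcast to all shards
db.mycoll.update({ email: "jdoe@example.com" }, { $set: { active: false } }, { multi: true })
// works and is targeted: predicate contains the shard key
db.mycoll.update({ u_id: 42, email: "jdoe@example.com" }, { $set: { active: false } })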
• 50. Shard key Limitations www.objectrocket.com 50 Sharding Existing Collection Data Size You can’t shard collections whose size violates the maxCollectionSize, as defined below: maxSplits = 16777216 (bytes) / <average size of shard key values in bytes> maxCollectionSize (MB) = maxSplits * (chunkSize / 2) Maximum Number of Documents Per Chunk to Migrate MongoDB cannot move a chunk if the number of documents in the chunk exceeds: • 250000 documents, or 1.3 times the result of dividing the configured chunk size by the avgObjSize For example, with an avgObjSize of 512 bytes and a chunk size of 64MB a chunk is considered Jumbo at 170394 documents.
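A worked example of the maxCollectionSize formula, under assumed numbers (average shard key value of 128 bytes, default 64MB chunk size):
maxSplits = 16777216 / 128 = 131072
maxCollectionSize = 131072 * (64 / 2) = 4194304 MB ≈ 4 TB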
• 51. Shard key Limitations www.objectrocket.com 51 Operations not allowed on a sharded collection • group collection method - Deprecated since version 3.4 - Use mapReduce or aggregate instead • db.eval() - Deprecated since version 3.0 • $where does not permit references to the db object from the $where function (this restriction does not apply to un-sharded collections, where such references are uncommon anyway) • $isolated update modifier • $snapshot queries • The geoSearch command
• 52. Chunks www.objectrocket.com 52 The purpose of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster. • Maximum size is defined in config.settings • Default 64MB • Hardcoded maximum document count of 250,000 • Chunk map is stored in config.chunks • Continuous range from MinKey to MaxKey • Chunk map is cached at both the mongos and mongod • Query Routing • Sharding Filter • Chunks distributed by the Balancer • Using moveChunk • Up to maxSize
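The maximum chunk size can be inspected and changed through config.settings; a minimal sketch (the _id "chunksize" document and MB units follow the documented convention):
db.getSiblingDB("config").settings.find({ _id: "chunksize" })               // current max chunk size
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 32 })    // lower it to 32MB cluster-wide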
• 53. Chunk www.objectrocket.com 53
{
  "_id" : "mydb.mycoll-uuid_\"00005cf6-1217-4414-935b-cf1bde09cc77\"",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : { "uuid" : "00005cf6-1217-4414-935b-cf1bde09cc77" },
  "max" : { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
  "shard" : "s1"
}
config.chunks
  • 54. Additional Sharding Metadata www.objectrocket.com 54 Collections • Historical Sharding and Balancing Activity • config.changelog • config.actionlog • Additional Balancing Settings • config.settings • All databases sharded (and non-sharded) • config.databases • Balancer Activity • config.locks • Shards • config.shards
  • 55. Shard Key Selection ● Profiling ● Cardinality ● Throughput ○ Targeted Operations ○ Broadcast Operations ● Anti-Patterns ● Data Distribution www.objectrocket.com 55
  • 56. Shard key selection www.objectrocket.com 56 Famous quotes about sharding: “Sharding is an art” “There is no such thing as the perfect shard key” “Friends don't let friends use sharding”
  • 57. Shard key selection www.objectrocket.com 57 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
  • 58. Shard key selection www.objectrocket.com 58 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
  • 59. Shard key selection–Profiling www.objectrocket.com 59 Profiling will help you identify your workload. Enable statement profiling on level 2 (collects profiling data for all database operations) db.getSiblingDB(<database>).setProfilingLevel(2) To collect a representative sample you might have to increase profiler size db.getSiblingDB(<database>).setProfilingLevel(0) db.getSiblingDB(<database>).system.profile.drop() db.getSiblingDB(<database>).createCollection( "system.profile", { capped: true, size: <size>} ) db.getSiblingDB(<database>).setProfilingLevel(2) *size is in bytes
• 60. Shard key selection–Profiling www.objectrocket.com 60 If you don’t have enough space you may either: - Periodically dump the profiler and restore to a different instance - Use a tailable cursor and save the output on a different instance The profiler adds overhead to the instance due to the extra inserts. When analyzing the output it is recommended to: - Dump the profiler to another instance using a non-capped collection - Keep only the useful dataset - Create the necessary indexes
  • 61. Shard key selection–Profiling www.objectrocket.com 61 {op:"query" , ns:<db.col>} , reveals find operations Example: { "op" : "query", "ns" : "foo.foo", "query" : { "find" : "foo", "filter" : { "x" : 1 } ... {op:"insert" , ns:<db.col>} , reveals insert operations Example: { "op" : "insert", "ns" : "foo.foo", "query" : { "insert" : "foo", "documents" : [ { "_id" : ObjectId("58dce9730fe5025baa0e7dcd"), "x" : 1} ] ... {op:"remove" , ns:<db.col>} , reveals remove operations Example: {"op" : "remove", "ns" : "foo.foo", "query" : { "x" : 1} ... {op:"update" , ns:<db.col>} , reveals update operations Example: { "op" : "update", "ns" : "foo.foo", "query" : { "x" : 1 }, "updateobj" : { "$set" : { "y" : 2 } } ...
• 62. Shard key selection–Profiling www.objectrocket.com 62 { "op" : "command", "ns" : <db.col>, "command.findAndModify" : <col>} reveals FAMs Example: { "op" : "command", "ns" : "foo.foo", "command" : { "findAndModify" : "foo", "query" : { "x" : "1" }, "sort" : { "y" : 1 }, "update" : { "$inc" : { "z" : 1 } } }, "updateobj" : { "$inc" : { "z" : 1 } } ... {"op" : "command", "ns" : <db.col>, "command.aggregate" : <col>} reveals aggregations Example: { "op" : "command", "ns" : "foo.foo", "command" : { "aggregate" : "foo", "pipeline" : [ { "$match" : { "x" : 1} } ] ... { "op" : "command", "ns" : <db.col>, "command.count" : <col>} reveals counts Example: { "op" : "command", "ns" : "foo.foo", "command" : { "count" : "foo", "query" : { "x" : 1} } ... More commands (mapReduce, distinct, geoNear) follow the same pattern. Be aware that the profiler format may differ between major versions.
  • 63. Shard key selection www.objectrocket.com 63 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
• 64. Shard key selection–Patterns www.objectrocket.com 64 Identify the workload nature (type of statements):
db.system.profile.aggregate([
  {$match: {ns: <db.col>}},
  {$group: {_id: "$op", number: {$sum: 1}}},
  {$sort: {number: -1}}
])
var cmdArray = ["aggregate", "count", "distinct", "group", "mapReduce", "geoNear", "geoSearch", "find", "insert", "update", "delete", "findAndModify", "getMore", "eval"];
cmdArray.forEach(function(cmd) {
  var c = "<col>";
  var y = "command." + cmd;
  var z = '{"' + y + '": "' + c + '"}';
  var obj = JSON.parse(z);
  var x = db.system.profile.find(obj).count();
  if (x > 0) {
    printjson(obj);
    print("Number of occurrences: " + x);
  }
});
• 65. Shard key selection–Patterns www.objectrocket.com 65 Identify statement patterns:
var tSummary = {}
db.system.profile.find(
  { op: "query", ns: { $in: ['<ns>'] } },
  { ns: 1, "query.filter": 1 }
).forEach(function(doc){
  var tKeys = [];
  if (doc.query.filter === undefined) {
    for (var key in doc.query) { tKeys.push(key); }          // legacy query document format
  } else {
    for (var key in doc.query.filter) { tKeys.push(key); }   // 3.2+ find command format
  }
  var sKeys = tKeys.join(',');
  if (tSummary[sKeys] === undefined) {
    tSummary[sKeys] = 1;
    print("Found new pattern of: " + sKeys);
  } else {
    tSummary[sKeys] += 1;
    print("Incremented " + sKeys);
  }
})
  • 66. Shard key selection–Patterns www.objectrocket.com 66 At this stage you will be able to create a report similar to: Collection <Collection Name> - Profiling period <Start time> , <End time> - Total statements: <num> Number of Inserts: <num> Number of Queries: <num> Query Patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num> Number of Updates: <num> Update patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num> Number of Removes: <num> Remove patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num> Number of FindAndModify: <num> FindandModify patterns: {pattern1}: <num> , {pattern2}: <num>, {pattern3}: <num>
  • 67. Shard key selection–Patterns www.objectrocket.com 67 Using the statement’s patterns you may identify shard key candidates Shard key candidates may be a single field ({_id:1}) or a group of fields ({name:1, age:1}). Note down the amount and percentage of operations that each shard key candidate can “scale” Note down the important operations on your cluster Order the shard key candidates by descending <importance>,<scale%> Identify exceptions and constraints as described on shard key limitations If a shard key candidate doesn’t satisfy a constraint you may either: - Remove it from the list, OR - Make the necessary changes on your schema
  • 68. Shard key selection www.objectrocket.com 68 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
• 69. Shard key selection- Constraints www.objectrocket.com 69 Shard key must not have NULL values: db.<collection>.find({<shard_key>:{$exists:false}}) , in the case of a compound key each element must be checked for NULL Shard key is immutable: Check if collected updates or findAndModify have any of the shard key components in the “updateobj”:
db.system.profile.find({"ns" : <col>, "op" : "update"},{"updateobj":1}).forEach(function(op) {
  if (op.updateobj.$set.<shard_key> != undefined) printjson(op.updateobj);
})
db.system.profile.find({"ns" : <col>, "op" : "update"},{"updateobj":1}).forEach(function(op) {
  if (op.updateobj.<shard_key> != undefined) printjson(op.updateobj);
})
Assuming you are running a replica set, the oplog may also prove useful
• 70. Shard key selection- Constraints www.objectrocket.com 70 Updates must use the shard key or _id on the query predicates: We can identify the query predicates using the exported patterns Shard key must have good cardinality db.<col>.distinct(<shard_key>).length OR db.<col>.aggregate([{$group: { _id:"$<shard_key>" ,n : {$sum:1}}},{$count:"n"}]) We define cardinality as <distinct values> / <total docs> The closer to 1 the better, for example a unique constraint has cardinality of 1 Cardinality is important for splits. Fields with cardinality close to 0 may create indivisible jumbo chunks
• 71. Shard key selection- Constraints www.objectrocket.com 71 Check for data hotspots:
db.<col>.aggregate([
  {$group: {_id: "$<shard_key>", number: {$sum: 1}}},
  {$sort: {number: -1}},
  {$limit: 100}
], {allowDiskUse: true})
We don’t want a single range to exceed the 250K document limit We have to try and predict the future based on the above values Check for operational hotspots:
db.system.profile.aggregate([
  {$group: {_id: "$query.filter.<shard_key>", number: {$sum: 1}}},
  {$sort: {number: -1}},
  {$limit: 100}
], {allowDiskUse: true})
We want uniformity as much as possible. Hotspotted ranges may affect scaling
• 72. Shard key selection- Constraints www.objectrocket.com 72 Check for monotonically increasing fields: Monotonically increasing fields may affect scaling as all new inserts are directed to the maxKey chunk Typical examples: _id, timestamp fields, client-side generated auto-increment numbers db.<col>.find({},{<shard_key>:1}).sort({$natural:1}) Workarounds: Hashed shard keys, application-level hashing, or compound shard keys
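A minimal sketch of the hashed-key workaround, assuming a hypothetical mydb.events collection with a monotonically increasing ts field:
sh.enableSharding("mydb")
db.getSiblingDB("mydb").events.createIndex({ ts: "hashed" })   // required if the collection already contains data
sh.shardCollection("mydb.events", { ts: "hashed" })            // inserts now spread across chunks/shards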
• 73. Shard key selection- Constraints www.objectrocket.com 73 Throughput: Targeted operations - Operations that use the shard key and access only one shard - Adding shards to targeted operations increases throughput linearly (or almost linearly) - Ideal type of operation on a sharded cluster Scatter-gather operations - Operations that don’t use the shard key, or use only a part of it - Will access more than one shard - sometimes all shards - Adding shards won’t increase throughput linearly - Mongos opens a connection to all shards / higher connection count - Mongos fetches a larger amount of data on each iteration - A merge phase is required - Unpredictable statement execution behavior - Queries, Aggregations, Updates, Removes (Inserts and FAMs are always targeted)
• 74. Shard key selection-Constraints www.objectrocket.com 74 Throughput: Scatter-gather operations are not always “evil’’ - Sacrifice reads to scale writes on a write-intensive workload - Certain read workloads like Geo or Text searches are scatter-gather anyway - It may be faster to divide an operation into N pieces and execute them in parallel - Smaller indexes & dataset - Other datastores like Elastic use scatter-gather for queries
• 75. Shard key selection-Constraints www.objectrocket.com 75 Data distribution Factors that may cause an imbalance: - Poor cardinality - Good cardinality with data hotspots - TTL indexes - Data pruning - Orphans Always plan ahead - Try to simulate the collection dataset in 3, 6 and 12 months - Consider your deletion strategy before capacity becomes an issue
• 76. Shard key selection-Anti-Patterns www.objectrocket.com 76 - Monotonically increasing fields - _id or timestamps - Use hashed or compound keys instead - Poor cardinality field(s) - country or city - Eventually leads to jumbo chunks - Use compound keys instead - Operational hotspots - client_id or user_id - Affect scaling - Use compound keys instead - Data hotspots - client_id or user_id - Affect data distribution and may lead to jumbo chunks - Use compound keys instead
  • 77. Shard key selection www.objectrocket.com 77 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
• 78. Shard key selection-Code Changes www.objectrocket.com 78 Remove unique constraints - Sharded collections can only support one unique index - Use a satellite collection for multiple unique indexes - Change your updates to use the unique index rather than the “_id” - Careful of custom _id Change statements - Make sure FAMs are using the shard key - Make sure updates are using the shard key or the {multi:true} flag - Remove geoSearch, eval(), $isolated, $snapshot, group and $where Schema changes - Replace null in the shard key with “dummy” values - Implement an Insert+Delete functionality for changing shard keys
  • 79. Shard key selection www.objectrocket.com 79 Profiling Identify Patterns Constraints/Best practices Code changes Apply shard key(s)
  • 80. www.objectrocket.com 80 Lab 02 Lab 02. Navigate to /Percona2017/Lab02/exercises https://goo.gl/ddaVdS Sharding Analysis
  • 81. Balancing & Chunk Management ● Sizing and Limits ● moveChunk ● Pre-Splitting ● Splitting ● Merging www.objectrocket.com 81
• 82. Sizing and Limits www.objectrocket.com 82 Under normal circumstances the default size is sufficient for most workloads. • Maximum size is defined in config.settings • Default 64MB • Hardcoded maximum document count of 250,000 • Chunks that exceed either limit are referred to as Jumbo • Cannot be moved unless split into smaller chunks • Ideal data size per chunk is (chunk_size/2) • Chunks can have a zero size with zero documents • Jumbos can grow unbounded • A unique shard key guarantees no Jumbos (i.e. max document size is 16MB) post splitting
• 83. dataSize www.objectrocket.com 83 Estimated Size and Count (Recommended)
db.adminCommand({
  dataSize: "mydb.mycoll",
  keyPattern: { "uuid" : 1 },
  min: { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
  max: { "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982" },
  estimate: true
});
Actual Size and Count
db.adminCommand({
  dataSize: "mydb.mycoll",
  keyPattern: { "uuid" : 1 },
  min: { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
  max: { "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982" }
});
  • 85. Build Indexes www.objectrocket.com 85 s1 s2 createIndex The destination shard builds all non-existing indexes for the collection.
  • 86. Clone Collection www.objectrocket.com 86 s1 s2 insert() The destination shard requests documents from the source and inserts them into the collection.
  • 87. Sync Collection www.objectrocket.com 87 s1 s2 Synchronize _id’s Operations executed on source shard after the start of the migration are logged for synchronizing.
  • 88. Commit www.objectrocket.com 88 s1 s2 Finalize the chunk migration by committing metadata changes.
  • 89. Clean Up - RangeDeleter www.objectrocket.com 89 This process is responsible for removing the chunk ranges that were moved by the balancer (i.e. moveChunk). ● This thread only runs on the primary ● Is not persistent in the event of an election ● Can be blocked by open cursors ● Can have multiple ranges queued to delete(_waitForDelete: false) ● If a queue is present, the shard can not be the destination for a new chunk
  • 90. Pre-splitting (Hashed) www.objectrocket.com 90 Shard keys using MongoDB’s hashed index allow the use of numInitialChunks. Hashing Mechanism jdoe@example.com 694ea0904ceaf766c6738166ed89bafb NumberLong(“7588178963792066406”) Value 64-bits of MD5 64-bit Integer Estimation Size = Collection size (in MB)/32 Count = Number of documents/125000 Limit = Number of shards*8192 numInitialChunks = Min(Max(Size, Count), Limit) 1,600 = 51,200/32 800 = 100,000,000/125,000 32,768 = 4*8192 1600 = Min(Max(1600, 800), 32768) Command db.runCommand( { shardCollection: "mydb.mycoll", key: { "uid": "hashed" }, numInitialChunks : 5510 } );
• 91. Pre-splitting - splitAt() www.objectrocket.com 91 When the shard key is not hashed but throughput requires pre-splitting with sh.splitAt(). 1. In a testing environment shard the collection to generate split points. a) Alternative: Calculate split points manually by analyzing the shard key field(s) db.runCommand( { shardCollection: "mydb.mycoll", key: { "u_id": 1, "c": 1 } } )
db.chunks.find({ns: "mydb.mycoll"}).sort({min: 1}).forEach(function(doc) {
  print("sh.splitAt('" + doc.ns + "', { \"u_id\": ObjectId(\"" + doc.min.u_id + "\"), \"c\": \"" + doc.min.c + "\" });");
});
  • 92. splitAt() - cont. www.objectrocket.com 92 2. Identify enough split points to create N chunks which is at least 2x the number of shards and build a JavaScript file containing your split points 3. Shard the collection and then execute the splitAt() commands to create chunks db.runCommand( { shardCollection: "mydb.mycoll", key: { "u_id": 1, "c": 1 } } ) sh.splitAt('mydb.mycoll', { "u_id": ObjectId("581939bd1b0e4d73dd3d72db"), "c": ISODate("2017-01-31T12:02:45.532Z") }); sh.splitAt('mydb.mycoll', { "u_id": ObjectId("5800ed895c5a687c99c2f2ae"), "c": ISODate("2017-01-31T11:02:46.522Z") }); …. sh.splitAt('mydb.mycoll', { "u_id": ObjectId("5823859e5c5a6829cc3b09d0"), "c": ISODate("2017-01-31T13:22:48.522Z") }); sh.splitAt('mydb.mycoll', { "u_id": ObjectId("582f53315c5a68669f569c21"), "c": ISODate("2017-01-31T13:12:49.532Z") }); …. sh.splitAt('mydb.mycoll', { "u_id": ObjectId("581939bd1b0e4d73dd3d72db"), "c": ISODate("2017-01-31T17:32:41.552Z") }); sh.splitAt('mydb.mycoll', { "u_id": ObjectId("58402d255c5a68436b40602e"), "c": ISODate("2017-01-31T19:32:42.562Z") });
  • 93. splitAt() - cont. www.objectrocket.com 93 4. Using moveChunk distribute the chunks evenly. Optional: Now that the chunks are distributed evenly run more splits to prevent immediate splitting. db.runCommand({ "moveChunk": "mydb.mycoll", "bounds": [{ "u_id": ObjectId("52375761e697aecddc000026"), "c": ISODate("2017-01-31T00:00:00Z") }, { "u_id": ObjectId("533c83f6a25cf7b59900005a"), "c": ISODate("2017-01-31T00:00:00Z") } ], "to": "rs1", "_secondaryThrottle": true });
  • 94. Shard Management ● Adding Shards ● Removing Shards ● Replacing Shards ○ Virtual ○ Physical www.objectrocket.com 94
  • 95. Shard Management-Operations www.objectrocket.com 95 Add shard(s): - Increase capacity - Increase throughput or Re-plan Remove shard(s): - Reduce capacity - Re-plan Replace shard(s): - Change Hardware specs - Hardware maintenance - Move to different underlying platform
• 96. Shard Mgmt – Add shard(s) www.objectrocket.com 96 Command for adding shards sh.addShard(host) Host is either a standalone or a replica set instance Alternatively db.getSiblingDB('admin').runCommand( { addShard: host} ) Optional parameters with runCommand: maxSize (int): The maximum size in megabytes of the shard name (string): A unique name for the shard Localhost addresses and hidden members can’t be used in the host variable
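A minimal sketch, assuming a hypothetical replica set rs2 on hosts node1/node2:
sh.addShard("rs2/node1.example.net:27018,node2.example.net:27018")
// or, with the optional parameters:
db.getSiblingDB('admin').runCommand({ addShard: "rs2/node1.example.net:27018", maxSize: 102400, name: "shard-rs2" })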
• 97. Shard Mgmt-Remove shard(s) www.objectrocket.com 97 Command for removing shards: db.getSiblingDB('admin').runCommand( { removeShard: host } ) The host’s data (sharded and unsharded) MUST be migrated to other shards in the cluster 1) Ensure that the balancer is running 2) Execute db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ) MongoDB will print: "msg" : "draining started successfully", "state" : "started", "shard" : "<shard>", "note" : "you need to drop or movePrimary these databases", "dbsToMove" : [ ], "ok" : 1
• 98. Shard Mgmt-Remove shard(s) www.objectrocket.com 98 3) The balancer will now start moving chunks from <shard_name> to all other shards 4) Check the status using db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ), MongoDB will print: { "msg" : "draining ongoing", "state" : "ongoing", "remaining" : { "chunks" : <num_of_chunks_remaining>, "dbs" : 1 }, "ok" : 1 }
• 99. Shard Mgmt-Remove shard(s) www.objectrocket.com 99 5) Run db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> }) one last time MongoDB will print: { "msg" : "removeshard completed successfully", "state" : "completed", "shard" : "<shard_name>", "ok" : 1 } If the shard is the primary shard for one or more databases it may host unsharded collections. Removal will only complete after moving the unsharded data to a different shard
  • 100. Shard Mgmt-Remove shard(s) www.objectrocket.com 100 6) Check the status using db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ), MongoDB will print: { "msg" : "draining ongoing", "state" : "ongoing", "remaining" : { "chunks" : NumberLong(0), "dbs" : NumberLong(1) }, "note" : "you need to drop or movePrimary these databases", "dbsToMove" : ["<database name>"], "ok" : 1 }
  • 101. Shard Mgmt-Remove shard(s) www.objectrocket.com 101 7) Use the following command to movePrimary: db.getSiblingDB('admin').runCommand( { movePrimary: <db name>, to: "<shard name>" }) MongoDB will print: {"primary" : "<shard_name>:<host>","ok" : 1} 8) After you move all databases you will be able to remove the shard: db.getSiblingDB('admin').runCommand( { removeShard: <shard_name> } ) MongoDB will print: {"msg" : "removeshard completed successfully", "state" : "completed", "shard" : "<shard_name>", "ok" : 1}
• 102. Shard Mgmt-Remove shard(s) www.objectrocket.com 102 movePrimary: - Requires write downtime for the unsharded collections - Without write downtime you may encounter data loss - May impact performance (massive writes & foreground index builds) Write downtime can be implemented at: - Application layer - Database layer - Stop all mongos, or drop the database users (then restart the mongos) flushRouterConfig is mandatory when you movePrimary - db.adminCommand({flushRouterConfig: 1})
• 103. Shard Mgmt-Remove shard(s) www.objectrocket.com 103 Calculate the average chunk move time:
db.getSiblingDB("config").changelog.aggregate([
  {$match: {"what": "moveChunk.from"}},
  {$project: {"time1": "$details.step 1 of 7", "time2": "$details.step 2 of 7", "time3": "$details.step 3 of 7", "time4": "$details.step 4 of 7", "time5": "$details.step 5 of 7", "time6": "$details.step 6 of 7", "time7": "$details.step 7 of 7", ns: 1}},
  {$group: {_id: "$ns", "avgtime": {$avg: {$sum: ["$time1", "$time2", "$time3", "$time4", "$time5", "$time6", "$time7"]}}}},
  {$sort: {avgtime: -1}},
  {$project: {collection: "$_id", "avgt": {$divide: ["$avgtime", 1000]}}}
])
Calculate the number of chunks to be moved:
db.getSiblingDB("config").chunks.aggregate([
  {$match: {"shard": <shard>}},
  {$group: {_id: "$ns", number: {$sum: 1}}}
])
• 104. Shard Mgmt-Remove shard(s) www.objectrocket.com 104 Calculate non-sharded collection size:
function FindCostToMovePrimary(shard){
  var moveCostMB = 0;
  var DBmoveCostMB = 0;
  db.getSiblingDB('config').databases.find({primary: shard}).forEach(function(d){
    db.getSiblingDB(d._id).getCollectionNames().forEach(function(c){
      // unsharded (no shard key in config.collections) and not a system collection
      if (db.getSiblingDB('config').collections.find({_id: d._id + "." + c, key: {$exists: true}}).count() < 1 && ! /system/.test(c)){
        var x = db.getSiblingDB(d._id).getCollection(c).stats();
        var collectionSize = Math.round((x.size + x.totalIndexSize) / 1024 / 1024 * 100) / 100;
        moveCostMB += collectionSize;
        DBmoveCostMB += collectionSize;
      }
    });
    print(d._id);
    print("Cost to move database:\t" + DBmoveCostMB + "M");
    DBmoveCostMB = 0;
  });
  print("Cost to move:\t" + moveCostMB + "M");
}
  • 105. Shard Mgmt-Remove shard(s) www.objectrocket.com 105 s1 s2 1) {removeShard: ‘s2’} Chunks 2) Drain chunks 3) Move Primary 5) {removeShard: ‘s2’} 4) flushRouterConfig Database ? ? ?
• 106. Shard Mgmt-Replace shard(s) www.objectrocket.com 106 Drain Method using javascript 1) Stop the balancer 2) Add the target shard <target> 3) Create and execute chunk moves from <source> to <target>
db.chunks.find({shard: "<source>"}).forEach(function(chunk){
  print("db.adminCommand({moveChunk: '" + chunk.ns + "', bounds: [" + tojson(chunk.min) + ", " + tojson(chunk.max) + "], to: '<target>'})");
})
mongo <host:port> drain.js | tail -n +1 | sed 's/{ "$maxKey" : 1 }/MaxKey/' | sed 's/{ "$minKey" : 1 }/MinKey/' > run_drain.js
4) Remove the <source> shard (movePrimary may be required)
• 107. Shard Mgmt-Replace shard(s) www.objectrocket.com 107 Drain Method using the Balancer 1) Stop the balancer 2) Add the target shard <target> 3) Run the remove command for the <source> shard 4) Optional: Set maxSize=1 for all shards except <source> and <target> 5) Start the balancer - it will move chunks from <source> to <target> 6) Finalize the <source> removal (movePrimary may be required)
• 108. Shard Mgmt-Replace shard(s) www.objectrocket.com 108 Replica-set extension 1) Stop the balancer 2) Add additional mongods to the replica set <source> 3) Wait for the new nodes to become fully synced 4) Perform a stepdown and promote one of the newly added nodes as the new Primary 5) Remove the <source> nodes from the replica set 6) Start the balancer
  • 109. Troubleshooting ● Hotspots ● Imbalanced Data ● Configuration Servers ● RangeDeleter ● Orphans ● Collection Resharding www.objectrocket.com 109
  • 110. Hotspotting www.objectrocket.com 110 s1 s2 80% 20% Possible Causes: • Shard Key • Unbalanced Collections • Unsharded Collections • Data Imbalance • Randomness
• 111. Hotspotting - Profiling www.objectrocket.com 111 Knowing the workload this is how you can identify your top u_id’s.
db.system.profile.aggregate([
  {$match: { $and: [ {op: "update"}, {ns: "mydb.mycoll"} ] }},
  {$group: { "_id": "$query.u_id", count: {$sum: 1}}},
  {$sort: {"count": -1}},
  {$limit: 5}
]);
{ "result" : [ { "_id" : ObjectId("53fcc281a25cf72b59000013"), "count" : 28672 }, .......... ], "ok" : 1 }
• 112. Hotspotting - SplitAt() www.objectrocket.com 112
{
  "_id" : "mydb.mycoll-u_id_ObjectId('533974d30daac135a9000049')",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : { "u_id" : ObjectId('533974d30daac135a9000049') },
  "max" : { "u_id" : ObjectId('545d29e74d32da4e290000cf') },
  "shard" : "s1"
}
{ "u_id" : ObjectId('53fcc281a25cf72b59000013'), … { "u_id" : ObjectId('54062bd2a25cf7e051000017'), … { "u_id" : ObjectId('5432ada44d32dadcc400007b'), … { "u_id" : ObjectId('5433f7254d32daa1e3000198'), … sh.splitAt() Unbalanced Operations
  • 113. Data Imbalance – Disk Usage www.objectrocket.com 113 s1 s2 s1 s2 Unbalanced Resources
  • 114. Data Imbalance www.objectrocket.com 114 Common Causes ● Balancing has been disabled or window to small ○ sh.getBalancerState(); ○ db.getSiblingDB("config").settings.find({"_id" : "balancer"}); ● maxSize has been reached or misconfigured across the shards ○ db.getSiblingDB("config").shards.find({},{maxSize:1}); ● Configuration servers in an inconsistent state ○ db.getSiblingDB("config").runCommand( {dbHash: 1} ); ● Jumbo Chunks ○ Previously covered, chunks that exceed chunksize or 250,000 documents ● Empty Chunks ○ Chunks that have no size and contain no documents ● Unsharded Collections ○ Data isolated to primary shard for the database ● Orphaned Documents
  • 115. Data Imbalance www.objectrocket.com 115 s1 s2 80% 20% s1 s2 Unbalanced Operations Unbalanced Resources Empty Chunks { "uuid" : "00005cf6-....." } -->> { "uuid" : "7fe55637-....." } on : s1 100K Docs @ 45MB { "uuid" : "7fe55637-....." } -->> { "uuid" : "7fe55742-....." } on : s2 0 Docs @ 0MB
  • 116. Data Imbalance - Empty Chunks www.objectrocket.com 116 Unevenly distributed remove or TTL operations insert() & split insert() & split insert() -> split -> remove() insert() -> split -> remove() s1 s2 s3 s4 moveChunk
• 117. Data Imbalance - Merging www.objectrocket.com 117 Using JavaScript the following process can resolve the imbalance. 1. Check balancer state, set to false if not already. 2. Identify empty chunks and their current shard. a. dataSize command covered earlier in the presentation, record this output 3. Locate the adjacent chunk and its current shard, this is done by comparing max and min a. The chunk map is continuous, there are no gaps between max and min 4. If the shard is not the same, move the empty chunks to the shard containing the adjacent chunk a. Moving the empty chunk is cheap but will cause frequent metadata changes 5. Merge empty chunks into their adjacent chunks a. Merging chunks is also non-blocking but will cause frequent metadata changes A sketch of steps 4 and 5 follows. Consideration, how frequent is this type of maintenance required?
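A minimal sketch of steps 4 and 5, reusing the elided uuid ranges from the earlier slide and the documented moveChunk and mergeChunks admin commands; mergeChunks requires the bounds to cover contiguous chunks residing on a single shard:
// step 4: move the empty chunk to the shard holding its adjacent chunk
db.adminCommand({ moveChunk: "mydb.mycoll", bounds: [ { "uuid" : "7fe55637-....." }, { "uuid" : "7fe55742-....." } ], to: "s1" })
// step 5: merge the empty chunk into its neighbour
db.adminCommand({ mergeChunks: "mydb.mycoll", bounds: [ { "uuid" : "00005cf6-....." }, { "uuid" : "7fe55742-....." } ] })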
• 118. Configuration Servers - SCCC www.objectrocket.com 118 Sync Cluster Connection Configuration At times a configuration server can be corrupted; the following methods can be used to fix the configuration. This is applicable to versions <= 3.2 MMAP and 3.2 WiredTiger. 1. As a precaution back up all three config servers via mongodump 2. If only one of the three servers is not functioning or out of sync a. Stop one of the available configuration servers b. Rsync the data directory to the broken configuration server and start mongod 3. If only the first configuration server is healthy or all three are out of sync a. Open a Mongo and SSH session to the first configuration server b. From the Mongo session use db.fsyncLock() to lock the mongod c. From the SSH session make a copy of the data directory d. In the existing Mongo session use db.fsyncUnlock() to unlock the mongod e. Rsync the copy to the broken configuration servers and start mongod
• 119. Configuration Servers - CSRS www.objectrocket.com 119 Config Servers as Replica Sets With CSRS an initial sync from another member of the cluster is the preferred method to repair the health of the replica set. If for any reason the majority is lost the cluster metadata will still be available but read-only. CSRS became available in version 3.2 and mandatory in version 3.4.
  • 120. RangeDeleter www.objectrocket.com 120 This process is responsible for removing the chunk ranges that were moved by the balancer (i.e. moveChunk). ● This thread only runs on the primary ● Is not persistent in the event of an election ● Can be blocked by open cursors ● Can have multiple ranges queued to delete ● If a queue is present, the shard can not be the destination for a new chunk A queue being blocked by open cursors can create two potential problems: ● Duplicate results for secondary and secondaryPreferred read preferences ● Permanent Orphans
• 121. Range Deleter - Open Cursors www.objectrocket.com 121 Log Line: [RangeDeleter] waiting for open cursors before removing range [{ _id: -869707922059464413 }, { _id: -869408809113996381 }) in mydb.mycoll, elapsed secs: 16747, cursor ids: [74167011554] MongoDB does not provide a server side command to close cursors, in this example 74167011554 must be closed to allow RangeDeleter to proceed. Solution using Python:
from pymongo import MongoClient
c = MongoClient('<host>:<port>')
c.the_database.authenticate('<user>', '<pass>', source='admin')
c.kill_cursors([74167011554])
  • 122. Orphans www.objectrocket.com 122 moveChunk started and documents are being inserted into s2 from s1. s1 s2 Commit Range: [{ u_id: 100 }, { u_id: 200 }]
  • 123. Orphans www.objectrocket.com 123 s1 s2 s2 synchronizes the changes documents and s1 commits the meta data change. Commit Range: [{ u_id: 100 }, { u_id: 200 }]
  • 124. Orphans www.objectrocket.com 124 s1 s2 The delete process is asynchronous, at this time both shards contain the documents. Primary read preference filters them but not secondary and secondaryPreferred. RangeDeleter Waiting On Open Cursor
• 125. Orphans www.objectrocket.com 125 s1 s2 An unplanned election or step down occurs on s1. RangeDeleter Waiting On Open Cursor
  • 126. Orphans www.objectrocket.com 126 s1 s2 The documents are now orphaned on s1. Range: [{ u_id: 100 }, { u_id: 200 }]Range: [{ u_id: 100 }, { u_id: 200 }]
• 127. Orphans www.objectrocket.com 127 moveChunk reads and writes to and from primary members. Primary members cache a copy of the chunk map via the ChunkManager process. To resolve the issue: 1. Set the balancer state to false. 2. Add a new shard to the cluster. 3. Using moveChunk move all chunks from s1 to sN. 4. When all chunks have been moved and the RangeDeleter is complete, connect to the primary: a. Run a remove() operation to remove the remaining “orphaned” documents b. Or: Drop and recreate the empty collection with all indexes Alternatively Use: https://docs.mongodb.com/manual/reference/command/cleanupOrphaned/
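A minimal sketch of the cleanupOrphaned alternative, following the documented loop; it is run while connected to each shard's primary, cleaning one range per call and resuming from stoppedAtKey (the namespace is hypothetical):
var nextKey = {};
var result;
while (nextKey != null) {
  result = db.getSiblingDB("admin").runCommand({ cleanupOrphaned: "mydb.mycoll", startingFromKey: nextKey });
  if (result.ok != 1) { print("Unable to complete at this time: failure or timeout."); break; }
  nextKey = result.stoppedAtKey;   // null once the whole shard key range has been scanned
}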
• 128. Collection Resharding www.objectrocket.com 128 Over time, query patterns and writes can make a shard key no longer optimal, or the wrong shard key was implemented. Dump And Restore ● Typically most time consuming ● Requires write outage if changing collection names ● Requires read and write outage if keeping the same collection name Forked Writes ● Fork all writes to a separate collection, database, or cluster ● Reduces or eliminates downtime completely using code deploys ● Increases complexity from a development perspective ● Allows for rollback and read testing
  • 129. Collection Resharding www.objectrocket.com 129 Incremental Dump and Restores (Append Only) ● For insert only collections begin incremental dump and restores ● Requires a different name space but reduces the cut-over downtime ● Requires read and write outage if keeping the same collection name Mongo to Mongo Connector ● Connectors tail the oplog for namespace changes ● These changes can be filtered and applied to another namespace or cluster ● Similar to the forked write approach but handled outside of the application ● Allows for a rollback and read testing
  • 130. www.objectrocket.com 130 Lab 03 Lab 04 Lab 03. Navigate to /Percona2017/Lab03/exercises Lab 04. Navigate to /Percona2017/Lab04/exercises https://goo.gl/ddaVdS Resharding and Chunk Management
  • 131. Miscellaneous ● User Management ● Zones ● Read Operations ● Connecting www.objectrocket.com 131
  • 132. User Management ● Users and Roles ○ Application Users ○ Admin Users ○ Monitoring www.objectrocket.com 132 ● Replica Set Access
  • 133. Users And Roles www.objectrocket.com 133 ● Role based authentication framework with collection level granularity ○ Starting in version 3.4 views allow for finer granularity beyond roles ● To enable authentication for all components ○ Set security.authorization to enabled ○ Set security.keyFile to a file path containing a 6 - 1024 character random string ○ RBAC documents stored in admin.system.users, admin.system.roles, and $external.users ● Once enabled the localhost exception can be used to create the initial admin user ○ For replica sets the admin database is stored and replicated to each member ○ For sharded clusters the admin data is stored on the configuration replica set ● Alternatively the user(s) can be created prior to enabling authentication ● External authentication is also available in the Enterprise* and Percona* distributions ○ SASL / LDAP* ○ Kerberos*
• 137. Creating Users www.objectrocket.com 137 Application User ● Read/Write access to a single database
use mydb
db.createUser({
  user: "<user>",
  pwd: "<secret>",
  roles: [{ role: "readWrite", db: "mydb" }]
});
Engineer ● Read/Write access to multiple databases ● Authentication scoped to admin ● Read-only cluster role
use admin
db.createUser({
  user: "<user>",
  pwd: "<secret>",
  roles: [
    { role: "readWrite", db: "mydb1" },
    { role: "readWrite", db: "mydb2" },
    { role: "clusterMonitor", db: "admin" }
  ]
});
  • 138. Built-In Roles www.objectrocket.com 138 ● Database ○ read ○ readWrite ● Database Administration ○ dbAdmin ○ dbOwner ○ userAdmin ● Backup And Restore ○ backup ○ restore ● Cluster Administration ○ clusterAdmin ○ clusterManager ○ clusterMonitor ○ hostManager ● All Databases ○ readAnyDatabase ○ readWriteAnyDatabase ○ userAdminAnyDatabase ○ dbAdminAnyDatabase ● Superuser ○ root
  • 139. Creating Custom Roles www.objectrocket.com 139 use admin db.runCommand({ createRole: "<role>", privileges: [ { resource: { db: "local", collection: "oplog.rs" }, actions: [ "find" ] }, { resource: { cluster: true }, actions: [ "listDatabases" ] } ], roles: [ { role: "read", db: "mydb" } ] }); Role Creation • find() privilege on local.oplog.rs • List all databases on the replica set • Read only access on mydb Granting Access use admin db.createUser( { user: "<user>", pwd: "<secret>", roles: [ "<role>" ] } ); Or use admin db.grantRolesToUser( "<user>", [ { role: "<role>", db: "admin" }] );
  • 140. Additional Methods www.objectrocket.com 140 • db.updateUser() • db.changeUserPassword() • db.dropUser() • db.grantRolesToUser() • db.revokeRolesFromUser() • db.getUser() • db.getUsers() User Management • db.updateRole() • db.dropRole() • db.grantPrivilegesToRole() • db.revokePrivilegesFromRole() • db.grantRolesToRole() • db.revokeRolesFromRole() • db.getRole() • db.getRoles() Role Management
  • 141. Zones ● Geographic Distribution ● Mixed Engine Topology ● Tiered Performance www.objectrocket.com 141
• 142. Zones www.objectrocket.com 142 A Zone is a range based on the shard key prefix; pre-3.4 this was called Tag Aware Sharding A zone can be associated with one or more shards A shard can be associated with any number of non-conflicting zones Chunks covered by a zone will be migrated to the associated shards. We use zones for: - Data isolation - Geographically distributed clusters (keep data close to the application) - Heterogeneous hardware clusters
  • 143. Zones www.objectrocket.com 143 Zone : [“A”] Zone : [“B”] Zone : [] Zones: [A] KEY: 1-10 [B] KEY: 10-20 {min , 10} {10, 21} {21, max}
  • 144. Zones www.objectrocket.com 144 Zone : [“A”] Zone : [“A”,“B”] Zone : [“C”] Zones: [A] KEY: 1-10 [B] KEY: 10-20 [C] KEY: 50-60 {min, 7} {7, 30} {31,max}
  • 145. Zones www.objectrocket.com 145 Define a zone: sh.addTagRange(<collection>, { <shardkey>: min }, { <shardkey>: max }, <zone_name>) Zone ranges are always inclusive of the lower boundary and exclusive of the upper boundary The range ({ <shardkey>: min },{ <shardkey>: max }) must be a prefix of the shard key Hashed keys are also supported - using the hash key ranges Remove a zone: There is no helper available yet - Manually remove from config.tags db.getSiblingDB('config').tags.remove({_id:{ns:<collection>, min:{<shardkey>:min}}, tag:<zone_name>})
• 146. Zones www.objectrocket.com 146 Add shards to a zone: sh.addShardTag(<shard_name>, <zone_name>) Remove zones from a shard: sh.removeShardTag(<shard_name>, <zone_name>) Manipulating zones may or may not cause chunk moves Manipulating zones may be a long-running operation
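A minimal sketch tying the helpers together, with hypothetical shard, zone and range values:
sh.addShardTag("s1", "A")                                     // shard s1 serves zone A
sh.addTagRange("mydb.mycoll", { uid: 1 }, { uid: 10 }, "A")   // lower bound inclusive, upper exclusive
sh.removeShardTag("s1", "A")                                  // detach the shard from the zone again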
• 147. Zones www.objectrocket.com 147 View existing zone/shard associations: sh.status() lists the tags next to the shards, or query the config database: db.getSiblingDB('config').shards.find({tags:<zone>}) View existing zones: Query the config database: db.getSiblingDB('config').tags.find({tag:<zone>})
• 148. Zones- Balancing www.objectrocket.com 148 Balancer aims for an even chunk distribution During a chunk migration: - Balancer checks possible destinations based on configured zones - If the chunk range falls into a zone, it migrates the chunk to a shard within the zone - Chunks that do not fall into a zone may exist on any shard During balancing rounds the balancer moves chunks that violate configured zones
  • 149. Read Operations ● Covering Queries ● Sharding Filter ● Secondary Reads www.objectrocket.com 149
• 150. Read Operations – Covered Queries www.objectrocket.com 150 A covered query is a query that can be satisfied entirely using an index and does not have to examine any documents On an unsharded collection a query db.foo.find({x:1},{x:1,_id:0}) is covered by the {x:1} index On a sharded collection the query is covered if: - {x:1} is the shard key, or - a compound index on {x:1,<shard_key>} exists
  • 151. Read Operations – Covered Queries www.objectrocket.com 151 Reading from primary db.foo.find({x:1},{x:1,_id:0}) using {x:1} index explain() reports: "totalKeysExamined" : N "totalDocsExamined" : N Reading from a secondary db.foo.find({x:1},{x:1,_id:0}) is covered by the {x:1} index. explain() reports: "totalKeysExamined" : N, "totalDocsExamined" : 0,
• 152. Read Operations – Sharding filter www.objectrocket.com 152 Reading from Primaries: - Queries must fetch the shard key to filter the orphans - The cost of the sharding filter reports on the explain() plan as "stage" : "SHARDING_FILTER” - For operations that return a high number of results, the sharding filter can be expensive Secondaries don’t apply the sharding filter But… on a sharded collection they may return orphans or “on move documents” You won’t fetch orphans when: - Read preference is set to primary - Read preference is set to secondary and the shard key exists on the read predicates
• 153. Read Operations – Sharding filter www.objectrocket.com 153 Distinct and Count commands return orphans even with primary read preference Always use the aggregation framework instead of Distinct and Count on sharded collections Reveal Orphans: Compare "nReturned" of SHARDING_FILTER with "nReturned" of the EXECUTION stage When SHARDING_FILTER < EXECUTION orphans may exist Compare count() with itcount() When itcount() < count() orphans may exist
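A minimal sketch of the count()/itcount() comparison, assuming a hypothetical sharded collection mydb.mycoll:
var counted  = db.getSiblingDB("mydb").mycoll.find().count();    // server-side count: may include orphans
var iterated = db.getSiblingDB("mydb").mycoll.find().itcount();  // iterates documents through the sharding filter
if (iterated < counted) { print("orphans may exist: " + (counted - iterated)); }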
• 154. Read Operations – Sharding filter www.objectrocket.com 154 A handy JavaScript that reveals orphans:
db.getMongo().getDBNames().forEach(function(database) {
  var re = new RegExp('config|admin');
  if (!database.match(re)) {
    print("Checking database: " + database);
    print("");
    db.getSiblingDB(database).getCollectionNames().forEach(function(collection) {
      print("Checking collection: " + collection);
      db.getSiblingDB(database).getCollection(collection).find().explain(true)
        .executionStats.executionStages.shards.forEach(function(orphans) {
          if (orphans.executionStages.chunkSkips > 0) {
            print("Shard " + orphans.shardName + " has " + orphans.executionStages.chunkSkips + " orphans");
          }
        });
    });
  }
});
• 155. Read Operations www.objectrocket.com 155 Find: Mongos merges the results from each shard Aggregate: Runs on each shard and a random shard performs the merge In older versions the mongos or the primary shard only was responsible for the merge Sorting: Each shard sorts its own results and mongos performs a merge sort Limits: Each shard applies the limit and then mongos re-applies it on the results Skips: Each shard returns the full result set (un-skipped) and mongos applies the skip Explain() output contains the individual shard execution plans and the merge phase It's not uncommon for each shard to follow a different execution plan
  • 156. Connecting ● URI parameters ● Load Balancing ● New parameters www.objectrocket.com 156
• 157. Connecting – Server discovery www.objectrocket.com 157 URI Format mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]] Server Discovery: - The client measures the duration of an ismaster call (RTT) - The variable "localThresholdMS" defines the acceptable RTT, default is 15 millis - The driver load-balances randomly across mongos that fall within the localThresholdMS serverSelectionTimeoutMS: how long to block for server selection before throwing an exception. The default is 30 seconds. heartbeatFrequencyMS: The interval between server checks, counted from the end of the previous check until the beginning of the next one, default 10 seconds. A client waits between checks at least minHeartbeatFrequencyMS. Default 500 millis. Important: Each driver may use its own algorithm to measure average RTT
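A hypothetical connection string showing these options in place (hosts and credentials are placeholders):
mongodb://appuser:secret@mongos1.example.net:27017,mongos2.example.net:27017/mydb?localThresholdMS=15&serverSelectionTimeoutMS=30000&heartbeatFrequencyMS=10000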
  • 158. Connecting – Load Balancing www.objectrocket.com 158 Driver: - Increases heartbeat traffic/availability Driver with virtual mongos pools: - N number of mongos - Each application server uses a portion , for example N/2 - Reduce heartbeat traffic/availability Use a Load Balancer - Hash based routing (avoid round-robin) - Minimize heartbeat traffic - Use more than one LB to avoid Single point of failure - Be aware of hot app servers and client-side NAT
Connecting – Connection Pool
www.objectrocket.com
159
TaskExecutorPoolSize (SERVER-25027)
The number of connection pools to use for a given mongos. Defaults to the number of cores, bounded as 4 <= NUM_CORES <= 64.
Decreasing the value will decrease the number of open connections
ShardingTaskExecutorPoolHostTimeoutMS (SERVER-25027)
How long to wait before dropping all connections to a host that mongos hasn't communicated with. Default 5 minutes.
Increasing the value will smooth connection spikes
Connecting – Connection Pool
www.objectrocket.com
160
ShardingTaskExecutorPoolMaxSize (SERVER-25027)
The maximum number of connections to hold to a given host at any time. Default is unlimited.
Controls simultaneous operations from a mongos to a host (PoolSize * PoolMaxSize)
ShardingTaskExecutorPoolMinSize (SERVER-25027)
The minimum number of connections to hold to a given host at any time. Default is 1.
Increasing the value will increase the number of open connections per mongod (PoolMinSize * PoolSize * MONGOS)
Connecting – Connection Pool
www.objectrocket.com
161
ShardingTaskExecutorPoolRefreshRequirementMS (SERVER-25027)
How long mongos will wait before attempting to heartbeat an idle connection in the pool. Default 1 minute.
Increasing the value will reduce the heartbeat traffic
Lowering the value may decrease connection failures
ShardingTaskExecutorPoolRefreshTimeoutMS (SERVER-25027)
How long mongos will wait before timing out a heartbeat. Default 20 seconds.
Tuning this value up may help if heartbeats are taking too long.
Connecting – Connection Pool
www.objectrocket.com
162
ShardingTaskExecutorPoolMaxConnecting (SERVER-29237)
Limits the rate at which mongos nodes add connections to connection pools
Only N connections can be in the processing (setup/refresh) state at the same time
Avoids overwhelming mongod nodes with connection requests
Per connection pool, C*N can be in setup/refresh
Our patch for SERVER-26654:
setParameter:
  taskExecutorPoolSize: 8
  ShardingTaskExecutorPoolRefreshTimeoutMS: 600000
  ShardingTaskExecutorPoolHostTimeoutMS: 86400000
  ShardingTaskExecutorPoolMinSize: 100
  ShardingTaskExecutorPoolMaxConnecting: 25
Backup & Disaster Recovery
● Methods
● Topologies
www.objectrocket.com
163
mongodump
www.objectrocket.com
164
This utility, provided as part of the MongoDB binaries, creates a binary backup of your database(s) or collection(s).
● Preserves data integrity for all BSON types, when compared to mongoexport
● Recommended for standalone mongod and replica sets
○ It does work with sharded clusters, but be cautious of cluster and data consistency
● When dumping a database (--db) or collection (--collection) you have the option of passing a query (--query) and a read preference (--readPreference) for more granular control
● Because mongodump can be time consuming, --oplog is recommended so the operations for the duration of the dump are also captured
● To restore the output from mongodump you use mongorestore, not mongoimport
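A minimal sketch of a replica-set dump (replica set name, host, and output path are assumptions):
mongodump --host myset/db1.example.com:27017 --oplog --out /backups/20170920
The --oplog flag also captures the oplog entries written while the dump runs, giving a consistent point-in-time snapshot of the replica set.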
mongorestore
www.objectrocket.com
165
This utility, provided as part of the MongoDB binaries, restores a binary backup of your database(s) or collection(s).
● Preserves data integrity for all BSON types, when compared to mongoimport
● When restoring a database (--db) or collection (--collection) you have the option of dropping (--drop) the collections contained in the backup and the option of replaying the oplog events (--oplogReplay) to a point in time
● Be mindful of the destination of the restore; restoring (i.e. inserting) adds work on top of the already existing workload, in addition to the number of collections being restored in parallel (--numParallelCollections)
● MongoDB will also restore indexes in the foreground (blocking); indexes can be created in advance or skipped (--noIndexRestore) depending on the scenario
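A minimal restore sketch matching the dump above (host and path are assumptions):
mongorestore --host myset/db1.example.com:27017 --drop --oplogReplay --numParallelCollections 2 /backups/20170920
--oplogReplay applies the oplog captured by --oplog, rolling the data forward to the moment the dump finished.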
Data Directory Backup
www.objectrocket.com
166
While a mongod process is stopped, a filesystem copy or snapshot can be performed to make a consistent copy of the replica set member.
● A replica set is required to prevent interruption
● Works for both WiredTiger and MMAPv1 engines
● Optionally you can use a hidden secondary with no votes for this process, to avoid affecting quorum
● For MMAPv1 this will copy fragmentation, which increases the size of the backup
Alternatively, fsyncLock() can be used to flush all pending writes and lock the mongod process. At this time a filesystem copy can be performed; after the copy has completed, fsyncUnlock() can be used to return to normal operation.
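A minimal sketch of the fsyncLock() approach, run against the member being copied:
> db.fsyncLock()      // flushes pending writes and blocks new ones
... perform the filesystem copy or snapshot of the dbPath ...
> db.fsyncUnlock()    // resumes normal operation
Keep the shell session that issued fsyncLock() open until the copy completes.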
Percona Hot Backup
www.objectrocket.com
167
For Percona Server running WiredTiger or RocksDB, backups can be taken using an administrative backup command.
● Can be executed on a mongod as a non-blocking operation
● Creates a backup of the entire dbPath
○ Similar to a data directory backup, extra storage capacity is required
> use admin
switched to db admin
> db.runCommand({createBackup: 1, backupDir: "/tmp/backup"})
{ "ok" : 1 }
Replicated DR
www.objectrocket.com
168
[Diagram: shards s1 and s2 in a Preferred Zone replicated over a DC or AZ interconnect to s1 and s2 in a Secondary Zone]
● Preferred zone replicas are configured with a higher priority and more votes for election purposes
● All secondary zone replica members remain secondaries until the secondary zone is utilized
● An rs.reconfig() is required to remove a preferred zone member and level votes to elect a primary in a failover event
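A hedged sketch of the failover reconfig, run on a surviving secondary-zone member (hostname convention is an assumption):
> cfg = rs.conf()
> cfg.members = cfg.members.filter(function(m) { return m.host.indexOf("dc2") !== -1 })  // keep only secondary-zone hosts
> cfg.members.forEach(function(m) { m.votes = 1; m.priority = 1 })                       // level votes and priorities
> rs.reconfig(cfg, {force: true})  // force is required when no primary can be elected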
www.objectrocket.com
169
Lab 05 / Lab 06: Zone Management and Backup / Restore
Lab 05: Navigate to /Percona2017/Lab05/exercises
Lab 06: Navigate to /Percona2017/Lab06/exercises
https://goo.gl/ddaVdS
Use Cases
● Social Application
● Aggregation
● Caching
● Retail
● Time Series
www.objectrocket.com
170
Use Cases - Caching
www.objectrocket.com
171
Use Case: A url/content cache
Schema example:
{
  _id: ObjectId,
  url: <string>,
  response_body: <string>,
  ts: <timestamp>
}
Operations:
- Insert urls into the cache
- Invalidate the cache (for example with a TTL index)
- Read from the cache (read the latest url entry based on ts)
Use Cases - Caching
www.objectrocket.com
172
Shard key candidates:
{url: 1}
- Advantages: Scales Reads/Writes
- Disadvantages: Key length, may create empty chunks
{url: "hashed"}
- Advantages: Scales Reads/Writes
- Disadvantages: CPU cycles to hash the url, orderBy is not supported, may create empty chunks
Hash the url in the application
- Advantages: Scales Reads/Writes, one less index
- Disadvantages: Number of initial chunks is not supported, may create empty chunks
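A minimal sketch of the hashed option (namespace and chunk count are assumptions):
mongos> sh.enableSharding("cache")
mongos> db.adminCommand({shardCollection: "cache.urls", key: {url: "hashed"}, numInitialChunks: 128})
numInitialChunks pre-splits the hashed key space across the shards, which is exactly the option lost when hashing in the application instead.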
Use Cases – Social app
www.objectrocket.com
173
Use Case: A social app, where users "interact" with other users
Schema example:
{
  _id: ObjectId,
  toUser: <string> or <ObjectId> or <number>,
  fromUser: <string> or <ObjectId> or <number>,
  interaction: <string>,
  ts: <timestamp>
}
Operations:
- Insert an interaction
- Remove an interaction
- Read interactions (by fromUser and toUser)
Use Cases – Social app
www.objectrocket.com
174
Shard key candidates:
{toUser: 1} or {fromUser: 1}
- Advantages: Scales a portion of Reads
- Disadvantages: Write hotspots, jumbo chunks, empty chunks
{toUser: 1, fromUser: 1}
- Advantages: Scales a portion of Reads, improved (but not perfect) Write distribution
- Disadvantages: Frequent splits & moves, empty chunks
{fromUser: 1, toUser: 1}
- Advantages: Scales a portion of Reads, improved (but not perfect) Write distribution
- Disadvantages: Frequent splits & moves, empty chunks
Use Cases – Social app
www.objectrocket.com
175
Shard key candidates (cont.):
{toUser+fromUser: "hashed"}
- Advantages: Scales Writes, avoids empty and jumbo chunks
- Disadvantages: All but one query type are scatter-gather
- Similar to {_id: "hashed"}
A perfect example that there is no perfect shard key:
- Application behavior will point us to the shard key
- Pre-defining user_ids and splitting might not be effective
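A hedged sketch of the application-side compound hash (field and collection names are assumptions):
// store a hash of the concatenated users and shard on it
> var key = hex_md5(doc.toUser + ":" + doc.fromUser)
> db.interactions.insert({toUser: doc.toUser, fromUser: doc.fromUser, interaction: "like", ts: new Date(), userHash: key})
// the one targeted query type supplies both users and recomputes the hash
> db.interactions.find({userHash: hex_md5("u123:u456")})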
Use Cases – Email app
www.objectrocket.com
176
Use Case: Provide inbox access to users
Schema example:
{
  _id: ObjectId,
  recipient: <string> or <ObjectId> or <number>,
  sender: <string>,
  title: <string>,
  body: <string>,
  ts: <timestamp>
}
Operations:
- Read emails
- Insert emails
- Delete emails
Use Cases – Email app
www.objectrocket.com
177
Shard key candidates:
{recipient: 1}
- Advantages: All Reads will access one shard
- Disadvantages: Write hotspots, jumbo chunks, empty chunks
{recipient: 1, sender: 1}
- Advantages: Some Reads will access one shard
- Disadvantages: Write hotspots, empty chunks
Introduce buckets (an auto-incrementing number per user):
{recipient: 1, bucket: 1}
- Advantages: Reads access one shard per iteration
- Disadvantages: Write hotspots, empty chunks
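A hedged sketch of the bucketed inbox (namespace, bucket size, and counter mechanism are assumptions):
mongos> sh.enableSharding("mail")
mongos> db.adminCommand({shardCollection: "mail.inbox", key: {recipient: 1, bucket: 1}})
// the application assigns each message to the recipient's current bucket,
// starting a new bucket every e.g. 50 messages
mongos> db.inbox.find({recipient: "alice", bucket: 42}).sort({ts: -1})
Each page of the inbox reads a single {recipient, bucket} pair, so every iteration targets one shard.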
Use Cases – Time-series data
www.objectrocket.com
178
Use Case: A server-log collector
Schema example:
{
  _id: ObjectId,
  host: <string> or <ObjectId> or <number>,
  type: <string> or <number>,
  message: <string>,
  ts: <timestamp>
}
Operations:
- Insert logs
- Archive logs
- Read logs (host, type, <date range>)
Use Cases – Time-series data
www.objectrocket.com
179
Shard key candidates:
{host: 1}
- Advantages: Targets Reads
- Disadvantages: Write hotspots, jumbo chunks, empty chunks
{host: "hashed"}
- Advantages: Targets Reads
- Disadvantages: Write hotspots, jumbo chunks, empty chunks
{host: 1, type: 1, ts: 1}
- Advantages: Decent cardinality, may target Reads
- Disadvantages: Monotonically increasing per host/type, Write hotspots, empty chunks
Use Cases – Time-series data
www.objectrocket.com
180
Shard key candidates (cont.):
{host: 1, type: 1, date_bucket: 1}
- Advantages: Decent cardinality, Read targeting, chunks are easy to pre-create and distribute
- Disadvantages: Monotonically increasing per host, Write hotspots, empty chunks
Merge host, type, and date_bucket on the application layer and use a hashed key
- Queries become {shard_key, ts: <range>}, as in the sketch below
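A hedged sketch of the application-layer merged key (field names and day-bucketing are assumptions):
// derive a compact key such as "web01:error:2017-09-20" and shard on its hash
> var shard_key = doc.host + ":" + doc.type + ":" + doc.ts.toISOString().slice(0, 10)
> db.logs.insert({shard_key: shard_key, host: doc.host, type: doc.type, message: doc.message, ts: doc.ts})
// reads supply the merged key plus a ts range, targeting a single shard
> db.logs.find({shard_key: "web01:error:2017-09-20", ts: {$gte: start, $lt: end}})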
Use Cases – Retail app
www.objectrocket.com
181
Use Case: A product catalog
Schema example:
{
  _id: ObjectId,
  product: <string>,
  description: <string>,
  category: <string>,
  dimensions: <subdocument>,
  weight: <float>,
  price: <float>,
  rating: <float>,
  shipping: <string>,
  released_date: <ISODate>,
  related_products: <array>
}
Use Cases – Retail app
www.objectrocket.com
182
Operations:
- Insert products
- Update products
- Delete products
- Read from products - various query patterns
Shard key candidates:
{category: 1}
- Advantages: Read targeting for most of the queries
- Disadvantages: Jumbo chunks, Read hotspots
Use Cases – Retail app
www.objectrocket.com
183
Shard key candidates (cont.):
{category: 1, price: 1}
- Advantages: Read targeting for most of the queries
- Disadvantages: Jumbo chunks, Read hotspots
Tag documents with price ranges (0-100, 101-200, etc.) and use a shard key of the form {category: 1, price_range: 1, released_date: 1}
Try to involve as much of the query criteria as possible in your shard key, but be careful of Read hotspots
For example, rating is risky, as users tend to search for 5-star products
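A hedged sketch of the price_range tagging (bucket width, namespace, and field values are assumptions):
// compute the bucket in the application before inserting
> function priceRange(price) { return Math.floor(price / 100) * 100; }  // 0, 100, 200, ...
> db.products.insert({product: "lamp", category: "home", price: 149.99, price_range: priceRange(149.99), released_date: new Date()})
mongos> db.adminCommand({shardCollection: "retail.products", key: {category: 1, price_range: 1, released_date: 1}})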
We’re Hiring!
www.objectrocket.com
185
Looking to join a dynamic & innovative team?
https://www.objectrocket.com/careers
Reach out to us directly at careers@objectrocket.com
Thank you!
www.objectrocket.com
186
Address: 100 Congress Ave, Suite 400, Austin, TX 78701
Support: 1-800-961-4454
Sales: 1-888-440-3242