SlideShare une entreprise Scribd logo
1  sur  53
Sharding Methods For
MongoDB
Jay Runkel
jay.runkel@mongodb.com
@jayrunkel
#MongoDB
2
• Customer Stories
• Sharding for Performance/Scale
– When to shard?
– How many shards do I need?
• Types of Sharding
• How to Pick a Shard Key
• Sharding for Other Reasons
Agenda
Customer Stories
4
5
• 50M users.
• 6B check-ins to date (6M per day growth).
• 55M points of interest / venues.
• 1.7M merchants using the platform for marketing
• Operations Per Second: 300,000
• Documents: 5.5B
Foursquare
6
• 11 MongoDB clusters
– 8 are sharded
• Largest cluster has 15 shards (check ins)
– Sharded on user id
Foursquare clusters
7
• Large data set
CarFax
8
• 13 billion+ documents
– 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents
• 12 Shards
• 9-node replica sets
• Replicas distributed across 3 data centers
CarFax Shards
9
What is Sharding?
12
Sharding Overview
Primary
Secondary
Secondary
Shard 1
Primary
Secondary
Secondary
Shard 2
Primary
Secondary
Secondary
Shard 3
Primary
Secondary
Secondary
Shard N
…
Query
Router
Query
Router
Query
Router
……
Driver
Application
14
Scaling: Sharding
mongod
Read/Write Scalability
Key Range
0..100
15
Scaling: Sharding
Read/Write Scalability
mongod mongod
Key Range
0..50
Key Range
51..100
16
Scaling: Sharding
mongod mongod mongod mongod
Key Range
0..25
Key Range
26..50
Key Range
51..75
Key Range
76.. 100
Read/Write Scalability
How do I know I need to shard?
18
Does one server/replica…
• Have enough disk space to store
all my data?
• Handle my query throughput
(operations per second)?
• Respond to queries fast enough
(latency)?
19
• Have enough disk space to store
all my data?
• Handle my query throughput
(operations per second)?
• Respond to queries fast enough
(latency)?
Does one server/replica set…
Server Specs
Disk Capacity
Disk IOPS
RAM
Network
Disk IOPS
RAM
Network
How many shards do I need?
21
• Sum of disk space across shards > greater than
required storage size
Disk Space: How Many Shards Do I
Need?
22
• Sum of disk space across shards > greater than
required storage size
Disk Space: How Many Shards Do I
Need?
Example
Storage size = 3 TB
Server disk capacity = 2 TB
2 Shards Required
23
• Working set should fit in RAM
– Sum of RAM across shards > Working Set
• WorkSet = Indexes plus the set of documents
accessed frequently
• WorkSet in RAM 
– Shorter latency
– Higher Throughput
RAM: How Many Shards Do I Need?
24
• Measuring Index Size and Working Set
db.stats() – index size of each collection
db.serverStatus({ workingSet: 1}) – working
set size estimate
RAM: How Many Shards Do I Need?
25
• Measuring Index Size and Working Set
db.stats() – index size of each collection
db.serverStatus({ workingSet: 1}) – working
set size estimate
RAM: How Many Shards Do I Need?
Example
Working Set = 428 GB
Server RAM = 128 GB
428/128 = 3.34
4 Shards Required
26
• Sum of IOPS across shards > greater than
required IOPS
• IOPS are difficult to estimate
– Update doc
– Update indexes
– Append to journal
– Log entry?
• Best approach – build a prototype and measure
Disk Throughput: How Many Shards
Do I Need
27
• Sum of IOPS across shards > greater than
required IOPS
• IOPS are difficult to estimate
– Update doc
– Update indexes
– Append to journal
– Log entry?
• Best approach – build a prototype and measure
Disk Throughput: How Many Shards
Do I Need
Example
Required IOPS = 11000
Server disk IOPS = 5000
3 Shards Required
28
• S = ops/sec of a single server
• G = required ops/sec
• N = # of shards
• G = N * S * .7
N = G/.7S
OPS: How Many Shards Do I Need?
29
• S = ops/sec of a single server
• G = required ops/sec
• N = # of shards
• G = N * S * .7
N = G/.7S
OPS: How Many Shards Do I Need?
Sharding Overhead
30
• S = ops/sec of a single server
• G = required ops/sec
• N = # of shards
• G = N * S * .7
N = G/.7S
OPS: How Many Shards Do I Need?
Example
S = 4000
G = 10000
N = 3.57
4 Shards
Types of Sharding
32
• Range
• Tag-Aware
• Hashed
Sharding Types
33
Range Sharding
mongod mongod mongod mongod
Key Range
0..25
Key Range
26..50
Key Range
51..75
Key Range
76.. 100
Read/Write Scalability
34
Tag-Aware Sharding
mongod mongod mongod mongod
Shard Tags
Shard Tag Start End
Winter 23 Dec 21 Mar
Spring 22 Mar 21 Jun
Summer 21 Jun 23 Sep
Fall 24 Sep 22 Dec
Tag Ranges
Winter Spring Summer Fall
35
Hash-Sharding
mongod mongod mongod mongod
Hash Range
0000..4444
Hash Range
4445..8000
Hash Range
i8001..aaaa
Hash Range
aaab..ffff
36
Hashed shard key
• Pros:
– Evenly distributed writes
• Cons:
– Random data (and index) updates can be IO
intensive
– Range-based queries turn into scatter gather
Shard 1
mongos
Shard 2 Shard 3 Shard N
37
Range sharding document
distribution
38
Hashed sharding document
distribution
How do I Pick A Shard Key
40
Shard Key characteristics
• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive
41
Low cardinality shard key
• Induces "jumbo chunks"
• Examples: boolean field
Shard 1
mongos
Shard 2 Shard 3 Shard N
[ a, b )
42
Ascending shard key
• Monotonically increasing shard key values
cause "hot spots" on inserts
• Examples: timestamps, _id
Shard 1
mongos
Shard 2 Shard 3 Shard N
[ ISODate(…), $maxKe
Reasons to Shard
44
• Scale
– Data volume
– Query volume
• Global deployment with local writes
– Geography aware sharding
• Tiered Storage
• Fast backup restore
Reasons to shard
45
Global Deployment/Local Writes
Primary:NYC
Secondary:NYC
Primary:LON
Primary:SYD
Secondary:LON
Secondary:NYC
Secondary:SYD
Secondary:LON
Secondary:SYD
46
• Save hardware costs
• Put frequently accessed documents on fast
servers
– Infrequently accessed documents on less capable
servers
• Use Tag aware sharding
Tiered Storage
mongod mongod mongod mongod
Current Current Archive Archive
SSD SSD HDD HDD
47
• 40 TB Database
• 2 shards of 20 TB each
• Challenge
– Cannot meet restore SLA after data loss
Fast Restore
mongod mongod
20 TB 20 TB
48
• 40 TB Database
• 4 shards of 10 TB each
• Solution
– Reduce the restore time by 50%
Fast Restore
mongod mongod
10 TB 10 TB
mongod mongod
10 TB 10 TB
Summary
50
• To determine required # of shards determine
– Storage requirements
– Latency requirements
– Throughput requirements
• Derive total
– Disk capacity
– Disk throughput
– RAM
• Calculate # of shards based upon individual
server specs
Determining the # of shards
51
• Scalability
• Geo-aware clusters
• Tiered Storage
• Reduce backup restore times
Leverage Sharding For
52
• MongoDB Manual:
http://docs.mongodb.org/manual/sharding/
• Other Webinars:
– How to Achieve Scale With MongoDB
• White Papers
– MongoDB Performance Best Practices
– MongoDB Architecture Guide
Sharding: Where to go from here…
Get Expert Advice on Scaling. For Free.
For a limited time, if
you’re considering a
commercial
relationship with
MongoDB, you can
sign up for a free one
hour consult about
scaling with one of our
MongoDB Engineers.
Sign Up: http://bit.ly/1rkXcfN
54
Webinar Q&A
jay.runkel@mongodb.com
@jayrunkel
Stay tuned
after the
webinar and
take our
survey for your
chance to win
MongoDB
swag.
Thank You

Contenu connexe

Tendances

Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 

Tendances (20)

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation Performance
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
Creating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized AggregatesCreating Continuously Up to Date Materialized Aggregates
Creating Continuously Up to Date Materialized Aggregates
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Mongo DB Presentation
Mongo DB PresentationMongo DB Presentation
Mongo DB Presentation
 
Redis database
Redis databaseRedis database
Redis database
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 

En vedette

Basic Replication in MongoDB
Basic Replication in MongoDBBasic Replication in MongoDB
Basic Replication in MongoDB
MongoDB
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
MongoSF
 

En vedette (12)

Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Basic Replication in MongoDB
Basic Replication in MongoDBBasic Replication in MongoDB
Basic Replication in MongoDB
 
MongoDB Database Replication
MongoDB Database ReplicationMongoDB Database Replication
MongoDB Database Replication
 
Webinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage EngineWebinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage Engine
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
 
Database Sharding at Netlog
Database Sharding at NetlogDatabase Sharding at Netlog
Database Sharding at Netlog
 
Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb sharding
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
 
Introduction to Sharding with MongoDB
Introduction to Sharding with MongoDBIntroduction to Sharding with MongoDB
Introduction to Sharding with MongoDB
 
Advanced Schema Design Patterns
Advanced Schema Design PatternsAdvanced Schema Design Patterns
Advanced Schema Design Patterns
 

Similaire à Sharding Methods for MongoDB

MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
MongoDB
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
MongoDB
 

Similaire à Sharding Methods for MongoDB (20)

Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
MongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: ShardingMongoDB for Time Series Data: Sharding
MongoDB for Time Series Data: Sharding
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Data Collection and Storage
Data Collection and StorageData Collection and Storage
Data Collection and Storage
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 

Plus de MongoDB

Plus de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Sharding Methods for MongoDB

  • 1. Sharding Methods For MongoDB Jay Runkel jay.runkel@mongodb.com @jayrunkel #MongoDB
  • 2. 2 • Customer Stories • Sharding for Performance/Scale – When to shard? – How many shards do I need? • Types of Sharding • How to Pick a Shard Key • Sharding for Other Reasons Agenda
  • 4. 4
  • 5. 5 • 50M users. • 6B check-ins to date (6M per day growth). • 55M points of interest / venues. • 1.7M merchants using the platform for marketing • Operations Per Second: 300,000 • Documents: 5.5B Foursquare
  • 6. 6 • 11 MongoDB clusters – 8 are sharded • Largest cluster has 15 shards (check ins) – Sharded on user id Foursquare clusters
  • 7. 7 • Large data set CarFax
  • 8. 8 • 13 billion+ documents – 1.5 billion documents added every year • 1 vehicle history report is > 200 documents • 12 Shards • 9-node replica sets • Replicas distributed across 3 data centers CarFax Shards
  • 9. 9
  • 11. 12 Sharding Overview Primary Secondary Secondary Shard 1 Primary Secondary Secondary Shard 2 Primary Secondary Secondary Shard 3 Primary Secondary Secondary Shard N … Query Router Query Router Query Router …… Driver Application
  • 13. 15 Scaling: Sharding Read/Write Scalability mongod mongod Key Range 0..50 Key Range 51..100
  • 14. 16 Scaling: Sharding mongod mongod mongod mongod Key Range 0..25 Key Range 26..50 Key Range 51..75 Key Range 76.. 100 Read/Write Scalability
  • 15. How do I know I need to shard?
  • 16. 18 Does one server/replica… • Have enough disk space to store all my data? • Handle my query throughput (operations per second)? • Respond to queries fast enough (latency)?
  • 17. 19 • Have enough disk space to store all my data? • Handle my query throughput (operations per second)? • Respond to queries fast enough (latency)? Does one server/replica set… Server Specs Disk Capacity Disk IOPS RAM Network Disk IOPS RAM Network
  • 18. How many shards do I need?
  • 19. 21 • Sum of disk space across shards > greater than required storage size Disk Space: How Many Shards Do I Need?
  • 20. 22 • Sum of disk space across shards > greater than required storage size Disk Space: How Many Shards Do I Need? Example Storage size = 3 TB Server disk capacity = 2 TB 2 Shards Required
  • 21. 23 • Working set should fit in RAM – Sum of RAM across shards > Working Set • WorkSet = Indexes plus the set of documents accessed frequently • WorkSet in RAM  – Shorter latency – Higher Throughput RAM: How Many Shards Do I Need?
  • 22. 24 • Measuring Index Size and Working Set db.stats() – index size of each collection db.serverStatus({ workingSet: 1}) – working set size estimate RAM: How Many Shards Do I Need?
  • 23. 25 • Measuring Index Size and Working Set db.stats() – index size of each collection db.serverStatus({ workingSet: 1}) – working set size estimate RAM: How Many Shards Do I Need? Example Working Set = 428 GB Server RAM = 128 GB 428/128 = 3.34 4 Shards Required
  • 24. 26 • Sum of IOPS across shards > greater than required IOPS • IOPS are difficult to estimate – Update doc – Update indexes – Append to journal – Log entry? • Best approach – build a prototype and measure Disk Throughput: How Many Shards Do I Need
  • 25. 27 • Sum of IOPS across shards > greater than required IOPS • IOPS are difficult to estimate – Update doc – Update indexes – Append to journal – Log entry? • Best approach – build a prototype and measure Disk Throughput: How Many Shards Do I Need Example Required IOPS = 11000 Server disk IOPS = 5000 3 Shards Required
  • 26. 28 • S = ops/sec of a single server • G = required ops/sec • N = # of shards • G = N * S * .7 N = G/.7S OPS: How Many Shards Do I Need?
  • 27. 29 • S = ops/sec of a single server • G = required ops/sec • N = # of shards • G = N * S * .7 N = G/.7S OPS: How Many Shards Do I Need? Sharding Overhead
  • 28. 30 • S = ops/sec of a single server • G = required ops/sec • N = # of shards • G = N * S * .7 N = G/.7S OPS: How Many Shards Do I Need? Example S = 4000 G = 10000 N = 3.57 4 Shards
  • 30. 32 • Range • Tag-Aware • Hashed Sharding Types
  • 31. 33 Range Sharding mongod mongod mongod mongod Key Range 0..25 Key Range 26..50 Key Range 51..75 Key Range 76.. 100 Read/Write Scalability
  • 32. 34 Tag-Aware Sharding mongod mongod mongod mongod Shard Tags Shard Tag Start End Winter 23 Dec 21 Mar Spring 22 Mar 21 Jun Summer 21 Jun 23 Sep Fall 24 Sep 22 Dec Tag Ranges Winter Spring Summer Fall
  • 33. 35 Hash-Sharding mongod mongod mongod mongod Hash Range 0000..4444 Hash Range 4445..8000 Hash Range i8001..aaaa Hash Range aaab..ffff
  • 34. 36 Hashed shard key • Pros: – Evenly distributed writes • Cons: – Random data (and index) updates can be IO intensive – Range-based queries turn into scatter gather Shard 1 mongos Shard 2 Shard 3 Shard N
  • 37. How do I Pick A Shard Key
  • 38. 40 Shard Key characteristics • A good shard key has: – sufficient cardinality – distributed writes – targeted reads ("query isolation") • Shard key should be in every query if possible – scatter gather otherwise • Choosing a good shard key is important! – affects performance and scalability – changing it later is expensive
  • 39. 41 Low cardinality shard key • Induces "jumbo chunks" • Examples: boolean field Shard 1 mongos Shard 2 Shard 3 Shard N [ a, b )
  • 40. 42 Ascending shard key • Monotonically increasing shard key values cause "hot spots" on inserts • Examples: timestamps, _id Shard 1 mongos Shard 2 Shard 3 Shard N [ ISODate(…), $maxKe
  • 42. 44 • Scale – Data volume – Query volume • Global deployment with local writes – Geography aware sharding • Tiered Storage • Fast backup restore Reasons to shard
  • 44. 46 • Save hardware costs • Put frequently accessed documents on fast servers – Infrequently accessed documents on less capable servers • Use Tag aware sharding Tiered Storage mongod mongod mongod mongod Current Current Archive Archive SSD SSD HDD HDD
  • 45. 47 • 40 TB Database • 2 shards of 20 TB each • Challenge – Cannot meet restore SLA after data loss Fast Restore mongod mongod 20 TB 20 TB
  • 46. 48 • 40 TB Database • 4 shards of 10 TB each • Solution – Reduce the restore time by 50% Fast Restore mongod mongod 10 TB 10 TB mongod mongod 10 TB 10 TB
  • 48. 50 • To determine required # of shards determine – Storage requirements – Latency requirements – Throughput requirements • Derive total – Disk capacity – Disk throughput – RAM • Calculate # of shards based upon individual server specs Determining the # of shards
  • 49. 51 • Scalability • Geo-aware clusters • Tiered Storage • Reduce backup restore times Leverage Sharding For
  • 50. 52 • MongoDB Manual: http://docs.mongodb.org/manual/sharding/ • Other Webinars: – How to Achieve Scale With MongoDB • White Papers – MongoDB Performance Best Practices – MongoDB Architecture Guide Sharding: Where to go from here…
  • 51. Get Expert Advice on Scaling. For Free. For a limited time, if you’re considering a commercial relationship with MongoDB, you can sign up for a free one hour consult about scaling with one of our MongoDB Engineers. Sign Up: http://bit.ly/1rkXcfN
  • 52. 54 Webinar Q&A jay.runkel@mongodb.com @jayrunkel Stay tuned after the webinar and take our survey for your chance to win MongoDB swag.

Notes de l'éditeur

  1. MongoDB provides horizontal scale-out for databases using a technique called sharding, which is trans- parent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application. MongoDB supports three types of sharding: • Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range- based queries. • Hash-based Sharding. Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries. • Tag-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers. MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
  2. may consider hashing _id instead
  3. may consider hashing _id instead
  4. www.mongodb.com/lp/contact/scaling-101 http://www.mongodb.com/lp/contact/planning-for-scale