MongoDB Chunks
Jason Terpko
MongoDB Chunks – Distribution, Splitting, and Merging
NoSQL DBA, Rackspace/ObjectRocket
www.linkedin.com/in/jterpko, jason.terpko@rackspace.com
My Story
• Started out in relational databases in public education, then financial services
• Next came online media distribution combined with a paywall
• For analytics, started working with columnar databases and engines with compression
• Made the switch to NoSQL at ObjectRocket by Rackspace
Overview
• MongoDB and Sharding
• What is a chunk in MongoDB?
• Chunk Distribution and Scaling MongoDB
• Use Cases For Splitting
• Use Cases For Merging
• Reconsidering Your Shard Key
Documents
• Common MongoDB BSON Types
• ObjectId
• String
• Integer
• Double
• NumberLong
• Date
• Boolean
• Null
• Array
{
  "_id" : ObjectId("570680d891442f6efaff2005"),
  "n" : "John Doe",
  "a" : 45,
  "h" : 5.9,
  "w" : NumberLong(165),
  "u" : ISODate("2016-01-07T15:46:32.085Z"),
  "ac" : true,
  "nn" : null,
  "z" : [
    "10013",
    "10018"
  ]
}
MongoDB Sharded Cluster
[Diagram: two shards, s1 and s2]
What is a Chunk?
config.chunks
{
  "_id" : "mydb.mycoll-uuid_\"00005cf6-1217-4414-935b-cf1bde09cc77\"",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : {
    "uuid" : "00005cf6-1217-4414-935b-cf1bde09cc77"
  },
  "max" : {
    "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e"
  },
  "shard" : "s1"
}
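A minimal sketch of how the min and max fields bound a chunk, assuming plain string comparison on lowercase UUID strings. The helper chunkContains is hypothetical and is not a MongoDB function; it only illustrates that the lower bound is inclusive and the upper bound is exclusive.

```javascript
// Hypothetical helper, not part of MongoDB: tests whether a shard key value
// falls inside a chunk's range. min is inclusive, max is exclusive.
function chunkContains(chunk, uuid) {
  return uuid >= chunk.min.uuid && uuid < chunk.max.uuid;
}

// Bounds copied from the config.chunks document on this slide.
const chunk = {
  ns: "mydb.mycoll",
  min: { uuid: "00005cf6-1217-4414-935b-cf1bde09cc77" },
  max: { uuid: "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
  shard: "s1",
};

console.log(chunkContains(chunk, "7800e10e-273d-4186-bca7-9e3f6647d33a")); // true
console.log(chunkContains(chunk, chunk.max.uuid)); // false, max belongs to the next chunk
```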
Use Cases For Splitting (Case 1)
[Diagram: shards s1 and s2 handling 80% and 20% of operations. Caption: Unbalanced Operations]
Use Cases For Splitting (Case 1)
Profiler Example :
db.system.profile.aggregate([
  { $match: { $and: [ { op: "update" }, { ns: "mydb.mycoll" } ] } },
  { $group: { "_id": "$query.uuid", count: { $sum: 1 } } },
  { $sort: { "count": -1 } },
  { $limit: 5 }
]);
{
  "result" : [
    {
      "_id" : "00005cf6-1217-4414-935b-cf1bde09cc77",
      "count" : 28672
    },
    ..........
  ],
  "ok" : 1
}
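To make the pipeline's stages easier to follow, here is a rough plain-JavaScript equivalent that runs on an in-memory array instead of system.profile. The function topUpdatedKeys and the sample entries are made up for illustration; only the field names op, ns, and query.uuid come from the slide.

```javascript
// Hypothetical stand-in for the profiler pipeline: count update operations
// per uuid and return the busiest keys first.
function topUpdatedKeys(profileEntries, ns, limit) {
  const counts = new Map();
  for (const entry of profileEntries) {
    if (entry.op !== "update" || entry.ns !== ns) continue; // $match
    const key = entry.query.uuid;
    counts.set(key, (counts.get(key) || 0) + 1); // $group with $sum: 1
  }
  return [...counts.entries()]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count) // $sort: { count: -1 }
    .slice(0, limit); // $limit
}

// Made-up entries shaped like system.profile documents.
const sampleEntries = [
  { op: "update", ns: "mydb.mycoll", query: { uuid: "aaaa" } },
  { op: "update", ns: "mydb.mycoll", query: { uuid: "bbbb" } },
  { op: "update", ns: "mydb.mycoll", query: { uuid: "aaaa" } },
  { op: "query", ns: "mydb.mycoll", query: { uuid: "aaaa" } },
  { op: "update", ns: "mydb.other", query: { uuid: "cccc" } },
];

console.log(topUpdatedKeys(sampleEntries, "mydb.mycoll", 5));
// [ { _id: 'aaaa', count: 2 }, { _id: 'bbbb', count: 1 } ]
```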
Use Cases For Splitting (Case 1)
{
  "_id" : "mydb.mycoll-uuid_\"00005cf6-1217-4414-935b-cf1bde09cc77\"",
  "lastmod" : Timestamp(1, 5),
  "lastmodEpoch" : ObjectId("570733145f2bf94777a62155"),
  "ns" : "mydb.mycoll",
  "min" : {
    "uuid" : "00005cf6-1217-4414-935b-cf1bde09cc77"
  },
  "max" : {
    "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e"
  },
  "shard" : "s1"
}
{ "uuid" : "7800e10e-273d-4186-bca7-9e3f6647d33a", …
{ "uuid" : "7801a55e-e326-4cdb-a5fb-a5d133016f13", …
{ "uuid" : "7801c3d5-8506-4715-934b-ad66904ae01e", …
{ "uuid" : "7801ff57-50b7-4505-9557-06058cc8ff80", …
sh.splitAt( )
Unbalanced Operations
Moving the Chunks
Example Command:
// moveChunk is an administrative command; run it against the admin database
db.runCommand({
  "moveChunk" : "mydb.mycoll",
  "bounds" : [
    {
      "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982"
    },
    {
      "uuid" : "ffffe5bc-d04d-4fcf-b8e2-283fa5998079"
    }
  ],
  "to" : "s2",
  "_secondaryThrottle" : true
});
Use Cases For Splitting (Case 2)
[Diagram: shards s1 and s2, shown twice. Caption: Unbalanced Resources]
Use Cases For Splitting (Case 2)
Unbalanced Resources – Jumbo Chunks
• Requirements
– Larger than db.settings.find({"_id" : "chunksize"})
– Or contains 250,000 or more documents
• Causes
– Importing of data
– Config metadata lock
– Compound Shard Key
• Identifying
– sh.status(true)
– moveChunk()
– count() or Aggregation Pipeline
– ObjectRocket Utility : ChunkHunter.py*
• Rectifying
– sh.splitAt() and sh.splitFind()
– ObjectRocket Utilities : ChunkManager.py*
– Data reduction
– Re-sharding the collection
*https://github.com/objectrocket/Utils
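The two requirements above can be sketched as a small check. This is an illustration under assumptions, not MongoDB's own logic: looksJumbo is a hypothetical name, the 64 MB chunk size in the examples is an assumed setting, and the stats object is assumed to carry the size (bytes) and numObjects fields that the dataSize command returns.

```javascript
// Hypothetical check based on the two criteria listed on the slide.
// stats.size is in bytes, stats.numObjects is a document count.
function looksJumbo(stats, chunkSizeMb) {
  const maxBytes = chunkSizeMb * 1024 * 1024;
  return stats.size > maxBytes || stats.numObjects >= 250000;
}

const MB = 1024 * 1024;
console.log(looksJumbo({ size: 45 * MB, numObjects: 100000 }, 64)); // false
console.log(looksJumbo({ size: 200 * MB, numObjects: 1000 }, 64)); // true, too large
console.log(looksJumbo({ size: 10 * MB, numObjects: 250000 }, 64)); // true, too many documents
```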
Splitting : Identifying and Rectifying
Aggregation Example:
db.mycoll.aggregate([{$group: {"_id": {"uuid": "$uuid"}, "count": {$sum: 1}}},{$sort: {"count": -1}},{$lim
Data Size:
db.adminCommand({ dataSize: "mydb.mycoll", keyPattern: { "uuid" : 1 },
  min: { "uuid" : "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
  max: { "uuid" : "7fe55742-7879-44bf-9a00-462a0284c982" } })
ObjectRocket Utils : ChunkManager.py, ChunkHunter.py, ChunkSplitter.py
Use Cases For Splitting (Case 3)
[Diagram: shards s1 and s2, with one shard holding 100% of the collection. Caption: New and Small Collections]
Use Cases For Splitting (Case 3)
New and Small Collections
• Considerations
– Workload
– Shard Key
– Shard Count
– Current Collection Size
– Expected Collection Size
• Command
db.runCommand( { shardCollection: "mydb.mycoll", key: { "uuid": "hashed" }, numInitialChunks : 1024 } )
• Calculating The Value
(size_in_mb / chunk_size) * 2
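As a worked example of the formula above: a 32,768 MB collection with a 64 MB chunk size gives (32768 / 64) * 2 = 1024, which matches the numInitialChunks value in the command. Both input figures are assumptions chosen for illustration, and the helper name and the rounding up are additions that the slide does not specify.

```javascript
// Hypothetical helper applying the slide's formula:
// (size_in_mb / chunk_size) * 2, rounded up to a whole number of chunks.
function initialChunkCount(sizeInMb, chunkSizeMb) {
  return Math.ceil((sizeInMb / chunkSizeMb) * 2);
}

console.log(initialChunkCount(32768, 64)); // 1024
console.log(initialChunkCount(100, 64)); // 4 (3.125 rounded up)
```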
Use Cases For Merging
[Diagram: shards s1 and s2 at 80% and 20% (Unbalanced Operations), and shards s1 and s2 again (Unbalanced Resources)]
Empty Chunks
{ "uuid" : "00005cf6-....." } -->> { "uuid" : "7fe55637-....." } on : s1 (100K Docs @ 45MB)
{ "uuid" : "7fe55637-....." } -->> { "uuid" : "7fe55742-....." } on : s2 (0 Docs @ 0MB)
Use Cases For Merging (Deletes)
Unexpected remove() Operations
[Diagram: shards s1, s2, s3, and s4, labeled "insert() & split" (twice) and "insert() -> split -> remove()" (twice)]
Merging Process
How we have resolved this with JavaScript in the past:
1. Check Balancer State
2. Read in empty chunks from ChunkHunter.py results collection
3. Locate adjacent chunk
4. If empty and adjacent chunks reside on the same shard, merge
5. Else move chunk to the shard with the adjacent chunk, then merge
What have we learned from the current process?
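The numbered merge steps on this slide could be sketched roughly as follows. This is not the ObjectRocket script: planMerge and the shape of its output are hypothetical, it plans for one empty chunk at a time, it assumes the chunk list is sorted and contiguous, and it assumes the check in step 1 means the balancer must be stopped before merging. It only builds a plan; the real work would be done by the moveChunk and mergeChunks commands.

```javascript
// Hypothetical planner for a single empty chunk; nothing here talks to a cluster.
function planMerge(chunks, emptyIndex, balancerEnabled) {
  if (balancerEnabled) {
    throw new Error("stop the balancer before merging"); // step 1 (assumed meaning)
  }
  const empty = chunks[emptyIndex];
  // Step 3: prefer the next chunk, fall back to the previous one.
  const neighborIndex = emptyIndex + 1 < chunks.length ? emptyIndex + 1 : emptyIndex - 1;
  const neighbor = chunks[neighborIndex];
  if (!neighbor) return []; // a single-chunk collection has nothing to merge with

  const steps = [];
  if (empty.shard !== neighbor.shard) {
    // Step 5: move the empty chunk next to its neighbor first.
    steps.push({ moveChunk: [empty.min, empty.max], to: neighbor.shard });
  }
  const [first, second] = emptyIndex < neighborIndex ? [empty, neighbor] : [neighbor, empty];
  steps.push({ mergeChunks: [first.min, second.max] }); // step 4
  return steps;
}

// The two chunks from the "Empty Chunks" slide, with bounds taken from the
// earlier examples in this deck.
const chunks = [
  { min: { uuid: "00005cf6-1217-4414-935b-cf1bde09cc77" },
    max: { uuid: "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" }, shard: "s1", docs: 100000 },
  { min: { uuid: "7fe55637-74c0-4e51-8eed-ab6b411d2b6e" },
    max: { uuid: "7fe55742-7879-44bf-9a00-462a0284c982" }, shard: "s2", docs: 0 },
];

// The empty chunk on s2 is moved to s1, then merged with its neighbor.
console.log(JSON.stringify(planMerge(chunks, 1, false), null, 2));
```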
Use Cases For Merging (TTL)
TTL : db.mycoll.ensureIndex( { "ca": 1 }, { expireAfterSeconds: 5227200 } )
[Diagram: shards s1, s2, s3, and s4, labeled "splitVector & split" and "moveChunk()"]
Chunk Size and Splits
How frequently are chunks being split?
• Global Change
• split & splitVector
• Decreasing the size
• Increasing the size
Review Your Shard Key
How frequently does this occur?
• Is this a recurring problem?
• What impact does it have on your business?
• Re-analyzing your structure, workload, and access patterns
• What method will you use to re-shard a sharded collection?
ObjectRocket by Rackspace
Questions?
• MongoDB
• Sharding
• Chunks
• Splitting Chunks
• Merging Chunks
• ObjectRocket
• Rackspace
