SlideShare une entreprise Scribd logo
1  sur  32
Ger Hartnett
Director of Technical Services (EMEA), MongoDB @ghartnett #MongoDB
Tales from the Field
Part three: Choosing the Right Shard Key for
High-Performance and Scale
Or:
●Cautionary Tales
●Don’t solve the wrong problems
●Bad schemas & shard keys hurt ops
too
●The main talk should take 30-35 minutes
●You can submit questions via the chat box
●We’ll answer as many as possible at the end
●We are recording and will send slides Friday
●This is the final webinar in a series of 3
Before we start
●You work in operations
●You work in development
●You have a MongoDB system in production
●You have contacted MongoDB Technical
Services (support)
●You attended an earlier webinar in the series
(part1, part2)
A quick poll - add a word to the
chat to let me know your
perspective
●We collect - observations about common
mistakes - to share the experience of many
●Names have been changed to protect the
(mostly) innocent
●No animals were harmed during the making
of this presentation (but maybe some DBAs
and engineers had light emotional scarring)
●While you might be new to MongoDB we
have deep experience that you can leverage
Stories
1. Discovering a DR flaw during a data
centre outage
2. Complex documents, memory and
an upgrade “surprise”
3. Wild success “uncovers” the wrong
shard key
The Stories (part three today)
Story #1: Quick Review
Story #1: Recovering from a
disaster
●Prospect in the process of signing up for a
subscription
●Called us late on Friday, data centre power
outage and 30+ (11 shards) servers down
●When they started bringing up the first
shard, the nodes crashed with data
corruption
●17TB of data, very little free disk space,
JOURNALLING DISABLED!
Recovering each shard
1.Start secondary
read only
2.Mount NFS
storage for repair
3.Repair former
primary node
4.Iterative rsync to
seed a secondary
Secondary
Primary
Secondary
Key takeaways for you
●If you are departing significantly from
standard config, check with us (i.e. if you
think journalling is a bad idea)
●Two DC in different buildings on different
flood plains, not in the path of the same
storm (i.e. secondaries in AWS)
●DR/backups are useless if you haven’t
tested them
Story #2: Complex documents,
memory and an upgrade “surprise”
●Well established ecommerce site selling
diverse goods in 20+ countries
●After switching to wired tiger in production,
performance dropped, this is the opposite of
what they were expecting
{
_id: 375
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... >
inventory: 423
}
Product Catalog: Original Schema
Key Takeaways
●When doing a major version/storage-engine
upgrade, test in staging with some
proportion of production data/workload
●Sometimes putting everything into one
document is counter productive
Story #3: Wild success uncovers the
wrong shard key
●Started out as error “[Balancer] caught
exception … tag ranges not valid for: db.coll”
●11 shards, they had added 2 new shards to
keep up traffic - 400+ databases
●Lots of code changes ahead of the Superbowl
●Spotted slow 300+s queries, decided to build
some indexes without telling us
●Went production down
Adding Shards
2 More Shards….
The “Golden Hammer” Tendency
Diagnosing the issues #1
●The red-herring hunt begins
●Transparent Huge Pages enabled -
production
●Chaotic call - 20 people talking at once, then
in the middle of the call everything started
working again
●Barrage of tickets and calls
●Connection storms
Using mtools to analyse logs - conn
churn
Diagnosing the issues #2
●Got inconsistent and missing log files
●Discovered repeated scatter-gather queries
returning the same results
●Secondary reads
●Heavy load on some shards and low disk
space
Insert load on two shards (from
Cloud Manager)
Diagnosing the issues #3
● Shard key - string with year/month & customer id
{
_id : ObjectId("4c4ba5e5e8aabf3"),
count: 1025,
changes: { … }
modified :
{ date : "2015_02",
customerId: 314159 }
}
Diagnosing the issues #4
●First heard about DDOS attack
●Missing tag ranges on some collections
●Stopped the balancer which reduced system
load from chunk moves
●Two clusters had a mongos each on the
same server
Fixing the issues
●Script to fix the tag ranges
●Proposed finer granularity shard key - but this
was not possible because of 30TB of data
●Moved mongos to dedicated servers
●Re-enable the balancer for short windows with
waitForDelete and secondaryThrottle
●Put together scripts to pre-split and move empty
chunks to quiet shards based on traffic from
month before
Monthly pre-split and move chunks
{ date : "2015_03",
customerId: min-500
customerId: 314159
customerId: 501-10000
customerId: 10001-300k
customerId: 300k-314158
customerId: 314160-max
The diagnosis in retrospect
●The outage did not appear to have been related
to either the invalid tag ranges or the earlier
failed moves
●The step downs did not help resolve the outage
but did highlight some queries that need to be
fixed
●The DDoS was the ultimate cause of the outage
- lead to diagnosis of deeper issues
●The deepest issue was the shard key
Aftermath and lessons learned
●Signed up for a Named TSE
●Now doing pre-split and move before the
end of every month
●Check before making other changes (i.e.
building new indexes)
Key takeaways for you
●Choosing a shard key is a pivotal decision -
make it carefully
●Understand current bottleneck
●Monitor insert distribution and chunk ranges
●Look for slow queries (logs & mtools)
●Run mongos, mongod, config server on
dedicated server or use containers/cgroups
Further Reading
Production notes
docs.mongodb.org/manual/administration/production-notes
Mtools
github.com/rueckstiess/mtools
Previous Webinars
mongodb.com/presentations
Ger Hartnett
Director Technical Services (EMEA), MongoDB
@ghartnett #MongoDB
Questions?
●You can submit questions via the chat box
●We are recording and will send slides Friday
Questions
Code GerHartnett gets 25% discount

Contenu connexe

Tendances

Tendances (20)

MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
MongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and KubernetesMongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
MongoDB Evenings DC: Get MEAN and Lean with Docker and Kubernetes
 
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
 
MongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo SeattleMongoDB Auto-Sharding at Mongo Seattle
MongoDB Auto-Sharding at Mongo Seattle
 
MongoDB and Spark
MongoDB and SparkMongoDB and Spark
MongoDB and Spark
 
Webinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-ServiceWebinar: Enterprise Trends for Database-as-a-Service
Webinar: Enterprise Trends for Database-as-a-Service
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & Spark
 
Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDB
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDBMongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDB
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDB
 
Securing Your Enterprise Web Apps with MongoDB Enterprise
Securing Your Enterprise Web Apps with MongoDB Enterprise Securing Your Enterprise Web Apps with MongoDB Enterprise
Securing Your Enterprise Web Apps with MongoDB Enterprise
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2Webinar: What's New in MongoDB 3.2
Webinar: What's New in MongoDB 3.2
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Webinar: When to Use MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
 
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
MongoDB World 2019: Raiders of the Anti-patterns: A Journey Towards Fixing Sc...
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + Optimization
 

En vedette

En vedette (16)

MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals
 
Choosing a Shard key
Choosing a Shard keyChoosing a Shard key
Choosing a Shard key
 
Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2
 
Step-by-Step Parse Migration
Step-by-Step Parse MigrationStep-by-Step Parse Migration
Step-by-Step Parse Migration
 
Webinar: The Visual Query Profiler and MongoDB Compass
Webinar: The Visual Query Profiler and MongoDB CompassWebinar: The Visual Query Profiler and MongoDB Compass
Webinar: The Visual Query Profiler and MongoDB Compass
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Distributed Consensus in MongoDB's Replication System
Distributed Consensus in MongoDB's Replication SystemDistributed Consensus in MongoDB's Replication System
Distributed Consensus in MongoDB's Replication System
 
Webinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance ImplicationsWebinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance Implications
 
How We Test MongoDB: Evergreen
How We Test MongoDB: EvergreenHow We Test MongoDB: Evergreen
How We Test MongoDB: Evergreen
 
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory ComputingWebinar: Elevate Your Enterprise Architecture with In-Memory Computing
Webinar: Elevate Your Enterprise Architecture with In-Memory Computing
 
MongoDB Europe 2016 - Powering Microservices with Docker, Kubernetes, and Kafka
MongoDB Europe 2016 - Powering Microservices with Docker, Kubernetes, and KafkaMongoDB Europe 2016 - Powering Microservices with Docker, Kubernetes, and Kafka
MongoDB Europe 2016 - Powering Microservices with Docker, Kubernetes, and Kafka
 
High Performance Applications with MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDB
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDB
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica Sets
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 

Similaire à Webinar: Choosing the Right Shard Key for High Performance and Scale

Tales from the Field
Tales from the FieldTales from the Field
Tales from the Field
MongoDB
 
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
confluent
 

Similaire à Webinar: Choosing the Right Shard Key for High Performance and Scale (20)

MongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the FieldMongoDB Days UK: Tales from the Field
MongoDB Days UK: Tales from the Field
 
Tales from the Field
Tales from the FieldTales from the Field
Tales from the Field
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
Mongodb
MongodbMongodb
Mongodb
 
OSMC 2012 | Shinken by Jean Gabès
OSMC 2012 | Shinken by Jean GabèsOSMC 2012 | Shinken by Jean Gabès
OSMC 2012 | Shinken by Jean Gabès
 
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
 
Eko10 Workshop Opensource Database Auditing
Eko10  Workshop Opensource Database AuditingEko10  Workshop Opensource Database Auditing
Eko10 Workshop Opensource Database Auditing
 
LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201LAS16-TR03: Upstreaming 201
LAS16-TR03: Upstreaming 201
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use cases
 
SEO for Large Websites
SEO for Large WebsitesSEO for Large Websites
SEO for Large Websites
 
Eko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORINGEko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORING
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Webinar: Tales from the Field - 48 Hours to Data Centre Recovery
Webinar: Tales from the Field - 48 Hours to Data Centre RecoveryWebinar: Tales from the Field - 48 Hours to Data Centre Recovery
Webinar: Tales from the Field - 48 Hours to Data Centre Recovery
 
Gab 2018 seguridad y escalado en azure service fabric
Gab 2018   seguridad y escalado en azure service fabricGab 2018   seguridad y escalado en azure service fabric
Gab 2018 seguridad y escalado en azure service fabric
 
Gab 2018 seguridad y escalado en azure service fabric
Gab 2018   seguridad y escalado en azure service fabricGab 2018   seguridad y escalado en azure service fabric
Gab 2018 seguridad y escalado en azure service fabric
 
There is something about serverless
There is something about serverlessThere is something about serverless
There is something about serverless
 
MOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDCMOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDC
 
Introduction to MDC Logging in Scala.pdf
Introduction to MDC Logging in Scala.pdfIntroduction to MDC Logging in Scala.pdf
Introduction to MDC Logging in Scala.pdf
 
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
 
Leonid Kuligin "Training ML models with Cloud"
 Leonid Kuligin   "Training ML models with Cloud" Leonid Kuligin   "Training ML models with Cloud"
Leonid Kuligin "Training ML models with Cloud"
 

Plus de MongoDB

Plus de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Dernier

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Dernier (20)

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

Webinar: Choosing the Right Shard Key for High Performance and Scale

  • 1. Ger Hartnett Director of Technical Services (EMEA), MongoDB @ghartnett #MongoDB Tales from the Field Part three: Choosing the Right Shard Key for High-Performance and Scale
  • 2. Or: ●Cautionary Tales ●Don’t solve the wrong problems ●Bad schemas & shard keys hurt ops too
  • 3. ●The main talk should take 30-35 minutes ●You can submit questions via the chat box ●We’ll answer as many as possible at the end ●We are recording and will send slides Friday ●This is the final webinar in a series of 3 Before we start
  • 4. ●You work in operations ●You work in development ●You have a MongoDB system in production ●You have contacted MongoDB Technical Services (support) ●You attended an earlier webinar in the series (part1, part2) A quick poll - add a word to the chat to let me know your perspective
  • 5. ●We collect - observations about common mistakes - to share the experience of many ●Names have been changed to protect the (mostly) innocent ●No animals were harmed during the making of this presentation (but maybe some DBAs and engineers had light emotional scarring) ●While you might be new to MongoDB we have deep experience that you can leverage Stories
  • 6. 1. Discovering a DR flaw during a data centre outage 2. Complex documents, memory and an upgrade “surprise” 3. Wild success “uncovers” the wrong shard key The Stories (part three today)
  • 8. Story #1: Recovering from a disaster ●Prospect in the process of signing up for a subscription ●Called us late on Friday, data centre power outage and 30+ (11 shards) servers down ●When they started bringing up the first shard, the nodes crashed with data corruption ●17TB of data, very little free disk space, JOURNALLING DISABLED!
  • 9. Recovering each shard 1.Start secondary read only 2.Mount NFS storage for repair 3.Repair former primary node 4.Iterative rsync to seed a secondary Secondary Primary Secondary
  • 10. Key takeaways for you ●If you are departing significantly from standard config, check with us (i.e. if you think journalling is a bad idea) ●Two DC in different buildings on different flood plains, not in the path of the same storm (i.e. secondaries in AWS) ●DR/backups are useless if you haven’t tested them
  • 11. Story #2: Complex documents, memory and an upgrade “surprise” ●Well established ecommerce site selling diverse goods in 20+ countries ●After switching to wired tiger in production, performance dropped, this is the opposite of what they were expecting
  • 12. { _id: 375 en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > inventory: 423 } Product Catalog: Original Schema
  • 13. Key Takeaways ●When doing a major version/storage-engine upgrade, test in staging with some proportion of production data/workload ●Sometimes putting everything into one document is counter productive
  • 14. Story #3: Wild success uncovers the wrong shard key ●Started out as error “[Balancer] caught exception … tag ranges not valid for: db.coll” ●11 shards, they had added 2 new shards to keep up traffic - 400+ databases ●Lots of code changes ahead of the Superbowl ●Spotted slow 300+s queries, decided to build some indexes without telling us ●Went production down
  • 15. Adding Shards 2 More Shards….
  • 17. Diagnosing the issues #1 ●The red-herring hunt begins ●Transparent Huge Pages enabled - production ●Chaotic call - 20 people talking at once, then in the middle of the call everything started working again ●Barrage of tickets and calls ●Connection storms
  • 18. Using mtools to analyse logs - conn churn
  • 19. Diagnosing the issues #2 ●Got inconsistent and missing log files ●Discovered repeated scatter-gather queries returning the same results ●Secondary reads ●Heavy load on some shards and low disk space
  • 20. Insert load on two shards (from Cloud Manager)
  • 21. Diagnosing the issues #3 ● Shard key - string with year/month & customer id { _id : ObjectId("4c4ba5e5e8aabf3"), count: 1025, changes: { … } modified : { date : "2015_02", customerId: 314159 } }
  • 22.
  • 23. Diagnosing the issues #4 ●First heard about DDOS attack ●Missing tag ranges on some collections ●Stopped the balancer which reduced system load from chunk moves ●Two clusters had a mongos each on the same server
  • 24. Fixing the issues ●Script to fix the tag ranges ●Proposed finer granularity shard key - but this was not possible because of 30TB of data ●Moved mongos to dedicated servers ●Re-enable the balancer for short windows with waitForDelete and secondaryThrottle ●Put together scripts to pre-split and move empty chunks to quiet shards based on traffic from month before
  • 25. Monthly pre-split and move chunks { date : "2015_03", customerId: min-500 customerId: 314159 customerId: 501-10000 customerId: 10001-300k customerId: 300k-314158 customerId: 314160-max
  • 26. The diagnosis in retrospect ●The outage did not appear to have been related to either the invalid tag ranges or the earlier failed moves ●The step downs did not help resolve the outage but did highlight some queries that need to be fixed ●The DDoS was the ultimate cause of the outage - lead to diagnosis of deeper issues ●The deepest issue was the shard key
  • 27. Aftermath and lessons learned ●Signed up for a Named TSE ●Now doing pre-split and move before the end of every month ●Check before making other changes (i.e. building new indexes)
  • 28. Key takeaways for you ●Choosing a shard key is a pivotal decision - make it carefully ●Understand current bottleneck ●Monitor insert distribution and chunk ranges ●Look for slow queries (logs & mtools) ●Run mongos, mongod, config server on dedicated server or use containers/cgroups
  • 30. Ger Hartnett Director Technical Services (EMEA), MongoDB @ghartnett #MongoDB Questions?
  • 31. ●You can submit questions via the chat box ●We are recording and will send slides Friday Questions
  • 32. Code GerHartnett gets 25% discount

Notes de l'éditeur

  1. Field/Trenches
  2. What do the rest of you do?
  3. What do the rest of you do?
  4. Some borrowed, some merged into a single narrative Some of the people that inspired them may well be here in this room today
  5. Bill's Bulk Updates randomly affected an ever larger data set. In order to cope with the database size, Bill added more shards. The cluster scaled linearly, as intended.
  6. Bill's Bulk Updates randomly affected an ever larger data set. In order to cope with the database size, Bill added more shards. The cluster scaled linearly, as intended.
  7. Just because you can add horizontal capacity, does not mean it is the optimum solution
  8. Imagine that the sample rate was going to go from once a minute to once every 5 seconds
  9. Bill's Bulk Updates randomly affected an ever larger data set. In order to cope with the database size, Bill added more shards. The cluster scaled linearly, as intended.
  10. Imagine that the sample rate was going to go from once a minute to once every 5 seconds
  11. Just because you can add horizontal capacity, does not mean it is the optimum solution
  12. Imagine that the sample rate was going to go from once a minute to once every 5 seconds
  13. Imagine that the sample rate was going to go from once a minute to once every 5 seconds
  14. fast to move empty chunks
  15. Just because you can add horizontal capacity, does not mean it is the optimum solution
  16. We had actually told them to check with us before shard key but found out 3 months after they put it in production
  17. Just because you can add horizontal capacity, does not mean it is the optimum solution
  18. https://www.mongodb.com/presentations/webinar-48-hours-to-data-centre-recovery https://www.mongodb.com/presentations/webinar-avoiding-sub-optimal-performance-in-your-retail-application
  19. Field/Trenches
  20. What do the rest of you do?