SlideShare a Scribd company logo
1 of 32
Download to read offline
Mike Kania
Production Engineer @ Parse
Benchmarking, Load
Testing, and Preventing
Terrible Disasters
What Parse Does
We have 500k+ apps running on Parse.
Provide services to —
•Store user data
•Run server side JavaScript
•Send push notifications
•Handle crash reporting
•Generate analytics
Parse + MongoDB
• Use many of MongoDB’s feature set
• Support almost every type of workload you can
imagine
•Millions of collections and indexes
• new ones being created every minute
•Run MongoDB exclusively on AWS
•We do crazy things with MongoDB
Why Should You Listen
to Me?
• Parse has one of the most complex MongoDB
infrastructures(in the world?)
• Started using MongoDB in 1.8
• Upgraded 2.6 everywhere 6 months ago
• We have some battle wounds from upgrading
MongoDB to pass on to you
Why Shouldn’t You
Listen to Me?
MongoDB is a jack of all trades, and
there’s certain features that we haven’t
touched.
•Sharding — We built our own way
to shard data
•Aggregation/Map Reduce — We
don’t touch this at all
History of MongoDB
Upgrades at Parse
1.8 2.0 2.2 2.4 2.6 3.0
{Doitlive
Cowboy Upgrade
1. Review “Upgrade Requirements” and
known bugs in JIRA
2. Run intigration/unit tests agains the
new version
3. Spin up a hidden secondary. Watch for
problems
4. Unhide SECONDARY.. Watch for
problems
5. Promote to PRIMARY
6. Declare success! Oh wait I mean
watch for problems.
What Went Wrong
• 60% perf reduction
• all geo indexes block global
lock until the first document
found
• unindexable writes suddenly
refused
• changed the definition of
scan limits,
A New Approach
1.8 2.0 2.2 2.4 2.6 3.0
{
{
Doitlive
Doitwith
production
workloads
in a
test environment
Flashback
• Open sourced benchmarking
tool specifically for MongoDB
• Captures production
workloads
• Replay those workloads
over and over again with
configurable speeds
• Recently merged a pull request
to support load testing with
Mongo sharing
Record
Get the config setup:
•oplog_server: A secondary that will be used to
tail the oplog for write operations
•profiler_server: The primary in the target replica
set to capture profiling data
•duration_sec: Defines how long you want to
record
Enable Profiling
• Keep in mind, it does an additional write for every
operation.
•./set_mongo_profiling.py -a enable -n
$PRIMARY_HOSTNAME
Moar Better Recording
• What about just capturing it over the wire?
• Maybe use mongosniff
• MongoDB has a built in pcap library.
• Enter mongocaputils
• Also open source
• Still a little buggy
Running the Record
./record.py
Creating a Consistent
Snapshot
Need a way to quickly capture a consistent
snapshot of your dataset
We use EBS snapshots,
•locking mongod
•creating an EBS snapshot of all the RAIDed
volumes on /var/lib/mongodb
•unlocking mongod.
Quickly Replaying
Workloads
•Pre-Warming EBS snapshots after each run is slow
and time consuming
•Pulling down the blocks from S3 takes hours or
days if you have terabytes of data.
•We decided to use LVM on top of EBS
•Does incur I/O overhead
•Allows us to do LVM snapshots!
How we used LVM
Define a restore point before benchmarking
•lvcreate -l 10%VG -s -n restore_point /dev/
mongovg/mongoraid
Merge Copy-on-Write logical volume to rollback
•Stop MongoDB
•Unmount Filesystem
•lvconvert –merge /dev/mongovg/restore_point
Creating the Test
Environment
• Spin up new EC2 instance and restore the EBS
volumes from snapshot
•New EBS volumes need to be pre-warmed.
Blocks are lazily loaded from S3
• Benchmark server which will run Flashback
request and has the workload on disk.
•Nothing specials needs to happen here
Benchmarking New
Shiny Storage Engines
In MongoDB 3.0, each storage engine has a
different on-disk format
So we also need to run an initial sync of each
new storage engine against our restored
MMAPv1 backup, and then run benchmarks
on each format.
MMAPv1
(restored from
snapshot)
RocksDB
WiredTiger
initial sync
initial sync
Side Note: The Storage
Efficiency of the RocksDB/
WiredTiger is Amazing*
*You should totally check out the “Storage Engine Wars” talk
by Charity Majors and Igor Canadi
0
1,000
2,000
3,000
4,000
283GB318GB
3,245GB
MMAPv1 WiredTiger RocksDB
Running the Replay
• Two styles to replay: real and
stress
flashback 
-ops_filename=OUTPUT 
-style=real 
-url=$MONGO_HOST:27017 
-workers=50
MongoDB 2.6
MMAPv1
MongoDB 3.0
MMAPv1
MongoDB 3.0
RocksDB
Flashback
Metrics Gathering
• Flashback percentile latencies broken down by
operation type.
• Useful from a high level
• Not so useful when diving into query regressions
Logging Pipeline
• Mongo logs are hard to parse.
• Thankfully you don’t need to worry about it
• Just use our open source PEG parser
mongologtools
• Ship JSON via Scribe to an internal Facebook
data diving tool
First Results
Op
2.6
MMAPv1
3.0
MMAPv1
3.0
RockDB
query 2.93ms 4.43ms 3.04ms
p50 Query Latency
Op
2.6
MMAPv1
3.0
MMAPv1
3.0 RockDB
query 177.41ms
619471.47m
s
1441442.26
ms
p99 Query Latency
First Regression
•Regression in $nearSphere
queries just for 3.0
•SERVER-17469 — patched in
3.0.2
• After the fix average latency for
$nearSphere went from
•2354 ms to 35 ms
More Ad-Hoc Analysis
MMAPv1RocksDB
# documents scanned
durationmsdurationms
# documents scanned
P99 Latency
query
insert
remove
update
findandmodify
count
0ms 10ms 20ms 30ms 40ms
1
5
1
1
0
2
0
28
1
22
23
2
0
15
11
21
32
8
2.6 MMAPv1
3.0 MMAPv1
3.0 RockDB
Some time later…
Benchmarks Won’t
Find Everything
•[RocksDB] Prefix collision could happen between
restarts
https://github.com/mongodb-partners/mongo/
commit/
da8a90b3b71bf291684ffc5a6d2fd32118ce1a7b
•[MongoDB] Secondary reads block replication
https://jira.mongodb.org/browse/SERVER-18190
Where are we now with
testing 3.0?
• MongoDB 3.0 with RocksDB is serving some
production traffic and it looks amazing.
milliseconds
API Request
Linkage
• Flashback
• https://github.com/ParsePlatform/flashback
• Mongologtools
• https://github.com/tmc/mongologtools
• MongoDB 3.0 Benchmarking Results
• http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-
so-fast-it-makes-your-head-spin/
• nearSphere regression
• https://jira.mongodb.org/browse/SERVER-17469
• WT/RocksDB secondary crash
• https://jira.mongodb.org/browse/SERVER-17882

More Related Content

What's hot

High Performance MongoDB on Storage-Optimized AWS EC2
High Performance MongoDB on Storage-Optimized AWS EC2High Performance MongoDB on Storage-Optimized AWS EC2
High Performance MongoDB on Storage-Optimized AWS EC2
MongoDB
 
Open source and cross platform .net
Open source and cross platform .netOpen source and cross platform .net
Open source and cross platform .net
Ibon Landa
 

What's hot (19)

Scaling application servers for efficiency
Scaling application servers for efficiencyScaling application servers for efficiency
Scaling application servers for efficiency
 
Moving mongo db to the cloud strategies and points to consider
Moving mongo db to the cloud  strategies and points to considerMoving mongo db to the cloud  strategies and points to consider
Moving mongo db to the cloud strategies and points to consider
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 
Configure Grafana Chat Bot with GitHub Hubot and SLACK.
Configure Grafana Chat Bot with GitHub Hubot and SLACK.Configure Grafana Chat Bot with GitHub Hubot and SLACK.
Configure Grafana Chat Bot with GitHub Hubot and SLACK.
 
High Performance MongoDB on Storage-Optimized AWS EC2
High Performance MongoDB on Storage-Optimized AWS EC2High Performance MongoDB on Storage-Optimized AWS EC2
High Performance MongoDB on Storage-Optimized AWS EC2
 
Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013
Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013
Scaling MongoDB on Amazon Web Services (DAT209) | AWS re:Invent 2013
 
Monitoring of OpenNebula installations
Monitoring of OpenNebula installationsMonitoring of OpenNebula installations
Monitoring of OpenNebula installations
 
Server side rendering review
Server side rendering reviewServer side rendering review
Server side rendering review
 
OSDC.no 2015 introduction to node.js workshop
OSDC.no 2015 introduction to node.js workshopOSDC.no 2015 introduction to node.js workshop
OSDC.no 2015 introduction to node.js workshop
 
(SDD416) Amazon EBS Deep Dive | AWS re:Invent 2014
(SDD416) Amazon EBS Deep Dive | AWS re:Invent 2014(SDD416) Amazon EBS Deep Dive | AWS re:Invent 2014
(SDD416) Amazon EBS Deep Dive | AWS re:Invent 2014
 
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan HoracekOpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
OpenNebula Conf 2014 | Lightning talk: OpenNebula at Etnetera by Jan Horacek
 
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
[AWSKRUG&JAWS-UG Meetup #1] 70% Cost Reduction with On-demand resizing
 
Golang @ Tokopedia
Golang @ TokopediaGolang @ Tokopedia
Golang @ Tokopedia
 
Memcache d
Memcache dMemcache d
Memcache d
 
Web Performance Part 3 "Server-side tips"
Web Performance Part 3  "Server-side tips"Web Performance Part 3  "Server-side tips"
Web Performance Part 3 "Server-side tips"
 
Keynote: Scaling Sensu Go
Keynote: Scaling Sensu GoKeynote: Scaling Sensu Go
Keynote: Scaling Sensu Go
 
Open source and cross platform .net
Open source and cross platform .netOpen source and cross platform .net
Open source and cross platform .net
 
Deep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceDeep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS Performance
 

Viewers also liked

Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
Derek Stainer
 

Viewers also liked (11)

Making sense of the Graph Revolution
Making sense of the Graph RevolutionMaking sense of the Graph Revolution
Making sense of the Graph Revolution
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
SQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesSQL vs. NoSQL Databases
SQL vs. NoSQL Databases
 
SQL vs. NoSQL
SQL vs. NoSQLSQL vs. NoSQL
SQL vs. NoSQL
 
Big Data
Big DataBig Data
Big Data
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQL
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Similar to Benchmarking, Load Testing, and Preventing Terrible Disasters

stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
Gaurav "GP" Pal
 

Similar to Benchmarking, Load Testing, and Preventing Terrible Disasters (20)

Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
MongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and QueueingMongoDB Tokyo - Monitoring and Queueing
MongoDB Tokyo - Monitoring and Queueing
 
Let the Tiger Roar!
Let the Tiger Roar!Let the Tiger Roar!
Let the Tiger Roar!
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
 
Is It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB PerformanceIs It Fast? : Measuring MongoDB Performance
Is It Fast? : Measuring MongoDB Performance
 
Beyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage EnginesBeyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage Engines
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community engine
 
Conceptos Avanzados 1: Motores de Almacenamiento
Conceptos Avanzados 1: Motores de AlmacenamientoConceptos Avanzados 1: Motores de Almacenamiento
Conceptos Avanzados 1: Motores de Almacenamiento
 
Understanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and PerformanceUnderstanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and Performance
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
 
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
 
Scaling with mongo db (with notes)
Scaling with mongo db (with notes)Scaling with mongo db (with notes)
Scaling with mongo db (with notes)
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
 
Mongo db - How we use Go and MongoDB by Sam Helman
Mongo db - How we use Go and MongoDB by Sam HelmanMongo db - How we use Go and MongoDB by Sam Helman
Mongo db - How we use Go and MongoDB by Sam Helman
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Benchmarking, Load Testing, and Preventing Terrible Disasters

  • 2. Benchmarking, Load Testing, and Preventing Terrible Disasters
  • 3. What Parse Does We have 500k+ apps running on Parse. Provide services to — •Store user data •Run server side JavaScript •Send push notifications •Handle crash reporting •Generate analytics
  • 4. Parse + MongoDB • Use many of MongoDB’s feature set • Support almost every type of workload you can imagine •Millions of collections and indexes • new ones being created every minute •Run MongoDB exclusively on AWS •We do crazy things with MongoDB
  • 5. Why Should You Listen to Me? • Parse has one of the most complex MongoDB infrastructures(in the world?) • Started using MongoDB in 1.8 • Upgraded 2.6 everywhere 6 months ago • We have some battle wounds from upgrading MongoDB to pass on to you
  • 6. Why Shouldn’t You Listen to Me? MongoDB is a jack of all trades, and there’s certain features that we haven’t touched. •Sharding — We built our own way to shard data •Aggregation/Map Reduce — We don’t touch this at all
  • 7. History of MongoDB Upgrades at Parse 1.8 2.0 2.2 2.4 2.6 3.0 {Doitlive
  • 8. Cowboy Upgrade 1. Review “Upgrade Requirements” and known bugs in JIRA 2. Run intigration/unit tests agains the new version 3. Spin up a hidden secondary. Watch for problems 4. Unhide SECONDARY.. Watch for problems 5. Promote to PRIMARY 6. Declare success! Oh wait I mean watch for problems.
  • 9. What Went Wrong • 60% perf reduction • all geo indexes block global lock until the first document found • unindexable writes suddenly refused • changed the definition of scan limits,
  • 10. A New Approach 1.8 2.0 2.2 2.4 2.6 3.0 { { Doitlive Doitwith production workloads in a test environment
  • 11. Flashback • Open sourced benchmarking tool specifically for MongoDB • Captures production workloads • Replay those workloads over and over again with configurable speeds • Recently merged a pull request to support load testing with Mongo sharing
  • 12. Record Get the config setup: •oplog_server: A secondary that will be used to tail the oplog for write operations •profiler_server: The primary in the target replica set to capture profiling data •duration_sec: Defines how long you want to record
  • 13. Enable Profiling • Keep in mind, it does an additional write for every operation. •./set_mongo_profiling.py -a enable -n $PRIMARY_HOSTNAME
  • 14. Moar Better Recording • What about just capturing it over the wire? • Maybe use mongosniff • MongoDB has a built in pcap library. • Enter mongocaputils • Also open source • Still a little buggy
  • 16. Creating a Consistent Snapshot Need a way to quickly capture a consistent snapshot of your dataset We use EBS snapshots, •locking mongod •creating an EBS snapshot of all the RAIDed volumes on /var/lib/mongodb •unlocking mongod.
  • 17. Quickly Replaying Workloads •Pre-Warming EBS snapshots after each run is slow and time consuming •Pulling down the blocks from S3 takes hours or days if you have terabytes of data. •We decided to use LVM on top of EBS •Does incur I/O overhead •Allows us to do LVM snapshots!
  • 18. How we used LVM Define a restore point before benchmarking •lvcreate -l 10%VG -s -n restore_point /dev/ mongovg/mongoraid Merge Copy-on-Write logical volume to rollback •Stop MongoDB •Unmount Filesystem •lvconvert –merge /dev/mongovg/restore_point
  • 19. Creating the Test Environment • Spin up new EC2 instance and restore the EBS volumes from snapshot •New EBS volumes need to be pre-warmed. Blocks are lazily loaded from S3 • Benchmark server which will run Flashback request and has the workload on disk. •Nothing specials needs to happen here
  • 20. Benchmarking New Shiny Storage Engines In MongoDB 3.0, each storage engine has a different on-disk format So we also need to run an initial sync of each new storage engine against our restored MMAPv1 backup, and then run benchmarks on each format. MMAPv1 (restored from snapshot) RocksDB WiredTiger initial sync initial sync
  • 21. Side Note: The Storage Efficiency of the RocksDB/ WiredTiger is Amazing* *You should totally check out the “Storage Engine Wars” talk by Charity Majors and Igor Canadi 0 1,000 2,000 3,000 4,000 283GB318GB 3,245GB MMAPv1 WiredTiger RocksDB
  • 22. Running the Replay • Two styles to replay: real and stress flashback -ops_filename=OUTPUT -style=real -url=$MONGO_HOST:27017 -workers=50 MongoDB 2.6 MMAPv1 MongoDB 3.0 MMAPv1 MongoDB 3.0 RocksDB Flashback
  • 23. Metrics Gathering • Flashback percentile latencies broken down by operation type. • Useful from a high level • Not so useful when diving into query regressions
  • 24. Logging Pipeline • Mongo logs are hard to parse. • Thankfully you don’t need to worry about it • Just use our open source PEG parser mongologtools • Ship JSON via Scribe to an internal Facebook data diving tool
  • 25. First Results Op 2.6 MMAPv1 3.0 MMAPv1 3.0 RockDB query 2.93ms 4.43ms 3.04ms p50 Query Latency Op 2.6 MMAPv1 3.0 MMAPv1 3.0 RockDB query 177.41ms 619471.47m s 1441442.26 ms p99 Query Latency
  • 26. First Regression •Regression in $nearSphere queries just for 3.0 •SERVER-17469 — patched in 3.0.2 • After the fix average latency for $nearSphere went from •2354 ms to 35 ms
  • 27. More Ad-Hoc Analysis MMAPv1RocksDB # documents scanned durationmsdurationms # documents scanned
  • 28. P99 Latency query insert remove update findandmodify count 0ms 10ms 20ms 30ms 40ms 1 5 1 1 0 2 0 28 1 22 23 2 0 15 11 21 32 8 2.6 MMAPv1 3.0 MMAPv1 3.0 RockDB Some time later…
  • 29.
  • 30. Benchmarks Won’t Find Everything •[RocksDB] Prefix collision could happen between restarts https://github.com/mongodb-partners/mongo/ commit/ da8a90b3b71bf291684ffc5a6d2fd32118ce1a7b •[MongoDB] Secondary reads block replication https://jira.mongodb.org/browse/SERVER-18190
  • 31. Where are we now with testing 3.0? • MongoDB 3.0 with RocksDB is serving some production traffic and it looks amazing. milliseconds API Request
  • 32. Linkage • Flashback • https://github.com/ParsePlatform/flashback • Mongologtools • https://github.com/tmc/mongologtools • MongoDB 3.0 Benchmarking Results • http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing- so-fast-it-makes-your-head-spin/ • nearSphere regression • https://jira.mongodb.org/browse/SERVER-17469 • WT/RocksDB secondary crash • https://jira.mongodb.org/browse/SERVER-17882