"Have you ever crossed your fingers before performing an upgrade or switching storage engines, because you weren't quite sure what would happen? Have you ever been bitten by a slight change in behavior that turned out to be unexpectedly significant for your workload? At Parse we have developed a workflow that lets us repeatedly capture and replay real production workloads offline. This has allowed us to confidently perform upgrades across a large fleet with a minimum amount of canarying, and has helped us load test a variety of storage engines with real workloads so we can compare and understand the performance tradeoffs.
In this talk we will cover best practices for upgrades and migrations, and we will walk through how to use our open-sourced tooling to demonstrate how you can do the same. We will also share some fun war stories about various disasters found and averted *before* putting them into production thanks to offline benchmarking."
3. What Parse Does
We have 500k+ apps running on Parse.
Provide services to:
• Store user data
• Run server-side JavaScript
• Send push notifications
• Handle crash reporting
• Generate analytics
4. Parse + MongoDB
• Use many of MongoDB’s features
• Support almost every type of workload you can imagine
• Millions of collections and indexes, with new ones being created every minute
• Run MongoDB exclusively on AWS
• We do crazy things with MongoDB
5. Why Should You Listen to Me?
• Parse has one of the most complex MongoDB infrastructures (in the world?)
• Started using MongoDB at 1.8
• Upgraded to 2.6 everywhere 6 months ago
• We have some battle wounds from upgrading MongoDB to pass on to you
6. Why Shouldn’t You Listen to Me?
MongoDB is a jack of all trades, and there are certain features we haven’t touched.
• Sharding — we built our own way to shard data
• Aggregation/Map Reduce — we don’t touch this at all
8. Cowboy Upgrade
1. Review “Upgrade Requirements” and known bugs in JIRA
2. Run integration/unit tests against the new version
3. Spin up a hidden secondary. Watch for problems.
4. Unhide the SECONDARY. Watch for problems.
5. Promote to PRIMARY
6. Declare success! Oh wait, I mean watch for problems.
9. What Went Wrong
• 60% perf reduction
• All geo index queries held the global lock until the first document was found
• Unindexable writes were suddenly refused
• Changed the definition of scan limits
10. A New Approach
[Slide graphic: upgrade timeline 1.8 → 2.0 → 2.2 → 2.4 → 2.6 → 3.0; the earlier upgrades were done live, the new approach is to do it with production workloads in a test environment]
11. Flashback
• Open-sourced benchmarking tool built specifically for MongoDB
• Captures production workloads
• Replays those workloads over and over again at configurable speeds
• Recently merged a pull request to support load testing with Mongo sharding
12. Record
Get the config set up:
• oplog_server: A secondary that will be used to tail the oplog for write operations
• profiler_server: The primary in the target replica set to capture profiling data
• duration_sec: Defines how long you want to record
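A minimal sketch of what the config might look like, using exactly the keys above; the hostnames are placeholders and the precise file format is an assumption, so check the Flashback repo for the real template:

oplog_server    = "mongo-secondary.example.com:27017"   # assumption: host:port of the oplog-tailing secondary
profiler_server = "mongo-primary.example.com:27017"     # assumption: host:port of the profiled primary
duration_sec    = 3600                                  # record for one hour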
13. Enable Profiling
• Keep in mind, profiling does an additional write for every operation.
• ./set_mongo_profiling.py -a enable -n $PRIMARY_HOSTNAME
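The helper script presumably just sets the profiling level to 2 (profile every operation). A minimal sketch of doing the same by hand with the mongo shell, keeping in mind that profiling is enabled per database:

mongo --host $PRIMARY_HOSTNAME $DB_NAME --eval 'db.setProfilingLevel(2)'  # 2 = profile all operations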
14. Moar Better Recording
• What about just capturing it over the wire?
• Maybe use mongosniff
• MongoDB has a built-in pcap library.
• Enter mongocaputils
• Also open source
• Still a little buggy
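The capture side is plain tcpdump; a minimal sketch, assuming mongod listens on the default port and eth0 is the interface carrying client traffic:

tcpdump -i eth0 -w /tmp/mongo.pcap 'port 27017'   # write raw MongoDB wire traffic to a pcap file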
16. Creating a Consistent Snapshot
Need a way to quickly capture a consistent snapshot of your dataset.
We use EBS snapshots, by:
• Locking mongod
• Creating an EBS snapshot of all the RAIDed volumes on /var/lib/mongodb
• Unlocking mongod
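A minimal sketch of that lock/snapshot/unlock dance, assuming the AWS CLI is configured and using placeholder volume IDs:

mongo --host $SNAPSHOT_HOSTNAME --eval 'db.fsyncLock()'    # flush pending writes and block new ones
for vol in vol-11111111 vol-22222222; do                   # each EBS member of the RAID under /var/lib/mongodb
  aws ec2 create-snapshot --volume-id "$vol" --description "mongodb raid member"
done
mongo --host $SNAPSHOT_HOSTNAME --eval 'db.fsyncUnlock()'  # resume writes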
17. Quickly Replaying Workloads
• Pre-warming EBS snapshots after each run is slow and time-consuming
• Pulling down the blocks from S3 takes hours or days if you have terabytes of data
• We decided to use LVM on top of EBS
• Does incur I/O overhead
• Allows us to do LVM snapshots!
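A minimal sketch of that layering; the device names are placeholders, while the volume group and logical volume names match the ones used on the next slide:

pvcreate /dev/md0                       # the RAIDed EBS volumes
vgcreate mongovg /dev/md0
lvcreate -l 90%VG -n mongoraid mongovg  # leave free extents for snapshot copy-on-write space
mkfs.xfs /dev/mongovg/mongoraid         # filesystem choice is an assumption
mount /dev/mongovg/mongoraid /var/lib/mongodb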
18. How We Used LVM
Define a restore point before benchmarking:
• lvcreate -l 10%VG -s -n restore_point /dev/mongovg/mongoraid
Merge the copy-on-write logical volume to roll back:
• Stop MongoDB
• Unmount the filesystem
• lvconvert --merge /dev/mongovg/restore_point
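Putting the whole rollback together, a sketch (the service name and mount point are assumptions):

service mongod stop
umount /var/lib/mongodb
lvconvert --merge /dev/mongovg/restore_point                  # fold the CoW snapshot back into the origin volume
mount /dev/mongovg/mongoraid /var/lib/mongodb
service mongod start
lvcreate -l 10%VG -s -n restore_point /dev/mongovg/mongoraid  # fresh restore point for the next run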
19. Creating the Test Environment
• Spin up a new EC2 instance and restore the EBS volumes from snapshot
• New EBS volumes need to be pre-warmed; blocks are lazily loaded from S3
• A benchmark server runs the Flashback replay and holds the captured workload on disk
• Nothing special needs to happen here
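Pre-warming is just reading every block once so they get faulted in from S3; a minimal sketch with a placeholder device name:

dd if=/dev/xvdf of=/dev/null bs=1M    # sequentially touch every block on the restored volume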
20. Benchmarking New Shiny Storage Engines
In MongoDB 3.0, each storage engine has a different on-disk format.
So we also need to run an initial sync for each new storage engine against our restored MMAPv1 backup, and then run benchmarks on each format, as sketched below.
[Diagram: an MMAPv1 node (restored from snapshot) initial-syncs to a RocksDB node and a WiredTiger node]
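A minimal sketch of kicking off one of those initial syncs, assuming 3.0 binaries (and, for the RocksDB case, a mongod built with the RocksDB engine) plus placeholder hostnames:

# start an empty node with the target storage engine
mongod --replSet benchset --storageEngine wiredTiger --dbpath /var/lib/mongodb-wt \
    --fork --logpath /var/log/mongod-wt.log
# add it to the replica set so it initial-syncs from the restored MMAPv1 member
mongo --host $PRIMARY_HOSTNAME --eval 'rs.add("wt-node.example.com:27017")'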
21. Side Note: The Storage Efficiency of RocksDB/WiredTiger is Amazing*
*You should totally check out the “Storage Engine Wars” talk by Charity Majors and Igor Canadi
[Bar chart of on-disk size: MMAPv1 at 3,245 GB, versus 283 GB and 318 GB for WiredTiger and RocksDB]
22. Running the Replay
• Two styles to replay: real and stress
flashback -ops_filename=OUTPUT -style=real -url=$MONGO_HOST:27017 -workers=50
[Diagram: Flashback replays the same workload against MongoDB 2.6 MMAPv1, MongoDB 3.0 MMAPv1, and MongoDB 3.0 RocksDB]
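The real style presumably preserves the recorded pacing, while stress pushes operations as fast as the workers allow; the only change from the command above is the flag:

flashback -ops_filename=OUTPUT -style=stress -url=$MONGO_HOST:27017 -workers=50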
23. Metrics Gathering
• Flashback reports percentile latencies broken down by operation type.
• Useful from a high level
• Not so useful when diving into query regressions
24. Logging Pipeline
• Mongo logs are hard to parse.
• Thankfully you don’t need to worry about it
• Just use our open-source PEG parser, mongologtools
• Ship JSON via Scribe to an internal Facebook data-diving tool
26. First Regression
• Regression in $nearSphere queries, just for 3.0
• SERVER-17469 — patched in 3.0.2
• After the fix, average latency for $nearSphere went from 2354 ms to 35 ms