Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, a replica set, or tens of sharded clusters, you probably face the same challenges in sizing that deployment.
This webinar covers the resources MongoDB uses and how to plan for them in your deployment, including how to model and plan capacity needs for new and growing deployments. The goal is to give you the tools you need to manage your MongoDB capacity planning tasks successfully.
7. Preparing for Launch
• Developers are about to finish the final sprint
• Code is good (so they say)
• You're feeling comfortable enough to launch soon
• How to deploy?
8. Requirements
• Availability
– Uptime requirements: RPO and RTO
• Throughput
– Average reads/writes/users
– Peak throughput
– Operations per second? Per day? Per month?
• Responsiveness
– What's the acceptable latency?
• Higher during peak times?
RPO = recovery point objective; RTO = recovery time objective
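The throughput questions above often start from a daily operation count. A minimal sketch of converting that into average and peak ops/sec; the 3x peak factor is an assumption for illustration, so measure your own traffic shape to pick a real one:

```python
# Rough throughput sizing: convert a daily operation count into an
# average and an estimated peak ops/sec. The peak factor is an
# assumption, not a measured value.

SECONDS_PER_DAY = 86_400

def ops_per_second(ops_per_day: float, peak_factor: float = 3.0):
    """Return (average, estimated peak) operations per second."""
    avg = ops_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

# e.g. 100M operations/day
avg, peak = ops_per_second(100_000_000)
print(f"avg {avg:.0f} ops/sec, peak ~{peak:.0f} ops/sec")
```

Knowing both the average and the estimated peak matters because hardware must be sized for peak time, not the daily average.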
14. Why?
• Once we launch, we don't want avoidable downtime due to poorly selected hardware
• As our success grows we want to stay in front of the
demand curve
• We want to meet business and user expectations
• We want to keep our jobs!
• Don't be the "goat"
30. Storage Capability
Type IOPS
7200 rpm SATA ~ 75 – 100
15000 rpm SAS ~ 175 – 210
SSD Intel X25-E (SLC) ~ 5000
SSD Intel X25-M G2 (MLC) ~ 8000
Amazon EBS ~ 100
Amazon EBS Provisioned Up to ~10,000
Amazon EBS Provisioned IOPS (SSD) Up to ~20,000
FusionIO ~135,000
Violin Memory 6000 ~ 1,000,000
http://en.wikipedia.org/wiki/IOPS
Higher the IOPS, higher the cost!
31. Storage Considerations
• Work out how much data you need to write per unit of
time!
• Databases will use storage to persist data
– More data = Bigger indexes = More Storage
• MongoDB Stores Information into Documents
• BSON Format
– http://bsonspec.org/
32. Memory
• Working Set
– Active Data in Memory
– Measured over a period of time
• And other operations
– Sorting
– Aggregation
– Connections
• WiredTiger Storage Engine Cache
35. Basic Rules
• Determine data size, working set, query throughput
requirements
• Use good measuring and monitoring practices
• Plan ahead but be flexible!
• Iterate
– Review Requirements
– Review Capacity
37. Disk Space
• Generate a sample document set
– Write some code?
• Use db.stats() to measure
– Disk space
– Compression
• Do the math
– Estimate production disk requirements
• Sum of disk space across shards must be greater than the required storage size
38. Disk Space: How Many Shards Do I Need?
• Sum of disk space across shards must be greater than the required storage size
Example
Data size = 9 TB
WiredTiger compression ratio: 0.33
Storage size = 3 TB
Server disk capacity = 2 TB
2 Shards Required
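The disk-sizing arithmetic above can be sketched in a few lines. The compression ratio and per-server capacity are inputs you must measure for your own data set; the values below are just the slide's example:

```python
import math

# Disk sizing sketch: how many shards are needed so that total disk
# across shards covers the compressed storage size.

def shards_for_disk(data_size_tb: float, compression_ratio: float,
                    server_capacity_tb: float) -> int:
    """Shards needed to hold the compressed data on servers of the given capacity."""
    storage_tb = data_size_tb * compression_ratio
    return math.ceil(storage_tb / server_capacity_tb)

# 9 TB of data, ~0.33 compression ratio -> ~3 TB storage; 2 TB servers
print(shards_for_disk(9, 0.33, 2))  # -> 2 shards
```

The ceiling is what matters: a fractional result always rounds up to a whole extra shard.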
39. RAM Requirements
• Working set < RAM
• Working set = indexes plus the set of documents accessed frequently
• Working set in RAM
– Shorter latency
– Higher Throughput
40. Estimating Working Set
• Using your sample data set
– Create required indexes based upon queries
– Use db.coll.stats() to get index size
• Do the math to get production index size
• Estimate the working set
– Given the queries
– What are the frequently accessed docs?
– Examples:
• Last x days of data
• Most queried devices
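The estimation steps above can be sketched as simple arithmetic: working set is roughly total index size plus the frequently accessed documents (e.g. the last x days of data). All inputs below are illustrative; in practice you would take index sizes from db.coll.stats() and scale your sample to production:

```python
# Working-set estimate sketch: indexes + hot documents.
# All numbers are illustrative assumptions, not measurements.

def working_set_gb(index_size_gb: float, avg_doc_kb: float,
                   hot_docs: int) -> float:
    """Estimate working set: total index size plus frequently accessed docs."""
    hot_data_gb = hot_docs * avg_doc_kb / (1024 * 1024)  # KB -> GB
    return index_size_gb + hot_data_gb

# e.g. 120 GB of indexes, 2 KB average document, last 7 days = 150M hot docs
print(f"{working_set_gb(120, 2, 150_000_000):.0f} GB")
```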
41. RAM: How Many Shards Do I Need?
Example
Working Set = 428 GB
Server RAM = 128 GB
428/128 = 3.34
4 Shards Required
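The same example as a sketch: enough shards so that each server's RAM holds its slice of the working set.

```python
import math

# RAM sizing sketch reproducing the slide's example.

def shards_for_ram(working_set_gb: float, server_ram_gb: float) -> int:
    """Shards needed so each server's RAM can hold its share of the working set."""
    return math.ceil(working_set_gb / server_ram_gb)

print(shards_for_ram(428, 128))  # 428/128 = 3.34 -> 4 shards
```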
42. Query Rate
• Measure max sustained query rate of a single server (with replication)
– Use prototype/development version
– Use application queries and data
– Measure max sustained performance
• Assume sharding overhead of 20-30%
43. Query Rate: How Many Shards Do I Need?
• Measure max sustained query rate of a single server (with replication)
– Build a prototype and measure
• Assume sharding overhead of 20-30%
Example
Required: 50K ops/sec
Prototype performance: 20K ops/sec (1 replica set)
4 Shards Required: 80K ops/sec * 0.7 = 56K ops/sec
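The throughput math above, as a sketch: per-shard capacity is discounted by the assumed sharding overhead before dividing the required rate.

```python
import math

# Throughput sizing sketch: discount each shard's measured capacity by
# the assumed 20-30% sharding overhead, then divide the required rate.

def shards_for_throughput(required_ops: float, per_shard_ops: float,
                          overhead: float = 0.3) -> int:
    """Shards needed to sustain required_ops given per-shard capacity and overhead."""
    effective = per_shard_ops * (1 - overhead)
    return math.ceil(required_ops / effective)

# 50K ops/sec required, 20K ops/sec measured per replica set, 30% overhead
print(shards_for_throughput(50_000, 20_000))  # -> 4 shards
```

Four shards at 20K ops/sec each give a nominal 80K ops/sec; after the 30% overhead that is 56K ops/sec, which clears the 50K requirement.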
44. Measuring & Monitoring
• What to measure
– IOPS
– Page Faults
– Resident Memory (Working Set)
– Connections
– Lock %
• How to measure and monitor: Ops Manager / Cloud Manager
• Command Line Tools
– iostat
– vmstat
– mongostat
45. The Best Way to Run MongoDB
Ops Manager allows you to leverage and automate the best
practices we’ve learned from thousands of deployments in
a comprehensive application that helps you run MongoDB
safely and reliably.
Benefits include:
10x-20x more efficient operations
Complete performance visibility
Protection from data loss
Assisted performance optimization
47. Monitoring and Alerting
100+ database metrics
Dozens of optimized charts
Custom alerts so incidents don’t
become emergencies
48. APM Integration
Monitor MongoDB alongside the rest of
your app infrastructure using our
RESTful API
Leverage packaged integrations with
leading APM platforms
52. Database Automation
Automate tasks that you would have
otherwise performed manually, such
as…
• Deploying a new cluster
• Upgrades
• Adding capacity
• Database restores
53. Backup with Point-in-time Recovery
Restore to precisely the moment you
need, quickly and safely.
Ops Manager is the only MongoDB backup solution that
offers point-in-time backups of replica sets and
cluster-wide snapshots of sharded clusters.
55. Capacity Planning is …
• Needed
– Involves resource allocation
– Hardware specification and sizing
– Cost!
• Vital
– Translating requirements and expectations into experience and functionality
• And meeting them
• Requires understanding your application
– Measuring resource needs
– Monitoring
– Iterating
– Repeating process
56. For More Information
Resource Location
Case Studies mongodb.com/customers
Presentations mongodb.com/presentations
Free Online Training education.mongodb.com
Webinars and Events mongodb.com/events
Documentation docs.mongodb.org
MongoDB Downloads mongodb.com/download
Additional Info info@mongodb.com
The fine art of translating the requirements of an application into the resources needed to support the application that implements those requirements.
Understanding the requirements and resources implication
Collection scans
These numbers are already old and not relevant anymore.
MongoDB Ops Manager can do a lot for [ops teams].
Best Practices, Automated. Ops Manager takes best practices for running MongoDB and automates them. So you run ops the way MongoDB engineers would do it. This not only makes it more fool-proof, but it also helps you…
Cut Management Overhead. No custom scripting or special setup needed. You can spend less time running and managing manual tasks because Ops Manager takes care of a lot of the work for you, letting you focus on other tasks.
Meet SLAs. Automating critical management tasks makes it easier to meet uptime SLAs. This includes managing failover as well as doing rolling upgrades with no downtime.
Scale Easily. Provision new nodes and systems with a single click.
Ops Manager agents are installed on servers (where MongoDB will be deployed), either through configuration tools such as Chef or Puppet, or by an administrator.
The administrator creates a new design goal for the system, either as a modification to an existing deployment (e.g., upgrade, oplog resize, new shard), or as a new system.
The agents periodically check in with the Ops Manager central server and receive the new design instructions.
Agents create and follow a plan for implementing the design. Using a sophisticated rules engine, agents continuously adjust their individual plans as conditions change. In the face of many failure scenarios – such as server failures and network partitions – agents will revise their plans to reach a safe state.
Minutes later, the system is deployed – safely and reliably.
Ops Manager monitors 100+ metrics that could impact the performance of your database
Track key performance indicators across dozens of optimized charts
Build customized alerts that trigger when metrics are out of range; have them delivered how you want
RESTful API to monitor MongoDB alongside the rest of your application infrastructure, from a single “pane of glass”
Packaged integrations with leading APM platforms such as New Relic now available with MongoDB Cloud Manager
Quickly identify your slow-running queries.
Part of MongoDB Ops Manager, the Visual Query Profiler displays how query and write latency vary over time
With the click of a button, the Visual Query Profiler consolidates and displays metrics from all your nodes on a single screen
The Visual Query Profiler analyzes the data it displays and presents recommendations for new indexes that can be created to improve the performance of your deployment.
The best practice for adding new indexes to your deployment is a rolling index build – starting with each of the secondaries and finally applying changes to the original primary, after swapping its role with one of the secondaries.
Ops Manager can automate this process across your replica sets, reducing your operational overhead and the risk of failovers caused by incorrectly sequencing management processes.
Ops Manager coordinates and orchestrates critical operational tasks across the servers in a MongoDB system. Tasks that you would have otherwise performed manually, such as…
Deploying a new cluster
Upgrades
Adding capacity
Database restores
Use the Ops Manager UI directly, or invoke the Ops Manager RESTful API from your existing enterprise orchestration frameworks, such as Chef, Puppet, etc.
Restore to precisely the moment you need, quickly and safely.
Ops Manager is the only MongoDB backup solution that offers point-in-time backups of replica sets and cluster-wide snapshots of sharded clusters.
Integrates with standard network-mountable file systems
Backups can be configured against specific collections, rather than the entire database