This document summarizes a presentation on scaling MongoDB on Amazon Web Services (AWS). The presentation explained why AWS is a good fit for MongoDB: capacity can be scaled rapidly on demand and cost-effectively. It covered planning considerations for MongoDB and AWS topologies, and how to select appropriate EC2 instances and storage options. It then covered deploying MongoDB on AWS with DevOps tools such as CloudFormation, Chef, and Puppet, followed by security, monitoring with MongoDB Monitoring Service and CloudWatch, and snapshot-based backup strategies.
2. Nice to Meet You!
Mike Saffitz
CTO, Co-Founder, Apptentive
Follow at: @msaffitz • Connect at: mike@apptentive.com
Apptentive
The easiest way for anyone with an app to talk with their customers
Follow at: @apptentive • Connect at: info@apptentive.com
7. Why Scale MongoDB on AWS?
• Supports Diverse Set of Scenarios
• Rapidly Scale On Demand
• Simple To Administer
• Easy
• Friendly Query Syntax
• Well Documented
• Flexible
• Broad Language Support
• Competitive TCO
• Cost Effective
• Fine Grain Control Over Price & Performance
8. Why Not Scale MongoDB on AWS?
• Your Data is Predominantly Relational in Nature
– Consider RDS
• Don’t Want to Incur the Administrative Costs
– Hosted Alternatives
– Consider DynamoDB
12. MongoDB Topologies: Single ReplicaSet w/ Arbiter
• Automatic Failover
• mongod (primary)
• mongod (secondary)
– Contains Full Copy of Data on the Primary; Can be Used for Reads
• mongod (arbiter)
– Arbiter Only Participates in Voting to Elect a New Primary (Must Have Odd # of Voting Members)
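The three-member layout above can be written down as a replica set configuration document. The sketch below builds one in Python, assuming hypothetical hostnames and the set name `rs0`. The `_id`, `members`, `host`, and `arbiterOnly` fields follow the document shape accepted by `rs.initiate()`; the helper `build_replset_config` is illustrative only and assumes every member votes.

```python
def build_replset_config(name, data_hosts, arbiter_host=None):
    """Build a replica set config document with an optional arbiter."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # The arbiter votes in elections but holds no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    if len(members) % 2 == 0:
        raise ValueError("use an odd number of voting members to avoid tied elections")
    return {"_id": name, "members": members}


# Hypothetical hostnames, for illustration only.
config = build_replset_config(
    "rs0",
    ["db1.example.internal:27017", "db2.example.internal:27017"],
    arbiter_host="arb1.example.internal:27017",
)
print(config)
```

Dropping the arbiter from this two-host example raises `ValueError`, which mirrors the "must have odd #" rule on the slide.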
13. MongoDB Topologies: Single ReplicaSet
• Automatic Failover
• mongod (primary)
• mongod (secondary)
• mongod (secondary)
• Scale Across Instance Types
• Data Replicated Within ReplicaSet
14. MongoDB Topologies: Sharded Cluster
• App Server + mongos … App Server + mongos
• Config Servers: config, config, config (mongod process)
• Data Partitioned Across Shards
– Shard: mongod (primary), mongod (secondary), mongod (secondary)
– …
– Shard: mongod (primary), mongod (secondary), mongod (secondary)
• Data Replicated Within Shard
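To make "Data Partitioned Across Shards" concrete, here is a much-simplified model of range-based routing, the job `mongos` performs using metadata held by the config servers. The split points, shard names, and the `route` helper are hypothetical; a real cluster manages many chunks per shard and rebalances them automatically.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on a string shard key. Chunk i covers
# [SPLIT_POINTS[i-1], SPLIT_POINTS[i]); the first and last chunks are open-ended.
SPLIT_POINTS = ["g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard0", "shard1"]  # one owner per chunk


def route(shard_key):
    """Return the shard that owns the chunk containing shard_key."""
    return CHUNK_OWNERS[bisect_right(SPLIT_POINTS, shard_key)]


for key in ["alice", "mike", "nancy", "zed"]:
    print(key, "->", route(key))
```

Each shard in this model would itself be a replica set, which is where the "Data Replicated Within Shard" label applies.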
15. MongoDB Topologies: Picking One
• Single Server? Not For Production
• Don’t Shard Prematurely
– ReplicaSets can take you surprisingly far
• … But Don’t Wait Too Long to Shard
– Collections over 256GB may have issues migrating to shards
– Rebalancing consumes IO and can be very slow
• Pick the Right Instance Size for Your Topology…
– We’re going to get to this in a moment
16. AWS Topologies: AZs & Regions
• Obvious: Distribute Across Availability Zones in a Region
– No Single Point of Failure
• Distributing Across Regions
– Shard per Region versus Shards Across Regions
– Considerations
• Replication Latency
• Data Transfer Costs
• Administration Costs
• Speedup from Geo-Based Tag Aware Sharding
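One simple way to avoid a single point of failure within a region is to place replica set members round-robin across Availability Zones, then check whether losing any one zone would cost the set its voting majority. The sketch below assumes example zone names and member labels; `spread_across_zones` and `quorum_critical_zones` are illustrative helpers, not AWS or MongoDB APIs.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(members, zones):
    """Assign members to zones round-robin and return {member: zone}."""
    return dict(zip(members, cycle(zones)))


def quorum_critical_zones(placement):
    """Return zones whose loss would leave fewer than a majority of members."""
    total = len(placement)
    counts = Counter(placement.values())
    # A majority is total // 2 + 1, so losing a zone is fatal to quorum
    # when the members left over number total // 2 or fewer.
    return [zone for zone, n in counts.items() if total - n <= total // 2]


placement = spread_across_zones(
    ["primary", "secondary-1", "secondary-2"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],  # example zone names
)
print(placement)
print("quorum-critical zones:", quorum_critical_zones(placement))
```

With three members in only two zones, the zone holding two members would be reported as quorum-critical, which is the case the slide warns against.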
18. Selecting an Instance: Compute
• Unlikely to Be a Significant Factor
– Exceptions: Heavy use of Map/Reduce, Aggregation Framework
– MongoDB 2.4 added JavaScript concurrency via V8
– Important! Only run 64-bit; 32-bit is limited to ~2GB of data
• Real World Numbers on m1.large:
19. Selecting an Instance: Memory
• Estimate Necessary Working Set
– db.runCommand( { serverStatus: 1, workingSet: 1 } ): Is pagesInMemory * 4k approaching total RAM? Is overSeconds decreasing / small?
– db.stats()
• Pick the Instance that Matches
• Monitor on MMS
– Page Faults (abstract)
– Queues (better)
– Response Times (best)
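The working set check above is simple arithmetic on the `workingSet` section returned by `serverStatus`. The sketch below shows it in Python under stated assumptions: the sample values are made up, the 4 KB page size is the slide's "4k", and `working_set_report` is a hypothetical helper rather than anything shipped with MongoDB.

```python
PAGE_SIZE_BYTES = 4 * 1024  # the slide's "4k" page size


def working_set_report(working_set, total_ram_bytes):
    """Summarize the workingSet section of serverStatus against total RAM."""
    estimated_bytes = working_set["pagesInMemory"] * PAGE_SIZE_BYTES
    return {
        "estimated_bytes": estimated_bytes,
        "fraction_of_ram": estimated_bytes / total_ram_bytes,
        "over_seconds": working_set["overSeconds"],
    }


# Made-up sample values shaped like the workingSet section of serverStatus.
sample = {"pagesInMemory": 1_500_000, "overSeconds": 420}
report = working_set_report(sample, total_ram_bytes=8 * 1024**3)
print(f"{report['fraction_of_ram']:.0%} of RAM, window {report['over_seconds']}s")
```

A fraction close to 1 together with a small or shrinking `overSeconds` suggests the working set no longer fits in memory.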
20. Selecting an Instance: EBS Optimization
• Run EBS Optimized When Available
– Especially with Provisioned IOPS
• Volume Config Impacts IO Perf Far More than Instance Selection
21. Storage
• Instance Storage
– Non-Durable
– Fast But Inconsistent Performance
– Can’t Use Snapshots for Backups
• “Standard” EBS
– Slower
– More Variable Performance
• Provisioned IOPS EBS
– Consistent Performance
– Don’t Under-Provision: Watch Queue Length
22. Storage
• RAID 10? Just use LVM on RAID 0
– More: http://blog.mongohq.com/debunking-myth-of-raid-10-asbest-practice-on-aws/
• Use XFS or Ext4
• Mount with noatime, noexec, nodiratime
23. Selecting an Instance: Summary
1. Lead with Working Set Requirements
2. Validate Compute is Sufficient
3. Enable EBS Optimized if Available
4. Use Provisioned IOPS EBS
5. (Confirm Cost is Acceptable)
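Steps 1, 3, and 5 of this summary can be sketched as a small selection routine: filter by memory first, prefer EBS-optimized options, then compare cost. Everything in the catalog below is a placeholder (names, sizes, and prices are not AWS data), and the 25% headroom is an assumption. Step 2, validating compute, still needs a real load test.

```python
from dataclasses import dataclass


@dataclass
class InstanceOption:
    name: str
    ram_gib: float
    ebs_optimized: bool
    hourly_cost: float


def pick_instance(options, working_set_gib, headroom=1.25):
    """Return the preferred option whose RAM covers the working set plus headroom."""
    needed = working_set_gib * headroom
    fits = [o for o in options if o.ram_gib >= needed]
    if not fits:
        return None  # nothing fits: consider sharding or a larger instance
    # Prefer EBS-optimized options first, then the lowest hourly cost.
    return min(fits, key=lambda o: (not o.ebs_optimized, o.hourly_cost))


# Hypothetical catalog; names, sizes, and prices are placeholders, not AWS data.
CATALOG = [
    InstanceOption("small", 8, False, 0.25),
    InstanceOption("medium", 16, True, 0.50),
    InstanceOption("large", 32, True, 1.00),
]
choice = pick_instance(CATALOG, working_set_gib=10)
print(choice.name if choice else "no fit")
```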
26. Scaling Deployment
• DevOps: Go for the ’bilities:
– Reliability, Predictability, Repeatability, and Auditability
• The Result is Easy Replaceability and Scalability
– Build your infrastructure so it can be treated like an appliance
– The impact of your planning decisions will be significantly mitigated, since they become easy to revisit
27. DevOps Tools
• AWS Marketplace AMIs
– Preconfigured with MongoDB best practices
– Do-it-yourself scaling to ReplicaSets / Shards
– Helpful, but not a DevOps Solution
• AWS CloudFormation
– Templates for Resource Setup & Initial Configuration
• Chef, Puppet, Ansible, SaltStack, & More
– AWS OpsWorks, but limited by chef-solo
28. Security
• Run in a VPC
– Complications: Cross Region, Multiple Source Ingress
• Use KeyFiles & Roles
– KeyFiles: Internal authentication for cluster members
– Roles allow for user-level fine-grained access control
• Advanced:
– Kerberos support in MongoDB 2.4
– SSL Support in Custom Builds & MongoDB Enterprise
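A keyfile for internal cluster authentication is just a shared secret made of base64 characters (6 to 1024 of them) in a file that only its owner can read. The sketch below generates one in Python; the 741-byte length mirrors the commonly used `openssl rand -base64 741` recipe, and the temporary path is a stand-in for wherever the file would really live. Each member would then be started with the same file via the `keyFile` option.

```python
import base64
import os
import tempfile


def generate_keyfile_content(num_random_bytes=741):
    """Return base64 text for a keyfile (741 random bytes encode to 988 characters)."""
    return base64.b64encode(os.urandom(num_random_bytes)).decode("ascii")


def write_keyfile(path, content):
    """Write the keyfile so that only the owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content + "\n")


content = generate_keyfile_content()
# Hypothetical location; a real deployment would use a fixed, protected path.
path = os.path.join(tempfile.mkdtemp(), "mongodb-keyfile")
write_keyfile(path, content)
print(len(content), "characters written to", path)
```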
30. Monitoring: MongoDB Monitoring Service
• Very Good, Free Holistic Monitoring
– Important: ReplLag, Page Faults, Lock %
– Informative: OpCounters, Connections, Queue Lengths
• Includes Basic Alerting of Host Failures and Metric Thresholds
• Query Profiler Details Slow Queries
– db.setProfilingLevel(1)
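MMS computes replication lag for you, but it helps to know what the number means: how far each secondary's last applied operation trails the primary's. The sketch below derives it from a document shaped like a trimmed `rs.status()` result, using its `name`, `stateStr`, and `optimeDate` fields. The sample data and the `replication_lag_seconds` helper are made up for illustration.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Map each SECONDARY's name to how far its optimeDate trails the primary's."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Made-up sample shaped like a trimmed rs.status() result.
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2013, 1, 1, 12, 0, 30)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2013, 1, 1, 12, 0, 28)},
        {"name": "arb1:27017", "stateStr": "ARBITER"},
    ]
}
print(replication_lag_seconds(sample_status))
```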
31. Monitoring: Amazon CloudWatch
• Detailed Resource Level Monitoring
– Important: Queue Length, Read/Write Latencies
• Versatile alerting based on Amazon Simple Notification Service (SNS)
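As a rough illustration of alerting on queue length, the sketch below builds the parameters for a CloudWatch alarm on the `VolumeQueueLength` metric of an EBS volume, with an SNS topic as the alarm action. The volume ID, topic ARN, threshold, and periods are placeholders to be tuned, and `queue_length_alarm` is a hypothetical helper. The code only builds a dictionary and makes no AWS call.

```python
def queue_length_alarm(volume_id, sns_topic_arn, threshold=10.0):
    """Build parameters for a CloudWatch alarm on an EBS volume's queue length."""
    return {
        "AlarmName": f"ebs-queue-length-{volume_id}",
        "Namespace": "AWS/EBS",
        "MetricName": "VolumeQueueLength",
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint
        "EvaluationPeriods": 3,   # consecutive breaching periods before alarming
        "Threshold": threshold,   # placeholder; tune to the volume's workload
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder identifiers, for illustration only.
params = queue_length_alarm(
    "vol-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:db-alerts",
)
# Assuming boto3 is installed and credentials are configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"])
```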
32. Backups
• Delayed Secondary
– Questionable as a primary backup strategy
• Dump/Restore
– Impractical for larger deployments
• MongoDB Service
– Managed, Secure, Point in Time. Unclear suitability for larger deployments
– Expensive
• Snapshots
– Fast, Easy, Scalable. Pay Attention to Consistency (RAID, Shards)
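The consistency caveat for RAID sets comes down to ordering: flush and block writes, snapshot every volume in the set, then unblock writes even if a snapshot fails. The sketch below shows that ordering with stand-in callables. In a real deployment `lock` and `unlock` might wrap `db.fsyncLock()` and `db.fsyncUnlock()` on the member being backed up, and `create_snapshot` an EC2 snapshot call; those wirings are assumptions, and a sharded cluster would also need the balancer stopped first.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(lock, unlock):
    """Hold a write lock for the duration of the block and always release it."""
    lock()
    try:
        yield
    finally:
        unlock()


def snapshot_volumes(volume_ids, create_snapshot, lock, unlock):
    """Snapshot every volume of a RAID set while writes are blocked."""
    with writes_locked(lock, unlock):
        return [create_snapshot(v) for v in volume_ids]


# Stand-in callables that only record the order of operations.
events = []
snapshots = snapshot_volumes(
    ["vol-a", "vol-b"],
    create_snapshot=lambda v: events.append(f"snapshot {v}") or f"snap-of-{v}",
    lock=lambda: events.append("lock"),
    unlock=lambda: events.append("unlock"),
)
print(events)
```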
33. Easy Snapshot-Based Backups With Mongolly
• Automatic topology detection, snapshotting, and snapshot management for EBS-backed MongoDB Databases
• Easy as: $ mongolly backup
• https://github.com/msaffitz/mongolly
34. Conclusions
• MongoDB + AWS =
• Options For All Deployment / Workload Sizes
– I/O typically the focal point for optimization
• Investing in a DevOps Strategy + Solution Makes It Near Effortless
35. Please give us your feedback on this presentation: DAT209