3. About Me
• DBA at AOL (Dulles) for six years
• Background in Sybase
• Now MySQL, PostgreSQL and NoSQL
• Was: Blogsmith, Uncut Video, Travel, Autos,
Journals, Real Estate, Ficlets, Shopping
• Currently: Patch, MapQuest, HSS,
Datalayer, Demand
• I Heart Big Data
4. About MongoDB
• “Scalable, high-performance, open source,
document-oriented database”
• Databases (Databases)
• Collections (Tables)
• Documents (Rows)
• Fields (Columns) - K/V Pairs
• Indexes
• No Joins
• Favors Embedding Data instead of FKs
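To make the "embedding instead of FKs" point concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for collections. The post/comment data and the two loader functions are made up for illustration; no MongoDB driver is involved.

```python
# Embedded form: one document holds the post and its comments.
posts_embedded = {
    1: {
        "title": "Hello",
        "comments": [
            {"by": "ann", "text": "Nice"},
            {"by": "bob", "text": "+1"},
        ],
    },
}

# Referenced form: comments live in their own collection and point
# back to the post through a post_id field (a foreign-key style link).
posts = {1: {"title": "Hello"}}
comments = [
    {"post_id": 1, "by": "ann", "text": "Nice"},
    {"post_id": 1, "by": "bob", "text": "+1"},
    {"post_id": 2, "by": "cat", "text": "Other"},
]


def load_embedded(post_id):
    """One lookup returns the post together with its comments."""
    return posts_embedded[post_id]


def load_referenced(post_id):
    """Two steps: fetch the post, then collect its comments.

    With no joins, this stitching happens in application code.
    """
    post = dict(posts[post_id])
    post["comments"] = [
        {"by": c["by"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post_id
    ]
    return post
```

Both loaders return the same structure; the embedded form gets there with a single read.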
5. MongoDB Support
• Operating Systems
• Linux, Windows, Mac OS X, Solaris
• 32bit, 64bit
• Drivers
• Java (MapQuest), JavaScript, Perl,
Ruby (Patch), Scala, Erlang, C,
C# (Editions), C++, Haskell, PHP, Python
• R, Smalltalk, node.js, ColdFusion
6. MongoDB Use Cases
• Website Data Store
• Caching Tier
• Document and Content Mgmt Systems
• Event Logging
• Real-time Stats/Analytics
• Archiving
• High Volume Problems
7. MongoDB Misuse
• Complex Transactional Systems
• Traditional Business Intelligence
• Small Data and/or Small Traffic
• Should NOT be our default datastore
• Use MySQL
• or Use PostgreSQL, Couchbase, Redis,
Riak, Hive/HBase, Vertica, Netezza, TBD,
etc.
8. Best Practices
• Slaves are a MUST pre-1.8 (no journaling)
• Use 64 bit version
• 32 bit version has 2.5 GB storage limit
• Use xfs or ext4
• Keep eye on oplog size
• Turn off atime & diratime (mount with noatime, nodiratime)
• Consider using getLastError()
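As a rough illustration of the getLastError() bullet, the sketch below only builds the command document a driver would send after a write. The helper name and its parameters are hypothetical; `w`, `wtimeout` and `fsync` are options of the `getlasterror` command. Sending the command through a driver is deliberately left out.

```python
def get_last_error_command(w=1, wtimeout_ms=None, fsync=False):
    """Build a getlasterror command document (hypothetical helper).

    The command name goes first; optional fields are added only
    when they differ from the defaults.
    """
    cmd = {"getlasterror": 1}
    if w > 1:
        cmd["w"] = w  # wait until the write reaches w servers
    if wtimeout_ms is not None:
        cmd["wtimeout"] = wtimeout_ms  # stop waiting after this many ms
    if fsync:
        cmd["fsync"] = True  # ask the server to flush to disk first
    return cmd


# Example: wait for two servers, but no longer than five seconds.
replicated_write_check = get_last_error_command(w=2, wtimeout_ms=5000)
```

Calling it after important writes trades some latency for confirmation that the write actually landed.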
9. More Best Practices
• Increase File Descriptor Limits
• Do not use kill -9 (pre-1.8 or non-journaled)
• At least 3 node replica sets
• db.runCommand("logRotate") (run against the admin database)
• Keep db.<collection>.totalIndexSize() less
than RAM
• Linux: lower dirty_background_ratio (10% -> 5%)
and dirty_ratio (40% -> 10%) on kernels before 2.6.22
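The "index size less than RAM" rule can be checked with simple arithmetic once the per-collection values from db.<collection>.totalIndexSize() are collected. The sketch below assumes they have already been gathered into a dict; the collection names, sizes and the 80% headroom factor are illustrative assumptions, not recommendations from the slides.

```python
GIB = 1024 ** 3


def indexes_fit_in_ram(index_sizes_bytes, ram_bytes, headroom=0.8):
    """Return (total_index_bytes, fits).

    `headroom` is an assumed safety factor: only that fraction of RAM
    is treated as available for indexes, leaving room for the working
    set of documents and for the OS.
    """
    total = sum(index_sizes_bytes.values())
    return total, total <= ram_bytes * headroom


# Hypothetical numbers, as if read from totalIndexSize() per collection.
sizes = {"users": 6 * GIB, "events": 9 * GIB}
total, fits = indexes_fit_in_ram(sizes, ram_bytes=16 * GIB)
```

With these numbers the indexes total 15 GiB, which does not fit in 16 GiB of RAM under the assumed headroom.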
10. Even More
• Use --rest (HTTP interface listens on port + 1000)
• Write To a Log
• Take Advantage of 10gen’s MMS
• Use the Shortest Readable Field Names
Possible
• drop() is Much Faster Than remove()
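Short field names matter because every document stores its own field names, so each extra byte in a name is paid once per document. A back-of-the-envelope sketch of the savings follows; the field names and document count are hypothetical.

```python
def bytes_saved(rename_map, doc_count):
    """Estimate storage saved by renaming fields in every document.

    rename_map maps each long field name to its shorter replacement.
    Only the field-name bytes are counted; values are unchanged.
    """
    per_doc = sum(
        len(long_name.encode("utf-8")) - len(short_name.encode("utf-8"))
        for long_name, short_name in rename_map.items()
    )
    return per_doc * doc_count


# Hypothetical renames applied across 100 million documents.
renames = {"created_timestamp": "ts", "user_identifier": "uid"}
saved = bytes_saved(renames, doc_count=100_000_000)
```

Here the two renames save 27 bytes per document, or about 2.7 GB across the collection, before counting any index or cache effects.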
11. Lessons Learned
• Be Careful About Updates
• Choose Shard Key Carefully
• Turn Off Balancer During Peak Periods
• Use Explain
• Aggressively Upgrade Within Major
Versions
• Choose Embed vs Top-Level Collections
Carefully
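One reason to choose the shard key carefully is that range-based chunks turn a steadily increasing key (such as a timestamp) into a single hot chunk for inserts. The small simulation below shows the effect; the split points and key values are invented, and the real balancer and chunk splitting are not modeled.

```python
from bisect import bisect_right
from collections import Counter


def chunk_for(key, split_points):
    """Index of the range chunk owning `key`.

    Chunk 0 holds keys below split_points[0], chunk 1 holds keys from
    split_points[0] up to (not including) split_points[1], and so on.
    """
    return bisect_right(split_points, key)


def write_distribution(keys, split_points):
    """Count how many writes land on each chunk."""
    return Counter(chunk_for(k, split_points) for k in keys)


# Timestamp-like key: every new value is larger than all existing data,
# so all 100 inserts hit the last chunk.
monotonic = write_distribution(range(4000, 4100), [1000, 2000, 3000])

# User-id-like key: new writes are scattered over the existing key range.
user_ids = [(i * 37) % 1000 for i in range(100)]
spread = write_distribution(user_ids, [250, 500, 750])
```

In the first case a single chunk absorbs every write, while the second case touches all four chunks.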
12. Top-Level Collections
• Don’t Belong Conceptually To Another
Collection
• Building Blocks
• Easily Referenceable and Updatable
13. Embedding Pros
• Fast Retrieval of Document With Related
Data
• Atomic Updates
• Ownership is obvious
• Maps Better With Structure of Code
14. Embedding Cons
• Harder To Query/Reference
• Harder To Do Mass Queries
• 16MB Limit Per Document
• Err On Side Of Embedding
• Note: Concepts Here Borrowed From
Fantastic Preso By Ian White of Sailthru
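The 16MB limit gives a quick way to sanity-check an embedding decision: estimate how many embedded items fit before the parent document hits the cap. The sketch below is an upper-bound estimate only; the byte sizes are assumptions, and `avg_item_bytes` should include whatever per-element overhead applies.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB per-document limit


def max_embedded_items(base_doc_bytes, avg_item_bytes):
    """Rough upper bound on embedded items that fit in one document."""
    if avg_item_bytes <= 0:
        raise ValueError("avg_item_bytes must be positive")
    return max(0, (MAX_DOCUMENT_BYTES - base_doc_bytes) // avg_item_bytes)


# Hypothetical: a 1 KB parent document with 512-byte embedded comments.
capacity = max_embedded_items(base_doc_bytes=1024, avg_item_bytes=512)
```

If the expected number of embedded items is anywhere near this bound, a top-level collection is probably the safer choice.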
17. Even More Resources
• Follow @MongoQuestion (StackOverflow)
• MongoDB on Quora (@q_mongodb)
• Books
• Training
• Office Hours in NYC and Silicon Valley
• 10gen Support (Email Me To Be Added)
• DC MongoDB Users Group (@MongoDC)
18. New MongoDB Release
• 2.0 (Released Last Week)
• Journaling is Default
• Per Collection/Index Compact Command
• Concurrency Improvements
• Reduced Default Stack Size
• Index Performance Enhancements
19. More 2.0 Features
• Map Reduce Performance Improvements
• Replica Set Improvements
• Priorities
• Data-center Awareness
• Release Notes
• 2.0 Features Presentation
20. Future Releases
• 2.2 (End of 2011?)
• New Aggregation Framework
• More Concurrency Improvements
• Better Freelist Management
• Beyond
• Full-Text Search
• Auto Compaction/Defrag