SlideShare a Scribd company logo
1 of 36
Scaling with MongoDBJared Rosoff (jsr@10gen.com) - @forjared
How do we do it today?  We use a relational database but …  We don’t use joins We don’t use transactions We add read-only slaves We added a caching layer We de-normalized our data We implemented custom sharding We buy bigger servers
How’s that working out for you?
Costs go up
Productivity goes down
By engineers, for engineers
The landscape Memcached Key / Value Scalability & Performance RDBMS Depth of functionality
Scaling your app Use documents  Indexes make me happy Knowing your working set Disks are the bottleneck Replication makes reading fun Sharding for profit
Scaling your data model
Documents {   author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",  text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [       { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Best Movie Ever”       }    ] }
Disk Seeks & Data Locality Read = really really fast Seek = 5+ ms
Disk Seeks & Data Locality Post Comment Author
Disk Seeks & Data Locality Post Author Comment Comment Comment Comment Comment
Optimized indexes
Table scans Find where x equals 7 1 2 3 4 5 6 7 Looked at 7 objects
Tree Lookup Find where x equals 7 4 6 2 7 5 3 1 Looked at 3 objects
Random Index Entire index must fit in RAM
Right Aligned Only small portion in RAM
Working set size
Working Set Active Documents + Used Indexes RAM Disk
Page Fault App requests document Document not in memory Evict a page from memory Read block from disk  Return document from memory App 1 5 2 RAM 3 4 Disk
Figuring out working Set > db.foo.stats()  { 	"ns" : "test.foo", 	"count" : 1338330, 	"size" : 46915928, 	"avgObjSize" : 35.05557523181876, 	"storageSize" : 86092032, 	"numExtents" : 12, 	"nindexes" : 2, 	"lastExtentSize" : 20872960, 	"paddingFactor" : 1, 	"flags" : 0, 	"totalIndexSize" : 99860480, 	"indexSizes" : { 		"_id_" : 55877632, 		"x_1" : 43982848 	}, 	"ok" : 1 } Size of data Average document size Size on disk (and in memory!) Size of all indexes Size of each index
Disk configurations
Single Disk ~200 seeks / second
RAID0 ~200 seeks / second ~200 seeks / second ~200 seeks / second
RAID10 ~400 seeks / second ~400 seeks / second ~400 seeks / second
replication
Replica Sets Read / Write Secondary Read Primary Read Secondary
Replica Sets Read / Write Read Secondary Secondary Read Primary Read Secondary Secondary Read
Sharding
Secondary Secondary Secondary Secondary MongoS MongoS Shard 1 0..10 Shard 2 10..20 Shard 3 20..30 Shard 4 30..40 Primary Primary Primary Primary Secondary Secondary Secondary Secondary
400GB Index?
400GB Index? Shard 1 0..10 Shard 2 10..20 Shard 3 20..30 Shard 4 30..40 100GB Index! 100GB Index! 100GB Index! 100GB Index!
Summary
Summary Use documents to your advantage! Optimize your indexes  Understand your working set Use a sane disk configuratino Use replicas to scale reads  Use sharding to scale writes & working RAM

More Related Content

Similar to Scaling with mongo db - SF Mongo User Group 7-19-2011

Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
MongoSF 2011 - Using MongoDB for IGN's Social Platform
MongoSF 2011 - Using MongoDB for IGN's Social PlatformMongoSF 2011 - Using MongoDB for IGN's Social Platform
MongoSF 2011 - Using MongoDB for IGN's Social PlatformManish Pandit
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Codemotion
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Dbchriskite
 
MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?Binary Studio
 
Tales from the Field
Tales from the FieldTales from the Field
Tales from the FieldMongoDB
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
Why NoSQL Makes Sense
Why NoSQL Makes SenseWhy NoSQL Makes Sense
Why NoSQL Makes SenseMongoDB
 
Deployment Preparedness
Deployment Preparedness Deployment Preparedness
Deployment Preparedness MongoDB
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB
 
Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Christopher Biow
 
10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators  10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators iammutex
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsSerge Smetana
 
Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Tom Grek
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB
 

Similar to Scaling with mongo db - SF Mongo User Group 7-19-2011 (20)

Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
MongoSF 2011 - Using MongoDB for IGN's Social Platform
MongoSF 2011 - Using MongoDB for IGN's Social PlatformMongoSF 2011 - Using MongoDB for IGN's Social Platform
MongoSF 2011 - Using MongoDB for IGN's Social Platform
 
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
BigDataCloud meetup - July 8th - Cost effective big-data processing using Ama...
 
MongoDB Sharding
MongoDB ShardingMongoDB Sharding
MongoDB Sharding
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
 
Intro To Mongo Db
Intro To Mongo DbIntro To Mongo Db
Intro To Mongo Db
 
MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?MongoDB 3.2 - a giant leap. What’s new?
MongoDB 3.2 - a giant leap. What’s new?
 
Tales from the Field
Tales from the FieldTales from the Field
Tales from the Field
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
Why NoSQL Makes Sense
Why NoSQL Makes SenseWhy NoSQL Makes Sense
Why NoSQL Makes Sense
 
MongoDB Basics Unileon
MongoDB Basics UnileonMongoDB Basics Unileon
MongoDB Basics Unileon
 
Deployment Preparedness
Deployment Preparedness Deployment Preparedness
Deployment Preparedness
 
CouchDB
CouchDBCouchDB
CouchDB
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010Mark Logic StrangeLoop 2010
Mark Logic StrangeLoop 2010
 
10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators  10 Key MongoDB Performance Indicators
10 Key MongoDB Performance Indicators
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails Applications
 
Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
 

More from Jared Rosoff

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesJared Rosoff
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - ShardingJared Rosoff
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - ReplicationJared Rosoff
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Jared Rosoff
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment TipsJared Rosoff
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBSJared Rosoff
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimizationJared Rosoff
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Jared Rosoff
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsJared Rosoff
 

More from Jared Rosoff (9)

MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
Mongosv 2011 - Sharding
Mongosv 2011 - ShardingMongosv 2011 - Sharding
Mongosv 2011 - Sharding
 
Mongosv 2011 - Replication
Mongosv 2011 - ReplicationMongosv 2011 - Replication
Mongosv 2011 - Replication
 
Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2Mongosv 2011 - MongoDB on Amazon EC2
Mongosv 2011 - MongoDB on Amazon EC2
 
MongoDB Deployment Tips
MongoDB Deployment TipsMongoDB Deployment Tips
MongoDB Deployment Tips
 
MongoDB on EC2 and EBS
MongoDB on EC2 and EBSMongoDB on EC2 and EBS
MongoDB on EC2 and EBS
 
Indexing & query optimization
Indexing & query optimizationIndexing & query optimization
Indexing & query optimization
 
Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010Web performance meetup bos 11 18-2010
Web performance meetup bos 11 18-2010
 
Scalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on RailsScalable Event Analytics with MongoDB & Ruby on Rails
Scalable Event Analytics with MongoDB & Ruby on Rails
 

Recently uploaded

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Recently uploaded (20)

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Scaling with mongo db - SF Mongo User Group 7-19-2011

  • 1. Scaling with MongoDBJared Rosoff (jsr@10gen.com) - @forjared
  • 2.
  • 3. How do we do it today? We use a relational database but … We don’t use joins We don’t use transactions We add read-only slaves We added a caching layer We de-normalized our data We implemented custom sharding We buy bigger servers
  • 4. How’s that working out for you?
  • 7. By engineers, for engineers
  • 8. The landscape Memcached Key / Value Scalability & Performance RDBMS Depth of functionality
  • 9. Scaling your app Use documents Indexes make me happy Knowing your working set Disks are the bottleneck Replication makes reading fun Sharding for profit
  • 11. Documents { author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)", text : "Best Movie Ever” } ] }
  • 12. Disk Seeks & Data Locality Read = really really fast Seek = 5+ ms
  • 13. Disk Seeks & Data Locality Post Comment Author
  • 14. Disk Seeks & Data Locality Post Author Comment Comment Comment Comment Comment
  • 16. Table scans Find where x equals 7 1 2 3 4 5 6 7 Looked at 7 objects
  • 17. Tree Lookup Find where x equals 7 4 6 2 7 5 3 1 Looked at 3 objects
  • 18. Random Index Entire index must fit in RAM
  • 19. Right Aligned Only small portion in RAM
  • 21. Working Set Active Documents + Used Indexes RAM Disk
  • 22. Page Fault App requests document Document not in memory Evict a page from memory Read block from disk Return document from memory App 1 5 2 RAM 3 4 Disk
  • 23. Figuring out working Set > db.foo.stats() { "ns" : "test.foo", "count" : 1338330, "size" : 46915928, "avgObjSize" : 35.05557523181876, "storageSize" : 86092032, "numExtents" : 12, "nindexes" : 2, "lastExtentSize" : 20872960, "paddingFactor" : 1, "flags" : 0, "totalIndexSize" : 99860480, "indexSizes" : { "_id_" : 55877632, "x_1" : 43982848 }, "ok" : 1 } Size of data Average document size Size on disk (and in memory!) Size of all indexes Size of each index
  • 25. Single Disk ~200 seeks / second
  • 26. RAID0 ~200 seeks / second ~200 seeks / second ~200 seeks / second
  • 27. RAID10 ~400 seeks / second ~400 seeks / second ~400 seeks / second
  • 29. Replica Sets Read / Write Secondary Read Primary Read Secondary
  • 30. Replica Sets Read / Write Read Secondary Secondary Read Primary Read Secondary Secondary Read
  • 32. Secondary Secondary Secondary Secondary MongoS MongoS Shard 1 0..10 Shard 2 10..20 Shard 3 20..30 Shard 4 30..40 Primary Primary Primary Primary Secondary Secondary Secondary Secondary
  • 34. 400GB Index? Shard 1 0..10 Shard 2 10..20 Shard 3 20..30 Shard 4 30..40 100GB Index! 100GB Index! 100GB Index! 100GB Index!
  • 36. Summary Use documents to your advantage! Optimize your indexes Understand your working set Use a sane disk configuratino Use replicas to scale reads Use sharding to scale writes & working RAM

Editor's Notes

  1. Let’s talk about infrastructure costs. You probably started building your application on top of an RDBMS. This is the way we have built enterprise and web applications for years. But the problem is that your RDBMS doesn’t have a smooth cost curve when you scale it up. When you start off, you may be running on a smaller server, totally adequate for your load. When you exceed the capacity of that small server, you need to buy a bigger server. You can’t add a second small server. This process repeats. You exceed the capacity of your new server, and upgrade your hardware. There are two long term problems with this: As you scale up, you end up paying more and more for each transaction that your system processes. A small server may cost you $1,000 per CPU, but when you need 128 processors, you might be paying as much as $100,000 per CPU. Each incremental step up in hardware gets more and more expensive, not cheaper and cheaper. You reach an end of this scaling approach. Once you have scaled up to the biggest hardware platform available on the market, there is nowhere to go; no bigger box to buy. At this point you need to change strategies, even if you can afford those ultra-high-end boxes.
  2. And while we’ve been spending more and more money on Hardware, our developer productivity has gone down too. You will hear this storyover and over again from CIO’s and architects: “Well, we use <insert RDBMS> but we don’t use joins or transactions and we’ve de-normalized our schema.” As our hardware gets more and more expensive, we ask our developers to squeeze more and more performance out of the same box. To achieve this, they go through “herculean efforts” to strip their code of advanced features that once made them productive. De-normalizing data, eliminating joins and transactions, adding caching and sharding layers… These are risky projects that slow down feature velocity.