Scaling Rails @ Yottaa
Jared Rosoff (@forjared, jrosoff@yottaa.com)
September 20th, 2010

From zero to humongous
• About our application
• How we chose MongoDB
• How we use MongoDB

About our application
• We collect lots of data
  – 6000+ URLs
  – 300 samples per URL per day
  – Some samples are >1MB (Firebug)
  – Missing a sample isn’t a big deal
• We visualize data in real time
  – No delay when showing data
• “On-demand” samples
  – The “check now” button

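The collection figures imply a steady write load. A quick back-of-the-envelope sketch, using only the slide's numbers and assuming samples arrive evenly through the day (real traffic is likely burstier):

```javascript
// Back-of-the-envelope ingest rate using the slide's figures.
// Assumption: samples are spread evenly over 24 hours.
const urls = 6000; // the slide says "6000+", so this is a lower bound
const samplesPerUrlPerDay = 300;
const secondsPerDay = 24 * 60 * 60; // 86,400

const samplesPerDay = urls * samplesPerUrlPerDay; // 1,800,000
const samplesPerSecond = samplesPerDay / secondsPerDay; // about 20.8

console.log(`${samplesPerDay} samples/day, about ${samplesPerSecond.toFixed(1)} samples/s`);
```

Since 6000+ is a floor and some samples exceed 1MB, treat this as a minimum average rate rather than a measured one.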
The Yottaa Network

How we chose Mongo

Requirements
• Our data set is going to grow very quickly
  – Scalable by default
• We have a very small team
  – Focus on application, not infrastructure
• We are a startup
  – Requirements change hourly
• Operations
  – We’re 100% in the cloud

Rails default architecture
(Diagram: a Data Source feeds the Collection Server, which writes to a single MySQL database; Users see reports through the Reporting Server.)
• “Just” a Rails app
• Performance bottleneck: too much load

Let’s add replication!
(Diagram: Data Source, Collection Server, several MySQL servers linked by replication, Reporting Server, User.)
• Off the shelf!
• Scalable reads!
• Performance bottleneck: still can’t scale writes

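To make “scalable reads, but not writes” concrete, here is a minimal sketch of the routing an application does with a master/replica setup. The class and server names are hypothetical, and replicas may serve slightly stale data because MySQL replication is asynchronous by default:

```javascript
// Hypothetical read/write router for a MySQL master with read replicas.
class ReplicatedRouter {
  constructor(master, replicas) {
    this.master = master;
    this.replicas = replicas;
    this.next = 0; // round-robin position in the replica list
  }

  // Every write goes to the one master, so write capacity is capped by it.
  forWrite() {
    return this.master;
  }

  // Reads rotate across replicas, so each added replica adds read capacity.
  forRead() {
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const router = new ReplicatedRouter("mysql-master", ["replica-1", "replica-2"]);
console.log(router.forWrite()); // mysql-master
console.log(router.forRead(), router.forRead(), router.forRead()); // replica-1 replica-2 replica-1
```

Adding replicas grows only the read side; the master still absorbs every write, which is the bottleneck the slide names.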
What about sharding?
(Diagram: Data Source, Collection Server, sharding logic in front of several MySQL masters, Reporting Server, User.)
• Scalable writes!
• Development bottleneck: need to write custom code

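The “custom code” in question is the routing layer the application must own once data is split across several MySQL masters. A minimal sketch, assuming a hypothetical hash-modulo scheme keyed on URL (FNV-1a is used only as a convenient string hash, not as anything the talk prescribes). It also measures how many keys change servers when a fourth master is added: with a uniform hash, a key stays put only when its hash leaves the same remainder mod 3 and mod 4, which is 3 of every 12 hash values, so roughly three quarters of the data would have to move.

```javascript
// Hypothetical application-level sharding: hash the URL, pick one of N MySQL
// masters. Every read and write path in the app has to go through this router.

// 32-bit FNV-1a hash of a string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function shardFor(url, shards) {
  return shards[fnv1a(url) % shards.length];
}

// Fraction of keys that land on a different server after the shard list changes.
function movedFraction(keys, before, after) {
  const moved = keys.filter((k) => shardFor(k, before) !== shardFor(k, after));
  return moved.length / keys.length;
}

const three = ["mysql-a", "mysql-b", "mysql-c"];
const four = [...three, "mysql-d"];
const sampleUrls = Array.from({ length: 6000 }, (_, i) => `site${i}.example.com`);

console.log(shardFor("www.google.com", three));
console.log(movedFraction(sampleUrls, three, four).toFixed(2));
```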
Key-value stores to the rescue?
(Diagram: Data Source, Collection Server, Cassandra or Voldemort alongside MySQL masters, Reporting Server, User.)
• Scalable writes!
• Development bottleneck: reporting is limited / hard

Can I Hadoop my way out of this?
(Diagram: Data Source, Collection Server, Cassandra or Voldemort, Hadoop, several MySQL masters and a MySQL slave, Reporting Server, User.)
• Scalable writes!
• Flexible reports!
• “Just” a Rails app
• Development bottleneck: too many systems!

MongoDB!
(Diagram: Data Source, Collection Server, MongoDB, MySQL masters, Reporting Server, User.)
• Scalable writes!
• “Just” a Rails app
• Flexible reporting!

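(Diagram: Data Source and User traffic reach the App Servers through a Load Balancer; the App Servers run Nginx and Passenger for Collection and Reporting and reach several mongod servers through mongos.)
• Sharding!
• High concurrency
• Scale-out
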
Sharding is critical
• Distribute write load across servers
• Decentralize data storage
• Scale out!

Before sharding
(Diagram: several App Servers sharing one database.)
• Need higher write volume? Buy a bigger database.
• Need more storage volume? Buy a bigger database.

After sharding
(Diagram: several App Servers in front of a sharded database.)
• Need higher write volume? Add more servers.
• Need more storage volume? Add more servers.

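MongoDB's range-based sharding is what lets “add more servers” work without a hand-written router: a collection is split into chunks of shard-key ranges, each chunk lives on one shard, and mongos routes each operation to the owning shard. A simplified model, assuming `url` as the shard key and three hand-picked chunk boundaries (real chunk bounds use MinKey/MaxKey and are split and balanced automatically):

```javascript
// Simplified model of range-based sharding. Each chunk owns the shard-key
// range [min, max); "" and null stand in for MongoDB's MinKey and MaxKey.
// The shard key (url) and the boundaries are assumptions for illustration.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

// Route a write the way a router like mongos would: find the owning chunk.
function routeWrite(url, chunkTable) {
  const chunk = chunkTable.find(
    (c) => url >= c.min && (c.max === null || url < c.max)
  );
  return chunk.shard;
}

// Adding a server means handing it an existing chunk; only that chunk's
// documents move, and every other key keeps its shard.
function moveChunk(chunkTable, index, newShard) {
  return chunkTable.map((c, i) => (i === index ? { ...c, shard: newShard } : c));
}

const grown = moveChunk(chunks, 2, "shard3");
console.log(routeWrite("www.google.com", chunks)); // shard2
console.log(routeWrite("www.google.com", grown)); // shard3
console.log(routeWrite("amazon.com", grown)); // shard0
```

Compared with hash-modulo routing, growing the cluster here relocates one chunk's worth of data instead of remapping most keys.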
Scale out is the new scale up
(Diagram: three App Servers.)

How we’re using MongoDB

Our data model
• Document per URL we track
  – Meta-data
  – Summary data
  – Most recent measurements
• Document per URL per day
  – Detailed metrics
  – Pre-aggregated data

Thinking in rows
{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
{ url: 'www.google.com', location: 'NYC', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }

Thinking in rows
What was the average connect time for Google on Friday? From SFO? From NYC? Between 1 AM and 2 AM?

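With one row per sample, each of those questions becomes a filter-then-average over raw rows. A minimal in-memory sketch of that query shape, using the two sample rows from the earlier “Thinking in rows” slide (the function name and its options object are hypothetical):

```javascript
// Row-per-sample model: every question is "filter the raw rows, then average".
const rows = [
  { url: "www.google.com", location: "SFO", connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 },
  { url: "www.google.com", location: "NYC", connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 },
];

// Average connect time for one URL in [from, to), optionally for one location.
// The work grows with the number of matching rows, i.e. with sample volume.
function averageConnect(allRows, { url, location, from, to }) {
  const matching = allRows.filter(
    (r) =>
      r.url === url &&
      (location === undefined || r.location === location) &&
      r.timestamp >= from &&
      r.timestamp < to
  );
  if (matching.length === 0) return null;
  const total = matching.reduce((sum, r) => sum + r.connect, 0);
  return { average: total / matching.length, rowsRead: matching.length };
}

console.log(averageConnect(rows, { url: "www.google.com", from: 0, to: 3000 }));
// { average: 23, rowsRead: 2 }
```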
Thinking in rows
• Up to 100’s of samples per URL per day!!
• An “average” chart over a 30-day query range had to hit 600 rows
(Diagram: Day 1, Day 2, Day 3, … each averaged (AVG) into the result.)

Thinking in documents
(The slide shows an annotated daily document.)
• This document contains all data for www.google.com collected during 9/20/2010
• This tells us the average value for this metric for this URL / time period
• Average value from SFO
• Average value from NYC

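The annotated document itself is an image in the original deck. Its likely shape can be read off the field paths that the `$inc` update in “Storing a sample” writes (`connect.sum`, `connect.count`, `connect.sfo.sum`, `connect.sfo.count`); the `nyc` entry and all numbers below are hypothetical. Storing a running sum and count rather than an average is what lets the update be a blind `$inc`, with averages derived when reading:

```javascript
// Hypothetical daily document for one URL, shaped after the field paths the
// "Storing a sample" update increments. All numbers are made up.
const daily = {
  url: "www.google.com",
  day: "9/20/2010",
  connect: {
    sum: 4600, count: 200,          // every location combined
    sfo: { sum: 2000, count: 100 }, // samples taken from SFO
    nyc: { sum: 2600, count: 100 }, // samples taken from NYC
  },
};

// Derive an average from a { sum, count } bucket; null when there is no data.
function average(bucket) {
  return bucket && bucket.count > 0 ? bucket.sum / bucket.count : null;
}

console.log(average(daily.connect)); // 23
console.log(average(daily.connect.sfo)); // 20
console.log(average(daily.connect.nyc)); // 26
```

Keeping sums and counts also means buckets can be combined correctly (add sums, add counts, divide once), which stored averages alone would not allow.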
Storing a sample
db.metrics.dailies.update(
  { url: 'www.google.com', day: '9/20/2010' },  // which document we're updating
  { '$inc': {                                   // atomically update the document
      'connect.sum': 1234,                      // update the aggregate value
      'connect.count': 1,
      'connect.sfo.sum': 1234,                  // update the location-specific value
      'connect.sfo.count': 1 } },
  { upsert: true }                              // create the document if it doesn't already exist
);

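The same update can be generated for every metric in a sample. A minimal sketch, assuming the three metric names from “Thinking in rows” and lower-cased location codes (to match `connect.sfo.*`); `dailyUpsertFor` is a hypothetical helper, and the day key is passed in rather than derived from the sample's placeholder timestamp:

```javascript
// Hypothetical helper: turn one row-style sample into the filter / update /
// options triple used by db.metrics.dailies.update(...).
const METRICS = ["connect", "first_byte", "last_byte"];

function dailyUpsertFor(sample, day) {
  const loc = sample.location.toLowerCase(); // "SFO" -> "sfo"
  const inc = {};
  for (const m of METRICS) {
    if (typeof sample[m] !== "number") continue; // skip metrics the sample lacks
    inc[`${m}.sum`] = sample[m]; // aggregate across all locations
    inc[`${m}.count`] = 1;
    inc[`${m}.${loc}.sum`] = sample[m]; // location-specific bucket
    inc[`${m}.${loc}.count`] = 1;
  }
  return {
    filter: { url: sample.url, day },
    update: { $inc: inc },
    options: { upsert: true },
  };
}

const op = dailyUpsertFor(
  { url: "www.google.com", location: "SFO", connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 },
  "9/20/2010"
);
console.log(Object.keys(op.update.$inc).length); // 12 (3 metrics x 4 counters)
```

With the Node.js driver, the result could be passed as `collection.updateOne(op.filter, op.update, op.options)`; `updateOne` with `{ upsert: true }` is the modern counterpart of the shell's `update` call.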
Putting it together
One incoming sample:
{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }
1. Atomically update the daily data
2. Atomically update the weekly data
3. Atomically update the monthly data

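Each sample therefore fans out to three upserts, one per time bucket. A sketch of computing those targets, where the `metrics.weeklies` and `metrics.monthlies` collection names, the Sunday-start week, and the zero-padded `YYYY-MM-DD` keys are all assumptions (the slides use `9/20/2010`-style day keys). Each upsert is atomic on its own document, but the three together are not one atomic unit; MongoDB only added multi-document transactions in version 4.0.

```javascript
// Hypothetical fan-out of one sample to daily, weekly and monthly documents.
// Collection names and key formats are assumptions, not from the talk.
const pad = (n) => String(n).padStart(2, "0");

// Zero-padded UTC keys, so string order matches chronological order.
function dayKey(d) {
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
}

// A week is keyed by the Sunday it starts on (one possible convention).
function weekKey(d) {
  const sunday = new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() - d.getUTCDay())
  );
  return dayKey(sunday);
}

function monthKey(d) {
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}`;
}

// Each target would receive the same { $inc: ... } update with { upsert: true }.
function rollupTargets(url, when) {
  return [
    { collection: "metrics.dailies", filter: { url, day: dayKey(when) } },
    { collection: "metrics.weeklies", filter: { url, week: weekKey(when) } },
    { collection: "metrics.monthlies", filter: { url, month: monthKey(when) } },
  ];
}

// 20 September 2010 (the talk's date) was a Monday.
const targets = rollupTargets("www.google.com", new Date(Date.UTC(2010, 8, 20, 13, 0)));
console.log(targets.map((t) => `${t.collection} ${JSON.stringify(t.filter)}`).join("\n"));
```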
Drawing the connect time graph
// Compound index to make this query fast
// (createIndex replaces ensureIndex, the 2010-era name, deprecated since MongoDB 3.0)
db.metrics.dailies.createIndex({ url: 1, day: -1 });

db.metrics.dailies.find(
  { url: 'www.google.com',                               // data for Google
    day: { '$gte': '9/1/2010', '$lte': '9/20/2010' } },  // the range of dates for the chart
  { connect: true }                                      // we just want connect time data
);

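One thing worth checking before reusing this query: `day` is stored as an `M/D/YYYY` string, and a range query on a string field compares strings character by character, not as dates. A quick check in plain JavaScript, whose `<`/`>` on ASCII strings orders them the same way MongoDB's default binary string comparison does, shows which days of September 2010 the slide's range actually matches, and how zero-padded `YYYY-MM-DD` keys (an alternative, not what the talk used) behave:

```javascript
// A range query on a string field compares strings, not dates.
// Check which 'M/D/YYYY' keys fall inside the slide's range.
const from = "9/1/2010";
const to = "9/20/2010";
const inRange = (d, lo, hi) => d >= lo && d <= hi;

const slideKeys = Array.from({ length: 20 }, (_, i) => `9/${i + 1}/2010`);
const slideMatches = slideKeys.filter((d) => inRange(d, from, to));
console.log(slideMatches.length); // 13: "9/3/2010" through "9/9/2010" sort after "9/20/2010"
console.log(inRange("9/1/2011", from, to)); // true: a date a year later still matches

// Zero-padded ISO-style keys sort chronologically, so the same range works.
const isoKeys = Array.from({ length: 20 }, (_, i) => `2010-09-${String(i + 1).padStart(2, "0")}`);
const isoMatches = isoKeys.filter((d) => inRange(d, "2010-09-01", "2010-09-20"));
console.log(isoMatches.length); // 20
```

Storing a BSON date in `day` would also sort chronologically; either way the `{ url: 1, day: -1 }` index still serves the query.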
More efficient charts
• 1 document per URL per day
• 30 days == 30 documents
• An average chart hits 30 documents: 20x fewer
(Diagram: Day 1, Day 2, Day 3, … each contributing one AVG to the result.)

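With pre-aggregated daily documents, the chart is one point per document. A sketch, assuming the `{ sum, count }` layout from “Storing a sample” (`connectSeries` and `rangeAverage` are hypothetical names); the range-wide average adds sums and counts before dividing, since averaging the daily averages would weight a quiet day the same as a busy one:

```javascript
// Hypothetical chart helpers over daily documents shaped like
// { day, connect: { sum, count, ... } }.

// One chart point per daily document: that day's average connect time.
function connectSeries(dailyDocs) {
  return dailyDocs.map((doc) => ({
    day: doc.day,
    avg: doc.connect.count > 0 ? doc.connect.sum / doc.connect.count : null,
  }));
}

// Average over the whole range: add sums and counts first, divide once.
function rangeAverage(dailyDocs) {
  let sum = 0;
  let count = 0;
  for (const doc of dailyDocs) {
    sum += doc.connect.sum;
    count += doc.connect.count;
  }
  return count > 0 ? sum / count : null;
}

// Made-up example: a busy day and a quiet day.
const docs = [
  { day: "9/19/2010", connect: { sum: 100, count: 4 } }, // average 25
  { day: "9/20/2010", connect: { sum: 30, count: 1 } }, // average 30
];
console.log(connectSeries(docs).map((p) => p.avg)); // [ 25, 30 ]
console.log(rangeAverage(docs)); // 26 (130 / 5), not 27.5

// The slides' arithmetic: 600 rows versus 30 documents for a 30-day chart.
console.log(600 / 30); // 20
```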
Real-time updates
• A single query fetches all metric data for a URL
• Fast enough that the browser can poll constantly for updated data without impacting the server

Final thoughts
• Mongo has been a great choice
  – 80 GB of data and counting
  – Much more compact after moving from a table-oriented to a document-oriented data model
  – Hundreds of updates per second, 24x7
• Not using sharding in production yet, but planning on it soon
• You are using replication, right?

Realtime Analytics with MongoDB
