SlideShare une entreprise Scribd logo
1  sur  27
Running MongoDB in the Cloud

          Tony Tam
          @fehguy
What this Talk is About

Wordnik left the cloud and came back
 •   What?!?
 •   Why we left
 •   Decisions
 •   Why we came back (and what we did differently)
Who is Wordnik?

• World’s fastest updating English dictionary
 •   Based on input of text at ~8k words/second
 •   Word Graph as basis to our analysis
     •   Synchronous & asynchronous processing
• 10’s of Billions of documents in NR
  storage
• Concept & Meaning Discovery Engine
• > 20M daily REST API calls, billions
  served
So Why the Detour?

• Architectural Choices
• Business Choices
• Feedback, tooling, infrastructure
• Learning
• Changes in use case
• Progress!
Architecture History

• EC2-based LAMP Stack
 •   POC (and seed funding)
 •   A manageable corpus < 1M records
• REST API
 •   Web + public
 •   MySQL in master/slave
 •   ~1B documents
 •   Operational nightmare
Architecture History
• MongoDB
 •   First-order MySQL issues solved
 •   But it got slow…
• Real Servers to the rescue!
 •   Faster, bigger disks
• MongoDB for Corpus, Structured Data
 •   Faster Reads + Writes!
 •   More metal (72GB RAM)
 •   More cores
 •   “cold” query from 400ms to < 100
Why Change?

Easy!
• Can’t beat metal…except
 •   Quick expansion
 •   Batch jobs/experiments
 •   Add a datacenter
 •   Full cluster migration
 •   The bill for unused capacity
Architectural Mindshift

1. Anything can die, anytime
2. Centralized, redundant state (see point 1)
3. Server performance is *different*
  •   CPU, I/O, Memory—choose one
  •   Smart design makes it work!
Architectural Mindshift

• Your software will need to change!
 •   So will the components you rely on
Your Infrastructure   Cloud
                                   Hero
• Deploying Servers
 •   Going to need a lot!
• Configuration
• Updates to your software

                    What about
                     Data?
Let’s make this Work!

• MySQL Master Slave
  •   Take a snapshot (yes, this will block)
  •   Keep your binlogs!

change master to MASTER_HOST='app1',
MASTER_USER='XXXX', MASTER_PASSWORD='XXXX',
MASTER_LOG_FILE='app1-relay.0038774',
MASTER_LOG_POS=6754205951;
Let’s make this Work!

But…
• Your master is down!
 •   Quick, promote a slave!
 •   Point the other slaves to the new master
• As for the clients…
                                 “Well, we
                               never really
                               tried that…”
Better with Mongo

• Easy up, easy down!
 •   Startup: Sync your data, and announce to clients
     when ready for business
 •   Shutdown: Announce your departure and leave
• Replica sets
 rs.add("db4.wordnik.com:27017");
 rs.remove("db1.wordnik.com:27017");
Better with Mongo
But what about Performance?

• Software Design
    •   It’s slow! (What is *it*?)
    •   Profile everything
import com.wordnik.util.perf._
...
def findUser(id:Long): User = {
    Profile("UserDao::findUserById", dao.findUserById(id))
}

         http://github.com/wordnik/wordnik-oss
But what about Performance?
But what about Performance?

• “It’s the database!”
  •   What is it?
• Mapping layer
  •   Mysql (12+ joins) => 50 records/sec
  •   Mongo JSON  POJO => 1000 records/sec
  •   Mongo DBO  POJO => 35,000 records/sec
• How do you know?
                                      Profile
                                        it!
It’s Still Slow!

• It’s the index!
  •   How do you know?
  •   AHHHHH
It’s Still Slow!

• Balance your B-Tree
 •   Can't always keep index in ram. MMF "does it's
     thing"
 •   Right-balanced b-tree keeps necessary index hot
 •   If you hit indexes on disk, mute your pager
                                                   1
                                                   7




                                                       1   2
                                                       5   7
But it’s Still Slow!

• Look at your Schema design
  •   Design to limit index size/number
  •   _id is your friend—make it meaningful
  •   Record size consistency
      •   Hierarchal Data beware!
      •   Split documents even in same collection!
db.posts.find({_id:/^tony_posts_/})
{_id:"tony_posts_1”, posts:[...]}
{_id:"tony_posts_2”, posts:[...]}       YOUR
{_id:"tony_posts_3”, posts:[...]}     app knows
                                         best
Really, it’s STILL slow!

• Your monolithic app/DB won’t scale same
  on VMs
• Specialize!
 •   Wordnik uses SOA

                 Powered API
                               swagger.wordnik.com

 •   Data tiers follow service types
 •   Smaller *everything*
Really, it’s STILL slow!

• Your monolithic app/DB won’t scale same
  on VMs
• Specialize!
 •   Wordnik uses SOA                   A contract
                                         for your
                               swagger.wordnik.com
                 Powered API
                                          clients
 •   Data tiers follow service types
 •   Smaller *everything*
Be the Boss of your Data

• Your app *should* be smarter than your
  DB
 •   Lots of users?
 •   Lots of blog posts?
 •   Lots of images?
 •   Shard? On what?
• Data dimensionality
 •   Keep active data hot
 •   Don’t try to boil the ocean
Cloud Computing + Mongo

• It can work extremely well
  •   No “Save as Cloud!” menu item
• Shifting constraints
  •   Optimize for RAM on VM
  •   Virtual disk => virtual performance
• Be “Deployable”
  •   Mongo Replica Sets are made for this
Cloud Computing + Mongo

• System Durability
 •   Design your software for abuse
 •   Your old design doesn’t apply
 •   Add APM hooks, now!
• Dissect your app
 •   Build to micro services with dedicated MongoDB
     clusters
• Deployment Infrastructure
 •   Don’t wait until it’s too late
See More
•   See more about Wordnik APIs
                    http://developer.wordnik.com

•   Migrating from MySQL to MongoDB
http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik

•   Maintaining your MongoDB Installation
              http://www.slideshare.net/fehguy/mongo-sv-tony-tam

•   Swagger API Framework
                          http://swagger.wordnik.com

•   Mapping Benchmark
               https://github.com/fehguy/mongodb-benchmark-tools

•   Wordnik OSS Tools
                   https://github.com/wordnik/wordnik-oss
Questions?

Contenu connexe

Tendances

Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swaggerTony Tam
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Oren Eini
 
Living with SQL and NoSQL at craigslist, a Pragmatic Approach
Living with SQL and NoSQL at craigslist, a Pragmatic ApproachLiving with SQL and NoSQL at craigslist, a Pragmatic Approach
Living with SQL and NoSQL at craigslist, a Pragmatic ApproachJeremy Zawodny
 
GoSF Summerfest - Why Go at Apcera
GoSF Summerfest - Why Go at ApceraGoSF Summerfest - Why Go at Apcera
GoSF Summerfest - Why Go at ApceraDerek Collison
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Jeremy Zawodny
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At CraigslistJeremy Zawodny
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scalesOren Eini
 
SenchaCon 2016 - How to Auto Generate a Back-end in Minutes
SenchaCon 2016 - How to Auto Generate a Back-end in MinutesSenchaCon 2016 - How to Auto Generate a Back-end in Minutes
SenchaCon 2016 - How to Auto Generate a Back-end in MinutesMalin Weiss
 
Internet scaleservice
Internet scaleserviceInternet scaleservice
Internet scaleserviceDaeMyung Kang
 
Engineering an Encrypted Storage Engine
Engineering an Encrypted Storage EngineEngineering an Encrypted Storage Engine
Engineering an Encrypted Storage EngineMongoDB
 
RavenDB Presentation
RavenDB PresentationRavenDB Presentation
RavenDB PresentationMark Rodseth
 
GlobalsDB: Its significance for Node.js Developers
GlobalsDB: Its significance for Node.js DevelopersGlobalsDB: Its significance for Node.js Developers
GlobalsDB: Its significance for Node.js DevelopersRob Tweed
 
NATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsNATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsDerek Collison
 
Fusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistFusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistJeremy Zawodny
 
From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)MongoSF
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDBBrian Ritchie
 
How Different are MongoDB Drivers
How Different are MongoDB DriversHow Different are MongoDB Drivers
How Different are MongoDB DriversNorberto Leite
 
Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistJeremy Zawodny
 
What's new in MongoDB 2.6 at India event by company
What's new in MongoDB 2.6 at India event by companyWhat's new in MongoDB 2.6 at India event by company
What's new in MongoDB 2.6 at India event by companyMongoDB APAC
 

Tendances (20)

Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swagger
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)
 
Living with SQL and NoSQL at craigslist, a Pragmatic Approach
Living with SQL and NoSQL at craigslist, a Pragmatic ApproachLiving with SQL and NoSQL at craigslist, a Pragmatic Approach
Living with SQL and NoSQL at craigslist, a Pragmatic Approach
 
GoSF Summerfest - Why Go at Apcera
GoSF Summerfest - Why Go at ApceraGoSF Summerfest - Why Go at Apcera
GoSF Summerfest - Why Go at Apcera
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
RavenDB embedded at massive scales
RavenDB embedded at massive scalesRavenDB embedded at massive scales
RavenDB embedded at massive scales
 
SenchaCon 2016 - How to Auto Generate a Back-end in Minutes
SenchaCon 2016 - How to Auto Generate a Back-end in MinutesSenchaCon 2016 - How to Auto Generate a Back-end in Minutes
SenchaCon 2016 - How to Auto Generate a Back-end in Minutes
 
Internet scaleservice
Internet scaleserviceInternet scaleservice
Internet scaleservice
 
Engineering an Encrypted Storage Engine
Engineering an Encrypted Storage EngineEngineering an Encrypted Storage Engine
Engineering an Encrypted Storage Engine
 
RavenDB Presentation
RavenDB PresentationRavenDB Presentation
RavenDB Presentation
 
GlobalsDB: Its significance for Node.js Developers
GlobalsDB: Its significance for Node.js DevelopersGlobalsDB: Its significance for Node.js Developers
GlobalsDB: Its significance for Node.js Developers
 
NATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsNATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platforms
 
Fusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistFusion-io and MySQL at Craigslist
Fusion-io and MySQL at Craigslist
 
From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)From MySQL to MongoDB at Wordnik (Tony Tam)
From MySQL to MongoDB at Wordnik (Tony Tam)
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDB
 
How Different are MongoDB Drivers
How Different are MongoDB DriversHow Different are MongoDB Drivers
How Different are MongoDB Drivers
 
Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at Craigslist
 
What's new in MongoDB 2.6 at India event by company
What's new in MongoDB 2.6 at India event by companyWhat's new in MongoDB 2.6 at India event by company
What's new in MongoDB 2.6 at India event by company
 

Similaire à Running MongoDB in the Cloud

A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's ArchitectureTony Tam
 
What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?DATAVERSITY
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlantaboorad
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planningasya999
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloudboorad
 
Discover MongoDB - Israel
Discover MongoDB - IsraelDiscover MongoDB - Israel
Discover MongoDB - IsraelMichael Fiedler
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community EngineCommunity Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community enginemathraq
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 DistilledGrig Gheorghiu
 
AWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsAWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsMongoDB
 
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...MongoDB
 
Designing your API Server for mobile apps
Designing your API Server for mobile appsDesigning your API Server for mobile apps
Designing your API Server for mobile appsMugunth Kumar
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 

Similaire à Running MongoDB in the Cloud (20)

A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's Architecture
 
What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlanta
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloud
 
Discover MongoDB - Israel
Discover MongoDB - IsraelDiscover MongoDB - Israel
Discover MongoDB - Israel
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community engine
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
 
AWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsAWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and Results
 
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repositor...
 
Designing your API Server for mobile apps
Designing your API Server for mobile appsDesigning your API Server for mobile apps
Designing your API Server for mobile apps
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 

Plus de Tony Tam

A Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksA Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksTony Tam
 
API Design first with Swagger
API Design first with SwaggerAPI Design first with Swagger
API Design first with SwaggerTony Tam
 
Developing Faster with Swagger
Developing Faster with SwaggerDeveloping Faster with Swagger
Developing Faster with SwaggerTony Tam
 
Writer APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorWriter APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorTony Tam
 
Fastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerFastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerTony Tam
 
Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Tony Tam
 
Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Tony Tam
 
Swagger for-your-api
Swagger for-your-apiSwagger for-your-api
Swagger for-your-apiTony Tam
 
Swagger for startups
Swagger for startupsSwagger for startups
Swagger for startupsTony Tam
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without InterferenceTony Tam
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data SafeTony Tam
 
Scala & Swagger at Wordnik
Scala & Swagger at WordnikScala & Swagger at Wordnik
Scala & Swagger at WordnikTony Tam
 
Introducing Swagger
Introducing SwaggerIntroducing Swagger
Introducing SwaggerTony Tam
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDBTony Tam
 
Migrating from MySQL to MongoDB at Wordnik
Migrating from MySQL to MongoDB at WordnikMigrating from MySQL to MongoDB at Wordnik
Migrating from MySQL to MongoDB at WordnikTony Tam
 

Plus de Tony Tam (15)

A Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification LinksA Tasty deep-dive into Open API Specification Links
A Tasty deep-dive into Open API Specification Links
 
API Design first with Swagger
API Design first with SwaggerAPI Design first with Swagger
API Design first with Swagger
 
Developing Faster with Swagger
Developing Faster with SwaggerDeveloping Faster with Swagger
Developing Faster with Swagger
 
Writer APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger InflectorWriter APIs in Java faster with Swagger Inflector
Writer APIs in Java faster with Swagger Inflector
 
Fastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + SwaggerFastest to Mobile with Scalatra + Swagger
Fastest to Mobile with Scalatra + Swagger
 
Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)Swagger APIs for Humans and Robots (Gluecon)
Swagger APIs for Humans and Robots (Gluecon)
 
Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)Love your API with Swagger (Gluecon lightning talk)
Love your API with Swagger (Gluecon lightning talk)
 
Swagger for-your-api
Swagger for-your-apiSwagger for-your-api
Swagger for-your-api
 
Swagger for startups
Swagger for startupsSwagger for startups
Swagger for startups
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without Interference
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data Safe
 
Scala & Swagger at Wordnik
Scala & Swagger at WordnikScala & Swagger at Wordnik
Scala & Swagger at Wordnik
 
Introducing Swagger
Introducing SwaggerIntroducing Swagger
Introducing Swagger
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 
Migrating from MySQL to MongoDB at Wordnik
Migrating from MySQL to MongoDB at WordnikMigrating from MySQL to MongoDB at Wordnik
Migrating from MySQL to MongoDB at Wordnik
 

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Running MongoDB in the Cloud

  • 1. Running MongoDB in the Cloud Tony Tam @fehguy
  • 2. What this Talk is About Wordnik left the cloud and came back • What?!? • Why we left • Decisions • Why we came back (and what we did differently)
  • 3. Who is Wordnik? • World’s fastest updating English dictionary • Based on input of text at ~8k words/second • Word Graph as basis to our analysis • Synchronous & asynchronous processing • 10’s of Billions of documents in NR storage • Concept & Meaning Discovery Engine • > 20M daily REST API calls, billions served
  • 4. So Why the Detour? • Architectural Choices • Business Choices • Feedback, tooling, infrastructure • Learning • Changes in use case • Progress!
  • 5. Architecture History • EC2-based LAMP Stack • POC (and seed funding) • A manageable corpus < 1M records • REST API • Web + public • MySQL in master/slave • ~1B documents • Operational nightmare
  • 6. Architecture History • MongoDB • First-order MySQL issues solved • But it got slow… • Real Servers to the rescue! • Faster, bigger disks • MongoDB for Corpus, Structured Data • Faster Reads + Writes! • More metal (72GB RAM) • More cores • “cold” query from 400ms to < 100
  • 7. Why Change? Easy! • Can’t beat metal…except • Quick expansion • Batch jobs/experiments • Add a datacenter • Full cluster migration • The bill for unused capacity
  • 8. Architectural Mindshift 1. Anything can die, anytime 2. Centralized, redundant state (see point 1) 3. Server performance is *different* • CPU, I/O, Memory—choose one • Smart design makes it work!
  • 9. Architectural Mindshift • Your software will need to change! • So will the components you rely on
  • 10. Your Infrastructure Cloud Hero • Deploying Servers • Going to need a lot! • Configuration • Updates to your software What about Data?
  • 11. Let’s make this Work! • MySQL Master Slave • Take a snapshot (yes, this will block) • Keep your binlogs! change master to MASTER_HOST='app1', MASTER_USER='XXXX', MASTER_PASSWORD='XXXX', MASTER_LOG_FILE='app1-relay.0038774', MASTER_LOG_POS=6754205951;
  • 12. Let’s make this Work! But… • Your master is down! • Quick, promote a slave! • Point the other slaves to the new master • As for the clients… “Well, we never really tried that…”
  • 13. Better with Mongo • Easy up, easy down! • Startup: Sync your data, and announce to clients when ready for business • Shutdown: Announce your departure and leave • Replica sets rs.add("db4.wordnik.com:27017"); rs.remove("db1.wordnik.com:27017");
  • 15. But what about Performance? • Software Design • It’s slow! (What is *it*?) • Profile everything import com.wordnik.util.perf._ ... def findUser(id:Long): User = { Profile("UserDao::findUserById", dao.findUserById(id)) } http://github.com/wordnik/wordnik-oss
  • 16. But what about Performance?
  • 17. But what about Performance? • “It’s the database!” • What is it? • Mapping layer • Mysql (12+ joins) => 50 records/sec • Mongo JSON  POJO => 1000 records/sec • Mongo DBO  POJO => 35,000 records/sec • How do you know? Profile it!
  • 18. It’s Still Slow! • It’s the index! • How do you know? • AHHHHH
  • 19. It’s Still Slow! • Balance your B-Tree • Can't always keep index in ram. MMF "does it's thing" • Right-balanced b-tree keeps necessary index hot • If you hit indexes on disk, mute your pager 1 7 1 2 5 7
  • 20. But it’s Still Slow! • Look at your Schema design • Design to limit index size/number • _id is your friend—make it meaningful • Record size consistency • Hierarchal Data beware! • Split documents even in same collection! db.posts.find({_id:/^tony_posts_/}) {_id:"tony_posts_1”, posts:[...]} {_id:"tony_posts_2”, posts:[...]} YOUR {_id:"tony_posts_3”, posts:[...]} app knows best
  • 21. Really, it’s STILL slow! • Your monolithic app/DB won’t scale same on VMs • Specialize! • Wordnik uses SOA Powered API swagger.wordnik.com • Data tiers follow service types • Smaller *everything*
  • 22. Really, it’s STILL slow! • Your monolithic app/DB won’t scale same on VMs • Specialize! • Wordnik uses SOA A contract for your swagger.wordnik.com Powered API clients • Data tiers follow service types • Smaller *everything*
  • 23. Be the Boss of your Data • Your app *should* be smarter than your DB • Lots of users? • Lots of blog posts? • Lots of images? • Shard? On what? • Data dimensionality • Keep active data hot • Don’t try to boil the ocean
  • 24. Cloud Computing + Mongo • It can work extremely well • No “Save as Cloud!” menu item • Shifting constraints • Optimize for RAM on VM • Virtual disk => virtual performance • Be “Deployable” • Mongo Replica Sets are made for this
  • 25. Cloud Computing + Mongo • System Durability • Design your software for abuse • Your old design doesn’t apply • Add APM hooks, now! • Dissect your app • Build to micro services with dedicated MongoDB clusters • Deployment Infrastructure • Don’t wait until it’s too late
  • 26. See More • See more about Wordnik APIs http://developer.wordnik.com • Migrating from MySQL to MongoDB http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik • Maintaining your MongoDB Installation http://www.slideshare.net/fehguy/mongo-sv-tony-tam • Swagger API Framework http://swagger.wordnik.com • Mapping Benchmark https://github.com/fehguy/mongodb-benchmark-tools • Wordnik OSS Tools https://github.com/wordnik/wordnik-oss