MongoDB
in Production at
     Sailthru


                             Ian White
                   MongoDB NY Meetup
                            1/11/11 (!)
Sailthru
• API-based transactional email led to...
• Mass campaign email led to...
• Intelligence and user behavior
• Three engineers built the ESP we always
  wanted to use
• Clients: Huffpo, AOL, Swirl, Thrillist
How We Got To
 MongoDB from SQL
• JSON was part of Sailthru infrastructure
  from start (SQL columns and S3)
• Kept a close eye on CouchDB project
• MongoDB felt like natural fit
• Used for user profiles and analytics
• Migrated one table at a time (very, very
  carefully)
Our Cloud (roughly)
(diagram, top to bottom)
• Load balancers: two
• Web servers: web (ui), web (horizon), web (api), web (link), web (failover)
• Database servers: db1, db2, db4, db5, db6, db7, db8, db9, db3
• Bottom row: q1, q2, proc, jmailer1, jmailer2
Sailthru Architecture
• User interface to display stats, build
  campaigns and templates, etc (PHP/EC2)
• API, link rewriting, and onsite endpoints
  (PHP/EC2)
• Core mailer engine (Java/EC2 and colo)
• Modified-postfix SMTP servers (colo)
• 9 database servers
MongoDB Overview

• Nine instances on EC2 (4 two-member
  replica sets, 1 backup server)
• About 40 collections
• Largest collection 300mil records, 60GB
• 1000 writes/sec, 2000 reads/sec
Users are Documents

• Users aren’t records split among multiple
  tables
• End user’s lists, clickstream interests,
  geolocation, browser, time of day, purchase
  history becomes one ever-growing
  document
User Profile
{ "_id" : ObjectId("4b2d368aed948543a5fca4b4"), "browser" : { "Chrome" : 3, "Firefox" : 1, "iPhone" : 2 }, "click_count" : 1, "click_time" :
 "Wed Feb 17 2010 09:03:37 GMT-0500 (EST)", "client_id" : 450, "email" : "ibwhite@gmail.com", "email_hour" : { "13" : 1, "14" : 2, "16" : 2,
 "17" : 2, "18" : 3, "21" : 2 }, "geo" : { "city" : { "New York, NY US" : 3, "Sterling, VA US" : 1 }, "count" : 6, "country" : { "US" : 6 },
       "state" : { "NY US" : 3, "VA US" : 1 }, "zip" : { "10011 US" : 1, "10065 US" : 1 } }, "horizon" : { "admob" : 1, "advertising" : 3,
"afghanistan" : 1, "aig" : 2, "airline-industry" : 2, "alleyinsider" : 45, "analyst-research" : 1, "apple" : 25, "apple-tablet" : 5, "att" :
8, "bailout" : 5, "banks" : 6, "barack-obama" : 25, "ben-bernanke" : 1, "big-tech" : 17, "billionaires" : 1, "boats" : 1, "bonus" : 6, "bp" :
1, "budget" : 1, "cable" : 1, "caribbean" : 2, "cars" : 5, "chart-of-the-day" : 3, "china" : 3, "clusterstock" : 36, "cnbc" : 1, "comcast" :
   1, "commodities" : 3, "conan-obrien" : 6, "crime" : 2, "curbedcom" : 1, "death-of-tv" : 1, "debt" : 7, "deepwater-horizon-oil-spill" : 1,
      "dell" : 4, "development" : 1, "dick-fuld" : 1, "economy" : 10, "education" : 1, "employment" : 2, "entertainment" : 7, "europe" : 1,
   "facebook" : 4, "features" : 13, "financial-crisis" : 7, "financial-services" : 2, "fox" : 4, "fraud" : 1, "futures" : 1, "gadgets" : 21,
 "gas" : 1, "gawker" : 5, "gold" : 3, "goldman-sachs" : 1, "google" : 7, "green" : 5, "green-tech" : 2, "health" : 5, "health-care-reform" :
      7, "hedge-funds" : 3, "hires-and-fires" : 1, "housing-crisis" : 1, "hp" : 4, "hulu" : 2, "humor" : 1, "iad" : 1, "international" : 3,
    "investing" : 5, "ios" : 1, "ipad" : 2, "iphone" : 10, "jay-leno" : 5, "jim-cramer" : 1, "jobs" : 2, "john-gruber" : 2, "law-firms" : 1,
         "lawreview" : 3, "lehman-brothers" : 1, "litigation" : 5, "luxury" : 1, "mac" : 1, "magazines" : 1, "markets" : 7, "media" : 20,
        "mercedesbenz" : 4, "microsoft" : 1, "mining" : 1, "mobile" : 14, "mobile-ads" : 2, "moguls" : 1, "money" : 6, "money-media" : 2,
"moneygame" : 16, "morningstar" : 3, "mortgages" : 1, "mtv" : 1, "nbc" : 6, "new-york" : 1, "new-york-times" : 4, "news" : 9, "newspapers" :
   5, "nouriel-roubini" : 6, "oil" : 1, "online" : 10, "optimum-energy" : 4, "paul-krugman" : 3, "people" : 5, "politics" : 26, "radio" : 1,
     "real-estate" : 2, "recession" : 4, "regulation" : 12, "sai" : 15, "satellite-radio" : 1, "scandals" : 5, "security" : 1, "senate" : 4,
 "silicon-alley-insider" : 1, "sirius" : 1, "social-networking" : 3, "sports" : 1, "startups" : 1, "steve-jobs" : 1, "stimulus" : 1,
 "stock-market" : 5, "stocks" : 3, "tax-cuts" : 1, "taxes" : 1, "tbi" : 163, "tbi-live" : 3, "terrorism" : 3, "the-atlantic" : 1,
 "the-way-we-live-now" : 1, "themoneygame" : 3, "thewire" : 17, "tim-geithner" : 3, "time-warner-cable" : 1, "transportation" : 7, "treasury" : 2, "tv" : 7,
"tv-everywhere" : 1, "twitter" : 3, "uk" : 1, "unemployment" : 2, "us-government" : 8, "verizon" : 4, "video" : 6, "wall-st-cheat-sheet" : 1,
  "wall-street" : 25, "wall-street-journal" : 1, "warren-buffett" : 1, "white-house" : 4, "wwdc-2010" : 1, "yachts" : 1, "10gen" : 1,
 "2010-world-cup" : 1 }, "horizon_count" : 303, "horizon_time" : "Tue Dec 07 2010 15:26:35 GMT-0500 (EST)",
 "lists" : [ "TBI Research 1 - Beta", "Dedicated Email", "TBI Research", "411" ],
 "lists_signup" : { "BI_iphone App" : null, "Clusterstock Chart Of The Day" : null, "Clusterstock Select" : null,
 "Dedicated Email" : "Tue Dec 22 2009 13:29:43 GMT-0500 (EST)", "Dedicated Email - The Ladders" : null, "Green Sheet Select" : null,
 "Insider 411" : null, "Insider 411 - Economist" : null, "Insider 411 - Ooyala" : null, "Insider 411 - The Wire Promo" : null,
 "Insider 411- Economist" : null, "Law Review Select" : null, "Media Select" : null, "Silicon Alley Insider Chart Of The Day" : null,
 "Silicon Alley Insider Select" : null, "TBI Research" : "Tue Jan 05 2010 13:58:09 GMT-0500 (EST)",
 "TBI Research 1 - Beta" : "Mon Nov 09 2009 12:34:58 GMT-0500 (EST)", "TBI Select" : null, "The Money Game Select" : null,
 "War Room Select" : null, "z_sailthru" : null, "10 Things Before the Opening Bell" : null, "411" : "Wed Jul 07 2010 11:28:03 GMT-0400 (EDT)" },
 "open_count" : 11, "open_time" : "Tue Dec 07 2010 13:30:31 GMT-0500 (EST)", "optout_templates" : [ ], "order" : 12,
 "signup_time" : "Mon Nov 09 2009 12:34:58 GMT-0500 (EST)", "site_hour" : { "20" : 1 }, "status" : null,
 "status_time" : "Thu Jan 06 2011 11:09:54 GMT-0500 (EST)", "ts" : "Thu Jan 06 2011 11:09:54 GMT-0500 (EST)",
 "urls" : [ "http://www.businessinsider.com/" ], "urls_count" : 1, "vars" : { "name" : "eonwhite" } }
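A profile like this one grows through many small counter bumps. As a minimal sketch, assuming each tracked event is folded into the profile with a single `$inc` update using dotted field paths, the update document could be built as below. The helper name `buildProfileUpdate` and the event shape are illustrative assumptions, not Sailthru's actual code.

```javascript
// Hypothetical helper: turn one tracked event into a MongoDB-style update
// document. The event fields (browser, hour, city, tags) are assumptions
// chosen to mirror the counters visible in the profile.
function buildProfileUpdate(event) {
  const inc = {};
  if (event.browser) inc["browser." + event.browser] = 1;
  if (event.hour !== undefined) inc["site_hour." + event.hour] = 1;
  if (event.city) inc["geo.city." + event.city] = 1;
  for (const tag of event.tags || []) {
    inc["horizon." + tag] = 1;
  }
  // $inc creates a counter that does not exist yet, so new tags need no setup.
  return { $inc: inc };
}

const update = buildProfileUpdate({
  browser: "Chrome",
  hour: 20,
  city: "New York, NY US",
  tags: ["apple", "media"],
});
console.log(JSON.stringify(update, null, 2));
```

Dotted paths only work when the dynamic part of the key (the tag, the city) contains no dot itself, so keys would need sanitizing before they are used this way.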
Profiles Accessible
       Everywhere
• Put abandoned shopping cart notifications
  within a mass email
{if profile.purchase_incomplete}
 <p>This is what’s in your cart:</p>
 {foreach profile.purchase_incomplete.items as item}
   {item.qty} <a href="{item.url}">{item.title}</a><br/>
 {/foreach}
{/if}
Profiles Accessible
       Everywhere
• Show a section of content conditional on
  the user’s location

{if profile.geo.city['New York, NY US'] > 0}
  <div>Come to the New York Meetup on the 27th!</div>
{/if}
Profiles Accessible
        Everywhere
• Show different content depending on user
   interests as measured by on-site behavior
{select}
  {case horizon_interest('black,dark')}
    <img src="http://example.com/dress-image-black.jpg" />
  {/case}
  {case horizon_interest('green')}
    <img src="http://example.com/dress-image-green.jpg" />
  {/case}
  {case horizon_interest('purple,polka_dot,pattern')}
    <img src="http://example.com/dress-image-polkadot.jpg" />
  {/case}
{/select}
Profiles Accessible
        Everywhere
• Pick top content from a data feed based on
   tags


{set('myheadlines',horizon_select(allheadlines,10))}

{foreach myheadlines as h}
  <a href="{h.url}">{h.title}</a><br/>
{/foreach}
Other Advantages of
     MongoDB
• High performance
• Take any parameters from our clients
• Really flexible development
• Great for analytics (internal and external)
• No more downtime for schema migrations
  or reindexing
How We Run mongod
•   mongod --dbpath /path/to/db --logpath /path/to/log/mongodb.log --logappend --fork --rest --replSet main1


• Don’t ever run without replication
• Don’t ever kill -9
• Don’t run without writing to a log
• Run behind a firewall
• Take frequent mongodump backups
• Use --rest, it’s handy
Separate DBs By
       Collections
• Lower-effort than auto-sharding
• Separate databases for different usage
  patterns
• Consider consequences of database failure/unavailability
• But make sure your backup and monitoring
  strategy is prepared for multiple DBs
main DB
• core database functionality, aggregate stats,
  editing, low overall usage
• smaller instances than the other databases
• all collections that don’t have scaling challenges go
  in here
• will probably never have to shard this
email DB
• holds every message ever sent, plus link rewriting
• contains our largest collections (half billion docs)
• high write demands at peak send times
• will probably be the first thing we have to look at
  sharding
horizon DB
• browsing data for onsite usage
• high number of reads from a very small collection
  (aggregate site data) - this may get cached soon
• not that many writes now, will get higher
• logically separated so that a failure caused by
  traffic spike will not affect other operations
profile DB

• contains only user profiles - around 30 million
• separated out because access is much more
  random and much more of the total dataset must
  be in memory
• lots of big expensive queries that must happen on
  slave/secondary
Monitoring

• Some stuff to monitor: faults/sec, index
  misses, % locked, queue size, load average
• we check basic status once/minute on all
  database servers (SMS alerts if down), email
  warnings on thresholds every 10 minutes
• some cacti graphs (looking to improve)
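The threshold warnings can be sketched as a small check run against each status sample. This is only an illustration: the metric names follow the list above, but the limits and the sample values are made-up placeholders, and how the sample is collected (mongostat, the REST interface, etc.) is left out.

```javascript
// Illustrative limits only; real values depend on your own baseline.
const thresholds = { faultsPerSec: 50, lockedPercent: 30, queueSize: 20 };

// Return one warning string per metric that is over its limit.
function checkThresholds(sample, limits) {
  const warnings = [];
  for (const [metric, limit] of Object.entries(limits)) {
    if (sample[metric] !== undefined && sample[metric] > limit) {
      warnings.push(metric + "=" + sample[metric] + " exceeds " + limit);
    }
  }
  return warnings;
}

// Hypothetical sample for one database server.
const sample = { faultsPerSec: 120, lockedPercent: 12, queueSize: 3 };
console.log(checkThresholds(sample, thresholds));
```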
Migrating From MySQL

• Take it one collection at a time (not table)
• Change code to write to both MySQL and
  MongoDB
• Write and run script to backfill old data
• Remove code that writes to MySQL
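The steps above can be sketched with two in-memory stand-ins for the databases. `MemoryStore`, `saveTemplate`, and `backfill` are hypothetical names; a real migration would put the MySQL and MongoDB drivers behind the same two methods.

```javascript
// In-memory stand-in for a data store (assumption, for illustration only).
class MemoryStore {
  constructor() { this.rows = new Map(); }
  save(id, record) { this.rows.set(id, { ...record }); }
  get(id) { return this.rows.get(id); }
}

const mysql = new MemoryStore();
const mongo = new MemoryStore();

// During the migration, every write goes to both stores.
function saveTemplate(id, record) {
  mysql.save(id, record);
  mongo.save(id, record);
}

// Copy records that predate the dual-write code. Ids already present in the
// target are skipped so newer dual-written data is never overwritten.
function backfill(source, target) {
  let copied = 0;
  for (const [id, record] of source.rows) {
    if (target.get(id) === undefined) {
      target.save(id, record);
      copied += 1;
    }
  }
  return copied;
}

mysql.save(1, { name: "welcome" });    // old data, written before the change
saveTemplate(2, { name: "receipt" });  // new data, written to both stores
console.log(backfill(mysql, mongo));   // copies only record 1
```

Once the backfill has run and reads have been verified against MongoDB, the `mysql.save` call in `saveTemplate` is the line that gets removed.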
Thoughts On Migrating

• Take advantage of MongoDB’s flexibility
• Rethink your schema
• Reduce the number of tables/collections
DESIGN
Develop Your Mental
 Model of MongoDB

• You don’t need to look at the internals
• But try to gain a working understanding of
  how MongoDB operates, especially RAM
  and indexes
Disk Access
          Will Kill You
• (on EC2 anyway)
• ... so working set RAM is crucial
• Watch faults/sec in mongostat verrrry
  closely... it is the sign of impending doom
• With SSD maybe this isn’t quite as much of
  an issue
Some Design
    Questions To Ask
• What is the most common read scenario?
• How common are reads vs writes?
• Embed vs top-level collection?
• Denormalize (double-store data)?
• How many/which indexes?
• Arrays vs hashes for embedding?
• Optimize in favor of your major use cases
“But premature
  optimization is evil”
• Knuth said that about code, which is
  flexible and easy to optimize later
• Data is not as flexible as code
• So doing some planning for performance is
  usually good when it comes to your data
Questions To Ask


• How big will this collection get?
• The bigger the collection, the more
  planning it needs
Favor Human-Readable
     Foreign Keys
• DBRefs are a bit cumbersome
• Referencing by MongoId often means doing
  extra lookups
• Build human-readable references to save
  you doing lookups and manual joins
Example



• Store the Template and the Email as strings
    on the message object
•   { template: "Internal - Blast Notify", email:
    "support-alerts@sailthru.com" }


• No external reference lookups required
• The tradeoff is basically just disk space
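To make the tradeoff concrete, here is a small sketch contrasting the two shapes. The id value, the `templatesById` map, and both function names are made up for illustration.

```javascript
// Stand-in for a "template" collection keyed by an opaque id (made-up value).
const templatesById = new Map([
  ["aaaaaaaaaaaaaaaaaaaaaaaa", { name: "Internal - Blast Notify" }],
]);

// Same message stored two ways.
const byReference = { template_id: "aaaaaaaaaaaaaaaaaaaaaaaa", email: "support-alerts@sailthru.com" };
const byName = { template: "Internal - Blast Notify", email: "support-alerts@sailthru.com" };

// Needs a second lookup (a manual join) before it can be displayed.
function describeByReference(message, templates) {
  const template = templates.get(message.template_id);
  return template.name + " -> " + message.email;
}

// Everything needed is already on the message document.
function describeByName(message) {
  return message.template + " -> " + message.email;
}

console.log(describeByReference(byReference, templatesById));
console.log(describeByName(byName));
```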
Embed vs Top-Level
     Collections?
• The great question of MongoDB schema
  design
• If you can ask the question at all, you might
  want to err on the side of embedding
• Don’t embed if the embedding could get
  huge
• Don’t feel too bad about denormalizing by
  embedding AND storing in a top-level
  collection
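One way to read the last two bullets together, sketched under assumptions: keep a small, capped slice embedded in the parent document and write the full history to a top-level collection. `MAX_EMBEDDED`, `recent_purchases`, and `recordPurchase` are hypothetical names, and arrays stand in for collections.

```javascript
const MAX_EMBEDDED = 3; // illustrative cap chosen for this sketch, not a MongoDB setting

const purchases = [];   // stand-in for a top-level "purchase" collection
const profile = { email: "user@example.com", recent_purchases: [] };

function recordPurchase(profileDoc, purchase) {
  // Full history lives in the top-level collection, where it can grow freely.
  purchases.push({ email: profileDoc.email, ...purchase });
  // The profile keeps only the newest few, so the document stays small.
  profileDoc.recent_purchases.push(purchase);
  if (profileDoc.recent_purchases.length > MAX_EMBEDDED) {
    profileDoc.recent_purchases.shift();
  }
}

for (let i = 1; i <= 5; i++) {
  recordPurchase(profile, { sku: "item-" + i, qty: 1 });
}
console.log(profile.recent_purchases.map((p) => p.sku)); // item-3, item-4, item-5
console.log(purchases.length);                            // 5
```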
Embedding Pros
• Super-fast retrieval of document with
  related data
• Atomic updates
• “Ownership” of embedded document is
  obvious
• Usually maps well to code structures
Embedding Cons

• Harder to get at, do mass queries
• Does not scale indefinitely; a document will hit the 4MB size limit
• Hard to create references to embedded
  object
• Can’t index within the embedded objects
Indexes
• Index all highly frequent queries
• Do less-indexed queries only on slaves
• Reduce the size of indexes wherever you
  can on big collections
• Don’t sweat the medium-sized collections,
  focus on the big wins
Take Advantage of
     Compound Indexes
• Order matters
• If you have an index on { client_id: 1, email: 1 }

• Then you also have the { client_id: 1 } index “for free”

• but not { email: 1 }
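The prefix rule can be modeled in a few lines. This is a deliberately simplified sketch for equality filters only (the real query planner also weighs sorts and range conditions), and `usablePrefixLength` is a hypothetical name.

```javascript
// Count how many leading keys of a compound index an equality query can use.
// The index keys are walked in order and the walk stops at the first key the
// query does not filter on.
function usablePrefixLength(indexKeys, queryFields) {
  const queried = new Set(queryFields);
  let length = 0;
  for (const key of indexKeys) {
    if (!queried.has(key)) break;
    length += 1;
  }
  return length;
}

const index = ["client_id", "email"];
console.log(usablePrefixLength(index, ["client_id", "email"])); // 2
console.log(usablePrefixLength(index, ["client_id"]));          // 1
console.log(usablePrefixLength(index, ["email"]));              // 0
```

A result of 0 means the query cannot seek into the index by its leading key, which is why a separate index would be needed for lookups by email alone.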
Use your _id


• You must use an _id for every collection,
  which will cost you index size
• So do something useful with _id
Take advantage of fast
      ^indexes
• Messages have _ids like: 32423.00000341
• Need all messages in blast 32423:
• db.message.blast.find(
        { _id: /^32423\./ } );

•   (Yeah, I know the . is ugly. Don’t use a dot if you do this.)
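Why the anchored regex is fast: a literal prefix match is the same thing as a range over sorted string keys, so an index can serve it. The sketch below assumes string `_id` values with ASCII prefixes; the helper names and the sample ids are made up. It also shows why the separator should be escaped in the pattern.

```javascript
// Escape regex metacharacters so a literal prefix such as "32423." is safe.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function prefixRegex(prefix) {
  return new RegExp("^" + escapeRegex(prefix));
}

// Equivalent range: prefix <= _id < prefix with its last character bumped by one.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

// Made-up sample ids; note the one from blast 324230.
const ids = ["32422.00000009", "32423.00000001", "32423.00000341",
             "324230.00000001", "32424.00000001"];

const re = prefixRegex("32423.");
console.log(ids.filter((id) => re.test(id)));

const range = prefixRange("32423.");
console.log(ids.filter((id) => id >= range.$gte && id < range.$lt));
```

Both filters select the same two ids. An unescaped `/^32423./` would also match `324230.00000001`, because a bare dot matches any character.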
Organize Indexes To
Minimize Working RAM
• Finding the most recent messages sent to a
  user:
• The obvious index is { client_id: 1, email: 1,
  send_time: 1 }
• A more efficient index: { month: 1,
  client_id: 1, email: 1, send_time: 1 }
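A sketch of how the `month` prefix might be used. The storage format (a YYYYMM integer in UTC) and the helper names are assumptions; the slide only names the field. Because recent lookups include the current month, they touch one narrow region of the index, which is the part that has to stay in RAM.

```javascript
// Derive a YYYYMM integer (UTC) to store on each message at insert time.
function monthKey(date) {
  return date.getUTCFullYear() * 100 + (date.getUTCMonth() + 1);
}

// Hypothetical query builder: messages for one user in the current month.
// The leading `month` equality lets the query use the month-prefixed index.
function recentMessagesQuery(clientId, email, now) {
  return { month: monthKey(now), client_id: clientId, email: email };
}

const sentAt = new Date(Date.UTC(2011, 0, 11, 15, 30));
console.log(monthKey(sentAt)); // 201101
console.log(recentMessagesQuery(450, "user@example.com", sentAt));
```

A lookup that needs to span a month boundary could pass several month keys with `$in` rather than dropping the prefix.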
SOME TIPS AND
RECOMMENDATIONS
    (all just my opinion, take it or leave it)
Minimize Documents
   Moving On Disk
• Documents get moved when they exceed
  their initial size + padding factor
• You will see “moved” in the log
• So if there are fields that are likely to get
  populated later, pre-populate them with
  empty data on insert (to get Mongo to
  preallocate more space)
Autoincrement in
       MongoDB?
• Can be safely emulated with no race
  conditions with findAndModify
• Generally best avoided, especially with any
  collection with high numbers of inserts
• But useful for human-readable ids!
• 521 vs 4b2d368aed948543a5fca4b4
• We use it for clients and blasts (which is
  what our Support team needs)
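A minimal sketch of the counter pattern. The in-memory `Map` stands in for a collection so the example runs on its own; the collection name `counters` and the field `seq` in the comment are assumptions.

```javascript
// Against a real server the same step would be one atomic call in the
// mongo shell, along the lines of:
//   db.counters.findAndModify({
//     query: { _id: "blast" },
//     update: { $inc: { seq: 1 } },
//     new: true, upsert: true
//   })
// Here a Map stands in for the collection.
const counters = new Map();

function nextId(name) {
  const seq = (counters.get(name) || 0) + 1;
  counters.set(name, seq);
  return seq;
}

console.log(nextId("blast"));  // 1
console.log(nextId("blast"));  // 2
console.log(nextId("client")); // 1
```

Every insert that needs an id pays for an extra round trip to the counter document, which is one reason to keep this away from high-insert collections.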
Notes on Types

• Currency: beware of floating-point
  problems, store in pennies
• Dates: BSON Dates are better for
  timestamps, not abstract days; store as
  YYYYMMDD ints or strings
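Both notes are easy to demonstrate. The helpers below are hypothetical, and `toPennies` assumes a non-negative decimal string with at most two decimal places that matter.

```javascript
// Floating-point dollars drift; integer pennies do not.
console.log(0.1 + 0.2 === 0.3); // false
console.log(10 + 20 === 30);    // true

// Convert a decimal string such as "19.99" to integer pennies without
// going through floating-point multiplication.
function toPennies(amount) {
  const [dollars, cents = ""] = amount.split(".");
  return Number(dollars) * 100 + Number((cents + "00").slice(0, 2));
}

// Store an abstract calendar day as a YYYYMMDD integer.
function toDayInt(year, month, day) {
  return year * 10000 + month * 100 + day;
}

console.log(toPennies("19.99"));    // 1999
console.log(toPennies("5"));        // 500
console.log(toDayInt(2011, 1, 11)); // 20110111
```

YYYYMMDD integers sort in calendar order, so range queries on them behave as expected.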
Consider Before You
   Use A Mapper
• ODMs are less necessary than ORMs since
  there is much much less mapping to do
• If it helps you, cool -- just make sure you’re
  not using one out of relational reflex
• If you are building a scalable system you
  don’t want to abstract away performance
No Silver Bullet:
   Use The Right Tool
• We store a fair amount of archival data
  (TB) in flatfiles on S3
• Big data that does not need random or
  frequent access and would be unwieldy in
  the database
• Could this data be in MongoDB? Maybe in
  GridFS? Yes. But ultimately cheaper on S3.
Queues Are Your
         Friend
• Sailthru had them when we needed them
  (MySQL). We still have them because
  they’re so useful
• Allows you to “spread out” the updates
  from peak request load
• Allows you to shut off writes in
  emergencies or during database upgrades
  without losing data
Have An Upgrade Plan
• Create a mode for your site where writes
  queue instead of going to MongoDB
• Turn off writes
• Point reads to slave (if not using repl sets)
• Do what you have to do (upgrade, etc)
• Point reads back to master
• Turn on writes again
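The queue-instead-of-write mode can be sketched as a small gate in front of the database. `WriteGate` is a hypothetical name, and the in-memory array is only for illustration: a production queue would need to be durable, as the MySQL-backed queues mentioned earlier are.

```javascript
// Hypothetical write gate: while paused, writes are queued rather than lost.
class WriteGate {
  constructor(applyWrite) {
    this.applyWrite = applyWrite; // function that performs the real write
    this.queueing = false;
    this.pending = [];
  }
  write(doc) {
    if (this.queueing) {
      this.pending.push(doc);
    } else {
      this.applyWrite(doc);
    }
  }
  pauseWrites() { this.queueing = true; }
  resumeWrites() {
    this.queueing = false;
    // Replay queued writes in arrival order.
    while (this.pending.length > 0) {
      this.applyWrite(this.pending.shift());
    }
  }
}

const stored = [];
const gate = new WriteGate((doc) => stored.push(doc));
gate.write({ n: 1 });
gate.pauseWrites();         // "turn off writes"
gate.write({ n: 2 });       // queued while the database is being upgraded
gate.write({ n: 3 });
console.log(stored.length); // 1
gate.resumeWrites();        // "turn on writes again"
console.log(stored.length); // 3
```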
When Things Go
        Wrong
• mongostat is your friend
• So is the REST interface
• So is the log, grep it for slow queries
• iostat -x 2 to see if disks are saturated
• Always be monitoring to be warned of
  coming problems
Oh Yeah, By The Way


• We’re hiring developers and sysadmins
• jobs@sailthru.com
Questions?

  ian@sailthru.com
twitter.com/eonwhite

More Related Content

What's hot

Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkMongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsMongoDB
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphMongoDB
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Javaantoinegirbal
 
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBantoinegirbal
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
Webinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBWebinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBMongoDB
 
Hadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for BioinformaticsHadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for Bioinformaticsosintegrators
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsMongoDB
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBNosh Petigara
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkTyler Brock
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedMongoDB
 
Edição de Texto Rico com React e Draft.js
Edição de Texto Rico com React e Draft.jsEdição de Texto Rico com React e Draft.js
Edição de Texto Rico com React e Draft.jsGuilherme Vierno
 

What's hot (20)

Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation FrameworkConceptos básicos. Seminario web 5: Introducción a Aggregation Framework
Conceptos básicos. Seminario web 5: Introducción a Aggregation Framework
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Webinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to BasicsWebinar: Getting Started with MongoDB - Back to Basics
Webinar: Getting Started with MongoDB - Back to Basics
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
 
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Webinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDBWebinar: Transitioning from SQL to MongoDB
Webinar: Transitioning from SQL to MongoDB
 
Hadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for BioinformaticsHadoop and Neo4j: A Winning Combination for Bioinformatics
Hadoop and Neo4j: A Winning Combination for Bioinformatics
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
 
Edição de Texto Rico com React e Draft.js
Edição de Texto Rico com React e Draft.jsEdição de Texto Rico com React e Draft.js
Edição de Texto Rico com React e Draft.js
 

Viewers also liked

247 overviewmongodbevening-bangalore
247 overviewmongodbevening-bangalore247 overviewmongodbevening-bangalore
247 overviewmongodbevening-bangaloreMongoDB APAC
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 
An afternoon with mongo db new delhi
An afternoon with mongo db new delhiAn afternoon with mongo db new delhi
An afternoon with mongo db new delhiRajnish Verma
 
MMS - Monitoring, backup and management at a single click
MMS - Monitoring, backup and management at a single clickMMS - Monitoring, backup and management at a single click
MMS - Monitoring, backup and management at a single clickMatias Cascallares
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesignMongoDB APAC
 
An Introduction to MongoDB Ops Manager
An Introduction to MongoDB Ops ManagerAn Introduction to MongoDB Ops Manager
An Introduction to MongoDB Ops ManagerMongoDB
 

Viewers also liked (10)

Pelicamigrator
PelicamigratorPelicamigrator
Pelicamigrator
 
247 overviewmongodbevening-bangalore
247 overviewmongodbevening-bangalore247 overviewmongodbevening-bangalore
247 overviewmongodbevening-bangalore
 
Rpsonmongodb
RpsonmongodbRpsonmongodb
Rpsonmongodb
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdays
 
What's new in MongoDB 2.6
What's new in MongoDB 2.6What's new in MongoDB 2.6
What's new in MongoDB 2.6
 
Methods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarkingMethods of NoSQL database systems benchmarking
Methods of NoSQL database systems benchmarking
 
An afternoon with mongo db new delhi
An afternoon with mongo db new delhiAn afternoon with mongo db new delhi
An afternoon with mongo db new delhi
 
MMS - Monitoring, backup and management at a single click
MMS - Monitoring, backup and management at a single clickMMS - Monitoring, backup and management at a single click
MMS - Monitoring, backup and management at a single click
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
 
An Introduction to MongoDB Ops Manager
An Introduction to MongoDB Ops ManagerAn Introduction to MongoDB Ops Manager
An Introduction to MongoDB Ops Manager
 

Similar to MongoDB in Production at Sailthru: How We Architected for Scale

IOOF IT System Modernisation
IOOF IT System ModernisationIOOF IT System Modernisation
IOOF IT System ModernisationMongoDB
 
Dynamic Apps with WebSockets and MQTT - IBM Impact 2014
Dynamic Apps with WebSockets and MQTT - IBM Impact 2014Dynamic Apps with WebSockets and MQTT - IBM Impact 2014
Dynamic Apps with WebSockets and MQTT - IBM Impact 2014Bryan Boyd
 
Real-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studyReal-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studydeep.bi
 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015StampedeCon
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 
Pivotal Open Source: Using Fluentd to gain insights into your logs
Pivotal Open Source:  Using Fluentd to gain insights into your logsPivotal Open Source:  Using Fluentd to gain insights into your logs
Pivotal Open Source: Using Fluentd to gain insights into your logsKiyoto Tamura
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
 
Crafting Evolvable Api Responses
Crafting Evolvable Api ResponsesCrafting Evolvable Api Responses
Crafting Evolvable Api Responsesdarrelmiller71
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
 
Blazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkBlazing Fast Analytics with MongoDB & Spark
Blazing Fast Analytics with MongoDB & SparkMongoDB
 
Clickstream data with spark
Clickstream data with sparkClickstream data with spark
Clickstream data with sparkMarissa Saunders
 
Mongodb intro
Mongodb introMongodb intro
Mongodb introchristkv
 
PyData Berlin Meetup
PyData Berlin MeetupPyData Berlin Meetup
PyData Berlin MeetupSteffen Wenz
 
MySQL Performance Monitoring
MySQL Performance MonitoringMySQL Performance Monitoring
MySQL Performance Monitoringspil-engineering
 
Chen li asterix db: 大数据处理开源平台
Chen li asterix db: 大数据处理开源平台Chen li asterix db: 大数据处理开源平台
Chen li asterix db: 大数据处理开源平台jins0618
 
Introduction to Processing
Introduction to ProcessingIntroduction to Processing
Introduction to Processingsiufu
 
CouchDB Open Source Bridge
CouchDB Open Source BridgeCouchDB Open Source Bridge
CouchDB Open Source BridgeChris Anderson
 
Bringing Data Analytics to the Edge
Bringing Data Analytics to the EdgeBringing Data Analytics to the Edge
Bringing Data Analytics to the EdgeTon Machielsen
 
Montreal Elasticsearch Meetup
Montreal Elasticsearch MeetupMontreal Elasticsearch Meetup
Montreal Elasticsearch MeetupLoïc Bertron
 

MongoDB in Production at Sailthru: How We Architected for Scale

  • 1. MongoDB in Production at Sailthru Ian White MongoDB NY Meetup 1/11/11 (!)
  • 2. Sailthru • API-based transactional email led to... • Mass campaign email led to... • Intelligence and user behavior • Three engineers built the ESP we always wanted to use • Clients: Huffpo, AOL, Swirl, Thrillist
  • 3. How We Got To MongoDB from SQL • JSON was part of Sailthru infrastructure from start (SQL columns and S3) • Kept a close eye on CouchDB project • MongoDB felt like natural fit • Used for user profiles and analytics • Migrated one table at a time (very, very carefully)
  • 4. Our Cloud (roughly) load load balancer balancer web web web web (ui) web (api) (horizon) (link) (failover) db1 db2 db4 db5 db6 db7 db8 db9 db3 q1 q2 proc jmailer1 jmailer2
  • 5. Sailthru Architecture • User interface to display stats, build campaigns and templates, etc (PHP/EC2) • API, link rewriting, and onsite endpoints (PHP/EC2) • Core mailer engine (Java/EC2 and colo) • Modified-postfix SMTP servers (colo) • 9 database servers
  • 6. MongoDB Overview • Nine instances on EC2 (4 two-member replica sets, 1 backup server) • About 40 collections • Largest collection 300mil records, 60GB • 1000 writes/sec, 2000 reads/sec
  • 7. Users are Documents • Users aren’t records split among multiple tables • End user’s lists, clickstream interests, geolocation, browser, time of day, purchase history becomes one ever-growing document
  • 8. User Profile { "_id" : ObjectId("4b2d368aed948543a5fca4b4"), "browser" : { "Chrome" : 3, "Firefox" : 1, "iPhone" : 2 }, "click_count" : 1, "click_time" : "Wed Feb 17 2010 09:03:37 GMT-0500 (EST)", "client_id" : 450, "email" : "ibwhite@gmail.com", "email_hour" : { "13" : 1, "14" : 2, "16" : 2, "17" : 2, "18" : 3, "21" : 2 }, "geo" : { "city" : { "New York, NY US" : 3, "Sterling, VA US" : 1 }, "count" : 6, "country" : { "US" : 6 }, "state" : { "NY US" : 3, "VA US" : 1 }, "zip" : { "10011 US" : 1, "10065 US" : 1 } }, "horizon" : { "admob" : 1, "advertising" : 3, "afghanistan" : 1, "aig" : 2, "airline-industry" : 2, "alleyinsider" : 45, "analyst-research" : 1, "apple" : 25, "apple-tablet" : 5, "att" : 8, "bailout" : 5, "banks" : 6, "barack-obama" : 25, "ben-bernanke" : 1, "big-tech" : 17, "billionaires" : 1, "boats" : 1, "bonus" : 6, "bp" : 1, "budget" : 1, "cable" : 1, "caribbean" : 2, "cars" : 5, "chart-of-the-day" : 3, "china" : 3, "clusterstock" : 36, "cnbc" : 1, "comcast" : 1, "commodities" : 3, "conan-obrien" : 6, "crime" : 2, "curbedcom" : 1, "death-of-tv" : 1, "debt" : 7, "deepwater-horizon-oil-spill" : 1, "dell" : 4, "development" : 1, "dick-fuld" : 1, "economy" : 10, "education" : 1, "employment" : 2, "entertainment" : 7, "europe" : 1, "facebook" : 4, "features" : 13, "financial-crisis" : 7, "financial-services" : 2, "fox" : 4, "fraud" : 1, "futures" : 1, "gadgets" : 21, "gas" : 1, "gawker" : 5, "gold" : 3, "goldman-sachs" : 1, "google" : 7, "green" : 5, "green-tech" : 2, "health" : 5, "health-care-reform" : 7, "hedge-funds" : 3, "hires-and-fires" : 1, "housing-crisis" : 1, "hp" : 4, "hulu" : 2, "humor" : 1, "iad" : 1, "international" : 3, "investing" : 5, "ios" : 1, "ipad" : 2, "iphone" : 10, "jay-leno" : 5, "jim-cramer" : 1, "jobs" : 2, "john-gruber" : 2, "law-firms" : 1, "lawreview" : 3, "lehman-brothers" : 1, "litigation" : 5, "luxury" : 1, "mac" : 1, "magazines" : 1, "markets" : 7, "media" : 20, "mercedesbenz" : 4, "microsoft" : 1, "mining" : 1, "mobile" : 14, "mobile-ads" : 2, "moguls" : 1, "money" : 6, "money-media" : 2, "moneygame" : 16, "morningstar" : 3, "mortgages" : 1, "mtv" : 1, "nbc" : 6, "new-york" : 1, "new-york-times" : 4, "news" : 9, "newspapers" : 5, "nouriel-roubini" : 6, "oil" : 1, "online" : 10, "optimum-energy" : 4, "paul-krugman" : 3, "people" : 5, "politics" : 26, "radio" : 1, "real-estate" : 2, "recession" : 4, "regulation" : 12, "sai" : 15, "satellite-radio" : 1, "scandals" : 5, "security" : 1, "senate" : 4, "silicon-alley-insider" : 1, "sirius" : 1, "social-networking" : 3, "sports" : 1, "startups" : 1, "steve-jobs" : 1, "stimulus" : 1, "stock-market" : 5, "stocks" : 3, "tax-cuts" : 1, "taxes" : 1, "tbi" : 163, "tbi-live" : 3, "terrorism" : 3, "the-atlantic" : 1, "the-way-we-live-now" : 1, "themoneygame" : 3, "thewire" : 17, "tim-geithner" : 3, "time-warner-cable" : 1, "transportation" : 7, "treasury" : 2, "tv" : 7, "tv-everywhere" : 1, "twitter" : 3, "uk" : 1, "unemployment" : 2, "us-government" : 8, "verizon" : 4, "video" : 6, "wall-st-cheat-sheet" : 1, "wall-street" : 25, "wall-street-journal" : 1, "warren-buffett" : 1, "white-house" : 4, "wwdc-2010" : 1, "yachts" : 1, "10gen" : 1, "2010-world-cup" : 1 }, "horizon_count" : 303, "horizon_time" : "Tue Dec 07 2010 15:26:35 GMT-0500 (EST)", "lists" : [ "TBI Research 1 - Beta", "Dedicated Email", "TBI Research", "411" ], "lists_signup" : { "BI_iphone App" : null, "Clusterstock Chart Of The Day" : null, "Clusterstock Select" : null, "Dedicated Email" : "Tue Dec 22 2009 13:29:43 GMT-0500 (EST)", "Dedicated Email - The Ladders" : null, "Green Sheet Select" : null, "Insider 411" : null, "Insider 411 - Economist" : null, "Insider 411 - Ooyala" : null, "Insider 411 - The Wire Promo" : null, "Insider 411- Economist" : null, "Law Review Select" : null, "Media Select" : null, "Silicon Alley Insider Chart Of The Day" : null, "Silicon Alley Insider Select" : null, "TBI Research" : "Tue Jan 05 2010 13:58:09 GMT-0500 (EST)", "TBI Research 1 - Beta" : "Mon Nov 09 2009 12:34:58 GMT-0500 (EST)", "TBI Select" : null, "The Money Game Select" : null, "War Room Select" : null, "z_sailthru" : null, "10 Things Before the Opening Bell" : null, "411" : "Wed Jul 07 2010 11:28:03 GMT-0400 (EDT)" }, "open_count" : 11, "open_time" : "Tue Dec 07 2010 13:30:31 GMT-0500 (EST)", "optout_templates" : [ ], "order" : 12, "signup_time" : "Mon Nov 09 2009 12:34:58 GMT-0500 (EST)", "site_hour" : { "20" : 1 }, "status" : null, "status_time" : "Thu Jan 06 2011 11:09:54 GMT-0500 (EST)", "ts" : "Thu Jan 06 2011 11:09:54 GMT-0500 (EST)", "urls" : [ "http://www.businessinsider.com/" ], "urls_count" : 1, "vars" : { "name" : "eonwhite" } }
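A profile shaped like the one on slide 8 is mostly nested counters, so one way to keep it growing is a single update built from `$inc` and `$set`. The sketch below only builds that update document; the function name, its arguments, and the choice of which fields to touch are assumptions for illustration, not Sailthru's actual code.

```python
from datetime import datetime, timezone


def build_profile_update(browser, hour, tags, now=None):
    """Build a MongoDB-style update document for one onsite event.

    Hypothetical helper: the field names (browser, site_hour, horizon,
    horizon_time) come from the sample profile, but how Sailthru updated
    them is an assumption.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Dotted keys address counters nested inside the profile document.
    inc = {
        "browser." + browser: 1,
        "site_hour." + str(hour): 1,
    }
    for tag in tags:
        inc["horizon." + tag] = 1
    return {"$inc": inc, "$set": {"horizon_time": now}}


update = build_profile_update("Chrome", 13, ["apple", "media"])
```

With a driver such as pymongo, a document like this would typically be passed to an upserting update keyed on the user, so that the first event creates the profile and later events only increment counters.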
  • 9. Profiles Accessible Everywhere • Put abandoned shopping cart notifications within a mass email {if profile.purchase_incomplete} <p>This is what’s in your cart:</p> {foreach profile.purchase_incomplete.items as item} {item.qty} <a href="{item.url}">{item.title}</a><br/> {/foreach} {/if}
  • 10. Profiles Accessible Everywhere • Show a section of content conditional on the user’s location {if profile.geo.city['New York, NY US'] > 0} <div>Come to the New York Meetup on the 27th!</div> {/if}
  • 11. Profiles Accessible Everywhere • Show different content depending on user interests as measured by on-site behavior {select} {case horizon_interest('black,dark')} <img src="http://example.com/dress-image-black.jpg" /> {/case} {case horizon_interest('green')} <img src="http://example.com/dress-image-green.jpg" /> {/case} {case horizon_interest('purple,polka_dot,pattern')} <img src="http://example.com/dress-image-polkadot.jpg" /> {/case} {/select}
  • 12. Profiles Accessible Everywhere • Pick top content from a data feed based on tags {set('myheadlines',horizon_select(allheadlines,10))} {foreach myheadlines as h} <a href="{h.url}">{h.title}</a><br/> {/foreach}
  • 13. Other Advantages of MongoDB • High performance • Take any parameters from our clients • Really flexible development • Great for analytics (internal and external) • No more downtime for schema migrations or reindexing
  • 14. How We Run mongod • mongod --dbpath /path/to/db --logpath /path/to/log/mongodb.log --logappend --fork --rest --replSet main1 • Don’t ever run without replication • Don’t ever kill -9 • Don’t run without writing to a log • Run behind a firewall • Take frequent mongodump backups • Use --rest, it’s handy
  • 15. Separate DBs By Collections • Lower-effort than auto-sharding • Separate databases for different usage patterns • Consider consequences of database failure/unavailability • But make sure your backup and monitoring strategy is prepared for multiple DBs
  • 16. main DB • core database functionality, aggregate stats, editing, low overall usage • smaller instances than the other databases • all collections that don’t have scaling challenges go in here • will probably never have to shard this
  • 17. email DB • holds every message ever sent, plus link rewriting • contains our largest collections (half billion docs) • high write demands at peak send times • will probably be the first thing we have to look at sharding
  • 18. horizon DB • browsing data for onsite usage • high number of reads from a very small collection (aggregate site data) - this may get cached soon • not that many writes now, will get higher • logically separated so that a failure caused by traffic spike will not affect other operations
  • 19. profile DB • contains only user profiles - around 30 million • separated out because access is much more random and much more of the total dataset must be in memory • lots of big expensive queries that must happen on slave/secondary
  • 20. Monitoring • Some stuff to monitor: faults/sec, index misses, % locked, queue size, load average • we check basic status once/minute on all database servers (SMS alerts if down), email warnings on thresholds every 10 minutes • some cacti graphs (looking to improve)
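The threshold warnings on slide 20 can be sketched as a simple comparison of one metrics sample against limits. The metric names mirror the slide; the limit values and the `breached` helper are hypothetical and would need tuning per deployment.

```python
# Hypothetical limits; the talk does not give Sailthru's actual thresholds.
THRESHOLDS = {
    "faults_per_sec": 10,
    "locked_pct": 50,
    "queue_size": 20,
    "load_avg": 8,
}


def breached(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics in `sample` that exceed their limit."""
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0) > limit
    )


# A sample with heavy page faulting but little lock contention.
alerts = breached({"faults_per_sec": 25, "locked_pct": 10})
```

A scheduled job could run a check like this every few minutes and send an email when the returned list is non-empty, leaving the once-a-minute liveness check as a separate, simpler probe.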
  • 21. Migrating From MySQL • Take it one collection at a time (not table) • Change code to write to both MySQL and MongoDB • Write and run script to backfill old data • Remove code that writes to MySQL
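The four migration steps on slide 21 can be sketched with plain dictionaries standing in for the MySQL table and the MongoDB collection. `DualWriter` and `backfill` are hypothetical names; a real migration would use the two database drivers in their place.

```python
class DualWriter:
    """Writes every record to both stores until the legacy write is switched off."""

    def __init__(self, legacy, target):
        self.legacy = legacy        # stand-in for the MySQL table
        self.target = target        # stand-in for the MongoDB collection
        self.write_legacy = True    # flip to False for the final step

    def save(self, key, doc):
        if self.write_legacy:
            self.legacy[key] = dict(doc)
        self.target[key] = dict(doc)


def backfill(legacy, target):
    """Copy records that predate dual writing; never overwrite newer target data."""
    copied = 0
    for key, doc in legacy.items():
        if key not in target:
            target[key] = dict(doc)
            copied += 1
    return copied


legacy = {"old": {"n": 1}}           # data written before the migration began
target = {}
writer = DualWriter(legacy, target)
writer.save("new", {"n": 2})         # step 2: write to both
copied = backfill(legacy, target)    # step 3: backfill old data
writer.write_legacy = False          # step 4: stop writing to the legacy store
writer.save("latest", {"n": 3})
```

Skipping keys that already exist in the target is what makes the backfill safe to run while dual writes are live.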
  • 22. Thoughts On Migrating • Take advantage of MongoDB’s flexibility • Rethink your schema • Reduce the number of tables/collections
  • 24. Develop Your Mental Model of MongoDB • You don’t need to look at the internals • But try to gain a working understanding of how MongoDB operates, especially RAM and indexes
  • 25. Disk Access Will Kill You • (on EC2 anyway) • ... so working set RAM is crucial • Watch faults/sec in mongostat verrrry closely... it is the sign of impending doom • With SSD maybe this isn’t quite as much of an issue
  • 26. Some Design Questions To Ask • What is the most common read scenario? • How common are reads vs writes? • Embed vs top-level collection? • Denormalize (double-store data)? • How many/which indexes? • Arrays vs hashes for embedding? • Optimize in favor of your major use cases
  • 27. “But premature optimization is evil” • Knuth said that about code, which is flexible and easy to optimize later • Data is not as flexible as code • So doing some planning for performance is usually good when it comes to your data
  • 28. Questions To Ask • How big will this collection get? • The bigger the collection, the more planning it needs
  • 29. Favor Human-Readable Foreign Keys • DBRefs are a bit cumbersome • Referencing by MongoId often means doing extra lookups • Build human-readable references to save you doing lookups and manual joins
  • 30. Example • Store the Template and the Email as strings on the message object • { template: "Internal - Blast Notify", email: "support-alerts@sailthru.com" } • No external reference lookups required • The tradeoff is basically just disk space
  • 31. Embed vs Top-Level Collections? • The great question of MongoDB schema design • If you can ask the question at all, you might want to err on the side of embedding • Don’t embed if the embedding could get huge • Don’t feel too bad about denormalizing by embedding AND storing in a top-level collection
  • 32. Embedding Pros • Super-fast retrieval of document with related data • Atomic updates • “Ownership” of embedded document is obvious • Usually maps well to code structures
  • 33. Embedding Cons • Harder to get at, do mass queries • Does not size up infinitely, will hit 4MB limit • Hard to create references to embedded object • Can’t index within the embedded objects
  • 34. Indexes • Index all highly frequent queries • Do less-indexed queries only on slaves • Reduce the size of indexes wherever you can on big collections • Don’t sweat the medium-sized collections, focus on the big wins
  • 35. Take Advantage of Multikey Indexes • Order matters • If you have an index on {client_id: 1, email: 1 } • Then you also have the {client_id: 1} index “for free” • but not { email: 1}
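The "for free" rule on slide 35 is the leading-prefix rule for what MongoDB's documentation calls compound indexes. The check below is a simplified model of that rule for equality queries only; `is_index_prefix` is a hypothetical helper, not a driver or server API.

```python
def is_index_prefix(query_fields, index_fields):
    """True if the queried fields are exactly the first k fields of the index.

    Simplified model: the order of fields inside the query does not matter,
    but the set of fields must match a leading prefix of the index.
    """
    wanted = set(query_fields)
    if not wanted:
        return False
    return set(index_fields[:len(wanted)]) == wanted


index = ["client_id", "email"]
covered_by_prefix = is_index_prefix(["client_id"], index)
email_only = is_index_prefix(["email"], index)
```

Here `covered_by_prefix` is true and `email_only` is false, matching the slide: a query on `email` alone would need its own index.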
  • 36. Use your _id • You must use an _id for every collection, which will cost you index size • So do something useful with _id
  • 37. Take advantage of fast ^indexes • Messages have _ids like: 32423.00000341 • Need all messages in blast 32423: • db.message.blast.find( { _id: /^32423./ } ); • (Yeah, I know the . is ugly. Don’t use a dot if you do this.)
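An anchored prefix pattern is fast because it can be answered as a range scan over the sorted index. The sketch below imitates that with a sorted Python list, and also shows why the dot on slide 37 is awkward: unescaped, it matches any character. The sample ids other than the one on the slide are made up.

```python
import re
from bisect import bisect_left


def prefix_scan(sorted_ids, prefix):
    """Return all ids starting with `prefix`, reading only one contiguous range."""
    start = bisect_left(sorted_ids, prefix)   # jump to the first candidate
    matches = []
    for value in sorted_ids[start:]:
        if not value.startswith(prefix):
            break                             # sorted order: nothing later can match
        matches.append(value)
    return matches


ids = sorted([
    "32423.00000341",
    "32423.00000342",
    "324230.00000001",   # a different blast whose number shares the digits
    "32424.00000001",
])

blast_messages = prefix_scan(ids, "32423.")

# The unescaped dot matches the "0" in blast 324230; escaping it does not.
loose = re.match(r"^32423.", "324230.00000001") is not None
strict = re.match("^" + re.escape("32423."), "324230.00000001") is not None
```

Choosing a separator that is not a regex metacharacter, as the slide suggests, avoids the need to escape at all.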
  • 38. Organize Indexes To Minimize Working RAM • Finding the most recent messages sent to a user: • The obvious index is { client_id: 1, email: 1, send_time: 1 } • A more efficient index: { month: 1, client_id: 1, email: 1, send_time: 1 }
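Leading the index with a month field means queries for recent messages touch only the newest part of the index, which is the part most likely to be in RAM. A minimal sketch of building such a query follows; the `YYYYMM` integer encoding and both helper names are assumptions, since the slide does not say how `month` was stored.

```python
from datetime import date


def recent_month_keys(today, months_back):
    """Return YYYYMM integers for the current month and the ones before it."""
    keys = []
    year, month = today.year, today.month
    for _ in range(months_back):
        keys.append(year * 100 + month)
        month -= 1
        if month == 0:          # roll back across a year boundary
            year -= 1
            month = 12
    return keys


def recent_messages_query(client_id, email, today, months_back=2):
    """Build a filter whose fields line up with the month-first index."""
    return {
        "month": {"$in": recent_month_keys(today, months_back)},
        "client_id": client_id,
        "email": email,
    }


query = recent_messages_query(450, "user@example.com", date(2011, 1, 11))
```

The tradeoff is that every query has to supply the month, so a lookup spanning all time becomes one query per month or an `$in` over many values.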
  • 39. SOME TIPS AND RECOMMENDATIONS (all just my opinion, take it or leave it)
  • 40. Minimize Documents Moving On Disk • Documents get moved when they exceed their initial size + padding factor • You will see “moved” in the log • So if there are fields that are likely to get populated later, pre-populate them with empty data on insert (to get Mongo to preallocate more space)
  • 41. Autoincrement in MongoDB? • Can be safely emulated with no race conditions with findAndModify • Generally best avoided, especially with any collection with high numbers of inserts • But useful for human-readable ids! • 521 vs 4b2d368aed948543a5fca4b4 • We use it for clients and blasts (which is what our Support team needs)
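The reason findAndModify avoids race conditions is that the increment and the read of the new value happen as one atomic step. The sketch below imitates that with an in-memory store and a lock; `CounterStore` is a stand-in for a counters collection, not a MongoDB API.

```python
import threading


class CounterStore:
    """In-memory stand-in for a `counters` collection (hypothetical)."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def find_and_modify_inc(self, name):
        # Increment and return the new value in one atomic step, which is the
        # property findAndModify with $inc provides on the server.
        with self._lock:
            self._values[name] = self._values.get(name, 0) + 1
            return self._values[name]


def next_id(store, name):
    """Return the next human-readable id for the named sequence."""
    return store.find_and_modify_inc(name)


store = CounterStore()
first = next_id(store, "blast")
second = next_id(store, "blast")
```

Every insert has to pass through that single counter document, which is one reason the slide advises against this pattern on collections with high insert rates.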
  • 42. Notes on Types • Currency: beware of floating-point problems, store in pennies • Dates: BSON Dates are better for timestamps, not abstract days; store as YYYYMMDD ints or strings
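Both type notes on slide 42 are easy to show in a few lines: binary floats cannot represent most decimal amounts exactly, while integer pennies and `YYYYMMDD` integers compare and sum cleanly. The helper names are hypothetical, and passing amounts as strings is an assumption made to keep the float problem out of the input.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount):
    """Convert a decimal amount given as a string (e.g. "19.99") to integer cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def to_day_int(d):
    """Encode a calendar day as a YYYYMMDD integer, with no time zone attached."""
    return d.year * 10000 + d.month * 100 + d.day


def from_day_int(n):
    """Decode a YYYYMMDD integer back into a date."""
    return date(n // 10000, n // 100 % 100, n % 100)


float_sum_is_exact = (0.1 + 0.2 == 0.3)                                   # False
cents_sum_is_exact = (to_cents("0.10") + to_cents("0.20") == to_cents("0.30"))
meetup_day = to_day_int(date(2011, 1, 11))
```

`YYYYMMDD` integers also sort in calendar order, so range queries on them behave as expected.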
  • 43. Consider Before You Use A Mapper • ODMs are less necessary than ORMs since there is much much less mapping to do • If it helps you, cool -- just make sure you’re not using one out of relational reflex • If you are building a scalable system you don’t want to abstract away performance
  • 44. No Silver Bullet: Use The Right Tool • We store a fair amount of archival data (TB) in flatfiles on S3 • Big data that does not need random or frequent access and would be unwieldy in the database • Could this data be in MongoDB? Maybe in GridFS? Yes. But ultimately cheaper on S3.
  • 45. Queues Are Your Friend • Sailthru had them when we needed them (MySQL). We still have them because they’re so useful • Allows you to “spread out” the updates from peak request load • Allows you to shut off writes in emergencies or during database upgrades without losing data
  • 46. Have An Upgrade Plan • Create a mode for your site where writes queue instead of going to MongoDB • Turn off writes • Point reads to slave (if not using repl sets) • Do what you have to do (upgrade, etc) • Point reads back to master • Turn on writes again
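The "writes queue instead of going to MongoDB" mode can be sketched as a small gate in front of the write path. `WriteGate` is a hypothetical name, and the in-memory deque is only for illustration: a durable queue (the talk mentions MySQL-backed ones) would be needed to survive a process restart.

```python
from collections import deque


class WriteGate:
    """Applies writes directly, or holds them while the database is offline."""

    def __init__(self, apply_write):
        self._apply = apply_write   # callable that performs the real write
        self._pending = deque()
        self.queueing = False       # set True to "turn off writes"

    def write(self, op):
        if self.queueing:
            self._pending.append(op)
        else:
            self._apply(op)

    def resume(self):
        """Turn writes back on and replay held writes in arrival order."""
        self.queueing = False
        drained = 0
        while self._pending:
            self._apply(self._pending.popleft())
            drained += 1
        return drained


applied = []
gate = WriteGate(applied.append)
gate.write("a")            # normal operation
gate.queueing = True       # upgrade begins
gate.write("b")
gate.write("c")
during_upgrade = list(applied)
replayed = gate.resume()   # upgrade done
```

Replaying in arrival order matters when later writes depend on earlier ones, such as an update to a document that an earlier queued write creates.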
  • 47. When Things Go Wrong • mongostat is your friend • So is the REST interface • So is the log, grep it for slow queries • iostat -x 2 to see if disks are saturated • Always be monitoring to be warned of coming problems
  • 48. Oh Yeah, By The Way • We’re hiring developers and sysadmins • jobs@sailthru.com
