This document discusses common use cases for MongoDB and why it is well-suited for them. It describes how MongoDB can handle high volumes of data feeds, operational intelligence and analytics, product data management, user data management, and content management. Its flexible data model, high performance, scalability through sharding and replication, and support for dynamic schemas make it a good fit for applications that need to store large amounts of data, handle high throughput of reads and writes, and have low latency requirements.
2. Intro to NoSQL and MongoDB
Follow-up (completed): How to Get Started with your MongoDB Pilot Project (August 7th)
@hungarianhc, kevin@10gen.com
3. Emerging NoSQL Space

(Timeline diagram:)
• The beginning: RDBMS
• Last 10 years: RDBMS + Data Warehouse
• Today: RDBMS + Data Warehouse + NoSQL
4. Qualities of NoSQL Workloads

Flexible data models
• Lists, nested objects
• Sparse schemas
• Semi-structured data
• Agile development

High Throughput
• Lots of reads
• Lots of writes

Large Data Sizes
• Aggregate data size
• Number of objects

Low Latency
• Both reads and writes
• Millisecond latency

Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / few knobs

Commodity Hardware
• Ethernet
• Local disks
5. MongoDB was designed for this

Flexible data models
• JSON-based object data model
• Dynamic schemas

High Throughput
• Replica sets to scale reads
• Sharding to scale writes

Large Data Sizes
• 1000's of shards in a single DB
• Partitioning of data

Low Latency
• In-memory cache
• Scale-out working set

Cloud Computing
• Scale-out to overcome hardware limitations

Commodity Hardware
• Designed for "typical" OS and local file system
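The "JSON-based object data model" and "dynamic schemas" points can be illustrated with a small sketch. Plain JavaScript objects stand in for BSON documents here, and the user documents and field names are invented for illustration; the point is that two differently shaped documents coexist in one collection, and a missing field simply reads as absent:

```javascript
// Two documents with different shapes in the same "collection":
// MongoDB does not require a fixed, declared schema.
const users = [
  { _id: 1, name: "Ada", badges: ["alpha"], address: { city: "SF" } },
  { _id: 2, name: "Grace" } // sparse: no badges, no address
];

// Dotted-path lookup, mimicking how MongoDB matches "address.city".
function get(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

console.log(get(users[0], "address.city")); // "SF"
console.log(get(users[1], "address.city")); // undefined (field is absent)
```

A relational schema would need either nullable columns or extra join tables for the badges list and the nested address; here both are just part of the document.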
8. High Volume Data Feeds
Machine • More machines, more sensors, more
Generated data
Data • Variably structured
Stock Market • High frequency trading
Data
Social Media • Multiple sources of data
Firehose • Each changes their format constantly
9. High Volume Data Feed

• Flexible document model can adapt to changes in sensor format
• Asynchronous writes
• Write to memory with periodic disk flush
• Scale writes over multiple shards

(Diagram: multiple data sources writing into a sharded cluster.)
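The "scale writes over multiple shards" point works because each insert is routed by its shard key to exactly one shard, so aggregate write throughput grows with the number of shards. A toy sketch of hashed routing (this is illustrative only, not MongoDB's actual implementation, which uses chunk metadata on config servers; the hash function and sensor names are invented):

```javascript
// Toy router: map a shard-key value to one of N shards by hashing.
const NUM_SHARDS = 3;

function hashKey(key) {
  // Simple deterministic string hash (illustrative only).
  let h = 0;
  for (const c of String(key)) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}

function shardFor(key) {
  return hashKey(key) % NUM_SHARDS;
}

// Writes from different sensors land on different shards, so each
// shard absorbs only a fraction of the total write load.
const counts = [0, 0, 0];
for (let sensor = 0; sensor < 9000; sensor++) {
  counts[shardFor("sensor-" + sensor)]++;
}
console.log(counts); // roughly even split across the 3 shards
```

Because routing is deterministic, reads for a given sensor also go to a single shard rather than broadcasting to the whole cluster.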
10. Operational Intelligence

Ad Targeting
• Large volume of state about users
• Very strict latency requirements

Customer Facing Dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time

Social Media Monitoring
• Need to join the conversation _now_
11. Operational Intelligence

• Low latency reads
• Parallelize queries across replicas and shards
• In-database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store, and report on data

(Diagram: the cluster serving both an API and customer dashboards.)
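"In-database aggregation" means summarization runs on the server instead of shipping raw records to the application. The grouping itself is equivalent to this sketch, re-implemented client-side over plain objects so it can run standalone (in MongoDB of this era the server-side equivalent would be the aggregation framework or map-reduce; the event shape and field names are invented):

```javascript
// Page-view events as they might arrive from a data feed.
const events = [
  { page: "/home", ms: 12 },
  { page: "/home", ms: 20 },
  { page: "/buy",  ms: 55 }
];

// Group by page, counting hits and summing latency: the kind of
// rollup a server-side $group stage computes without moving raw
// events to the client.
function groupByPage(evts) {
  const out = {};
  for (const e of evts) {
    const g = out[e.page] || (out[e.page] = { hits: 0, totalMs: 0 });
    g.hits++;
    g.totalMs += e.ms;
  }
  return out;
}

console.log(groupByPage(events));
// { "/home": { hits: 2, totalMs: 32 }, "/buy": { hits: 1, totalMs: 55 } }
```

Doing this inside the database is what lets dashboards report on large volumes of data without a separate analytics tier.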
12. Behavioral Profiles

• Rich profiles collecting multiple complex actions (1. see ad, 2. see ad, 3. click, 4. convert)
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to track vendor specific attributes
• Indexing and querying to support matching, frequency capping

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ]
    }
  }
}
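Appending a new action to a profile like the one above, and counting impressions for frequency capping, can be sketched in plain JavaScript. In MongoDB itself the append would be an atomic `$push` update on the embedded array; the helper names here are invented, and the field names follow the example document:

```javascript
const profile = {
  cookie_id: "1234512413243",
  advertiser: { apple: { actions: [] } }
};

// Record an action for an advertiser (MongoDB would do this with an
// atomic $push on "advertiser.apple.actions").
function track(p, advertiser, action) {
  p.advertiser[advertiser].actions.push(action);
}

// Frequency capping: how many impressions has this cookie seen?
function impressions(p, advertiser) {
  return p.advertiser[advertiser].actions
    .filter(a => "impression" in a).length;
}

track(profile, "apple", { impression: "ad1", time: 123 });
track(profile, "apple", { click: "ad1", time: 130 });
console.log(impressions(profile, "apple")); // 1
```

Because all of a cookie's actions live in one document, the capping check is a single-document read rather than a scan over an activity table.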
13. Product Data

E-Commerce Product Catalog
• Diverse product portfolio
• Complex querying and filtering

Flash Sales
• Scale for short bursts of high volume traffic
• Scalable, but consistent view of inventory
14. Product Data

• Indexing and rich query API for easy searching and sorting
• Flexible data model for similar, but different objects

db.products.
  find({ "details.author": "David Eggers" }).
  sort({ "title": -1 });

{ sku: "00a9f3a",
  type: "Book",
  details: {
    author: "David Eggers",
    title: "You shall know our velocity",
    isbn: "0-9703355-5-5"
  }
}

{ sku: "00e8da9b",
  type: "MP3",
  details: {
    artist: "John Coltrane",
    title: "A love supreme",
    length: 123
  }
}
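The shell query above can be mirrored client-side to see what it returns against the two sample documents (a sketch only; MongoDB evaluates the match server-side, using an index on `details.author` if one exists, and the MP3 document simply lacks that field, so it does not match):

```javascript
const products = [
  { sku: "00a9f3a", type: "Book",
    details: { author: "David Eggers",
               title: "You shall know our velocity",
               isbn: "0-9703355-5-5" } },
  { sku: "00e8da9b", type: "MP3",
    details: { artist: "John Coltrane",
               title: "A love supreme",
               length: 123 } }
];

// Equivalent of find({ "details.author": "David Eggers" }): only the
// book matches; the sparse field excludes the MP3 without any schema
// machinery.
const matches = products.filter(p => p.details.author === "David Eggers");
console.log(matches.map(p => p.sku)); // [ "00a9f3a" ]
```

This is why one catalog collection can hold books, music, and other product types side by side: queries filter on whatever fields a type happens to have.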
15. Content Management

News Site
• Comments and user generated content
• Personalization of content, layout

Multi-Device Rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages

Sharing
• Store large objects
• Simple modeling of metadata
16. Content Management

• Flexible data model for similar, but different objects
• Geospatial indexing for location based searches
• GridFS for large object storage
• Horizontal scalability for large data sets

{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
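On the geospatial point: MongoDB can index a `[longitude, latitude]` field and answer `$near` queries against it. The selection logic can be sketched with a plain distance check (a rough planar distance in degrees, for illustration only; a real deployment would use a geospatial index rather than scanning, and the radius value here is invented):

```javascript
const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] }, // San Francisco
  { camera: "Canon 5d mkII", people: ["Jim", "Carol"] }    // no location field
];

// Rough planar distance in degrees (fine for a toy "near" check).
function dist(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Toy version of find({ location: { $near: center } }) with a radius.
// Documents without a location field are simply skipped.
function near(docs, center, maxDeg) {
  return docs.filter(d => d.location && dist(d.location, center) <= maxDeg);
}

const sf = [-122.4194, 37.7749];
console.log(near(photos, sf, 1).map(p => p.camera)); // [ "Nikon d4" ]
```

Note how the sparse schema and the geo query compose: the second photo has people and no location, and the query handles that without any special casing.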
17. User Data Management

Video Games
• User state and session management

Social Graphs
• Scale out to large graphs
• Easy to search and process

Identity Management
• Authentication, Authorization, and Accounting
18. User Game State

• Easy to store entire player state in a single document
• Flexible documents support new game features without schema migration
• Sharding enables whole data set to be in memory, ensuring low latency
• JSON data model maps well to HTML5/JS & Flash based clients
19. Social Graphs
Native support for
Arrays makes it easy
to store connections
inside user profile
Sharding partitions
user profiles across Documents enable
Social Graph available servers disk locality of all
profile data for a user
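The array-of-connections idea can be sketched directly: the follower list lives inside the user's own document, so one read fetches a user's whole neighborhood, and traversal is a lookup per connection rather than a join. The user ids and the `follows` field name are invented for illustration:

```javascript
// Each profile embeds its connections as an array of user ids.
const profiles = {
  alice: { _id: "alice", follows: ["bob", "carol"] },
  bob:   { _id: "bob",   follows: ["carol", "dave"] },
  carol: { _id: "carol", follows: [] },
  dave:  { _id: "dave",  follows: [] }
};

// Friends-of-friends: one profile lookup per direct connection,
// excluding the user and anyone already followed directly.
function friendsOfFriends(id) {
  const direct = new Set(profiles[id].follows);
  const fof = new Set();
  for (const f of direct) {
    for (const ff of profiles[f].follows) {
      if (ff !== id && !direct.has(ff)) fof.add(ff);
    }
  }
  return [...fof];
}

console.log(friendsOfFriends("alice")); // [ "dave" ]
```

With sharding on the user id, each of those per-profile lookups is routed to a single server, which is what lets the graph scale out.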
21. Good fits for MongoDB

For each application characteristic, why MongoDB might be a good fit:

Large number of objects to store:
  Sharding lets you split objects across multiple servers.

High write or read throughput:
  Sharding plus replication lets you scale read and write traffic across multiple servers.

Low latency access:
  The memory-mapped storage engine caches documents in RAM, enabling in-memory performance. Data locality of documents can significantly improve latency over join-based approaches.

Variable data in objects:
  Dynamic schemas and the JSON data model enable flexible data storage without sparse tables or complex joins.

Cloud-based deployment:
  Sharding and replication let you work around hardware limitations in the cloud.