MongoDB does not work like other databases. Its document-oriented data model, range-based partitioning, and strong consistency make it well suited to some problems and less suited to others. In this webinar, we will look at real-world examples of MongoDB deployments that take advantage of these distinctive features. We will discuss specific customers who use MongoDB and how they implemented their solutions. We will also show you how to build a similar solution for your own business.
2. Part 1 of a series
Real-Time Analytics with MongoDB
April 12th
Content Management with MongoDB
May 17th
@forjared
3. Emerging NoSQL Space
The beginning: RDBMS
Last 10 years: RDBMS, Data Warehouse
Today: RDBMS, Data Warehouse, NoSQL
4. Qualities of NoSQL Workloads
Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development
High Throughput
• Lots of reads
• Lots of writes
Large Data Sizes
• Aggregate data size
• Number of objects
Low Latency
• Both reads and writes
• Millisecond latency
Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs
Commodity Hardware
• Ethernet
• Local disks
5. MongoDB was designed for this
Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development
High Throughput
• Lots of reads
• Lots of writes
Large Data Sizes
• Aggregate data size
• Number of objects
Low Latency
• Both reads and writes
• Millisecond latency
Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs
Commodity Hardware
• Ethernet
• Local disks
• JSON based object model
• Dynamic schemas
• Replica Sets to scale reads
• Sharding to scale writes
• 1000's of shards in a single DB
• Partitioning of data
• In-memory cache
• Scale-out working set
• Scale-out to overcome hardware limitations
• Designed for "typical" OS and local file system
6. Example customers
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Product Data Management
8. High Volume Data Feeds
Machine Generated Data
• More machines, more sensors, more data
• Variably structured
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes their format constantly
9. High Volume Data Feed
Multiple data sources feed the cluster.
• Asynchronous writes
• Flexible document model can adapt to changes in sensor format
• Write to memory with periodic disk flush
• Scale writes over multiple shards
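The ingest pattern on this slide can be sketched roughly as follows. This is a minimal illustration, not the setup any customer used: `toDocument`, `toBatches`, the `readings` collection, and the sample readings are all hypothetical. The option object assumes that an unacknowledged write concern (`w: 0`) is the closest current equivalent of the "asynchronous writes" the slide mentions.

```javascript
// Hypothetical helpers; only `ordered` and `writeConcern` are MongoDB option names.

// Wrap a raw reading in a document without forcing a fixed schema on the payload.
function toDocument(sourceId, reading, receivedAt) {
  return { source: sourceId, received_at: receivedAt, payload: reading };
}

// Split documents into fixed-size batches for bulk inserts.
function toBatches(docs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Unordered, unacknowledged writes trade delivery guarantees for throughput.
const WRITE_OPTIONS = { ordered: false, writeConcern: { w: 0 } };

// Made-up readings in three different shapes.
const readings = [
  { temp_c: 21.5 },
  { temp_c: 21.7, humidity: 0.41 },          // newer format with an extra field
  { temperature: { value: 71, unit: "F" } }, // another vendor's format
];

const docs = readings.map((r, i) => toDocument("sensor-" + i, r, new Date(0)));
const batches = toBatches(docs, 2);

// With a live connection, each batch could be sent from the shell as:
//   db.readings.insertMany(batch, WRITE_OPTIONS)
```

Because the payload is stored as-is, a change in a sensor's format needs no migration; only the code that reads the documents has to handle the new shape.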
10. Operational Intelligence
Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Customer Facing Dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Social Media Monitoring
• Need to join the conversation _now_
11. Operational Intelligence
Dashboards and API read from the cluster.
• Low latency reads
• Parallelize queries across replicas and shards
• In database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store, and report on data
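As a hedged illustration of "in database aggregation", the sketch below builds an aggregation pipeline that counts clicks per campaign for a dashboard. The `events` collection and its `type`, `campaign`, and `ts` fields are assumptions made for this example; the stage operators (`$match`, `$group`, `$sort`, `$limit`) are standard pipeline stages.

```javascript
// Build a pipeline that reports the ten campaigns with the most clicks since a
// given time. Collection and field names are hypothetical.
function clicksPerCampaign(since) {
  return [
    { $match: { type: "click", ts: { $gte: since } } },    // filter first to cut the working set
    { $group: { _id: "$campaign", clicks: { $sum: 1 } } }, // count clicks per campaign
    { $sort: { clicks: -1 } },                             // busiest campaigns first
    { $limit: 10 },
  ];
}

const pipeline = clicksPerCampaign(new Date("2012-04-01T00:00:00Z"));

// With a live connection, the report could be produced from the shell as:
//   db.events.aggregate(pipeline)
```

Running the grouping inside the database means only the ten summary rows travel to the dashboard, not the raw events.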
12. Behavioral Profiles
Steps: (1) See Ad, (2) See Ad, (3) Click, (4) Convert
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop",
          sku: "asdf23f",
          time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Indexing and querying to support matching, frequency capping
• Dynamic schemas make it easy to track vendor specific attributes
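Frequency capping over a profile shaped like the one on this slide might look like the sketch below. The helper names (`impressionCount`, `underCap`, `recordAction`) and the `profiles` collection are hypothetical; `$push` and `upsert` are standard MongoDB update features. The check runs in application code on a fetched profile, which is only one of several places such logic could live.

```javascript
// Profile document copied from the slide.
const profile = {
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 },
      ],
    },
  },
};

// Count how often an ad was shown to this profile at or after `sinceTime`.
function impressionCount(profile, advertiser, adId, sinceTime) {
  const entry = profile.advertiser && profile.advertiser[advertiser];
  const actions = (entry && entry.actions) || [];
  return actions.filter((a) => a.impression === adId && a.time >= sinceTime).length;
}

// True while the ad has been shown fewer than `cap` times in the window.
function underCap(profile, advertiser, adId, sinceTime, cap) {
  return impressionCount(profile, advertiser, adId, sinceTime) < cap;
}

// Build the arguments for appending one action to a profile, creating the
// profile if this cookie has not been seen before.
function recordAction(cookieId, advertiser, action) {
  return {
    filter: { cookie_id: cookieId },
    update: { $push: { ["advertiser." + advertiser + ".actions"]: action } },
    options: { upsert: true },
  };
}

const op = recordAction("1234512413243", "apple", { impression: "ad2", time: 400 });
// With a live connection: db.profiles.updateOne(op.filter, op.update, op.options)
```

Since every action for a cookie lives in one document, both the cap check and the append touch a single record.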
13. Product Data
E-Commerce Product Catalog
• Diverse product portfolio
• Complex querying and filtering
Flash Sales
• Scale for short bursts of high volume traffic
• Scalable, but consistent view of inventory
14. Product Data
{ sku: "00e8da9b",
  type: "MP3",
  details: {
    artist: "John Coltrane",
    title: "A love supreme",
    length: 123
  }
}
{ sku: "00a9f3a",
  type: "Book",
  details: {
    author: "David Eggers",
    title: "You shall know our velocity",
    isbn: "0-9703355-5-5"
  }
}
• Flexible data model for similar, but different objects
• Indexing and rich query API for easy searching and sorting
db.products.
  find({ "details.author": "David Eggers" }).
  sort({ "details.title": -1 });
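The query on this slide uses dot notation to reach fields inside the embedded `details` document. The sketch below mimics that matching and sorting in memory, purely to show what the dotted paths mean; it is not how the server executes a query. `getPath` and `findAndSort` are hypothetical helpers, and the third product is a made-up document added so the sort has something to order.

```javascript
// Resolve a dotted path such as "details.author" against a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// In-memory stand-in for find({field: value}).sort({sortField: direction}).
function findAndSort(docs, field, value, sortField, direction) {
  return docs
    .filter((d) => getPath(d, field) === value)
    .sort((a, b) => {
      const x = getPath(a, sortField);
      const y = getPath(b, sortField);
      return direction * (x < y ? -1 : x > y ? 1 : 0);
    });
}

const products = [
  { sku: "00e8da9b", type: "MP3",
    details: { artist: "John Coltrane", title: "A love supreme", length: 123 } },
  { sku: "00a9f3a", type: "Book",
    details: { author: "David Eggers", title: "You shall know our velocity",
               isbn: "0-9703355-5-5" } },
  // Hypothetical extra document, for illustration only.
  { sku: "hypothetical-1", type: "Book",
    details: { author: "David Eggers", title: "Zzz placeholder title" } },
];

const result = findAndSort(products, "details.author", "David Eggers", "details.title", -1);

// An index covering both the filter and the sort could be declared as:
const indexSpec = { "details.author": 1, "details.title": -1 };
// With a live connection: db.products.createIndex(indexSpec)
```

The MP3 document has no `details.author` field, so it simply fails the match; no sparse column or separate table is involved.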
15. Content Management
News Site
• Comments and user generated content
• Personalization of content, layout
Multi-Device Rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata
16. Content Management
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
• Flexible data model for similar, but different objects
• Horizontal scalability for large data sets
• Geo spatial indexing for location based searches
• GridFS for large object storage
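A location-based search over documents like the first photo above could be expressed roughly as follows. This sketch assumes the `2dsphere` index type and GeoJSON `$near` syntax found in current MongoDB releases, which may differ from what was available when this deck was written. `nearQuery` and the `photos` collection are hypothetical names.

```javascript
// Build a "photos within maxMeters of this point" filter.
// Coordinates are in [longitude, latitude] order, as in the slide's document.
function nearQuery(lng, lat, maxMeters) {
  if (lng < -180 || lng > 180 || lat < -90 || lat > 90) {
    throw new RangeError("coordinates out of range");
  }
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [lng, lat] },
        $maxDistance: maxMeters,
      },
    },
  };
}

// Index declaration that a $near query of this form relies on.
const geoIndex = { location: "2dsphere" };

const query = nearQuery(-122.418333, 37.775, 5000);

// With a live connection:
//   db.photos.createIndex(geoIndex)
//   db.photos.find(query)
```

Documents without a `location` field, such as the other two photos on the slide, are simply not returned by the search.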
17. User Data Management
Video Games
• User state and session management
Social Graphs
• Scale out to large graphs
• Easy to search and process
Identity Management
• Authentication, Authorization and Accounting
18. User Game State
• Flexible documents support new game features without schema migration
• Sharding enables whole data set to be in memory, ensuring low latency
• JSON data model maps well to HTML5/JS & Flash based clients
• Easy to store entire player state in a single document
19. Social Graph
• Documents enable disk locality of all profile data for a user
• Sharding partitions user profiles across available servers
• Native support for Arrays makes it easy to store connections inside user profile
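To make the partitioning idea concrete, here is a simplified sketch of range-based routing: each chunk owns a key range and maps to one shard. It illustrates the concept only and is not MongoDB's actual routing code. The chunk boundaries, shard names, user names, and the `shardFor` and `addConnection` helpers are all made up; `$addToSet` is a standard update operator.

```javascript
// Each chunk covers the key range [min, max); max === null means unbounded.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: null, shard: "shard2" },
];

// Find the shard whose chunk contains the given shard-key value.
function shardFor(key, chunks) {
  const chunk = chunks.find((c) => key >= c.min && (c.max === null || key < c.max));
  if (!chunk) throw new Error("no chunk covers key: " + key);
  return chunk.shard;
}

// Build an update that stores a connection inside the user's own profile.
// $addToSet keeps the array free of duplicates.
function addConnection(userId, friendId) {
  return {
    filter: { _id: userId },
    update: { $addToSet: { connections: friendId } },
  };
}

const target = shardFor("jared", chunks);
const op = addConnection("jared", "alice");
// With a live connection: db.users.updateOne(op.filter, op.update)
```

Because connections are stored in the profile document, following a user touches only the one shard that owns that profile.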
21. Good fits for MongoDB
Application characteristic: why MongoDB might be a good fit
• Large number of objects to store: Sharding lets you split objects across multiple servers
• High write or read throughput: Sharding + Replication lets you scale read and write traffic across multiple servers
• Low Latency Access: Memory Mapped storage engine caches documents in RAM, enabling in-memory performance. Data locality of documents can significantly improve latency over join based approaches
• Variable data in objects: Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins
• Cloud based deployment: Sharding and replication let you work around hardware limitations in clouds
In the beginning, there was RDBMS, and if you needed to store data, that was what you used. But RDBMS is performance critical, and BI workloads tended to suck up system resources. So we carved off the data warehouse as a place to store a copy of the operational data for use in analytical queries. This offloaded work from the RDBMS and bought us cycles to scale higher. Today, we’re seeing another split. There’s a new set of workloads that are saturating RDBMS, and these are being carved off into yet another tier of our data architecture: the NoSQL store.
These are some of the qualities of workloads that necessitate a move to NoSQL. Each of these qualities is difficult to achieve in an RDBMS, but is well addressed by NoSQL data stores.