MongoDB's flexible schema makes it a great fit for your next content management application as its data model makes it easy to catalog multiple content types with diverse meta data. In this session, we'll review schema design for content management, using GridFS for storing binary files, and how you can leverage MongoDB's auto-sharding to partition your content across multiple servers.
2. Agenda
• MongoDB Features and Overview
• Sample Content Management System (CMS)
Application
• Schema Design Considerations
• Building Feeds and Querying Data
• Replication, Failover, and Scaling
• Case Studies
• Further Resources
3. MongoDB Features
• JSON Document Model with
Dynamic Schemas
• Auto-Sharding for Horizontal
Scalability
• Text Search
• Aggregation Framework and
MapReduce
• Full, Flexible Index Support
and Rich Queries
• Built-In Replication for High
Availability
• Advanced Security
• Large Media Storage with
GridFS
5. CMS Application Overview
• Business news service
• Hundreds of stories per day
• Millions of website visitors per month
• Comments
• Related stories
• Tags
• Company profiles
10. Sample Relational DB Structure
story
id
headline
copy
authorid
slug
…
author
id
first_name
last_name
title
…
tag
id
name
…
comment
Id
storyid
name
Email
comment_text
…
related_story
id
storyid
related_storyid
…
link_story_tag
Id
storyid
tagid
…
11. Sample Relational DB Structure
• Number of queries per page load?
• Caching layers add complexity
• Tables may grow to millions of rows
• Joins will become slower over time as db
increases in size
• Schema changes
• Scaling database to handle more reads
12. MongoDB Schema Design
• “Dynamic Schema”, however, schema design is
important
• JSON documents
• Design for the use case and work backwards
• Avoid a relational model in MongoDB
• No joins or transactions, most related information
should be contained in the same document
• Atomic updates on documents, equivalent of
transaction
21. // Display a story, related stories, and first page of comments
db.story.find( { url: “apple-reports-second-quarter-earnings” });
// Display a story, related stories, and second page of comments
db.story.find( { url: “apple-reports-second-quarter-earnings” });
db.comment.find( { story_id : 1234 }).limit(25).skip(25).sort({ date
: -1 });
// All Stories for a given tag
db.story.find( { tags: “Earnings” });
Querying MongoDB
22. // Display data for a company, latest stories
db.company.find( { url: “apple-inc” });
// Display data for a company, all stories
db.company.find( { url: “apple-inc” });
db.story.find( { company_id : 1234 });
Querying MongoDB
23. // Inserting new stories are easy, just submit JSON document
db.story.insert( { headline: “Apple Reports Revenue”... });
// Adding story tags
db.story.update( { _id : 375 }, { $addToSet : { tags : "AAPL" } }
)
// Adding a comment (if embedding comments in story)
db.story.update( { _id : 375 }, { $push: { comments: { name:
„Jason‟, „comment: „Great Story‟} } } )
Inserting and Updating
Stories
24. // Index on story slug
db.story.ensureIndex( { url: 1 });
// Index on story tags
db.story.ensureIndex( { tags: 1 });
MongoDB Indexes for CMS
26. // Very simple to gather specific information for a feed
db.story.find( { tags: { $in : [“Earnings”, “AAPL”] } }).sort(
{ date : -1 });
Query Tags and Sort by Date
28. Replication
• Extremely easy to set up
• Replica node can trail primary node and
maintain a copy of the primary database
• Useful for disaster
recovery, failover, backups, and specific
workloads such as analytics
• When Primary goes down, a Secondary will
automatically become the new Primary
31. Scaling Horizontally
• Important to keep working data set in RAM
• When working data set exceeds RAM, easy to
add additional machines and segment data
across machines (sharding)
34. Runs unified data store serving hundreds of
diverse web properties on MongoDB
Case Study
Problem Why MongoDB Results
• Hundreds of diverse
web properties built on
Java-based CMS
• Rich documents forced
into ill-suited model
• Adding new data
types, tables to RDBMS
killed read performance
• Flexible schema
• Rich querying and support
for secondary index
support
• Easy to manage
replication and scaling
• Developers can focus on
end-user features instead
of back-end storage
• Simplified day-to-day
operations
• Simple to add new
brands, content types, etc.
to platform
35. Serves targeted content to users using MongoDB-
powered identity system
Case Study
Problem Why MongoDB Results
• 20M+ unique visitors
per month
• Rigid relational schema
unable to evolve with
changing data types
and new features
• Slow development
cycles
• Easy-to-manage
dynamic data model
enables limitless
growth, interactive
content
• Support for ad hoc
queries
• Highly extensible
• Rapid rollout of new
features
• Customized, social
conversations
throughout site
• Tracks user data to
increase
engagement, revenue
36. Powers content-serving web platform on MongoDB
to deliver dynamic data to users
Case Study
Problem Why MongoDB Results
• Static web content
• Siloed data
stores, disparate
technologies
• Unable to aggregate
and integrate data for
dynamic content
• Support for agile
development
• Easy to use and
maintain
• Low subscription and
HW costs
• Ability to serve dynamic
content
• Decreased TCO
• Replaced multiple
technologies with single
MongoDB database
37. Resource Location
MongoDB Downloads 10gen.com/download
Free Online Training education.10gen.com
Webinars and Events 10gen.com/events
White Papers 10gen.com/white-papers
Case Studies 10gen.com/customers
Presentations 10gen.com/presentations
Documentation docs.mongodb.org
Additional Info info@10gen.com
For More Information
Resource Location
MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queriesIndexes: secondary, compound, text search (with MongoDB 2.4), geospatial, and more