In this talk we will focus on several reasons why developers have come to love the richness, flexibility, and ease of use that MongoDB provides. First we will give a brief introduction to MongoDB, comparing and contrasting it with the traditional relational database. Next, we’ll give an overview of the APIs and tools that are part of the MongoDB ecosystem. Then we’ll look at how MongoDB CRUD (Create, Read, Update, Delete) operations work, and explore query, update, and projection operators. Finally, we will discuss MongoDB indexes and look at some examples of how they are used.
2. Agenda
• A bit of history…
• Introducing MongoDB
• MongoDB CRUD Operations
• Working with Indexes in MongoDB
• Aggregation Framework
• MongoDB Ecosystem
4. RDBMS Strengths
• Data stored is very compact
• Rigid schemas have led to powerful query capabilities
• Data is optimized for joins and storage
• Robust ecosystem of tools, libraries, integrations
• 40 years old!
9. Enter “Big Data”
• Gartner defines it with 3Vs
• Volume
– Vast amounts of data being collected
• Variety
– Evolving data
– Uncontrolled formats, no single schema
– Unknown at design time
• Velocity
– Inbound data speed
– Fast read/write operations
– Low latency
10. Mapping Big Data to RDBMS
• Difficult to store uncontrolled data formats
• Scaling via big iron or custom data marts/partitioning schemes
• Schema must be known at design time
• Impedance mismatch with agile development and deployment techniques
• Doesn’t map well to native language constructs
12. Goals
• Scale horizontally over commodity systems
• Incorporate what works for RDBMSs
– Rich data models, ad-hoc queries, full indexes
• Drop what doesn’t work well
– Complex schemas, multi-row transactions, complex joins
• Do not homogenize APIs
• Match agile development and deployment workflows
13. Key Features
• Data represented as documents (JSON)
– Flexible schema
– Storage/wire format is BSON
• Full CRUD support (Create, Read, Update, Delete)
– Atomic in-place updates
– Ad-hoc queries: Equality, RegEx, Ranges, Geospatial, Text
• Secondary indexes
• Replication – redundancy, failover
• Sharding – partitioning for read/write scalability
16. MongoDB is full featured
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
• Find Paul’s cars
• Find everybody who owns a car built between 1970 and 1980
Geospatial
• Find all of the car owners in London
Text Search
• Find all the cars described as having leather seats
Aggregation
• What’s the average value of Paul’s car collection?
Map Reduce
• What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
21. Create – insert()
> use blog
> var post = {
  author: "markh",
  date: new Date(),
  title: "My First Blog Post",
  body: "MongoDB is an open source document-oriented database system developed and supported by 10gen.",
  tags: ["MongoDB"]
}
> db.posts.insert(post)
22. Upsert
> var post = {
  "_id" : 1,
  "author" : "markh",
  "title" : "MetLife builds innovative customer service application using MongoDB",
  "body" : "MetLife built a working prototype in two weeks and was live in U.S. call centers in 90 days.",
  "date" : ISODate("2013-05-07T00:00:00.000Z"),
  "tags" : ["MongoDB", "Database", "Big Data"]
}
> db.posts.update({ _id:1 }, post, { upsert : true })
// upsert option with <query> argument on _id -- same as save()
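The upsert rule on this slide (replace the document whose _id matches, otherwise insert) can be modelled in a few lines. This is a minimal in-memory sketch; upsertById and postsStore are hypothetical names, and a real server also handles non-_id queries, update operators, and generated _id values.

```javascript
// Hypothetical model of update({ _id: X }, doc, { upsert: true }).
function upsertById(collection, doc) {
  const i = collection.findIndex(d => d._id === doc._id);
  if (i >= 0) {
    collection[i] = doc;   // match found: whole-document replacement
    return "updated";
  }
  collection.push(doc);    // no match: insert as a new document
  return "inserted";
}

const postsStore = [];
console.log(upsertById(postsStore, { _id: 1, title: "First" }));  // inserted
console.log(upsertById(postsStore, { _id: 1, title: "Second" })); // updated
console.log(postsStore.length, postsStore[0].title);              // 1 Second
```

Running the same call twice is therefore safe: the second call overwrites instead of creating a duplicate, which is why the slide compares it to save().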
23. Read – findOne()
> db.posts.findOne()
{
  "_id" : ObjectId("517ed472e14b748a44dc0549"),
  "author" : "markh",
  "date" : ISODate("2013-05-29T20:13:37.349Z"),
  "title" : "My First Blog Post",
  "body" : "MongoDB is an open source document-oriented database system developed and supported by 10gen.",
  "tags" : ["MongoDB"]
}
// _id must be unique within the collection and can be any non-array value;
// if you omit it, an ObjectId is generated for you
24. Read – findOne()
> db.posts.findOne({author:"markh"})
{
  "_id" : ObjectId("517ed472e14b748a44dc0549"),
  "author" : "markh",
  "date" : ISODate("2013-05-29T20:13:37.349Z"),
  "title" : "My First Blog Post",
  "body" : "MongoDB is an open source document-oriented database system developed and supported by 10gen.",
  "tags" : ["MongoDB"]
}
25. Read – find()
> db.posts.find({author:"markh"})
{
  "_id" : ObjectId("517ed472e14b748a44dc0549"),
  "author" : "markh",
  "date" : ISODate("2013-05-29T20:13:37.349Z"),
  "title" : "My First Blog Post",
  "body" : "MongoDB is an open source document-oriented database system developed and supported by 10gen.",
  "tags" : ["MongoDB"]
}
…
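The difference between the two read slides is that find() returns a cursor over every match, while findOne() returns only the first match (or null). The Node.js sketch below is an analogy under simplifying assumptions, not the driver API: a generator plays the role of the cursor, matching is equality-only on top-level fields, and samplePosts is invented data.

```javascript
// Analogy only: a generator stands in for a lazy cursor over matching documents.
function* find(collection, query) {
  for (const doc of collection) {
    // Equality-only matching on top-level fields (a simplification).
    if (Object.entries(query).every(([k, v]) => doc[k] === v)) yield doc;
  }
}

// First match or null, pulled from the same "cursor".
function findOne(collection, query) {
  const { value, done } = find(collection, query).next();
  return done ? null : value;
}

const samplePosts = [
  { _id: 1, author: "markh", title: "My First Blog Post" },
  { _id: 2, author: "markh", title: "MongoDB is the #1 NoSQL Database" },
  { _id: 3, author: "someone_else", title: "Hello" },
];

console.log(findOne(samplePosts, { author: "markh" }).title);    // My First Blog Post
console.log([...find(samplePosts, { author: "markh" })].length); // 2
console.log(findOne(samplePosts, { author: "nobody" }));         // null
```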
30. Update
> var post = {
  author: "markh",
  date : ISODate("2013-05-29T20:13:37.349Z"),
  title: "MongoDB is the #1 NoSQL Database",
  body: "MongoDB is an open source document-oriented database system developed and supported by 10gen.",
  tags: ["MongoDB"]
}
> db.posts.update(
  { _id:ObjectId("517ed472e14b748a44dc0549") },
  post
)
// passing a plain document replaces the whole stored document; only _id is kept
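Passing a plain document to update(), as on this slide, replaces everything except _id. The speaker notes mention targeted modifications; in MongoDB those are update operators such as $set, e.g. db.posts.update({ _id: … }, { $set: { title: "…" } }). The sketch below contrasts the two on an in-memory object; replaceDoc and setFields are hypothetical helpers, and only top-level fields are modelled.

```javascript
// Like update(query, newDoc): every field except _id is replaced.
function replaceDoc(oldDoc, newDoc) {
  const { _id, ...rest } = newDoc;   // ignore any _id in the replacement
  return { _id: oldDoc._id, ...rest };
}

// Like update(query, { $set: fields }): only the named top-level fields change.
function setFields(oldDoc, fields) {
  return { ...oldDoc, ...fields };
}

const stored = { _id: 1, author: "markh", title: "My First Blog Post", tags: ["MongoDB"] };

const replaced = replaceDoc(stored, { title: "MongoDB is the #1 NoSQL Database" });
const modified = setFields(stored, { title: "MongoDB is the #1 NoSQL Database" });

console.log(Object.keys(replaced)); // [ '_id', 'title' ]  (author and tags are gone)
console.log(Object.keys(modified)); // [ '_id', 'author', 'title', 'tags' ]
```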
36. Indexes are the single biggest tunable performance factor in MongoDB
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
37. Indexing a single value
// Every collection has a default unique index on _id
// Create an ascending index on "author"
> db.posts.ensureIndex({author:1})
> db.posts.find({author:"markh"})
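To show why an index on author speeds up find({author:"markh"}), here is a simplified stand-in: MongoDB indexes are B-trees, but a sorted array of (key, position) pairs searched with binary search illustrates the same idea of locating matches without scanning every document. All names and data are hypothetical. (On MongoDB 3.0 and later, the slide's ensureIndex() call is spelled createIndex().)

```javascript
// Build a sorted list of { key, pos } entries for one field (like { author: 1 }).
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.pos - b.pos));
}

// Return positions of documents whose indexed field equals value.
function lookup(index, value) {
  let lo = 0, hi = index.length;
  while (lo < hi) {                       // lower-bound binary search: O(log n)
    const mid = (lo + hi) >> 1;
    if (index[mid].key < value) lo = mid + 1; else hi = mid;
  }
  const hits = [];
  for (let i = lo; i < index.length && index[i].key === value; i++) hits.push(index[i].pos);
  return hits;
}

const indexedPosts = [
  { author: "markh" }, { author: "alice" }, { author: "markh" }, { author: "bob" },
];
const authorIndex = buildIndex(indexedPosts, "author");
console.log(lookup(authorIndex, "markh")); // [ 0, 2 ]
console.log(lookup(authorIndex, "zoe"));   // []
```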
45. Uniqueness Constraints
// username in users collection must be unique
db.users.ensureIndex( { username: 1 }, { unique: true } )
46. Sparse Indexes
// Only documents with comments.userid will be indexed
db.posts.ensureIndex(
  { "comments.userid": 1 },
  { sparse: true }
)
// Allow multiple documents that lack a sku field
db.products.ensureIndex( {sku: 1}, {unique: true, sparse: true} )
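Why the sku index on this slide needs sparse: true becomes clear with a small model. In a non-sparse unique index, a document missing the field is indexed under the key null, so a second such document is a duplicate; a sparse index simply leaves those documents out. The sketch below is a hypothetical in-memory check (checkUniqueIndex and the products data are invented), not MongoDB's implementation.

```javascript
// Model building a unique index on `field`; report the first duplicate key, if any.
function checkUniqueIndex(docs, field, { sparse = false } = {}) {
  const seen = new Set();
  for (const doc of docs) {
    const hasField = field in doc;
    if (!hasField && sparse) continue;        // sparse: no index entry at all
    const key = hasField ? doc[field] : null; // non-sparse: missing field -> null
    if (seen.has(key)) return { ok: false, duplicate: key };
    seen.add(key);
  }
  return { ok: true };
}

const products = [{ sku: "A1" }, { name: "gift card" }, { name: "voucher" }];
console.log(checkUniqueIndex(products, "sku"));                   // { ok: false, duplicate: null }
console.log(checkUniqueIndex(products, "sku", { sparse: true })); // { ok: true }
```

A document with an explicit sku: null still gets an entry in this model, as it would in a sparse index, because the field exists.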
48. Pipeline
• Process a stream of documents
– Original input is a collection
– Final output is a result document
• Series of operators
– Filter or transform data
– Input/output chain
ps ax | grep mongod | head -n 1
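The pipe analogy can be made concrete: each stage takes a list of documents and hands its output to the next. The Node.js sketch below models a few stages as plain functions; the stage helpers and the orders data are hypothetical, and the comment shows a roughly equivalent aggregate() pipeline using the real $match, $group, $sort, and $limit stages.

```javascript
// Each stage: array of documents in, new array out.
const match = pred => docs => docs.filter(pred);
const group = (keyFn, valFn) => docs => {
  const sums = new Map();
  for (const d of docs) {
    const k = keyFn(d);
    sums.set(k, (sums.get(k) || 0) + valFn(d));
  }
  return [...sums].map(([_id, total]) => ({ _id, total }));
};
const sort = cmp => docs => [...docs].sort(cmp);
const limit = n => docs => docs.slice(0, n);

// Feed the output of each stage into the next, like a shell pipe.
function runPipeline(docs, stages) {
  return stages.reduce((input, stage) => stage(input), docs);
}

const orders = [
  { cust: "a", amount: 10, status: "A" },
  { cust: "b", amount: 25, status: "A" },
  { cust: "a", amount: 30, status: "A" },
  { cust: "c", amount: 99, status: "D" },
];

// Roughly: [{ $match: { status: "A" } }, { $group: { _id: "$cust", total: { $sum: "$amount" } } },
//           { $sort: { total: -1 } }, { $limit: 1 }]
const result = runPipeline(orders, [
  match(d => d.status === "A"),
  group(d => d.cust, d => d.amount),
  sort((x, y) => y.total - x.total),
  limit(1),
]);
console.log(result); // [ { _id: 'a', total: 40 } ]
```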
72. Conclusion
• MongoDB is a full-featured, general purpose database
• Flexible document data model provides
– Greater flexibility
– Greater agility
• MongoDB is built for "Big Data"
• Healthy, strong, and growing ecosystem / community
Magento ER Diagram: http://www.magento-exchange.com/magento-database/magento-1-4-database-er-diagram/
BSON types: string, integer (32- or 64-bit), double (64-bit IEEE 754 floating point number), date (integer number of milliseconds since the Unix epoch), byte array (binary data), boolean (true and false), null
These are called targeted modifications
So it’s imperative we understand them
Indexes can be costly if you have too many (every index must be updated on each write), so...
GeoJSON is an open format (based on the JSON standard) for encoding a variety of geographic data structures. MongoDB supports the following GeoJSON objects: Point, LineString, Polygon.
$geoWithin queries for GeoJSON objects bounded by a polygon. The $geoIntersects operator queries for locations that intersect a specified GeoJSON object.
unique applies a uniqueness constraint: the index rejects duplicate values. dropDups will force the server to create a unique index by keeping only the first document found in natural order with a given value and dropping all other documents with that value. dropDups will likely result in data loss!
MongoDB doesn't enforce a schema – documents are not required to have the same fields. Sparse indexes only contain entries for documents that have the indexed field. Without sparse, documents without field 'a' have a null entry in the index for that field. With sparse, a unique constraint can be applied to a field not shared by all documents; otherwise, multiple null values violate the unique constraint.
MongoDB uses BSON as the data storage and network transfer format for “documents”.