Intro to MongoDB
Get a jumpstart on MongoDB, use cases, and next steps for building your first app with Buzz Moschetti, MongoDB Enterprise Architect.
@BuzzMoschetti
2. Who is Talking To You?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorgan Chase and Bear Stearns
• Over 30 years of designing and building systems
• Big and small
• Super-specialized to broadly useful in any vertical
• “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of Perl DBI/DBD
• Not an award winner for PowerPoint
• Still programming – using emacs, of course
3. Agenda
• What is MongoDB?
• What are some good use cases?
• How do I use it?
• How do I deploy it?
4. MongoDB: The Leading NoSQL Database
• Document Data Model
• Open-Source
• Fully Featured
• High Performance
• Scalable
{
  name: "John Smith",
  pfxs: ["Dr.", "Mr."],
  address: "10 3rd St.",
  phones: [
    { number: "555-1212", type: "land" },
    { number: "444-1212", type: "mobile" }
  ]
}
5. MongoDB Enterprise Edition
The best way to run MongoDB: Automated. Supported. Secured.
Features beyond those in the Community edition:
• Enterprise-Grade Support
• Commercial License
• Ops Manager or Cloud Manager Premium
• Encrypted & In-Memory Storage Engines
• MongoDB Compass
• BI Connector (SQL Bridge)
• Advanced Security
• Platform Certification
• On-Demand Training
6. Company Vital Stats
• 500+ employees
• 2000+ customers
• Over $311 million in funding
• Offices in NY and Palo Alto and across EMEA and APAC
14. Agenda
• What is MongoDB?
• What are some good use cases?
• How do I use it?
• How do I deploy it?
15. MongoDB 3.0 Set The Stage…
7x-10x Performance, 50%-80% Less Storage
How: WiredTiger Storage Engine
• Same data model, query language, & ops
• 100% backwards compatible API
• Non-disruptive upgrade
• Storage savings driven by native compression
• Write performance gains driven by
  – Document-level concurrency control
  – More efficient use of HW threads
• Much better ability to scale vertically
[Chart: Performance, MongoDB 2.6 vs. MongoDB 3.0]
16. MongoDB 3.2: Efficient Enterprise MongoDB
• Much better ability to scale vertically
+
• Document Validation Rules
• Encryption at rest
• BI Connector (SQL bridge)
• MongoDB Compass
• New Relic & AppDynamics integration
• Backup snapshots on filesystem
• Advanced Full-text languages
• $lookup (“left outer JOIN”)
More general-purpose solutions
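The feature list above describes $lookup as a "left outer JOIN". As a rough mental model only (plain JavaScript over in-memory arrays, not how the server implements it), a stage of the form { $lookup: { from, localField, foreignField, as } } attaches to every input document an array of the foreign documents whose foreignField equals its localField, and an empty array when nothing matches. The `lookup` helper and the `leases`/`products` sample data are hypothetical.

```javascript
// Hypothetical helper mimicking the output shape of a $lookup stage.
// Simplification: top-level fields, strict equality, and none of the
// server's special handling for missing, null, or array-valued fields.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    // Left outer join: every local doc survives with zero or more matches.
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Made-up sample data for illustration.
const leases = [
  { leaseID: "A5", sku: "GD652" },
  { leaseID: "B7", sku: "ZZ999" }, // no matching product
];
const products = [{ sku: "GD652", desc: "widget" }];

const joined = lookup(leases, products, "sku", "sku", "product");
console.log(JSON.stringify(joined));
```

Unlike an SQL left outer join, which repeats the left row once per match (or pads with NULLs), the matches arrive nested in a single array field, which is why the deck calls it "Left Outer Join++".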
17. MongoDB Sweet Spot Use Cases
Use cases:
• Big Data
• Product & Asset Catalogs
• Security & Fraud
• Internet of Things
• Database-as-a-Service
• Mobile Apps
• Customer Data Management
• Single View
• Social & Collaboration
• Content Management
• Complex Data Management
• Embedded / ISV
Example customers: Intelligence Agencies; Top Investment and Retail Banks; Top Global Shipping Company; Top Industrial Equipment Manufacturer; Top Media Company; Cushman & Wakefield
18. Agenda
• What is MongoDB?
• What are some good use cases?
• How do I use it?
• How do I deploy it?
20. Unpack and Start the Server
$ tar xf mongodb-osx-x86_64-enterprise-3.2.0.tgz
$ mkdir -p ~/mydb/data
$ mongodb-osx-x86_64-enterprise-3.2.0/bin/mongod \
    --dbpath ~/mydb/data \
    --logpath ~/mydb/mongod.log \
    --fork
about to fork child process, waiting until server is ready for connections.
forked process: 6517
child process started successfully, parent exiting
21. Verify Operation
$ mongodb-osx-x86_64-enterprise-3.2.0/bin/mongo
MongoDB shell version: 3.2.0
connecting to: 127.0.0.1:27017/test
Server has startup warnings:
2016-01-01T12:44:01.646-0500 I CONTROL [initandlisten]
2016-01-01T12:44:01.646-0500 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
MongoDB Enterprise > use mug
switched to db mug
MongoDB Enterprise > db.foo.insert({name:"bob", hd: new ISODate()});
MongoDB Enterprise > db.foo.insert({name:"buzz"});
MongoDB Enterprise > db.foo.insert({pets:["dog","cat"]});
MongoDB Enterprise > db.foo.find();
{ "_id" : ObjectId("5686cef538ea4981e63111dd"), "name" : "bob", "hd" : ISODate("2016-01-01T19:09:41.442Z") }
{ "_id" : ObjectId("5686…79d5"), "name" : "buzz" }
{ "_id" : ObjectId("5686…79d6"), "pets" : [ "dog", "cat" ] }
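22. The Simple Java App
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class mug1 {
    public static void main(String[] args) {
        // No-arg constructor connects to localhost:27017 (3.x driver API)
        MongoClient mongoClient = new MongoClient();
        try {
            MongoDatabase db = mongoClient.getDatabase("mug");
            MongoCollection<Document> coll = db.getCollection("foo");
            // Iterate over every document in mug.foo; Document implements Map<String, Object>
            try (MongoCursor<Document> c = coll.find().iterator()) {
                while (c.hasNext()) {
                    Document doc = c.next();
                    System.out.println(doc);
                }
            }
        } catch (Exception e) {
            // ...
        } finally {
            mongoClient.close();
        }
    }
}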
28. A Slightly Bigger Example
Relational:
Customer ID | First Name | Last Name | City
0           | John       | Doe       | New York
1           | Mark       | Smith     | San Francisco
2           | Jay        | White     | Dallas
3           | Meagan     | White     | London
4           | Edward     | Daniels   | Boston

Phone Number   | Type | DNC    | Customer ID
1-212-555-1212 | home | T      | 0
1-212-555-1213 | home | T      | 0
1-212-555-1214 | cell | F      | 0
1-212-777-1212 | home | T      | 1
1-212-777-1213 | cell | (null) | 1
1-212-888-1212 | home | F      | 2

MongoDB:
{ vers: 1,
  customer_id : 1,
  name : {
    "f" : "Mark",
    "l" : "Smith" },
  city : "San Francisco",
  phones: [ {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }]
}
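To make the relational-to-document mapping above concrete, here is a small sketch (plain JavaScript, no database) that folds the two tables into one document per customer, embedding each customer's phone rows as an array. The field names mirror the slide; the `toDocument` helper and the rule that a NULL DNC becomes an absent field are illustrative assumptions that match the Mark Smith document shown.

```javascript
// Rows transcribed from the two relational tables above.
const customers = [
  { customer_id: 0, first: "John", last: "Doe", city: "New York" },
  { customer_id: 1, first: "Mark", last: "Smith", city: "San Francisco" },
  { customer_id: 2, first: "Jay", last: "White", city: "Dallas" },
  { customer_id: 3, first: "Meagan", last: "White", city: "London" },
  { customer_id: 4, first: "Edward", last: "Daniels", city: "Boston" },
];
const phoneRows = [
  { number: "1-212-555-1212", type: "home", dnc: true, customer_id: 0 },
  { number: "1-212-555-1213", type: "home", dnc: true, customer_id: 0 },
  { number: "1-212-555-1214", type: "cell", dnc: false, customer_id: 0 },
  { number: "1-212-777-1212", type: "home", dnc: true, customer_id: 1 },
  { number: "1-212-777-1213", type: "cell", dnc: null, customer_id: 1 },
  { number: "1-212-888-1212", type: "home", dnc: false, customer_id: 2 },
];

// Hypothetical helper: one document per customer, phones embedded.
// A NULL dnc column simply becomes an absent field.
function toDocument(c, phones) {
  return {
    vers: 1,
    customer_id: c.customer_id,
    name: { f: c.first, l: c.last },
    city: c.city,
    phones: phones
      .filter((p) => p.customer_id === c.customer_id)
      .map((p) => {
        const phone = { number: p.number };
        if (p.dnc !== null) phone.dnc = p.dnc;
        phone.type = p.type;
        return phone;
      }),
  };
}

const docs = customers.map((c) => toDocument(c, phoneRows));
console.log(JSON.stringify(docs[1], null, 2)); // the Mark Smith document
```

Customers with no phone rows (Meagan, Edward) end up with an empty `phones` array rather than needing an outer join at read time.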
29. MongoDB Queries Are Expressive
SQL:
select A.did, A.lname, A.hiredate, B.type, B.number
from contact A left outer join phones B on (B.did = A.did)
where B.type = 'home' or A.hiredate > '2014-02-02'::date
MongoDB CLI:
db.contacts.find({"$or": [
    {"phones.type": "home"},
    {"hiredate": {"$gt": new ISODate("2014-02-02")}}
]});
Find all contacts with at least one home phone or hired after 2014-02-02
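Why does {"phones.type": "home"} mean "at least one home phone"? The sketch below is a deliberately simplified, in-memory emulation (not MongoDB's matcher) of two behaviors the query relies on: a dotpath fans out across array elements, and a document that lacks a field simply does not satisfy a predicate on it. The `valuesAt` and `matchesQuery` helpers and the sample contacts are hypothetical.

```javascript
// Collect every value reachable by a dotted path, fanning out over arrays,
// so "phones.type" yields the type of each element of phones.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const v of current) {
      if (Array.isArray(v)) {
        for (const el of v) {
          if (el !== null && typeof el === "object" && key in el) next.push(el[key]);
        }
      } else if (v !== null && typeof v === "object" && key in v) {
        next.push(v[key]);
      }
    }
    current = next;
  }
  return current; // empty when the path is missing
}

// Emulates the $or query above: either clause is enough.
function matchesQuery(doc) {
  const homePhone = valuesAt(doc, "phones.type").includes("home");
  const hiredLate = valuesAt(doc, "hiredate").some(
    (d) => d instanceof Date && d > new Date("2014-02-02")
  );
  return homePhone || hiredLate;
}

// Made-up sample contacts.
const contacts = [
  { name: "A", phones: [{ type: "cell" }, { type: "home" }] },
  { name: "B", phones: [{ type: "cell" }], hiredate: new Date("2015-06-01") },
  { name: "C", phones: [{ type: "cell" }], hiredate: new Date("2010-01-01") },
  { name: "D" }, // no phones and no hiredate: neither clause matches
];
console.log(contacts.filter(matchesQuery).map((c) => c.name)); // [ 'A', 'B' ]
```

Note that the SQL version needs the join just to reach the phone type, whereas the document query reaches into the embedded array directly.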
30. MongoDB Aggregation Is Powerful
Sum the different types of phones and create a list of the owners if there is more than 1 of that type
> db.contacts.aggregate([
    {$unwind: "$phones"}
   ,{$group: {"_id": "$phones.type", "count": {$sum: 1},
              "names": {$push: "$name"} }}
   ,{$match: {"count": {$gt: 1}}}
]);
{ "_id" : "home", "count" : 2, "names" : [
    { "f" : "John", "l" : "Doe" },
    { "f" : "Mark", "l" : "Smith" } ] }
{ "_id" : "cell", "count" : 4, "names" : [
    { "f" : "John", "l" : "Doe" },
    { "f" : "Meagan", "l" : "White" },
    { "f" : "Edward", "l" : "Daniels" },
    { "f" : "Mark", "l" : "Smith" } ] }
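To see what each stage of the pipeline above contributes, here is a rough in-memory emulation in plain JavaScript (not the server's implementation). `$unwind` emits one copy of a contact per phone (and, by default, drops contacts whose array is missing or empty), `$group` counts and collects names per phone type, and `$match` filters the grouped results. The helper names and sample contacts are hypothetical.

```javascript
// $unwind: one copy of the document per element of the array field.
function unwind(docs, field) {
  return docs.flatMap((d) =>
    (d[field] || []).map((el) => ({ ...d, [field]: el }))
  );
}

// $group on phones.type with count: {$sum: 1} and names: {$push: "$name"}.
function groupByPhoneType(docs) {
  const groups = new Map();
  for (const d of docs) {
    const key = d.phones.type;
    if (!groups.has(key)) groups.set(key, { _id: key, count: 0, names: [] });
    const g = groups.get(key);
    g.count += 1;
    g.names.push(d.name);
  }
  return [...groups.values()];
}

// Made-up sample contacts.
const contacts = [
  { name: { f: "John", l: "Doe" }, phones: [{ type: "home" }, { type: "cell" }] },
  { name: { f: "Mark", l: "Smith" }, phones: [{ type: "home" }, { type: "cell" }] },
  { name: { f: "Jay", l: "White" }, phones: [{ type: "work" }] },
];

const result = groupByPhoneType(unwind(contacts, "phones"))
  .filter((g) => g.count > 1); // $match: {count: {$gt: 1}}
console.log(JSON.stringify(result));
```

The "work" group has only one member, so the final stage removes it, just as the `$match` stage does on the server.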
31. $lookup: “Left Outer Join++”
Step 1: Get a sense of the raw material
> db.leases.aggregate([ ]);
{
  "_id" : ObjectId("5642559e0d4f2076a43584fc"),
  "leaseID" : "A5",
  "sku" : "GD652",
  "origDate" : ISODate("2010-01-01T00:00:00Z"),
  "histDate" : ISODate("2010-10-28T00:00:00Z"),
  "monthlyDue" : 10,
  "vers" : 11,
  "delinq" : { "d30" : 10, "d60" : 10, "d90" : 60 },
  "credit" : 0
}
// 66 more ….
32. $lookup: “Left Outer Join++”
Step 2: Group leases by SKU and capture the count and max value of 90-day delinquency
> db.leases.aggregate([
    {$group: { _id: "$sku", n: {$sum: 1},
               max90: {$max: "$delinq.d90"} }}
]);
{ "_id" : "AC775", "n" : 27, "max90" : 20 }
{ "_id" : "AB123", "n" : 26, "max90" : 5 }
{ "_id" : "GD652", "n" : 14, "max90" : 80 }
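The Step 2 stage can be read as "bucket by SKU, then keep a running count and a running maximum". A minimal in-memory sketch of that reading follows (plain JavaScript, not the server). Like `$max`, it skips documents where the field is missing. The first sample lease mirrors the A5 document from Step 1; the others are made up.

```javascript
// Hypothetical emulation of:
//   {$group: {_id: "$sku", n: {$sum: 1}, max90: {$max: "$delinq.d90"}}}
function groupBySku(leases) {
  const groups = new Map();
  for (const lease of leases) {
    const g = groups.get(lease.sku) || { _id: lease.sku, n: 0, max90: null };
    g.n += 1; // {$sum: 1} counts documents per SKU
    const d90 = lease.delinq ? lease.delinq.d90 : undefined;
    // Like $max, ignore documents where the field is missing.
    if (typeof d90 === "number" && (g.max90 === null || d90 > g.max90)) g.max90 = d90;
    groups.set(lease.sku, g);
  }
  return [...groups.values()];
}

const leases = [
  { leaseID: "A5", sku: "GD652", delinq: { d30: 10, d60: 10, d90: 60 } },
  { leaseID: "X1", sku: "GD652", delinq: { d30: 0, d60: 0, d90: 80 } },
  { leaseID: "X2", sku: "AB123", delinq: { d30: 5, d60: 5, d90: 5 } },
  { leaseID: "X3", sku: "AB123" }, // no delinq: counted in n, ignored by max90
];
console.log(groupBySku(leases));
```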
33. $lookup: “Left Outer Join++”
Step 3: Reverse sort and then limit to the top 2
> db.leases.aggregate([
{$group: { _id: "$sku", n:{$sum:1},
max90:{$max:"$delinq.d90"} }}
,{$sort: {max90:-1}}
,{$limit: 2}
]);
{ "_id" : "GD652", "n" : 14, "max90" : 80 }
{ "_id" : "AC775", "n" : 27, "max90" : 20 }
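Steps 2 and 3 compose: the sort and limit stages operate only on the grouped output, not on the 67 raw leases. The sketch below applies a descending sort and a limit of 2 in plain JavaScript to the three Step 2 results transcribed from the slide, and reproduces the Step 3 output order.

```javascript
// The three $group results printed in Step 2.
const grouped = [
  { _id: "AC775", n: 27, max90: 20 },
  { _id: "AB123", n: 26, max90: 5 },
  { _id: "GD652", n: 14, max90: 80 },
];

// {$sort: {max90: -1}} then {$limit: 2}, emulated in memory.
// Copy first so the input array is not reordered in place.
const top2 = [...grouped].sort((a, b) => b.max90 - a.max90).slice(0, 2);
console.log(top2.map((g) => g._id)); // [ 'GD652', 'AC775' ]
```

On the server, the MongoDB documentation describes an optimization that coalesces an adjacent `$sort` and `$limit` so that only the top n documents need to be retained while sorting; the sketch above simply sorts everything.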
36. Agenda
• What is MongoDB?
• What are some good use cases?
• How do I use it?
• How do I deploy it?
37. MongoDB Ops/Cloud Manager
• Single-click provisioning
• Scaling & upgrades
• Admin tasks
• Monitoring with charts
• Dashboards and alerts on 100+ metrics
• Backup and restore with point-in-time recovery
• Support for sharded clusters
41. HA and DR Are Isomorphic
[Diagram: Dual Data Center HA/DR Replica Set. The Application's DRIVER connects to a replica set whose PRIMARY and three secondaries are spread across Data Center 1 and Data Center 2, with an Arbiter in a third location (DC3 or cloud).]
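The reason the third-site arbiter matters is majority arithmetic: a replica set can elect (or keep) a primary only while a strict majority of its voting members can reach each other. The sketch below assumes one placement consistent with the diagram (PRIMARY plus one secondary in DC1, two secondaries in DC2, the arbiter in DC3) and the default of one vote per member; the placement and member names are assumptions, not read from the slide.

```javascript
// Assumed layout; each member has one vote (the default).
// The arbiter votes but holds no data.
const members = [
  { name: "primary", dc: "DC1" },
  { name: "secondary-1", dc: "DC1" },
  { name: "secondary-2", dc: "DC2" },
  { name: "secondary-3", dc: "DC2" },
  { name: "arbiter", dc: "DC3" },
];

// A primary is possible only if the surviving members form a strict
// majority of all configured voting members.
function canElectPrimary(members, failedDc) {
  const majority = Math.floor(members.length / 2) + 1;
  const alive = members.filter((m) => m.dc !== failedDc).length;
  return alive >= majority;
}

for (const dc of ["DC1", "DC2", "DC3"]) {
  console.log(`lose ${dc}: primary possible = ${canElectPrimary(members, dc)}`);
}

// Without the third-site arbiter, 4 voters need 3, and losing either DC leaves 2.
const withoutArbiter = members.filter((m) => m.dc !== "DC3");
console.log(`no arbiter, lose DC1: ${canElectPrimary(withoutArbiter, "DC1")}`);
```

With the arbiter, losing either data center still leaves 3 of 5 votes, so the same mechanism that handles a single failed node (HA) also handles a lost site (DR).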
43. Horizontal Scalability Through Sharding
[Diagram: the Application's DRIVER connects to mongos, which routes operations to Shard 1 (Symbols A-D), Shard 2 (Symbols E-H), …, Shard n (Symbols ?-Z); each shard is a replica set with a PRIMARY and two secondaries.]
Three Sharding Models:
1. Range
2. Tag
3. Hash
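With range sharding, each shard owns contiguous ranges of the shard key, and mongos routes each operation by finding the range that contains the key. The sketch below is a hypothetical routing table for the first two shards in the diagram (ranges use an inclusive lower bound and exclusive upper bound, as MongoDB chunk ranges do); the real mongos works from chunk metadata held by the config servers, and the elided "Shard n" range is deliberately not filled in.

```javascript
// Hypothetical routing table for the two fully labeled shards.
// min is inclusive, max is exclusive: "A" <= key < "E" covers symbols A-D.
const ranges = [
  { shard: "Shard 1", min: "A", max: "E" },
  { shard: "Shard 2", min: "E", max: "I" },
];

function routeSymbol(symbol) {
  const key = symbol.toUpperCase();
  const hit = ranges.find((r) => key >= r.min && key < r.max);
  return hit ? hit.shard : "another shard (range not shown)";
}

console.log(routeSymbol("DELL")); // Shard 1
console.log(routeSymbol("EBAY")); // Shard 2
console.log(routeSymbol("IBM"));  // another shard (range not shown)
```

A query on a range of symbols (say A through F) touches only the shards owning those ranges, which is the main attraction of range sharding; hash sharding instead spreads adjacent keys across shards to even out write load.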
44. For More Information
• Case Studies: mongodb.com/customers
• Presentations: mongodb.com/presentations
• Free Online Training: education.mongodb.com
• Webinars and Events: mongodb.com/events
• Documentation: docs.mongodb.org
• MongoDB Downloads: mongodb.com/download
• Additional Info: info@mongodb.com
HELLO!
This is Buzz Moschetti at MongoDB, and welcome to today’s webinar entitled “Thinking in Documents”, part of our Back To Basics series.
If your travel plans today do not include exploring the document model in MongoDB, then please exit the aircraft immediately and see an agent at the gate.
Otherwise – WELCOME ABOARD for about the next hour.
Let’s talk about some of the terms.
JOINS: RDBMS uses joins to stitch together fundamentally simple things into larger, more complex data.
MongoDB uses embedding of data within data, and linking, to produce the same result.
There are three themes that are important to grasp when Thinking In Documents in MongoDB and these will be reinforced throughout the presentation.
First, great schema design…. I’ll repeat it: great…
This is not new or something revolutionary to MongoDB.
It is something we have been doing all along in the construction of solutions. Sometimes well, sometimes not.
It’s just that the data structures and APIs used by MongoDB make it much easier to satisfy the first two bullet points.
Particularly for an up-stack software engineer kind of person like myself, the ease and power of well-harmonizing your persistence with your code (Java, JavaScript, Perl, Python) is a vital part of ensuring your overall information architecture is robust.
Part of the exercise also involves candidly addressing legacy RDBMS issues that we see over and over again after 40 years, like schema explosion, field overloading, and flattening.
It boils down to: success = “schema” + code
Very briefly, a little bit about the person talking to you today over the net.
Yep!
A document is not a PDF or an MS Word artifact.
A document is a term for a rich shape: structures of structures of lists of structures that ultimately, at the leaves, have familiar scalars like int, double, datetime, and string.
In this example we also see that we’re carrying a thumbnail photo in a binary byte array type; that’s natively supported as well.
This is different from the traditional row-column approach used in RDBMS.
Another important difference is that in MongoDB, it is not required for every document in a collection to be the same shape; shapes can VARY.
With the upcoming release of 3.2, we will be supporting document validation, so in those designs where certain fields and their types are absolutely mandatory, we’ll be able to enforce that at the DB engine level, similar to (but not exactly like) traditional schemas.
The truth is that in most non-trivial systems, even with RDBMS and stored procs, plenty of validation and logic is handled outside the database.
Now here is something very important:
For the purposes of the webinar, we will be seeing this “to-string” representation of a document as it is emitted from the MongoDB CLI.
This is easy to read and gets the structural design points across nicely.
But make no mistake: you want most of your actual software interaction with MongoDB (and frankly any DB) to be via high fidelity types, not a big string with whitespaces and CR and quotes and whatnot.
This is a very, very exciting part of MongoDB.
No need to come up with userDefined column 1, column 2, etc.
We see here that Kristina and Mike have very different substructures inside the personalData field.
We call this polymorphism: the variation of shape from document-to-document within the same collection.
The library application logic only looks for a field called “personalData”; actions are taken dynamically based on the shape and types in the substructure!
For example, it is a very straightforward exercise to recursively “walk” the structure and construct a panel in a GUI, especially if you are using AngularJS and the MEAN stack
(MongoDB / Express / Angular / Node.js).
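Here is a minimal sketch of that recursive walk in plain JavaScript: it inspects the type of each value (array, date, nested object, scalar) and emits one indented line per field, so Kristina's and Mike's differently shaped `personalData` can be rendered by the same code. The fields inside each `personalData` are hypothetical, since their actual contents are not given here, and a real GUI would emit widgets rather than text lines.

```javascript
// Recursively walk any document shape, producing indented display lines.
function walk(value, label, depth, lines) {
  const pad = "  ".repeat(depth);
  if (Array.isArray(value)) {
    lines.push(`${pad}${label}: [list of ${value.length}]`);
    value.forEach((v, i) => walk(v, `[${i}]`, depth + 1, lines));
  } else if (value instanceof Date) {
    lines.push(`${pad}${label}: ${value.toISOString().slice(0, 10)}`);
  } else if (value !== null && typeof value === "object") {
    lines.push(`${pad}${label}:`);
    for (const [k, v] of Object.entries(value)) walk(v, k, depth + 1, lines);
  } else {
    lines.push(`${pad}${label}: ${value}`);
  }
  return lines;
}

// Hypothetical documents: same collection, different personalData shapes.
const kristina = {
  name: "Kristina",
  personalData: { favoriteAuthors: ["Austen", "Orwell"], memberSince: new Date("2010-05-01") },
};
const mike = { name: "Mike", personalData: { eyeColor: "brown", glasses: true } };

for (const person of [kristina, mike]) {
  console.log(walk(person.personalData, "personalData", 0, []).join("\n"));
}
```

The walker has no knowledge of either shape; adding a new kind of field to one document requires no code change, which is the point of the polymorphism described above.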
No need to use XML, blobs, or serialized objects. It’s all native MongoDB, and documents are represented in the form most easily and natively manipulated by the language.
Every field is queryable and if so desired, indexable! Documents that do not contain fields in a query predicate are simply treated as unset.
Drivers in each language represent documents in a language-native form most appropriate for that language.
Java has maps, python has dictionaries. You deal with actual objects like Dates, not strings that must be constructed or parsed.
Another important note: We’ll be using query functionality to kill 2 birds with one stone:
To show the shape of the document
To show just a bit of the MongoDB query language itself including dotpath notation to “dig into” substructures
Note also that in MongoDB, documents go into collections in the same shape they come out so we won’t focus on insert.
This is a very different design paradigm from RDBMS, where, for example, the read side of an operation implemented as an 8-way join is very different from the set of insert statements (some of them in a loop) required for the write side.
Let’s get back to data design.
…
Traditional data design is characterized by some of the points above, and this is largely because the design goals and constraints of legacy RDBMS engines heavily influence the data being put into them.
These platforms were designed when CPUs were slow and memory was VERY expensive.
Perhaps more interesting is that the languages of the time – COBOL, FORTRAN, APL, PASCAL, C – were very compile time oriented and very rectangular in their expression of data structures.
One could say rigid schema combined with these languages was in fact well-harmonized.
Overall, the platform is VERY focused on the physical representation of data.
For example, although most have been conflated, the legacy types of char, varchar, text, CLOB etc. to represent a string suggest a strong coupling to byte-wise storage concerns.
Documents, on the other hand, are more like business entities.
You’ll want to think of your data moving in and out as objects.
And the types and features of document APIs are designed to be well-harmonized with today’s programming languages (Java, C#, Python, Node.js, Scala, C++), languages that are not nearly as compile-time oriented and that offer great capabilities to dynamically manipulate data and perform reflection/introspection upon it.
Some quick logistics.
In the last 5 to 10 mins today, we will answer the most common questions that have appeared in the webinar.
Customer Data Management (e.g., Customer Relationship Management, Biometrics, User Profile Management)
Product and Asset Catalogs (e.g., eCommerce, Inventory Management)
Social and Collaboration Apps (e.g., Social Networks and Feeds, Document and Project Collaboration Tools)
Mobile Apps (e.g., for Smartphones and Tablets)
Content Management (e.g., Web CMS, Document Management, Digital Asset and Metadata Management)
Internet of Things / Machine to Machine (e.g., mHealth, Connected Home, Smart Meters)
Security and Fraud Apps (e.g., Fraud Detection, Cyberthreat Analysis)
DbaaS (Cloud Database-as-a-Service)
Data Hub (Aggregating Data from Multiple Sources for Operational or Analytical Purposes)
Big Data (e.g., Genomics, Clickstream Analysis, Customer Sentiment Analysis)
Today we’ll explore data structures and schema for a library management application
Good example because in general most of you have some familiarity with the entities involved and we can explore some 1:1, 1:n and other design elements
We’ll close with something really cool – document validation that adapts to change over time.
Assuming you “soft version” your documents by including a logical version ID (in this case, a simple integer in field v), you can maintain multiple different shapes of documents in one collection, each validated against the rules appropriate to its version. And again, because enforcement happens at the DB engine level, it is guaranteed through all drivers.
SUPER POWERFUL!
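To make the per-version idea concrete, here is an in-memory sketch of the rule being described: the value of `v` selects which set of field requirements a document must satisfy. On the server this would be expressed declaratively as a collection `validator` (for example, an `$or` with one branch per version value) and enforced by the engine; the JavaScript below only illustrates the logic, and the `rulesByVersion` table, field names, and sample documents are all hypothetical.

```javascript
// Hypothetical per-version rules: each soft version "v" has its own
// required fields and expected types.
const rulesByVersion = {
  1: { name: "string" },
  2: { name: "object", email: "string" }, // v2: name split into {f, l}, email added
};

function validate(doc) {
  const rules = rulesByVersion[doc.v];
  if (!rules) return false; // unknown version: reject
  return Object.entries(rules).every(([field, type]) =>
    type === "object"
      ? doc[field] !== null && typeof doc[field] === "object" && !Array.isArray(doc[field])
      : typeof doc[field] === type
  );
}

const docs = [
  { v: 1, name: "Ada" },
  { v: 2, name: { f: "Ada", l: "Lovelace" }, email: "ada@example.com" },
  { v: 2, name: "Ada" }, // claims v2 but has the v1 shape: rejected
];
console.log(docs.map(validate)); // [ true, true, false ]
```

Old v1 documents stay valid without migration, while anything written as v2 must meet the newer rules.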
On behalf of all of us at MongoDB, thank you for attending this webinar!
I hope what you saw and heard today gave you some insight and clues into what you might face in your own data design efforts.
Remember you can always reach out to us at MongoDB for guidance.
With that, code well and be well.