Engineers often ask, "How do I know whether I should build my application on MongoDB?" IT executives ask a similar question: "Which applications in my portfolio should I migrate to MongoDB?" This presentation presents a framework for answering these questions.
We will cover two sets of criteria: (1) when to migrate a legacy application to MongoDB, and (2) when to use MongoDB for new applications. The presentation also includes a brief introduction to MongoDB, providing enough technical background to analyze when to use it.
Learning Objectives:
The basics of the MongoDB document model, query capabilities, and architecture required for analyzing when to use MongoDB
Criteria for determining when to use MongoDB to re-platform legacy applications
Criteria for determining when to use MongoDB for new applications
2. AGENDA
• When to use MongoDB? Are we asking the right question?
• Why MongoDB?
• Evaluating Use Case Suitability for MongoDB
• When you shouldn’t use MongoDB
9. BEING SUCCESSFUL WITH MONGODB
5x
Productivity*
We help our customers to increase overall output, e.g. in terms of development or ops productivity.
80%
Cost reduction*
We help our customers to dramatically lower their total cost of ownership for data storage and analytics by up to 80%.
* Dependent on type of implementation
While the detailed definition of success metrics looks different for each customer, two key factors are consistent across all of our engagements:
11. CAN WE USE MONGODB?
• If we get
‒ 5x developer productivity
‒ 80% cost reduction
• Shouldn’t we consider this
alternative first?
[Decision flowchart: Assess MongoDB Fit → MongoDB? → yes: Build in MongoDB; no: Look at Alternatives]
22. DO MORE WITH YOUR DATA
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
Find everybody in London with a car
built between 1970 and 1980
Geospatial
Find all of the car owners within 5km
of Trafalgar Sq.
Search
Find all the cars described as having
leather seats. Count them by model.
(text, facets, collation)
Aggregation
Calculate the average value of Paul’s
car collection
Graph
Find all the cars owned by Paul’s family
(descendants)
Map Reduce
What is the ownership pattern of
colors by geography over time?
(is purple trending up in China?)
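As an illustration of the "Rich Queries" example above, here is a minimal pure-Java sketch (no driver, no server; the helper name is ours) that builds the filter for "everybody in London with a car built between 1970 and 1980" as plain Maps mirroring the MongoDB query document shape. `$elemMatch`, `$gte`, and `$lte` are standard MongoDB query operators; the field names come from the sample document.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryShape {

    // Builds the filter document
    //   { city: "London",
    //     cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } } }
    // as plain Maps, mirroring what a driver would send to the server.
    public static Map<String, Object> londonSeventiesCars() {
        Map<String, Object> year = new HashMap<>();
        year.put("$gte", 1970);
        year.put("$lte", 1980);

        Map<String, Object> carMatch = new HashMap<>();
        carMatch.put("year", year);

        Map<String, Object> elemMatch = new HashMap<>();
        elemMatch.put("$elemMatch", carMatch);

        Map<String, Object> filter = new HashMap<>();
        filter.put("city", "London");
        filter.put("cars", elemMatch);
        return filter;
    }

    public static void main(String[] args) {
        System.out.println(londonSeventiesCars());
    }
}
```

The point of the shape: the filter is just nested data, so it composes the same way the document itself does.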
26. NEW DATA FIELDS AND TYPES
• New sensor version → new field
ALTER TABLE device_data
ADD lbs_fuel int;
• 5000 aircraft x
1 year of data x
1 reading per minute
> 2B Rows
[Table diagram: device_data with columns TailNumber, lbs_fuel, ts, speed; a new column added across 2B rows]
How long will this take?
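To see why no ALTER TABLE is needed on the MongoDB side, here is a minimal sketch, pure Java with a List standing in for the collection so it runs without a database; the tail number and values are made up. Old-format and new-format readings coexist in the same collection, and the new field simply appears on new documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MixedSchema {

    // Simulates the device_data collection as a list of Maps; real code
    // would insert each Map into a MongoDB collection instead.
    public static List<Map<String, Object>> sampleCollection() {
        List<Map<String, Object>> deviceData = new ArrayList<>();

        // Reading from the old sensor version: no lbs_fuel field.
        Map<String, Object> oldReading = new HashMap<>();
        oldReading.put("tailNumber", "N12345");
        oldReading.put("ts", "2017-06-01T00:00:00Z");
        oldReading.put("speed", 452);
        deviceData.add(oldReading);

        // Reading from the new sensor version: lbs_fuel simply appears.
        // No ALTER TABLE, no rewrite of the existing 2B rows.
        Map<String, Object> newReading = new HashMap<>(oldReading);
        newReading.put("lbs_fuel", 18250);
        deviceData.add(newReading);

        return deviceData;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data = sampleCollection();
        System.out.println(data.get(0).containsKey("lbs_fuel")); // false
        System.out.println(data.get(1).containsKey("lbs_fuel")); // true
    }
}
```

Backfilling the old readings is optional, and can be done lazily if the application ever needs it.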
28. DAY 1: INITIAL EFFORTS FOR BOTH TECHNOLOGIES
DDL: create table contact ( … )
init()
{
contactInsertStmt = connection.prepareStatement
("insert into contact ( id, name ) values ( ?,? )");
fetchStmt = connection.prepareStatement
("select id, name from contact where id = ?");
}
save(Map m)
{
contactInsertStmt.setString(1, m.get("id"));
contactInsertStmt.setString(2, m.get("name"));
contactInsertStmt.execute();
}
Map fetch(String id)
{
Map m = null;
fetchStmt.setString(1, id);
rs = fetchStmt.execute();
if(rs.next()) {
m = new HashMap();
m.put("id", rs.getString(1));
m.put("name", rs.getString(2));
}
return m;
}
SQL
mongoDB
DDL: none
save(Map m)
{
collection.insert(m);
}
Map fetch(String id)
{
Map m = null;
DBObject dbo = new BasicDBObject();
dbo.put("id", id);
c = collection.find(dbo);
if(c.hasNext()) {
m = (Map) c.next();
}
return m;
}
Let’s assume for argument’s sake that both
approaches take the same amount of time
29. DAY 2: ADD SIMPLE FIELDS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
• Capturing title and hireDate is part of adding a new business feature
• It was pretty easy to add two fields to the structure
• …but now we have to change our persistence code
Brace yourself (again) …..
30. SQL DAY 2 (CHANGES IN BOLD)
DDL: alter table contact add title varchar(8);
alter table contact add hireDate date;
init()
{
contactInsertStmt = connection.prepareStatement
("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
fetchStmt = connection.prepareStatement
("select id, name, title, hiredate from contact where id = ?");
}
save(Map m)
{
contactInsertStmt.setString(1, m.get("id"));
contactInsertStmt.setString(2, m.get("name"));
contactInsertStmt.setString(3, m.get("title"));
contactInsertStmt.setDate(4, m.get("hireDate"));
contactInsertStmt.execute();
}
Map fetch(String id)
{
Map m = null;
fetchStmt.setString(1, id);
rs = fetchStmt.execute();
if(rs.next()) {
m = new HashMap();
m.put("id", rs.getString(1));
m.put("name", rs.getString(2));
m.put("title", rs.getString(3));
m.put("hireDate", rs.getDate(4));
}
return m;
}
Consequences:
1. Code release schedule linked to database
upgrade (new code cannot run on old schema)
2. Issues with case sensitivity starting to creep in
(many RDBMS are case insensitive for column
names, but code is case sensitive)
3. Changes require careful mods in 4 places
4. Beginning of technical debt
31. MONGODB DAY 2
save(Map m)
{
collection.insert(m);
}
Map fetch(String id)
{
Map m = null;
DBObject dbo = new BasicDBObject();
dbo.put("id", id);
c = collection.find(dbo);
if(c.hasNext()) {
m = (Map) c.next();
}
return m;
}
Advantages:
1. Zero time and money spent on overhead code
2. Code and database not physically linked
3. New material with more fields can be added into
existing collections; backfill is optional
4. Names of fields in database precisely match key
names in code layer and match directly on name, not
indirectly via positional offset
5. No technical debt is created
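The advantages above can be sketched in runnable form. This is a pure-Java stand-in (a List simulates the collection; in real driver code save() would call collection.insert(m)) showing that the Day 1 save code absorbs the new title and hireDate fields untouched.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Day2Mongo {

    // Stand-in for the MongoDB collection so the sketch runs without a
    // server; the real code is collection.insert(m).
    public static final List<Map<String, Object>> collection = new ArrayList<>();

    // Unchanged from Day 1: the save code never enumerates fields.
    public static void save(Map<String, Object> m) {
        collection.add(m);
    }

    public static void main(String[] args) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", "K1");
        m.put("name", "buzz");
        m.put("title", "Mr.");                        // new Day 2 field
        m.put("hireDate", LocalDate.of(2011, 11, 1)); // new Day 2 field
        save(m); // no persistence-code change, no DDL
        System.out.println(collection.get(0).get("title")); // Mr.
    }
}
```

Because the persistence layer passes the whole Map through, adding fields is a change to business code only.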
32. DAY 3: ADD LIST OF PHONE NUMBERS
m.put("name", "buzz");
m.put("id", "K1");
m.put("title", "Mr.");
m.put("hireDate", new Date(2011, 11, 1));
n1.put("type", "work");
n1.put("number", "1-800-555-1212");
list.add(n1);
n2.put("type", "home");
n2.put("number", "1-866-444-3131");
list.add(n2);
m.put("phones", list);
• It was still pretty easy to add this data to the structure
• .. but meanwhile, in the persistence code …
REALLY brace yourself…
33. SQL DAY 3 CHANGES: OPTION 2:
PROPER APPROACH WITH MULTIPLE PHONE NUMBERS
DDL: create table phones ( … )
init()
{
contactInsertStmt = connection.prepareStatement
("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
c2stmt = connection.prepareStatement("insert into phones (id, type,
number) values (?, ?, ?)");
fetchStmt = connection.prepareStatement
("select id, name, title, hiredate, type, number from contact, phones
where phones.id = contact.id and contact.id = ?");
}
save(Map m)
{
startTrans();
contactInsertStmt.setString(1, m.get("id"));
contactInsertStmt.setString(2, m.get("name"));
contactInsertStmt.setString(3, m.get("title"));
contactInsertStmt.setDate(4, m.get("hireDate"));
for(Map onePhone : (List<Map>) m.get("phones")) {
c2stmt.setString(1, m.get("id"));
c2stmt.setString(2, onePhone.get("type"));
c2stmt.setString(3, onePhone.get("number"));
c2stmt.execute();
}
contactInsertStmt.execute();
endTrans();
}
Map fetch(String id)
{
Map m = null;
fetchStmt.setString(1, id);
rs = fetchStmt.execute();
int i = 0;
List list = new ArrayList();
while (rs.next()) {
if(i == 0) {
m = new HashMap();
m.put("id", rs.getString(1));
m.put("name", rs.getString(2));
m.put("title", rs.getString(3));
m.put("hireDate", rs.getDate(4));
m.put("phones", list);
}
Map onePhone = new HashMap();
onePhone.put("type", rs.getString(5));
onePhone.put("number", rs.getString(6));
list.add(onePhone);
i++;
}
return m;
}
This took time and money
34. SQL DAY 5: ZOMBIES! (ZERO OR MORE BETWEEN ENTITIES)
init()
{
contactInsertStmt = connection.prepareStatement
("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )");
c2stmt = connection.prepareStatement("insert into phones (id, type,
number) values (?, ?, ?)");
fetchStmt = connection.prepareStatement
("select A.id, A.name, A.title, A.hiredate, B.type, B.number from
contact A left outer join phones B on (A.id = B.id) where A.id = ?");
}
Whoops! And it’s also wrong!
We did not design the query to account for contacts that have
no phone number, so we have to change the join to an outer join.
But this ALSO means we have to change the unwind logic
This took more time and money!
while (rs.next()) {
if(i == 0) {
// …
}
String s = rs.getString(5);
if(s != null) {
Map onePhone = new HashMap();
onePhone.put("type", s);
onePhone.put("number", rs.getString(6));
list.add(onePhone);
}
}
…but at least we have a DAL…
right?
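On the MongoDB side, Day 3 and the Day 5 "zombies" problem never arise: zero-or-more phones embed as a list inside the contact document, and the Day 1 persistence code is unchanged. A minimal pure-Java sketch, with a List standing in for the collection so it runs without a server:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Day3Mongo {

    // Stand-in for the MongoDB collection (really collection.insert(m)).
    public static final List<Map<String, Object>> collection = new ArrayList<>();

    // Still one line, still field-agnostic.
    public static void save(Map<String, Object> m) {
        collection.add(m);
    }

    public static void main(String[] args) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", "K1");
        m.put("name", "buzz");

        // Zero-or-more phones embed naturally as a list; a contact with
        // no phones simply omits the field. No outer join to fix, no
        // unwind loop, no null checks on join columns.
        List<Map<String, String>> phones = new ArrayList<>();
        Map<String, String> work = new HashMap<>();
        work.put("type", "work");
        work.put("number", "1-800-555-1212");
        phones.add(work);
        m.put("phones", phones);

        save(m); // identical to Day 1 and Day 2
        System.out.println(collection.get(0).get("phones"));
    }
}
```

The zero-phone case needs no special handling: the document just lacks a phones field, and the fetch returns it as-is.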
40. SCALING MONGODB: AUTOMATIC SHARDING
Three types: hash-based, range-based, location-aware
Increase or decrease capacity as you go
Automatic balancing
47. REMEMBER THIS?
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
Find everybody in London with a car
built between 1970 and 1980
Geospatial
Find all of the car owners within 5km
of Trafalgar Sq.
Search
Find all the cars described as having
leather seats. Count them by model.
(text, facets, collation)
Aggregation
Calculate the average value of Paul’s
car collection
Graph
Find all the cars owned by Paul’s family
(descendants)
Map Reduce
What is the ownership pattern of
colors by geography over time?
(is purple trending up in China?)
49. MONGODB CONNECTOR FOR BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools. The
connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool
into equivalent MongoDB queries that are sent to
MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then visualize
the data based on user requirements
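As an illustration of the kind of translation described above (the BI Connector's actual rewriting is internal to the product; this is only a sketch), a statement like SELECT city, COUNT(*) FROM contacts GROUP BY city corresponds roughly to a single $group aggregation stage. The sketch below builds that pipeline as plain Java Maps; the contacts collection and city field are taken from the running example, and $group/$sum are standard aggregation operators.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SqlToPipeline {

    // Builds the aggregation pipeline roughly equivalent to
    //   SELECT city, COUNT(*) FROM contacts GROUP BY city
    // namely: [ { $group: { _id: "$city", count: { $sum: 1 } } } ]
    public static List<Map<String, Object>> groupByCity() {
        Map<String, Object> sum = new HashMap<>();
        sum.put("$sum", 1);

        Map<String, Object> groupBody = new HashMap<>();
        groupBody.put("_id", "$city");
        groupBody.put("count", sum);

        Map<String, Object> group = new HashMap<>();
        group.put("$group", groupBody);

        return Collections.singletonList(group);
    }

    public static void main(String[] args) {
        System.out.println(groupByCity());
    }
}
```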
50. “We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.”
- Early Access Tester, Multi-National Banking Group
[Architecture diagram: an analytics application (Scala, Java, Python, R APIs; SQL; machine learning libraries; streaming; graph) running across Spark workers, connected to MongoDB via the MongoDB Connector for Spark]
ADVANCED ANALYTICS
MongoDB Connector for Apache Spark
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown,
secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
• Updated with Spark 2.0 support
51. WHAT DOES THIS MEAN?
• Developer productivity
• Wider range of use cases
• Changing requirements
• Complex queries and analytics
MongoDB
52. WIDE VARIETY OF USE CASES
Single View • Internet of Things • Mobile • Real-Time Analytics • App Modernization • Content Management • Blockchain
56. EXISTING APPLICATIONS
• Is there a critical requirement that isn’t being met?
‒ Performance/Scalability
‒ Agility
‒ Variable data sources/formats
‒ Availability/Resiliency
‒ Cost
‒ Cloud
• Revision or re-platform?
57. EXISTING APPLICATION CHALLENGES
Requirements → Challenges → MongoDB Features

Performance/Scalability
‒ Challenges: Can’t meet query volume; query latency issues; data volume exceeding server(s) capacity
‒ MongoDB features: Document model, WiredTiger, sharding, commodity hardware, Cloud/Atlas

Availability/Resiliency
‒ Challenges: Need automatic failover: zero downtime on loss of node, network, or data center; no engineering effort required to restore service
‒ MongoDB features: Replica sets (automated failover, zero-downtime maintenance)

Cloud Migration
‒ Challenges: Cloud migration; no cloud provider lock-in
‒ MongoDB features: Atlas
58. EXISTING APPLICATION CHALLENGES
Requirements → Challenges → MongoDB Features

Agility (shorten time to value)
‒ Challenges: Feature backlog; developers focused on maintenance instead of innovation
‒ MongoDB features: Flexible document model, powerful query language, driver architecture

Variable data sources/formats
‒ Challenges: New data sources; data format changes continuously
‒ MongoDB features: Flexible document model

Cost
‒ Challenges: Mainframe MIPS; RAC clusters; additional expensive components for replication and failover
‒ MongoDB features: Commodity hardware, open source, Atlas, replica sets
59. CRITERIA FOR ASSESSING MONGODB FIT
• Performance/Scalability
• Availability/Resiliency
• Cloud
• Agility
• Variable Data
• Cost
• Data naturally modeled as documents?
• Complex queries
• Analytics
• Strong consistency
60. ADDITIONAL CRITERIA
Requirements → Challenges → MongoDB Features

Data naturally modeled as documents
‒ Challenges: Complex code for shredding and reconstituting objects
‒ MongoDB features: Flexible document model

Complex queries / Analytics
‒ Challenges: Complex application code; complex architectures including search engine, Hadoop, ETL, CDC; limited application functionality; long time to market
‒ MongoDB features: Secondary indexes, powerful query language, Aggregation Framework, BI Connector, Spark integration

Strong consistency
‒ Challenges: Users require the most up-to-date view of data; complex application code required to handle edge cases
‒ MongoDB features: Strong consistency, read and write concerns
64. NEW APPLICATIONS
• MongoDB makes sense for vast majority of use cases
• Why not MongoDB?
‒ Fear/comfort level with new technology
‒ Don’t know how to support MongoDB
‒ Don’t want to learn new technology
‒ Expensive enterprise license that is “free” to project team
• Our other solution is good enough
70. MANY REASONS FOR MONGODB?
• Strong Consistency
‒ Documents
‒ Indexes
‒ Consistency across multi-data center
deployments
• Expressive Query Language and
Secondary Indexes
‒ More powerful than SQL
‒ Analytics
‒ Dynamic index creation
• Scalability/Performance
‒ PBs of Data
‒ Millions of ops/sec
• High Availability
‒ Automated failover < 2 seconds
‒ Supports Active-Active and Active-Passive multi-data center deployments
• Deploy Anywhere
‒ On-Prem, AWS, Azure, Google
• Ease of Management
‒ Best in class operations tooling
‒ Configure once: one cluster spans
multi-data centers