2. Slide 2Slide 2 www.edureka.in/mongodbTwitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Why is a NoSQL database needed?
Benefits of NoSQL over RDBMS dbs
Comparing NoSQL vs SQL dbs
How MongoDB® solves the problem
Use cases of MongoDB®
Job opportunity and trends on MongoDB®
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Objectives of this Session
3. Slide 3 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 3 www.edureka.in/mongodb
RDBMS/OLTP/Real Time
NoSQL/New SQL/BigData
DSS/OLAP/DW
Oracle
MySQL
MS SQL
DB2
Netezza
SAP Hana
Oracle Express
Mongo DB®
HBase
Cassandra
Couch DB
Database Categories
4. Slide 4Slide 4 www.edureka.in/mongodbTwitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Why NoSQL
Volume - 15 - 20 petabyte data in Govt. of India “AADHAR” project, NYSE, Facebook
Velocity - Fastest growing app, Facebook, Jet, Video stream
Variety - Data types e.g. Images, Videos, Text etc
5. Slide 5Slide 5 www.edureka.in/mongodbTwitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Horizontal Scaling (scale out) versus Vertical Scaling (scale up)
Scale-out - Cache dependent ‘Read’ and ‘Write’ operations
Scale-up - Limited by Memory and Processing (CPU) capabilities
Complex RDBMS model – Parsing, Locking, Logging, Buffer pool, Threads etc.
Monitoring and managing the distributed architecture
Agility with fastest changing world
Why NoSQL
6. Slide 6 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 6 www.edureka.in/mongodb
Structured Data
Text, Log Files,
Click Streams,
Blogs, Tweets,
Audio, Video,
etc
Unstructured and
Semi–structured Data
Data Growth Pattern
7. Slide 7 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 7 www.edureka.in/mongodb
NoSQL to the Rescue
Distributed Architecture: easy to scale-out, easy to manage large number of nodes
First Reads: Satisfying ‘write once, read many times’ behaviour
Easy Replication: The failover solution
Higher Performance: An architecture providing much higher per-node performance than the traditional
SQL-based databases
Schema Free: Data model with dynamic and flexible schema
Fast Development: No need of the additional ORM layer
To achieve the same, we compromise on:
No joins
No ACID transaction
8. Slide 8 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 8 www.edureka.in/mongodb
NoSQL to the Rescue
9. Slide 9 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 9 www.edureka.in/mongodb
CAP
We must understand the CAP
theorem when we talk about
NoSQL databases or in fact
when designing any distributed
system.
CAP theorem states that there are 3 basic requirements which exist in a special relation when designing
applications for a distributed architecture.
Consistency
Availability
Partition
Tolerance
CAP Theorem
This means that the system is always on (guaranteed
service availability), no downtime.
This means that the system continues to function even if the
communication among the servers is unreliable, i.e. the servers
may be partitioned into multiple groups that cannot communicate
with one another.
This means that the data in the database remains consistent after
the execution of an operation. For example, after an update
operation all clients see the same data.
10. Slide 10 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 10 www.edureka.in/mongodb
CAP provides the basic requirements for a distributed system
to follow 2 of the 3 requirements.
Theoretically it is impossible to fulfill all 3 requirements.
Therefore, all the current NoSQL database follows the different
combinations of the C, A, P from the CAP theorem.
CAP Theorem and NoSQL Databases
CA - Single site cluster, therefore all nodes are always
in contact. When a partition occurs, the system blocks.
CP - Some data may not be accessible, but the rest is
still consistent/accurate.
AP - System is still available under partitioning, but
some of the data returned may be inaccurate.
11. Slide 11 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 11 www.edureka.in/mongodb
Basically Available indicates that the system does guarantee availability, in terms of
the CAP theorem.
Basically Available
Soft State indicates that the state of the system may change over time, even without
input. This is because of the eventual consistency model.
Soft State
Eventual Consistency indicates that the system will become consistent over time,
given that the system doesn't receive input during that time.
Eventual Consistency
A BASE system gives up on consistency.
NoSQL Database - A BASE not ACID System
12. Slide 12 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 12 www.edureka.in/mongodb
~ 150 No SQL Database
are there in Market
~150
NoSQL Database – Not a Panacea
13. Slide 13 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 13 www.edureka.in/mongodb
Graph
Store
Key –
value
Stores
Wide
Column
Stores%
Document
Base
Document databases pair each key with a complex
data structure known as a document.
Documents can contain many different key-value
pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about
networks, such as social connections.
Graph stores include Neo4J and HyperGraphDB.
Key-value stores are the simplest NoSQL
databases.
Every single item in the database is stored as an
attribute name (or "key"), together with its value.
Wide-column stores such as Cassandra and
HBase are optimized for queries over large
datasets, and store columns of data together,
instead of rows.
Categories of NoSQL Database
14. Slide 14 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 14 www.edureka.in/mongodb
Key Value
Store
Memcached
Coherence
Redis
Tabular
Big Table
Hbase
Accumulo
Document
Oriented
MongoDB®
Couch DB
Cloudant
Graph Stores
Neo4J
Oracle NoSQL
HyperGraphDB
Types of NoSQL Databases
15. Slide 15 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 15 www.edureka.in/mongodb
Compromising
Features of RDBMS
Step 2
Step 3
Selecting a NoSQL Database
Step 1 Right Data Model
Pros and Cons of
Consistency
16. Slide 16 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 16 www.edureka.in/mongodb
1000 TPS
Caching Layer
300 ~ 500 SQL
Transaction
100 ~ 200 SQL
Transaction
1000 TPS
WEB APPLICATION
RDBMS1
Applications Changing Data
RDBMS1
A Traditional Database Solution
17. Slide 17 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 17 www.edureka.in/mongodb
WEB APPLICATION
5000 TPS
A NoSQL Database Solution
Applications Changing Data
Application grows with
user base and data volume
5000 TPS
18. Slide 18 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 18 www.edureka.in/mongodb
Suppose a client needs database design for his blog or website with below features, then what could be the difference
between RDBMS and MongoDB® schema design approach?
Website has the following requirements.
Every post has the unique title, description and url.
Every post can have one or more tags.
Every post has the name of its publisher and total number of likes.
Every post has comments given by users along with their name, message, data-time and likes.
On each post there can be zero or more comments.
Data Modeling Example in RDBMS and MongoDB
®
19. Slide 19 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 19 www.edureka.in/mongodb
In RDBMS schema design for above requirements will have minimum 3 tables.
posts
id
title
description
url
likes
post_by
tags
id
post_id
tag
comments
Comment_id
post_id
by_user
message
date_time
likes
∞
1 1
∞
Data Modeling Example in RDBMS and MongoDB®
20. Slide 20 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 20 www.edureka.in/mongodb
While in MongoDB® schema design will have one
collection post and has the following structure.
So while showing the data, in RDBMS we need to join
three tables and in Mongodb® data will be shown from
one collection only.
{
_id: POST_ID
title: TITLE_OF_POST,
description: POST_DESCRIPTION,
by: POST_BY,
url: URL_OF_POST,
tags: [ TAG1, TAG2, TAG3],
likes: TOTAL_LIKES,
comments: [
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
},
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
}
]
}
Data Modeling Example in RDBMS and MongoDB
21. Slide 21 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 21 www.edureka.in/mongodb
Where to Use MongoDB®?
RDBMS replacement for Web Applications
Semi-structured Content Management
Real-time Analytics & High-Speed Logging
Caching and High Scalability
Web 2.0, Media, SAAS, Gaming
http://www.mongodb.org/about/production-deployments/
22. Slide 22 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 22 www.edureka.in/mongodb
Metlife uses MongoDB® for “The Wall”, an innovative customer service application
which provides a 360-degree, consolidated view of MetLife customers, including
policy details and transactions across lines of business.
ebay has a number of projects running on MongoDB® for search suggestions,
metadata storage, cloud management and merchandizing categorization.
MongoDB® is the repository that powers MTV Networks’ next-generation CMS,
which is used to manage and distribute content for all of MTV Networks’ major
websites.
MongoDB® is used for back-end storage on the SourceForge front pages, project
pages, and download pages for all projects.
Craigslist uses MongoDB® to archive billions of records.
ADP uses MongoDB® for its high performance, scalability, reliability and its ability
to preserve the data manipulation capabilities of traditional relational databases.
Real World Use Cases of MongoDB
23. Slide 23 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 23 www.edureka.in/mongodb
CNN Turk uses MongoDB
®
for its infrastructure and content management system, including the
tv.cnnturk.com.
Foursquare uses MongoDB
®
to store venues and user ‘check-ins’ into venues, sharding the
data over more than 25 machines on Amazon EC2.
Justin.tv is the easy, fun, and fast way to share live video online. MongoDB
®
powers Justin.tv’s
internal analytics tools for virality, user retention, and general usage stats that out-of-the-box
solutions can’t provide.
ibibo (‘I build, I bond’) is a social network using MongoDB
®
for its dashboard feeds. Each feed
is represented as a single document containing an average of 1000 entries; the site currently
stores over two million of these documents in MongoDB
®
.
Real World Use Cases of MongoDB
®
24. Slide 24 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 24 www.edureka.in/mongodb
Financial Services
Risk Analytics and Reporting
Reference Data Management
Market Data Management
Portfolio Management
Order Capture
Time Series Data
Government
Surveillance Data Aggregation
Crime Data Management and Analytics
Citizen Engagement Platform
Program Data Management
Healthcare Record Management
Industry/Domains where MongoDB® is Used
25. Slide 25 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 25 www.edureka.in/mongodb
Health Care
360-degree Patient View
Population Management for At-risk Demographics
Lab Data Management and Analytics
Mobile Apps for Doctors and Nurses
Electronic Healthcare Records (EHR)
Media and Entertainment
Content Management and Delivery
User Data Management
Digital Asset Management
Mobile and Social Apps
Content Archiving
Industry/Domains where MongoDB® is Used
26. Slide 26 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 26 www.edureka.in/mongodb
Retail
Rich Product Catalogs
Customer Data Management
New Services
Digital Coupons
Real-time Price Optimization
Telecommunication
Consumer Cloud
Product Catalog
Customer Service Improvement
Machine-to-Machine (M2M) Platform
Real-time Network Analysis and Optimization
Industry/Domains where MongoDB® is Used
27. Slide 27 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 27 www.edureka.in/mongodb
How Popular is MongoDB® in the Industry?
Google search provides a wide indicator of MongoDB® adoption in the industry.
28. Slide 28 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 28 www.edureka.in/mongodb
Job Opportunity and Trends in MongoDB®
29. Slide 29 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 29 www.edureka.in/mongodb
Where Not to Use MongoDB®?
Highly transactional applications
Applications with traditional database system where requirements such as foreign-key
constraints etc are needed.