No SQL and MongoDB - Hyderabad Scalability Meetup

RDBMS: Past and Present
- Introduced in the 1970s
- Solved the prevalent data storage issues
Web-scale challenges today
- Data explosion in the past few years
- A single web request may fire tens or hundreds of queries
- Agile development
- Hardware challenges: leverage low-cost cloud infrastructure

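The Need
CAP Theorem - It is impossible for a distributed computer system to provide consistency, availability, and partition tolerance all at the same time.
[Diagram: CAP triangle with corners A (Availability), C (Consistency), and P (Partition tolerance). Systems shown on the triangle: MongoDB, Redis, HBase, BigTable; Cassandra, SimpleDB, Dynamo; RDBMS.]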

Solutions
NoSQL
- Availability
- Automatic Replication
- Auto Sharding
- Integrated Caching
- Dynamic Schema
- Consistency
- Document Database
- Graph Stores
- Key-Value Stores
- Column Stores

Database Types
NoSQL
- Document Database
- Graph Stores
- Key-Value Stores
- Column Stores

Document Database
What is it?
• Documents are independent units
• Can store semi-structured data with ease
Where is it useful?
• Ex. Product information in an e-commerce site
Popular DBs
• MongoDB, CouchDB

Graph stores
What is it?
• Based on graph theory
• Employ nodes, properties, and edges
Where is it useful?
• Ex. Social graphs
Popular DBs
• Neo4j, AllegroGraph, GraphDB

Key-value stores
What is it?
• Stores key-value pairs
• Several variations exist, such as in-memory DBs
Where is it useful?
• Ex. Quick access to data based on a key
Popular DBs
• Redis, Memcached

Column stores
What is it?
• Stores the values of each column together, rather than storing the values of each row together
Where is it useful?
• Ex. Semi-structured data
• Useful for large data sets with aggregations
Popular DBs
• HBase, BigTable (Google)

A Document database
Instead of storing data in rows and columns as one would with a relational database, MongoDB stores JSON documents in a binary form (BSON).
It does not impose flat, rigid schemas across many tables as relational databases do.

Features of MongoDB
Document data model with dynamic schemas
Full, flexible index support and rich queries
Auto-Sharding for horizontal scalability
Built-in replication for high availability
Text search
Advanced security
Aggregation Framework and MapReduce
Large media storage with GridFS

How does a row look?
{
  FirstName: "Jonathan",
  Address: "15 Wanamassa Point Road",
  Children: [
    { Name: "Michael", Age: 10 },
    { Name: "Jennifer", Age: 8 },
    { Name: "Samantha", Age: 5 },
    { Name: "Elena", Age: 2 }
  ]
}

Terms and Concepts
SQL Terms/Concepts | MongoDB Terms/Concepts
database | database
table | collection
row | document or BSON document
column | field
index | index
table joins | embedded documents and linking
primary key (specify any unique column or column combination as primary key) | primary key (in MongoDB, the primary key is automatically set to the _id field)
aggregation (e.g. group by) | aggregation framework

Common Operations - Create Table
SQL Schema Statements
CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT,
  user_id Varchar(30),
  age Number,
  status char(1),
  PRIMARY KEY (id)
)
MongoDB Schema Statements
Implicitly created on first insert operation. The primary key _id is automatically added if the _id field is not specified.
db.users.insert( {
  user_id: "abc123",
  age: 55,
  status: "A"
} )
Explicitly create a collection:
db.createCollection("users")

Common Operations - Alter Table
Collections do not describe or enforce the structure of their documents, so there is no structural alteration at the collection level. Instead, update operations can add fields to or remove fields from existing documents.
SQL Alter Statements
ALTER TABLE users
ADD join_date DATETIME
MongoDB Alter Statements
db.users.update(
  { },
  { $set: { join_date: new Date() } },
  { multi: true }
)
SQL Alter Statements
ALTER TABLE users
DROP COLUMN join_date
MongoDB Alter Statements
db.users.update(
  { },
  { $unset: { join_date: "" } },
  { multi: true }
)

Common Operations - Insert
SQL Insert Statements
INSERT INTO users(user_id, age, status)
VALUES ("bcd001", 45, "A")
MongoDB Insert Statements
db.users.insert( {
  user_id: "bcd001",
  age: 45,
  status: "A"
} )

Common Operations - Select
SQL Select Statements
SELECT user_id, status
FROM users
WHERE status = "A"
MongoDB Select Statements
db.users.find(
  { status: "A" },
  { user_id: 1, status: 1, _id: 0 }
)
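To make the two arguments of find concrete (the first is the filter, the second the projection), here is a minimal sketch in plain JavaScript that mimics the statement above on an in-memory array. The find helper and the sample data are hypothetical, and the helper handles only equality filters and inclusion projections; it is an illustration, not how MongoDB is implemented.

```javascript
// Hypothetical in-memory stand-in for the "users" collection.
const users = [
  { _id: 1, user_id: "abc123", age: 55, status: "A" },
  { _id: 2, user_id: "bcd001", age: 45, status: "A" },
  { _id: 3, user_id: "cde002", age: 30, status: "D" },
];

// Keep documents whose fields equal every key in `filter`, then copy only
// the fields marked 1 in `projection`. _id is included unless it is set
// to 0, mirroring the shell's default.
function find(docs, filter, projection) {
  return docs
    .filter((doc) => Object.keys(filter).every((k) => doc[k] === filter[k]))
    .map((doc) => {
      const out = {};
      if (projection._id !== 0) out._id = doc._id;
      for (const key of Object.keys(projection)) {
        if (key !== "_id" && projection[key] === 1 && key in doc) {
          out[key] = doc[key];
        }
      }
      return out;
    });
}

const activeUsers = find(users, { status: "A" }, { user_id: 1, status: 1, _id: 0 });
console.log(activeUsers);
```

Setting _id: 0 matters because _id is the one field returned even when it is not listed in the projection.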

Common Operations - Update
SQL Update Statements
UPDATE users
SET status = "C"
WHERE age > 25
MongoDB Update Statements
db.users.update(
  { age: { $gt: 25 } },
  { $set: { status: "C" } },
  { multi: true }
)
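The role of multi: true is easy to miss: without it, the shell's update() changes only the first matching document. The sketch below imitates that behaviour on an in-memory array; updateWhereGreater and the sample data are hypothetical names used only for illustration.

```javascript
// Hypothetical in-memory stand-in for the "users" collection.
const accounts = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "cde002", age: 20, status: "A" },
];

// Apply `changes` to documents where doc[field] > threshold.
// With multi = false only the first match is changed, which is why the
// statement above passes { multi: true }.
function updateWhereGreater(docs, field, threshold, changes, multi) {
  let modified = 0;
  for (const doc of docs) {
    if (doc[field] > threshold) {
      Object.assign(doc, changes); // rough equivalent of $set
      modified += 1;
      if (!multi) break;
    }
  }
  return modified;
}

const modifiedCount = updateWhereGreater(accounts, "age", 25, { status: "C" }, true);
console.log(modifiedCount, accounts.map((a) => a.status));
```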

Common Operations - Delete
SQL Delete Statements
DELETE FROM users
WHERE status = "D"
MongoDB Delete Statements
db.users.remove( { status: "D" } )
SQL Delete Statements
DELETE FROM users
MongoDB Delete Statements
db.users.remove( { } )

Case Study: Designing A Product Catalog

Problem Overview
Product Catalog
Designing an e-commerce product catalog system using MongoDB as a storage engine
Product catalogs must have the capacity to store many different types of objects with different sets of attributes.

A Quick Look at Relational Approaches to this problem

Relational Data Models - 1
Concrete Table Inheritance: create a table for each product category
CREATE TABLE `product_audio_album` (
  `sku` char(8) NOT NULL,
  `artist` varchar(255) DEFAULT NULL,
  `genre_0` varchar(255) DEFAULT NULL,
  ...,
  PRIMARY KEY(`sku`))
...
CREATE TABLE `product_film` (
...
Downside:
• You must create a new table for every new category of products.
• You must explicitly tailor all queries for the exact type of product.

Single Table Inheritance: a single table for all products; add new columns to store data for a new product
CREATE TABLE `product` (
  ...
  `artist` varchar(255) DEFAULT NULL,
  ...
  `title` varchar(255) DEFAULT NULL,
  `rating` char(8) DEFAULT NULL,
  ...,
  PRIMARY KEY(`sku`))
• Downside: More flexible, but at the expense of space

Multiple Table Inheritance
CREATE TABLE `product` (
  `title` varchar(255) DEFAULT NULL,
  `price`, ...
  PRIMARY KEY(`sku`))
CREATE TABLE `product_audio_album` (
  ...,
  PRIMARY KEY(`sku`),
  FOREIGN KEY(`sku`) REFERENCES `product`(`sku`))
...
CREATE TABLE `product_film` (
...
Downside: More flexible and saves space, but JOINs are very expensive

Entity Attribute Values
Entity | Attribute | Value
sku_00e8da9b | type | Audio Album
sku_00e8da9b | title | A Love Supreme
sku_00e8da9b | ... | ...
sku_00e8da9b | artist | John Coltrane
sku_00e8da9b | genre | Jazz
sku_00e8da9b | genre | General
... | ... | ...
Downside: Totally flexible, but non-trivial queries need a large number of JOINs
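One way to see the cost of this model: every attribute is a separate row, so something has to reassemble each product before it can be used. The sketch below does that reassembly in plain JavaScript over the rows shown on the slide. The row object shape and the pivot helper are hypothetical, chosen only for illustration.

```javascript
// Rows as they might come back from the EAV table (values from the slide).
const eavRows = [
  { entity: "sku_00e8da9b", attribute: "type", value: "Audio Album" },
  { entity: "sku_00e8da9b", attribute: "title", value: "A Love Supreme" },
  { entity: "sku_00e8da9b", attribute: "artist", value: "John Coltrane" },
  { entity: "sku_00e8da9b", attribute: "genre", value: "Jazz" },
  { entity: "sku_00e8da9b", attribute: "genre", value: "General" },
];

// Group rows by entity. An attribute that repeats (genre) becomes an array.
function pivot(rows) {
  const entities = {};
  for (const { entity, attribute, value } of rows) {
    const doc = (entities[entity] = entities[entity] || {});
    if (!(attribute in doc)) {
      doc[attribute] = value;
    } else if (Array.isArray(doc[attribute])) {
      doc[attribute].push(value);
    } else {
      doc[attribute] = [doc[attribute], value];
    }
  }
  return entities;
}

const products = pivot(eavRows);
console.log(products["sku_00e8da9b"]);
```

The result is already shaped like a document, which is the form a document database stores directly.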

Non-relational Data Model
• Use a single MongoDB collection to store all the product data
• Dynamic schema means that each document need not conform to the same schema
• The document for each product only needs to contain attributes relevant to that product

So how does data look in MongoDB with the non-relational approach?

{
  sku: "00e8da9b",
  type: "Audio Album",
  title: "A Love Supreme",
  description: "by John Coltrane",
  asin: "B0000A118M",
  shipping: {
    …
  },
  pricing: {
    …
  },
  details: {
    …
  }
}
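As a rough illustration of the single-collection approach, the sketch below keeps two product types with different attribute sets in one array and queries them with one function. All of it is assumed for the example: the second product, the contents of each details object, and the findByType helper are hypothetical and are not the elided contents of the document above.

```javascript
// Hypothetical catalog: two product types with different attribute sets.
const catalog = [
  {
    sku: "00e8da9b",
    type: "Audio Album",
    title: "A Love Supreme",
    details: { artist: "John Coltrane", genre: ["Jazz", "General"] }, // assumed fields
  },
  {
    sku: "film-0001", // made-up SKU
    type: "Film",
    title: "Example Film",
    details: { director: "Example Director", runtimeMinutes: 120 }, // assumed fields
  },
];

// One query shape works for every product type; no per-type table is needed.
function findByType(docs, type) {
  return docs.filter((doc) => doc.type === type);
}

// Attributes a document does not have are simply absent (undefined),
// rather than NULL columns reserved in every row.
const albums = findByType(catalog, "Audio Album");
const directors = catalog.map((doc) => doc.details.director);
console.log(albums.length, directors);
```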

When to Choose MongoDB over RDBMS

2/17/2015
Best Practices for MongoDB
• NoSQL products (MongoDB among them) should be used to meet specific challenges.

High Write Load
• MongoDB by default prefers a high insert rate over transaction safety.
• Best suited to records with low individual business value
• Good examples are logs, streaming data, and bulk loads

High Availability in an Unreliable Environment
• Setting up a replica set (a set of servers acting as master and slaves) is easy and fast.
• Instant, automatic recovery from node (or data-center) failures

Growth in data size with time
• Partitioning tables is complicated in an RDBMS
• If your data is going to exceed a few GB per table, you should consider where you want to store it
• MongoDB provides a simple sharding mechanism to shard the data and horizontally scale your application

Location Based Service
• Use MongoDB if you store geo-locations and wish to perform proximity queries or related searches
• MongoDB geo queries are fast and accurate
• There are several use cases of geo-locations in production apps
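To show what a proximity query computes, here is a minimal sketch that filters and sorts places by great-circle distance using the haversine formula. MongoDB answers such queries with geospatial indexes and operators such as $near. This sketch is a plain linear scan over an in-memory array, and the near helper, place names, and approximate coordinates are assumptions made for the example.

```javascript
const EARTH_RADIUS_KM = 6371;

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

// Great-circle distance between two [longitude, latitude] points (haversine).
function distanceKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Return the places within maxKm of `center`, nearest first.
function near(places, center, maxKm) {
  return places
    .map((p) => ({ ...p, km: distanceKm(center, p.loc) }))
    .filter((p) => p.km <= maxKm)
    .sort((a, b) => a.km - b.km);
}

// Hypothetical places; coordinates are [longitude, latitude] and approximate.
const places = [
  { name: "Bengaluru", loc: [77.5946, 12.9716] },
  { name: "Secunderabad", loc: [78.4983, 17.4399] },
  { name: "Hyderabad", loc: [78.4867, 17.385] },
];

const nearby = near(places, [78.4867, 17.385], 50);
console.log(nearby.map((p) => p.name));
```

The coordinate order is longitude first, which is also the order MongoDB expects for coordinate pairs.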

Large data sets with Unstable schema
• When your data is reasonably large, it is complicated to change your schema
• When you work in an Agile model, your product can change shape dynamically
• MongoDB is schema-less

No Dedicated DBA!
• Complicated operations such as normalization and joins are avoided in MongoDB
• Backup and storage mechanisms are provided out of the box (MMS)

Scaling: Sharding
- Scale linearly as data grows
- Add more nodes
- Choose a shard key wisely
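Why the shard key matters can be sketched with a toy router that hashes the key and takes the result modulo the number of shards. This is only an illustration of hash-based routing: the hash function, shardFor, distribution, and the sample data are all hypothetical, and MongoDB's real mechanism (ranges or hashes of the shard key mapped to chunks) is more involved.

```javascript
// Simple deterministic string hash (a djb2 variant), for illustration only.
function hashKey(value) {
  let hash = 5381;
  for (const ch of String(value)) {
    hash = ((hash * 33) ^ ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Route a document to one of `shardCount` shards by hashing its shard key.
function shardFor(doc, shardKey, shardCount) {
  return hashKey(doc[shardKey]) % shardCount;
}

// Count how many documents land on each shard.
function distribution(docs, shardKey, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const doc of docs) counts[shardFor(doc, shardKey, shardCount)] += 1;
  return counts;
}

// Hypothetical data: 1000 users that all share the same status.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  user_id: "user" + i,
  status: "A",
}));

const byUserId = distribution(docs, "user_id", 4); // many distinct key values
const byStatus = distribution(docs, "status", 4); // a single key value
console.log(byUserId, byStatus);
```

A key with many distinct values spreads documents over all four shards, while a key with one value sends every document to the same shard, so adding nodes would not help.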

Scaling: Replica Sets
- Make your system highly available
- Read-only replicas for reporting help reduce load
- Read Consistency across Replicas

More Scaling?
- Capped Collections
- Use SSDs
- More RAM
- Faster cores rather than more cores (mongod not optimized for multi-core)
- Consider Aggregation framework for complex reports
- Text Search Support!

Real-world case study
• http://www.slideshare.net/oc666/mongodb-user-group-billrun
• BillRun is a next-generation open source billing solution that uses MongoDB as its data store.
• This billing system runs in production at the fastest-growing cellular operator in Israel, where it processes over 500M CDRs (call data records) each month.

Schema-less design
• Enables rapid introduction of new CDR types to the system.
• Lets BillRun keep the data store generic.

Scale
• The BillRun production site already manages several TB in a single table.
• Neither adding new fields nor data growth is a limiting factor

Rapid replicaSet
- Helps meet regulatory requirements with an easy-to-set-up multi-data-center DRP and HA solution.

Sharding
• Enables linear, scale-out growth without running out of budget.

Geo API
• Used to analyze users' usage and determine where to invest in cellular infrastructure

HuMongous
• With over 2,000 CDR inserts per second, MongoDB's architecture is a good fit for a system that must support a high insert load. Where transactional guarantees are needed, you can still get them with findAndModify (which is slower) and a two-phase commit implemented in the application.

References and further readings!
• MongoDB documentation: http://docs.mongodb.org/manual/
• Tutorials and certificate programs: https://education.10gen.com/
• References:
• http://java.dzone.com/articles/when-use-mongodb-rather-mysql
• http://www.mysqlperformanceblog.com/2013/08/01/schema-design-in-mongodb-vs-schema-design-in-mysql/

{
  Topic: "MongoDB By Example",
  Presenter: "Ritesh Gupta",
  Info: {
    Mail: ["ritesh.gupta@techvedika.com"],
    Designation: "Sr Architect",
    Company: "TechVedika",
    Url: "www.techvedika.com"
  }
}
