5. SQL
● Simplicity
● Uniform representation
● Runtime schema modifications
SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
ON e.BusinessEntityID = p.BusinessEntityID
WHERE 5000.00 IN
(SELECT Bonus
FROM Sales.SalesPerson AS sp
WHERE e.BusinessEntityID = sp.BusinessEntityID);
7. Strong consistency
SQL features like:
● Foreign and primary keys, unique fields
● ACID (atomicity, consistency, isolation, durability) transactions
● Business transactions ~ system transactions
19. Aggregate-oriented Databases
● Document databases implement the idea of an aggregate-oriented database.
● An aggregate is a storage atom.
● Aggregate-oriented databases are closer to the application domain.
● Operations within an aggregate are guaranteed to be atomic.
● An aggregate can be replicated or sharded efficiently.
● Major question: to embed or not to embed.
23. MongoDB Basics
MongoDB is a document-oriented DBMS
MongoDB is a client-server DBMS
JSON/JavaScript is the major language used to access it
MongoDB = Collections + Indexes
24. Collections
Name
Documents
Indexes
Two documents from the same collection might be completely different.
Simple creation (a collection is created implicitly on first insert).
25. Document
Identifier (_id)
Body in JSON (internally BSON)
{
"fullName" : "Fedor Buhankin",
"course" : 5,
"university" : "ONPU",
"faculty" : "IKS",
"_id" : { "$oid" : "5071c043cc93742e0d0e9cc7" },
"homeAddress" : "Ukraine, Odessa 23/34",
"averageAssessment" : 5,
"subjects" : [
"math",
"literature",
"drawing",
"psychology"
]
}
● Major building blocks: scalar values, maps, and lists
● Any part of the document can be indexed
● Max document size is 16 MB
37. Queries between collections
● Remember: MongoDB has no JOINs
● Approach 1: perform multiple queries (lazy loading)
● Approach 2: use the MapReduce framework
● Approach 3: use the Aggregation Framework
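Approach 1 can be sketched in plain JavaScript, with arrays standing in for the two collections (the customer/order names and fields below are hypothetical):

```javascript
// Sketch of approach 1: an application-side "join" via multiple queries.
// Plain arrays stand in for the customers and orders collections.
const customers = [
  { _id: 1, name: "Kaktus" },
  { _id: 2, name: "Medvedev" }
];
const orders = [
  { _id: 10, customerId: 1, total: 50 },
  { _id: 11, customerId: 1, total: 20 },
  { _id: 12, customerId: 2, total: 99 }
];

// Query 1 fetches the customer; query 2 lazily fetches its orders.
function findCustomerWithOrders(customerId) {
  const customer = customers.find(c => c._id === customerId);
  const customerOrders = orders.filter(o => o.customerId === customerId);
  return { ...customer, orders: customerOrders };
}

console.log(findCustomerWithOrders(1).orders.length); // 2
```

Against a real database the two lookups would be two round trips, which is the price paid for the missing JOIN.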
38. MapReduce Framework
● Used to perform complex grouping over collection documents
● Able to operate over multiple collections
● Uses the MapReduce pattern
● Uses the JavaScript language
● Supports a sharded environment
● The result is similar to a materialized view
39. Map Reduce Concept
[Diagram: every input element a1..an is passed through map, producing b1..bn;
reduce then folds the mapped values b1..bn into a single result c.]
f_map : A → B      f_reduce : B[ ] → C
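The concept above, sketched as plain JavaScript over an array; squaring and summing are arbitrary example choices for f_map and f_reduce:

```javascript
// f_map : A → B, f_reduce : B[] → C, applied to a1..an.
const input = [1, 2, 3, 4, 5, 6];                        // a1..an
const fMap = a => a * a;                                 // A → B
const fReduce = bs => bs.reduce((acc, b) => acc + b, 0); // B[] → C

const mapped = input.map(fMap);  // b1..bn
const result = fReduce(mapped);  // c
console.log(result); // 91
```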
40. How it works
[Diagram:
Input: collection X; implement the MAP and REDUCE functions.
MAP step: execute the MAP function, marking each document with a specific color (its grouping key).
REDUCE step: execute the REDUCE function, merging each colored set into a single element.
Output: the merged results.]
41. Count the number of orders for each customer
db.customers_orders.remove();
mapUsers = function() {
  emit( this.customerId, {count: 1, customerId: this.customerId} );
};
reduce = function(key, values) {
  var result = {count: 0, customerId: key};
  values.forEach(function(value) {
    result.count += value.count;
  });
  return result;
};
db.customers.mapReduce(mapUsers, reduce,
  {"out": {"replace": "customers_orders"}});
Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
42. Aggregation and Aggregation Framework
● Simplifies the most common MapReduce operations, such as grouping by criteria
● Restriction on the pipeline result size is 16 MB
● Supports a sharded environment (Aggregation Framework only)
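Grouping by criteria, the operation the framework simplifies, sketched in plain JavaScript; the orders data is hypothetical, and the comment shows a rough mongo-shell equivalent:

```javascript
// Counting documents per customerId, i.e. what a $group stage expresses.
// Rough mongo-shell equivalent:
//   db.orders.aggregate([{ $group: { _id: "$customerId", count: { $sum: 1 } } }]);
const orders = [
  { customerId: 1 }, { customerId: 1 }, { customerId: 2 }
];

const counts = {};
for (const order of orders) {
  counts[order.customerId] = (counts[order.customerId] || 0) + 1;
}
console.log(counts); // { '1': 2, '2': 1 }
```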
45. Access via API
Use Official MongoDB Java Driver (just include mongo.jar)
Mongo m = new Mongo();
// or
Mongo m = new Mongo( "localhost" );
// or
Mongo m = new Mongo( "localhost" , 27017 );
// or, to connect to a replica set, supply a seed list of members
Mongo m = new Mongo(Arrays.asList(new ServerAddress("localhost", 27017),
new ServerAddress("localhost", 27018),
new ServerAddress("localhost", 27019)));
DB db = m.getDB( "mydb" );
DBCollection coll = db.getCollection("customers");
ArrayList list = new ArrayList();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc= new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);
46. Closer to the Domain model
● Morphia http://code.google.com/p/morphia/
● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb
Major features:
● Type-safe, POJO-centric model
● Annotation-based mapping behavior
● Good performance
● DAO templates
● Simple criteria API
47. Example with Morphia
@Entity("Customers")
class Customer {
@Id ObjectId id; // auto-generated, if not set (see ObjectId)
@Indexed String name; // value types are automatically persisted
List<Address> billingAddress; // by default fields are @Embedded
Key<Customer> bestFriend; // reference to an external document
@Reference List<Customer> partners = new ArrayList<Customer>(); // refs are
                                     // stored and loaded automatically
// ... getters and setters
// Lifecycle methods -- Pre/PostLoad, Pre/PostPersist...
@PostLoad void postLoad(DBObject dbObj) { ... }
}
Morphia morphia = new Morphia();
morphia.map(Customer.class);
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus", ...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();
48. To embed or not to embed
● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
● Embedded documents are good when you want the entire document and its size is predictable. Embedded documents give the best read performance.
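The two options side by side, as hypothetical customer/order documents:

```javascript
// Embedded: orders live inside the customer document, so one read fetches
// the whole aggregate and updates to it are atomic.
const embeddedCustomer = {
  _id: 1,
  name: "Kaktus",
  orders: [ { total: 50 }, { total: 20 } ]
};

// Referenced: orders are separate documents pointing back at the customer,
// which suits orders that are queried on their own or grow without bound.
const customer = { _id: 1, name: "Kaktus" };
const orders = [
  { _id: 10, customerId: 1, total: 50 },
  { _id: 11, customerId: 1, total: 20 }
];

console.log(embeddedCustomer.orders.length === orders.length); // true
```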
49. Schema migration
● Schemaless
● Main focus is how the application will behave when a new field has been added
● Incremental migration technique (version field)
Use cases:
– removing a field
– renaming fields
– refactoring an aggregate
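The incremental migration technique can be sketched as follows: each document carries a version field, and the application upgrades a document lazily when it reads it. The field names and the v1-to-v2 rename below are hypothetical.

```javascript
// Lazy, incremental schema migration keyed on a schemaVersion field.
const migrations = {
  // v1 -> v2: rename "name" to "fullName"
  1: function(doc) {
    doc.fullName = doc.name;
    delete doc.name;
    doc.schemaVersion = 2;
    return doc;
  }
};

// Apply every pending migration in order until the document is current.
function upgrade(doc) {
  while (migrations[doc.schemaVersion]) {
    doc = migrations[doc.schemaVersion](doc);
  }
  return doc;
}

const oldDoc = { schemaVersion: 1, name: "Fedor Buhankin" };
console.log(upgrade(oldDoc).fullName); // "Fedor Buhankin"
```

Because documents are upgraded one at a time on read, old and new versions can coexist in the same collection during the migration window.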
50. Data Consistency
● Transactional consistency
– domain design should take aggregate atomicity into account
● Replication consistency
– take the inconsistency window into account (sticky sessions)
● Eventual consistency
● Accept the CAP theorem
– it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.
52. Scaling options
● Autosharding
● Master-Slave replication
● Replica Set clusterization
● Sharding + Replica Set
53. Sharding
● MongoDB supports autosharding
● Just specify the shard key and pattern
● Sharding increases write throughput
● The major way of scaling the system
54. Master-Slave replication
● One master, many slaves
● Slaves might be hidden or can be used for reads
● Master-Slave replication increases read throughput and provides reliability
55. Replica Set clusterization
● The replica set automatically elects a primary (master)
● The master shares the same state with all replicas
● Limitation: at most 12 nodes
● WriteConcern option
● Benefits:
– failover and reliability
– distributing the read load
– maintenance without downtime
57. MongoDB Criticism
● Reports of data loss under heavy-write configurations
● No atomic operations over multiple documents
When not to use:
● Heavy cross-document atomic operations
● Queries against a varying aggregate structure
58. Tips
● Do not use auto-increment ids
● Short names are preferred
● By default DAO methods are async
● Think twice about collection design
● Use atomic modifications for a document
59. Out of scope
● MapReduce options
● Indexes
● Capped collections