2. Speaker
Software architect/developer Pronetics/Sourcesense
Founder Spring Italian User Group
Chairman JugSardegna
Committer/Contributor OpenNMS - MongoDB
Author Spring 2.5 Aspect Oriented Programming
Massimiliano Dessì – desmax74@yahoo.it – Jug Sardegna
4. Some MongoDB production deployments
http://www.mongodb.org/display/DOCS/Production+Deployments
5. Main Features
● Document Oriented
- Documents (objects) map nicely to programming language data types
- Dynamically-typed (schemaless) for easy schema evolution
- No joins and no transactions for high performance and easy scalability
● High Performance
- No joins and no transactions make reads and writes fast
- Indexes with indexing into embedded documents and arrays
- Optional asynchronous writes
● Easy scalability
- "slaveOK" reads are distributed over replicated servers
- Automatic sharding (auto-partitioning of data across servers)
- Reads and writes are distributed over shards
- No joins and no transactions make distributed queries easy and fast
● High Availability
- Replicated servers with automatic master failover
● Indexing
● Stored JavaScript
● Rich Query Language
● Fixed-size collection
● File storage
● MapReduce
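6. No SQL injection
Queries are built as BSON objects rather than parsed from strings, so classic SQL-style string injection does not apply.
Take care when passing user input to server-side JavaScript such as $where.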
7. Document as basic unit of data in BSON (Binary JSON) format
Document: an ordered set of keys with associated values
{ "_id" : "024x6f279578a64bb0666945",
  "name" : "MongoDB",
  "info" : { "storage" : "Binary JSON (BSON)",
             "full index" : "true",
             "scale" : "Autosharding",
             "query" : "Rich document-based queries",
             "replication" : "Replica sets",
             "atomic modifiers" : "Fast in place update",
             "binary content" : "GridFS",
             "batch operation" : "Map/Reduce",
             "js server side" : "true"
  },
  "greeting" : { "international" : "Hello, world!", "italy" : "Ciao Mondo !" }
}
8. Grouping
SQL
Table contains Rows
MONGO
Collections and subcollections contain documents
* Document limit: a document cannot be larger than 4 MB; for comparison, the entire text of War and Peace is 3.14 MB
Collections are created dynamically and automatically grow in size to fit additional data
9. A single instance of MongoDB can host
multiple independent databases, each of
which can have its own collections
and permissions.
Photo from http://www.aibento.net/
10. No join cost
● SQL
SELECT * FROM posts
INNER JOIN posts_tags ON posts.id = posts_tags.post_id
INNER JOIN tags ON posts_tags.tag_id = tags.id
WHERE tags.text = 'politics' AND posts.vote_count > 10;
● MONGO
db.posts.find({'tags': 'politics', 'vote_count': {'$gt': 10}});
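The Mongo query above needs no join because each post embeds its own tags. The following is a minimal plain-JavaScript sketch of that matching rule, runnable without a MongoDB server; the sample posts and the findPoliticsPosts helper are hypothetical and only illustrate the idea.

```javascript
// Hypothetical sample data: each post embeds its tags as an array.
const posts = [
  { title: "Elections", tags: ["politics", "news"], vote_count: 25 },
  { title: "Budget", tags: ["politics"], vote_count: 3 },
  { title: "Derby", tags: ["sport"], vote_count: 40 }
];

// Same intent as {'tags': 'politics', 'vote_count': {'$gt': 10}}:
// a scalar compared with an array field matches if any element is equal.
function findPoliticsPosts(docs) {
  return docs.filter(function (doc) {
    return doc.tags.indexOf("politics") !== -1 && doc.vote_count > 10;
  });
}

const matched = findPoliticsPosts(posts).map(function (doc) {
  return doc.title;
});
console.log(matched);
```

Only the first post satisfies both conditions; the second has too few votes and the third lacks the tag.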
11. Collections Schema free
Documents within a single collection can have any number of different "shapes".
In theory, each document in a collection
can have a completely different structure;
in practice, a collection's documents
will be relatively uniform.
13. Common operations
Fastest : Fire and Forget
Command with response : getLastError
Examples with
Server-side JavaScript via mongo shell
Java via MongoDB Official 10gen Driver
Scala via Casbah Official 10gen Scala driver + Salat serializer
16. Updating (the schema can be changed dynamically)
//mongo shell
var hero = db.dcComicsColl.findOne({"alias" : "batman"});
hero.gadget = {"car" : "batmobile"};
db.dcComicsColl.update({"alias" : "batman"}, hero, true);
//java
DBObject query = BasicDBObjectBuilder.start().add("surname", "wayne").get();
DBObject hero = BasicDBObjectBuilder.start().add("gadget", "batmobile").get();
dcComicsCollection.update(query, hero, false, true);
//scala
val query = MongoDBObject("name" -> "bruce")
val hero = MongoDBObject("gadget" -> "batmobile")
dcComicsCollection.update(query, hero, true, false)
* The third argument is the upsert flag: update, or insert if not present
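To make the upsert flag concrete, here is a minimal sketch that mimics "update, or insert if not present" on a plain array. It is not the driver's implementation: the upsertInto function and the in-memory collection are hypothetical stand-ins, and only exact-match queries on top-level keys are handled.

```javascript
// Hypothetical stand-in for collection.update(query, document, upsert).
function upsertInto(collection, query, replacement, upsert) {
  // A document matches when every key in the query has an equal value.
  const index = collection.findIndex(function (doc) {
    return Object.keys(query).every(function (key) {
      return doc[key] === query[key];
    });
  });
  if (index !== -1) {
    collection[index] = replacement; // found: replace the document
    return "updated";
  }
  if (upsert) {
    collection.push(replacement); // not found and upsert requested: insert
    return "inserted";
  }
  return "no-op"; // not found and no upsert: nothing changes
}

const heroes = [{ alias: "batman" }];
console.log(upsertInto(heroes, { alias: "batman" }, { alias: "batman", gadget: "batmobile" }, true));
console.log(upsertInto(heroes, { alias: "robin" }, { alias: "robin" }, false));
console.log(upsertInto(heroes, { alias: "robin" }, { alias: "robin" }, true));
```

The three calls show the three possible outcomes in turn: an update, no change, and an insert.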
17. Querying
//mongo shell
db.dcComicsColl.find({"alias":"batman"}) // all fields of the document
db.dcComicsColl.find({"alias":"batman"},{"surname":1}) // only the surname field (plus _id)
//java
DBObject query = BasicDBObjectBuilder.start().add("alias", "batman").get();
DBObject out = BasicDBObjectBuilder.start().add("surname", 1).get();
DBCursor cursor = dcComicsColl.find(query, out);
//scala
val query = MongoDBObject("alias" -> "batman")
val obj = dcComicsColl.findOne(query)
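18. Modifiers
Partial updates
$set (set a key, or add it if not present)
$inc (increment a number)
$push (add to the end of an array)
$ne (in the query part, to update only if a value is not already present)
$addToSet (add to an array only if not already present)
$each (with $addToSet, add several values at once)
$pop (remove from the end of an array)
$pull (remove elements that match criteria)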
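The effect of these modifiers can be sketched in plain JavaScript. The applyModifiers function below is a hypothetical illustration, not MongoDB's implementation: it handles only top-level keys, and $pull is limited to simple equality.

```javascript
// Hypothetical helper: returns a modified copy, leaving the input untouched.
function applyModifiers(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy of plain data
  Object.keys(update).forEach(function (op) {
    Object.keys(update[op]).forEach(function (key) {
      const arg = update[op][key];
      const current = result[key];
      switch (op) {
        case "$set": // set the key, adding it if missing
          result[key] = arg;
          break;
        case "$inc": // increment a number, starting from 0 if missing
          result[key] = (current || 0) + arg;
          break;
        case "$push": // append to the end of an array
          result[key] = (current || []).concat([arg]);
          break;
        case "$addToSet": { // append only values not already present
          const values = arg && arg.$each ? arg.$each : [arg];
          const set = current || [];
          values.forEach(function (v) {
            if (set.indexOf(v) === -1) set.push(v);
          });
          result[key] = set;
          break;
        }
        case "$pop": // 1 removes the last element, -1 the first
          if (arg === -1) current.shift(); else current.pop();
          break;
        case "$pull": // remove every element equal to the argument
          result[key] = current.filter(function (v) { return v !== arg; });
          break;
        default:
          throw new Error("unsupported modifier: " + op);
      }
    });
  });
  return result;
}

const heroDoc = { alias: "batman", appearances: 1, gadgets: ["batarang"] };
console.log(applyModifiers(heroDoc, {
  $set: { surname: "wayne" },
  $inc: { appearances: 2 },
  $push: { gadgets: "batmobile" }
}));
```

Each modifier changes only the keys it names, which is what makes these updates partial rather than whole-document replacements.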
22. Query on array and grouping
Array inside a document
$all $size $slice
Grouping
count distinct group finalize $key
23. JavaScript as part of a query
db.mycollection.find({
"$where" : function (){
for (var current in this) {
….
}
}
})
25. GridFS
GridFS is a specification for storing large files such as videos, photos, and blobs in MongoDB.
GridFS uses two collections to store data:
● files contains the object metadata
● chunks contains the binary chunks with some additional accounting information
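The split between the two collections can be sketched as follows. This is an illustration only: splitIntoChunks is a hypothetical helper, the chunk size is deliberately tiny, and the documents carry just a few of the fields a real driver would write (files_id, n and data on each chunk; length and chunkSize on the file entry).

```javascript
// Hypothetical helper: builds one "files" document and its "chunks" documents.
function splitIntoChunks(fileId, filename, content, chunkSize) {
  const filesDoc = {
    _id: fileId,
    filename: filename,
    length: content.length, // total size in bytes
    chunkSize: chunkSize
  };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < content.length; offset += chunkSize, n += 1) {
    chunkDocs.push({
      files_id: fileId, // links the chunk back to its file entry
      n: n,             // position of the chunk within the file
      data: content.subarray(offset, offset + chunkSize)
    });
  }
  return { files: filesDoc, chunks: chunkDocs };
}

const stored = splitIntoChunks(1, "hello.txt", Buffer.from("hello gridfs"), 5);
console.log(stored.files.length, stored.chunks.length);
```

Reading the file back amounts to fetching the chunks for a given files_id, ordering them by n, and concatenating their data.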
26. Map Reduce
map = function() {
    for (var key in this) {
        emit(key, {count : 1});
    }
}
reduce = function(key, emits) {
    var total = 0;
    for (var i in emits) {
        total += emits[i].count;
    }
    return {"count" : total};
}
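To see what this pair of functions computes, the sketch below runs the same logic over two sample documents in plain JavaScript. The runMapReduce driver is hypothetical and much simpler than the server's: emit is passed to the map function as an argument instead of being a global, and reduce is called once per key that has more than one emitted value.

```javascript
// Same logic as the map function above, with emit passed in explicitly.
function mapKeys(emit) {
  for (const key in this) {
    emit(key, { count: 1 });
  }
}

// Same logic as the reduce function above.
function reduceCounts(key, emits) {
  let total = 0;
  for (const i in emits) {
    total += emits[i].count;
  }
  return { count: total };
}

// Hypothetical driver: groups emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const emitted = Object.create(null);
  const emit = function (key, value) {
    if (!emitted[key]) emitted[key] = [];
    emitted[key].push(value);
  };
  docs.forEach(function (doc) {
    map.call(doc, emit); // "this" is the current document
  });
  const result = {};
  Object.keys(emitted).forEach(function (key) {
    const values = emitted[key];
    result[key] = values.length > 1 ? reduce(key, values) : values[0];
  });
  return result;
}

const keyCounts = runMapReduce(
  [{ _id: 1, alias: "batman" }, { _id: 2, alias: "robin", mentor: "batman" }],
  mapKeys,
  reduceCounts
);
console.log(keyCounts);
```

The result counts how many documents contain each key, which is a quick way to inspect the shapes present in a schema-free collection.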
27. Scaling
28. Scaling
29. Shard
Sharding = breaking up collections into smaller chunks
Splitting data and storing different portions of it on different machines, also
known as partitioning
The chunks can be distributed across shards so that each shard is responsible for a
subset of the total data set.
A shard is a container that holds a subset of a collection’s data. A shard is either a
single mongod server (for development/testing) or a replica set (for production).
Thus, even if there are many servers in a shard, there is only one master, and all of
the servers contain the same data.
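The relationship between chunks and shards can be sketched with a small lookup table. Everything here is hypothetical: the chunk boundaries, the shard names, and the shardFor helper are invented for illustration, and the key ranges assume lowercase string keys.

```javascript
// Hypothetical chunk table: each chunk covers the key range [min, max)
// and is assigned to one shard. "{" sorts immediately after "z".
const chunkTable = [
  { min: "a", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "{", shard: "shard0" }
];

// Find the shard responsible for a given shard-key value.
function shardFor(key) {
  const chunk = chunkTable.find(function (c) {
    return key >= c.min && key < c.max;
  });
  return chunk ? chunk.shard : null;
}

console.log(shardFor("batman"), shardFor("joker"), shardFor("robin"));
```

Because several chunks can live on the same shard, rebalancing is a matter of reassigning chunks in the table rather than re-partitioning all the data.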
30. Mongos
The client does not know which shard holds which data, or even that the data is broken up across
different shards. In front of the shards runs a routing process called mongos.
This router knows where the data is located, so the client sees what looks like a normal
mongod, as in an unsharded environment.
mongos is the router process and comes with all MongoDB distributions. It basically
just routes requests and aggregates responses. It doesn't store any data or configuration
information (although it does cache information from the config servers).
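The "routes requests and aggregates responses" behaviour can be illustrated with a toy router. This is a simplified sketch, not how mongos is implemented: the shardData object and the routedFind function are hypothetical, and the query is sent to every shard rather than only to the shards that could hold matching data.

```javascript
// Hypothetical shards, each holding part of the collection.
const shardData = {
  shard0: [
    { alias: "batman", city: "gotham" },
    { alias: "flash", city: "central city" }
  ],
  shard1: [{ alias: "robin", city: "gotham" }]
};

// Send the query to each shard and merge the partial results,
// so the caller sees one result set as if from a single server.
function routedFind(predicate) {
  return Object.keys(shardData).reduce(function (merged, shardName) {
    return merged.concat(shardData[shardName].filter(predicate));
  }, []);
}

const gothamHeroes = routedFind(function (doc) {
  return doc.city === "gotham";
}).map(function (doc) {
  return doc.alias;
});
console.log(gothamHeroes);
```

The caller never names a shard, which mirrors the point that the client talks to the router as if it were a normal mongod.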
31. Config server
Config servers store the configuration of the cluster: which data is on which shard.
Because mongos doesn’t store anything permanently, it needs somewhere to get
the shard configuration. It syncs this data from the config servers.
32. Replication
Replica set, clustering with automatic failover
Master is elected by the cluster and may change to another node if the current
master goes down.
This election process will be initiated by any node that cannot reach the primary.
The highest-priority most-up-to-date server will become the new primary.
Replication is asynchronous.
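The selection rule can be sketched as a small function. This is an assumption-laden illustration, not the actual election protocol: the member fields (reachable, priority, optime) and the electPrimary helper are hypothetical, "most up to date" is read as having the latest applied operation time, priority breaks ties among those members, and voting is not modelled at all.

```javascript
// Hypothetical election: among reachable members, keep the most up to date,
// then pick the one with the highest priority.
function electPrimary(members) {
  const candidates = members.filter(function (m) {
    return m.reachable && m.priority > 0; // priority 0 members never become primary
  });
  if (candidates.length === 0) return null;
  const newest = Math.max.apply(null, candidates.map(function (m) {
    return m.optime;
  }));
  const upToDate = candidates.filter(function (m) {
    return m.optime === newest;
  });
  upToDate.sort(function (a, b) {
    return b.priority - a.priority;
  });
  return upToDate[0].name;
}

const replicaSet = [
  { name: "node1", priority: 3, optime: 100, reachable: false }, // failed master
  { name: "node2", priority: 1, optime: 100, reachable: true },
  { name: "node3", priority: 2, optime: 100, reachable: true },
  { name: "node4", priority: 5, optime: 90, reachable: true }    // lagging behind
];
console.log(electPrimary(replicaSet));
```

Here node4 has the highest priority but has not caught up, so the choice falls on node3, the highest-priority member among those with the latest data. Because replication is asynchronous, such lag between members is expected.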