MongoDB for Coder
Uwe Seiler
About me

Big Data Nerd

Hadoop Trainer MongoDB Author

Photography Enthusiast

About us
is a bunch of…

Big Data Nerds

Agile Ninjas

Continuous Delivery Gurus

Join us!
Enterprise Java Specialists Performance Geeks
Agenda I
1. Introduction to NoSQL & MongoDB
2. Data manipulation: Learn how to CRUD with MongoDB
3. Indexing: Speed up your queries with MongoDB
4. MapReduce: Data aggregation with MongoDB
5. Aggregation Framework: Data aggregation done the MongoDB way
6. Replication: High Availability with MongoDB
7. Sharding: Scaling with MongoDB




Live Coding




Labs on your own computer
And please…

If you have
questions, please
share them with us!
And now start your downloads…

Lab files:
Buzzword Bingo
Classification of NoSQL
Key-Value Stores










Column Stores


Graph Databases





Document Stores
Big Data
My favorite definition
The classic definition

The 3 V’s of Big Data

Volume • Velocity • Variety
«Big Data» != Hadoop
Vertical Scaling


Horizontal Scaling















The problem
The CAP Theorem
Availability: a guarantee that every request receives a response

Consistency: all nodes see the same data at the same time

Partition tolerance: failure of single nodes doesn't affect the overall system
Overview of NoSQL systems

The problem



Atomicity RDBMS

ACID is a good
concept but it is not
a written law!

Basically Available
Soft State


Eventually consistent




Strong consistency
Isolation & Transactions
Complex Development
More reliable

Eventual consistency
Highly Available
Eases development
Overview of MongoDB
MongoDB is a…



open source


highly performant






highly available



Document Database

Not PDF, Word, etc. … JSON!
Open Source Database

MongoDB is an open source project


Available on GitHub


Uses the AGPL license


Started and sponsored by MongoDB Inc.
(prior: 10gen)


Commercial version and support available


Join the crowd!



Flexible Schema

_id :
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: "C",
benefits : [
{ type : "Health",
plan : "PPO Plus" },
{ type :
plan : "Standard" }
Auto Sharding

• Increase capacity as you go
• Commodity and cloud architectures
• Improved operational simplicity and cost visibility
High Availability

• Automated replication and failover
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency
MongoDB Architecture
Rich Query Language
Aggregation Framework




Shard 1
Shard 2


Reduce(k, values)

Finalize(k, v)
Geo Information
Driver & Shell
Drivers are available
for almost all popular
languages and




Shell to interact with the




> db.collection.insert({ product: "MongoDB",
type: "Document Database" })
> db.collection.findOne()
{ "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
"product" : "MongoDB",
"type" : "Document Database" }
NoSQL Trends
Google Search

LinkedIn Job Skills
Competitor 1
Competitor 2
Competitor 3
Competitor 4
Competitor 5


Competitor 2

Competitor 1

Competitor 4

Competitor 3

All Others

Jaspersoft Big Data Index Trends
Top Job Trends

Direct Real-Time Downloads
Competitor 1
Competitor 2
Competitor 3

1.HTML 5
5.Mobile Apps
10.Social Media
Data manipulation
Table / View
Foreign Key


Embedded document
Referenced document
Example: Simple blog model
MongoDB Collections







Schema design for the blog
Let’s have a look…
Create a database
// Show all databases
> show dbs
digg 0.078125GB
enron 1.49951171875GB
// Switch to a database
> use blog
// Show all databases again
> show dbs
digg 0.078125GB
enron 1.49951171875GB
Create a collection I
// Show all collections
> show collections
// Insert a user
> db.user.insert(
{ name : "Sheldon",
mail : "" } )

No feedback about the result of the insert; use
db.runCommand( { getLastError: 1 } ) to check it
Create a collection II
// Show all collections
> show collections
// Show all databases
> show dbs
blog 0.0625GB
digg 0.078125GB
enron 1.49951171875GB

Databases and collections are
automatically created during
the first insert operation!
Read from a collection
// Show the first document
> db.user.findOne()
"_id" : ObjectId("516684a32f391f3c2fcb80ed"),
"name" : "Sheldon",
"mail" : ""
// Show all documents of a collection
> db.user.find()
"_id" : ObjectId("516684a32f391f3c2fcb80ed"),
"name" : "Sheldon",
"mail" : ""
Find documents
// Find a specific document
> db.user.find( { name : "Penny" } )
"_id" : ObjectId("5166a9dc2f391f3c2fcb80f1"),
"name" : "Penny",
"mail" : ""
// Show only certain fields of the document
> db.user.find( { name : "Penny" },
{_id: 0, mail : 1} )
{ "mail" : "" }

_id is the primary key in MongoDB


_id is created automatically


If not specified otherwise, its type is ObjectId


_id can be specified by the user during the
insert of documents, but needs to be
unique (and can not be edited afterwards)

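An ObjectId is a special 12-byte value


Its uniqueness across the whole cluster is
guaranteed by its structure:
ts (timestamp, 4 bytes) | mac (machine id, 3 bytes) | pid (process id, 2 bytes) | inc (counter, 3 bytes)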
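The structure of an ObjectId can be explored without a database. The following is a minimal sketch in plain JavaScript, assuming the legacy 12-byte layout (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter); `parseObjectId` is a hypothetical helper, not part of the MongoDB shell.

```javascript
// Sketch: split a 24-character ObjectId hex string into its parts.
// Assumption: legacy layout of 4 + 3 + 2 + 3 bytes.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    timestampSeconds: parseInt(hex.slice(0, 8), 16), // seconds since the Unix epoch
    machine: hex.slice(8, 14),                       // machine identifier (hex)
    pid: parseInt(hex.slice(14, 18), 16),            // process id
    counter: parseInt(hex.slice(18, 24), 16),        // incrementing counter
  };
}

// The _id of the "Sheldon" document shown earlier in the deck
const parts = parseObjectId("516684a32f391f3c2fcb80ed");
console.log(new Date(parts.timestampSeconds * 1000).toISOString());
console.log(parts);
```

Because the timestamp comes first, ObjectIds created later sort after earlier ones (to a resolution of one second).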
// Use a cursor with find()
> var myCursor = db.user.find( )
// Get the next document
> var myDocument =
myCursor.hasNext() ? myCursor.next() : null;
> if (myDocument) { printjson(myDocument.mail); }
// Show all other documents
> myCursor.forEach(printjson);

By default the shell displays
20 documents
Logical operators
// Find documents using OR
> db.user.find(
{$or : [ { name : "Sheldon" },
{ mail : }
// Find documents using AND
> db.user.find(
{$and : [ { name : "Sheldon" },
{ mail : }
Manipulating results
// Sort documents
> db.user.find().sort( { name : 1 } ) // Ascending
> db.user.find().sort( { name : -1 } ) // Descending
// Limit the number of documents
> db.user.find().limit(3)
// Skip documents
> db.user.find().skip(2)
// Combination of both methods
> db.user.find().skip(2).limit(3)
Updating documents I
// Updating only the mail address (How not to do…)
> db.user.update( { name : "Sheldon" },
{ mail : "" } )
// Result of the update operation
"_id" : ObjectId("516684a32f391f3c2fcb80ed"),
"mail" : ""

Be careful when updating
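Why the update above loses the name field can be modelled in plain JavaScript. This is a sketch under the assumption that an update without operators replaces the whole document except `_id`; `replaceDocument` and `applySet` are hypothetical helpers and the mail addresses are placeholders.

```javascript
// Model of an update WITHOUT operators: the document is replaced,
// only _id survives.
function replaceDocument(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// Model of an update WITH $set: only the listed fields change.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

const sheldon = { _id: 1, name: "Sheldon", mail: "old@example.org" };

const replaced = replaceDocument(sheldon, { mail: "new@example.org" });
const patched = applySet(sheldon, { mail: "new@example.org" });

console.log(replaced); // the name field is gone
console.log(patched);  // the name field is still there
```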
Deleting documents
// Deleting a document
> db.user.remove(
{ mail : "" } )
// Deleting all documents in a collection
> db.user.remove()
// Use a condition to delete documents
> db.user.remove(
{ mail : /.*$/ } )
// Delete only the first document using a condition
> db.user.remove( { mail : /.*.com$/ }, true )
Updating documents II
// Updating only the mail address (This time for real)
> db.user.update( { name : "Sheldon" },
{ $set : {
mail : "" } } )
// Show the result of the update operation
> db.user.find( { name : "Sheldon" } )
"_id" : ObjectId("5166ba122f391f3c2fcb80f5"),
"mail" : "",
"name" : "Sheldon"
Adding to arrays
// Adding an array
> db.user.update( { name : "Sheldon" },
{ $set : { enemies :
[ { name : "Wil Wheaton" },
{ name : "Barry Kripke" } ] } } )
// Adding a value to the array
> db.user.update( { name : "Sheldon" },
{ $push : { enemies :
{ name : "Leslie Winkle" } } } )
Deleting from arrays
// Deleting a value from an array
> db.user.update( { name : "Sheldon" },
{ $pull : { enemies :
{ name : "Barry Kripke" } } } )
// Deleting a complete array
> db.user.update( { name : "Sheldon" },
{ $unset : { enemies : 1 } } )
Adding a subdocument
// Adding a subdocument to an existing document
> db.user.update( { name : "Sheldon" }, {
$set : { mother : { name : "Mary Cooper",
residence : "Galveston, Texas",
religion : "Evangelical Christian" } } } )
"_id" : ObjectId("5166cf162f391f3c2fcb80f7"),
"mail" : "",
"mother" : {
"name" : "Mary Cooper",
"residence" : "Galveston, Texas",
"religion" : "Evangelical Christian"
"name" : "Sheldon"
Querying subdocuments
// Finding out the name of the mother
> db.user.find( { name : "Sheldon" },
{ "mother.name" : 1 } )
"_id" : ObjectId("5166cf162f391f3c2fcb80f7"),
"mother" : {
"name" : "Mary Cooper"

Compound field names need to
be in quotes!
Overview of all update operators
For fields:
Bitwise operation:

For arrays:
$each (Modifier)
$slice (Modifier)
$sort (Modifier)



Lab time!

Lab Nr. 02
Time box:
20 min
What is an index?





Chained lists








Find Nr. 7 in the chained list!






Find Nr. 7 in a tree!

Indices in MongoDB are B-Trees
Find, Insert and Delete Operations:

Missing or non-optimal
indices are the single most avoidable
performance issue
How do I create an index?
// Create a non-existing index for a field
>{ main_ingredient: 1 })

// Make sure there is an index on the field
>{ main_ingredient: 1 })

* 1 for ascending, -1 for descending
What can be indexed?
// Multiple fields (Compound Key Indexes)
main_ingredient: 1,
calories: -1
// Arrays with values (Multikey Indexes)
name: 'Chicken Noodle Soup',
ingredients : ['chicken', 'noodles']
>{ ingredients: 1 })
What can be indexed?
// Subdocuments
name : 'Apple Pie',
contributor: {
name: 'Joe American',
id: 'joea123'
}{ '': 1 }){ 'contributor': 1 })
How to maintain indices?
// List all indices of a collection

// Drop an index
>{ ingredients: 1 })

// Drop and recreate all indices of a collection
More options

Unique Index
– Allows only unique values in the indexed field(s)


Sparse Index
– For fields that are not available in all documents


Geospatial Index
– For modelling 2D and 3D geospatial indices


TTL Collections
– Are automatically deleted after x seconds
Unique Index
// Make sure the name of a recipe is unique
> { name: 1 }, { unique: true } )

// Force an index on a collection with non-unique values
// Duplicates will be deleted more or less randomly!
{ name: 1 },
{ unique: true, dropDups: true }

* dropDups should be used only with caution!
Sparse Index
// Only documents with the field calories will be indexed
{ calories: -1 },
{ sparse: true }
// Combination with unique index is possible
{ name: 1 , calories: -1 },
{ unique: true, sparse: true }
* Missing fields will be saved as null in the index!
Geospatial Index
// Add latitude and longitude
name: 'codecentric Frankfurt',
loc: [ 50.11678, 8.67206 ]
// Index the 2D coordinates
> db.locations.ensureIndex( { loc : '2d' } )

// Find locations near codecentric Frankfurt
> db.locations.find({
loc: { $near: [ 50.1, 8.7 ] } })
TTL Collections
// Documents need a field of type BSON UTC datetime
{ 'submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), … }

// Documents will be deleted automatically by a daemon process
// after 'expireAfterSeconds'
{ submitted_date: 1 },
{ expireAfterSeconds: 3600 }
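The expiry rule can be stated as a small predicate. This is a sketch in plain JavaScript, not the server's implementation: `isExpired` is a hypothetical helper, and it assumes that documents whose indexed field is not a date never expire.

```javascript
// Sketch of the TTL rule: a document is due for deletion once
// expireAfterSeconds have passed since the date in the indexed field.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) {
    return false; // assumption: non-date values are ignored by the TTL process
  }
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const doc = { submitted_date: new Date("2012-10-12T05:24:07.211Z") };

// About 36 minutes after submission: still alive
console.log(isExpired(doc, "submitted_date", 3600, new Date("2012-10-12T06:00:00Z")));
// About 66 minutes after submission: due for deletion
console.log(isExpired(doc, "submitted_date", 3600, new Date("2012-10-12T06:30:00Z")));
```

Since the deletion is done by a background process that runs periodically, an expired document may remain visible for a short time after it becomes due.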
Limitations of indices

Collections can‘t have more than 64 indices


Index keys are not allowed to be larger than 1024 bytes


The name of an index (including name space) must be
less than 128 characters


Queries can only make use of one index
– Exception: Queries using $or


MongoDB tries to keep indices in memory


Indices slow down the writing of data
Optimizing indices
Best practice
1. Identify slow queries
2. Find out more about the slow queries

using explain()
3. Create appropriate indices on the fields

being queried
4. Optimize the query taking the

available indices into account
1. Identify slow queries
> db.setProfilingLevel( n, slowms ) // slowms defaults to 100 ms

n=0: Profiler off
n=1: Log all operations slower than slowms
n=2: Log all operations

> db.system.profile.find()

* The collection profile is a capped collection with a limited number of
2. Usage of explain()
> { calories:
{ $lt : 40 } }
).explain( )
"cursor" : "BasicCursor",
"n" : 42,
"nscannedObjects" : 53641,
"nscanned" : 53641,
"millis" : 252,
2. Metrics of the execution plan I
• Cursor
– The type of the cursor: BasicCursor means no index

has been used

• n
– The number of matched documents

• nscannedObjects
– The number of scanned documents

• nscanned
– The number of scanned entries (Index entries or

2. Metrics of the execution plan II
• millis
– Execution time of the query

• Complete reference can be found here

Optimize for




3. Create appropriate indices
on the fields being queried
4. Optimize queries taking the
available indices into account
// Using the following index…
> db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })
// … these queries and sorts can make use of the index
> db.collection.find( ).sort({ a:1 })
> db.collection.find( ).sort({ a:1, b:1 })
> db.collection.find({ a:4 }).sort({ a:1, b:1 })
> db.collection.find({ b:5 }).sort({ a:1, b:1 })
4. Optimize queries taking the
available indices into account
// Using the following index…
> db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })

// … but these queries can not make use of it
> db.collection.find( ).sort({ b: 1 })
> db.collection.find({ b: 5 }).sort({ b: 1 })
4. Optimize queries taking the
available indices into account
// Using the following index…
>{ main_ingredient: 1, name: 1 })
// … this query can be completely satisfied using the index!
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
// The metric indexOnly using explain() verifies this:
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
"indexOnly": true,
Use specific indices
// Tell MongoDB explicitly which index to use
calories: { $lt: 1000 } }
).hint({ _id: 1 })

// Switch the usage of indices completely off (e.g. for performance
// measurements)
{ calories: { $lt: 1000 } }
).hint({ $natural: 1 })
Caveats using indices
Using multiple indices
// MongoDB can only use one index per query!
> db.collection.ensureIndex({ a: 1 })
> db.collection.ensureIndex({ b: 1 })

// For this query only one of those two indices can be used
> db.collection.find({ a: 3, b: 4 })
Compound indices
// Compound indices are often very efficient!
> db.collection.ensureIndex({ a: 1, b: 1, c: 1 })

// But only if the query is a prefix of the index...

// This query can not make use of the index
db.collection.find({ c: 2 })

// …but this query can
db.collection.find({ a: 3, b: 5 })
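The prefix rule for compound indices can be written down as a small check. This is a simplified sketch in plain JavaScript that only considers equality filters; `usablePrefixLength` is a hypothetical helper, not a MongoDB function.

```javascript
// Returns how many leading fields of a compound index a query constrains.
// 0 means the query does not start with a prefix of the index.
function usablePrefixLength(indexFields, queryFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) {
      break; // the prefix ends at the first index field the query omits
    }
    length++;
  }
  return length;
}

const index = ["a", "b", "c"]; // models ensureIndex({ a: 1, b: 1, c: 1 })

console.log(usablePrefixLength(index, ["c"]));      // 0: no prefix
console.log(usablePrefixLength(index, ["a", "b"])); // 2: prefix { a, b }
console.log(usablePrefixLength(index, ["a", "c"])); // 1: only { a } helps
```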
Indices with low selectivity
// The following field has only few distinct values
> db.collection.distinct('status')
[ 'new', 'processed' ]
// An index on this field is not the best idea…
> db.collection.ensureIndex({ status: 1 })
> db.collection.find({ status: 'new' })
// Better use an adequate compound index with other fields
> db.collection.ensureIndex({ status: 1, created_at: -1 })
> db.collection.find(
{ status: 'new' }
).sort({ created_at: -1 })
Regular expressions & Indices
> db.users.ensureIndex({ username: 1 })

// Left-bound regular expressions can make use of this index
> db.users.find({ username: /^joe smith/ })

// But not queries with regular expressions in general…
> db.users.find({username: /smith/ })

// Also not case-insensitive queries…
> db.users.find({ username: /^Joe/i })
Negations & Indices
// Negations can not make use of indices
> db.things.ensureIndex({ x: 1 })
// e.g. queries using not equal
> db.things.find({ x: { $ne: 3 } })
// …or queries with not in
> db.things.find({ x: { $nin: [2, 3, 4 ] } })
// …or queries with the $not operator
> db.people.find({ name: { $not: /^John Doe$/ } })
Lab time!

Lab Nr. 03
Time box:
20 min
What is Map/Reduce?

Programming model coming from
functional languages


Framework for
– parallel processing
– of big volume data
– using distributed systems


Made popular by Google
– Has been invented to calculate the inverted search

index for web sites to keywords (Page Rank)

Not something special about MongoDB

Amazon Elastic MapReduce


Based on key-value-pairs


Prior to version 2.4 and the introduction of
the V8 JavaScript engine only one thread
per shard
The „Hello world“ of
Map/Reduce: Word Count
Word Count: Problem

There is a
map phase

There is a





a: 2
is: 2
map: 1

How often does
one word appear
in all documents?

mapreduce: 1
mongodb: 1
phase: 2

reduce: 1
there: 2
uses: 1
Word Count: Mapping

There is a
map phase

There is a




(mongodb, 1)
(uses, 1)
(mapreduce, 1)


(there, 1)
(is, 1)
(a, 1)
(map, 1)
(phase, 1)


(there, 1)
(is, 1)
(a, 1)
(reduce, 1)
(phase, 1)


Word Count: Group/Sort





There is a
map phase

There is a


(map, 1)
(phase, 1)


(there, 1)
(reduce, 1)

Word Count: Reduce





(a, [1, 1])
(is, [1, 1])
(map, [1])

There is a
map phase


(mapreduce, [1])
(mongodb, [1])
(phase, [1, 1])

There is a


(reduce, [1])
(there, [1, 1])
(uses, [1])

Word Count: Result






(a, [1, 1])
(is, [1, 1])
(map, [1])

a: 2
is: 2
map: 1

There is a
map phase


(mapreduce, [1])
(mongodb, [1])
(phase, [1, 1])

mapreduce: 1
mongodb: 1
phase: 2

There is a


(reduce, [1])
(there, [1, 1])
(uses, [1])

reduce: 1
there: 2
uses: 1
Word Count: In a nutshell





(a, [1, 1])
(is, [1, 1])
(map, [1])


a: 2
is: 2
map: 1



Transforms one key-value pair into 0–N key-value pairs

Reduces 0–N key-value pairs into one
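The three phases of the word count can be replayed in plain JavaScript without a cluster. This is a sketch, not MongoDB's implementation: the three input sentences are reconstructed from the pairs shown on the slides, and `mapDoc`, `groupByKey` and `reduceValues` are hypothetical names.

```javascript
// Input documents (reconstructed from the emitted pairs on the slides)
const docs = [
  "MongoDB uses MapReduce",
  "There is a map phase",
  "There is a reduce phase",
];

// Map: one document becomes 0..N (word, 1) pairs
function mapDoc(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word !== "")
    .map((word) => [word, 1]);
}

// Group/Sort: collect all values that belong to the same key
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: 0..N values for one key become a single value
function reduceValues(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const counts = {};
for (const [key, values] of groupByKey(docs.flatMap(mapDoc))) {
  counts[key] = reduceValues(key, values);
}
console.log(counts);
```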
Map/Reduce: Overview




Shard 1

Iterates all


Shard 2

Shard n

reduce(k, values)

finalize(k, v)

Input = Output
Can run multiple
Word Count: Tweets
// Example: Twitter database with tweets
> db.tweets.findOne()
"_id" : ObjectId("4fb9fb91d066d657de8d6f38"),
"text" : "RT @RevRunWisdom: The bravest thing that men do is
love women #love",
"created_at" : "Thu Sep 02 18:11:24 +0000 2010",
"user" : {
"friends_count" : 0,
"profile_sidebar_fill_color" : "252429",
"screen_name" : "RevRunWisdom",
"name" : "Rev Run",
Word Count: map()
// Map function with simple data cleansing
map = function() {
this.text.split(' ').forEach(function(word) {
// Remove whitespace
word = word.replace(/\s/g, "");
// Remove all non-word-characters
word = word.replace(/\W/gm, "");
// Finally emit the cleaned up word
if (word != "") {
emit(word, 1);
}
});
};
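Word Count: reduce()
// Reduce function: sum the values instead of counting them, because
// reduce may be called again on values that are already reduced
reduce = function(key, values) {
return Array.sum(values);
};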
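A reduce function may be invoked more than once for the same key, with earlier results fed back in as values. The sketch below simulates that in plain JavaScript to show why counting the values gives a wrong total while summing them does not; `reduceInBatches` is a hypothetical helper and the batch size is chosen only for illustration.

```javascript
const sumReduce = (key, values) => values.reduce((sum, v) => sum + v, 0);
const lengthReduce = (key, values) => values.length;

// Reduce the values in batches, then reduce the partial results again
// (a simplified model of "re-reduce").
function reduceInBatches(reduceFn, key, values, batchSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += batchSize) {
    partials.push(reduceFn(key, values.slice(i, i + batchSize)));
  }
  return partials.length === 1 ? partials[0] : reduceFn(key, partials);
}

const emitted = [1, 1, 1, 1, 1]; // the word was emitted five times

console.log(reduceInBatches(sumReduce, "love", emitted, 2));    // 5
console.log(reduceInBatches(lengthReduce, "love", emitted, 2)); // 3 (wrong)
```

The sum is correct regardless of how the values are batched; the length only counts the number of partial results in the last call.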
Word Count: Call
// Show the results using the console
> db.tweets.mapReduce(map, reduce, { out : { inline : 1 } } );
// Save the results to a collection
> db.tweets.mapReduce(map, reduce, { out : "tweets_word_count"} );
"result" : "tweets_word_count",
"timeMillis" : 19026,
"counts" : {
"input" : 53641,
"emit" : 559217,
"reduce" : 102057,
"output" : 131003
"ok" : 1,
Word Count: Result
// Top-10 of most common words in tweets
> db.tweets_word_count.find().sort({"value" : -1}).limit(10)
{ "_id" : "Miley", "value" : 31 }
{ "_id" : "mil", "value" : 31 }
{ "_id" : "andthenihitmydougie", "value" : 30 }
{ "_id" : "programa", "value" : 30 }
{ "_id" : "Live", "value" : 29 }
{ "_id" : "Super", "value" : 29 }
{ "_id" : "cabelo", "value" : 29 }
{ "_id" : "listen", "value" : 29 }
{ "_id" : "Call", "value" : 28 }
{ "_id" : "DA", "value" : 28 }
Typical use cases

Counting, Aggregating & Summing up
– Analyzing log entries & Generating log reports
– Generating an inverted index
– Substitute existing ETL processes


Counting unique values
– Counting the number of unique visitors of a website


Filtering, Parsing & Validation
– Filtering of user data
– Consolidation of user-generated data


– Data analysis using complex sorting

The Map/Reduce framework is very
versatile & powerful


Is implemented in JavaScript
– Necessity to write your own map() and reduce() functions in JavaScript
– Difficult to debug
– Performance is highly influenced by the JavaScript engine


Can be used for complex data analytics


Lots of overhead for simple aggregation tasks
– Summing up of data
– Average of data
– Grouping of data
Map/Reduce should be used as
a last resort (ultima ratio)!
Lab time!

Lab Nr. 04
Time box:
20 min
Aggregation Framework
SELECT customer_id, SUM(price)
FROM orders
WHERE active=true
GROUP BY customer_id
That‘s why!
SELECT customer_id, SUM(price)
FROM orders
WHERE active=true
of fields
GROUP BY customer_id
of data
The Aggregation Framework
Has been introduced to cover 90% of real-world aggregation use cases without using
the "big hammer" Map/Reduce
• Framework of methods & operators

– Declarative
– No own JavaScript code needed
– Fixed set of methods and operators (but constantly under

development by MongoDB Inc.)


Implemented in C++
– Limitations on JavaScript Engine are avoided
– Better performance
The Aggregation Pipeline



sum: 337
avg: 24.53
min: 2
max : 99
The Aggregation Pipeline

Processes a stream of documents
– Input is a complete collection
– Output is a document containing the results


Succession of pipeline operators
– Each tier filters or transforms the documents
– Input documents of a tier are the output documents

of the previous tier
{ $pipeline_operator_1
{ $pipeline_operator_2
{ $pipeline_operator_3
{ $pipeline_operator_4
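The idea that each tier consumes the output of the previous tier can be modelled with ordinary functions. This is a sketch in plain JavaScript, not the aggregation framework itself; the stage builders (`matchStage`, `groupCountStage`, `sortStage`, `limitStage`), `runPipeline` and the sample tweets are all hypothetical.

```javascript
// Each stage is a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const groupCountStage = (keyFn) => (docs) => {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }));
};
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

// The pipeline feeds the output of one stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const tweets = [
  { user: { lang: "de", followers_count: 5 } },
  { user: { lang: "en", followers_count: 50 } },
  { user: { lang: "en", followers_count: 3 } },
  { user: { lang: "de", followers_count: 8 } },
  { user: { lang: "de", followers_count: 1 } },
];

// Language with the most tweets among users with fewer than 10 followers
const result = runPipeline(tweets, [
  matchStage((t) => t.user.followers_count < 10),
  groupCountStage((t) => t.user.lang),
  sortStage((a, b) => b.count - a.count),
  limitStage(1),
]);
console.log(result);
```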

Pipeline Operators
// Old friends*
* from the query functionality

// New friends
Example: Tweets
// Example: Twitter database with tweets
> db.tweets.findOne()
"_id" : ObjectId("4fb9fb91d066d657de8d6f38"),
"text" : "RT @RevRunWisdom: The bravest thing that men do is
love women #love",
"created_at" : "Thu Sep 02 18:11:24 +0000 2010",
"user" : {
"friends_count" : 0,
"profile_sidebar_fill_color" : "252429",
"screen_name" : "RevRunWisdom",
"name" : "Rev Run",
// Show all German users
> db.tweets.aggregate(
{ $match : {"user.lang" : "de"} }
);
// Show all users with 0 to 10 followers
> db.tweets.aggregate(
{ $match : {"user.followers_count" : { $gte : 0, $lt : 10 } } }
);

> Filters documents
> Equivalent to .find()
// Sorting using one field
> db.tweets.aggregate(
{ $sort : {"user.friends_count" : -1} }
);
// Sorting using multiple fields
> db.tweets.aggregate(
{ $sort : {"user.lang" : 1, "user.time_zone" : 1,
"user.friends_count" : -1} }
);

> Sorts documents
> Equivalent to .sort()
// Limit the number of resulting documents to 3
> db.tweets.aggregate(
{ $sort : {"user.friends_count" : -1} },
{ $limit : 3 }
);

> Limits resulting documents
> Equivalent to .limit()
// Get the No.4-Twitterer according to number of friends
> db.tweets.aggregate(
{ $sort : {"user.friends_count" : -1} },
{ $skip : 3 },
{ $limit : 1 }
);

> Skips documents
> Equivalent to .skip()
$project I
// Limit the result document to only one field
> db.tweets.aggregate(
{ $project : {text : 1} }
);
// Remove _id
> db.tweets.aggregate(
{ $project : {_id: 0, text : 1} }
);

> Limits the fields in
resulting documents
$project II
// Rename a field
> db.tweets.aggregate(
{ $project : {_id: 0, content_of_tweet : "$text"} }
);
// Add a calculated field
> db.tweets.aggregate(
{ $project : {_id: 0, content_of_tweet : "$text", number_of_friends :
{$add: ["$user.friends_count", 10]} } }
);
$project III
// Add a subdocument
> db.tweets.aggregate(
{ $project : {_id: 0,
content_of_tweet : "$text",
user : {
name : "$",
number_of_friends : {$add: ["$user.friends_count", 10]}
} } );
$group I
// Grouping using a single field
> db.tweets.aggregate(
{ $group : {
_id : "$user.lang",
anzahl_tweets : {$sum : 1} } }
);

> Groups documents
> Equivalent to GROUP BY in SQL
$group II
// Grouping using multiple fields
> db.tweets.aggregate(
{ $group : {
_id : { background_image:
language: "$user.lang" },
number_of_tweets: {$sum : 1} }
$group III
// Grouping with multiple calculated fields
> db.tweets.aggregate(
{ $group : {
_id : "$user.lang",
number_of_tweets : {$sum : 1},
average_of_followers : {$avg : "$user.followers_count"},
minimum_of_followers : {$min : "$user.followers_count"},
maximum_of_followers : {$max : "$user.followers_count"} } }
);
Group Aggregation Functions


$unwind I
// Unwind an array
> db.tweets.aggregate(
{ $project : {_id: 0, content_of_tweet : "$text",
mentioned_users : "$" } },
{ $skip : 18 },
{ $limit : 1 },
{ $unwind : "$mentioned_users" }

> Unwinds arrays and
creates one document per
value in the array
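What unwinding does to a document can be shown in a few lines of plain JavaScript. This is a sketch rather than the operator itself: `unwind` is a hypothetical helper, the tweet text is shortened, and documents without an array in the field are simply skipped here.

```javascript
// One output document per array element; all other fields are copied.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) continue; // simplification, see lead-in
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const projected = [
  {
    content_of_tweet: "RT @Philanthropy: How should nonprofit groups ...",
    mentioned_users: ["Philanthropy", "Allison Fine"],
  },
];

console.log(unwind(projected, "mentioned_users"));
```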
$unwind II
// Resulting document without $unwind
"content_of_tweet" : "RT @Philanthropy: How should
nonprofit groups measure their social-media efforts? A
new podcast from @afine",
"mentioned_users" : [
"Philanthropy",
"Allison Fine" ]
$unwind III
// Resulting documents with $unwind
"content_of_tweet" : "RT @Philanthropy: How should
nonprofit groups measure their social-media efforts? A
new podcast from @afine",
"mentioned_users" : "Philanthropy"
"content_of_tweet" : "RT @Philanthropy: How should
nonprofit groups measure their social-media efforts? A
new podcast from @afine",
"mentioned_users" : "Allison Fine"
Best Practices
Place $match at the beginning of
the pipeline to reduce the
number of documents as soon as
possible!
Best Practice #1
Use $project to remove unneeded
fields from the documents
as soon as possible!

Best Practice #2
When being placed at the beginning of the pipeline these
operators can make use of indices:

The above operators can equally use indices when placed
before these operators:


Best Practice #3
Mapping of MongoDB
to SQL

MongoDB Aggregation


















No equivalent operator
($unwind has somewhat equivalent
functionality for embedded fields)
Example: Online shopping
cust_id: "sheldon1",
status: 'purchased',
price: 105.69,
[ { sku: "nobel_price_replica",
qty: 3, price: 29.90 },
{ sku: "wheaton_voodoo_doll",
qty: 1, price: 15.99 } ]
Count all orders

MongoDB Aggregation

SELECT COUNT(*) AS count FROM orders

db.orders.aggregate( [ {
$group: { _id: null,
count: { $sum: 1 } } }
] )
Total order price per customer

MongoDB Aggregation

SELECT cust_id, SUM(price)
AS total FROM orders
GROUP BY cust_id ORDER BY total

db.orders.aggregate( [ {
$group: { _id: "$cust_id",
total: { $sum: "$price" } } },
{ $sort: { total: 1 } }
] )
Sum up all orders over 250$

MongoDB Aggregation

SELECT cust_id, SUM(price) AS total
FROM orders
WHERE status = 'purchased'
GROUP BY cust_id
HAVING total > 250

db.orders.aggregate( [ {
$match: { status: 'purchased' } },
{ $group: { _id: "$cust_id",
total: { $sum: "$price" } } },
{ $match: { total: { $gt: 250 } } }
] )
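The difference between filtering before the grouping (WHERE) and after it (HAVING) can be traced step by step in plain JavaScript. This is a sketch over made-up sample orders; only the first order echoes the example document from the deck, the other customers and prices are hypothetical.

```javascript
const orders = [
  { cust_id: "sheldon1", status: "purchased", price: 105.69 },
  { cust_id: "sheldon1", status: "purchased", price: 200 },
  { cust_id: "leonard1", status: "purchased", price: 90 },
  { cust_id: "penny1", status: "cancelled", price: 400 },
];

// First filter (WHERE): only purchased orders enter the grouping
const purchased = orders.filter((o) => o.status === "purchased");

// Grouping (GROUP BY cust_id with SUM(price))
const totals = new Map();
for (const o of purchased) {
  totals.set(o.cust_id, (totals.get(o.cust_id) || 0) + o.price);
}
const grouped = Array.from(totals, ([_id, total]) => ({ _id, total }));

// Second filter (HAVING): works on the grouped totals
const bigSpenders = grouped.filter((g) => g.total > 250);
console.log(bigSpenders);
```

The cancelled order of 400 never reaches the grouping, so it cannot push a customer over the threshold.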
More examples
Lab time!

Lab Nr. 05
Time box:
20 min
Replication: High
Availability with MongoDB
Why do we need replication?

Hardware is unreliable and is doomed to fail


Do you want to be the person being called
at night to do a manual failover?


How about network latency?


Different use cases for your data
– “Regular” processing
– Data for analysis
– Data for backup
Life cycle of a replica set
Replica set – Create
Replica set – Initializing
Replica set – Node down
Replica set – Failover
Replica set – Recovery
Replica set – Back to normal
Roles & Configuration
Replica sets - Roles
Configuration I
> conf = {
_id : "mySet",
members : [
{_id : 0, host : "A", priority : 3},
{_id : 1, host : "B", priority : 2},
{_id : 2, host : "C"},
{_id : 3, host : "D", hidden : true},
{_id : 4, host : "E", hidden : true, slaveDelay : 3600}
] }

> rs.initiate(conf)
Configuration II
> conf = {
_id : "mySet",
members : [

Primary data center

{_id : 0, host : "A", priority : 3},
{_id : 1, host : "B", priority : 2},
{_id : 2, host : "C"},
{_id : 3, host : "D", hidden : true},
{_id : 4, host : "E", hidden : true, slaveDelay : 3600}
] }

> rs.initiate(conf)
Configuration III
> conf = {
_id : "mySet",
members : [

Secondary data center
(Default priority = 1)

{_id : 0, host : "A", priority : 3},
{_id : 1, host : "B", priority : 2},
{_id : 2, host : "C"},
{_id : 3, host : "D", hidden : true},
{_id : 4, host : "E", hidden : true, slaveDelay : 3600}
] }

> rs.initiate(conf)
Configuration IV
> conf = {
_id : "mySet",
members : [
{_id : 0, host : "A", priority : 3},
{_id : 1, host : "B", priority : 2},

Analytical data e.g. for
Hadoop, Storm, BI, …

{_id : 2, host : "C"},
{_id : 3, host : "D", hidden : true},
{_id : 4, host : "E", hidden : true, slaveDelay : 3600}
] }

> rs.initiate(conf)
Configuration V
> conf = {
_id : "mySet",
members : [
{_id : 0, host : "A", priority : 3},
{_id : 1, host : "B", priority : 2},
{_id : 2, host : "C"},
{_id : 3, host : "D", hidden : true},
{_id : 4, host : "E", hidden : true, slaveDelay : 3600}
] }

> rs.initiate(conf)

Back-up node
Data consistency
Strong consistency
Eventual consistency
Write Concern
• Different levels of data consistency
• Acknowledged by
– Network
– MongoDB
– Journal
– Secondaries
– Tagging
Acknowledged by network
„Fire and forget“
Acknowledged by MongoDB
Wait for Error
Acknowledged by Journal
Wait for Journal Sync
Acknowledged by Secondaries
Wait for Replication
Tagging while writing data

Available since 2.0


Allows for fine granular control


Each node can have multiple tags
– tags: {dc: "ny"}
– tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}


Allows for creating Write Concern Rules (per
replica set)


Tags can be adapted without code changes
and restarts
Tagging - Example
_id : "mySet",
members : [
{_id : 0, host : "A", tags : {"dc": "ny"}},
{_id : 1, host : "B", tags : {"dc": "ny"}},
{_id : 2, host : "C", tags : {"dc": "sf"}},
{_id : 3, host : "D", tags : {"dc": "sf"}},
{_id : 4, host : "E", tags : {"dc": "cloud"}}],
settings : {
getLastErrorModes : {
allDCs : {"dc" : 3},
someDCs : {"dc" : 2}} }
> db.blogs.insert({...})
> db.runCommand({getLastError : 1, w : "someDCs"})
Acknowledged by Tagging
Wait for Replication (Tagging)
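How a tag-based rule such as someDCs is evaluated can be sketched in plain JavaScript. The sketch assumes that a rule like { dc : 2 } is satisfied once the write has reached members with two distinct values of the dc tag; `satisfiesMode` is a hypothetical helper, and the member tags and modes are copied from the tagging example above.

```javascript
// Tags per member, as in the tagging example
const memberTags = {
  A: { dc: "ny" },
  B: { dc: "ny" },
  C: { dc: "sf" },
  D: { dc: "sf" },
  E: { dc: "cloud" },
};

const modes = {
  allDCs: { dc: 3 },
  someDCs: { dc: 2 },
};

// A mode is satisfied when, for every tag in the rule, the acknowledging
// hosts cover at least the required number of DISTINCT tag values.
function satisfiesMode(ackedHosts, mode) {
  return Object.entries(mode).every(([tag, required]) => {
    const distinct = new Set(
      ackedHosts.map((host) => memberTags[host][tag]).filter((v) => v !== undefined)
    );
    return distinct.size >= required;
  });
}

console.log(satisfiesMode(["A", "B"], modes.someDCs));     // false: both in ny
console.log(satisfiesMode(["A", "C"], modes.someDCs));     // true: ny and sf
console.log(satisfiesMode(["A", "C", "E"], modes.allDCs)); // true: ny, sf, cloud
```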
Configure the Write Concern
// Wait for network acknowledgement
> db.runCommand( { getLastError: 1, w: 0 } )
// Wait for error (Default)
> db.runCommand( { getLastError: 1, w: 1 } )
// Wait for journal sync
> db.runCommand( { getLastError: 1, w: 1, j: true } )
// Wait for replication
> db.runCommand( { getLastError: 1, w: "majority" } )
> db.runCommand( { getLastError: 1, w: 3 } ) // number of nodes (including the primary)
Read Preferences

Only primary



Primary preferred



Only secondaries



Secondaries preferred



Nearest node


General: If more than one node is available, the
nearest node will be chosen (All modes except

Only primary

Primary preferred


Only secondaries


Secondaries preferred


Nearest node
Tagging while reading data

Allows for a more fine granular control
where data will be read from
– e.g. { "disk": "ssd", "use": "reporting" }


Can be combined with other read modes
– Except for mode „Only primary“
Configure the Read Preference
// Only primary
> cursor.readPref( "primary" )
// Primary preferred
> cursor.readPref( "primaryPreferred" )
// Only secondaries with tagging
> cursor.readPref( "secondary", [ { rack : "2" } ] )

The read preference must be configured
before using the cursor to read data!
MongoDB Operation
Maintenance & Upgrades

Zero downtime


Rolling upgrades and maintenance


Start with all secondaries
Step down the current primary
Primary as last one
Restore previous primary (if needed)

– rs.stepDown(<secs>)
– db.version()
– db.serverBuildInfo()
Replica set – 1 data center

– Data center
– Switch
– Power Supply


Possible errors:
– Failure of 2 nodes
– Power Supply
– Network
– Data Center


Automatic recovery
Replica set – 2 data center

Additional node for
data recovery


No writing to both
data center since
only one node in
data center No. 2
Replica set – 3 data center

Can recover from a
complete data center


Allows for usage of
w= { dc : 2 } to
guarantee writing to
2 data centers (via

Administration of the nodes


rs.initiate(<conf>) & rs.reconfig(<conf>)
rs.add("<host>:<port>") & rs.addArb("<host>:<port>")

Reconfiguration if only a minority of the nodes
is available
– rs.reconfig( cfg, { force: true} )
Best Practices
Best Practices

Odd number of nodes


Adapt the write concern to your use case


Read from primary except for
– Geographical distribution
– Data analytics


Use logical names and not IP addresses for


Monitor the lags of the secondaries (e.g.
Lab time!

Lab Nr. 06
Time box:
20 min
Sharding: Scaling with MongoDB
Visual representation of vertical scaling

1970 - 2000: Vertical Scaling
„Scale up“
Visual representation of horizontal scaling

Since 2000: Horizontal Scaling
„Scale out“
When to use Sharding?
Not enough disk space
The working set doesn‘t fit
into the memory
The needs for read-/write throughput
are higher than the I/O capabilities
Sharding MongoDB
Partitioning of data

The user needs to define a shard key


The shard key defines the distribution of
data across the shards
Partitioning of data into chunks

Initially all data is in one chunk


Maximum chunk size: 64 MB


MongoDB divides and distributes chunks
automatically once the maximum size is reached
One chunk contains data of a
certain value range
Chunks & Shards

A shard is one node in the cluster


A shard can be one single mongod or a
replica set
Metadata Management

Config Server
– Stores the value ranges of the chunks and their

– Number of config servers is 1 or 3 (Production: 3)
– Two Phase Commit
Balancing & Routing Service

mongos balances the data
in the cluster


mongos distributes data to
new nodes


mongos routes queries to
the correct shard or
collects results if data is
spread on multiple shards


No local data
Automatic Balancing

Balancing is done automatically once the
difference in the number of chunks between
shards reaches a certain threshold
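
The balancing rule can be sketched in plain JavaScript. This is only a simplified model, not MongoDB's actual balancer: the function name `balance`, the shard names and the threshold value are assumptions made for illustration. It moves one chunk at a time from the shard with the most chunks to the shard with the fewest until the difference falls below the threshold.

```javascript
// Simplified model of a balancing round (not MongoDB's real implementation).
// chunkCounts: { shardName: numberOfChunks }, threshold: hypothetical parameter.
function balance(chunkCounts, threshold) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const shards = Object.keys(counts);
  const migrations = [];
  if (shards.length === 0) return { counts, migrations };

  while (true) {
    // Find the shard with the most and the shard with the fewest chunks.
    let most = shards[0];
    let least = shards[0];
    for (const s of shards) {
      if (counts[s] > counts[most]) most = s;
      if (counts[s] < counts[least]) least = s;
    }
    // Stop as soon as the cluster is balanced enough.
    if (counts[most] - counts[least] < threshold) break;
    // Migrate exactly one chunk per step.
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

const balanceResult = balance({ shard0000: 10, shard0001: 2 }, 2);
console.log(balanceResult.counts);            // { shard0000: 6, shard0001: 6 }
console.log(balanceResult.migrations.length); // 4
```

Moving a single chunk per step mirrors the idea that balancing happens in small units, so the cluster stays usable while data is redistributed.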
Splitting of a chunk


Once a chunk hits the maximum size it will be split


Splitting is only a logical operation, no data needs to
be moved


If the splitting of a chunk results in an imbalance of
data, automatic rebalancing will be started
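
Why a split is only a logical operation can be seen in a small sketch. The chunk layout below (`min`, `max`, `shard`) and the function `splitChunk` are hypothetical names for illustration, not MongoDB's internal metadata format. Only the range metadata changes, and both halves stay on the same shard.

```javascript
// A chunk covers the half-open value range [min, max) of the shard key.
// Splitting replaces one metadata entry by two; no documents are moved.
function splitChunk(chunk, splitPoint) {
  if (!(splitPoint > chunk.min && splitPoint < chunk.max)) {
    throw new RangeError("split point must lie strictly inside the chunk range");
  }
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint, max: chunk.max, shard: chunk.shard },
  ];
}

const halves = splitChunk({ min: 0, max: 100, shard: "shard0000" }, 50);
console.log(halves);
```

After a split, one shard holds one chunk more than before, which is what can trigger the rebalancing mentioned above.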
Sharding Infrastructure
MongoDB Auto Sharding

Minimal effort
– Usage of the same interfaces for mongod and mongos



Easy configuration
– Enable sharding for a database

• sh.enableSharding("<database>")
– Shard a collection in a database

• sh.shardCollection("<database>.<collection>", <shard key>)
Configuration example
Example of a very simple cluster


Never use this in production!
– Only one config server (No fault tolerance)
– Shard is not a replica set (No high availability)
– Only one mongos and one shard (No performance gain)

Start the config server

// Start the config server (Default port 27019)
> mongod --configsvr
Start the mongos routing service

// Start the mongos router (Default port 27017)
> mongos --configdb <hostname>:27019

// When using 3 config servers
> mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>
Start the shard

// Start a shard with one mongod (Default port 27018)
> mongod --shardsvr

// Shard is not yet added to the cluster!
Add the shard

// Connect to mongos and add the shard
> mongo
> sh.addShard('<host>:27018')
// When adding a replica set, you only need to add one of the nodes!
Check configuration

// Check if the shard has been added (run against the admin database)
> db.runCommand({ listShards: 1 })
{ "shards" :
[ { "_id": "shard0000", "host": "<hostname>:27018" } ],
"ok" : 1
}
Configure sharding
// Enable sharding for a database
> sh.enableSharding("<dbname>")

// Shard a collection using a shard key
> sh.shardCollection("<dbname>.user", { "name" : 1 } )

// Use a compound shard key
> sh.shardCollection("<dbname>.cars", { "year" : 1, "uniqueid" : 1 } )
Shard Key

The shard key cannot be changed


The values of a shard key cannot be changed


The shard key needs to be indexed


The uniqueness of the field _id is only
guaranteed within a shard


The size of a shard key is limited to 512 bytes
Considerations for the shard key

Cardinality of data
– The value range needs to be rather large. For example, sharding on the field loglevel with the 3 values error, warning, info doesn't make sense.

Distribution of data
– Always strive for equal distribution of data throughout all shards

Patterns during reading and writing
– For example, for log data, using the timestamp as a shard key can be useful if chronologically close data needs to be read or written together.
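
The cardinality argument can be checked on sample data before choosing a key. The sketch below is plain JavaScript with made-up log documents and a hypothetical helper `distinctValues`. Documents with the same shard key value cannot be separated by a split, so the number of distinct values is an upper bound for the number of useful chunks.

```javascript
// Made-up sample documents, for illustration only.
const sampleLogs = [
  { loglevel: "error", created_on: "2013-04-15T15:38:05Z" },
  { loglevel: "info", created_on: "2013-04-15T15:38:06Z" },
  { loglevel: "warning", created_on: "2013-04-15T15:38:07Z" },
  { loglevel: "info", created_on: "2013-04-15T15:38:08Z" },
  { loglevel: "error", created_on: "2013-04-15T15:38:09Z" },
];

// Count how many different values a candidate shard key field takes.
function distinctValues(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

console.log(distinctValues(sampleLogs, "loglevel"));   // 3
console.log(distinctValues(sampleLogs, "created_on")); // 5
```

With loglevel as the key the data could never be spread over more than three chunks, no matter how many documents are stored.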
Choices for the shard key

Single field
– If the value range is big enough and data is distributed almost equally

Compound fields
– Use this if a single field is not enough with respect to value range and equal distribution

Hash based
– In general a random shard key is a good choice for equal distribution of data
– For performance the shard key should be part of the queries
– Only available since 2.4
• sh.shardCollection( "<database>.<collection>", { a: "hashed" } )
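
The effect of a hashed key on steadily increasing values can be illustrated with a small simulation. This is a sketch under assumptions: the 32-bit FNV-1a function is only a stand-in and not the hash MongoDB uses, and the bucket boundaries and helper names are made up. With range-based placement all new ids fall into the last range, while hashing spreads them over the whole hash space.

```javascript
// 32-bit FNV-1a, used here only as a stand-in hash function.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Range-based placement: boundaries were chosen for ids seen earlier.
function rangeBucket(id, boundaries) {
  let i = 0;
  while (i < boundaries.length && id >= boundaries[i]) i++;
  return i;
}

// Hash-based placement: the 32-bit hash space is cut into equal ranges.
function hashBucket(id, bucketCount) {
  return Math.floor(fnv1a(String(id)) / (2 ** 32 / bucketCount));
}

// Count how many ids land in each bucket.
function countPerBucket(ids, bucketOf, bucketCount) {
  const counts = new Array(bucketCount).fill(0);
  for (const id of ids) counts[bucketOf(id)] += 1;
  return counts;
}

// 1000 new documents with increasing ids 1001..2000.
const newIds = Array.from({ length: 1000 }, (_, i) => 1001 + i);
const ranged = countPerBucket(newIds, (id) => rangeBucket(id, [250, 500, 750]), 4);
const hashed = countPerBucket(newIds, (id) => hashBucket(id, 4), 4);

console.log(ranged); // [ 0, 0, 0, 1000 ]
console.log(hashed); // spread over the four hash ranges
```

The price of the even spread is that range queries on the original field can no longer be routed to a single shard.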
Example: User
{
  _id: 346,
  username: "sheldinator",
  password: "238b8be8bd133b86d1e2ba191a94f549",
  first_name: "Sheldon",
  last_name: "Cooper",
  created_on: "Mon Apr 15 15:30:32 +0000 2013",
  modified_on: "Thu Apr 18 08:11:23 +0000 2013"
}

Which shard key would
you choose and why?
Example: Log data
{
  log_type: "error",   // Possible values: "error", "warn", "info"
  application: "JBoss v. 4.2.3",
  message: "Fatal error. Application will quit.",
  created_on: "Mon Apr 15 15:38:05 +0000 2013"
}

Which shard key would
you choose and why?
Routing of queries
Possible types of queries

Exact queries
– Data is on exactly one shard

Distributed query
– Data is distributed across different shards

Distributed query with sorting
– Data is distributed across different shards and needs to be sorted
Exact queries
1. mongos receives the query
from the client
2. Query is routed to the shard
with the data
3. Shard returns the data
4. mongos returns the data to
the client
Distributed queries
1. mongos receives the query
from the client
2. mongos routes the query to
all shards
3. Shards return the data
4. mongos returns the data to
the client
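
The difference between an exact and a distributed query comes down to whether the router can find the shard key in the query. The following is a minimal sketch in plain JavaScript, not the real mongos logic: the routing table, the field name `user_id` and the function `targetShards` are assumptions for illustration, and only equality matches on a numeric shard key are handled.

```javascript
// Hypothetical chunk metadata as the router might cache it.
// Each chunk covers the half-open range [min, max) of the shard key.
const routingTable = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" },
];

// Decide which shards have to receive the query.
function targetShards(query, shardKeyField, table) {
  const allShards = [...new Set(table.map((chunk) => chunk.shard))];
  // No shard key in the query: ask every shard (distributed query).
  if (!(shardKeyField in query)) return allShards;
  const key = query[shardKeyField];
  const chunk = table.find((c) => key >= c.min && key < c.max);
  // Shard key present and matched: ask exactly one shard (exact query).
  return chunk ? [chunk.shard] : allShards;
}

console.log(targetShards({ user_id: 1234 }, "user_id", routingTable));
console.log(targetShards({ username: "sheldinator" }, "user_id", routingTable));
```

The first call targets a single shard, the second one has to contact all three, which is why the shard key should be part of frequent queries.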
Distributed queries with sorting
1. mongos receives the query
from the client
2. mongos routes the query to
all shards
3. Shards execute the query and sort locally
4. Shards return sorted data
5. mongos sorts the data
6. mongos returns the sorted
data to the client
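
Because every shard already returns sorted data, the final sort step can be done as a merge instead of a full sort. Here is a minimal sketch in plain JavaScript; the function `mergeSorted` and the sample values are made up for illustration and do not show how mongos is implemented.

```javascript
// Merge several already sorted result lists into one sorted list.
// shardResults: array of sorted arrays, compare: usual comparator function.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  while (true) {
    // Pick the shard whose next unread element is the smallest.
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue;
      if (
        best === -1 ||
        compare(shardResults[i][positions[i]], shardResults[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // all lists are exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

const perShard = [[1, 4, 9], [2, 3, 10], []];
console.log(mergeSorted(perShard, (a, b) => a - b)); // [ 1, 2, 3, 4, 9, 10 ]
```

The merge only ever looks at the next element of each shard's result, so the router does not have to hold or re-sort the complete result set.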
Lab time!

Lab No. 07
Time box:
20 min
Still want moar?

MongoDB for Coder Training (Coding Serbia 2013)

DevEX - reference for building teams, processes, and platforms
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics

MongoDB for Coder Training (Coding Serbia 2013)

  • 2. About me Big Data Nerd Hadoop Trainer MongoDB Author Photography Enthusiast Travelpirate
  • 3. About us is a bunch of… Big Data Nerds Agile Ninjas Continuous Delivery Gurus Join us! Enterprise Java Specialists Performance Geeks
  • 4. Agenda I 1. Introduction to NoSQL & MongoDB 2. Data manipulation: Learn how to CRUD with MongoDB 3. Indexing: Speed up your queries with MongoDB 4. MapReduce: Data aggregation with MongoDB
  • 5. Agenda 5. Aggregation Framework: Data aggregation done the MongoDB way 6. Replication: High Availability with MongoDB 7. Sharding: Scaling with MongoDB
  • 7. And please… If you have questions, please share them with us!
  • 8. And now start your downloads… Lab files:
  • 10. NoSQL
  • 11. Classification of NoSQL: Key-Value Stores, Column Stores, Graph Databases, Document Stores
  • 14. The classic definition • The 3 V's of Big Data: Volume, Velocity, Variety
  • 15. «Big Data» != Hadoop
  • 24. The CAP Theorem • Availability: a guarantee that every request receives a response • Consistency: all nodes see the same data at the same time • Partition Tolerance: failure of single nodes doesn't affect the overall system
  • 25. Overview of NoSQL systems • Availability: a guarantee that every request receives a response • Consistency: all nodes see the same data at the same time • Partition Tolerance: failure of single nodes doesn't affect the overall system
  • 28. ACID vs. BASE • ACID (1983, RDBMS): Atomicity, Consistency, Isolation, Durability
  • 29. ACID vs. BASE ACID is a good concept but it is not a written law!
  • 30. ACID vs. BASE • BASE (2008, NoSQL): Basically Available, Soft State, Eventually consistent
  • 31. ACID vs. BASE • ACID: Strong consistency, Isolation & Transactions, Two-Phase-Commit, Complex development, More reliable • BASE: Eventual consistency, Highly available, "Fire-and-forget", Eases development, Faster
  • 33. MongoDB is a… • document • open source • highly performant • flexible • scalable • highly available • feature-rich …database
  • 34. Document Database • Not PDF, Word, etc. … JSON!
  • 35. Open Source Database • MongoDB is an open source project • Available on GitHub – • Uses the AGPL license • Started and sponsored by MongoDB Inc. (prior: 10gen) • Commercial version and support available • Join the crowd! –
  • 37. Flexible Schema RDBMS MongoDB { _id : ObjectId("4c4ba5e5e8aabf3"), employee_name: "Dunham, Justin", department : "Marketing", title : "Product Manager, Web", report_up: "Neray, Graham", pay_band: "C", benefits : [ { type : "Health", plan : "PPO Plus" }, { type : "Dental", plan : "Standard" } ] }
  • 38. Scalability Auto Sharding • Increase capacity as you go • Commodity and cloud architectures • Improved operational simplicity and cost visibility
  • 39. High Availability • Automated replication and failover • Multi-data center support • Improved operational simplicity (e.g., HW swaps) • Data durability and consistency
  • 45. Driver & Shell • Drivers are available for almost all popular programming languages and frameworks: Java, JavaScript, Python, Ruby, Perl, Haskell • Shell to interact with the database: > db.collection.insert({ product: "MongoDB", type: "Document Database" }) > db.collection.findOne() { "_id" : ObjectId("5106c1c2fc629bfe52792e86"), "product" : "MongoDB", "type" : "Document Database" }
  • 46. NoSQL Trends [Charts comparing MongoDB with its competitors: Google Search, LinkedIn Job Skills, Jaspersoft Big Data Index (Direct Real-Time Downloads)] Top Job Trends: 1. HTML 5 2. MongoDB 3. iOS 4. Android 5. Mobile Apps 6. Puppet 7. Hadoop 8. jQuery 9. PaaS 10. Social Media
  • 48. Terminology RDBMS ➜ MongoDB: Table / View ➜ Collection; Row ➜ Document; Index ➜ Index; Join ➜ Embedded document; Foreign Key ➜ Referenced document; Partition ➜ Shard
  • 51. Schema design for the blog
  • 52. Let’s have a look…
  • 53. Create a database // Show all databases > show dbs digg 0.078125GB enron 1.49951171875GB // Switch to a database > use blog // Show all databases again > show dbs digg 0.078125GB enron 1.49951171875GB
  • 54. Create a collection I // Show all collections > show collections // Insert a user > db.user.insert( { name : “Sheldon“, mail : ““ } ) No feedback about the result of the insert, use: db.runCommand( { getLastError: 1} )
  • 55. Create a collection II // Show all collections > show collections system.indexes user // Show all databases > show dbs blog 0.0625GB digg 0.078125GB enron 1.49951171875GB Databases and collections are automatically created during the first insert operation!
  • 56. Read from a collection // Show the first document > db.user.findOne() { "_id" : ObjectId("516684a32f391f3c2fcb80ed"), "name" : "Sheldon", "mail" : "" } // Show all documents of a collection > db.user.find() { "_id" : ObjectId("516684a32f391f3c2fcb80ed"), "name" : "Sheldon", "mail" : "" }
  • 57. Find documents // Find a specific document > db.user.find( { name : ”Penny” } ) { "_id" : ObjectId("5166a9dc2f391f3c2fcb80f1"), "name" : "Penny", "mail" : "" } // Show only certain fields of the document > db.user.find( { name : ”Penny” }, {_id: 0, mail : 1} ) { "mail" : "" }
  • 58. _id • _id is the primary key in MongoDB • _id is created automatically • If not specified differently, its type is ObjectId • _id can be specified by the user during the insert of documents, but needs to be unique (and cannot be edited afterwards)
  • 59. ObjectId • An ObjectId is a special 12-byte value • Its uniqueness in the whole cluster is guaranteed as follows: ObjectId("50804d0bd94ccab2da652599") |-------------||---------||-----||----------| ts mac pid inc
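The layout on the ObjectId slide can be made concrete by slicing the hex string into its four parts. This is a minimal plain-JavaScript sketch, assuming the 4/3/2/3 byte split (ts, mac, pid, inc) shown on the slide; `splitObjectId` is a hypothetical helper written for illustration, not a function of the MongoDB shell.

```javascript
// Split a 24-character ObjectId hex string into the four parts named on the slide.
// splitObjectId is a hypothetical helper, not part of the MongoDB shell.
function splitObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte Unix timestamp in seconds
  return {
    ts: new Date(seconds * 1000), // creation time
    mac: hex.slice(8, 14),        // 3-byte machine identifier
    pid: hex.slice(14, 18),       // 2-byte process id
    inc: hex.slice(18, 24),       // 3-byte counter
  };
}

const parts = splitObjectId("50804d0bd94ccab2da652599");
console.log(parts.ts.toISOString()); // 2012-10-18T18:40:11.000Z
console.log(parts.mac, parts.pid, parts.inc); // d94cca b2da 652599
```

Because the timestamp comes first, sorting by _id roughly sorts documents by creation time.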
  • 60. Cursor // Use a cursor with find() > var myCursor = db.user.find( ) // Get the next document > var myDocument = myCursor.hasNext() ? myCursor.next() : null; > if (myDocument) { printjson(myDocument.mail); } // Show all other documents > myCursor.forEach(printjson); By default the shell displays 20 documents
  • 61. Logical operators // Find documents using OR > db.user.find( {$or : [ { name : “Sheldon“ }, { mail : } ] }) // Find documents using AND > db.user.find( {$and : [ { name : “Sheldon“ }, { mail : } ] })
  • 62. Manipulating results // Sort documents > db.user.find().sort( { name : 1 } ) // Ascending > db.user.find().sort( { name : -1 } ) // Descending // Limit the number of documents > db.user.find().limit(3) // Skip documents > db.user.find().skip(2) // Combination of both methods > db.user.find().skip(2).limit(3)
  • 63. Updating documents I // Updating only the mail address (How not to do…) > db.user.update( { name : “Sheldon“ }, { mail : ““ } ) // Result of the update operation db.user.findOne() { "_id" : ObjectId("516684a32f391f3c2fcb80ed"), "mail" : "" } Be careful when updating documents!
  • 64. Deleting documents // Deleting a document > db.user.remove( { mail : ““ } ) // Deleting all documents in a collection > db.user.remove() // Use a condition to delete documents > db.user.remove( { mail : /.*$/ } ) // Delete only the first document using a condition > db.user.remove( { mail : /.*.com$/ }, true )
  • 65. Updating documents II // Updating only the mail address (This time for real) > db.user.update( { name : "Sheldon" }, { $set : { mail : "" } } ) // Show the result of the update operation > db.user.find( { name : "Sheldon" } ) { "_id" : ObjectId("5166ba122f391f3c2fcb80f5"), "mail" : "", "name" : "Sheldon" }
  • 66. Adding to arrays // Adding an array > db.user.update( { name : "Sheldon" }, { $set : { enemies : [ { name : "Wil Wheaton" }, { name : "Barry Kripke" } ] } } ) // Adding a value to the array > db.user.update( { name : "Sheldon" }, { $push : { enemies : { name : "Leslie Winkle" } } } )
  • 67. Deleting from arrays // Deleting a value from an array > db.user.update( { name : "Sheldon" }, { $pull : { enemies : { name : "Barry Kripke" } } } ) // Deleting a complete array > db.user.update( { name : "Sheldon" }, { $unset : { enemies : 1 } } )
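The difference between the two update slides (replacing the whole document versus using $set) can be modelled in plain JavaScript. This is only a sketch of the observable behaviour: `applyUpdate` is a hypothetical helper, and the mail addresses are placeholders, since the addresses on the slides are not legible in this transcript.

```javascript
// Plain-JavaScript model of the two update styles shown on the slides.
// applyUpdate is a hypothetical helper, not a MongoDB API.
function applyUpdate(doc, update) {
  if (update.$set) {
    // $set: keep the document and overwrite only the listed fields
    return Object.assign({}, doc, update.$set);
  }
  // No operator: the whole document is replaced, only _id survives
  return Object.assign({ _id: doc._id }, update);
}

// Placeholder data for illustration only
const sheldon = { _id: 1, name: "Sheldon", mail: "old@example.org" };

const replaced = applyUpdate(sheldon, { mail: "new@example.org" });
const patched = applyUpdate(sheldon, { $set: { mail: "new@example.org" } });

console.log(replaced); // { _id: 1, mail: 'new@example.org' }  (name is gone)
console.log(patched);  // { _id: 1, name: 'Sheldon', mail: 'new@example.org' }
```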
  • 68. Adding a subdocument // Adding a subdocument to an existing document > db.user.update( { name : “Sheldon“}, { $set : { mother :{ name : “Mary Cooper“, residence : “Galveston, Texas“, religion : “Evangelical Christian“ }}}) { "_id" : ObjectId("5166cf162f391f3c2fcb80f7"), "mail" : "", "mother" : { "name" : "Mary Cooper", "residence" : "Galveston, Texas", "religion" : "Evangelical Christian" }, "name" : "Sheldon" }
  • 69. Querying subdocuments // Finding out the name of the mother > db.user.find( { name : "Sheldon" }, { "mother.name" : 1 } ) { "_id" : ObjectId("5166cf162f391f3c2fcb80f7"), "mother" : { "name" : "Mary Cooper" } } Compound field names need to be in "…"!
  • 70. Overview of all update operators For fields: $inc $rename $set $unset Bitwise operation: $bit Isolation: $isolated For arrays: $addToSet $pop $pullAll $pull $pushAll $push $each (Modifier) $slice (Modifier) $sort (Modifier)
  • 72. Lab time! Lab Nr. 02 Time box: 20 min
  • 74. What is an index?
  • 76. Find No. 7 in the linked list! 1 → 2 → 3 → 4 → 5 → 6 → 7
  • 77. Find No. 7 in a tree! [Tree with root 4, inner nodes 2 and 6, leaves 1, 3, 5 and 7]
  • 78. Indices in MongoDB are B-Trees
  • 79. Find, Insert and Delete Operations: O(log(n))
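The "find No. 7" comparison from the two preceding slides can be counted out in code. The sketch below uses a linear scan for the linked list and a binary search over a sorted array as a stand-in for walking a balanced tree; it illustrates the O(n) versus O(log(n)) difference only and is not MongoDB's B-tree implementation.

```javascript
// Count comparisons for a linear scan (linked list) ...
function linearSteps(sorted, target) {
  let steps = 0;
  for (const value of sorted) {
    steps++;
    if (value === target) break;
  }
  return steps;
}

// ... and for a binary search (stand-in for descending a balanced tree).
function binarySteps(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (sorted[mid] === target) break;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

const seven = [1, 2, 3, 4, 5, 6, 7];
console.log(linearSteps(seven, 7)); // 7
console.log(binarySteps(seven, 7)); // 3 (visits 4, 6, 7)

const million = Array.from({ length: 1000000 }, (_, i) => i + 1);
console.log(linearSteps(million, 1000000)); // 1000000
console.log(binarySteps(million, 1000000) <= 20); // true
```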
  • 80. Missing or non-optimal indices are the single most avoidable performance issue
  • 81. How do I create an index? // Create a non-existing index for a field >{ main_ingredient: 1 }) // Make sure there is an index on the field >{ main_ingredient: 1 }) * 1 for ascending, -1 for descending
  • 82. What can be indexed? // Multiple fields (Compound Key Indexes) >{ main_ingredient: 1, calories: -1 }) // Arrays with values (Multikey Indexes) { name: 'Chicken Noodle Soup’, ingredients : ['chicken', 'noodles'] } >{ ingredients: 1 })
  • 83. What can be indexed? // Subdocuments { name : 'Apple Pie', contributor: { name: 'Joe American', id: 'joea123' } }{ '': 1 }){ 'contributor': 1 })
  • 84. How to maintain indices? // List all indices of a collection > > // Drop an index >{ ingredients: 1 }) // Drop and recreate all indices of a collection
  • 85. More options • Unique Index – Allows only unique values in the indexed field(s) • Sparse Index – For fields that are not available in all documents • Geospatial Index – For modelling 2D and 3D geospatial indices • TTL Collections – Documents are automatically deleted after x seconds
  • 86. Unique Index // Make sure the name of a recipe is unique > { name: 1 }, { unique: true } ) // Force an index on a collection with non-unique values // Duplicates will be deleted more or less randomly! > { name: 1 }, { unique: true, dropDups: true } ) * dropDups should be used only with caution!
  • 87. Sparse Index // Only documents with the field calories will be indexed > { calories: -1 }, { sparse: true } ) // Combination with unique index is possible > { name: 1 , calories: -1 }, { unique: true, sparse: true } ) * Missing fields will be saved as null in the index!
  • 88. Geospatial Index // Add latitude and longitude { name: 'codecentric Frankfurt', loc: [ 50.11678, 8.67206 ] } // Index the 2D coordinates > db.locations.ensureIndex( { loc : '2d' } ) // Find locations near codecentric Frankfurt > db.locations.find({ loc: { $near: [ 50.1, 8.7 ] } })
  • 89. TTL Collections // Documents need a field of type BSON UTC datetime { 'submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), … } // Documents will be deleted automatically by a daemon process // after 'expireAfterSeconds' > { submitted_date: 1 }, { expireAfterSeconds: 3600 } )
  • 90. Limitations of indices • Collections can't have more than 64 indices • Index keys are not allowed to be larger than 1024 bytes • The name of an index (including namespace) must be less than 128 characters • Queries can only make use of one index – Exception: Queries using $or • MongoDB tries to keep indices in memory • Indices slow down the writing of data
  • 92. Best practice 1. Identify slow queries 2. Find out more about the slow queries using explain() 3. Create appropriate indices on the fields being queried 4. Optimize the query taking the available indices into account
  • 93. 1. Identify slow queries > db.setProfilingLevel( n , slowms=100ms ) n=0: Profiler off n=1: Log all operations slower than slowms n=2: Log all operations > db.system.profile.find() * The collection profile is a capped collection with a limited number of entries
  • 94. 2. Usage of explain() > { calories: { $lt : 40 } } ).explain( ) { "cursor" : "BasicCursor", "n" : 42, "nscannedObjects" : 53641, "nscanned" : 53641, ... "millis" : 252, ... }
  • 95. 2. Metrics of the execution plan I • Cursor – The type of the cursor: BasicCursor means no index has been used • n – The number of matched documents • nscannedObjects – The number of scanned documents • nscanned – The number of scanned entries (Index entries or documents)
  • 96. 2. Metrics of the execution plan II • millis – Execution time of the query • Complete reference can be found here – • Optimize for nscanned / n = 1
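One way to read the explain() metrics is to compare how many entries were scanned with how many documents matched. The sketch below does this for the numbers from the explain() slide; `scanRatio` and `usedIndex` are hypothetical helpers, and the plan object is copied by hand from the slide rather than fetched from a server.

```javascript
// Ratio of scanned entries to matched documents; close to 1 means little wasted work.
// scanRatio and usedIndex are hypothetical helpers for illustration.
function scanRatio(plan) {
  if (plan.n === 0) {
    return plan.nscanned === 0 ? 1 : Infinity;
  }
  return plan.nscanned / plan.n;
}

// "BasicCursor" means a full collection scan without an index.
function usedIndex(plan) {
  return plan.cursor !== "BasicCursor";
}

// Values taken from the explain() output on the slide
const plan = {
  cursor: "BasicCursor",
  n: 42,
  nscannedObjects: 53641,
  nscanned: 53641,
  millis: 252,
};

console.log(usedIndex(plan));            // false
console.log(scanRatio(plan).toFixed(1)); // 1277.2
```

A ratio of roughly 1277 scanned documents per match suggests that an index on the queried field would pay off.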
  • 97. 3. Create appropriate indices on the fields being queried
  • 98. 4. Optimize queries taking the available indices into account // Using the following index… > db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 }) // … these queries and sorts can make use of the index > db.collection.find( ).sort({ a:1 }) > db.collection.find( ).sort({ a:1, b:1 }) > db.collection.find({ a:4 }).sort({ a:1, b:1 }) > db.collection.find({ b:5 }).sort({ a:1, b:1 })
  • 99. 4. Optimize queries taking the available indices into account // Using the following index… > db.collection.ensureIndex({ a:1, b:1, c:1, d:1 }) // … these queries cannot make use of it > db.collection.find( ).sort({ b: 1 }) > db.collection.find({ b: 5 }).sort({ b: 1 })
  • 100. 4. Optimize queries taking the available indices into account // Using the following index… >{ main_ingredient: 1, name: 1 }) // … this query can be completely satisfied using the index! > { main_ingredient: 'chicken' }, { _id: 0, name: 1 } ) // The metric indexOnly using explain() verifies this: > { main_ingredient: 'chicken' }, { _id: 0, name: 1 } ).explain() { "indexOnly": true, }
  • 101. Use specific indices // Tell MongoDB explicitly which index to use >{ calories: { $lt: 1000 } } ).hint({ _id: 1 }) // Switch the usage of indices completely off (e.g. for performance // measurements) > { calories: { $lt: 1000 } } ).hint({ $natural: 1 })
  • 103. Using multiple indices // MongoDB can only use one index per query! > db.collection.ensureIndex({ a: 1 }) > db.collection.ensureIndex({ b: 1 }) // For this query only one of those two indices can be used > db.collection.find({ a: 3, b: 4 })
  • 104. Compound indices // Compound indices are often very efficient! > db.collection.ensureIndex({ a: 1, b: 1, c: 1 }) // But only if the query is a prefix of the index... // This query cannot make use of the index > db.collection.find({ c: 2 }) // …but this query can > db.collection.find({ a: 3, b: 5 })
  • 105. Indices with low selectivity // The following field has only few distinct values > db.collection.distinct('status') [ 'new', 'processed' ] // An index on this field is not the best idea… > db.collection.ensureIndex({ status: 1 }) > db.collection.find({ status: 'new' }) // Better use an adequate compound index with other fields > db.collection.ensureIndex({ status: 1, created_at: -1 }) > db.collection.find( { status: 'new' } ).sort({ created_at: -1 })
  • 106. Regular expressions & Indices > db.users.ensureIndex({ username: 1 }) // Left-bound regular expressions can make use of this index > db.users.find({ username: /^joe smith/ }) // But not queries with regular expressions in general… > db.users.find({username: /smith/ }) // Also not case-insensitive queries… > db.users.find({ username: /^Joe/i })
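The prefix rule for compound indices can be written down as a small check. This is a deliberately simplified rule of thumb in plain JavaScript, assuming equality conditions only; `usablePrefixLength` is a hypothetical helper, and the real query planner considers more cases (sorts, ranges, index intersection in later versions).

```javascript
// How many leading fields of a compound index are covered by the queried fields?
// 0 means the query cannot use the index under this simplified prefix rule.
// usablePrefixLength is a hypothetical helper, not part of MongoDB.
function usablePrefixLength(indexFields, queryFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break; // the prefix ends at the first gap
    length++;
  }
  return length;
}

const index = ["a", "b", "c"]; // corresponds to ensureIndex({ a: 1, b: 1, c: 1 })

console.log(usablePrefixLength(index, ["a", "b"])); // 2: find({ a: 3, b: 5 }) uses the index
console.log(usablePrefixLength(index, ["c"]));      // 0: find({ c: 2 }) does not
console.log(usablePrefixLength(index, ["a", "c"])); // 1: only the part on a helps
```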
  • 107. Negations & Indices // Negations can not make use of indices > db.things.ensureIndex({ x: 1 }) // e.g. queries using not equal > db.things.find({ x: { $ne: 3 } }) // …or queries with not in > db.things.find({ x: { $nin: [2, 3, 4 ] } }) // …or queries with the $not operator > db.people.find({ name: { $not: 'John Doe' } })
  • 108. Lab time! Lab Nr. 03 Time box: 20 min
  • 110. What is Map/Reduce? • Programming model coming from functional languages • Framework for – parallel processing – of big volume data – using distributed systems • Made popular by Google – Has been invented to calculate the inverted search index for web sites to keywords (Page Rank) –
  • 111. Basics • Not something specific to MongoDB – Hadoop – Disco – Amazon Elastic MapReduce – … • Based on key-value pairs • Prior to version 2.4 and the introduction of the V8 JavaScript engine only one thread per shard
  • 112. The „Hello world“ of Map/Reduce: Word Count
  • 113. Word Count: Problem INPUT { MongoDB uses MapReduce } { There is a map phase } { There is a reduce phase } MAPPER GROUP/SORT REDUCER OUTPUT a: 2 is: 2 map: 1 Problem: How often does one word appear in all documents? mapreduce: 1 mongodb: 1 phase: 2 reduce: 1 there: 2 uses: 1
  • 114. Word Count: Mapping INPUT { MongoDB uses MapReduce } { There is a map phase } { There is a reduce phase } MAPPER GROUP/SORT (doc1, “…“) (mongodb, 1) (uses, 1) (mapreduce, 1) (doc2, “…“) (there, 1) (is, 1) (a, 1) (map, 1) (phase, 1) (doc3, “…“) (there, 1) (is, 1) (a, 1) (reduce, 1) (phase, 1) REDUCER OUTPUT
  • 115. Word Count: Group/Sort INPUT { MongoDB uses MapReduce } MAPPER GROUP/SORT REDUCER a-l (doc1, “…“) m-q { There is a map phase } { There is a reduce phase } (doc2, “…“) (map, 1) (phase, 1) r-z (doc3, “…“) (there, 1) (reduce, 1) OUTPUT
  • 116. Word Count: Reduce INPUT { MongoDB uses MapReduce } MAPPER GROUP/SORT REDUCER (doc1, “…“) (a, [1, 1]) (is, [1, 1]) (map, [1]) { There is a map phase } (doc2, “…“) (mapreduce, [1]) (mongodb, [1]) (phase, [1, 1]) { There is a reduce phase } (doc3, “…“) (reduce, [1]) (there, [1, 1]) (uses, [1]) OUTPUT
  • 117. Word Count: Result INPUT { MongoDB uses MapReduce } MAPPER GROUP/SORT REDUCER OUTPUT (doc1, “…“) (a, [1, 1]) (is, [1, 1]) (map, [1]) a: 2 is: 2 map: 1 { There is a map phase } (doc2, “…“) (mapreduce, [1]) (mongodb, [1]) (phase, [1, 1]) mapreduce: 1 mongodb: 1 phase: 2 { There is a reduce phase } (doc3, “…“) (reduce, [1]) (there, [1, 1]) (uses, [1]) reduce: 1 there: 2 uses: 1
  • 118. Word Count: In a nutshell INPUT { MongoDB uses MapReduce } MAPPER GROUP/SORT (doc1, “…“) REDUCER (a, [1, 1]) (is, [1, 1]) (map, [1]) OUTPUT a: 2 is: 2 map: 1 • map(): Transforms one key-value pair into 0–N key-value pairs • reduce(): Reduces 0–N key-value pairs into one key-value pair
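The three phases of the word count (map, group/sort, reduce) can be replayed on the three example documents in plain JavaScript, without a database. The function names below are chosen for this sketch; it mirrors the phases on the slides and is not how MongoDB executes Map/Reduce internally.

```javascript
const docs = [
  "MongoDB uses MapReduce",
  "There is a map phase",
  "There is a reduce phase",
];

// map(): one document in, 0..N (word, 1) pairs out
function mapDoc(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean).map(word => [word, 1]);
}

// group/sort: collect all values per key, keys in alphabetical order
function groupPairs(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// reduce(): all values of one key in, one value out
function reduceValues(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const counts = {};
for (const [key, values] of groupPairs(docs.flatMap(mapDoc))) {
  counts[key] = reduceValues(key, values);
}

console.log(counts);
// { a: 2, is: 2, map: 1, mapreduce: 1, mongodb: 1, phase: 2, reduce: 1, there: 2, uses: 1 }
```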
  • 119. Map/Reduce: Overview MongoDB Data group(k) map() emit(k,v) Shard 1 Iterates all documents sort(k) Shard 2 … Shard n reduce(k, values) finalize(k, v) • • Input = Output Can run multiple times
  • 120. Word Count: Tweets // Example: Twitter database with tweets > db.tweets.findOne() { "_id" : ObjectId("4fb9fb91d066d657de8d6f38"), "text" : "RT @RevRunWisdom: The bravest thing that men do is love women #love", "created_at" : "Thu Sep 02 18:11:24 +0000 2010", … "user" : { "friends_count" : 0, "profile_sidebar_fill_color" : "252429", "screen_name" : "RevRunWisdom", "name" : "Rev Run", }, …
  • 121. Word Count: map() // Map function with simple data cleansing map = function() { this.text.split(' ').forEach(function(word) { // Remove whitespace word = word.replace(/\s/g, ""); // Remove all non-word-characters word = word.replace(/\W/gm,""); // Finally emit the cleaned up word if(word != "") { emit(word, 1) } }); };
  • 122. Word Count: reduce() // Reduce function: sums the emitted values, because reduce() can run again on its own output reduce = function(key, values) { return Array.sum(values); };
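Since reduce can run multiple times and its input has the same shape as its output, a counting reducer has to add up its values rather than count how many it received. The plain-JavaScript sketch below simulates a two-pass reduce to show the difference; `reduceInChunks` and both reducers are hypothetical names used only for this illustration.

```javascript
// Two candidate reducers for a word count
const countByLength = (key, values) => values.length;
const countBySum = (key, values) => values.reduce((sum, v) => sum + v, 0);

// Simulate a reduce that runs in two passes: first per chunk, then over the partial results.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

const ones = new Array(10).fill(1); // ten emits of (word, 1)

console.log(reduceInChunks(countByLength, "love", ones, 4)); // 3  (the number of chunks)
console.log(reduceInChunks(countBySum, "love", ones, 4));    // 10 (the real count)
```

Only the summing reducer returns the same result regardless of how the values are split up.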
  • 123. Word Count: Call // Show the results using the console > db.tweets.mapReduce(map, reduce, { out : { inline : 1 } } ); // Save the results to a collection > db.tweets.mapReduce(map, reduce, { out : "tweets_word_count"} ); { "result" : "tweets_word_count", "timeMillis" : 19026, "counts" : { "input" : 53641, "emit" : 559217, "reduce" : 102057, "output" : 131003 }, "ok" : 1, }
  • 124. Word Count: Result // Top-10 of most common words in tweets > db.tweets_word_count.find().sort({"value" : -1}).limit(10) { "_id" : "Miley", "value" : 31 } { "_id" : "mil", "value" : 31 } { "_id" : "andthenihitmydougie", "value" : 30 } { "_id" : "programa", "value" : 30 } { "_id" : "Live", "value" : 29 } { "_id" : "Super", "value" : 29 } { "_id" : "cabelo", "value" : 29 } { "_id" : "listen", "value" : 29 } { "_id" : "Call", "value" : 28 } { "_id" : "DA", "value" : 28 }
  • 126. Typical use cases • Counting, Aggregating & Summing up – Analyzing log entries & Generating log reports – Generating an inverted index – Substitute existing ETL processes • Counting unique values – Counting the number of unique visitors of a website • Filtering, Parsing & Validation – Filtering of user data – Consolidation of user-generated data • Sorting – Data analysis using complex sorting
  • 127. Summary • The Map/Reduce framework is very versatile & powerful • Is implemented in JavaScript – Necessity to write own map() and reduce() functions in JavaScript – Difficult to debug – Performance is highly influenced by the JavaScript engine • Can be used for complex data analytics • Lots of overhead for simple aggregation tasks – Summing up of data – Average of data – Grouping of data
  • 128. Map/Reduce should be used as a last resort!
  • 129. Lab time! Lab Nr. 04 Time box: 20 min
  • 131. Why? SELECT customer_id, SUM(price) FROM orders WHERE active=true GROUP BY customer_id
  • 132. That's why! SELECT customer_id, SUM(price) FROM orders WHERE active=true GROUP BY customer_id • SUM(price): Calculation of fields • GROUP BY: Grouping of data
  • 133. The Aggregation Framework • Has been introduced to allow 90% of real-world aggregation use cases without using the "big hammer" Map/Reduce • Framework of methods & operators – Declarative – No own JavaScript code needed – Fixed set of methods and operators (but constantly under development by MongoDB Inc.) • Implemented in C++ – Limitations of the JavaScript engine are avoided – Better performance
  • 135. The Aggregation Pipeline • Processes a stream of documents – Input is a complete collection – Output is a document containing the results • Succession of pipeline operators – Each tier filters or transforms the documents – Input documents of a tier are the output documents of the previous tier
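The pipeline idea, where each tier receives the output documents of the previous tier, can be sketched in plain JavaScript as a chain of functions over arrays. The stage helpers and the sample users below are made up for illustration; they imitate the shape of $match, $sort and $limit but are not MongoDB's implementation.

```javascript
// Each stage is a function from an array of documents to an array of documents.
const match = predicate => docs => docs.filter(predicate);
const sortBy = (field, direction) => docs =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const limit = n => docs => docs.slice(0, n);

// Feed the output of each stage into the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample data
const users = [
  { name: "a", lang: "de", friends: 5 },
  { name: "b", lang: "en", friends: 50 },
  { name: "c", lang: "de", friends: 20 },
  { name: "d", lang: "de", friends: 1 },
];

const result = runPipeline(users, [
  match(u => u.lang === "de"), // like $match
  sortBy("friends", -1),       // like $sort, descending
  limit(2),                    // like $limit
]);

console.log(result.map(u => u.name)); // [ 'c', 'a' ]
```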
  • 136. Call db.tweets.aggregate( { $pipeline_operator_1 }, { $pipeline_operator_2 }, { $pipeline_operator_3 }, { $pipeline_operator_4 }, ... );
  • 137. Pipeline Operators // Old friends* $match $sort $limit $skip * from the query functionality // New friends $project $group $unwind
  • 138. Example: Tweets // Example: Twitter database with tweets > db.tweets.findOne() { "_id" : ObjectId("4fb9fb91d066d657de8d6f38"), "text" : "RT @RevRunWisdom: The bravest thing that men do is love women #love", "created_at" : "Thu Sep 02 18:11:24 +0000 2010", … "user" : { "friends_count" : 0, "profile_sidebar_fill_color" : "252429", "screen_name" : "RevRunWisdom", "name" : "Rev Run", }, …
  • 139. $match // Show all German users > db.tweets.aggregate( { $match : {"user.lang" : "de"} } ); // Show all users with fewer than 10 followers > db.tweets.aggregate( { $match : {"user.followers_count" : { $gte : 0, $lt : 10 } } } ); > Filters documents > Equivalent to .find()
  • 140. $sort // Sorting using one field > db.tweets.aggregate( { $sort : {"user.friends_count" : -1} }, ); // Sorting using multiple fields > db.tweets.aggregate( { $sort : {"user.lang" : 1, "user.time_zone" : 1, "user.friends_count" : -1} }, ); > Sorts documents > Equivalent to .sort()
  • 141. $limit // Limit the number of resulting documents to 3 > db.tweets.aggregate( { $sort : {"user.friends_count" : -1} }, { $limit : 3 } ); > Limits resulting documents > Equivalent to .limit()
  • 142. $skip // Get the No.4-Twitterer according to number of friends > db.tweets.aggregate( { $sort : {"user.friends_count" : -1} }, { $skip : 3 }, { $limit : 1 } ); > Skips documents > Equivalent to .skip()
  • 143. $project I // Limit the result document to only one field > db.tweets.aggregate( { $project : {text : 1} }, ); // Remove _id > db.tweets.aggregate( { $project : {_id: 0, text : 1} }, ); > Limits the fields in resulting documents
  • 144. $project II // Rename a field > db.tweets.aggregate( { $project : {_id: 0, content_of_tweet : "$text"} }, ); // Add a calculated field > db.tweets.aggregate( { $project : {_id: 0, content_of_tweet : "$text", number_of_friends : {$add: ["$user.friends_count", 10]} } }, );
  • 145. $project III // Add a subdocument > db.tweets.aggregate( { $project : {_id: 0, content_of_tweet : "$text", user : { name : "$", number_of_friends : {$add: ["$user.friends_count", 10]} } } } );
  • 146. $group I // Grouping using a single field > db.tweets.aggregate( { $group : { _id : "$user.lang", anzahl_tweets : {$sum : 1} } } ); > Groups documents > Equivalent to GROUP BY in SQL
  • 147. $group II // Grouping using multiple fields > db.tweets.aggregate( { $group : { _id : { background_image: "$user.profile_use_background_image", language: "$user.lang" }, number_of_tweets: {$sum : 1} } } );
  • 148. $group III // Grouping with multiple calculated fields > db.tweets.aggregate( { $group : { _id : "$user.lang", number_of_tweets : {$sum : 1}, average_of_followers : {$avg : "$user.followers_count"}, minimum_of_followers : {$min : "$user.followers_count"}, maximum_of_followers : {$max : "$user.followers_count"} } } );
  • 150. $unwind I // Unwind an array > db.tweets.aggregate( { $project : {_id: 0, content_of_tweet : "$text", mentioned_users : "$" } }, { $skip : 18 }, { $limit : 1 }, { $unwind : "$mentioned_users" } ); > Unwinds arrays and creates one document per value in the array
  • 151. $unwind II // Resulting document without $unwind { "content_of_tweet" : "RT @Philanthropy: How should nonprofit groups measure their social-media efforts? A new podcast from @afine", "mentioned_users" : [ "Philanthropy", "Allison Fine" ] }
  • 152. $unwind III // Resulting documents with $unwind { "content_of_tweet" : "RT @Philanthropy: How should nonprofit groups measure their social-media efforts? A new podcast from @afine", "mentioned_users" : "Philanthropy" }, { "content_of_tweet" : "RT @Philanthropy: How should nonprofit groups measure their social-media efforts? A new podcast from @afine", "mentioned_users" : "Allison Fine" }
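What $unwind does to the example tweet can be imitated with a few lines of plain JavaScript: one output document per array value, with all other fields copied. `unwind` here is a hypothetical helper for illustration, and the tweet text is shortened.

```javascript
// Produce one copy of each document per value of the given array field.
// Documents where the field is missing or an empty array produce no output.
// unwind is a hypothetical helper, not MongoDB's $unwind implementation.
function unwind(docs, field) {
  return docs.flatMap(doc => {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    return values.map(value => Object.assign({}, doc, { [field]: value }));
  });
}

// Shortened version of the tweet from the slides
const tweet = {
  content_of_tweet: "RT @Philanthropy: How should nonprofit groups measure their social-media efforts?",
  mentioned_users: ["Philanthropy", "Allison Fine"],
};

const unwound = unwind([tweet], "mentioned_users");
console.log(unwound.length);                           // 2
console.log(unwound.map(doc => doc.mentioned_users));  // [ 'Philanthropy', 'Allison Fine' ]
```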
  • 154. Place $match at the beginning of the pipeline to reduce the number of documents as soon as possible! Best Practice #1
  • 155. Use $project to remove unneeded fields from the documents as early as possible! Best Practice #2
  • 156. When placed at the beginning of the pipeline, these operators can make use of indexes: $match $sort $limit $skip. The above operators can also use indexes when placed before these operators: $project $unwind $group Best Practice #3
  • 158. Mapping SQL → MongoDB Aggregation: WHERE → $match; GROUP BY → $group; HAVING → $match; SELECT → $project; ORDER BY → $sort; LIMIT → $limit; SUM() → $sum; COUNT() → $sum; join → no equivalent operator ($unwind offers somewhat equivalent functionality for embedded fields)
  • 159. Example: Online shopping { cust_id: "sheldon1", ord_date: ISODate("2013-04-18T19:38:11.102Z"), status: "purchased", price: 105.69, items: [ { sku: "nobel_price_replica", qty: 3, price: 29.90 }, { sku: "wheaton_voodoo_doll", qty: 1, price: 15.99 } ] }
  • 160. Count all orders SQL: SELECT COUNT(*) AS count FROM orders MongoDB Aggregation: db.orders.aggregate( [ { $group: { _id: null, count: { $sum: 1 } } } ] )
  • 161. Total order price per customer SQL: SELECT cust_id, SUM(price) AS total FROM orders GROUP BY cust_id ORDER BY total MongoDB Aggregation: db.orders.aggregate( [ { $group: { _id: "$cust_id", total: { $sum: "$price" } } }, { $sort: { total: 1 } } ] )
  • 162. Sum up all orders over $250 SQL: SELECT cust_id, SUM(price) AS total FROM orders WHERE status = 'purchased' GROUP BY cust_id HAVING total > 250 MongoDB Aggregation: db.orders.aggregate( [ { $match: { status: 'purchased' } }, { $group: { _id: "$cust_id", total: { $sum: "$price" } } }, { $match: { total: { $gt: 250 } } } ] )
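The WHERE / GROUP BY / HAVING query above maps to a three-stage pipeline: filter, group, filter again. The sketch below replays those three stages on an in-memory array in plain JavaScript. The function `totalsOver` and the sample orders are hypothetical, and whole-number prices are used to keep the arithmetic exact.

```javascript
// Hypothetical orders, reduced to the fields the query touches.
const orders = [
  { cust_id: "sheldon1", status: "purchased", price: 105 },
  { cust_id: "sheldon1", status: "purchased", price: 200 },
  { cust_id: "leonard1", status: "purchased", price: 120 },
  { cust_id: "leonard1", status: "cancelled", price: 400 },
];

function totalsOver(docs, status, threshold) {
  const totals = new Map();
  // Stage 1: WHERE status = ...  (first $match)
  for (const o of docs.filter((d) => d.status === status)) {
    // Stage 2: GROUP BY cust_id with SUM(price)  ($group with $sum)
    totals.set(o.cust_id, (totals.get(o.cust_id) || 0) + o.price);
  }
  // Stage 3: HAVING total > ...  (second $match, applied to the groups)
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .filter((g) => g.total > threshold);
}

console.log(totalsOver(orders, "purchased", 250));
```

The cancelled order never reaches the grouping stage, which is the same effect Best Practice #1 aims for with an early $match.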
  • 164. Lab time! Lab No. 05 Time box: 20 min
  • 166. Why do we need replication? • Hardware is unreliable and is doomed to fail! • Do you want to be the person being called at night to do a manual failover? • How about network latency? • Different use cases for your data – “Regular” processing – Data for analysis – Data for backup
  • 167. Life cycle of a replica set
  • 168. Replica set – Create
  • 169. Replica set – Initializing
  • 170. Replica set – Node down
  • 171. Replica set – Failover
  • 172. Replica set – Recovery
  • 173. Replica set – Back to normal
  • 175. Replica sets - Roles
  • 176. Configuration I > conf = { _id : "mySet", members : [ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, {_id : 2, host : "C"}, {_id : 3, host : "D", priority : 0, hidden : true}, {_id : 4, host : "E", priority : 0, hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf)
  • 177. Configuration II > conf = { _id : "mySet", members : [ /* Primary data center */ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, {_id : 2, host : "C"}, {_id : 3, host : "D", priority : 0, hidden : true}, {_id : 4, host : "E", priority : 0, hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf)
  • 178. Configuration III > conf = { _id : "mySet", members : [ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, /* Secondary data center (default priority = 1) */ {_id : 2, host : "C"}, {_id : 3, host : "D", priority : 0, hidden : true}, {_id : 4, host : "E", priority : 0, hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf)
  • 179. Configuration IV > conf = { _id : "mySet", members : [ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, {_id : 2, host : "C"}, /* Analytical data, e.g. for Hadoop, Storm, BI, … */ {_id : 3, host : "D", priority : 0, hidden : true}, {_id : 4, host : "E", priority : 0, hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf)
  • 180. Configuration V > conf = { _id : "mySet", members : [ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, {_id : 2, host : "C"}, {_id : 3, host : "D", priority : 0, hidden : true}, /* Back-up node */ {_id : 4, host : "E", priority : 0, hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf)
  • 184. Write Concern • Different levels of data consistency • Acknowledged by – Network – MongoDB – Journal – Secondaries – Tagging
  • 187. Acknowledged by Journal Wait for Journal Sync
  • 189. Tagging while writing data • Available since 2.0 • Allows for fine-grained control • Each node can have multiple tags – tags: {dc: "ny"} – tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"} • Allows for creating Write Concern rules (per replica set) • Tags can be adapted without code changes and restarts
  • 190. Tagging - Example { _id : "mySet", members : [ {_id : 0, host : "A", tags : {"dc": "ny"}}, {_id : 1, host : "B", tags : {"dc": "ny"}}, {_id : 2, host : "C", tags : {"dc": "sf"}}, {_id : 3, host : "D", tags : {"dc": "sf"}}, {_id : 4, host : "E", tags : {"dc": "cloud"}}], settings : { getLastErrorModes : { allDCs : {"dc" : 3}, someDCs : {"dc" : 2}} } } > db.blogs.insert({...}) > db.runCommand({getLastError : 1, w : "someDCs"})
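A mode such as `someDCs : {"dc" : 2}` is satisfied once the write has reached members that cover two distinct values of the `dc` tag; two acknowledgements from the same data center do not count twice. The following plain JavaScript sketch illustrates that rule with the configuration from the slide. The function `isSatisfied` is a hypothetical helper, not part of MongoDB.

```javascript
// Members and modes copied from the tagging example in the slides.
const members = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];
const modes = { allDCs: { dc: 3 }, someDCs: { dc: 2 } };

// Returns true if the hosts that acknowledged the write cover enough
// distinct values for every tag named in the mode.
function isSatisfied(mode, ackedHosts) {
  return Object.entries(mode).every(([tag, required]) => {
    const distinctValues = new Set(
      members
        .filter((m) => ackedHosts.includes(m.host) && tag in m.tags)
        .map((m) => m.tags[tag])
    );
    return distinctValues.size >= required;
  });
}

console.log(isSatisfied(modes.someDCs, ["A", "B"])); // both hosts are in "ny"
console.log(isSatisfied(modes.someDCs, ["A", "C"])); // "ny" and "sf"
```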
  • 191. Acknowledged by Tagging Wait for Replication (Tagging)
  • 192. Configure the Write Concern // Wait for network acknowledgement > db.runCommand( { getLastError: 1, w: 0 } ) // Wait for error (Default) > db.runCommand( { getLastError: 1, w: 1 } ) // Wait for journal sync > db.runCommand( { getLastError: 1, w: 1, j: true } ) // Wait for replication > db.runCommand( { getLastError: 1, w: "majority" } ) > db.runCommand( { getLastError: 1, w: 3 } ) // # of secondaries
  • 193. Read Preferences • Only primary (primary) • Primary preferred (primaryPreferred) • Only secondaries (secondary) • Secondaries preferred (secondaryPreferred) • Nearest node (nearest) General: If more than one node is available, the nearest node will be chosen (all modes except primary)
  • 199. Tagging while reading data • Allows for finer-grained control over where data will be read from – e.g. { "disk": "ssd", "use": "reporting" } • Can be combined with other read modes – Except for mode "Only primary"
  • 200. Configure the Read Preference // Only primary > cursor.readPref( "primary" ) // Primary preferred > cursor.readPref( "primaryPreferred" ) … // Only secondaries with tagging > cursor.readPref( "secondary", [ { rack : 2 } ] ) The read preference must be configured before using the cursor to read data!
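How a read mode and a tag set narrow down the eligible nodes can be sketched in a few lines of plain JavaScript. This is a deliberately simplified model: real drivers accept a list of tag sets and also take network latency into account. The node list and the function `candidates` are hypothetical.

```javascript
// Hypothetical replica set; the tags mirror the "rack" example from the slides.
const nodes = [
  { host: "A", state: "PRIMARY", tags: { rack: 1 } },
  { host: "B", state: "SECONDARY", tags: { rack: 2 } },
  { host: "C", state: "SECONDARY", tags: { rack: 3 } },
];

// A node matches if it carries every tag of the tag set with the same value.
function matchesTags(node, tagSet) {
  return Object.entries(tagSet).every(([k, v]) => node.tags[k] === v);
}

// Returns the hosts that are eligible for a read under the given mode.
function candidates(mode, tagSet = {}) {
  const primary = nodes.filter((n) => n.state === "PRIMARY");
  const secondaries = nodes.filter(
    (n) => n.state === "SECONDARY" && matchesTags(n, tagSet)
  );
  let eligible;
  switch (mode) {
    case "primary": eligible = primary; break; // tags are not allowed here
    case "primaryPreferred": eligible = primary.length ? primary : secondaries; break;
    case "secondary": eligible = secondaries; break;
    case "secondaryPreferred": eligible = secondaries.length ? secondaries : primary; break;
    case "nearest": eligible = nodes.filter((n) => matchesTags(n, tagSet)); break;
    default: throw new Error("unknown read mode: " + mode);
  }
  return eligible.map((n) => n.host);
}

console.log(candidates("secondary", { rack: 2 }));
```

When several hosts remain eligible, the slides state that the nearest one is chosen; that final step is left out of this sketch.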
  • 202. Maintenance & Upgrades • Zero downtime • Rolling upgrades and maintenance – Start with all secondaries – Step down the current primary – Primary as last one – Restore previous primary (if needed) • Commands: – rs.stepDown(<secs>) – db.version() – db.serverBuildInfo()
  • 203. Replica set – 1 data center • One – Data center – Switch – Power Supply • Possible errors: – Failure of 2 nodes – Power Supply – Network – Data Center • Automatic recovery
  • 204. Replica set – 2 data centers • Additional node for data recovery • No writing to both data centers, since there is only one node in data center No. 2
  • 205. Replica set – 3 data centers • Can recover from a complete data center failure • Allows for usage of w= { dc : 2 } to guarantee writing to 2 data centers (via tagging)
  • 206. Commands • Administration of the nodes – rs.conf() – rs.initiate(<conf>) & rs.reconfig(<conf>) – rs.add("<host>:<port>") & rs.addArb("<host>:<port>") – rs.status() – rs.stepDown(<secs>) • Reconfiguration if only a minority of the nodes is available – rs.reconfig( cfg, { force: true } )
  • 208. Best Practices • Odd number of nodes • Adapt the write concern to your use case • Read from primary except for – Geographical distribution – Data analytics • Use logical names and not IP addresses for configuration • Monitor the replication lag of the secondaries (e.g. MMS)
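The recommendation to use an odd number of nodes follows from the majority rule: a primary can only be elected while more than half of the voting members are reachable. A short arithmetic sketch in plain JavaScript (the function names are hypothetical) shows that a fourth node adds no extra fault tolerance over three.

```javascript
// Smallest number of members that forms a majority of n voting members.
function majority(n) {
  return Math.floor(n / 2) + 1;
}

// Number of members that may fail while a majority is still reachable.
function faultTolerance(n) {
  return n - majority(n);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} nodes: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```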
  • 209. Lab time! Lab No. 06 Time box: 20 min
  • 211. Visual representation of vertical scaling 1970 - 2000: Vertical Scaling "Scale up"
  • 212. Visual representation of horizontal scaling Since 2000: Horizontal Scaling "Scale out"
  • 213. When to use Sharding?
  • 214. Not enough disk space
  • 215. The working set doesn't fit into memory
  • 216. The required read/write throughput is higher than the I/O capabilities
  • 218. Partitioning of data • The user needs to define a shard key • The shard key defines the distribution of data across the shards
  • 219. Partitioning of data into chunks • Initially all data is in one chunk • Maximum chunk size: 64 MB • MongoDB divides and distributes chunks automatically once the maximum size is met
  • 220. One chunk contains data of a certain value range
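Because every chunk covers one contiguous range of shard key values, finding the chunk (and therefore the shard) for a given key is a range lookup. The sketch below uses plain JavaScript with a hypothetical chunk table; ranges include their lower bound and exclude their upper bound. The helper names `findChunk` and `shardsForRange` are assumptions for illustration.

```javascript
// Hypothetical chunk table as a config server might describe it.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" },
];

// Exact lookup: the single chunk whose range [min, max) contains the value.
function findChunk(value) {
  return chunks.find((c) => value >= c.min && value < c.max);
}

// Range lookup: all shards owning a chunk that overlaps [low, high).
function shardsForRange(low, high) {
  const hits = chunks.filter((c) => c.min < high && c.max > low);
  return [...new Set(hits.map((c) => c.shard))];
}

console.log(findChunk(150).shard);
console.log(shardsForRange(50, 150));
```

A query on a single shard key value touches one shard, while a range query may have to visit several.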
  • 221. Chunks & Shards • A shard is one node in the cluster • A shard can be one single mongod or a replica set
  • 222. Metadata Management • Config Server – Stores the value ranges of the chunks and their location – Number of config servers is 1 or 3 (Production: 3) – Two Phase Commit
  • 223. Balancing & Routing Service • mongos balances the data in the cluster • mongos distributes data to new nodes • mongos routes queries to the correct shard or collects results if data is spread on multiple shards • No local data
  • 224. Automatic Balancing Balancing will be automatically done once the number of chunks between shards hits a certain threshold
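The idea of threshold-based balancing can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest one. This plain JavaScript model is a simplification under stated assumptions: the function `balance`, the single threshold parameter, and the chunk counts are hypothetical, and MongoDB's actual thresholds and migration rules are more involved.

```javascript
// Moves chunks one by one until the difference between the fullest and the
// emptiest shard drops below the threshold. Returns the final chunk counts
// and the list of migrations that were performed.
function balance(chunkCounts, threshold) {
  if (threshold < 2) {
    // With a threshold of 1, a difference of one chunk would bounce forever.
    throw new RangeError("threshold must be at least 2");
  }
  const counts = { ...chunkCounts };
  const migrations = [];
  while (true) {
    const shards = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

console.log(balance({ shard0000: 10, shard0001: 2 }, 2));
```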
  • 225. Splitting of a chunk • Once a chunk hits the maximum size it will be split • Splitting is only a logical operation, no data needs to be moved • If the splitting of a chunk results in an imbalance of data, automatic rebalancing will be started
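That a split is purely logical becomes clear when a chunk is modelled as a metadata record: the split replaces one range by two adjacent ranges on the same shard. In this plain JavaScript sketch the function `splitChunk` is hypothetical, and choosing the median key as the split point is an assumption made for illustration.

```javascript
// Splits one chunk into two adjacent chunks at the median shard key value.
// Only the range metadata changes; both halves stay on the same shard.
function splitChunk(chunk, sortedKeys) {
  const splitPoint = sortedKeys[Math.floor(sortedKeys.length / 2)];
  return [
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint, max: chunk.max, shard: chunk.shard },
  ];
}

const full = { min: 0, max: 100, shard: "shard0000" };
console.log(splitChunk(full, [5, 20, 40, 70, 90]));
```

After the split the shard owns one more chunk than before, which is what can trigger the balancer.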
  • 227. MongoDB Auto Sharding • Minimal effort – Usage of the same interfaces for mongod and mongos • Easy configuration – Enable sharding for a database • sh.enableSharding("<database>") – Shard a collection in a database • sh.shardCollection("<database>.<collection>", shard-key-pattern)
  • 229. Example of a very simple cluster • Never use this in production! – Only one config server (no fault tolerance) – Shard is not a replica set (no high availability) – Only one mongos and one shard (no performance improvement)
  • 230. Start the config server // Start the config server (Default port 27019) > mongod --configsvr
  • 231. Start the mongos routing service // Start the mongos router (Default port 27017) > mongos --configdb <hostname>:27019 // When using 3 config servers > mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>
  • 232. Start the shard // Start a shard with one mongod (Default port 27018) > mongod --shardsvr // Shard is not yet added to the cluster!
  • 233. Add the shard // Connect to mongos and add the shard > mongo > sh.addShard('<host>:27018') // When adding a replica set, you only need to add one of the nodes!
  • 234. Check configuration // Check if the shard has been added > db.runCommand({ listShards:1 }) { "shards" : [ { "_id": "shard0000", "host": "<hostname>:27018" } ], "ok" : 1 }
  • 235. Configure sharding // Enable the sharding for a database > sh.enableSharding("<dbname>") // Shard a collection using a shard key > sh.shardCollection("<dbname>.user", { "name" : 1 } ) // Use a compound shard key > sh.shardCollection("<dbname>.cars", { "year" : 1, "uniqueid" : 1 })
  • 237. Shard Key • The shard key cannot be changed • The values of a shard key cannot be changed • The shard key needs to be indexed • The uniqueness of the field _id is only guaranteed within a shard • The size of a shard key is limited to 512 bytes
  • 238. Considerations for the shard key • Cardinality of data – The value range needs to be rather large. For example, sharding on the field loglevel with the 3 values error, warning, info doesn't make sense. • Distribution of data – Always strive for equal distribution of data throughout all shards! • Patterns during reading and writing – For example, for log data, using the timestamp as a shard key can be useful if chronologically close data needs to be read or written together.
  • 239. Choices for the shard key • Single field – If the value range is big enough and data is distributed almost equally • Compound fields – Use this if a single field is not enough with respect to value range and equal distribution • Hash based – In general a random shard key is a good choice for equal distribution of data – For performance the shard key should be part of the queries – Only available since 2.4 • sh.shardCollection( "", { a: "hashed" } )
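Why a hashed key spreads writes more evenly than a monotonically increasing one can be shown with a small experiment in plain JavaScript (Node.js). This is an illustration under stated assumptions: it maps an MD5 digest to a shard by modulo, whereas MongoDB assigns ranges of hash values to chunks. The helpers `hashedShard`, `rangedShard` and `distribution` are hypothetical.

```javascript
const crypto = require("crypto");

// Picks a shard from the first 4 bytes of the MD5 digest of the key.
function hashedShard(value, shardCount) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Picks a shard by range: boundaries are ascending upper bounds,
// and the last shard takes everything above the final boundary.
function rangedShard(value, boundaries) {
  const i = boundaries.findIndex((b) => value < b);
  return i === -1 ? boundaries.length : i;
}

// Counts how many of the values land on each shard.
function distribution(values, pick, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const v of values) counts[pick(v)] += 1;
  return counts;
}

// 1000 new, monotonically increasing keys (think timestamps or counters).
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000000 + i);

const ranged = distribution(newKeys, (v) => rangedShard(v, [250000, 500000, 750000]), 4);
const hashed = distribution(newKeys, (v) => hashedShard(v, 4), 4);

console.log("ranged:", ranged); // every new key falls into the last range
console.log("hashed:", hashed); // keys are spread across all shards
```

The price of the even spread is that range queries on the original key can no longer be routed to a single shard.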
  • 240. Example: User { _id: 346, username: "sheldinator", password: "238b8be8bd133b86d1e2ba191a94f549", first_name: "Sheldon", last_name: "Cooper", created_on: "Mon Apr 15 15:30:32 +0000 2013", modified_on: "Thu Apr 18 08:11:23 +0000 2013" } Which shard key would you choose and why?
  • 241. Example: Log data { log_type: "error", // Possible values "error", "warn", "info" application: "JBoss v. 4.2.3", message: "Fatal error. Application will quit.", created_on: "Mon Apr 15 15:38:05 +0000 2013" } Which shard key would you choose and why?
  • 243. Possible types of queries • Exact queries – Data is exactly on one shard • Distributed query – Data is distributed on different shards • Distributed query with sorting – Data is distributed on different shards and needs to be sorted
  • 245. 1. mongos receives the query from the client
  • 246. 2. Query is routed to the shard with the data
  • 247. 3. Shard returns the data
  • 248. 4. mongos returns the data to the client
  • 250. 1. mongos receives the query from the client
  • 251. 2. mongos routes the query to all shards
  • 252. 3. Shards return the data
  • 253. 4. mongos returns the data to the client
  • 255. 1. mongos receives the query from the client
  • 256. 2. mongos routes the query to all shards
  • 257. 3. Execute the query and local sorting
  • 258. 4. Shards return sorted data
  • 259. 5. mongos sorts the data globally
  • 260. 6. mongos returns the sorted data to the client
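Steps 3 to 5 above amount to a k-way merge: every shard sorts its own results, and the router only has to merge the already sorted lists. The following is a minimal sketch of that merge in plain JavaScript; the function `mergeSorted` is hypothetical and not mongos code.

```javascript
// Merges several individually sorted result lists into one sorted list.
// compare(a, b) must return a negative number if a sorts before b.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0); // read position per shard
  const merged = [];
  while (true) {
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      if (positions[s] >= shardResults[s].length) continue; // shard exhausted
      if (
        best === -1 ||
        compare(shardResults[s][positions[s]], shardResults[best][positions[best]]) < 0
      ) {
        best = s;
      }
    }
    if (best === -1) return merged; // all shards exhausted
    merged.push(shardResults[best][positions[best]++]);
  }
}

// Three shards return locally sorted results; one of them has no matches.
console.log(mergeSorted([[1, 4, 9], [2, 3, 10], []], (a, b) => a - b));
```

Since each shard list is already sorted, the router only ever compares the current head of each list.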
  • 261. Lab time! Lab No. 07 Time box: 20 min