MongoDB - Aggregation Pipeline
Jason Terpko
DBA @ Rackspace/ObjectRocket
linkedin.com/in/jterpko
Background
www.objectrocket.com
Overview
o Aggregation Framework
o Pipeline Stages
o Operators
o Performance
o New Features
Aggregation Pipeline
o Overview
o Stages
o Operators
o Multiple Stage Example
What is the Aggregation Pipeline?
A framework for data visualization and/or manipulation using one or more stages executed in
order (i.e. a pipeline).
• Framework – Transforms data through a sequence of stages; the result can be
an array, a cursor, or even a collection
• Visualization – Data transformation is not always required; the framework can
also be used for basic counts, summations, and grouping
• Manipulation – Documents can be transformed as they pass through each stage,
which prepares the data for the next stage or the final result set
• Output – The result can be iterated over using a cursor or saved to a collection
within the same database
• Expandable – New stages and operators are added with each major version, and in
3.4 views leverage the aggregation framework
All Stages
$collStats
$project
$match
$redact
$limit
$skip
$unwind
$group
$sample
$sort
$geoNear
$lookup
$out
$indexStats
$facet
$bucket
$bucketAuto
$sortByCount
$addFields
$replaceRoot
$count
$graphLookup
Common Stages
$match - Filters (reduces) the number of documents passed to the next stage
$group - Groups documents by a distinct key; the key can also be a compound key
$project - Passes documents with specific fields or newly computed fields to the next stage
$sort - Returns the input documents in sorted order
$limit - Limits the number of documents passed to the next stage
$unwind - Splits an array into one document for each element in the array
$out - As the last stage, creates or replaces an unsharded collection with the input documents
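To make two of these stages concrete, here is a plain JavaScript sketch (no MongoDB required) of roughly what $unwind and a counting $group do to documents in memory. The `unwind` and `groupCount` helpers and the sample documents are hypothetical stand-ins for illustration, not MongoDB APIs.

```javascript
// Hypothetical sample documents; not taken from a real collection.
const docs = [
  { _id: 1, tags: ["a", "b"] },
  { _id: 2, tags: ["b"] },
];

// Emulates {$unwind: "$<field>"}: one output document per array element.
// Assumes the field is an array or missing (missing yields no output).
function unwind(input, field) {
  return input.flatMap((doc) =>
    (doc[field] || []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Emulates {$group: {_id: "$<field>", count: {$sum: 1}}}.
function groupCount(input, field) {
  const counts = new Map();
  for (const doc of input) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, count }));
}

const unwound = unwind(docs, "tags");
const grouped = groupCount(unwound, "tags");
console.log(unwound);
console.log(grouped);
```

Chaining the two helpers mirrors how a pipeline feeds the output of one stage into the next.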
Common Operators
Operators that return a value based on document data.
Group Operators: $sum, $avg, $max, $min, $first, $last
Date Operators: $year, $month, $week, $hour, $minute, $second
Arithmetic Operators: $abs, $add, $multiply, $subtract, $trunc
Operators that return true or false based on document data.
Comparison Operators: $eq, $gt, $lt, $gte, $lte
Boolean Operators: $and, $or
Aggregate()
Purpose: Return the average number of milliseconds to move a chunk for the last one
hundred moves.
Collection: changelog
db.changelog.aggregate([
  {$match: {"details.note": "success", "details.step 6 of 6": {$gte: 0}}},
  {$sort: {time: -1}},
  {$limit: 100},
  {$project: {'totalTime': {'$add': ["$details.step 1 of 6", "$details.step 2 of 6",
                                     "$details.step 3 of 6", "$details.step 4 of 6",
                                     "$details.step 5 of 6", "$details.step 6 of 6"]}}},
  {$group: {_id: null, averageTotalTime: {$avg: "$totalTime"}}}
]);
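The same five steps can be sketched in plain JavaScript over an in-memory array, which may help when reasoning about what each stage hands to the next. This is only an emulation under stated assumptions: the `entry` helper, the `averageMoveTime` function, and the sample timings are hypothetical, and real changelog documents carry more fields.

```javascript
const steps = ["step 1 of 6", "step 2 of 6", "step 3 of 6",
               "step 4 of 6", "step 5 of 6", "step 6 of 6"];

// Builds a hypothetical moveChunk entry whose six steps each took `ms` milliseconds.
function entry(time, note, ms) {
  const details = { note };
  for (const s of steps) details[s] = ms;
  return { time, details };
}

const changelog = [
  entry(1, "success", 10), // total 60
  entry(2, "aborted", 50), // removed by the match step
  entry(3, "success", 20), // total 120
  entry(4, "success", 30), // total 180
];

function averageMoveTime(docs, limit) {
  const totals = docs
    // $match: successful moves with a non-negative final step
    .filter((d) => d.details.note === "success" && d.details["step 6 of 6"] >= 0)
    // $sort: {time: -1}
    .sort((a, b) => b.time - a.time)
    // $limit
    .slice(0, limit)
    // $project: totalTime is the $add of the six steps
    .map((d) => steps.reduce((sum, s) => sum + d.details[s], 0));
  // $group with _id: null and $avg over totalTime
  const avg = totals.length
    ? totals.reduce((a, b) => a + b, 0) / totals.length
    : null;
  return { _id: null, averageTotalTime: avg };
}

const result = averageMoveTime(changelog, 2);
console.log(result);
```

With a limit of 2, only the two most recent successful moves contribute to the average.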
$match
Stage 1:
  {$match: {"details.note": "success", "details.step 6 of 6": {$gte: 0}}}
Purpose: In the first stage, keep only the chunks that moved successfully. The filter uses the $gte comparison operator.
$sort
Stage 2:
  {$sort: {time: -1}}
Purpose: Sort descending by time so that the most recently moved chunks come first.
$limit
Stage 3:
  {$limit: 100}
Purpose: Further reduce the number of moves being analyzed, because the time to move a
chunk varies by chunk and collection.
$project
Stage 4:
  {$project: {'totalTime': {'$add': ["$details.step 1 of 6", "$details.step 2 of 6",
                                     "$details.step 3 of 6", "$details.step 4 of 6",
                                     "$details.step 5 of 6", "$details.step 6 of 6"]}}}
Purpose: For each moveChunk document, project the sum of the steps to the next stage. The sum uses the $add arithmetic operator.
$group
Stage 5:
  {$group: {_id: null, averageTotalTime: {$avg: "$totalTime"}}}
Purpose: Return the average number of milliseconds to move a chunk for the last one
hundred moves. The average uses the $avg group operator.
Optimizations
o Projections
o Sequencing
o Indexing
o Sorting
Projections
When a $project stage is used, MongoDB reads and passes less data to the next stage. This
requires less CPU and RAM and reduces the disk I/O needed to process the aggregation.
db.jobs.aggregate([
  {$match: {"type": "import"}},
  {$sort: {"cluster": 1}},
  {$project: {cluster: 1, type: 1, seconds: 1, _id: 0}},   // Stage 3
  {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
]);
By default, MongoDB tries to determine whether only a subset of fields is required; if so, it requests only those
fields and optimizes the stage for you.
Sequencing
When stages can be ordered more efficiently, MongoDB will reorder those stages for you to improve
execution time.
db.jobs.aggregate([
  {$sort: {"cluster": 1}},
  {$match: {"type": "import"}},
  {$project: {cluster: 1, type: 1, seconds: 1, _id: 0}},
  {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
]);
By filtering documents first, the number of documents to be sorted is reduced.
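A small in-memory sketch shows why the reordering is safe and cheaper: both orders produce the same documents, but filtering first leaves fewer documents to sort. The `jobs` array and the helper names are hypothetical sample data for illustration, not part of the original deck.

```javascript
// Hypothetical job documents.
const jobs = [
  { cluster: "c2", type: "import", seconds: 40 },
  { cluster: "c1", type: "export", seconds: 15 },
  { cluster: "c1", type: "import", seconds: 20 },
  { cluster: "c3", type: "export", seconds: 5 },
];

// Comparator for {$sort: {cluster: 1}}.
const byCluster = (a, b) =>
  a.cluster < b.cluster ? -1 : a.cluster > b.cluster ? 1 : 0;
// Predicate for {$match: {type: "import"}}.
const isImport = (d) => d.type === "import";

// Pipeline as written: sort every document, then filter.
const sortThenMatch = [...jobs].sort(byCluster).filter(isImport);

// Reordered: filter first, so only the matching documents are sorted.
const matched = jobs.filter(isImport);
const matchThenSort = [...matched].sort(byCluster);

console.log("documents sorted as written:", jobs.length);
console.log("documents sorted after reordering:", matched.length);
console.log(JSON.stringify(sortThenMatch) === JSON.stringify(matchThenSort));
```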
Sequencing
db.jobs.aggregate([
  {$match: {"type": "import"}},
  {$sort: {"cluster": 1}},
  {$project: {cluster: 1, type: 1, seconds: 1}},
  {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
]);
In addition to sequence optimizations, MongoDB can also coalesce stages; for example, a $match stage
followed by another $match becomes one stage. A full list of sequence and coalescence optimizations
can be found in the MongoDB manual under Aggregation Pipeline Optimization.
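As an illustration of coalescing, the sketch below merges adjacent $match stages into a single $match that combines the conditions with $and. The `coalesceMatches` helper is hypothetical and only imitates the idea on a pipeline array; it is not how the server implements the optimization, and the `seconds` condition is made up for the example.

```javascript
// Merges runs of adjacent $match stages into one $match using $and.
// Hypothetical helper for illustration only.
function coalesceMatches(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (stage.$match && prev && prev.$match) {
      // Combine the previous filter with this one.
      out[out.length - 1] = { $match: { $and: [prev.$match, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

const pipeline = [
  { $match: { type: "import" } },
  { $match: { seconds: { $gte: 10 } } }, // assumed extra condition
  { $sort: { cluster: 1 } },
];

const optimized = coalesceMatches(pipeline);
console.log(JSON.stringify(optimized, null, 2));
```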
Indexing and Data Merging
The $match and $sort stages can use indexes, but only when they are the first stages in the
pipeline. ($geoNear, which must itself be the first stage, uses a geospatial index.) Starting in
version 3.2 an index can cover an aggregation. As with find(), you can generate an explain plan for an
aggregation to view a more detailed execution plan.
Also released in version 3.2 for aggregations:
• Data that does not require the primary shard no longer has to be merged on the primary shard.
• Aggregations that include the shard key in the $match stage and don’t require data from other
shards can execute entirely on the target shard.
Memory
Each stage is limited to 100 MB of RAM; this is the restriction most commonly encountered
when using the aggregation framework.
To exceed this limit, use the allowDiskUse option, which allows stages like $sort to write to temporary files.
db.jobs.aggregate([
  {$match: {"type": "import"}},
  {$sort: {"cluster": 1}},
  {$project: {cluster: 1, type: 1, seconds: 1}},
  {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
], {allowDiskUse: true});
This option should be used with caution in production due to the added resource consumption.
New In 3.4
o Recursive Search
o Faceted Search
o Views
Recursive Search
Recursively search a collection using $graphLookup. This stage takes its input from
either the collection or a previous stage (e.g. $match).
{
  $graphLookup: {
    from: "users",
    startWith: "$connections",
    connectFromField: "connections",
    connectToField: "name",
    as: "connections"
  }
}
Considerations
• This stage is limited to 100 MB of RAM, even with the allowDiskUse option
• A maxDepth of zero is equivalent to $lookup
• Collation must be consistent when multiple views are involved
Recursive Search
Users Collection:
{ "_id" : 101, "name" : "John", "connections" : ["Jane", "David"] }
{ "_id" : 102, "name" : "David", "connections" : ["George"] }
{ "_id" : 103, "name" : "George", "connections" : ["Melissa"] }
{ "_id" : 104, "name" : "Jane", "connections" : ["Jen"] }
{ "_id" : 105, "name" : "Melissa", "connections" : ["Jason"] }
{ "_id" : 106, "name" : "Nick", "connections" : ["Derek"] }
Recursive Search
Aggregation:
db.users.aggregate( [
  { $match: { "name": "John" } },
  { $graphLookup: {
      from: "users",
      startWith: "$connections",
      connectFromField: "connections",
      connectToField: "name",
      as: "connections"
  } },
  { $project: {
      "_id": 0,
      "name": 1,
      "known connections": "$connections.name"
  } }
] ).pretty();
Result:
{
  "name": "John",
  "known connections": [
    "Melissa",
    "George",
    "Jane",
    "David"
  ]
}
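The traversal behind this result can be approximated in plain JavaScript with a breadth-first walk over the same six user documents. The `graphLookupNames` helper is a hypothetical emulation written for this example; it follows `connections` values to documents whose `name` matches, which is the role connectFromField and connectToField play in the stage.

```javascript
const users = [
  { _id: 101, name: "John", connections: ["Jane", "David"] },
  { _id: 102, name: "David", connections: ["George"] },
  { _id: 103, name: "George", connections: ["Melissa"] },
  { _id: 104, name: "Jane", connections: ["Jen"] },
  { _id: 105, name: "Melissa", connections: ["Jason"] },
  { _id: 106, name: "Nick", connections: ["Derek"] },
];

// Breadth-first emulation of $graphLookup with connectFromField "connections"
// and connectToField "name". Returns the names of every reachable document.
function graphLookupNames(collection, startWith) {
  const byName = new Map(collection.map((u) => [u.name, u]));
  const found = new Set();
  let frontier = [...startWith];
  while (frontier.length > 0) {
    const next = [];
    for (const name of frontier) {
      const doc = byName.get(name);
      // Skip names with no matching document and documents already visited.
      if (!doc || found.has(name)) continue;
      found.add(name);
      next.push(...doc.connections);
    }
    frontier = next;
  }
  return [...found];
}

const john = users.find((u) => u.name === "John");
const known = graphLookupNames(users, john.connections);
console.log(known);
```

Names such as "Jen" and "Jason" never appear in the output because no document in the collection has that name, which matches the result shown on the slide.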
Faceted Search
$facet allows you to process multiple pipelines within a single aggregation stage. Each
sub-pipeline receives the same input documents, and the stage outputs a single document
containing one field per sub-pipeline.
Sample Data:
{ "_id" : 101, "name" : "Perf T", "price" : NumberDecimal("19.99"), "colors" : [ "red", "white" ],
  "sizes" : ["M", "L", "XL"] }
{ "_id" : 102, "name" : "Perf V-Neck", "price" : NumberDecimal("24.99"), "colors" : [ "white", "blue" ],
  "sizes" : ["M", "L", "XL"] }
{ "_id" : 103, "name" : "Perf Tank", "price" : NumberDecimal("14.99"), "colors" : [ "red", "blue" ],
  "sizes" : ["M", "L"] }
{ "_id" : 104, "name" : "Perf Hoodie", "price" : NumberDecimal("34.99"), "colors" : [ "blue" ],
  "sizes" : ["M", "L", "XL"] }
Faceted Search
Command:
db.store.aggregate( [
  {
    $facet: {
      "categorizedByColor": [
        { $unwind: "$colors" }, { $sortByCount: "$colors" }
      ],
      "categorizedBySize": [
        { $unwind: "$sizes" }, { $sortByCount: "$sizes" }
      ],
      "categorizedByPrice": [
        { $bucketAuto: { groupBy: "$price", buckets: 2 } }
      ]
    }
  }
] ).pretty()
Faceted Search
{
"categorizedByColor": [ {
"_id": "blue",
"count": 3
},
{
"_id": "white",
"count": 2
},
{
"_id": "red",
"count": 2
}
],
……….
"categorizedBySize": [{
"_id": "L",
"count": 4
},
{
"_id": "M",
"count": 4
},
{
"_id": "XL",
"count": 3
}
],
……….
"categorizedByPrice": [{
"_id": {
"min": NumberDecimal("14.99"),
"max": NumberDecimal("24.99")
},
"count": 2
}, {
"_id": {
"min": NumberDecimal("24.99"),
"max": NumberDecimal("34.99")
},
"count": 2
} ]
}
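The color and size facets can be reproduced with a short in-memory sketch, which shows that each sub-pipeline works from the same input documents. The `sortByCountUnwound` helper is a hypothetical emulation of $unwind followed by $sortByCount; the price facet is left out because NumberDecimal is specific to the mongo shell, and the order of entries with equal counts is not meant to match the server.

```javascript
const store = [
  { _id: 101, name: "Perf T", colors: ["red", "white"], sizes: ["M", "L", "XL"] },
  { _id: 102, name: "Perf V-Neck", colors: ["white", "blue"], sizes: ["M", "L", "XL"] },
  { _id: 103, name: "Perf Tank", colors: ["red", "blue"], sizes: ["M", "L"] },
  { _id: 104, name: "Perf Hoodie", colors: ["blue"], sizes: ["M", "L", "XL"] },
];

// Emulates [{$unwind: "$<field>"}, {$sortByCount: "$<field>"}].
function sortByCountUnwound(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return [...counts]
    .map(([value, count]) => ({ _id: value, count }))
    .sort((a, b) => b.count - a.count); // highest count first
}

// Emulates $facet: every sub-pipeline receives the same input documents,
// and the results are collected into one output document.
const facets = {
  categorizedByColor: sortByCountUnwound(store, "colors"),
  categorizedBySize: sortByCountUnwound(store, "sizes"),
};

console.log(JSON.stringify(facets, null, 2));
```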
Views
A view is a read-only object that can be queried like the underlying collection. It is created
using an aggregation pipeline and can be used to transform data or limit access to data
from another collection.
• Computed on demand for each read operation
• Use indexes from the underlying collection
• Names are immutable; to rename a view, drop and recreate it
• Can be created on sharded collections
• Are listed as collections in getCollectionNames()
• Allow more granular access control than RBAC alone
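One way to picture "computed on demand" is to model a view as a stored pipeline that is re-run against the source data on every read. The sketch below does this in plain JavaScript; `makeView` and the sample expense documents are hypothetical, and the real server stores the pipeline in system.views rather than in a closure.

```javascript
// Models a view as a stored pipeline (an array of functions) over a source array.
// Hypothetical helper; nothing is materialized until find() is called.
function makeView(source, pipeline) {
  return {
    find: () => pipeline.reduce((docs, stage) => stage(docs), source),
  };
}

// Hypothetical source collection.
const expenses = [
  { _id: 101, first_name: "Mike", dept: "123", amt: "33.00" },
];

// The stored pipeline projects a subset of fields, hiding dept from readers.
const expenseView = makeView(expenses, [
  (docs) => docs.map(({ _id, first_name, amt }) => ({ _id, first_name, amt })),
]);

const before = expenseView.find();
expenses.push({ _id: 102, first_name: "Jane", dept: "456", amt: "36.00" });
const after = expenseView.find();

console.log(before.length, after.length);
console.log(after);
```

Because the pipeline runs at read time, the second read sees the newly inserted document without the view being recreated.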
Example View
Documents:
{
  "_id": 101,
  "first_name": "John",
  "last_name": "Doe",
  "dept": "123",
  "role": "DBA",
  "expense_id": 1234,
  "amt": "35.00",
  "c_date": ISODate("2017-02-25T17:08:46.166Z")
}
Example View
Create View:
db.createView(
  "recentExpenses",
  "expenses",
  [
    // The cutoff date is evaluated once, when the view is created.
    { $match: {"c_date": {$gte: new Date(new Date()-86400*1000)}}},
    // Sort before projecting: c_date is dropped by the $project below.
    { $sort: {"c_date": -1}},
    { $project: { "_id": 1, "first_name": 1, "last_name": 1, "amt": 1 } }
  ]
);
Example View
Collections:
> show collections
system.views
expenses
recentExpenses
Metadata:
> db.system.views.find()
{ "_id" : "mydb.recentExpenses", "viewOn" : "expenses", "pipeline" : [ { "$match ……
Usage:
> db.recentExpenses.find()
{ "_id" : 103, "first_name" : "John", "last_name" : "Doe", "amt" : "35.00" }
{ "_id" : 102, "first_name" : "Jane", "last_name" : "Smith", "amt" : "36.00" }
{ "_id" : 101, "first_name" : "Mike", "last_name" : "Adams", "amt" : "33.00" }
Questions?
We’re Hiring!
Looking to join a dynamic & innovative
team?
Justine is here at Percona Live 2017,
Ask to speak with her!
Reach out directly to our Recruiter at
justine.marmolejo@rackspace.com
Thank you!
Address:
401 Congress Ave Suite 1950
Austin, TX 78701
Support:
1-800-961-4454
Sales:
1-888-440-3242

Streamlining Python Development: A Guide to a Modern Project Setup
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

MongoDB - Aggregation Pipeline

  • 1. MongoDB - Aggregation Pipeline. Jason Terpko, DBA @ Rackspace/ObjectRocket, linkedin.com/in/jterpko
  • 3. Overview
      o Aggregation Framework
      o Pipeline Stages
      o Operators
      o Performance
      o New Features
  • 5. What is the Aggregation Pipeline? A framework for data visualization and/or manipulation using one or more stages in order (i.e. a pipeline).
      o Framework: Allows data to be transformed through stages; the result can be an array, a cursor, or even a collection
      o Visualization: Data transformation is not always required; the framework can also be used for basic counts, summations, and grouping
      o Manipulation: Documents can be transformed as they pass through each stage, which prepares the data for the next stage or the final result set
      o Output: The result can be iterated over using a cursor or saved to a collection within the same database
      o Expandable: New stages and operators are added with each major version, and in 3.4 views leverage the aggregation framework
  • 7. Common Stages
      o $match: Filters (reduces) the number of documents passed to the next stage
      o $group: Groups documents by a distinct key; the key can also be a compound key
      o $project: Passes documents with specific fields or newly computed fields to the next stage
      o $sort: Returns the input documents in sorted order
      o $limit: Limits the number of documents for the next stage
      o $unwind: Splits an array into one document for each element in the array
      o $out: As the last stage, creates/replaces an unsharded collection with the input documents
  • 8. Common Operators
      Operators that return a value based on document data:
      o Group Operators: $sum, $avg, $max, $min, $first, $last
      o Date Operators: $year, $month, $week, $hour, $minute, $second
      o Arithmetic Operators: $abs, $add, $multiply, $subtract, $trunc
      Operators that return true or false based on document data:
      o Comparison Operators: $eq, $gt, $lt, $gte, $lte
      o Boolean Operators: $and, $or
  • 9. Aggregate()
      Collection: db.changelog
      Purpose: Return the average number of milliseconds to move a chunk for the last one hundred moves.
      db.changelog.aggregate([
        {$match: {"details.note": "success", "details.step 6 of 6": {$gte: 0}}},
        {$sort: {time: -1}},
        {$limit: 100},
        {$project: {totalTime: {$add: [
          "$details.step 1 of 6", "$details.step 2 of 6",
          "$details.step 3 of 6", "$details.step 4 of 6",
          "$details.step 5 of 6", "$details.step 6 of 6"
        ]}}},
        {$group: {_id: null, averageTotalTime: {$avg: "$totalTime"}}}
      ]);
  • 10. $match (Stage 1, Comparison Operator)
      Purpose: In the first stage, keep only the chunks that moved successfully.
      {$match: {"details.note": "success", "details.step 6 of 6": {$gte: 0}}}
  • 11. $sort (Stage 2)
      Purpose: Sort descending by time so the most recently moved chunks come first.
      {$sort: {time: -1}}
  • 12. $limit (Stage 3)
      Purpose: Further reduce the number of moves being analyzed, because the time to move a chunk varies by chunk and collection.
      {$limit: 100}
  • 13. $project (Stage 4, Arithmetic Operator)
      Purpose: For each moveChunk document, project the sum of the steps to the next stage.
      {$project: {totalTime: {$add: [
        "$details.step 1 of 6", "$details.step 2 of 6",
        "$details.step 3 of 6", "$details.step 4 of 6",
        "$details.step 5 of 6", "$details.step 6 of 6"
      ]}}}
  • 14. $group (Stage 5, Group Operator)
      Purpose: Return the average number of milliseconds to move a chunk for the last one hundred moves.
      {$group: {_id: null, averageTotalTime: {$avg: "$totalTime"}}}
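The five stages walked through on slides 9 to 14 can be mimicked in plain JavaScript to see what each one contributes. This is only a sketch: the sample documents, the `move` helper, and the `averageMoveTime` function are invented for illustration, and array methods stand in for the real pipeline stages.

```javascript
// Invented sample data: each entry mimics a moveChunk document in the changelog.
function move(time, note, steps) {
  const details = { note };
  steps.forEach((ms, i) => { details[`step ${i + 1} of 6`] = ms; });
  return { time, details };
}

const changelog = [
  move(1, "success", [10, 20, 30, 40, 50, 60]), // total 210
  move(2, "aborted", [10, 20, 30]),             // dropped by the match step
  move(3, "success", [5, 5, 5, 5, 5, 5]),       // total 30
  move(4, "success", [1, 2, 3, 4, 5, 6]),       // total 21
];

function averageMoveTime(docs, limit) {
  const totals = docs
    // Stage 1 ($match): successful moves that recorded the final step.
    .filter(d => d.details.note === "success" && d.details["step 6 of 6"] >= 0)
    // Stage 2 ($sort): newest first. filter() returned a copy, so docs is untouched.
    .sort((a, b) => b.time - a.time)
    // Stage 3 ($limit): keep only the most recent moves.
    .slice(0, limit)
    // Stage 4 ($project): sum the six step durations into one number.
    .map(d => {
      let total = 0;
      for (let i = 1; i <= 6; i++) total += d.details[`step ${i} of 6`];
      return total;
    });
  // Stage 5 ($group with _id: null and $avg): one average over everything left.
  if (totals.length === 0) return null;
  return totals.reduce((a, b) => a + b, 0) / totals.length;
}

const averageTotalTime = averageMoveTime(changelog, 2);
console.log(averageTotalTime);
```

With a limit of 2, only the two newest successful moves (totals 21 and 30) are averaged, which shows how the sort and limit stages narrow the window before the arithmetic happens.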
  • 16. Projections
      With a $project stage, Mongo reads and passes less data to the next stage, so processing the aggregation requires less CPU, RAM, and disk I/O.
      db.jobs.aggregate([
        {$match: {"type": "import"}},
        {$sort: {"cluster": 1}},
        {$project: {cluster: 1, type: 1, seconds: 1, _id: 0}},   // Stage 3
        {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
      ]);
      By default Mongo tries to determine whether only a subset of fields is required; if so, it requests only those fields and optimizes the stage for you.
  • 17. Sequencing
      When stages can be ordered more efficiently, Mongo will reorder those stages for you to improve execution time.
      db.jobs.aggregate([
        {$sort: {"cluster": 1}},
        {$match: {"type": "import"}},
        {$project: {cluster: 1, type: 1, seconds: 1, _id: 0}},
        {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
      ]);
      By filtering documents first, the number of documents to be sorted is reduced.
  • 18. Sequencing
      When stages can be ordered more efficiently, Mongo will reorder those stages for you to improve execution time.
      db.jobs.aggregate([
        {$match: {"type": "import"}},
        {$sort: {"cluster": 1}},
        {$project: {cluster: 1, type: 1, seconds: 1}},
        {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
      ]);
      In addition to sequence optimizations, Mongo can also coalesce stages; for example, a $match stage followed by another $match becomes one stage. A full list of sequence and coalesce optimizations can be viewed at Aggregation Pipeline Optimization.
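Why the reordering on slides 17 and 18 pays off can be seen in a small plain-JavaScript comparison. This is a sketch under assumptions: the `jobs` documents and both helper functions are invented, and array methods stand in for the $sort and $match stages.

```javascript
// Invented jobs documents for illustration.
const jobs = [
  { cluster: "c2", type: "import", seconds: 40 },
  { cluster: "c1", type: "export", seconds: 15 },
  { cluster: "c1", type: "import", seconds: 20 },
  { cluster: "c3", type: "export", seconds: 5 },
  { cluster: "c1", type: "import", seconds: 10 },
];

const byCluster = (a, b) => (a.cluster < b.cluster ? -1 : a.cluster > b.cluster ? 1 : 0);
const isImport = d => d.type === "import";

// Order as written on slide 17: every document is sorted, then most are discarded.
function sortThenMatch(docs) {
  const sorted = [...docs].sort(byCluster);
  return { sortedCount: sorted.length, result: sorted.filter(isImport) };
}

// Order after the optimization: only matching documents reach the sort.
function matchThenSort(docs) {
  const matched = docs.filter(isImport);
  return { sortedCount: matched.length, result: matched.sort(byCluster) };
}

const slow = sortThenMatch(jobs);
const fast = matchThenSort(jobs);
console.log(slow.sortedCount, fast.sortedCount);
```

Both orders return the same documents here (JavaScript's sort is stable, which keeps ties in the same order), but the second sorts three documents instead of five.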
  • 19. Indexing and Data Merging
      Only two stages have the ability to utilize indexes: the $match stage and the $sort stage. To use an index, these stages must be the first stages in the pipeline. Starting in version 3.2, an index can cover an aggregation. As with find(), you can generate an explain plan for an aggregation to view a more detailed execution plan.
      Also released in version 3.2 for aggregations:
      o Data that does not require the primary shard no longer has to be merged on the primary shard.
      o Aggregations that include the shard key in the $match stage and don't require data from other shards can execute entirely on the target shard.
  • 20. Memory
      Each stage is limited to 100MB of RAM; this is the restriction most commonly encountered when using the aggregation framework. To exceed this limit, use the allowDiskUse option, which allows stages like $sort to use temporary files.
      db.jobs.aggregate([
        {$match: {"type": "import"}},
        {$sort: {"cluster": 1}},
        {$project: {cluster: 1, type: 1, seconds: 1}},
        {$group: {_id: {cluster: "$cluster", type: "$type"}, avgExecTime: {$avg: "$seconds"}}}
      ], {allowDiskUse: true});
      This option should be used with caution in production due to the added resource consumption.
  • 21. New In 3.4
      o Recursive Search
      o Faceted Search
      o Views
  • 22. Recursive Search
      Recursively search a collection using $graphLookup. This stage in the pipeline takes input from either the collection or a previous stage (e.g. $match).
      {
        $graphLookup: {
          from: "users",
          startWith: "$connections",
          connectFromField: "connections",
          connectToField: "name",
          as: "connections",
        }
      }
      Considerations:
      o This stage is limited to 100MB of RAM, even with the allowDiskUse option
      o A maxDepth of zero is equivalent to $lookup
      o Collation must be consistent when multiple views are involved
  • 23. Recursive Search
      Users Collection:
      { "_id" : 101, "name" : "John", "connections" : ["Jane", "David"] }
      { "_id" : 102, "name" : "David", "connections" : ["George"] }
      { "_id" : 103, "name" : "George", "connections" : ["Melissa"] }
      { "_id" : 104, "name" : "Jane", "connections" : ["Jen"] }
      { "_id" : 105, "name" : "Melissa", "connections" : ["Jason"] }
      { "_id" : 106, "name" : "Nick", "connections" : ["Derek"] }
  • 24. Recursive Search
      Aggregation:
      db.users.aggregate([
        { $match: { "name": "John" } },
        { $graphLookup: {
            from: "users",
            startWith: "$connections",
            connectFromField: "connections",
            connectToField: "name",
            as: "connections",
        } },
        { $project: { "_id": 0, "name": 1, "known connections": "$connections.name" } }
      ]).pretty();
      Result:
      { "name": "John", "known connections": [ "Melissa", "George", "Jane", "David" ] }
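The recursive search on slides 22 to 24 behaves like a breadth-first traversal that follows `connections` values to documents whose `name` matches. The plain-JavaScript sketch below mimics that idea on the slide's sample data; the `graphLookup` function is a hypothetical stand-in, not the server's implementation, and it assumes names are unique.

```javascript
// Sample documents copied from the Users Collection slide.
const users = [
  { _id: 101, name: "John", connections: ["Jane", "David"] },
  { _id: 102, name: "David", connections: ["George"] },
  { _id: 103, name: "George", connections: ["Melissa"] },
  { _id: 104, name: "Jane", connections: ["Jen"] },
  { _id: 105, name: "Melissa", connections: ["Jason"] },
  { _id: 106, name: "Nick", connections: ["Derek"] },
];

// Hypothetical emulation: connectFromField is "connections", connectToField is "name".
function graphLookup(collection, startWith) {
  const byName = new Map(collection.map(u => [u.name, u])); // assumes unique names
  const found = new Map(); // name -> document, doubles as the visited set
  let frontier = [...startWith];
  while (frontier.length > 0) {
    const next = [];
    for (const name of frontier) {
      const doc = byName.get(name);
      // Skip names with no matching document and documents already visited.
      if (doc === undefined || found.has(name)) continue;
      found.set(name, doc);
      next.push(...doc.connections);
    }
    frontier = next;
  }
  return [...found.values()];
}

const john = users.find(u => u.name === "John");
// Sorted only to make the output order predictable.
const knownConnections = graphLookup(users, john.connections).map(u => u.name).sort();
console.log(knownConnections);
```

Jen and Jason are referenced but have no documents of their own, so the traversal stops there, and Nick is never reached because nobody in John's network lists him.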
  • 25. Faceted Search
      $facet allows you to process multiple pipelines within a single aggregation stage. The sub-pipelines take the same input documents and output one document in the stage output.
      Sample Data:
      { "_id" : 101, "name" : "Perf T", "price" : NumberDecimal("19.99"), "colors" : [ "red", "white" ], "sizes" : ["M", "L", "XL"] }
      { "_id" : 102, "name" : "Perf V-Neck", "price" : NumberDecimal("24.99"), "colors" : [ "white", "blue" ], "sizes" : ["M", "L", "XL"] }
      { "_id" : 103, "name" : "Perf Tank", "price" : NumberDecimal("14.99"), "colors" : [ "red", "blue" ], "sizes" : ["M", "L"] }
      { "_id" : 104, "name" : "Perf Hoodie", "price" : NumberDecimal("34.99"), "colors" : [ "blue" ], "sizes" : ["M", "L", "XL"] }
  • 26. Faceted Search
      Command:
      db.store.aggregate([
        { $facet: {
            "categorizedByColor": [ { $unwind: "$colors" }, { $sortByCount: "$colors" } ],
            "categorizedBySize": [ { $unwind: "$sizes" }, { $sortByCount: "$sizes" } ],
            "categorizedByPrice": [ { $bucketAuto: { groupBy: "$price", buckets: 2 } } ]
        } }
      ]).pretty()
  • 27. Faceted Search
      {
        "categorizedByColor": [
          { "_id": "blue", "count": 3 },
          { "_id": "white", "count": 2 },
          { "_id": "red", "count": 2 }
        ],
        ……….
        "categorizedBySize": [
          { "_id": "L", "count": 4 },
          { "_id": "M", "count": 4 },
          { "_id": "XL", "count": 3 }
        ],
        ……….
        "categorizedByPrice": [
          { "_id": { "min": NumberDecimal("14.99"), "max": NumberDecimal("24.99") }, "count": 2 },
          { "_id": { "min": NumberDecimal("24.99"), "max": NumberDecimal("34.99") }, "count": 2 }
        ]
      }
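The color and size facets on slides 26 and 27 boil down to unwinding an array field and counting each value. The plain-JavaScript sketch below reproduces those two facets on the slide's sample data; the `sortByCount` helper is an invented stand-in for `$unwind` followed by `$sortByCount`, and the price facet (`$bucketAuto`) is left out.

```javascript
// Sample documents from the Faceted Search slide, reduced to the array fields used here.
const store = [
  { _id: 101, colors: ["red", "white"], sizes: ["M", "L", "XL"] },
  { _id: 102, colors: ["white", "blue"], sizes: ["M", "L", "XL"] },
  { _id: 103, colors: ["red", "blue"], sizes: ["M", "L"] },
  { _id: 104, colors: ["blue"], sizes: ["M", "L", "XL"] },
];

// Hypothetical stand-in for $unwind + $sortByCount on one array field.
function sortByCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) { // the inner loop plays the role of $unwind
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count); // highest count first
}

// Like $facet, every sub-pipeline sees the same input documents.
const facets = {
  categorizedByColor: sortByCount(store, "colors"),
  categorizedBySize: sortByCount(store, "sizes"),
};
console.log(JSON.stringify(facets));
```

The counts match the slide's result; the relative order of values with equal counts (for example white and red) is not meaningful and may differ from the server's output.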
  • 28. Views
      A view is a read-only object that can be queried like the underlying collection. It is created using an aggregation pipeline and can be used to transform data or to limit access to data in another collection. Views:
      o Are computed on demand for each read operation
      o Use indexes from the underlying collection
      o Have immutable names; to rename a view, drop and recreate it
      o Can be created on sharded collections
      o Are listed as collections in getCollectionNames()
      o Allow more granular access control than collection-level RBAC
  • 29. Example View
      Documents:
      {
        "_id": 101,
        "first_name": "John",
        "last_name": "Doe",
        "dept": "123",
        "role": "DBA",
        "expense_id": 1234,
        "amt": "35.00",
        "c_date": ISODate("2017-02-25T17:08:46.166Z")
      }
  • 30. Example View
      Create View:
      db.createView(
        "recentExpenses",
        "expenses",
        [
          { $match: {"c_date": {$gte: new Date(new Date()-86400*1000)}}},
          { $sort: {"c_date": -1}},   // sort before $project, which drops c_date
          { $project: { "_id": 1, "first_name": 1, "last_name": 1, "amt": 1 } }
        ]
      );
  • 31. Example View
      Collections:
      > show collections
      system.views
      expenses
      recentExpenses
      Metadata:
      > db.system.views.find()
      { "_id" : "mydb.recentExpenses", "viewOn" : "expenses", "pipeline" : [ { "$match ……
      Usage:
      > db.recentExpenses.find()
      { "_id" : 103, "first_name" : "John", "last_name" : "Doe", "amt" : "35.00" }
      { "_id" : 102, "first_name" : "Jane", "last_name" : "Smith", "amt" : "36.00" }
      { "_id" : 101, "first_name" : "Mike", "last_name" : "Adams", "amt" : "33.00" }
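The "computed on demand" and "limit data access" properties of views can be pictured with a toy model in plain JavaScript. This is a sketch under assumptions: `createView` here is an invented local function (not the shell's `db.createView`), the documents are made up, and a single mapping function stands in for the $project stage.

```javascript
// Toy model of a view: it stores a source and a pipeline, never documents.
function createView(source, pipeline) {
  return {
    // Every read re-runs the pipeline against the current contents of the source.
    find: () => pipeline.reduce((docs, stage) => stage(docs), [...source]),
  };
}

// Invented expense documents; "role" stands in for a field the view should hide.
const expenses = [
  { _id: 101, first_name: "Mike", last_name: "Adams", role: "DBA", amt: "33.00" },
];

// Stand-in for the $project stage: expose only four fields.
const projectPublicFields = docs =>
  docs.map(({ _id, first_name, last_name, amt }) => ({ _id, first_name, last_name, amt }));

const recentExpenses = createView(expenses, [projectPublicFields]);

const before = recentExpenses.find();
expenses.push({ _id: 102, first_name: "Jane", last_name: "Smith", role: "Dev", amt: "36.00" });
const after = recentExpenses.find();
console.log(before.length, after.length);
```

The second read sees the newly inserted document without the view being recreated, and neither read exposes the `role` field.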
  • 33. We're Hiring! Looking to join a dynamic & innovative team? Justine is here at Percona Live 2017; ask to speak with her! Or reach out directly to our recruiter at justine.marmolejo@rackspace.com
  • 34. Thank you!
      Address: 401 Congress Ave, Suite 1950, Austin, TX 78701
      Support: 1-800-961-4454
      Sales: 1-888-440-3242
      www.objectrocket.com