2. What problem are we solving?
- Map/Reduce can be used for aggregation
  - Currently being used for totaling, averaging, etc.
- Map/Reduce is a big hammer
  - Simpler tasks should be easier
  - You shouldn't need to write JavaScript
  - Avoid the overhead of the JavaScript engine
- We're seeing requests for help in handling complex documents
  - Selecting only certain subdocuments or arrays
3. How will we solve the problem?
- Our new aggregation framework
- Declarative framework
  - No JavaScript required
  - Describe a chain of operations to apply
- Expression evaluation
  - Return computed values
- Framework: we can add new operations easily
- C++ implementation
  - Higher performance than JavaScript
4. Aggregation - Pipelines
- Aggregation requests specify a pipeline
- A pipeline is a series of operations
- Conceptually, the members of a collection are passed through a pipeline to produce a result
  - Similar to a command-line pipe
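To make the command-line pipe analogy concrete, here is a toy Python model in which each stage is a function from a stream of documents to a stream of documents. This is only an illustration of the concept, not how the server implements pipelines; `run_pipeline`, `keep_large` and `double_qty` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator

# A stage maps a stream of documents to a new stream of documents.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def run_pipeline(docs: Iterable[dict], stages: list) -> list:
    """Pass the documents through each stage in order, like `a | b | c` in a shell."""
    stream: Iterable[dict] = docs
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def keep_large(stream: Iterable[dict]) -> Iterator[dict]:
    # Filtering stage (hypothetical): drops documents whose "qty" is 10 or less.
    return (d for d in stream if d["qty"] > 10)


def double_qty(stream: Iterable[dict]) -> Iterator[dict]:
    # Reshaping stage (hypothetical): emits a new document with a computed field.
    return ({"item": d["item"], "qty2": d["qty"] * 2} for d in stream)


collection = [{"item": "a", "qty": 5}, {"item": "b", "qty": 20}]
print(run_pipeline(collection, [keep_large, double_qty]))  # [{'item': 'b', 'qty2': 40}]
```

Because the stages are generators, documents flow through one at a time, much as lines flow through a shell pipe.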
5. Pipeline Operations
- $match
  - Uses a query predicate (like .find({…})) as a filter
- $project
  - Uses a sample document to determine the shape of the result (similar to .find()'s optional second, field-selection argument)
  - This can include computed values
- $group
  - Aggregates items into buckets defined by a key
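The three operations can be sketched as plain Python functions over lists of documents. This is a deliberately reduced model, assuming equality-only predicates for $match, inclusion-only (`1`) shapes for $project, and a simple count per bucket for $group; the function names are hypothetical and the real operators support far more.

```python
from collections import defaultdict
from typing import Any, Iterable


def match(docs: Iterable[dict], predicate: dict) -> list:
    """Toy $match: keep documents whose fields equal every value in `predicate`.
    Query operators such as $gt or $in are not modelled."""
    return [d for d in docs if all(d.get(k) == v for k, v in predicate.items())]


def project(docs: Iterable[dict], shape: dict) -> list:
    """Toy $project: keep only the fields marked 1 in `shape`.
    Unlike the real operator, _id is not carried along automatically."""
    return [{k: d[k] for k, flag in shape.items() if flag == 1 and k in d} for d in docs]


def group_count(docs: Iterable[dict], key: str) -> list:
    """Toy $group: bucket documents by the value of `key` and count each bucket."""
    buckets: dict = defaultdict(int)
    for d in docs:
        buckets[d.get(key)] += 1
    return [{"_id": k, "count": n} for k, n in buckets.items()]


orders = [
    {"cust": "ann", "status": "A", "amount": 50},
    {"cust": "bob", "status": "B", "amount": 20},
    {"cust": "ann", "status": "A", "amount": 30},
]
# Roughly: [{$match: {status: "A"}}, {$project: {cust: 1, amount: 1}}]
active = match(orders, {"status": "A"})
print(project(active, {"cust": 1, "amount": 1}))
# [{'cust': 'ann', 'amount': 50}, {'cust': 'ann', 'amount': 30}]
print(group_count(active, "cust"))  # [{'_id': 'ann', 'count': 2}]
```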
6. Computed Expressions
- Available in $project operations
- Prefix expression language
  - Add two fields: {$add: ["$field1", "$field2"]}
  - Substitute a value for a missing field: {$ifNull: ["$field1", "$field2"]}
  - Nesting: {$add: ["$field1", {$ifNull: ["$field2", "$field3"]}]}
- Other functions…
  - And we can easily add more as required
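A prefix expression language like this is naturally evaluated recursively: evaluate the arguments, then apply the operator. The following toy evaluator, written in Python purely for illustration, handles field references, `$add` and `$ifNull` on top-level fields only; `evaluate` is a hypothetical name and the null handling is a simplifying assumption.

```python
from typing import Any


def evaluate(expr: Any, doc: dict) -> Any:
    """Evaluate a toy prefix expression against one document.

    - a string starting with "$" is a field reference (top-level fields only)
    - a one-key dict {"$op": [args...]} applies an operator to its evaluated args
    - anything else is a literal
    """
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])  # a missing field evaluates to None
    if isinstance(expr, dict) and len(expr) == 1:
        op, args = next(iter(expr.items()))
        values = [evaluate(a, doc) for a in args]  # recurse into nested expressions
        if op == "$add":
            # Simplifying assumption: any missing operand makes the sum None.
            return None if any(v is None for v in values) else sum(values)
        if op == "$ifNull":
            # First argument unless it is missing/None, otherwise the second.
            return values[0] if values[0] is not None else values[1]
        raise ValueError(f"unsupported operator: {op}")
    return expr


doc = {"field1": 1, "field3": 10}  # field2 is missing
expr = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
print(evaluate(expr, doc))  # 11
```

Adding another function means adding one more branch, which mirrors the slide's point that new operators are easy to add.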
7. Projections
- $project can reshape results
- The $unwind operation doles out array values one at a time, emitting one document per array element
- Pull fields from nested documents up to the top level
- Push fields from the top level down into new virtual subdocuments
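The following Python sketch illustrates both ideas on a single document: `unwind` copies the document once per array element, and `reshape` pulls a nested field up and pushes top-level fields down into a new subdocument. The function names and the field names (`author_name`, `summary`) are hypothetical, and documents without the array field are simply dropped in this sketch.

```python
from typing import Iterable, Iterator


def unwind(docs: Iterable[dict], field: str) -> Iterator[dict]:
    """Toy $unwind: emit one copy of each document per element of the array in `field`."""
    for d in docs:
        for value in d.get(field, []):  # documents lacking the field produce no output here
            yield {**d, field: value}  # copy, replacing the array with a single element


def reshape(doc: dict) -> dict:
    """$project-style reshaping (hypothetical field names): pull author.name up to the
    top level and push title and tag down into a new virtual subdocument."""
    return {
        "author_name": doc["author"]["name"],
        "summary": {"title": doc["title"], "tag": doc["tags"]},
    }


post = {"title": "Intro", "author": {"name": "kim"}, "tags": ["db", "nosql"]}
for out in (reshape(d) for d in unwind([post], "tags")):
    print(out)
# {'author_name': 'kim', 'summary': {'title': 'Intro', 'tag': 'db'}}
# {'author_name': 'kim', 'summary': {'title': 'Intro', 'tag': 'nosql'}}
```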
8. Grouping
- $group aggregation expressions
  - Total of column values: $sum
  - Average of column values: $avg
  - Collect column values in an array: $push
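As a rough picture of what a stage such as `{$group: {_id: "$region", total: {$sum: "$amount"}, average: {$avg: "$amount"}, values: {$push: "$amount"}}}` computes, here is a Python sketch that buckets documents by a key and applies the three accumulators to one field. It is an illustrative model only; `group` and the output field names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def group(docs: Iterable[dict], key: str, value: str) -> list:
    """Toy $group: for each distinct `key`, compute the $sum, $avg and $push of `value`."""
    buckets: dict = defaultdict(list)
    for d in docs:
        buckets[d[key]].append(d[value])
    return [
        {"_id": k, "total": sum(vals), "average": sum(vals) / len(vals), "values": vals}
        for k, vals in buckets.items()
    ]


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 4},
    {"region": "east", "amount": 30},
]
for row in group(sales, "region", "amount"):
    print(row)
# {'_id': 'east', 'total': 40, 'average': 20.0, 'values': [10, 30]}
# {'_id': 'west', 'total': 4, 'average': 4.0, 'values': [4]}
```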
10. Usage Tips
- Use $match in a pipeline as early as possible
  - The query optimizer can then choose an index and avoid scanning the entire collection
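Index selection happens inside the server and is not modelled here, but a simple Python sketch can still show the other half of the argument: filtering first gives the same result while later stages process far fewer documents. The data, the `run` function and the `double` field are hypothetical.

```python
from typing import Iterable


def run(docs: Iterable[dict], match_first: bool) -> tuple:
    """Apply a filter (status == "A") and a projection in either order.
    Returns the results and how many documents the projection had to process."""
    processed = 0

    def project(d: dict) -> dict:
        nonlocal processed
        processed += 1
        return {"status": d["status"], "double": d["amount"] * 2}

    if match_first:
        out = [project(d) for d in docs if d["status"] == "A"]
    else:
        out = [p for p in map(project, docs) if p["status"] == "A"]
    return out, processed


# Hypothetical data: 10 of 1000 documents have status "A".
orders = [{"status": "A" if i % 100 == 0 else "B", "amount": i} for i in range(1000)]
early, early_work = run(orders, match_first=True)
late, late_work = run(orders, match_first=False)
print(early == late, early_work, late_work)  # True 10 1000
```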
11. Driver Support
- Initial version is a command
  - For any language, build the command as a JSON-style document and execute it as a database command
  - { aggregate : <collection>, pipeline : [ … ] }
  - Beware of the command result size limit: the whole result comes back as a single document
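In Python, for example, the command document can be built as an ordinary dict whose first key is the command name; Python 3.7+ dicts preserve insertion order. This is a sketch assuming the document shape shown on the slide; the collection name `orders` and the helper `aggregate_command` are hypothetical, and later server versions additionally expect a `cursor` field in this command.

```python
import json


def aggregate_command(collection: str, pipeline: list) -> dict:
    """Build { aggregate : <collection>, pipeline : [...] } with the command name first."""
    return {"aggregate": collection, "pipeline": pipeline}


cmd = aggregate_command(
    "orders",  # hypothetical collection name
    [
        {"$match": {"status": "A"}},  # filter first
        {"$group": {"_id": "$cust", "total": {"$sum": "$amount"}}},
    ],
)
print(json.dumps(cmd))
```

With a driver such as PyMongo, a document like this could be passed to `Database.command`; the entire result still arrives as one document, hence the size-limit warning.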
12. When is this being released?
- In final development now
- Expect to see this in the near future
13. Sharding support
- Initial release will support sharding
- Mongos analyzes the pipeline and forwards the operations up to $group to the shards
  - It then combines the shard servers' results and continues the pipeline
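One simplified reading of "forwards operations up to $group" is sketched below in Python: split the pipeline at the first `$group`, run the earlier stages on every shard, concatenate the partial results, then run `$group` and the rest on the router. The two-stage interpreter, the shard data and all function names are hypothetical, and a real router may also push partial grouping work down to the shards.

```python
from collections import defaultdict


def split_at_group(pipeline: list) -> tuple:
    """Split a pipeline at its first $group: (stages for each shard, stages for mongos)."""
    for i, stage in enumerate(pipeline):
        if "$group" in stage:
            return pipeline[:i], pipeline[i:]
    return pipeline, []


def apply_stage(docs: list, stage: dict) -> list:
    """Tiny interpreter for two stage shapes only:
    {"$match": {field: value}} and {"$group": {"_id": "$key", <name>: {"$sum": "$field"}}}."""
    if "$match" in stage:
        return [d for d in docs if all(d.get(k) == v for k, v in stage["$match"].items())]
    if "$group" in stage:
        spec = stage["$group"]
        key = spec["_id"][1:]  # strip the leading "$"
        name, acc = next((k, v) for k, v in spec.items() if k != "_id")
        source = acc["$sum"][1:]
        totals: dict = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[source]
        return [{"_id": k, name: t} for k, t in totals.items()]
    raise ValueError(f"unsupported stage: {stage}")


def run(docs: list, stages: list) -> list:
    for stage in stages:
        docs = apply_stage(docs, stage)
    return docs


pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust", "total": {"$sum": "$amount"}}},
]
shards = [  # hypothetical data spread over two shards
    [{"cust": "ann", "status": "A", "amount": 5}, {"cust": "bob", "status": "B", "amount": 7}],
    [{"cust": "ann", "status": "A", "amount": 3}, {"cust": "bob", "status": "A", "amount": 1}],
]
shard_part, router_part = split_at_group(pipeline)
# Each shard runs the stages before $group; the router concatenates the partial results...
merged = [doc for shard in shards for doc in run(shard, shard_part)]
# ...and continues with $group and any later stages.
result = run(merged, router_part)
print(result)  # [{'_id': 'ann', 'total': 8}, {'_id': 'bob', 'total': 1}]
```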
14. Pipeline Operations – Future Plans
- $sort
  - Sorts the document stream according to a key
- $out
  - Saves the document stream to a collection
  - Similar to the Map/Reduce $out, but with sharded output
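A planned $sort over a compound key can be sketched in Python with repeated stable sorts: sort by the least significant key first and the most significant key last. The spec format (field mapped to 1 or -1) is an assumption borrowed from MongoDB's sort-specification convention, `sort_stage` is a hypothetical name, and missing fields are not handled.

```python
from typing import Iterable


def sort_stage(docs: Iterable[dict], spec: dict) -> list:
    """Toy $sort: spec maps field -> 1 (ascending) or -1 (descending); earlier keys win."""
    result = list(docs)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives a multi-key order.
    for field, direction in reversed(list(spec.items())):
        result.sort(key=lambda d: d[field], reverse=(direction == -1))
    return result


people = [
    {"name": "c", "age": 30},
    {"name": "a", "age": 25},
    {"name": "b", "age": 30},
]
print([d["name"] for d in sort_stage(people, {"age": -1, "name": 1})])  # ['b', 'c', 'a']
```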
15. Expressions – Future Plans
- Date field extraction
  - Get the year, month, day, hour, etc. from a Date
- Date arithmetic
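Date extraction and date arithmetic amount to reading components of a timestamp and shifting it by an interval. The Python sketch below uses the standard `datetime` module; the operator names (`$year`, `$month`, `$dayOfMonth`, `$hour`) and the `date_part` helper are chosen for illustration and are assumptions, not names taken from these slides.

```python
from datetime import datetime, timedelta


def date_part(op: str, value: datetime) -> int:
    """Toy date-extraction operators (hypothetical names) over a Python datetime."""
    parts = {
        "$year": value.year,
        "$month": value.month,
        "$dayOfMonth": value.day,
        "$hour": value.hour,
    }
    return parts[op]


ts = datetime(2012, 3, 15, 9, 30)
print([date_part(op, ts) for op in ("$year", "$month", "$dayOfMonth", "$hour")])  # [2012, 3, 15, 9]
# Date arithmetic: shift the timestamp by an interval.
print(ts + timedelta(days=20))  # 2012-04-04 09:30:00
```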