Simeon Simeonov, Founder & CTO of Swoop, shares how Swoop uses MongoDB behind the scenes for its high-performance core data processing and analytics. The presentation covers tips and tricks such as zero-overhead hierarchical relationships with MongoDB, high-performance MongoDB atomic update buffering, content-addressed storage using cryptographic hashing, and more. Presented to the Boston MongoDB User Group.
13. Display Advertising
Makes the Web Suck
User-focused optimization
Tens of millions of users
1000+% better than average
200+% better than Google
Swoop Fixes That
16. MongoDB: the Bad
Not Quite Enterprise-Grade
Not Cheap to Run Well
17. I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
I will write more robust code
18. I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
I will design a better map-reduce
21. Five Steps to Happiness
Sharding
Native Relationships
Atomic Update Buffering
Content-Addressed Storage
Shell Tricks
24. // Google AdWords object model
Account
Campaign
AdGroup // this joins ads & keywords
Ad
Keyword
// For example
AdGroup has an Account
AdGroup has a Campaign
AdGroup has many Ads
AdGroup has many Keywords
Slam dunk
for SQL
25. // Let’s play a bit
Account
Campaign
AdGroup
Ad
Keyword
26. // Let’s play some more
Account
Campaign
AdGroup
Ad
Keyword
27. // There is just one bit left
Account
Campaign
AdGroup
1 Ad
0 Keyword
28. // build a hierarchical ID
accountID campaignID adGroupID ((0 keywordID) | (1 adID))
// a binary ID!
10100100001100000000101001100110101010010100
< accountID >< campaignID >< …
// Encode it in base 16, 32 or 64
{"_id" : "a4300a66a94d20f1", … }
29. // Example
The 5th
ad
Of the 3rd
ad group
Of the 7th
campaign
Of the 255th
account
could have the _id 0x00ff000700031005
The _id for the 10th
keyword of the same ad group
would be 0x00ff00070003000a
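The example above can be reproduced with a minimal sketch. It assumes four 16-bit fields and an ad flag of 0x1000 in the last field, a layout inferred from the two example _ids rather than stated on the slides. `makeId` and `toHex` are hypothetical helper names. BigInt is used because plain JavaScript numbers lose precision above 2^53.

```javascript
// Hypothetical layout inferred from the example: four 16-bit fields.
const AD_FLAG = 0x1000n; // set in the last field for ads, clear for keywords

// Pack the ancestors and the leaf into one 64-bit hierarchical ID.
function makeId(account, campaign, adGroup, leaf, isAd) {
  const last = BigInt(leaf) | (isAd ? AD_FLAG : 0n);
  return (BigInt(account) << 48n) |
         (BigInt(campaign) << 32n) |
         (BigInt(adGroup) << 16n) |
         last;
}

// Fixed-width, lowercase base-16 encoding of the ID.
function toHex(id) {
  return id.toString(16).padStart(16, "0");
}

const adId = makeId(255, 7, 3, 5, true);        // 5th ad
const keywordId = makeId(255, 7, 3, 10, false); // 10th keyword

console.log(toHex(adId));      // 00ff000700031005
console.log(toHex(keywordId)); // 00ff00070003000a
```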
30. // Neat: the ad’s and keyword’s _ids contain the
// IDs of all of their ancestors in the hierarchy.
keywordId = 0x00ff00070003000a
adGroupId = keywordId & 0xffffffffffff0000
campaignId = keywordId & 0xffffffff00000000
accountId = keywordId & 0xffff000000000000
// has-a relationship is a simple lookup
account = db.accounts.findOne({_id: accountId})
31. // Neater: has-many relationships are just
// range queries on the _id index.
adGroupId = keywordId & 0xffffffffffff0000
startOfAds = adGroupId + 0x1000
endOfAds = adGroupId + 0x1fff
adsForKeyword = db.ads.find({
_id: {$gte: startOfAds, $lte: endOfAds}
})
// Technically, that was a join via the ad group.
// Who said Mongo can’t do joins???
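As written, the masks on the last two slides are closer to pseudocode: JavaScript's `&` operator truncates plain numbers to 32 bits, so a 64-bit _id would not survive it. The sketch below redoes the same arithmetic with BigInt. It assumes _ids are stored as fixed-width lowercase hex strings (one of the encodings suggested earlier), which sort in the same order as the numbers they encode. `hex` and `adsQuery` are hypothetical names.

```javascript
const keywordId = 0x00ff00070003000an;

// Each mask keeps the leading fields and zeroes out the rest.
const adGroupPrefix = keywordId & 0xffffffffffff0000n;
const campaignPrefix = keywordId & 0xffffffff00000000n;
const accountPrefix = keywordId & 0xffff000000000000n;

// All ads of the ad group fall into one contiguous _id range.
const startOfAds = adGroupPrefix + 0x1000n;
const endOfAds = adGroupPrefix + 0x1fffn;

// Fixed-width lowercase hex keeps string order equal to numeric order.
const hex = (id) => id.toString(16).padStart(16, "0");

// Assumption: _id is stored as a 16-character hex string.
const adsQuery = { _id: { $gte: hex(startOfAds), $lte: hex(endOfAds) } };

console.log(hex(accountPrefix), hex(campaignPrefix), hex(adGroupPrefix));
console.log(adsQuery);
```

The resulting `adsQuery` object has the same shape as the filter passed to `db.ads.find(...)` on the slide.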
37. Content Addressed Storage
Lazy join abstraction
Very space efficient
Extremely (pre-)cacheable
Join only happens during reporting
38. // Step 1: take a set of dimensions worth tracking
data = {
"domain_id" : "SW-28077508-16444",
"hint" : "Find an organic alternative",
"theme" : "red"
}
// Step 2: compute a digital signature, e.g., MD5
sig = "000069569F4835D16E69DF704187AC2F"
// Step 3: if new sig, increment a counter
counter = 264034
// Step 4: create a document in the content-
// addressed store collection for these
39. > db.cas.findOne()
{
"_id" : "000069569F4835D16E69DF704187AC2F", // MD5 hash
"data" : { // data that was digested to the hash above
"domain_id" : "SW-28077508-16444",
"hint" : "Find an organic alternative",
"theme" : "red"
},
"meta_data" : {
"id" : 264034 // variationID
},
"created_at" : ISODate("2013-02-04T12:05:34.752Z")
}
// Elsewhere, in the reports collection…
"variations" : {
"264034" : {
// metrics here
},
…
// lazy join: the variation ID key matches meta_data.id in the cas document
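Steps 1 to 4 can be sketched outside the database as follows. This is an illustration under assumptions, not Swoop's implementation: it runs on Node.js with the built-in `crypto` module, uses an in-memory `Map` as a stand-in for the `cas` collection, and serializes flat dimension objects with sorted keys so that key order does not change the hash. The slides do not say how the data is serialized before hashing. `canonicalize`, `signature` and `variationId` are hypothetical names.

```javascript
const crypto = require("crypto");

// Assumption: flat objects only; sorted keys make the serialization stable.
function canonicalize(data) {
  return JSON.stringify(Object.keys(data).sort().map((k) => [k, data[k]]));
}

// Step 2: compute the MD5 signature as an uppercase hex string.
function signature(data) {
  return crypto.createHash("md5").update(canonicalize(data)).digest("hex").toUpperCase();
}

// In-memory stand-ins for the cas collection and the variation counter.
const cas = new Map();
let counter = 0;

// Steps 3 and 4: a new signature gets the next counter value and a stored document.
function variationId(data) {
  const sig = signature(data);
  if (!cas.has(sig)) {
    counter += 1;
    cas.set(sig, { _id: sig, data, meta_data: { id: counter }, created_at: new Date() });
  }
  return cas.get(sig).meta_data.id;
}

const a = variationId({ domain_id: "SW-28077508-16444", hint: "Find an organic alternative", theme: "red" });
const b = variationId({ theme: "red", hint: "Find an organic alternative", domain_id: "SW-28077508-16444" });

console.log(a === b, cas.size); // same variation ID both times, one stored document
```

Reports then only need to carry the small integer ID. The full dimension set is looked up in the store when a report is rendered.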
41. // Use underscore.js in the shell
// See http://underscorejs.org/
function underscore() {
load("/mongo_hacks/underscore.js");
}
42. // Loads underscore.js on the MongoDB server
function server_underscore(force) {
force = force || false;
if (force || typeof(underscoreLoaded) === 'undefined') {
db.eval(cat("/mongo_hacks/underscore.js"));
underscoreLoaded = true;
}
}
43. // Callstack printing on exception -- wraps a function
function dbg(f) {
try {
f();
} catch (e) {
print("\n**** Exception: " + e.toString());
print("\n");
print(e.stack);
print("\n");
if (arguments.length > 1) {
printjson(arguments);
print("\n");
}
throw e;
}
}
44. function minutesAgo(minutes, d) {
d = d || new Date();
return new Date(d.valueOf() - minutes * 60 * 1000);
}
function hoursAgo(hours, d) {
d = d || new Date();
return minutesAgo(60 * hours, d);
}
function daysAgo(days, d) {
d = d || new Date();
return hoursAgo(24 * days, d);
}
45. // Don’t write in the shell.
// Use your fav editor, save & type t() in mongo
function t() {
load("/mongo_hacks/bag_of_tricks.js");
}