14. Advertising Requirement
MongoDB UK 2012
Page 14
• BrianAir *
• Retarget people who visited their site
• But did NOT purchase flights
* Stolen from the amazing Rhod Gilbert
15. Understanding a User
• Tracking Pixel
• Product Page – what a user has looked at
• Conversion Pixel
• Confirmation Page – what a user has purchased
16. A Simple Story…
• User 123
• Searches to travel to Copenhagen
• Does not purchase
• “?product_id=CPH”
19. Building an example Ad
{
"from" : "LHR",
"to" : "CPH",
"price" : "£39.99",
…
}
Geolocation
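// lat, long from the GeoIP lookup (New York; western longitudes are negative)
$loc = array(40.739037, -73.992964);
// Single quotes stop PHP from interpolating '$near' as a variable
$airports->find(array('loc' => array('$near' => $loc)));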
23. Our MongoDB Platform
• Multiple data centres
• Replication across our own Tier 1 Carrier
• 8 Shards
• 3 Machines in each replica set
• Mixture of languages
• C++
• Erlang
• Java
• PHP
Until yesterday, I was in New York, discussing how we’re going to be using MongoDB moving forward.
Take an advertiser, a well-known budget airline which, for legal reasons, I can’t name. Let’s call them BrianAir. They want to retarget, or advertise to, the subset of people who have visited their site and searched for a particular flight on it, but did not buy.
The first step is that we need a way of recording the information handed to us by BrianAir. We have an event server, written in Erlang, which records the dynamic information from tracking pixels and stores it in MongoDB. The same applies to the conversion pixels, so that we know, for each visitor, what they looked at and whether they converted for that product. This is all great; it just means that all your advertisers’ traffic = all of your traffic. That is a high number of inserts, again making MongoDB a great choice.
User 123 (a v4 UUID, generated and stored in the cookie) searches the BrianAir site to travel to Copenhagen, but does not purchase. That means that in the tracking pixel call we’ll see something (more complicated than this) containing the product ID. We map this to the ID in the cookie.
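As a rough illustration of the mapping described here, the sketch below turns one pixel request into an event document. The URL, the `event_from_pixel` helper and the document fields are all hypothetical; the real event server is written in Erlang and sees more complicated requests than this.

```python
from urllib.parse import urlparse, parse_qs


def event_from_pixel(request_url, cookie_user_id, kind="view"):
    """Turn one pixel request into an event document (hypothetical shape)."""
    params = parse_qs(urlparse(request_url).query)
    product_ids = params.get("product_id")
    if not product_ids:
        return None  # nothing to record without a product ID
    return {
        "user_id": cookie_user_id,     # ID read from the visitor's cookie
        "product_id": product_ids[0],  # e.g. "CPH" from "?product_id=CPH"
        "type": kind,                  # "view" (tracking) or "conversion"
    }


# Hypothetical pixel URL; only the product_id parameter comes from the slides.
event = event_from_pixel("http://pixel.example/t.gif?product_id=CPH", "123")
```

A document of this shape could then be inserted into an events collection as-is, which is where the high insert rate comes from.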
This allows us to build a profile of each user. Again, you can easily see how the document model from MongoDB works perfectly: it allows completely flexible user profile shapes and sizes.
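To make the profile idea concrete, here is a minimal in-memory sketch that folds view and conversion events into one document per user and then picks the products worth retargeting (viewed but not purchased). The event shape, field names and both helper functions are assumptions for illustration, not the production schema.

```python
def build_profiles(events):
    """Fold a stream of events into one profile document per user."""
    profiles = {}
    for e in events:
        profile = profiles.setdefault(
            e["user_id"], {"_id": e["user_id"], "viewed": [], "purchased": []}
        )
        key = "purchased" if e["type"] == "conversion" else "viewed"
        if e["product_id"] not in profile[key]:
            profile[key].append(e["product_id"])
    return profiles


def retarget_products(profile):
    """Products the user looked at but did not buy."""
    return [p for p in profile["viewed"] if p not in profile["purchased"]]


# Hypothetical events: user 123 only looked, user 456 looked and bought.
events = [
    {"user_id": "123", "product_id": "CPH", "type": "view"},
    {"user_id": "456", "product_id": "CPH", "type": "view"},
    {"user_id": "456", "product_id": "CPH", "type": "conversion"},
]
profiles = build_profiles(events)
```

Because each profile is just a document, one user can carry a single viewed product and another hundreds, without any schema change.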
When User 123 is browsing the web a day later, she gets an ad from the AOL network. We need to build that ad for User 123.
Not all ads will be like this (some are simpler, some more complicated), but once the creative has been chosen there are 3 main components of a dynamic ad:
• User info, i.e. User 123 searched for product CPH
• Product info: all the products and prices from the BrianAir site
• Geolocation: we need to know which local airport User 123 is most likely to want to fly from, found via a GeoIP lookup and a spatial index query from MongoDB
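The three components can be sketched together as below. This is not the MongoDB spatial query itself: it swaps in a plain in-memory nearest-airport scan using the haversine distance, purely to show the idea. The airport coordinates are approximate, and the `AIRPORTS` and `PRICES` tables and the `build_ad` helper are hypothetical; only the LHR to CPH price comes from the example ad on the slides.

```python
import math

# Approximate (lat, lon) of a few airports; illustrative data only.
AIRPORTS = {
    "LHR": (51.4700, -0.4543),
    "JFK": (40.6413, -73.7781),
    "CPH": (55.6180, 12.6508),
}

# Product info keyed by (from, to); the single price is from the example ad.
PRICES = {("LHR", "CPH"): "£39.99"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_airport(loc):
    """In-memory stand-in for the $near query on a spatial index."""
    return min(AIRPORTS, key=lambda code: haversine_km(loc, AIRPORTS[code]))


def build_ad(user_loc, product_id):
    """Combine user info, product info and geolocation into one ad."""
    origin = nearest_airport(user_loc)
    return {"from": origin, "to": product_id,
            "price": PRICES.get((origin, product_id))}


# A user geolocated to central London who searched for CPH.
ad = build_ad((51.5074, -0.1278), "CPH")
```

A linear scan is fine for three airports; with a real airport list, the spatial index does this lookup inside the database instead.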
We have an internal target of 5 ms to supply other systems with the dynamic information needed for an ad.
Total transactions per second: events, ad impressions, select queries. So, what kind of platform do we need to do this?
For advertising: a couple of DCs in the US and one in Europe, with the data being replicated between them using our own Tier 1 carrier (10 Gb/s). We’re currently running 8 shards, with 3 machines in each replica set. We’re using 4 different drivers for this project, again something which makes MongoDB so nice to work with. So, server hardware…
We use pretty high-spec machines, but they’re still available off the shelf (well, to order anyway!). Key points of the spec are:
• A lot of network capacity: we hit network capacity before disk capacity
• XFS (with noatime): this can save a matter of seconds when MongoDB …
So, why did we choose MongoDB…
So, to summarise the use case for online advertising: the advertiser traffic == our traffic. As a summary…
The benefits really are clear when compared to other possible solutions. I don’t have a large team that can support a big Hadoop cluster, for example. We’re a small team that really needs to maximise the use of the time available.
• MongoDB is first and foremost easy to set up and learn: download and run, that’s it
• Performance is great: we keep a huge amount of data in RAM, which is the best way of guaranteeing the performance we need
• Scaling is really easy: ops can add a new shard at the appropriate time, with no downtime
• The community support is unparalleled: online docs, forums, groups, internal forums
• Should you want it, there are support contracts available
During sharding events, the global write lock held us back: we weren’t able to sustain the write performance we needed. We needed local writes, because of the transatlantic performance hit of writing remotely. When we pre-chunked, we prefixed the shard key with the datacentre name, so that the front-end machines would only write locally.
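The prefixing trick can be pictured with a small sketch. It only models the key layout and the pre-split ranges in plain Python; it does not call MongoDB, and the datacentre names, the `:` separator and all three helpers are hypothetical.

```python
# Hypothetical datacentre names: a couple in the US, one in Europe.
DATACENTRES = ["eu", "us-east", "us-west"]


def shard_key(datacentre, user_id):
    """Prefix the key with the datacentre that first saw the user."""
    return f"{datacentre}:{user_id}"


def pre_split_ranges(datacentres):
    """One key range per datacentre: [prefix + ':', prefix + ';').

    ';' is the character right after ':' in ASCII, so the range covers
    every key that starts with 'prefix:' and nothing else.
    """
    return {dc: (f"{dc}:", f"{dc};") for dc in datacentres}


def owning_datacentre(key, ranges):
    """Find which pre-split range (and so which local shards) owns a key."""
    for dc, (low, high) in ranges.items():
        if low <= key < high:
            return dc
    return None


ranges = pre_split_ranges(DATACENTRES)
key = shard_key("eu", "123")
```

With the chunks for each range placed on shards in the matching datacentre before any data arrives, a front-end machine in Europe only ever produces keys that land on European shards.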