The document discusses BranchOut's implementation of social features like following and activity feeds using MongoDB. For the follow system, they initially stored follows in MongoDB with advantages of compact data and read locality, but couldn't easily display followers. Their final solution stored followers across multiple documents to address the 16MB document size limit. For activity feeds, they modeled feed data in a way that aggregates events and scales horizontally, with average response times under 500ms. The document provides examples of how these social features were traditionally implemented in MySQL and the advantages of using MongoDB.
2. BranchOut
• Connect with your colleagues (follow)
• Activity feed of their professional activity
• Timeline of an individual’s posts
A more social professional network
Tuesday, January 22, 13
3. BranchOut
• 30MM installed users
• 750MM total user records
• Average 300 connections per installed user
7. MongoDB @ BranchOut
• 100% MySQL until ~July 2012
• Much of our data fits well into a document model
• Our data design avoids RDBMS features
12. Follow System
• Limit of 2000 followees (people you follow)
• Unlimited followers
• Both lists reflect updates in near-real time
Business logic
15. Follow System
Traditional RDBMS (e.g. MySQL)
follower_uid  followee_uid  follow_time
123           456           2013-01-22 15:43:00
456           123           2013-01-22 15:52:00
Advantage: Easy inserts, deletes
Disadvantage: Data locality, index size
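The relational layout above can be sketched with Python's built-in sqlite3 module standing in for MySQL. The table name follows, the index name idx_followee, and the helper functions are assumptions made for illustration; only the three column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        follower_uid INTEGER NOT NULL,
        followee_uid INTEGER NOT NULL,
        follow_time  TEXT    NOT NULL,
        PRIMARY KEY (follower_uid, followee_uid)
    );
    -- Second index so "who follows X?" does not scan the whole table.
    CREATE INDEX idx_followee ON follows (followee_uid, follower_uid);
""")

# The two sample rows from the slide.
conn.executemany(
    "INSERT INTO follows VALUES (?, ?, ?)",
    [
        (123, 456, "2013-01-22 15:43:00"),
        (456, 123, "2013-01-22 15:52:00"),
    ],
)


def followees_of(uid):
    """People that `uid` follows."""
    cur = conn.execute(
        "SELECT followee_uid FROM follows WHERE follower_uid = ? "
        "ORDER BY followee_uid",
        (uid,),
    )
    return [row[0] for row in cur]


def followers_of(uid):
    """People who follow `uid`."""
    cur = conn.execute(
        "SELECT follower_uid FROM follows WHERE followee_uid = ? "
        "ORDER BY follower_uid",
        (uid,),
    )
    return [row[0] for row in cur]


def unfollow(follower_uid, followee_uid):
    """Deleting a follow is a single-row delete."""
    conn.execute(
        "DELETE FROM follows WHERE follower_uid = ? AND followee_uid = ?",
        (follower_uid, followee_uid),
    )
```

One way to read the stated disadvantage: every follow is its own row carried by two indexes, so a user's list is spread over many rows rather than stored together, and the indexes grow with the total number of follow edges.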
28. Follow System
Follower document size
• Max Mongo doc size: 16MB
• Number of people who follow our community manager: 30MM
• 30MM uids × 8 bytes/uid = 240MB
• Max followers per doc: ~2MM
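The arithmetic on this slide, worked through as a short sketch. It uses the slide's round decimal figures; MongoDB's documented limit is 16 mebibytes (16,777,216 bytes) and BSON arrays add per-element overhead, so real capacity differs somewhat. The chunk_followers helper and its seq/uids field names are a hypothetical layout for spreading one follower list over several documents, not the schema from the talk.

```python
import math

MAX_DOC_BYTES = 16 * 1000 * 1000   # 16MB, decimal, as on the slide
BYTES_PER_UID = 8                  # one 64-bit uid
FOLLOWERS = 30_000_000             # followers of the community manager

total_bytes = FOLLOWERS * BYTES_PER_UID               # 240,000,000 bytes = 240MB
max_uids_per_doc = MAX_DOC_BYTES // BYTES_PER_UID     # 2,000,000 uids
docs_needed = math.ceil(FOLLOWERS / max_uids_per_doc) # 15 documents


def chunk_followers(uids, per_doc=max_uids_per_doc):
    """Split a follower list into document-sized chunks (hypothetical layout)."""
    return [
        {"seq": start // per_doc, "uids": uids[start:start + per_doc]}
        for start in range(0, len(uids), per_doc)
    ]
```

With these figures a single follower list of 30MM uids needs at least 15 documents, which is why one document per user cannot hold the follower side.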
41. Business logic
• All connections and followees appear in your feed
• Reverse-chronological sort order (but should support other rankings)
• Support for evolving set of feed event types
• Tagging creates multiple feed events for the same underlying object
• Feed events are not ephemeral; they persist to back the Timeline
Activity Feed
44. Traditional RDBMS (e.g. MySQL)
activity_id  uid  event_time           type    oid1    oid2
1            123  2013-01-22 15:43:00  photo   123abc  789ghi
2            345  2013-01-22 15:52:00  status  456def  foobar
Advantage: Easy inserts
Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale
Activity Feed
52. Algorithm
1. Load user_feed_cards for all connections
2. Calculate which user_feed_months to load
3. Load user_feed_months
4. Aggregate events that refer to the same story
5. Sort (reverse chron)
6. Load content, comments, etc. and build stories
Activity Feed
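The six steps can be sketched in memory as follows. The collection names user_feed_cards and user_feed_months come from the slide, but their shapes here are assumptions: a card is taken to list the months in which a user has events, and a month document to hold that user's events for one month. The build_feed function, the max_months policy, and the ts/type/story field names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two collections (assumed shapes).
user_feed_cards = {
    123: {"months": ["2013-01", "2012-12"]},
    345: {"months": ["2013-01"]},
}
user_feed_months = {
    (123, "2013-01"): [
        {"ts": "2013-01-22 15:43:00", "type": "photo", "story": "123abc"},
    ],
    (123, "2012-12"): [
        {"ts": "2012-12-30 09:00:00", "type": "status", "story": "old1"},
    ],
    (345, "2013-01"): [
        {"ts": "2013-01-22 15:52:00", "type": "tag", "story": "123abc"},
        {"ts": "2013-01-21 08:00:00", "type": "status", "story": "456def"},
    ],
}


def build_feed(connection_uids, max_months=1):
    # 1. Load user_feed_cards for all connections.
    cards = {
        uid: user_feed_cards[uid]
        for uid in connection_uids
        if uid in user_feed_cards
    }

    # 2. Calculate which months to load. Assumed policy: the newest
    #    `max_months` months that appear on any card.
    all_months = {m for card in cards.values() for m in card["months"]}
    months = sorted(all_months, reverse=True)[:max_months]

    # 3. Load the matching user_feed_months documents.
    events = []
    for uid, card in cards.items():
        for month in months:
            if month in card["months"]:
                for ev in user_feed_months.get((uid, month), []):
                    events.append({**ev, "uid": uid})

    # 4. Aggregate events that refer to the same story.
    stories = defaultdict(list)
    for ev in events:
        stories[ev["story"]].append(ev)

    # 5. Sort stories by their most recent event, newest first.
    ordered = sorted(
        stories.items(),
        key=lambda item: max(ev["ts"] for ev in item[1]),
        reverse=True,
    )

    # 6. Loading content and comments is omitted; return story skeletons.
    return [{"story": story_id, "events": evs} for story_id, evs in ordered]
```

With the sample data, build_feed([123, 345]) loads only January 2013 and returns two stories, with the photo 123abc first because the tag event on it is the most recent; the photo and the tag are merged into one story in step 4.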
56. Performance
• Response times average under 500 ms (98th percentile under 1 sec)
• Design expected to scale well horizontally
• Need to continue to optimize
Activity Feed
57. Building Social Features with MongoDB
Nathan Smith
BrO: http://branchout.com/nate
FB: http://facebook.com/neocortica
Twitter: @nate510
Email: nate@branchout.com
Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack
Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
Good Quora questions on activity feeds:
http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed