Real-time Location Based Social Discovery using MongoDB
1. Real-time Location Based Social
Discovery using MongoDB
Fredrik Björk
Director of Engineering
MongoSV, Dec 4th 2012
2. What is Banjo?
• The most powerful location based mobile
technology that brings you the moments
you would otherwise miss
• Aggregates geo tagged posts from
Facebook, Twitter, Instagram and
Foursquare in real-time
4. Stats
• Launched June 2011
• 3 million users
• Social graph of 400 million profiles
• 50 billion connections
• ~200 geo posts created per second
5. Why MongoDB?
• Developer friendly
• Easy to maintain and scale
• Automatic failover
• Rapid prototyping of features
• Good fit for consuming, storing and
presenting JSON data
• Geospatial features out of the box
6. Infrastructure
• ~160 EC2 instances (75% MongoDB, 25%
Redis)
• SSD drives for low latency
• App servers (Sinatra & Rails) hosted on
Heroku
• Mongos with authentication running on
dedicated servers
7. Geo tagged posts
• Consumed as JSON from social network
APIs - streaming, polling & real-time
callbacks
• Exposed via REST APIs as JSON to the
Banjo iOS and Android apps
9. • _id is composed of provider (Facebook:
1, Twitter: 2 etc.) and post id for
uniqueness
https://twitter.com/fbjork/status/262989592561606656
> db.posts.find({ _id: '2:262989592561606656' })
{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  ...
}
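A minimal sketch of how such an _id could be composed. The provider table only contains the two codes named on the slide, and the function name makePostId is hypothetical, not Banjo's code.

```javascript
// Provider codes as given on the slide (Facebook: 1, Twitter: 2).
// Codes for other providers are not stated in the talk, so none are listed.
const PROVIDERS = { facebook: 1, twitter: 2 };

// Build a post _id of the form "<provider code>:<post id>".
// The post id is handled as a string because 64-bit ids such as
// Twitter's exceed JavaScript's safe integer range.
function makePostId(provider, postId) {
  const code = PROVIDERS[provider];
  if (code === undefined) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return `${code}:${String(postId)}`;
}

console.log(makePostId("twitter", "262989592561606656"));
// prints "2:262989592561606656"
```

Prefixing the provider code keeps ids from different networks from colliding, so the same _id index can serve all of them.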
10. • Coordinates are stored inside an array
with latitude, longitude
{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  coordinates: [37.784234,-122.438212],
  ...
}
11. • Friends are stored inside an array
{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  coordinates: [37.784234,-122.438212],
  friend_ids: [8816792, 10324882, 2006261, ...]
}
18. Compound geo indexes
• Create a compound index on coordinates
and friend_ids:
> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )
19. • Fails for compound indexes with large
arrays
• Geospatial indexes have a size limit of
1000 bytes
> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )
Error: Key too large to index
20. Geospatial query performance
• Do we need a compound index at all?
• Geospatial index is usually restrictive
enough
• Problem: Array traversal (using $in) is
CPU hungry for large arrays
• Solution: Pre-sharded array fields
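A sketch of the kind of query these bullets describe: the 2d index on coordinates narrows the candidates, and the friend filter is applied to whatever remains. The function name nearbyPostsQuery and the sample values are hypothetical, and the exact shape of Banjo's friend filter is an assumption. The units of $maxDistance depend on the operator and server version, so check the MongoDB documentation before relying on a value.

```javascript
// Build a legacy 2d $near query, optionally narrowed by friend ids.
// Field names (coordinates, friend_ids) follow the documents on the slides.
function nearbyPostsQuery(lat, lon, maxDistance, friendIds = []) {
  const query = {
    // Same [latitude, longitude] order as the stored documents.
    coordinates: { $near: [lat, lon], $maxDistance: maxDistance },
  };
  if (friendIds.length > 0) {
    // Assumption: friends are matched with $in against the friend_ids array.
    // This is the array traversal the slide calls CPU hungry.
    query.friend_ids = { $in: friendIds };
  }
  return query;
}

// The resulting object could be passed to db.posts.find(...) in the shell.
const sampleQuery = nearbyPostsQuery(37.784234, -122.438212, 0.05, [8816792, 10324882]);
console.log(JSON.stringify(sampleQuery, null, 2));
```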
21. Pre-sharded array fields
• When dealing with large arrays, e.g.
@BarackObama follower ids
• Partition fields using pre-sharding
• shard = Hash(key) MOD shard_count
• Keep array sizes in the low hundreds
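One possible reading of the formula above, as a sketch: each id is hashed into one of shard_count separate array fields, so a lookup only has to traverse a single small array. The slides do not show the actual layout, so the field naming (friend_ids_0, friend_ids_1, ...), the shard count of 16 and the hash function are all assumptions.

```javascript
const SHARD_COUNT = 16; // assumed value, not from the talk

// Deterministic string hash (a djb2 variant); any stable hash would do.
function hash(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// shard = Hash(key) MOD shard_count
function shardOf(key, shardCount = SHARD_COUNT) {
  return hash(key) % shardCount;
}

// Split one large id array into per-shard fields friend_ids_<n>.
function preShard(friendIds, shardCount = SHARD_COUNT) {
  const fields = {};
  for (const id of friendIds) {
    const name = `friend_ids_${shardOf(id, shardCount)}`;
    (fields[name] = fields[name] || []).push(id);
  }
  return fields;
}

// Query fragment that only touches the shard field where userId can appear.
function friendFilter(userId, shardCount = SHARD_COUNT) {
  return { [`friend_ids_${shardOf(userId, shardCount)}`]: userId };
}

const sampleIds = [8816792, 10324882, 2006261];
const sharded = preShard(sampleIds);
console.log(sharded);
console.log(friendFilter(2006261));
```

With millions of follower ids, the shard count would need to be large enough to keep each array in the low hundreds, as the last bullet recommends.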
24. Capped collections
• Good fit for storing a feed of posts for a
period of time
• Eliminates need to expire old posts
• Documents can’t grow
• Documents can’t be deleted
• Resizing collections is painful
• Can’t be sharded
25. TTL collections
• We switched to TTL collections with
MongoDB 2.2
• Deleting and growing documents is now
possible
• Easier to change expiration times
• Can be sharded (not by geo)
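A sketch of what a TTL setup might look like. A TTL collection is an ordinary collection with an index on a date field and an expireAfterSeconds option. The field name created_at, the 24-hour lifetime and both helper names are hypothetical; the talk does not state which field or expiry Banjo used.

```javascript
// Arguments for a TTL index, as they would be passed in the mongo shell:
//   db.posts.ensureIndex({ created_at: 1 }, { expireAfterSeconds: 86400 })
// The indexed field must hold a date value for documents to expire.
function ttlIndexArgs(dateField, ttlSeconds) {
  return [{ [dateField]: 1 }, { expireAfterSeconds: ttlSeconds }];
}

// Client-side mirror of the expiry rule, for illustration only: a document
// becomes eligible for removal once ttlSeconds have passed since its date
// field. The server removes expired documents in a background task, so
// actual deletion can lag behind this moment.
function isExpired(doc, dateField, ttlSeconds, now = new Date()) {
  return now.getTime() - doc[dateField].getTime() >= ttlSeconds * 1000;
}

const ttlArgs = ttlIndexArgs("created_at", 86400);
console.log(JSON.stringify(ttlArgs));

const samplePost = {
  _id: "2:262989592561606656",
  created_at: new Date("2012-12-04T00:00:00Z"),
};
console.log(isExpired(samplePost, "created_at", 86400, new Date("2012-12-04T12:00:00Z")));
console.log(isExpired(samplePost, "created_at", 86400, new Date("2012-12-05T00:00:01Z")));
```

Because expiry is driven by an index option rather than a fixed collection size, documents can still be updated, grown or deleted by hand before they expire.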