1. Using NoSQL with Yo’ SQL
Supplementing your app with a slice of MongoDB
Rich Thornett
Dribbble
Thursday, June 9, 2011
2. Dribbble
What are you working on?
Show and tell for creatives via screenshots
3. Your Father's Webapp
Dribbble is a typical web application:
Ruby on Rails + Relational Database
We <3 PostgreSQL
But for certain tasks ...
4. Alternative Values
log | scale | optimize | aggregate | cache
More flexible data structures
Easier horizontal scaling
5. NoSQL
No == Not Only
(but sounds a bit stronger, no?)
• No: Fixed table schemas
• No: Joins
• Yes: Scale horizontally
Examples
Memcached, Redis, CouchDB, Cassandra, MongoDB ...
6. Exploring MongoDB
• Persistent data store
• Powerful query language (closest to an RDBMS)
• Broad feature set
• Great community and documentation
Utility belt that fits us?
7. What is MongoDB?
A document-oriented NoSQL database
Collections & Documents
v.
Tables & Rows
8. What's a document?
Our old friend JavaScript
{
_id: ObjectId("4ddfe31db6bc16ab615e573d"),
description: "This is a BSON document",
embedded_doc: {
description: "I belong to my parent document"
},
tags: ['can', 'haz', 'arrays']
}
Documents are BSON (binary encoded JSON)
9. Embedded Documents
Avoid joins for "belongs to" associations
{
_id: ObjectId("4ddfe31db6bc16ab615e573d"),
description: "This is a BSON document",
embedded_doc: {
description: "I belong to my parent document"
},
tags: ['can', 'haz', 'arrays']
}
10. Arrays
Avoid joins for "tiny relations"
{
_id: ObjectId("4ddfe31db6bc16ab615e573d"),
description: "This is a BSON document",
embedded_doc: {
description: "I belong to my parent document"
},
tags: ['can', 'haz', 'arrays']
}
Relational Cruft
thing thing_taggings tags
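A minimal sketch in plain JavaScript (no MongoDB required) of why an embedded array avoids the join: the tags live on the document, so filtering is a single pass. The `things` data and the `withTag` helper are hypothetical, made up for illustration.

```javascript
// Hypothetical documents: tags are stored inline, so no join table is needed.
const things = [
  {_id: 1, description: 'first thing', tags: ['can', 'haz', 'arrays']},
  {_id: 2, description: 'second thing', tags: ['haz']},
  {_id: 3, description: 'third thing', tags: []}
];

// Rough in-memory equivalent of db.things.find({tags: tag}):
// keep documents whose tags array contains the value.
function withTag(docs, tag) {
  return docs.filter(function(doc) {
    return doc.tags.indexOf(tag) !== -1;
  });
}

const taggedIds = withTag(things, 'haz').map(function(doc) {
  return doc._id;
});
console.log(taggedIds);
```

The relational version needs three tables and two joins to answer the same question.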
11. Googley
“With MongoDB we can ... grow our data set horizontally
on a cluster of commodity hardware and do distributed
(read parallel execution of) queries/updates/inserts/deletes.”
--Markus Gattol
http://www.markus-gattol.name/ws/mongodb.html
12. Replica Sets
Automate the storing of multiple copies of data
• Read Scaling
• Data Redundancy
• Automated Failover
• Maintenance
• Disaster Recovery
13. Dude, who sharded?
Relax, not you.
Auto-sharding
You
Specify a shard key for a collection
Mongo
Partitions the collection across machines
Application
Blissfully unaware (mostly :)
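A toy sketch of the idea, not MongoDB's actual implementation: given a shard key, each document is routed to the machine that owns the matching key range. The shard names, the ranges, and the `shardFor` helper are all assumptions for illustration; real chunk splitting and balancing are more involved.

```javascript
// Hypothetical key ranges per shard, as [min, max) on the shard key.
const shards = [
  {name: 'shard0', min: -Infinity, max: 100},
  {name: 'shard1', min: 100, max: 200},
  {name: 'shard2', min: 200, max: Infinity}
];

// Pick the shard whose range contains the document's shard key value.
function shardFor(doc, shardKey) {
  const value = doc[shardKey];
  const owner = shards.find(function(shard) {
    return value >= shard.min && value < shard.max;
  });
  return owner.name;
}

console.log(shardFor({advertiser_id: 1, type: 'text'}, 'advertiser_id'));
console.log(shardFor({advertiser_id: 150, type: 'banner'}, 'advertiser_id'));
```

The application only ever calls insert and find; the routing step is what it stays "blissfully unaware" of.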
14. CoSQL
[Diagram] WEBAPP ("MIND THE APP") with RDBMS and MongoDB
MongoDB: Logging | Scaling | Analytics | Caching | Flexibility
15. Ads
• Orthogonal to primary app
• Few joins
• Integrity not critical
Let's Mongo!
16. From the Console
But there are drivers for all major languages
Create a text ad
db.ads.insert({
advertiser_id: 1,
type: 'text',
url: 'http://dribbbler-on-the-roof.com',
copy: 'Watch me!',
runs: [{
start: new Date(2011, 4, 7),
end: new Date(2011, 4, 14)
}],
created_at: new Date()
})
17. Querying
Query by match
db.ads.find({advertiser_id: 1})
Paging active ads
// Page 2 of text ads running this month
db.ads.find({
type: 'text',
runs: {
$elemMatch: {
start: {$lte: new Date(2011, 4, 10)},
end: {$gte: new Date(2011, 4, 10)}
}
}
}).sort({created_at: -1}).skip(15).limit(15)
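To make the semantics of the paging query concrete, here is a hedged in-memory sketch in plain JavaScript. The `pageOfActiveTextAds` helper and the sample ads are hypothetical; the point is that $elemMatch requires one and the same run to satisfy both bounds, and that skip/limit is a slice of the sorted result.

```javascript
// Hypothetical helper mirroring the query above on an in-memory array.
function pageOfActiveTextAds(ads, day, page, perPage) {
  return ads
    .filter(function(ad) {
      // $elemMatch: a single run must satisfy both conditions.
      return ad.type === 'text' && ad.runs.some(function(run) {
        return run.start <= day && run.end >= day;
      });
    })
    .sort(function(a, b) {
      return b.created_at - a.created_at; // created_at: -1 (newest first)
    })
    .slice((page - 1) * perPage, page * perPage); // skip + limit
}

// Sample data: 20 text ads running May 7-14, 2011
// (JavaScript Date months are zero-based, so 4 is May).
const ads = [];
for (let i = 0; i < 20; i++) {
  ads.push({
    id: i,
    type: 'text',
    runs: [{start: new Date(2011, 4, 7), end: new Date(2011, 4, 14)}],
    created_at: new Date(2011, 4, 1, 0, i) // larger i is newer
  });
}

// This ad has runs before and after the day, but none containing it,
// so it must not match.
ads.push({
  id: 99,
  type: 'text',
  runs: [
    {start: new Date(2011, 3, 1), end: new Date(2011, 3, 5)},
    {start: new Date(2011, 5, 1), end: new Date(2011, 5, 5)}
  ],
  created_at: new Date(2011, 4, 2)
});

const page2 = pageOfActiveTextAds(ads, new Date(2011, 4, 10), 2, 15);
const page2Ids = page2.map(function(ad) { return ad.id; });
console.log(page2Ids);
```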
18. Advanced Queries
http://www.mongodb.org/display/DOCS/Advanced+Queries
$gt $mod $size
$lt $ne $type
$gte $in $elemMatch
$lte $nin $not
$all $nor $where
$exists $or
count | distinct | group
Group does not work across shards; use map/reduce instead.
19. Polymorphism
Easy inheritance. Document has whatever fields it needs.
// Banner ad has additional fields
db.ads.insert({
advertiser_id: 1,
type: 'banner',
url: 'http://dribbble-me-this.com',
copy: 'Buy me!',
runs: [],
image_file_name: 'ad.png',
image_content_type: 'image/png',
image_file_size: 33333
})
Single | Multiple | Joined
table inheritance all present difficulties
No DB changes to create new subclasses in Mongo
20. Logging
• Scale and query horizontally
• Add fields on the fly
• Writes: Fast, asynchronous, atomic
21. Volume Logging
• Ad impressions
• Screenshot views
• Profile views
Fast, asynchronous writes and sharding FTW!
22. Real-time Analytics
What people and locations are trending this hour?
db.trends.update(
{date: "2011-04-10 13:00"}, // search criteria
{
$inc: { // increment
'user.simplebits.likes_received': 1,
'country.us.likes_received': 1,
'city.boston.likes_received': 1
}
},
true // upsert
)
upsert: Update document (if present) or insert it
$inc: Increment field by amount (if present) or set to amount
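A rough in-memory sketch of what upsert plus $inc does to the hourly document, in plain JavaScript. The `trends` Map and the `incPath` and `upsertInc` helpers are hypothetical stand-ins; the sketch shows the create-or-increment behavior only and says nothing about the atomicity the server provides.

```javascript
// Hypothetical in-memory stand-in for the trends collection, keyed by date.
const trends = new Map();

// Increment a dotted path such as 'user.simplebits.likes_received',
// creating intermediate objects and the counter itself as needed.
function incPath(doc, path, amount) {
  const parts = path.split('.');
  let node = doc;
  for (let i = 0; i < parts.length - 1; i++) {
    if (typeof node[parts[i]] !== 'object' || node[parts[i]] === null) {
      node[parts[i]] = {};
    }
    node = node[parts[i]];
  }
  const leaf = parts[parts.length - 1];
  node[leaf] = (node[leaf] || 0) + amount;
}

// Upsert: insert the document for this hour if absent, then apply increments.
function upsertInc(date, increments) {
  if (!trends.has(date)) {
    trends.set(date, {date: date});
  }
  const doc = trends.get(date);
  Object.keys(increments).forEach(function(path) {
    incPath(doc, path, increments[path]);
  });
  return doc;
}

upsertInc('2011-04-10 13:00', {
  'user.simplebits.likes_received': 1,
  'country.us.likes_received': 1
});
const hourDoc = upsertInc('2011-04-10 13:00', {
  'user.simplebits.likes_received': 1
});
console.log(JSON.stringify(hourDoc));
```

The first call creates the document for the hour; the second only bumps an existing counter.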
23. Flex Benefits
• Add/nest new fields to measure with ease
• Atomic upsert with $inc
Replaces two-step, transactional find-and-update/create
• Live, cached aggregation
27. Search by Location
boston = [-71.0602778, 42.3583333] // long/lat
Within area
// $maxDistance: Find users in Boston area (w/in 50 miles)
db.users.find({location: {$near: boston, $maxDistance: 0.7234842}})
Within area, matching criteria
// Find users in the Boston area who:
// are available for work
// have expertise in HTML and icon design
db.users.find({
location: {$near: boston, $maxDistance: 0.7234842},
available: true,
skills: {$all: ['html', 'icon design']}
})
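Where the 0.7234842 comes from, as a small sketch. It assumes $maxDistance is expressed in the same units as the stored coordinates (degrees here) and uses roughly 69.11 miles per degree, the factor that reproduces the value on the slide. The constant and the `milesToDegrees` helper are assumptions, not part of any MongoDB API.

```javascript
// Assumption: about 69.11 statute miles per degree. This is an
// approximation; a degree of longitude shrinks away from the equator.
const MILES_PER_DEGREE = 69.11;

// Convert a radius in miles to degrees for use as $maxDistance.
function milesToDegrees(miles) {
  return miles / MILES_PER_DEGREE;
}

const maxDistance = milesToDegrees(50);
console.log(maxDistance);
```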
28. Search Power
Flexible Documents
+
Rich Query Language
+
Geospatial Indexing
30. Unique Views
a.k.a. visitors per day
unique = remote_ip address / DAY
31. Map/Reduce
http://www.mongodb.org/display/DOCS/MapReduce
Aggregate by key => GROUP BY in SQL
Collections
Input and output
Map
Returns 0..N key/value pairs per document
Reduce
Aggregates values per key
32. Strategy
Two-pass map/reduce to calculate unique visitors
Pass 1
GROUP BY: profile, visitor
COUNT: visits per visitor per profile
Pass 2
GROUP BY: profile
COUNT: visitors
33. Profile View Data
Visits on a given day
// Profile 1
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.1'}
{profile_id: 1, remote_ip: '127.0.0.2'}
// Profile 2
{profile_id: 2, remote_ip: '127.0.0.4'}
{profile_id: 2, remote_ip: '127.0.0.4'}
34. Pass 1: Map Function
Count visits per remote_ip per profile
KEY = profile, remote_ip
map = function() {
var key = {
profile_id: this.profile_id,
remote_ip: this.remote_ip
};
emit(key, {count: 1});
}
35. Reduce Function
Counts
(occurrences of key)
reduce = function(key, values) {
var count = 0;
values.forEach(function(v) {
count += v.count;
});
return {count: count};
}
38. Pass 2: Results
Count visitors per profile
// Same reduce function as before
db.profile_views_by_visitor.mapReduce(map, reduce,
{out: 'profile_views_unique'}
)
// Results
db.profile_views_unique.find()
{ "_id" : 1, "value" : { "count" : 2 } }
{ "_id" : 2, "value" : { "count" : 1 } }
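The whole two-pass strategy can be tried without a database. Below is a hedged sketch in plain JavaScript: `mapReduce` is a tiny hypothetical in-memory stand-in for the server-side command, and the pass 2 map function (`map2`) is an assumption, since only the pass 1 map and the shared reduce appear in this deck. The data is the sample profile view data shown earlier.

```javascript
// Set by mapReduce so map functions can call emit(key, value) as in the shell.
let emit;

// Hypothetical in-memory stand-in for mapReduce: returns [{_id, value}, ...].
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> {key, values}
  emit = function(key, value) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, {key: key, values: []});
    groups.get(id).values.push(value);
  };
  // Map runs with `this` bound to each document.
  docs.forEach(function(doc) { map.call(doc); });
  return Array.from(groups.values()).map(function(group) {
    return {_id: group.key, value: reduce(group.key, group.values)};
  });
}

const profileViews = [
  {profile_id: 1, remote_ip: '127.0.0.1'},
  {profile_id: 1, remote_ip: '127.0.0.1'},
  {profile_id: 1, remote_ip: '127.0.0.2'},
  {profile_id: 2, remote_ip: '127.0.0.4'},
  {profile_id: 2, remote_ip: '127.0.0.4'}
];

// Pass 1 map: one key per (profile, visitor) pair.
const map1 = function() {
  emit({profile_id: this.profile_id, remote_ip: this.remote_ip}, {count: 1});
};

// Pass 2 map (assumed): one emit per distinct visitor, keyed by profile.
const map2 = function() {
  emit(this._id.profile_id, {count: 1});
};

// Same reduce for both passes: sum the counts for a key.
const reduce = function(key, values) {
  let count = 0;
  values.forEach(function(v) { count += v.count; });
  return {count: count};
};

const byVisitor = mapReduce(profileViews, map1, reduce);
const uniqueViews = mapReduce(byVisitor, map2, reduce);
console.log(JSON.stringify(uniqueViews));
```

Pass 1 collapses repeat visits into one document per visitor, so pass 2 only has to count documents per profile.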
39. Map/Deduce
Can be clunkier than GROUP BY in SQL. But ...
With large data sets, you get:
• Horizontal scaling
• Parallel processing across cluster
JavaScript functions offer flexibility/power
40. Activity
SELECT * FROM everything;
Too many tables to JOIN or UNION
41. Relational solution
Denormalized events table as activity log.
Column | Type |
------------------------+-----------------------------+
id | integer |
event_type | character varying(255) |
subject_type | character varying(255) |
actor_type | character varying(255) |
secondary_subject_type | character varying(255) |
subject_id | integer |
actor_id | integer |
secondary_subject_id | integer |
recipient_id | integer |
secondary_recipient_id | integer |
created_at | timestamp without time zone |
We use James Golick’s timeline_fu gem for Rails:
https://github.com/jamesgolick/timeline_fu
42. Direction
Incoming Activity (recipients)
Generated Activity (actors)
43. Complications
Multiple recipients
• Subscribe to comments for a shot
• Twitter-style @ mentions in comments
Confusing names
• Generic names make queries and view logic hard to follow
N+1
• Each event may require several lookups to get actor, subject, etc.
44. Events in Mongo
Comment on a Screenshot containing an @ mention
Screenshot owner and @user should be recipients.
Mongo version of our timeline_events table
{
event_type: "created",
subject_type: "Comment",
actor_type: "User",
subject_id: 999,
actor_id: 1,
recipients: [], // Multiple recipients
secondary_recipient_id: 3,
created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}
45. Mongo Event v.2
Why is a user a recipient?
{
event_type: "created",
subject_type: "Comment",
actor_type: "User",
subject_id: 999,
actor_id: 1,
// recipients: [1, 2],  (flat ids, replaced by the array of {user_id, reason} below)
recipients: [
{user_id: 2, reason: 'screenshot owner'},
{user_id: 3, reason: 'mention'}
],
created_at: "Wed May 05 2010 15:37:58 GMT-0400 (EDT)"
}
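A small in-memory sketch of how the richer recipients array gets used: find a user's incoming activity, optionally narrowed by why they received it. The `events` data and the `incomingFor` helper are hypothetical; against MongoDB the rough equivalent would be a find with $elemMatch on recipients.

```javascript
// Hypothetical events in the v.2 shape.
const events = [
  {
    event_type: 'created',
    subject_type: 'Comment',
    subject_id: 999,
    actor_id: 1,
    recipients: [
      {user_id: 2, reason: 'screenshot owner'},
      {user_id: 3, reason: 'mention'}
    ]
  },
  {
    event_type: 'created',
    subject_type: 'Comment',
    subject_id: 1000,
    actor_id: 2,
    recipients: [{user_id: 3, reason: 'screenshot owner'}]
  }
];

// Events where userId is a recipient; if reason is given, the same
// recipient entry must also carry that reason.
function incomingFor(allEvents, userId, reason) {
  return allEvents.filter(function(event) {
    return event.recipients.some(function(recipient) {
      return recipient.user_id === userId &&
        (reason === undefined || recipient.reason === reason);
    });
  });
}

const mentionsOfUser3 = incomingFor(events, 3, 'mention');
console.log(mentionsOfUser3.map(function(e) { return e.subject_id; }));
```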
48. Denormalizing?
You're giving up RDBMS benefits to optimize.
Optimize your optimizations.
Document flexibility:
Data structures can mirror the view
49. Caching
http://www.mongodb.org/display/DOCS/Caching
MongoDB uses memory-mapped files
• Grabs free memory as needed; no configured cache size
• Relies on OS to reclaim memory (LRU)
50. Replace Redis/Memcached?
FREQUENTLY accessed items LIKELY in memory
Good enough for you?
One less moving part.
51. Cache Namespaces
Memcached keys are flat
'ad_1'
'ad_2'
'ad_3'
No simple way to expire all
Collection can serve as an expirable namespace
// Clear collection to expire
db.ads_cache.remove()
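The namespace idea in miniature, as a hedged plain-JavaScript sketch: grouping cache entries per namespace means one call can expire a whole group, which a flat key space cannot do without tracking keys. The `NamespacedCache` class is hypothetical and stands in for "one collection per cache namespace".

```javascript
// Hypothetical cache where each namespace plays the role of a collection.
class NamespacedCache {
  constructor() {
    this.namespaces = new Map(); // namespace -> Map of key -> value
  }

  set(namespace, key, value) {
    if (!this.namespaces.has(namespace)) {
      this.namespaces.set(namespace, new Map());
    }
    this.namespaces.get(namespace).set(key, value);
  }

  get(namespace, key) {
    const bucket = this.namespaces.get(namespace);
    return bucket ? bucket.get(key) : undefined;
  }

  // Counterpart of clearing the collection: drop every entry at once.
  expire(namespace) {
    this.namespaces.delete(namespace);
  }
}

const cache = new NamespacedCache();
cache.set('ads_cache', 'ad_1', {copy: 'Watch me!'});
cache.set('ads_cache', 'ad_2', {copy: 'Buy me!'});
cache.set('users_cache', 'user_1', {name: 'simplebits'});

cache.expire('ads_cache');
console.log(cache.get('ads_cache', 'ad_1'), cache.get('users_cache', 'user_1'));
```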
52. Time to Mongo?
Versatility?
Data structure flexibility worth more than joins?
Easier horizontal scaling?
log | scale | optimize | aggregate | cache
http://www.mongodb.org