8. 3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
9. Fan out on read
• Generally, not the right approach
• 1 document per message sent
• Multiple recipients in an array key
• Reading an inbox means finding all messages with my own name in the recipient field
• Requires scatter-gather on a sharded cluster
• Then a lot of random IO on a shard to find everything
10. Fan out on Read
// Shard on "from"
sh.shardCollection("myapp.messages", { "from": 1 })
// Make sure we have an index to handle inbox reads
db.messages.createIndex({ "to": 1, "sent": 1 })
msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}
// Send a message: a single document, however many recipients there are
db.messages.insertOne(msg)
// Read my inbox: matches every document whose "to" array contains "Joe"
db.messages.find({ to: "Joe" }).sort({ sent: -1 })
11. Fan out on read – Send Message
[Diagram: Send Message, shown against Shard 1, Shard 2, Shard 3]
12. Fan out on read – Inbox Read
[Diagram: Read Inbox, shown against Shard 1, Shard 2, Shard 3]
13. Fan out on write
• Tends to scale better than fan out on read
• 1 document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
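14. Fan out on Write
// Shard on "recipient" and "sent"
sh.shardCollection("myapp.messages", { "recipient": 1, "sent": 1 })
msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}
// Send a message: insert one copy per recipient
// (for...in would yield array indexes, not names, so iterate the values)
msg.to.forEach(function (recipient) {
  // Copy msg so each insert is a new document with its own _id
  db.messages.insertOne(Object.assign({}, msg, { recipient: recipient }))
})
// Read my inbox
db.messages.find({ recipient: "Joe" }).sort({ sent: -1 })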
15. Fan out on write – Send Message
[Diagram: Send Message, shown against Shard 1, Shard 2, Shard 3]
16. Fan out on write – Read Inbox
[Diagram: Read Inbox, shown against Shard 1, Shard 2, Shard 3]
17. Fan out on write with bucketing
• Generally the best approach
• Each “inbox” document holds an array of messages
• Append each message to the “inbox” document of its recipient
• Bucket inbox documents so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
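The code slides for this approach are not part of this transcript. As a rough sketch of the bullets above, the plain JavaScript below models the bucketing logic in memory, with no database involved. The names `BUCKET_SIZE`, `inboxBuckets`, `msgCounts`, `owner`, `sequence`, `sendMessage`, and `readInbox` are assumptions made for illustration, as is the bucket size of 50. In MongoDB the append step would presumably be an upsert using `$push` on a document identified by owner and sequence.

```javascript
// Hypothetical bucket size: maximum number of messages per inbox document.
const BUCKET_SIZE = 50;

// In-memory stand-ins for two collections (both names are assumptions).
const inboxBuckets = new Map(); // "owner/sequence" -> { owner, sequence, messages }
const msgCounts = new Map();    // owner -> number of messages received so far

// Fan out on write: append the message to the current bucket of each recipient.
function sendMessage(msg) {
  for (const owner of msg.to) {
    const count = msgCounts.get(owner) || 0;
    msgCounts.set(owner, count + 1);
    // Messages 0..49 land in bucket 0, 50..99 in bucket 1, and so on.
    const sequence = Math.floor(count / BUCKET_SIZE);
    const key = owner + "/" + sequence;
    if (!inboxBuckets.has(key)) {
      inboxBuckets.set(key, { owner: owner, sequence: sequence, messages: [] });
    }
    inboxBuckets.get(key).messages.push(msg);
  }
}

// Read an inbox: fetch only the newest few buckets, newest first.
function readInbox(owner, maxBuckets = 2) {
  return [...inboxBuckets.values()]
    .filter((bucket) => bucket.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, maxBuckets);
}

sendMessage({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(), message: "Hi!" });
console.log(readInbox("Bob").length); // 1
```

Because every bucket for one owner shares the same owner value, a shard key that starts with the owner keeps an inbox on one shard, and reading the newest one or two buckets returns many messages at once.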
22. Tradeoffs
| | Fan out on Read | Fan out on Write | Bucketed Fan out on Write |
| Send Message Performance | Best: single shard, single write | Good: shard per recipient, multiple writes | Worst: shard per recipient, appends (grows) |
| Read Inbox Performance | Worst: broadcast to all shards, random reads | Good: single shard, random reads | Best: single shard, single read |
| Data Size | Best: message stored once | Worst: copy per recipient | Worst: copy per recipient |
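To put rough numbers on the send and data-size rows, here is a back-of-the-envelope sketch in plain JavaScript. The function name `sendCost`, the approach labels, and the sample figures are hypothetical. It only counts message writes and stored message copies for a single send, and ignores indexes, per-document overhead, and any bookkeeping writes a real implementation might need.

```javascript
// Rough per-message cost of each approach; all names and numbers are illustrative.
function sendCost(approach, recipients, messageBytes) {
  if (approach === "fanOutOnRead") {
    // One document holds the message and the whole recipient list.
    return { writes: 1, storedBytes: messageBytes };
  }
  if (approach === "fanOutOnWrite" || approach === "bucketedFanOutOnWrite") {
    // One insert (or one append to an inbox bucket) per recipient,
    // and each recipient ends up with a full copy of the message.
    return { writes: recipients, storedBytes: recipients * messageBytes };
  }
  throw new Error("unknown approach: " + approach);
}

// Example: a 2,000-byte message sent to 100 recipients.
for (const approach of ["fanOutOnRead", "fanOutOnWrite", "bucketedFanOutOnWrite"]) {
  console.log(approach, sendCost(approach, 100, 2000));
}
```

Under these assumptions, fan out on read stores 2,000 bytes with one write, while both fan-out-on-write variants store 200,000 bytes across 100 writes.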
23. Things to consider
• Lots of recipients
  • Fan out on write might become prohibitive
  • Consider introducing a “Group”
• Very large message size
  • Multiple copies of messages can be a burden
  • Consider a single copy of the message with a “pointer” per inbox
• More writes than reads
  • Fan out on read might be okay
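The “pointer” idea for very large messages could look roughly like the in-memory sketch below: the body is stored once, and each inbox gets a small entry that references it. The names `messageBodies`, `inboxEntries`, `messageId`, `sendLargeMessage`, and `readInboxWithBodies` are hypothetical; the slides do not show a schema for this.

```javascript
// In-memory stand-ins for two collections (hypothetical names).
const messageBodies = new Map(); // messageId -> full message, stored once
const inboxEntries = [];         // one small pointer document per recipient
let nextMessageId = 1;

function sendLargeMessage(msg) {
  const messageId = nextMessageId++;
  messageBodies.set(messageId, msg);
  for (const recipient of msg.to) {
    // The pointer carries only what is needed to list and sort an inbox.
    inboxEntries.push({ recipient: recipient, sent: msg.sent, messageId: messageId });
  }
  return messageId;
}

// Reading an inbox now costs an extra lookup per message to fetch the body.
function readInboxWithBodies(recipient) {
  return inboxEntries
    .filter((entry) => entry.recipient === recipient)
    .sort((a, b) => b.sent - a.sent)
    .map((entry) => messageBodies.get(entry.messageId));
}

sendLargeMessage({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(), message: "Hi!" });
console.log(readInboxWithBodies("Bob")[0].message); // Hi!
console.log(messageBodies.size);                    // 1
```

The design trades storage for read cost: the inbox stays small, but in a sharded cluster the body lookup may land on a different shard than the pointer.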
26. Summary
• Multiple ways to model status updates
• Bucketed fan out on write is typically the better approach
• Think about how your model distributes across shards
• Think about how much random IO needs to happen on a shard