8. Why do we need replication?
•Failover
•Backups
•Secondary batch jobs
•High availability
Sunday, 21 October 12
9. Replica Sets
Data Availability across nodes
• Data Protection
• Multiple copies of the data
• Spread across Data Centers, AZs
• High Availability
• Automated Failover
• Automated Recovery
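As a rough illustration of "multiple copies of the data", the sketch below builds a three-member replica set configuration document. The set name `rs0` and the host names are hypothetical placeholders, not taken from the slides. In the mongo shell such a document would be passed to `rs.initiate()`; here it is kept as plain JavaScript so it can be run on its own.

```javascript
// Sketch of a three-member replica set configuration document.
// The set name and host names are hypothetical placeholders.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "db3.example.net:27017" }
  ]
};

// In the mongo shell, connected to one of the members, this document
// would be passed to rs.initiate(replicaSetConfig).
// Automated failover needs a majority of members to elect a new primary.
const majority = Math.floor(replicaSetConfig.members.length / 2) + 1;
console.log(majority);
```

With three members the set can lose one node and still elect a primary, which is why placing members in different data centers or availability zones helps availability.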
17. Sharding
Data Distribution across nodes
• Data location transparent to your code
• Data distribution is automatic
• Data re-distribution is automatic
• Aggregate system resources horizontally
• No code changes
18. Sharding - Range distribution
sh.shardCollection("test.tweets", {_id: 1}, false)
shard01 shard02 shard03
19. Sharding - Range distribution
shard01 shard02 shard03
a-i j-r s-z
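The range layout above can be sketched as a small lookup, assuming string keys routed by their first letter. This is a simplification for illustration only: the `chunks` table and the `routeToShard` helper are hypothetical, and a real cluster keeps chunk ranges in its config metadata and routes through mongos rather than in application code.

```javascript
// Hypothetical chunk ranges mirroring the slide: a-i, j-r, s-z.
const chunks = [
  { min: "a", max: "i", shard: "shard01" },
  { min: "j", max: "r", shard: "shard02" },
  { min: "s", max: "z", shard: "shard03" }
];

// Return the shard whose range covers the first letter of the key.
function routeToShard(key) {
  if (typeof key !== "string" || key.length === 0) {
    throw new Error("key must be a non-empty string");
  }
  const first = key[0].toLowerCase();
  const chunk = chunks.find(c => first >= c.min && first <= c.max);
  if (!chunk) throw new Error("no chunk covers key: " + key);
  return chunk.shard;
}

console.log(routeToShard("mongodb")); // shard02
```

Because the application never calls anything like `routeToShard` itself, the data location stays transparent to the code, as the earlier slide states.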
31. Two choices for consistency
•Eventual consistency
•Allow updates when a system has been partitioned
•Resolve conflicts later
•Example: CouchDB, Cassandra
•Immediate consistency
•Limit the application of updates to a single master node for a given slice of data
•Another node can take over after a failure is detected
•Avoids the possibility of conflicts
•Example: MongoDB
32. Durability
•For how long is my data available?
•When do I know that my data is safe?
•Where?
•MongoDB style
•Fire and Forget
•Get Last Error
•Journal Sync
•Replica Safe
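One way to read the four durability levels is as increasingly strict `getLastError` commands. The sketch below assumes that mapping. The level names passed to the hypothetical `durabilityCommand` helper are labels chosen for this example, and the `w: 2` and `wtimeout: 5000` values are illustrative. Later MongoDB releases express the same idea through write concern options instead of a separate `getLastError` call.

```javascript
// Build the getLastError command document for each durability level
// named on the slide. "fireAndForget" returns null because no
// acknowledgement is requested at all.
function durabilityCommand(level) {
  switch (level) {
    case "fireAndForget":
      return null;                                       // write and do not wait
    case "getLastError":
      return { getLastError: 1 };                        // wait for server acknowledgement
    case "journalSync":
      return { getLastError: 1, j: true };               // wait for the journal commit
    case "replicaSafe":
      return { getLastError: 1, w: 2, wtimeout: 5000 };  // wait for 2 members (illustrative)
    default:
      throw new Error("unknown durability level: " + level);
  }
}

console.log(JSON.stringify(durabilityCommand("journalSync")));
```

In the mongo shell of that era, the returned document would be run right after a write, for example with `db.runCommand(durabilityCommand("replicaSafe"))`.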
49. What MongoDB solves
Agility
• Applications store complex data that is easier to model as documents
• Schemaless DB enables faster development cycles
Flexibility
• Relaxed transactional semantics enable easy scale out
• Auto Sharding for scale down and scale up
Cost
• Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)
50. Challenges for Databases
✓ Build a database for scaleout
• Run on clusters of 100s of commodity machines
•… that enables agile development
•… and is usable for a broad variety of applications
51. Data Model
• Why JSON?
• Provides a simple, well understood encapsulation of data
• Maps simply to the object in your OO language
• Linking & Embedding to describe relationships
52. JSON
place1 = {
name : "10gen HQ",
address : "578 Broadway 7th Floor",
city : "New York",
zip : "10011",
tags : [ "business", "tech" ]
}
53. Schema Design
Relational Database
54. Schema Design
MongoDB embedding
linking
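The two options named here can be sketched with plain documents. This is a minimal example under assumptions: the comment data, the `post_id` field name, and the `commentsFor` helper are hypothetical and only serve to contrast the two shapes.

```javascript
// Embedding: comments live inside the post document.
const embeddedPost = {
  _id: 1,
  author: "Hergé",
  text: "Destination Moon",
  comments: [
    { author: "Kyle", text: "great book" }
  ]
};

// Linking: comments are separate documents that reference the post by _id.
const linkedPost = { _id: 1, author: "Hergé", text: "Destination Moon" };
const comments = [
  { _id: 100, post_id: 1, author: "Kyle", text: "great book" }
];

// With linking, the application resolves the reference in a second step.
function commentsFor(post, allComments) {
  return allComments.filter(c => c.post_id === post._id);
}

console.log(embeddedPost.comments.length, commentsFor(linkedPost, comments).length);
```

Embedding returns the post and its comments in one read, while linking keeps the documents smaller at the cost of a second lookup.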
55. Schemas in MongoDB
Design documents that simply map to your application
post = {author: "Hergé",
date: new Date(),
text: "Destination Moon",
tags: ["comic", "adventure"]}
> db.posts.save(post)
57. JSON & Scaleout
• Embedding removes need for
• Distributed Joins
• Two Phase commit
• Enables data to be distributed across many nodes without penalty