3. Why Scale Horizontally? Vertical scaling is expensive. Horizontal scaling is more incremental and works well in the cloud. You will always be able to scale wider than higher.
4. Distribution Models: ad-hoc partitioning; consistent hashing (Dynamo); range-based partitioning (BigTable/PNUTS).
5. Auto Sharding. Each piece of data is exclusively controlled by a single node (shard). Each node (shard) has exclusive control over a well-defined subset of the data. Database operations run against one shard when possible, multiple when necessary. As system load changes, the assignment of data to shards is rebalanced automatically.
8. Mongo Sharding. Mapping of documents to shards is controlled by the shard key. Can convert from a single master to a sharded cluster with zero downtime. Most functionality of a single Mongo master is preserved. Fully consistent.
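The shard key is declared with the sharding admin commands, run through a mongos. A minimal sketch, assuming a hypothetical database mydb and a users collection sharded on user_id (lowercase 1.x command names):

    // from a mongo shell connected to a mongos
    use admin
    db.runCommand({ enablesharding: "mydb" })
    db.runCommand({ shardcollection: "mydb.users", key: { user_id: 1 } })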
14. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 400 } } )
15. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 2012 } } )
16. Query III. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ) must query all shards.
17. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 2012 } } )
18. { a : …, b : …, c : … } secondary query using a secondary index: ensureIndex( { b : 1 } ); find( { b : 99 } ). This case is good when m is small (such as here), and also when the queries are large tasks.
19. Query IV. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } ) queries all shards, in sequence.
20. Query V. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ).sort( { lastname: 1 } ) queries all shards in parallel and performs a merge sort. A secondary index on { lastname: 1 } can be used.
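The secondary index assumed by that merge sort would be created the same way as on slide 18; a one-line sketch:

    db.users.ensureIndex({ lastname: 1 })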
21. Map/Reduce. Map/Reduce was designed for distributed systems. Map/Reduce jobs will run on all relevant shards in parallel, subject to the query spec.
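A sketch of a sharded map/reduce job, reusing the users collection and fields from the query slides; the output collection name hometown_counts and the query range are made up for illustration:

    var map = function () { emit(this.hometown, 1); };
    var reduce = function (key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) total += values[i];
        return total;
    };
    // the query spec limits the job to the shards owning user_id < 1000
    db.users.mapReduce(map, reduce, { query: { user_id: { $lt: 1000 } }, out: "hometown_counts" })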
23. Writes. Inserts are routed to the appropriate shard (the inserted document must contain the shard key). Removes are sent to all matching shards. Updates are sent to all matching shards. Writes run in parallel if asynchronous, sequentially if synchronous. Updates cannot modify the shard key.
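How those rules look in the shell, again using the users collection with shard key { user_id: 1 } (the document values are made up):

    // the insert contains the shard key, so it is routed to exactly one shard
    db.users.insert({ user_id: 42, hometown: "Seattle" })
    // an update matched by shard key goes to one shard; it may not change user_id itself
    db.users.update({ user_id: 42 }, { $set: { hometown: "Portland" } })
    // a remove without the shard key is sent to all potentially matching shards
    db.users.remove({ hometown: "Seattle" })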
24. Choice of Shard Key. Pick a key per document that is generally a component of your queries. Often you want a unique key; if not, consider the granularity of the key and potentially add fields. The shard key is generally composed of the fields you would put in an index if operating on a single machine. But in a sharded configuration, the shard key will be indexed automatically.
25. Shard Key Examples (again): { user_id: 1 }, { state: 1 }, { lastname: 1, firstname: 1 }, { tag: 1, timestamp: -1 }, { _id: 1 } (this is the default; be careful when using ObjectId).
26. Bit.ly Example: ~50M users; ~10K concurrently using the server at peak; ~12.5B shortens per month (1K/sec peak); history of all shortens per user stored in Mongo.
29. Balancing. The whole point of autosharding is that Mongo balances the shards for you. The balancing algorithm is complicated; the basic idea is that the key space is divided into ranges, for example -inf <= user_id < 1000 and 1000 <= user_id < +inf, which can be assigned to different shards.
31. Balancing. The current balancing metric is data size. Future possibilities: CPU, disk utilization. There is some flexibility built into the partitioning algorithm, so we aren't thrashing data back and forth between shards. Only one 'chunk' of data moves at a time, a conservative choice that limits the total overhead of balancing.
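To see what the balancer has done, the chunk-to-shard assignment can be inspected from a mongos with the shell helper below:

    // prints databases, shards, and which chunks (key ranges) live on which shard
    db.printShardingStatus()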
33. Shard. Regular mongod process(es), storing all documents for a given key range. Handles all reads/writes for this key range as well. Each shard indexes the data contained within it. Can be a single mongod, a master/slave pair, or a replica set.
34. Shard - Chunk. In a sharded cluster, shards are partitioned by shard key. Within a shard, chunks are partitioned by shard key. A chunk is the smallest unit of data for balancing; data moves between shards at chunk granularity. The upper limit on chunk size is 200MB. There is a special case if the shard key range is open-ended.
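Chunks are normally created by automatic splits, but a chunk boundary can also be requested explicitly. A hedged sketch using the split admin command and a hypothetical mydb.users namespace:

    // from the admin database on a mongos: split the chunk containing user_id 1000 at that value
    use admin
    db.runCommand({ split: "mydb.users", middle: { user_id: 1000 } })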
35. Shard - Replica Sets. Replica sets provide data redundancy and automatic failover. In the case of sharding, this means redundancy and failover per shard. All typical replica set operations are possible, for example a write with w=N. Replica sets were specifically designed to work as shards.
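A sketch of both points, registering a replica set as a shard and then requiring a write to reach N members; the set name rs_a and the hostnames are made up:

    // from the admin database on a mongos: add a replica set shard by set name and seed list
    db.runCommand({ addshard: "rs_a/host1:10000,host2:10000" })
    // from the database that received a write: wait until it has replicated to 2 members
    db.runCommand({ getlasterror: 1, w: 2 })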
37. Mongos. The sharding router: distributes reads/writes to the sharded cluster. The client interface is the same as a mongod. You can have as many mongos instances as you want, and they can run on the app server machines to avoid extra network traffic. Mongos also initiates balancing operations. It keeps per-chunk metadata in RAM: about 1MB of RAM per 1TB of user data in the cluster.
39. Config Server. There are 3 config servers. Changes are made with a two-phase commit. If any of the 3 servers goes down, config data becomes read-only. The sharded cluster will remain online as long as 1 of the config servers is running. Config metadata size estimate: 1MB of metadata per 1TB of data in the cluster.
42. Limitations. Unique index constraints not expressed by the shard key are not enforced across shards. Updates to a document's shard key aren't allowed (you can remove and reinsert, but it's not atomic). The balancing metric is limited to the number of chunks right now, but this will be enhanced. Right now only one chunk moves in the cluster at a time; this means balancing can be slow, but it's a conservative choice we've made to keep the overhead of balancing low for now. There is a 20 petabyte size limit.
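The remove-and-reinsert workaround for changing a shard key value might look like the sketch below (the values are made up, and the document is briefly absent between the two steps):

    var doc = db.users.findOne({ user_id: 42 })
    db.users.remove({ user_id: 42 })
    doc.user_id = 43        // the new shard key value
    db.users.insert(doc)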
43. Start Up Config Servers $ mkdir -p ~/dbs/config $ ./mongod --dbpath ~/dbs/config --port 20000 Repeat as necessary
44. Start Up Mongos $ ./mongos --port 30000 --configdb localhost:20000 No dbpath Repeat as necessary
45. Start Up Shards $ mkdir -p ~/dbs/shard1 $ ./mongod --dbpath ~/dbs/shard1 --port 10000 Repeat as necessary
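Once the config servers, mongos, and shard processes are running, the shards still have to be registered with the cluster through a mongos. A minimal sketch using the ports from the slides above:

    $ ./mongo localhost:30000/admin
    > db.runCommand({ addshard: "localhost:10000" })
    Repeat for each shard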
53. Give it a Try! Download from mongodb.org. Sharding is production-ready in 1.6, which is scheduled for release next week. For now, use 1.5 (unstable) to try sharding.