3. Why Scale Horizontally? Vertical scaling is expensive. Horizontal scaling is more incremental and works well in the cloud. You will always be able to scale wider than higher.
4. Distribution Models: ad-hoc partitioning; consistent hashing (Dynamo); range-based partitioning (BigTable/PNUTS).
5. Auto Sharding. Each piece of data is exclusively controlled by a single node (shard). Each node (shard) has exclusive control over a well-defined subset of the data. Database operations run against one shard when possible, multiple when necessary. As system load changes, the assignment of data to shards is rebalanced automatically.
8. Mongo Sharding. Mapping of documents to shards is controlled by the shard key. Can convert from a single master to a sharded cluster with zero downtime. Most functionality of a single Mongo master is preserved. Fully consistent.
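The shard key is declared with the sharding admin commands, run through a mongos. A minimal sketch, assuming a hypothetical database mydb and a users collection sharded on user_id (lowercase 1.x command names):

    // from a mongo shell connected to a mongos
    use admin
    db.runCommand({ enablesharding: "mydb" })
    db.runCommand({ shardcollection: "mydb.users", key: { user_id: 1 } })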
14. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 400 } } )
15. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 2012 } } )
16. Query III. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ) must query all shards.
17. { a : …, b : …, c : … } where a is declared as the shard key: find( { a : { $gt : 333, $lt : 2012 } } )
18. { a : …, b : …, c : … } secondary query using a secondary index: ensureIndex( { b : 1 } ); find( { b : 99 } ). This case is good when m is small (such as here), and also when the queries are large tasks.
19. Query IV. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } ) queries all shards, in sequence.
20. Query V. Shard key { user_id: 1 }. db.users.find( { hometown: 'Seattle' } ).sort( { lastname: 1 } ) queries all shards in parallel and performs a merge sort. A secondary index on { lastname: 1 } can be used.
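The secondary index assumed by that merge sort would be created the same way as on slide 18; a one-line sketch:

    db.users.ensureIndex({ lastname: 1 })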
21. Map/Reduce. Map/Reduce was designed for distributed systems. Map/Reduce jobs will run on all relevant shards in parallel, subject to the query spec.
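A sketch of a sharded map/reduce job, reusing the users collection and fields from the query slides; the output collection name hometown_counts and the query range are made up for illustration:

    var map = function () { emit(this.hometown, 1); };
    var reduce = function (key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) total += values[i];
        return total;
    };
    // the query spec limits the job to the shards owning user_id < 1000
    db.users.mapReduce(map, reduce, { query: { user_id: { $lt: 1000 } }, out: "hometown_counts" })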
23. Writes. Inserts are routed to the appropriate shard (the inserted document must contain the shard key). Removes are sent to all matching shards. Updates are sent to all matching shards. Writes run in parallel if asynchronous, sequentially if synchronous. Updates cannot modify the shard key.
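How those rules look in the shell, again using the users collection with shard key { user_id: 1 } (the document values are made up):

    // the insert contains the shard key, so it is routed to exactly one shard
    db.users.insert({ user_id: 42, hometown: "Seattle" })
    // an update matched by shard key goes to one shard; it may not change user_id itself
    db.users.update({ user_id: 42 }, { $set: { hometown: "Portland" } })
    // a remove without the shard key is sent to all potentially matching shards
    db.users.remove({ hometown: "Seattle" })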
24. Choice of Shard Key. Pick a key per document that is generally a component of your queries. Often you want a unique key; if not, consider the granularity of the key and potentially add fields. The shard key is generally composed of the fields you would put in an index if operating on a single machine. But in a sharded configuration, the shard key will be indexed automatically.
25. Shard Key Examples (again): { user_id: 1 }, { state: 1 }, { lastname: 1, firstname: 1 }, { tag: 1, timestamp: -1 }, { _id: 1 } (this is the default; be careful when using ObjectId).
26. Bit.ly Example: ~50M users; ~10K concurrently using the server at peak; ~12.5B shortens per month (1K/sec peak); history of all shortens per user stored in Mongo.
29. Balancing. The whole point of autosharding is that Mongo balances the shards for you. The balancing algorithm is complicated; the basic idea is that the key space is divided into ranges, for example -inf <= user_id < 1000 and 1000 <= user_id < +inf, which can be assigned to different shards.
31. Balancing. The current balancing metric is data size. Future possibilities: CPU, disk utilization. There is some flexibility built into the partitioning algorithm, so we aren't thrashing data back and forth between shards. Only one 'chunk' of data moves at a time, a conservative choice that limits the total overhead of balancing.
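To see what the balancer has done, the chunk-to-shard assignment can be inspected from a mongos with the shell helper below:

    // prints databases, shards, and which chunks (key ranges) live on which shard
    db.printShardingStatus()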
33. Shard. Regular mongod process(es), storing all documents for a given key range. Handles all reads/writes for this key range as well. Each shard indexes the data contained within it. Can be a single mongod, a master/slave pair, or a replica set.
34. Shard - Chunk. In a sharded cluster, shards are partitioned by shard key. Within a shard, chunks are partitioned by shard key. A chunk is the smallest unit of data for balancing; data moves between shards at chunk granularity. The upper limit on chunk size is 200MB. There is a special case if the shard key range is open-ended.
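Chunks are normally created by automatic splits, but a chunk boundary can also be requested explicitly. A hedged sketch using the split admin command and a hypothetical mydb.users namespace:

    // from the admin database on a mongos: split the chunk containing user_id 1000 at that value
    use admin
    db.runCommand({ split: "mydb.users", middle: { user_id: 1000 } })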
35. Shard - Replica Sets. Replica sets provide data redundancy and automatic failover. In the case of sharding, this means redundancy and failover per shard. All typical replica set operations are possible, for example a write with w=N. Replica sets were specifically designed to work as shards.
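A sketch of both points, registering a replica set as a shard and then requiring a write to reach N members; the set name rs_a and the hostnames are made up:

    // from the admin database on a mongos: add a replica set shard by set name and seed list
    db.runCommand({ addshard: "rs_a/host1:10000,host2:10000" })
    // from the database that received a write: wait until it has replicated to 2 members
    db.runCommand({ getlasterror: 1, w: 2 })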
37. Mongos. The sharding router: distributes reads/writes to the sharded cluster. The client interface is the same as a mongod. You can have as many mongos instances as you want, and they can run on the app server machines to avoid extra network traffic. Mongos also initiates balancing operations. It keeps per-chunk metadata in RAM: about 1MB of RAM per 1TB of user data in the cluster.
39. Config Server. There are 3 config servers. Changes are made with a two-phase commit. If any of the 3 servers goes down, config data becomes read-only. The sharded cluster will remain online as long as 1 of the config servers is running. Config metadata size estimate: 1MB of metadata per 1TB of data in the cluster.
42. Limitations. Unique index constraints not expressed by the shard key are not enforced across shards. Updates to a document's shard key aren't allowed (you can remove and reinsert, but it's not atomic). The balancing metric is limited to the number of chunks right now, but this will be enhanced. Right now only one chunk moves in the cluster at a time; this means balancing can be slow, but it's a conservative choice we've made to keep the overhead of balancing low for now. There is a 20 petabyte size limit.
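The remove-and-reinsert workaround for changing a shard key value might look like the sketch below (the values are made up, and the document is briefly absent between the two steps):

    var doc = db.users.findOne({ user_id: 42 })
    db.users.remove({ user_id: 42 })
    doc.user_id = 43        // the new shard key value
    db.users.insert(doc)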
43. Start Up Config Servers $ mkdir -p ~/dbs/config $ ./mongod --dbpath ~/dbs/config --port 20000 Repeat as necessary
44. Start Up Mongos $ ./mongos --port 30000 --configdb localhost:20000 No dbpath Repeat as necessary
45. Start Up Shards $ mkdir -p ~/dbs/shard1 $ ./mongod --dbpath ~/dbs/shard1 --port 10000 Repeat as necessary
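Once the config servers, mongos, and shard processes are running, the shards still have to be registered with the cluster through a mongos. A minimal sketch using the ports from the slides above:

    $ ./mongo localhost:30000/admin
    > db.runCommand({ addshard: "localhost:10000" })
    Repeat for each shard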
53. Give it a Try! Download from mongodb.org. Sharding is production-ready in 1.6, which is scheduled for release next week. For now, use 1.5 (unstable) to try sharding.