The document describes sharding architectures for MongoDB that offer higher availability and better resource utilization than traditional MongoDB clusters. It explains how TokuMX, a fork of MongoDB, implements read-free replication, letting secondaries apply writes without performing reads. It also shows how TokuMX can implement Dynamo-style sharding to provide linear write scaling along with replicated data for high read throughput and reliability. Future work is needed to improve chunk balancing when machines are added or removed.
6. General MongoDB Cluster
• Sx write throughput.
• Rx read throughput.
• R/2 nodes can go down without losing availability.
• Data can survive destruction of R-1 nodes.
• S×R hardware & maintenance cost.
7. TokuMX: MongoDB with Fractal Trees
• MongoDB fork.
• Compression, performance, transactions.
• Details about Fractal Trees after lunch.
8. TokuMX: MongoDB with Fractal Trees
• Read-free Replication
• Fast Updates
• Optimized Sharding Migrations
• Ark Consensus for Replication Failover
• Partitioned Collections
• Clustering Indexes & Primary Keys
• tokutek.com/tokumx
9. Fractal Tree Performance Basics
Writes are cheap:
• O(1/B) I/Os per op.
• ≈10k/s
Reads are expensive:
• Ω(1) I/O per op.
• ≈100/s
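The asymmetry above can be illustrated with a toy write-buffer. This is a minimal sketch, not TokuMX internals: the class and method names (`BufferedStore`, `flush`) are invented for illustration. Batching B writes into one block flush gives each write an amortized cost of 1/B I/Os, while a point read still costs at least one I/O.

```python
# Sketch of why buffered writes cost O(1/B) I/Os per op while point
# reads cost Omega(1). Illustrative only, not TokuMX code.

class BufferedStore:
    def __init__(self, batch_size):
        self.batch_size = batch_size  # B: messages that fit in one block
        self.buffer = []              # in-memory write buffer
        self.disk = {}                # simulated on-disk key/value store
        self.ios = 0                  # count of simulated disk I/Os

    def write(self, key, value):
        # Writes are buffered; one I/O flushes B of them together,
        # so the amortized cost per write is 1/B I/Os.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.ios += 1  # one block write covers the whole batch
        for k, v in self.buffer:
            self.disk[k] = v
        self.buffer.clear()

    def read(self, key):
        # A read must consult the store (and buffer): at least 1 I/O.
        self.ios += 1
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        return self.disk.get(key)

store = BufferedStore(batch_size=1000)
for i in range(10_000):
    store.write(i, i * i)
print(store.ios)  # 10: ten flushes cover 10,000 writes, 1/1000 I/O each
```

With B = 1000, ten thousand writes cost ten simulated I/Os, while ten thousand reads would cost ten thousand; that is the ≈10k/s vs ≈100/s gap on the slide.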
11. Read-free Replication
Updates are reads + writes.
Secondaries can trust the primary and only do writes.
Looking at I/O utilization, secondaries are very cheap compared to primaries.
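The idea can be sketched as follows. This is an illustrative model, not TokuMX's actual oplog format: the primary performs the read-modify-write once and replicates the resulting value, so the secondary can apply it as a blind write with no reads at all.

```python
# Sketch of read-free replication (class names are illustrative).
# The primary does the read + write; the secondary trusts the
# primary's result and only does the write.

class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []

    def increment(self, key, delta):
        # The primary does the read-modify-write once...
        new_value = self.data.get(key, 0) + delta
        self.data[key] = new_value
        # ...and logs the resulting value, not the operation.
        self.oplog.append((key, new_value))

class ReadFreeSecondary:
    def __init__(self):
        self.data = {}
        self.reads = 0  # disk reads performed while replicating; stays 0

    def apply(self, oplog):
        for key, value in oplog:
            self.data[key] = value  # blind write: no read required

primary = Primary()
for _ in range(3):
    primary.increment("counter", 10)

secondary = ReadFreeSecondary()
secondary.apply(primary.oplog)
print(secondary.data["counter"], secondary.reads)  # 30 0
```

Because applying the oplog needs no reads, and writes are the cheap operation in a fractal tree, a secondary consumes only a small fraction of a primary's I/O budget.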
12. A Traditional TokuMX Cluster
• 9 machines, only 3x throughput benefit.
• Secondaries are under-utilized.
13. A TokuMX Cluster With Read-free Replication
• 3x write throughput.
• 3x read throughput.
• (maybe separately)
14. A TokuMX Cluster With Read-free Replication
• 1 node can go down without losing availability.
15. A TokuMX Cluster With Read-free Replication
• Data can survive destruction of 2 nodes.
16. A TokuMX Cluster With Read-free Replication
• Only 3x hardware cost, down from 9x.
17. Dynamo Architecture
• Developed at Amazon.
• Used by Cassandra, Riak, Voldemort.
• Many components; I will focus on data partitioning.
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
18. Dynamo Architecture
• Servers are equal peers, not separate primaries and secondaries.
• Store overlapping subsets of data (MongoDB shards store disjoint subsets).
• Data partitioning determined by consistent hashing.
19. Dynamo Partitioning
• N servers in a ring.
• hash(K) is a location around the ring.
• Store data for K on the next R servers on the ring.
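The placement rule can be sketched with a minimal consistent-hashing ring. This is a generic illustration of the Dynamo scheme, not TokuMX or Dynamo source: the helper names (`ring_hash`, `replicas_for`) and the use of MD5 as the ring hash are assumptions for the example.

```python
# Minimal consistent-hashing sketch: N servers sit at hashed positions
# on a ring; key K is stored on the next R servers clockwise from hash(K).
import hashlib
from bisect import bisect_right

def ring_hash(s):
    # Any uniform hash works; MD5 is just a convenient choice here.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(servers):
    # Ring = server positions sorted by hash value.
    return sorted((ring_hash(s), s) for s in servers)

def replicas_for(ring, key, r):
    # Walk clockwise from hash(key), collecting the next r distinct servers.
    start = bisect_right(ring, (ring_hash(key), ""))
    result = []
    for i in range(len(ring)):
        server = ring[(start + i) % len(ring)][1]
        if server not in result:
            result.append(server)
        if len(result) == r:
            break
    return result

ring = build_ring(["node%d" % i for i in range(6)])
owners = replicas_for(ring, "user:42", r=3)
print(owners)  # 3 distinct servers, deterministic for this key
```

The key property is that placement is a pure function of the key and the server set: any node can compute where data lives without consulting a central directory.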
20. Dynamo Partitioning
• All nodes accept writes: ~linear write scaling.
• Data replicated R times: Rx read performance/reliability.
21. Dynamo-style Sharding in TokuMX
• Each node is primary for some chunks, secondary for others.
• Nodes store overlapping subsets of the data set.
22. Dynamo-style Sharding in TokuMX
• S primaries in the ring: Sx write throughput.
• R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
23. Dynamo-style Sharding in TokuMX
• Adding a node:
– Move one secondary from each of the next 2 nodes to the new node.
– Initialize a new replica set on the new node and the next 2 nodes.
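A useful property of ring placement is that a new node only takes over replicas from its immediate ring neighbors, so most chunks stay put. The sketch below is a generic consistent-hashing model, not the TokuMX balancer; key names and the MD5 ring hash are assumptions for the example.

```python
# Sketch: adding one node to a consistent-hash ring moves only the
# chunks whose R-server window now includes the new node's position.
import hashlib
from bisect import bisect_right

def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def replicas_for(servers, key, r=3):
    # Next r servers clockwise from hash(key) on the sorted ring.
    ring = sorted((ring_hash(s), s) for s in servers)
    start = bisect_right(ring, (ring_hash(key), ""))
    return [ring[(start + i) % len(ring)][1] for i in range(r)]

keys = ["chunk%d" % i for i in range(1000)]
old_nodes = ["node%d" % i for i in range(6)]
new_nodes = old_nodes + ["node6"]

moved = sum(replicas_for(old_nodes, k) != replicas_for(new_nodes, k)
            for k in keys)
print(moved)  # a fraction of the chunks change placement, not all
```

This locality is exactly the rough edge in the slide above and in Future Work: all of the moved chunks come from the new node's two neighbors, so those machines absorb the entire migration load.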
24. Future Work
Chunk balancer is not sophisticated:
• Adding/removing machines is rough and overloads the machine’s neighbors.
• Can we use ideas from Cassandra & Riak to improve this?
MongoDB’s architecture requires managing multiple processes on each machine.
• We can do better with good tools. Talk to me if you want to write them.
25. Thanks!
Come to my talk after lunch for details about
Fractal Trees.
leif@tokutek.com
@leifwalsh
tokutek.com/tokumx
slidesha.re/13pxgH8