Internals of replication in MongoDB. The talk covers replication source selection, the replication process, elections (and their rules), and oplog transformation.
This presentation was given at the MongoDB San Francisco conference.
8. Replication Process
● Record oplog entry on write
● Idempotent entries
● Pulled by replicas
1. Read over network
2. Buffer locally
3. Apply in batch
4. Repeat
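The loop above can be sketched in a few lines. This is a minimal illustration, not MongoDB's implementation: the entry format (`op`, `key`, `value`) and the function names are hypothetical. The point it shows is idempotency: entries store absolute results, so replaying a batch leaves the data unchanged.

```python
from collections import deque

def apply_entry(db, entry):
    """Apply one oplog-style entry; re-applying it leaves db unchanged."""
    if entry["op"] == "set":        # store an absolute value, never a delta
        db[entry["key"]] = entry["value"]
    elif entry["op"] == "delete":   # deleting a missing key is a no-op
        db.pop(entry["key"], None)

def replicate(source_oplog, db, position=0, batch_size=2):
    """Pull entries after `position`, buffer them, apply them in batches."""
    buffer = deque(source_oplog[position:])          # steps 1-2: read, buffer
    while buffer:                                    # step 4: repeat
        size = min(batch_size, len(buffer))
        batch = [buffer.popleft() for _ in range(size)]
        for entry in batch:                          # step 3: apply in batch
            apply_entry(db, entry)
        position += len(batch)
    return position                                  # new oplog position
```

Calling `replicate` twice over the same entries gives the same final state, which is what makes it safe to re-apply a batch after a restart.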
9. Read + Apply Decoupled
● Background oplog reader thread
● Pool of oplog applier threads (by collection)
[Diagram: Repl Source, Network, Buffer, Applier Thread Pool (16), DB1, DB2, DB3, DB4, Local Oplog, Batch Complete]
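A rough sketch of the decoupled design, using only the Python standard library. Everything here is an assumption made for illustration: the entry fields (`ns`, `key`, `value`), the `None` sentinel, and reading the "16" in the diagram as the applier pool size. One reader thread fills a buffer; each batch is split by collection so that one collection's entries stay in order while different collections apply in parallel.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 16  # assumed: the diagram's "16" read as the applier pool size

def reader(source, buffer):
    """Background reader: copy entries from the source into the buffer."""
    for entry in source:
        buffer.put(entry)
    buffer.put(None)  # hypothetical sentinel: source exhausted

def apply_group(collection, entries):
    """Apply one collection's entries in their original order."""
    for e in entries:
        collection[e["key"]] = e["value"]

def run(source, batch_size=4):
    db, buffer = {}, queue.Queue()
    threading.Thread(target=reader, args=(source, buffer), daemon=True).start()
    done = False
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        while not done:
            batch = []
            while len(batch) < batch_size:
                entry = buffer.get()
                if entry is None:
                    done = True
                    break
                batch.append(entry)
            groups = {}
            for e in batch:                 # partition the batch by collection
                groups.setdefault(e["ns"], []).append(e)
            for ns in groups:               # create collections up front
                db.setdefault(ns, {})
            futures = [pool.submit(apply_group, db[ns], g)
                       for ns, g in groups.items()]
            for f in futures:               # wait here: batch complete
                f.result()
    return db
```

Waiting on every future before starting the next batch is the "batch complete" barrier; without it, a later batch could overtake an earlier one on the same collection.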
11. Good Replication States
● Initial Sync
○ Record oplog start position
○ Clone/copy all dbs
○ Set minvalid, apply oplog since start
○ Build indexes
● Replication Batch: MinValid
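The initial sync steps can be walked through on toy data. This sketch makes several assumptions: the source is a plain dict with `data` and `oplog`, writes that arrive during the clone are simulated by the hypothetical `writes_during_clone` argument, and minvalid is read as the oplog position the copy must reach before it is consistent (the slide does not define it).

```python
import copy

def initial_sync(source, writes_during_clone=()):
    start = len(source["oplog"])              # record oplog start position
    clone = copy.deepcopy(source["data"])     # clone/copy all dbs
    # Simulated writes that reach the source after the copy was taken.
    for key, value in writes_during_clone:
        source["data"][key] = value
        source["oplog"].append({"key": key, "value": value})
    min_valid = len(source["oplog"])          # set minvalid (assumed meaning)
    for entry in source["oplog"][start:min_valid]:
        clone[entry["key"]] = entry["value"]  # apply oplog since start
    indexes = sorted(clone)                   # stand-in for building indexes
    return clone, min_valid, indexes
```

The copy alone misses the writes made while cloning; replaying the oplog from the recorded start position up to minvalid is what brings it to a consistent state.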
15. Election Nomination
Disqualifications
A replica will nominate itself unless:
● Priority:0 or arbiter
● Not freshest
● Just stepped down (in unelectable state)
● Would be vetoed by anyone because
○ There is a Primary already
○ They don't have us in their config
○ Higher priority member out there
● Higher config version out there
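The disqualification list reads naturally as a chain of checks. The sketch below is only an illustration of the slide's rules: the `Member` fields and the reason strings are hypothetical, and real servers exchange this state over heartbeats rather than holding it in one process.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    priority: float = 1.0
    arbiter: bool = False
    optime: int = 0                      # assumed: larger means fresher
    config_version: int = 1
    is_primary: bool = False
    stepped_down_recently: bool = False
    known_members: frozenset = frozenset()  # names in this member's config

def disqualification(me, others):
    """Return the first reason `me` must not nominate itself, or None."""
    if me.priority == 0 or me.arbiter:
        return "priority 0 or arbiter"
    if any(o.optime > me.optime for o in others):
        return "not freshest"
    if me.stepped_down_recently:
        return "just stepped down"
    # Veto checks: a single member is enough to block the nomination.
    if any(o.is_primary for o in others):
        return "a primary already exists"
    if any(me.name not in o.known_members for o in others):
        return "missing from a member's config"
    if any(o.priority > me.priority for o in others):
        return "higher priority member exists"
    if any(o.config_version > me.config_version for o in others):
        return "higher config version exists"
    return None
```

A return value of `None` means the replica nominates itself.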
16. The Election
Nomination:
● If it looks like a tie, sleep a random time (unless first node)
Voting:
● If all goes well, only one nominee
● All voting members vote for one nominee
● Majority of votes wins
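A small sketch of the two phases. The function names, the `max_sleep` bound, and the vote format are assumptions for illustration. The majority is counted against all voting members, not only those that responded, so a nominee with too few reachable voters cannot win.

```python
import random

def tie_break_delay(looks_like_tie, first_node, max_sleep=1.0, rng=random):
    """Seconds to sleep before nominating; the upper bound is assumed."""
    if looks_like_tie and not first_node:
        return rng.uniform(0, max_sleep)
    return 0.0

def elect(votes, voting_members):
    """votes maps voter -> nominee; return the winner or None."""
    tally = {}
    for voter, nominee in votes.items():
        if voter in voting_members:          # ignore non-voting members
            tally[nominee] = tally.get(nominee, 0) + 1
    needed = len(voting_members) // 2 + 1    # strict majority of the set
    for nominee, count in tally.items():
        if count >= needed:
            return nominee
    return None
```

The random sleep makes it likely that one would-be nominee acts first, which is how the "only one nominee" case is reached in practice.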
22. Replication Source Selection
● Select closest source
○ Limit to members that are neither hidden nor slave delayed
○ If nothing qualifies, try again including hidden/slave-delayed members
○ Select node with fastest "ping" time
○ Must be fresher
● Choose source when
○ Starting
○ Any error with existing source (network, query)
○ Any member is 30s ahead of current source
● Manual override
○ replSetSyncFrom -- good until we choose again
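The selection rules can be sketched as two passes over the member list. The member fields (`optime`, `ping_ms`, `hidden`, `slave_delay`) are hypothetical names, and optimes are assumed to be in seconds so that the 30s rule is a plain subtraction.

```python
def choose_sync_source(my_optime, members):
    """Pick the closest member that is fresher than we are, or None."""
    def closest(candidates):
        fresher = [m for m in candidates if m["optime"] > my_optime]
        return min(fresher, key=lambda m: m["ping_ms"], default=None)

    # First pass: skip hidden and slave-delayed members.
    preferred = [m for m in members
                 if not m["hidden"] and m["slave_delay"] == 0]
    # Second pass: if nothing qualified, consider every member.
    return closest(preferred) or closest(members)

def should_rechoose(source, members, had_error=False, max_lag_s=30):
    """True when starting, after an error, or when the source has fallen behind."""
    if source is None or had_error:
        return True
    return any(m["optime"] - source["optime"] >= max_lag_s for m in members)
```

A manual override would simply replace the chosen source until `should_rechoose` next returns true.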
24. Goal: Dynamic Reads
Controls for consistency
● Default to Primary
● Non-primary allowed
● Based on
○ Locality (ping/tags)
○ Tags
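[Diagram: Client reading from a replica set with one primary (P) and two secondaries (S); member labels shown: "Tags: A, B" and "Tags: B, C"]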
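How a client might pick a read target under these controls, as a sketch only. It does not use a real driver API: `pick_read_target`, the member fields, and the example members are hypothetical, tags are treated as simple labels as on the slide, and which member carries which tags is made up for the example.

```python
def pick_read_target(members, allow_secondary=False, tags=None):
    """Default to the primary; otherwise pick the nearest member with the tags."""
    if not allow_secondary:
        return next((m for m in members if m["role"] == "P"), None)
    wanted = set(tags or ())
    eligible = [m for m in members if wanted <= set(m["tags"])]
    return min(eligible, key=lambda m: m["ping_ms"], default=None)

# Hypothetical replica set: one primary and two tagged secondaries.
members = [
    {"name": "p",  "role": "P", "tags": [],         "ping_ms": 5},
    {"name": "s1", "role": "S", "tags": ["A", "B"], "ping_ms": 20},
    {"name": "s2", "role": "S", "tags": ["B", "C"], "ping_ms": 10},
]
```

With these members, a read tagged `B` goes to `s2` because it is the nearer of the two matching secondaries, while an untagged default read goes to the primary.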