2. Background
• Prior to Hadoop 2.0.0, the NameNode was a single
point of failure (SPOF) in an HDFS cluster.
3. Approach and Terminology
• Initial goal is Active-Standby
• Terminology
– Active NN: Serves all read/write operations from clients
– Standby NN: Waits, becomes active when Active dies or is
unhealthy
– Hot Standby: Standby has almost all of the Active’s state and can
take over immediately
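To illustrate why a hot standby can take over quickly, here is a minimal toy sketch in Python. It is not HDFS code: SharedEditLog, ToyNameNode, and all method names are hypothetical, and the model assumes the Standby stays current by replaying edits that the Active writes to shared storage.

```python
# Toy model only: not HDFS code. All class and method names are hypothetical.
class SharedEditLog:
    """Stands in for the shared edits directory that both NameNodes can reach."""

    def __init__(self):
        self.entries = []

    def append(self, op):
        self.entries.append(op)


class ToyNameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = "standby"
        self.namespace = set()  # paths this node currently knows about
        self.applied = 0        # number of log entries already replayed

    def mkdir(self, path):
        # Only the Active NN serves client writes.
        if self.state != "active":
            raise RuntimeError(f"{self.name} is not active")
        self.log.append(("mkdir", path))
        self.namespace.add(path)
        self.applied = len(self.log.entries)

    def catch_up(self):
        # Replay only the log entries this node has not applied yet.
        for op, path in self.log.entries[self.applied:]:
            if op == "mkdir":
                self.namespace.add(path)
        self.applied = len(self.log.entries)

    def become_active(self):
        # A hot standby has little left to replay, so this step is short.
        self.catch_up()
        self.state = "active"


log = SharedEditLog()
nn1, nn2 = ToyNameNode("nn1", log), ToyNameNode("nn2", log)
nn1.become_active()
nn1.mkdir("/data")
nn2.catch_up()        # the Standby keeps following the shared log
nn1.mkdir("/tmp")     # written just before nn1 fails
nn1.state = "dead"
nn2.become_active()   # only one entry remains to be replayed
```

In this sketch the Standby replays a single outstanding entry at failover time; a cold standby would have to replay the whole log first.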
5. Hardware resources
• NameNode machines
– Should have equivalent hardware to each other
• Shared storage
– Both NameNodes need read/write access
– Only a single shared directory is supported
• High-quality dedicated NAS appliance is recommended
• Secondary NameNode is not necessary
6. Automatic Failover
• Introduce two new components
– ZooKeeper
– ZKFailoverController (abbreviated as ZKFC)
• It’s a ZooKeeper client
• Each of the machines which runs a NameNode also runs a ZKFC
• Responsible for:
– Health monitoring
– ZooKeeper session management
– ZooKeeper-based election
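The three ZKFC responsibilities can be sketched as a toy Python model. This is an illustration under stated assumptions, not the real ZooKeeper or Hadoop API: ToyZooKeeper, ToyZKFC, and tick are hypothetical names, the lock stands in for a ZooKeeper node that vanishes when its owner's session ends, and fencing of the old Active is left out.

```python
# Toy model only: ToyZooKeeper and ToyZKFC are hypothetical names, not the
# real ZooKeeper or Hadoop APIs. Fencing is deliberately omitted.
class ToyZooKeeper:
    """Holds a single lock that is released when its owner's session ends."""

    def __init__(self):
        self.lock_owner = None

    def try_acquire(self, session_id):
        # The first session to ask gets the lock; the owner keeps it.
        if self.lock_owner is None:
            self.lock_owner = session_id
        return self.lock_owner == session_id

    def close_session(self, session_id):
        if self.lock_owner == session_id:
            self.lock_owner = None


class ToyZKFC:
    def __init__(self, nn_name, zk):
        self.nn_name = nn_name   # also used as the session id in this sketch
        self.zk = zk
        self.nn_healthy = True   # result of the last local health check
        self.nn_state = "standby"

    def tick(self):
        # Health monitoring: an unhealthy local NN must not stay Active.
        if not self.nn_healthy:
            # Session management: giving up the session releases the lock.
            self.zk.close_session(self.nn_name)
            self.nn_state = "standby"
            return
        # Election: whoever holds the lock makes its local NN Active.
        if self.zk.try_acquire(self.nn_name):
            self.nn_state = "active"
        else:
            self.nn_state = "standby"


zk = ToyZooKeeper()
a, b = ToyZKFC("nn1", zk), ToyZKFC("nn2", zk)
a.tick(); b.tick()
before = (a.nn_state, b.nn_state)   # nn1 won the election

a.nn_healthy = False                # nn1 fails its health check
a.tick(); b.tick()
after = (a.nn_state, b.nn_state)    # nn2 takes over
```

Running one ZKFC next to each NameNode keeps the health check local, while ZooKeeper decides which of the two healthy candidates becomes Active.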
7. Automatic Failover
[Diagram. Components: three ZooKeeper (ZK) servers; two ZKFCs, each with a ZK session; an Active NN and a Hot Standby NN with a shared dir on NFS; three DataNodes (DN). Labeled links: session, Heartbeat, Block Reports.]
8. Appendix
• High Availability Framework for HDFS NN
– HDFS-1623
• HDFS portion of ZK-based FailoverController
– HDFS-2185