14. HDFS Single NameNode
Single NameSpace - easy to serialize operations
NameSpace stored entirely in memory
Changes written to transaction log first
Single Point of Failure
Performance Bottleneck?
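The first four bullets describe a simple pattern: one server, one lock, the whole namespace held in memory, and every change logged before it is applied. The following is a minimal Python sketch of that pattern. It is not the NameNode's actual code. The class name `NameSpace`, the operation dictionaries, and the use of a Python list in place of an on-disk, fsync'd transaction log are all hypothetical simplifications.

```python
import json
import threading


class NameSpace:
    """Toy single-server namespace (hypothetical sketch, not HDFS code):
    all metadata lives in memory and every change is logged before it is applied."""

    def __init__(self, log=None):
        self.lock = threading.Lock()  # one lock, so all operations are serialized
        self.files = {}               # path -> list of block ids, held entirely in memory
        self.log = log if log is not None else []  # stand-in for the on-disk transaction log

    def _apply(self, op):
        # Update the in-memory image only; callers decide whether to log first.
        if op["op"] == "create":
            self.files[op["path"]] = []
        elif op["op"] == "add_block":
            self.files[op["path"]].append(op["block"])
        elif op["op"] == "delete":
            self.files.pop(op["path"], None)

    def mutate(self, op):
        with self.lock:
            # Validate before logging so the log never holds an operation that fails on replay.
            if op["op"] == "add_block" and op["path"] not in self.files:
                raise FileNotFoundError(op["path"])
            self.log.append(json.dumps(op))  # 1. record the change in the log first
            self._apply(op)                  # 2. then update the in-memory namespace

    @classmethod
    def recover(cls, log):
        # After a crash the in-memory image is gone; rebuild it by replaying the log.
        ns = cls(log=list(log))
        for line in ns.log:
            ns._apply(json.loads(line))
        return ns


ns = NameSpace()
ns.mutate({"op": "create", "path": "/a"})
ns.mutate({"op": "add_block", "path": "/a", "block": 1})
restored = NameSpace.recover(ns.log)
```

The single lock is what makes serialization easy, and it is also why one server is both the single point of failure and a potential bottleneck: every client request passes through `mutate` on one machine.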
15. NameNode Scalability
“100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput
capacity of a single name-node. ... any solution intended for single
namespace server optimization lacks scalability. ... the most promising
solutions seem to be based on distributing the namespace server ...”
Konstantin Shvachko, ;login:, April 2010
16. Goal
[Chart: write throughput in thousands of writes/second (axis 0 to 50), Single NN vs. Target]
33. Current Status
Working code exists that uses HBase, with a slightly
modified DFSClient and DataNode, for create, write,
close, open, read, mkdirs, and delete.
New component: a HealthServer monitors DataNodes
and does garbage collection. It is more like the BigTable
master: it can die and restart without affecting clients.
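The slides do not show the table schema or the HealthServer's logic. As a rough illustration only, the sketch below assumes a hypothetical layout of one row per full path, with a directory flag and the file's block ids. A plain dict stands in for the HBase table, so no HBase client API is used, and `NamespaceTable` and `garbage_collect` are invented names. The garbage-collection pass keeps no state of its own: it reads the table and the DataNode block reports fresh each time. That is the property that would let such a process die and restart without affecting clients.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceTable:
    """Stand-in for an HBase table keyed by full path (hypothetical layout):
    each row records whether the path is a directory and, for files, its block ids."""
    rows: dict = field(default_factory=dict)

    def mkdirs(self, path):
        # Create every missing ancestor directory, like `mkdir -p`.
        parts = [p for p in path.split("/") if p]
        for i in range(1, len(parts) + 1):
            self.rows.setdefault("/" + "/".join(parts[:i]), {"dir": True, "blocks": []})

    def create(self, path, blocks):
        # The parent must already exist as a directory (the root "/" always does).
        parent = path.rsplit("/", 1)[0] or "/"
        if parent != "/" and not self.rows.get(parent, {}).get("dir"):
            raise FileNotFoundError(parent)
        self.rows[path] = {"dir": False, "blocks": list(blocks)}

    def delete(self, path):
        # Remove the row and every row beneath it; the blocks themselves are left
        # on the DataNodes for the garbage-collection pass to find.
        for key in [k for k in self.rows if k == path or k.startswith(path + "/")]:
            del self.rows[key]

    def referenced_blocks(self):
        return {b for row in self.rows.values() for b in row["blocks"]}


def garbage_collect(table, datanode_reports):
    """One stateless HealthServer-style pass (hypothetical): for each DataNode,
    list the reported blocks that no file row references any more."""
    live = table.referenced_blocks()
    return {dn: sorted(set(blocks) - live) for dn, blocks in datanode_reports.items()}


table = NamespaceTable()
table.mkdirs("/user/alice")
table.create("/user/alice/f1", [101, 102])
table.create("/user/alice/f2", [103])
table.delete("/user/alice/f1")
plan = garbage_collect(table, {"dn1": [101, 103], "dn2": [102, 104]})
```

Here `plan` marks blocks 101 and 102 (from the deleted file) and the unreferenced block 104 for deletion, and it keeps 103. A real pass would also need a grace period, because a DataNode can hold a block for a file that is still being written before that file's row is committed.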
34. Code
Will be at http://code.google.com/p/hdfs-dnn
Available under whichever version of the Apache License is
compatible with Hadoop
36. Self-Hosted HBase
May be possible to have HBase use the same HDFS
instance it’s supporting
Some recursion and self-reference already exists:
HBase Metadata table is itself a table in HBase
Have to work out bootstrapping and failure recovery to
resolve any potential circular dependencies
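One way to frame the open bootstrapping question is as a start-up dependency graph: a component can start only after everything it depends on is up, and a cycle means there is no valid start-up order. The sketch below is a hypothetical illustration, not a proposed design. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), and the component names, including the `bootstrap_manifest` that breaks the cycle, are invented. The manifest stands for some fixed, well-known record of where HBase's own files live, which would not require a namespace lookup.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(deps):
    """Return a start-up order, or raise CycleError if the dependency graph
    has a cycle. `deps` maps each component to the components it needs first."""
    return list(TopologicalSorter(deps).static_order())


# Naive self-hosting: the HDFS namespace lives in HBase, and HBase stores its
# own files in that same HDFS instance, so neither can start first.
naive = {
    "hdfs_namespace": {"hbase"},
    "hbase": {"hdfs_namespace"},
}

# Hypothetical fix: HBase locates its own files through a fixed manifest
# that needs no namespace lookup, which removes the cycle.
bootstrapped = {
    "bootstrap_manifest": set(),
    "hbase": {"bootstrap_manifest"},
    "hdfs_namespace": {"hbase"},
}
```

Calling `startup_order(naive)` raises `CycleError`. With the manifest in place the only valid order is `bootstrap_manifest`, then `hbase`, then `hdfs_namespace`. Failure recovery would need the same treatment, since restarting HBase after a crash also depends on reading its files before the namespace is available.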