HDFS Federation++

HDFS Federation and Other New Features

Suresh Srinivas, Founder & Architect, Hortonworks; Yahoo Hadoop Team (3 years); @suresh_m_s
Sanjay Radia, Founder & Architect, Hortonworks; Yahoo Hadoop Team (4 years); @srr

© Hortonworks Inc. 2011

Agenda
- HDFS Architecture Overview
- Current Limitations
- HDFS Federation
- Other HDFS Features in Release 0.23
- Futures

HDFS Architecture

Two main layers:
- Namespace
  - Consists of dirs, files and blocks
  - Supports create, delete, modify and list operations on files and dirs
- Block Storage
  - Block Management
    - Datanode cluster membership
    - Supports create/delete/modify/get block location operations
    - Manages replication and replica placement
  - Storage: provides read and write access to blocks

Implementation:
- Namespace Volume = Namespace + Blocks
- Namenode
  - Entire namespace is in memory
  - Provides Block Management
- Datanodes store block replicas
  - Block files are stored on the local file system of the nodes

[Diagram: a Namenode holding the Namespace (NS) and Block Management, above a Block Storage layer of Datanodes that provide Storage.]

Limitation - Single Namenode
- Scalability
  - Limited number of files, dirs and blocks
  - 250 million files and blocks at 64GB Namenode heap size
- Performance
- Poor Isolation
  - Separate volume for tenants is not possible
  - Lacks separate namespace for different categories of applications
  - Experimental apps can affect production apps
  - Example: HBase could use its own namespace
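
The scalability figure on this slide (250 million files and blocks at a 64GB Namenode heap) implies a rough memory budget per namespace object. The sketch below works that out. It assumes the 250 million counts files and blocks together, that 64GB means 64 GiB, and that the per-object cost stays constant as the heap grows; none of this is stated on the slide, and `objects_for_heap` is a hypothetical helper name.

```python
# Back-of-the-envelope estimate from the slide's figure. Assumptions:
# "64GB" is read as 64 GiB and "250 million" is the combined count of
# files and blocks held in the Namenode heap.
HEAP_BYTES = 64 * 1024**3
OBJECTS = 250_000_000

# Average heap budget per file or block: about 275 bytes.
bytes_per_object = HEAP_BYTES / OBJECTS


def objects_for_heap(heap_gib: float) -> int:
    """Linear extrapolation; assumes a constant per-object cost."""
    return round(heap_gib * 1024**3 / bytes_per_object)


if __name__ == "__main__":
    print(f"{bytes_per_object:.1f} bytes per object")
    print(f"{objects_for_heap(128):,} objects at a 128 GiB heap")
```

Under these assumptions a single Namenode gains capacity only by growing its heap, which is the vertical limit the slide describes.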

HDFS Federation
- Multiple independent Namenodes and Namespace Volumes in a cluster
  - Namespace Volume = Namespace + Block Pool
- Block Storage as generic storage service
  - The set of blocks for a Namespace Volume is called a Block Pool
  - DNs store blocks for all the Namespace Volumes: no partitioning

[Diagram: Namenodes NN-1 ... NN-k ... NN-n serve namespaces NS1 ... NS k ... Foreign NS n. Each namespace has its own Block Pool (Pool 1 ... Pool k ... Pool n). Datanode 1, Datanode 2 ... Datanode m form the common Block Storage shared by all pools. A Balancer is shown alongside the Block Pools.]
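
The relationship between namenodes, block pools and datanodes described on this slide can be illustrated with a toy model. This is a minimal sketch, not HDFS code: the `Namenode` and `Datanode` classes, their methods, and the pool identifiers are all hypothetical names chosen for illustration, and replication is simplified to "every listed datanode gets a copy".

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Datanode:
    name: str
    # block pool id -> set of block ids stored on this node
    pools: dict = field(default_factory=lambda: defaultdict(set))

    def store(self, pool_id: str, block_id: int) -> None:
        self.pools[pool_id].add(block_id)

    def block_report(self, pool_id: str) -> list:
        """Blocks held for one pool; each pool reports to its own namenode."""
        return sorted(self.pools.get(pool_id, ()))


@dataclass
class Namenode:
    name: str
    pool_id: str                                    # one block pool per namespace volume
    namespace: dict = field(default_factory=dict)   # path -> list of block ids
    next_block: int = 0

    def create(self, path: str, num_blocks: int, datanodes: list) -> list:
        blocks = []
        for _ in range(num_blocks):
            block_id = self.next_block              # ids are allocated per pool
            self.next_block += 1
            for dn in datanodes:                    # simplified replica placement
                dn.store(self.pool_id, block_id)
            blocks.append(block_id)
        self.namespace[path] = blocks
        return blocks


# Two independent namenodes share the same datanodes.
datanodes = [Datanode("dn1"), Datanode("dn2")]
nn1 = Namenode("NN-1", "pool-1")
nn2 = Namenode("NN-2", "pool-2")
nn1.create("/prod/a", 2, datanodes)
nn2.create("/hbase/t", 1, datanodes)
```

In this model both namenodes allocate block id 0 without coordinating, because a block is identified by its pool together with its id. Each datanode holds blocks for every pool, which mirrors the "no partitioning" point above.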

Key Ideas & Benefits
- Distributed Namespace: partitioned across namenodes
  - Simple and robust due to independent masters
    - Each master serves a namespace volume
    - Preserves namenode stability: very little code change in the namenodes
  - Scalability: 6K nodes, 100K tasks, 120PB and 1 billion files
- Block Pools enable generic storage service
  - Enables Namespace Volumes to be independent of each other
  - Fuels innovation and rapid development
    - New implementations of file systems and applications on top of block storage become possible
    - New block pool categories: tmp storage, distributed cache, small object storage
- In future, move Block Management out of the namenode to a separate set of nodes
  - Simplifies namespace/application implementation
  - Distributed namenode becomes significantly simpler

[Diagram: Alternate NN Implementation, HBase, HDFS Namespace and MR tmp, all on top of a common Storage Service.]

HDFS Federation: First Phase Details
- Simple design
  - Little change in the Namenode; most change in Datanode, Config and Tools
  - Core development in 4 months
  - Namespace and Block Management remain in the Namenode
    - Block Management could be moved out of the namenode in the future
- Little impact on existing deployments
  - Single namenode configuration runs as is
- Cluster-level Web UI for better manageability
  - Provides cluster summary, namenode list and decommissioning status
- Tools
  - Decommissioning, Balancer and other tools work with multiple namespaces
