HDFS Federation

Sanjay Radia
Founder and Architect @ Hortonworks




                                      Page 1
About Me

•  Apache Hadoop Committer and Member
   of Hadoop PMC
•  Architect of core-Hadoop @ Yahoo
  -  Focusing on HDFS, MapReduce scheduler,
     Compatibility, etc.

•  PhD in Computer Science from the
  University of Waterloo, Canada




                                              Page 2
Agenda

•  HDFS Background

•  Current Limitations

•  Federation Architecture

•  Federation Details

•  Next Steps

•  Q&A

                             Page 3
HDFS Architecture

Two main layers
•  Namespace
   -  Consists of dirs, files and blocks
   -  Supports create, delete, modify and list operations on files or dirs
•  Block Storage
   -  Block Management
      •  Datanode cluster membership
      •  Supports create/delete/modify/get block location operations
      •  Manages replication and replica placement
   -  Storage - provides read and write access to blocks

[Figure: two layers. Namespace layer: the NS component of the Namenode. Block Storage layer: Block Management in the Namenode plus Storage on the Datanodes.]


                                                                                            Page 4
HDFS Architecture

Implemented as
•  Single Namespace Volume
   -  Namespace Volume = Namespace + Blocks
•  Single namenode with a namespace
   -  Entire namespace is in memory
   -  Provides Block Management
•  Datanodes store block replicas
   -  Block files stored on local file system

[Figure: two layers. Namespace layer: the NS component of the Namenode. Block Storage layer: Block Management in the Namenode plus Storage on the Datanodes.]




                                                                                                Page 5
Limitation - Isolation

Poor Isolation
•    All the tenants share a single namespace
      -    Separate volume for tenants is desirable
•    Lacks separate namespace for different application categories
     or application requirements
      -    Experimental apps can affect production apps
      -    Example - HBase could use its own namespace




                                                           Page 6
Limitation - Scalability

Scalability
•     Storage scales horizontally - namespace doesn’t
•     Limited number of files, dirs and blocks
     -    250 million files and blocks at 64GB Namenode heap size
           •    Still a very large cluster
           •    Facebook clusters are sized at ~70 PB storage

Performance
•     File system operations throughput limited by a single node
     -    120K read ops/sec and 6000 write ops/sec
           •    Supports 4K-node clusters easily
           •    Easily scalable to 20K write ops/sec by code improvements
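
As a rough back-of-the-envelope check on the Scalability figure above, the heap budget per namespace object can be estimated by dividing the heap size by the object count. This is only a sketch: it assumes "64GB" means 64 GiB and that the whole heap is spent on files and blocks, neither of which the slide states, and the function name is made up for illustration.

```python
GIB = 1024 ** 3


def bytes_per_object(heap_bytes: int, object_count: int) -> float:
    """Average heap bytes available per namespace object (file or block)."""
    if object_count <= 0:
        raise ValueError("object_count must be positive")
    return heap_bytes / object_count


heap = 64 * GIB            # assumed: 64 GiB namenode heap
objects = 250_000_000      # 250 million files and blocks, from the slide
per_object = bytes_per_object(heap, objects)
print(f"{per_object:.0f} bytes per object")
```

Under these assumptions the budget comes out at a few hundred bytes per file or block, which is why the object count, not the disk capacity, is the limit on a single namenode.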



                                                                            Page 7
Limitation – Tight Coupling

Namespace and Block Management are distinct layers
•  Tightly coupled due to co-location
•  Separating the layers makes it easier to evolve each layer
•  Separating services
   -  Scaling block management independent of namespace is simpler
   -  Simplifies the namespace and makes it easier to scale

Block Storage could be a generic service
•  Namespace is one of the applications to use the service
•  Other services can be built directly on Block Storage
   -  HBase
   -  MR Tmp
   -  Foreign namespaces

                                                                     Page 8
Stated Problem



             Isolation is a problem
                  for even small
                     clusters!




                                      Page 9
HDFS Federation
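[Figure: federation architecture.
 Namespace layer: Namenodes NN-1 ... NN-k ... NN-n, serving namespaces NS1 ... NS k ... Foreign NS n.
 Block Storage layer: Block Pools (Pool 1 ... Pool k ... Pool n) stored on Datanode 1, Datanode 2 ... Datanode m, which together form the Common Storage.]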

•  Multiple independent Namenodes and Namespace Volumes in a
   cluster
   -  Namespace Volume = Namespace + Block Pool
•  Block Storage as generic storage service
   -  Set of blocks for a Namespace Volume is called a Block Pool
   -  DNs store blocks for all the Namespace Volumes – no partitioning
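
The relationship between namespace volumes, block pools and datanodes can be sketched as a small data model. This is an illustrative toy, not HDFS code: the class names, the "BP-1"/"BP-2" pool IDs and the choice to key blocks by (pool, block ID) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NamespaceVolume:
    """One namenode's namespace plus the ID of its block pool."""
    namenode: str
    block_pool_id: str
    files: Dict[str, List[int]] = field(default_factory=dict)  # path -> block IDs


@dataclass
class Datanode:
    """A datanode keeps one set of blocks per block pool it serves."""
    name: str
    pools: Dict[str, Set[int]] = field(default_factory=dict)

    def store(self, block_pool_id: str, block_id: int) -> None:
        self.pools.setdefault(block_pool_id, set()).add(block_id)


vol1 = NamespaceVolume("NN-1", "BP-1")
vol2 = NamespaceVolume("NN-2", "BP-2")
dn = Datanode("DN-1")

# The same datanode stores blocks for both namespace volumes.
vol1.files["/data/a"] = [1]
dn.store(vol1.block_pool_id, 1)
vol2.files["/home/b"] = [1]  # in this sketch a block ID is only unique within its pool
dn.store(vol2.block_pool_id, 1)

print(sorted(dn.pools))
```

The point of the model is that the datanode is not partitioned by namenode: one node holds a separate block set for each pool.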
                                                                                                                                                                                   Page 10
Key Ideas & Benefits
•  Distributed Namespace: Partitioned across namenodes
   -  Simple and Robust due to independent masters
      •  Each master serves a namespace volume
      •  Preserves namenode stability – little namenode code change
   -  Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files

[Figure: a common Storage Service with HDFS Namespace, Alternate NN Implementation, HBase and MR tmp layered on top of it.]




                                                                                        Page 11
Key Ideas & Benefits
•  Block Pools enable generic storage service
   -  Enables Namespace Volumes to be independent of each other
   -  Fuels innovation and rapid development
      •  New implementations of file systems and applications on top of block storage possible
      •  New block pool categories – tmp storage, distributed cache, small object storage

•  In future, move Block Management out of namenode to separate set of nodes
   -  Simplifies namespace/application implementation
      •  Distributed namenode becomes significantly simpler

[Figure: a common Storage Service with HDFS Namespace, Alternate NN Implementation, HBase and MR tmp layered on top of it.]

                                                                                        Page 12
HDFS Federation Details
•  Simple design
   -  Little change to the Namenode, most changes in Datanode, Config and Tools
   -  Core development in 4 months
   -  Namespace and Block Management remain in Namenode
      •  Block Management could be moved out of namenode in the future

•  Little impact on existing deployments
   -  Single namenode configuration runs as is

•  Datanodes provide storage services for all the namenodes
   -  Register with all the namenodes

   -  Send periodic heartbeats and block reports to all the namenodes
   -  Send block received/deleted for a block pool to corresponding namenode
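
The datanode behaviour listed above can be sketched as follows: registration and heartbeats go to every namenode, while a block-received notice goes only to the namenode that owns the block's pool. This is a toy model under assumed names (`Namenode`, `Datanode`, `block_received`, the "BP-*" IDs); it does not reproduce the real HDFS RPC protocol.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class Namenode:
    """Toy namenode that records what datanodes tell it."""

    def __init__(self, name: str, block_pool_id: str) -> None:
        self.name = name
        self.block_pool_id = block_pool_id
        self.registered: Set[str] = set()
        self.heartbeats: Dict[str, int] = defaultdict(int)
        self.received: List[Tuple[str, int]] = []  # (datanode name, block ID)


class Datanode:
    """Toy datanode serving every namenode in the cluster."""

    def __init__(self, name: str, namenodes: List[Namenode]) -> None:
        self.name = name
        self.namenode_for_pool = {nn.block_pool_id: nn for nn in namenodes}
        for nn in namenodes:           # register with all the namenodes
            nn.registered.add(name)

    def heartbeat(self) -> None:
        for nn in self.namenode_for_pool.values():  # heartbeat to all namenodes
            nn.heartbeats[self.name] += 1

    def block_received(self, block_pool_id: str, block_id: int) -> None:
        # Only the namenode owning this block pool is notified.
        self.namenode_for_pool[block_pool_id].received.append((self.name, block_id))


nn1 = Namenode("NN-1", "BP-1")
nn2 = Namenode("NN-2", "BP-2")
dn = Datanode("DN-1", [nn1, nn2])
dn.heartbeat()
dn.block_received("BP-2", 42)
print(nn1.received, nn2.received)
```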


                                                                          Page 13
HDFS Federation Details
•  Cluster Web UI for better manageability
   -  Provides cluster summary

   -  Includes namenode list and summary of namenode status

   -  Decommissioning status

•  Tools
   -  Decommissioning works with multiple namespaces
   -  Balancer works with multiple namespaces

      •  Either Datanode storage or Block Pool storage can be balanced

•  Namenode can be added/deleted in Federated cluster
   -  No need to restart the cluster

•  Single configuration for all the nodes in the cluster
                                                                       Page 14
Managing Namespaces
•  Federation has multiple namespaces
   -  Don’t you need a single global namespace?
   -  Some tenants want private namespace
   -  Global? Key is to share the data and the names used to access the data

•  A single global namespace is one way to share

[Figure: client-side mount-table. The root / has entries data, project, home and tmp, which are mounted onto namespaces NS1, NS2, NS3 and NS4.]




                                                                                 Page 15
Managing Namespaces
•  Client-side mount table is another way to share
   -  Shared mount-table => “global” shared view
   -  Personalized mount-table => per-application view
      •  Share the data that matter by mounting it

•  Client-side implementation of mount tables
   -  No single point of failure
   -  No hotspot for root and top level directories

[Figure: client-side mount-table. The root / has entries data, project, home and tmp, which are mounted onto namespaces NS1, NS2, NS3 and NS4.]
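
A client-side mount table can be pictured as a longest-prefix lookup done entirely in the client, so no server sits at the root. The sketch below is a minimal illustration of that idea only. The host names, the table contents and the `resolve` function are hypothetical, and this is not the ViewFs API.

```python
from typing import Dict

# Hypothetical mount table: mount point -> target URI on some namenode.
MOUNT_TABLE: Dict[str, str] = {
    "/data": "hdfs://nn1.example.com/data",
    "/project": "hdfs://nn2.example.com/project",
    "/home": "hdfs://nn3.example.com/home",
    "/tmp": "hdfs://nn4.example.com/tmp",
}


def resolve(path: str, table: Dict[str, str]) -> str:
    """Map a client path to a target URI using the longest matching mount point."""
    best = None
    for mount in table:
        # Match whole path components only, so /database does not match /data.
        if path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point for {path}")
    return table[best] + path[len(best):]


print(resolve("/home/alice/report.txt", MOUNT_TABLE))
```

Because each client resolves paths locally, a shared table gives every client the same "global" view, while a per-application table changes the view without touching any namenode.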




                                                                               Page 16
Next Steps
•  Complete separation of namespace and block management
   layers
   -  Block storage as generic service

•  Partial namespace in memory for further scalability
•  Move partial namespace from one namenode to another
   -  Namespace operation - no data copy




                                                         Page 17
Next Steps

•  Namenode as a container for namespaces
   -  Lots of small namespace volumes
      •  Chosen per user/tenant/data feed
      •  Mount tables for unified namespace
         -  Can be managed by a central volume server
   -  Move namespace from one container to another for balancing

•  Combined with partial namespace
   -  Choose number of namenodes to match
      •  Sum of (Namespace working set)
      •  Sum of (Namespace throughput)

[Figure: several Namenodes above a shared set of Datanodes.]
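
One way to read "choose number of namenodes to match" is as a simple capacity calculation: enough namenodes to hold the summed working sets in memory and to serve the summed throughput. The sketch below assumes identical namenodes and ignores how individual volumes pack onto them; every number and name in it is made up for illustration.

```python
import math
from typing import Sequence


def namenodes_needed(
    working_sets_gb: Sequence[float],
    throughputs_ops: Sequence[float],
    nn_memory_gb: float,
    nn_ops_per_sec: float,
) -> int:
    """Lower bound on namenode count from total working set and total throughput."""
    by_memory = math.ceil(sum(working_sets_gb) / nn_memory_gb)
    by_throughput = math.ceil(sum(throughputs_ops) / nn_ops_per_sec)
    return max(1, by_memory, by_throughput)


# Hypothetical namespace volumes and namenode capacities.
count = namenodes_needed(
    working_sets_gb=[10, 25, 40, 5],          # sums to 80 GB
    throughputs_ops=[20_000, 50_000, 30_000],  # sums to 100K ops/sec
    nn_memory_gb=32,
    nn_ops_per_sec=60_000,
)
print(count)
```

Here memory is the binding constraint, so the working sets rather than the request rate determine the namenode count.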
                                                                               Page 18
Thank You
More information


1.  HDFS-1052: HDFS Scalability with multiple namenodes
2.  HADOOP-7426: User guide for how to use viewfs with federation
3.  An Introduction to HDFS Federation –
   https://hortonworks.com/an-introduction-to-hdfs-federation/


                                                                      Page 19
Other Resources
• Next webinar: Improve Hive and HBase Integration
  –  May 2, 2012 @ 10am PST
  –  Register now: http://hortonworks.com/webinars/

• Hadoop Summit
  – June 13-14
  – San Jose, California
  – www.Hadoopsummit.org

• Hadoop Training and Certification
  – Developing Solutions Using Apache Hadoop
  – Administering Apache Hadoop
  – http://hortonworks.com/training/

        © Hortonworks Inc. 2012                       Page 20
Backup slides



                Page 21
HDFS Federation Across Clusters

[Figure: two clusters, Cluster 1 and Cluster 2. Each has its own application mount-table whose root / has entries home, tmp, data and project.]
                                              Page 22

Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

  • 1. HDFS Federation
      Sanjay Radia, Founder and Architect @ Hortonworks
  • 2. About Me
      •  Apache Hadoop Committer and Member of Hadoop PMC
      •  Architect of core-Hadoop @ Yahoo
        -  Focusing on HDFS, MapReduce scheduler, Compatibility, etc.
      •  PhD in Computer Science from the University of Waterloo, Canada
  • 3. Agenda
      •  HDFS Background
      •  Current Limitations
      •  Federation Architecture
      •  Federation Details
      •  Next Steps
      •  Q&A
  • 4. HDFS Architecture
      Two main layers
      •  Namespace
        -  Consists of dirs, files and blocks
        -  Supports create, delete, modify and list operations on files and dirs
      •  Block Storage
        -  Block Management
           •  Datanode cluster membership
           •  Supports create/delete/modify/get block location operations
           •  Manages replication and replica placement
        -  Storage - provides read and write access to blocks
      [Diagram: Namespace layer = Namenode (NS, Block Management); Block Storage layer = Datanode … Datanode (Storage)]
  • 5. HDFS Architecture
      Implemented as
      •  Single Namespace Volume
        -  Namespace Volume = Namespace + Blocks
      •  Single namenode with a namespace
        -  Entire namespace is in memory
        -  Provides Block Management
      •  Datanodes store block replicas
        -  Block files stored on local file system
      [Diagram: Namespace layer = Namenode (NS, Block Management); Block Storage layer = Datanode … Datanode (Storage)]
  • 6. Limitation - Isolation
      Poor Isolation
      •  All the tenants share a single namespace
        -  Separate volume for tenants is desirable
      •  Lacks separate namespace for different application categories or application requirements
        -  Experimental apps can affect production apps
        -  Example - HBase could use its own namespace
  • 7. Limitation - Scalability
      Scalability
      •  Storage scales horizontally - namespace doesn’t
      •  Limited number of files, dirs and blocks
        -  250 million files and blocks at 64GB Namenode heap size
           •  Still a very large cluster
           •  Facebook clusters are sized at ~70 PB storage
      Performance
      •  File system operations throughput limited by a single node
        -  120K read ops/sec and 6000 write ops/sec
           •  Supports 4K-node clusters easily
           •  Easily scalable to 20K write ops/sec by code improvements
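As a rough check on the heap figure on this slide, the two quoted numbers imply an average heap budget per namespace object. The sketch below is back-of-envelope arithmetic only. It assumes "64GB" means 64 GiB and that the whole heap is spent on files and blocks, so the result is an upper bound on the real per-object cost, not a measured value.

```python
# Back-of-envelope estimate from the two figures quoted on the slide.
HEAP_BYTES = 64 * 1024**3   # 64 GB Namenode heap, read as GiB (an assumption)
OBJECTS = 250_000_000       # "250 million files and blocks"


def bytes_per_object(heap_bytes: int, objects: int) -> float:
    """Average heap available per namespace object if the heap held nothing else."""
    return heap_bytes / objects


per_object = bytes_per_object(HEAP_BYTES, OBJECTS)
print(f"{per_object:.0f} bytes of heap per object")  # prints "275 bytes of heap per object"
```

Because the whole namespace lives in one JVM heap, the object count can only grow by growing that single heap, which is the sense in which the namespace does not scale horizontally.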
  • 8. Limitation – Tight Coupling
      Namespace and Block Management are distinct layers
      •  Tightly coupled due to co-location
      •  Separating the layers makes it easier to evolve each layer
      •  Separating services
        -  Scaling block management independent of namespace is simpler
        -  Simplifies Namespace and scaling it
      Block Storage could be a generic service
      •  Namespace is one of the applications to use the service
      •  Other services can be built directly on Block Storage
        -  HBase
        -  MR Tmp
        -  Foreign namespaces
  • 9. Stated Problem
      Isolation is a problem for even small clusters!
  • 10. HDFS Federation
      [Diagram: Namespace layer = namenodes NN-1 … NN-k … NN-n serving NS1 … NS k … Foreign NS n; Block Storage layer = block pools Pool 1 … Pool k … Pool n held on Datanode 1, Datanode 2 … Datanode m as common storage]
      •  Multiple independent Namenodes and Namespace Volumes in a cluster
        -  Namespace Volume = Namespace + Block Pool
      •  Block Storage as generic storage service
        -  Set of blocks for a Namespace Volume is called a Block Pool
        -  DNs store blocks for all the Namespace Volumes – no partitioning
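The relationship between namespace volumes, block pools and datanodes on this slide can be sketched as a small data model. This is an illustrative sketch, not HDFS code: the class names, the `store` method and the pool identifiers are all hypothetical, and letting both pools reuse block id 1 is a property of this sketch chosen to show that pools are kept apart.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NamespaceVolume:
    """A namespace plus its block pool, served by one namenode (hypothetical model)."""
    namenode: str
    block_pool_id: str
    files: dict[str, list[int]] = field(default_factory=dict)  # path -> block ids


@dataclass
class Datanode:
    """Stores blocks for every block pool; storage is not partitioned per namenode."""
    name: str
    pools: dict[str, set[int]] = field(default_factory=dict)   # pool id -> block ids

    def store(self, block_pool_id: str, block_id: int) -> None:
        # Blocks are keyed by pool first, so pools never collide on one datanode.
        self.pools.setdefault(block_pool_id, set()).add(block_id)


ns1 = NamespaceVolume("NN-1", "pool-1")
ns2 = NamespaceVolume("NN-2", "pool-2")
dn = Datanode("dn-1")

# Each volume records its own file-to-block mapping independently of the other.
ns1.files["/data/a"] = [1]
ns2.files["/tmp/b"] = [1]

# The same datanode holds blocks from both pools side by side.
dn.store(ns1.block_pool_id, 1)
dn.store(ns2.block_pool_id, 1)
```

Keying storage by block pool is what lets one set of datanodes serve every namespace volume without dedicating machines to a particular namenode.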
  • 11. Key Ideas & Benefits
      •  Distributed Namespace: Partitioned across namenodes
        -  Simple and Robust due to independent masters
           •  Each master serves a namespace volume
           •  Preserves namenode stability – little namenode code change
        -  Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files
      [Diagram: HDFS Namespace, Alternate NN Implementation, HBase and MR tmp on top of a shared Storage Service]
  • 12. Key Ideas & Benefits
      •  Block Pools enable generic storage service
        -  Enables Namespace Volumes to be independent of each other
        -  Fuels innovation and Rapid development
           •  New implementations of file systems and Applications on top of block storage possible
           •  New block pool categories – tmp storage, distributed cache, small object storage
      •  In future, move Block Management out of namenode to separate set of nodes
        -  Simplifies namespace/application implementation
           •  Distributed namenode becomes significantly simpler
      [Diagram: HDFS Namespace, Alternate NN Implementation, HBase and MR tmp on top of a shared Storage Service]
  • 13. HDFS Federation Details
      •  Simple design
        -  Little change to the Namenode, most changes in Datanode, Config and Tools
        -  Core development in 4 months
        -  Namespace and Block Management remain in Namenode
           •  Block Management could be moved out of namenode in the future
      •  Little impact on existing deployments
        -  Single namenode configuration runs as is
      •  Datanodes provide storage services for all the namenodes
        -  Register with all the namenodes
        -  Send periodic heartbeats and block reports to all the namenodes
        -  Send block received/deleted for a block pool to corresponding namenode
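The datanode behaviour listed on this slide (register with every namenode, heartbeat to every namenode, route block-received notices by block pool) can be sketched as follows. This is a toy in-process model under stated assumptions: the classes and method names are hypothetical and do not correspond to the real Hadoop RPC interfaces.

```python
from collections import defaultdict


class Namenode:
    """Toy namenode that records what datanodes tell it (hypothetical interface)."""

    def __init__(self, name, block_pool_id):
        self.name = name
        self.block_pool_id = block_pool_id
        self.registered = set()
        self.heartbeats = defaultdict(int)   # datanode name -> heartbeat count
        self.blocks = defaultdict(set)       # datanode name -> block ids in this pool

    def register(self, dn_name):
        self.registered.add(dn_name)

    def heartbeat(self, dn_name):
        self.heartbeats[dn_name] += 1

    def block_received(self, dn_name, block_id):
        self.blocks[dn_name].add(block_id)


class Datanode:
    """Toy datanode serving every namenode in the federated cluster."""

    def __init__(self, name, namenodes):
        self.name = name
        # Index namenodes by the block pool they own, for routing block events.
        self.by_pool = {nn.block_pool_id: nn for nn in namenodes}
        for nn in namenodes:
            nn.register(name)                # register with all the namenodes

    def send_heartbeats(self):
        for nn in self.by_pool.values():     # heartbeats go to all the namenodes
            nn.heartbeat(self.name)

    def on_block_received(self, block_pool_id, block_id):
        # Only the namenode that owns the block pool is notified.
        self.by_pool[block_pool_id].block_received(self.name, block_id)


nn1 = Namenode("NN-1", "pool-1")
nn2 = Namenode("NN-2", "pool-2")
dn = Datanode("dn-1", [nn1, nn2])
dn.send_heartbeats()
dn.on_block_received("pool-2", 42)
```

In this model the namenodes never talk to each other; all fan-out happens in the datanode, which matches the slide's point that most of the change sits in the Datanode rather than the Namenode.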
  • 14. HDFS Federation Details
      •  Cluster Web UI for better manageability
        -  Provides cluster summary
        -  Includes namenode list and summary of namenode status
        -  Decommissioning status
      •  Tools
        -  Decommissioning works with multiple namespaces
        -  Balancer works with multiple namespaces
           •  Either Datanode storage or Block Pool storage can be balanced
      •  Namenode can be added/deleted in Federated cluster
        -  No need to restart the cluster
      •  Single configuration for all the nodes in the cluster
  • 15. Managing Namespaces
      •  Federation has multiple namespaces
        -  Don’t you need a single global namespace?
        -  Some tenants want private namespace
        -  Global? Key is to share the data and the names used to access the data
      •  A single global namespace is one way to share
      [Diagram: client-side mount-table rooted at / with entries data, project, home and tmp, backed by namespaces NS1, NS2, NS3 and NS4]
  • 16. Managing Namespaces
      •  Client-side mount table is another way to share
        -  Shared mount-table => “global” shared view
        -  Personalized mount-table => per-application view
           •  Share the data that matter by mounting it
      •  Client-side implementation of mount tables
        -  No single point of failure
        -  No hotspot for root and top level directories
      [Diagram: client-side mount-table rooted at / with entries data, project, home and tmp, backed by namespaces NS1, NS2, NS3 and NS4]
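A client-side mount table of the kind described on this slide can be sketched as a longest-prefix lookup performed entirely in the client. This is a minimal illustration, not the ViewFs implementation or its API: the `resolve` function, the host names and the assignment of mount points to namenodes are all hypothetical.

```python
# Hypothetical mount table: mount point -> target namespace URI.
MOUNT_TABLE = {
    "/data": "hdfs://nn1.example.com/data",
    "/project": "hdfs://nn2.example.com/project",
    "/home": "hdfs://nn3.example.com/home",
    "/tmp": "hdfs://nn4.example.com/tmp",
}


def resolve(path: str, table: dict) -> str:
    """Map a client path to a namenode URI using the longest matching mount point."""
    best = None
    for mount in table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise LookupError(f"no mount point covers {path}")
    # Re-root the remainder of the path under the mount target.
    return table[best] + path[len(best):]


print(resolve("/home/alice/report.txt", MOUNT_TABLE))
```

Because the lookup runs in each client, no server sits on the path for root or top-level directories, which is the "no single point of failure, no hotspot" property on the slide. Giving an application a different table yields the personalized per-application view.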
  • 17. Next Steps
      •  Complete separation of namespace and block management layers
        -  Block storage as generic service
      •  Partial namespace in memory for further scalability
      •  Move partial namespace from one namenode to another
        -  Namespace operation - no data copy
  • 18. Next Steps
      •  Namenode as a container for namespaces
        -  Lots of small namespace volumes
           •  Chosen per user/tenant/data feed
           •  Mount tables for unified namespace
        -  Can be managed by a central volume server
        -  Move namespace from one container to another for balancing
      •  Combined with partial namespace
        -  Choose number of namenodes to match
           •  Sum of (Namespace working set)
           •  Sum of (Namespace throughput)
      [Diagram: a set of Namenodes (…) above Datanode … Datanode]
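The sizing rule on this slide (choose the number of namenodes to match the summed working set and the summed throughput) can be read as taking the larger of two ceilings. The sketch below is one possible interpretation, not a formula from the talk: the function name, the per-namenode capacities and the sample volumes are all hypothetical, and it ignores how individual volumes would actually pack onto namenodes.

```python
import math


def namenodes_needed(volumes, nn_memory_gb, nn_ops_per_sec):
    """Lower bound on namenode count from total working set and total throughput.

    volumes: list of dicts with 'working_set_gb' and 'ops_per_sec' (hypothetical fields).
    """
    total_ws = sum(v["working_set_gb"] for v in volumes)
    total_ops = sum(v["ops_per_sec"] for v in volumes)
    by_memory = math.ceil(total_ws / nn_memory_gb)          # enough heap for all working sets
    by_throughput = math.ceil(total_ops / nn_ops_per_sec)   # enough ops/sec for all volumes
    return max(1, by_memory, by_throughput)


# Illustrative numbers only; none of these come from the slides.
volumes = [
    {"working_set_gb": 20, "ops_per_sec": 30_000},
    {"working_set_gb": 50, "ops_per_sec": 10_000},
    {"working_set_gb": 10, "ops_per_sec": 90_000},
]
print(namenodes_needed(volumes, nn_memory_gb=64, nn_ops_per_sec=120_000))  # prints 2
```

The result is only a lower bound: a real placement would also have to fit each volume onto a specific namenode, which is where moving namespaces between containers for balancing comes in.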
  • 19. Thank You
      More information
      1.  HDFS-1052: HDFS Scalability with multiple namenodes
      2.  HADOOP-7426: user guide for how to use viewfs with federation
      3.  An Introduction to HDFS Federation – https://hortonworks.com/an-introduction-to-hdfs-federation/
  • 20. Other Resources
      •  Next webinar: Improve Hive and HBase Integration
        –  May 2, 2012 @ 10am PST
        –  Register now: http://hortonworks.com/webinars/
      •  Hadoop Summit
        –  June 13-14
        –  San Jose, California
        –  www.Hadoopsummit.org
      •  Hadoop Training and Certification
        –  Developing Solutions Using Apache Hadoop
        –  Administering Apache Hadoop
        –  http://hortonworks.com/training/
      © Hortonworks Inc. 2012
  • 21. Backup slides
  • 22. HDFS Federation Across Clusters
      [Diagram: an application mount-table in Cluster 1 and an application mount-table in Cluster 2, each rooted at / with entries home, tmp, data and project; Cluster 1 and Cluster 2 shown beneath]