SlideShare une entreprise Scribd logo
1  sur  14
HDFS Federation andOther New Features Suresh Srinivas Founder & Architect – Hortonworks Yahoo Hadoop Team (3 years) @suresh_m_s Sanjay Radia Founder & Architect – Hortonworks Yahoo Hadoop Team (4 years) @srr © Hortonworks Inc. 2011
Agenda HDFS Architecture Overview Current Limitations HDFS Federation Other HDFS Features in Release 0.23 Futures © Hortonworks Inc. 2011 2
Two main layers Namespace Consists of dirs, files and blocks Supports create, delete, modify and list files or dirs operations Block Storage Block Management Datanode cluster membership Supports create/delete/modify/get block location operations Manages replication and replica placement Storage - provides read and write access to blocks Implemented as ,[object Object],Namespace Volume = Namespace + Blocks ,[object Object],Entire namespace is in memory Provides Block Management Datanodes store block replicas Block files stored on local file system on the nodes HDFS Architecture © Hortonworks Inc. 2011 3 NS Namenode Namespace Block Management Storage Block Storage Datanode Datanode …
Limitation - Single Namenode © Hortonworks Inc. 2011 4 Scalability ,[object Object]
Limited number of files, dirs and blocks
250 million files and blocks at 64GB Namenode heap sizePerformance ,[object Object],Poor Isolation ,[object Object]
Separate volume for tenants is not possible
Lacks separate namespace for different categories of applications
Experimental apps can affect production apps
Example - HBase could use its own namespace,[object Object]
© Hortonworks Inc. 2011 6 Isolation is a problem for even small clusters
HDFS Federation © Hortonworks Inc. 2011 7 ... ... NN-n NN-k NN-1 Foreign NS n Namespace           NS1           NS k ... ... Pool  k Pool  n Pool  1             Block  Pools Balancer Block Storage Datanode 2 Datanode m Datanode 1 ... Common Storage Multiple independent Namenodes and Namespace Volumes in a cluster Namespace Volume = Namespace + Block Pool Block Storage as generic storage service Set of blocks for a Namespace Volume is called a Block Pool DNs store blocks for all the Namespace Volumes – no partitioning
Key Ideas & Benefits Distributed Namespace: Partitioned across namenodes Simple and Robust due to independent masters Each master serves a namespace volume Preserves namenode stability - very little code change in the namenodes Scalability – 6K nodes, 100K tasks, 120PB and 1 billion files Block Pools enable generic storage service Enables Namespace Volumes to be independent of each other Fuels innovation and Rapid development New implementations of file systems  and Applications on top of block storage possible New block pool categories – tmp storage, distributed cache, small object storage In future, move Block Management out of namenode to separate set of nodes Simplifies namespace/application implementation Distributed namenode becomes significantly simpler © Hortonworks Inc. 2011 8 Alternate NN Implementation HBase HDFS Namespace MR tmp Storage Service
HDFS Federation – First Phase Details Simple design Little change in the Namenode, most change in Datanode, Config and Tools Core development in 4 months Namespace and Block Management remain in Namenode Block Management could be moved out of namenode in the future Little impact on existing deployments Single namenode configuration runs as is Cluster level Web UI for better manageability Provides cluster summary, namenode list and decommissioning status Tools Decommissioning, Balancer and other tools work with multiple namespaces © Hortonworks Inc. 2011 9

Contenu connexe

Tendances

Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
elliando dias
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
Anand Kulkarni
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetup
vmoorthy
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
larsgeorge
 

Tendances (20)

Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Unit 2.pptx
Unit 2.pptxUnit 2.pptx
Unit 2.pptx
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetup
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Seminar Report on Google File System
Seminar Report on Google File SystemSeminar Report on Google File System
Seminar Report on Google File System
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Hadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesHadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenes
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

En vedette

En vedette (14)

Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14
 
Hadoop 3.0 features
Hadoop 3.0 featuresHadoop 3.0 features
Hadoop 3.0 features
 
HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
A3RT - the details and actual use cases of "Analytics & Artificial intelligen...
 
Path to 400M Members: LinkedIn’s Data Powered Journey
Path to 400M Members: LinkedIn’s Data Powered JourneyPath to 400M Members: LinkedIn’s Data Powered Journey
Path to 400M Members: LinkedIn’s Data Powered Journey
 
To The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid AnalyticsTo The Cloud and Back: A Look At Hybrid Analytics
To The Cloud and Back: A Look At Hybrid Analytics
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
Introduction to Hadoop and Spark (before joining the other talk) and An Overv...
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Real-time Analytics in Financial: Use Case, Architecture and Challenges
Real-time Analytics in Financial: Use Case, Architecture and ChallengesReal-time Analytics in Financial: Use Case, Architecture and Challenges
Real-time Analytics in Financial: Use Case, Architecture and Challenges
 

Similaire à HDFS Federation++

Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Yahoo Developer Network
 

Similaire à HDFS Federation++ (20)

Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Hadoop
HadoopHadoop
Hadoop
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hdfs
HdfsHdfs
Hdfs
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
Hadoop
HadoopHadoop
Hadoop
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptx
 

Plus de Hortonworks

Plus de Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

HDFS Federation++

  • 1. HDFS Federation andOther New Features Suresh Srinivas Founder & Architect – Hortonworks Yahoo Hadoop Team (3 years) @suresh_m_s Sanjay Radia Founder & Architect – Hortonworks Yahoo Hadoop Team (4 years) @srr © Hortonworks Inc. 2011
  • 2. Agenda HDFS Architecture Overview Current Limitations HDFS Federation Other HDFS Features in Release 0.23 Futures © Hortonworks Inc. 2011 2
  • 3.
  • 4.
  • 5. Limited number of files, dirs and blocks
  • 6.
  • 7. Separate volume for tenants is not possible
  • 8. Lacks separate namespace for different categories of applications
  • 9. Experimental apps can affect production apps
  • 10.
  • 11. © Hortonworks Inc. 2011 6 Isolation is a problem for even small clusters
  • 12. HDFS Federation © Hortonworks Inc. 2011 7 ... ... NN-n NN-k NN-1 Foreign NS n Namespace NS1 NS k ... ... Pool k Pool n Pool 1 Block Pools Balancer Block Storage Datanode 2 Datanode m Datanode 1 ... Common Storage Multiple independent Namenodes and Namespace Volumes in a cluster Namespace Volume = Namespace + Block Pool Block Storage as generic storage service Set of blocks for a Namespace Volume is called a Block Pool DNs store blocks for all the Namespace Volumes – no partitioning
  • 13. Key Ideas & Benefits Distributed Namespace: Partitioned across namenodes Simple and Robust due to independent masters Each master serves a namespace volume Preserves namenode stability - very little code change in the namenodes Scalability – 6K nodes, 100K tasks, 120PB and 1 billion files Block Pools enable generic storage service Enables Namespace Volumes to be independent of each other Fuels innovation and Rapid development New implementations of file systems and Applications on top of block storage possible New block pool categories – tmp storage, distributed cache, small object storage In future, move Block Management out of namenode to separate set of nodes Simplifies namespace/application implementation Distributed namenode becomes significantly simpler © Hortonworks Inc. 2011 8 Alternate NN Implementation HBase HDFS Namespace MR tmp Storage Service
  • 14. HDFS Federation – First Phase Details Simple design Little change in the Namenode, most change in Datanode, Config and Tools Core development in 4 months Namespace and Block Management remain in Namenode Block Management could be moved out of namenode in the future Little impact on existing deployments Single namenode configuration runs as is Cluster level Web UI for better manageability Provides cluster summary, namenode list and decommissioning status Tools Decommissioning, Balancer and other tools work with multiple namespaces © Hortonworks Inc. 2011 9
  • 15. Managing Namespaces Federation has multiple namespaces – Don’t you need a single global namespace? Key is to share the data and the names used to access the data A global namespace is one way to do that Client-side mount table is another way to share. Shared mount-table => “global” shared view Personalized mount-table => per-application view Share the data that matter by mounting it Client-side implementation of mount tables No single point of failure No hotspot for root and top level directories © Hortonworks Inc. 2011 10 Client-side mount-table / project home tmp data NS4 NS3 NS2 NS1
  • 16. HDFS I/O Improvements Ongoing improvements for HBase (HDFS-1599) Append, Flush and Sync improvements Allow reuse of connections to datanodes (HDFS-941) 35-65% improvement for random reads ~40% improvement in latency as seen by HBase Pure Java CRC implementation (HADOOP-6148) ~2x improvement CRC and other improvements in progress (HDFS-2080) 2.5x performance improvement for CPU bound sequential reads Improved locking for multithreaded readers (HDFS-1148) 15% improvement for HBase Improvements in threading model of readers(HDFS-918/HDFS-1323) 10-15% improvement in reads © Hortonworks Inc. 2011 11
  • 17.
  • 18. Startup time from 1 hour to 20 mins
  • 19. Upgrade from 3 hours to 25 mins
  • 20. Disk Fail In Place (HADOOP-7123)
  • 21. Currently disk failure brings down a datanode
  • 22. Loss of significant storage with high density datanode
  • 23. Changes and best practices to make datanode resilient to disk failuresOperational
  • 24. Futures HA for Namenode (HDFS-1623) Work in progress Wire compatibility (HADOOP-7347) Data transfer protocol cleanup and use Protocol Buffers (done) Protocol Buffers for RPC Continue I/Operformance improvements Lot of work already done in 0.23 Partial namespace in memory Further scale the namespace size supported by the namenode More management and operability improvements © Hortonworks Inc. 2011 13
  • 25. © Hortonworks Inc. 2011 Thank you. 14

Notes de l'éditeur

  1. Improve random or short read performanceAllow reuse of connection (HDFS-941)Speedup DFS read pathPure Java CRC implementation (HADOOP-6148)28% improvement for 32 bit JVM and 58% improvement for 64 bit JVMCRC and other improvements (HDFS-2080)Significant speedup and CPU savings – work on going