Hadoop Distributed File System (HDFS)
SEMINAR GUIDE
Mr. PRAMOD PAVITHRAN
HEAD OF DIVISION
COMPUTER SCIENCE & ENGINEERING
SCHOOL OF ENGINEERING, CUSAT
PRESENTED BY
VIJAY PRATAP SINGH
REG NO: 12110083
S7, CS-B
ROLL NO: 81
CONTENTS
WHAT IS HADOOP
PROJECT COMPONENTS IN HADOOP
MAP/REDUCE
HDFS
ARCHITECTURE
WRITE & READ IN HDFS
GOALS OF HADOOP
COMPARISON WITH OTHER SYSTEMS
CONCLUSION
REFERENCES
WHAT IS HADOOP ?
o Hadoop is an open-source software framework.
o The Hadoop framework consists of two main layers:
● Distributed file system (HDFS)
● Execution engine (MapReduce)
o Supports data-intensive distributed applications.
o Licensed under the Apache v2 license.
o It enables applications to work with thousands of computationally independent computers and petabytes of data.
WHY HADOOP ?
PROJECT COMPONENTS IN HADOOP
MAP/REDUCE
o Hadoop is a popular open-source implementation of MapReduce.
o MapReduce is a programming model for processing large data sets.
o MapReduce is typically used for distributed computing on clusters of computers.
o MapReduce can take advantage of data locality, processing data on or near the nodes that store it to reduce data transfer over the network.
o The model is inspired by the map and reduce functions of functional programming.
o "Map" step: the master node takes the input, divides it into smaller sub-problems and distributes them to slave nodes. Each slave node processes its smaller problem and produces an intermediate answer.
o "Reduce" step: the answers to all the sub-problems are collected and combined to form the final output. In Hadoop, the intermediate results are grouped by key and sent to reduce tasks running on the slave nodes.
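The map and reduce steps above can be illustrated with the classic word-count job. The Python sketch below runs everything in a single process and only models the data flow (map, group by key, reduce); the function names are hypothetical and are not Hadoop's Java MapReduce API, and the comments mark where a real cluster would spread the work across tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.lower().split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key into a final result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster, each split would be processed by a separate map task.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    # On a cluster, the keys would be partitioned across reduce tasks.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```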
MAP REDUCE ENGINE
HDFS
Highly scalable file system
◦ 6K nodes and 120PB
◦ Add commodity servers and disks to scale storage and IO bandwidth
Supports parallel reading & processing of data
◦ Optimized for streaming reads/writes of large files
◦ Bandwidth scales linearly with the number of nodes and disks
Fault tolerance & easy management
◦ Built-in redundancy
◦ Tolerates disk and node failures
◦ Automatically manages the addition/removal of nodes
◦ One operator per 3K nodes
Scalable, Reliable & Manageable
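The built-in redundancy and automatic recovery listed above rest on a simple loop: DataNodes send periodic heartbeats to the NameNode, which treats silent nodes as dead and arranges new copies of any block that has fallen below its replication factor. The following Python model is a sketch under assumptions: the class and method names are hypothetical, the timeout and replication factor are illustrative constants rather than values read from a Hadoop configuration, and target selection ignores the rack-aware placement rules that real HDFS applies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Illustrative constants (assumptions, not read from any Hadoop configuration).
DEAD_NODE_TIMEOUT_S = 600.0
REPLICATION_FACTOR = 3


@dataclass
class NameNodeSketch:
    """Hypothetical model of the NameNode's failure handling."""
    last_heartbeat: Dict[str, float] = field(default_factory=dict)  # DataNode -> last heartbeat time
    block_locations: Dict[str, Set[str]] = field(default_factory=dict)  # block -> DataNodes holding it

    def heartbeat(self, datanode: str, now: float) -> None:
        self.last_heartbeat[datanode] = now

    def live_nodes(self, now: float) -> Set[str]:
        # A DataNode that has been silent longer than the timeout is treated as dead.
        return {dn for dn, t in self.last_heartbeat.items() if now - t <= DEAD_NODE_TIMEOUT_S}

    def under_replicated(self, now: float) -> Dict[str, int]:
        """Return block -> number of replicas missing on live DataNodes."""
        live = self.live_nodes(now)
        missing: Dict[str, int] = {}
        for block, holders in self.block_locations.items():
            live_copies = len(holders & live)
            if live_copies < REPLICATION_FACTOR:
                missing[block] = REPLICATION_FACTOR - live_copies
        return missing

    def rereplication_targets(self, now: float) -> Dict[str, List[str]]:
        """Pick live DataNodes that do not yet hold a copy of each under-replicated block."""
        live = self.live_nodes(now)
        plan: Dict[str, List[str]] = {}
        for block, needed in self.under_replicated(now).items():
            candidates = sorted(live - self.block_locations[block])
            plan[block] = candidates[:needed]
        return plan


nn = NameNodeSketch()
for dn in ("dn1", "dn4", "dn7", "dn9"):
    nn.heartbeat(dn, now=0.0)
nn.block_locations = {"A": {"dn1", "dn7", "dn9"}, "B": {"dn4", "dn7", "dn9"}}
for dn in ("dn1", "dn4", "dn9"):  # dn7 stops sending heartbeats
    nn.heartbeat(dn, now=700.0)
print(nn.rereplication_targets(now=700.0))  # {'A': ['dn4'], 'B': ['dn1']}
```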
LIMITATIONS OF EXISTING DATA ANALYTICS ARCHITECTURE
BIG DATA
INCREASING BIG DATA
HADOOP'S APPROACH
ARCHITECTURE OF HADOOP
HADOOP MASTER/SLAVE ARCHITECTURE
ARCHITECTURE OF HDFS
CLIENT INTERACTION WITH HADOOP
HDFS WRITE
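[Figure: the client, which has split file.txt into blocks A, B and C, tells the NameNode "I want to write file.txt, Block A". Using rack awareness (DataNode 1 on Rack 1; DataNodes 7 and 9 on another rack), the NameNode answers "OK, write to DataNodes [1, 7, 9]". The client asks DataNode 1 whether it is ready and to prepare DataNodes 7 and 9; DataNode 1 asks DataNode 7, DataNode 7 asks DataNode 9, and the "Ready" replies travel back to the client. Each rack connects to a core switch through its own rack switch.]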
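The NameNode's answer of DataNodes [1, 7, 9] (one replica on one rack, two on another) is consistent with the default HDFS placement rule for a replication factor of three: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The Python sketch below models only this simplified rule; the function name and dictionary topology are assumptions, the choice for an off-cluster writer is simplified to a random node, and the real policy performs further checks (and falls back) where this sketch simply raises an error.

```python
import random
from typing import Dict, List, Optional


def place_replicas(rack_of: Dict[str, str], writer: Optional[str], rng: random.Random) -> List[str]:
    """Pick three DataNodes for a new block (simplified default placement rule).

    rack_of maps each DataNode name to its rack; writer is the client's node,
    or None when the client runs outside the cluster.
    """
    nodes = sorted(rack_of)
    # 1st replica: the writer's own DataNode, otherwise a randomly chosen node.
    first = writer if writer in rack_of else rng.choice(nodes)
    # 2nd replica: a node on a different (remote) rack.
    remote = [n for n in nodes if rack_of[n] != rack_of[first]]
    if not remote:
        raise ValueError("the cluster needs at least two racks")
    second = rng.choice(remote)
    # 3rd replica: a different node on the same remote rack as the 2nd.
    same_rack = [n for n in nodes if rack_of[n] == rack_of[second] and n != second]
    if not same_rack:
        raise ValueError("the remote rack needs at least two DataNodes")
    third = rng.choice(same_rack)
    return [first, second, third]


rack_of = {"dn1": "rack1", "dn2": "rack1", "dn7": "rack5", "dn9": "rack5"}
print(place_replicas(rack_of, writer="dn1", rng=random.Random(42)))
```

With the writer on dn1, the result always starts with dn1 and the other two replicas land on dn7 and dn9, in an order that depends on the random seed.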
PIPELINED WRITE
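[Figure: Block A is written through a pipeline: the client sends it to DataNode 1 on Rack 1, DataNode 1 forwards it across the core switch to DataNode 7, and DataNode 7 passes it to DataNode 9 on the same rack, leaving three replicas of Block A.]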
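The pipeline can be modelled as a chain in which each DataNode stores the block, forwards it to the next node, and acknowledges only after the nodes downstream have acknowledged, so the acknowledgements travel back in reverse order. This is a hypothetical Python sketch: real HDFS streams each block as a sequence of small packets so that forwarding overlaps with receiving, whereas here a whole block moves in one recursive call.

```python
from typing import Dict, List


class DataNodeSketch:
    """Hypothetical DataNode that stores a block and forwards it down the pipeline."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def receive(self, block_id: str, data: bytes, downstream: List["DataNodeSketch"]) -> List[str]:
        # Store the local replica first.
        self.blocks[block_id] = data
        # Forward to the next DataNode in the pipeline, if any, and wait for its acks.
        acks = downstream[0].receive(block_id, data, downstream[1:]) if downstream else []
        # This node acknowledges only after everything downstream has acknowledged.
        return acks + [self.name]


def pipelined_write(block_id: str, data: bytes, pipeline: List[DataNodeSketch]) -> List[str]:
    """The client sends the block to the first DataNode only; returns acks in arrival order."""
    return pipeline[0].receive(block_id, data, pipeline[1:])


pipeline = [DataNodeSketch(name) for name in ("dn1", "dn7", "dn9")]
print(pipelined_write("A", b"contents of block A", pipeline))  # ['dn9', 'dn7', 'dn1']
```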
PIPELINED WRITE
[Figure: with the three replicas of Block A stored on DataNodes 1, 7 and 9, the DataNodes report "Block received" to the NameNode and the client receives "Success". The NameNode's metadata now records file.txt = Block A: DataNodes 1, 7, 9.]
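The metadata at the end of the write (file.txt = Block A on DataNodes 1, 7 and 9) is the NameNode's map from each file to its blocks and from each block to the DataNodes holding it. A minimal Python model follows; the class, method and block names are hypothetical, and the 128 MB block size is an assumption (it is the default in Hadoop 2 and later, while Hadoop 1 defaulted to 64 MB).

```python
from typing import Dict, List, Set, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes


def split_into_blocks(file_name: str, size: int, block_size: int = BLOCK_SIZE) -> List[str]:
    """Name the fixed-size blocks a file of `size` bytes occupies (last block may be partial)."""
    count = -(-size // block_size)  # ceiling division; an empty file has no blocks
    return [f"{file_name}#blk{i}" for i in range(count)]


class NamespaceSketch:
    """Hypothetical model of the NameNode metadata: file -> blocks -> DataNodes."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}
        self.locations: Dict[str, Set[int]] = {}

    def create(self, file_name: str, size: int) -> List[str]:
        blocks = split_into_blocks(file_name, size)
        self.files[file_name] = blocks
        return blocks

    def block_received(self, block_id: str, datanode: int) -> None:
        # Called when a DataNode reports that it has stored a replica.
        self.locations.setdefault(block_id, set()).add(datanode)

    def lookup(self, file_name: str) -> List[Tuple[str, List[int]]]:
        # What the NameNode can tell a reader: each block and where it lives.
        return [(b, sorted(self.locations.get(b, set()))) for b in self.files[file_name]]


ns = NamespaceSketch()
blocks = ns.create("file.txt", 300 * 1024 * 1024)  # 300 MB -> 3 blocks (A, B, C)
for dn in (1, 7, 9):
    ns.block_received(blocks[0], dn)
print(ns.lookup("file.txt"))
# [('file.txt#blk0', [1, 7, 9]), ('file.txt#blk1', []), ('file.txt#blk2', [])]
```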
HDFS READ
[Figure: the client tells the NameNode "I want to read file.txt". Using its metadata (file.txt = Block A: DataNodes 1, 7, 9), the NameNode replies "Block A is available at DataNodes [1, 7, 9]", and the client then reads the block directly from one of those DataNodes.]
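HDFS clients fetch block data directly from the DataNodes, and the NameNode returns each block's replica locations ordered by their network distance from the reader. The Python sketch below shows such an ordering, assuming a simple two-level rack/node topology in which the distance is 0 for the same node, 2 for the same rack and 4 otherwise; the function names are hypothetical.

```python
from typing import Dict, List


def distance(a: str, b: str, rack_of: Dict[str, str]) -> int:
    """Network distance in a two-level topology: same node 0, same rack 2, otherwise 4."""
    if a == b:
        return 0
    rack_a = rack_of.get(a)
    if rack_a is not None and rack_a == rack_of.get(b):
        return 2
    return 4


def sort_replicas(reader: str, replicas: List[str], rack_of: Dict[str, str]) -> List[str]:
    """Order replica locations so the reader tries the closest DataNode first."""
    return sorted(replicas, key=lambda dn: (distance(reader, dn, rack_of), dn))


rack_of = {"dn1": "rack1", "dn7": "rack5", "dn8": "rack5", "dn9": "rack5"}
print(sort_replicas("dn8", ["dn1", "dn7", "dn9"], rack_of))  # ['dn7', 'dn9', 'dn1']
print(sort_replicas("dn1", ["dn9", "dn7", "dn1"], rack_of))  # ['dn1', 'dn7', 'dn9']
```

A reader outside the cluster is at distance 4 from every replica, so in this sketch it simply gets the replicas in name order.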
HDFS SHELL COMMANDS
● bin/hadoop fs -ls
● bin/hadoop fs -mkdir
● bin/hadoop fs -copyFromLocal
● bin/hadoop fs -copyToLocal
● bin/hadoop fs -moveToLocal
● bin/hadoop fs -rm
● bin/hadoop fs -tail
● bin/hadoop fs -chmod
● bin/hadoop fs -setrep -w 4 -R /dir1/s-dir/
GOALS OF HDFS
Very Large Distributed File System
◦10K nodes, 100 million files, 10PB
Assumes Commodity Hardware
◦Files are replicated to handle hardware failure
◦Detect failures and recover from them
Optimized for Batch Processing
◦Data locations exposed so that computations can move to where data resides
◦Provides very high aggregate bandwidth
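Exposing block locations is what lets MapReduce move computation to the data. The greedy Python sketch below places one map task per block, preferring a node that holds a replica, then a node on the same rack as a replica, then any node with a free slot. It illustrates the idea only and is not Hadoop's scheduler; the function name, slot counts and locality labels are assumptions, and every node is assumed to appear in the rack map.

```python
from typing import Dict, List, Tuple


def assign_map_tasks(
    block_replicas: Dict[str, List[str]],
    free_slots: Dict[str, int],
    rack_of: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Greedily place one map task per block, preferring nodes close to the data.

    Returns block -> (chosen node, locality level).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    plan: Dict[str, Tuple[str, str]] = {}
    for block, replicas in block_replicas.items():
        replica_racks = {rack_of[r] for r in replicas}
        free = [n for n in sorted(slots) if slots[n] > 0]
        options = [
            ("node-local", [n for n in replicas if slots.get(n, 0) > 0]),
            ("rack-local", [n for n in free if rack_of[n] in replica_racks]),
            ("off-rack", free),
        ]
        for level, nodes in options:
            if nodes:
                node = nodes[0]
                slots[node] -= 1
                plan[block] = (node, level)
                break
        else:
            raise RuntimeError(f"no free slot for the task on block {block}")
    return plan


rack_of = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack3", "dn7": "rack5", "dn9": "rack5"}
blocks = {"A": ["dn1", "dn7", "dn9"], "B": ["dn1", "dn7", "dn9"], "C": ["dn7", "dn9", "dn1"]}
slots = {"dn1": 1, "dn2": 1, "dn3": 1, "dn7": 0, "dn9": 0}
print(assign_map_tasks(blocks, slots, rack_of))
# {'A': ('dn1', 'node-local'), 'B': ('dn2', 'rack-local'), 'C': ('dn3', 'off-rack')}
```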
SCALABILITY OF HADOOP
EASE FOR PROGRAMMERS
HADOOP VS. OTHER SYSTEMS
HADOOP USERS
TO LEARN MORE
Source code
◦http://hadoop.apache.org/version_control.html
◦http://svn.apache.org/viewvc/hadoop/common/trunk/
Hadoop releases
◦http://hadoop.apache.org/releases.html
Contribute to it
◦http://wiki.apache.org/hadoop/HowToContribute
CONCLUSION
HDFS provides a reliable, scalable and manageable solution for
working with huge amounts of data
Future-proof
HDFS has been deployed in clusters of 10 to 4K DataNodes
◦Used in production at companies such as Yahoo!, Facebook, Twitter and eBay
◦Many enterprises, including financial companies, use Hadoop
REFERENCES
[1] M. Zukowski, S. Heman, N. Nes and P. Boncz, "Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS," in VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases, pages 23–34, 2007.
[2] Tom White, Hadoop: The Definitive Guide, O'Reilly Media, Third Edition, May 2012.
[3] Jeffrey Shafer, Scott Rixner and Alan L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance," Rice University, Houston, TX.
[4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia and Robert Chansler, "The Hadoop Distributed File System," Yahoo!, Sunnyvale, California, USA.
[5] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz, "Efficient Big Data Processing in Hadoop MapReduce," Information Systems Group, Saarland University.
Thank you.
Queries
