SlideShare une entreprise Scribd logo
1  sur  21
www.protechskills.com
Protechskills
HDFS
Hadoop Distributed File System
www.protechskills.com
Protechskills
Topics Covered
 Design Goals
 Hadoop Blocks
 Rack Awareness, Replica Placement & Selection
 Permissions Model
 Anatomy of a File Write / Read on HDFS
 FileSystem Image and Edit Logs
 HDFS Check Pointing Process
 Directory Structure - NameNode, Secondary NameNode , DataNode
 Safe Mode
www.protechskills.com
Protechskills
HDFS Design Goals
 Hardware Failure - Detection of faults and quick, automatic recovery
 Streaming Data Access - High throughput of data access (Batch Processing of data)
 Large Data Sets - Gigabytes to terabytes in size.
 Simple Coherency Model – Write once read many access model for files
 Moving computation is cheaper than moving data
www.protechskills.com
Protechskills
NameNode
Datanode_1 Datanode_2 Datanode_3
HDFS
Block 1
HDFS
Block 2
HDFS
Block 3 Block 4
Storage & Replication of Blocks in HDFS
Filedividedintoblocks
Block 1
Block 2
Block 3
Block 4
www.protechskills.com
Protechskills
Blocks
 Minimum amount of data that can be read or write - 64 MB by default
 Minimize the cost of seeks
 A file can be larger than any single disk in the network
 Simplifies the storage subsystem – Same size & eliminating metadata concerns
 Provides fault tolerance and availability
www.protechskills.com
Protechskills
Hadoop Rack Awareness
 Get maximum performance out of Hadoop
cluster
 To provide reliable storage when dealing with
DataNode failure issues.
 Resolution of the slave's DNS name (also IP
address) to a rack id.
 Interface provided in Hadoop
DNSToSwitchMapping, Default implementation
is ScriptBasedMapping
Rack Topology - /rack1 & /rack2
www.protechskills.com
Protechskills
HADOOP_CONF={${HADOOP_HOME}/conf
while [ $# -gt 0 ] ; do
nodeArg=$1
exec< ${HADOOP_CONF}/topology.data
result=""
while read line ; do
ar=( $line )
if [ "${ar[0]}" = "$nodeArg" ] ; then
result="${ar[1]}"
fi
done
shift
if [ -z "$result" ] ; then
echo -n "/default/rack "
else
echo -n "$result "
fi
done
Sample Script
192.168.56.101 /dc1/rack1
192.168.56.102 /dc1/rack2
192.168.56.103 /dc1/rack2
Sample Data (topology.data)
Rack Awareness Configuration
File : hdfs-site.xml
Property : topology.node.switch.mapping.impl
Default Value : ScriptBasedMapping class
Property : topology.script.file.name
Value : <Absolute path to script file>
Sample Command : ./topology.sh 192.168.56.101
Output : /dc1/rack1
Ref Hadoop Wiki
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
www.protechskills.com
Protechskills
Replica Placement
 Critical to HDFS reliability and
performance
 Improve data reliability,
availability, and network
bandwidth utilization
Distance b/w Hadoop
Client and DataNode
Same Node : d=0
Same Rack : d=2
Same Data Centre
different rack
: d=4
Different Data Centre : d=6
www.protechskills.com
Protechskills
Replica Placement cont..
Default Strategy :
a) First replica on the same node as the client.
b) Second replica is placed on a different rack from the first (off-rack) chosen at random
c) Third replica is placed on the same rack as the second, but on a different node chosen at random.
d) Further replicas are placed on random nodes on the cluster
Replica Selection - HDFS tries to satisfy a read request from a replica that is closest to the
reader.
www.protechskills.com
Protechskills
Permissions Model
 Directory or file flag, permissions, replication, owner, group, file size, modification date,
and full path.
 User Name : ‘whoami’
 Group List : ‘bash -c groups’
 Super-User & Web Server
www.protechskills.com
Protechskills
Anatomy of a File Write
 Client creates the file by calling create()
method
 NameNode validates & processes the
create request
 Spilt file into packets (DataQueue)
 DataStreamer asks NameNode for
block / node mapping & pipelines are
created among nodes.
www.protechskills.com
Protechskills
Anatomy of a File Write (cont..)
 DataStreamer streams the packets to
the first DataNode
 DataNode forwards the copied packet
to the next DataNode in the pipeline
 DFSOutputStream also maintains the
ack queue and removes the packets
after receiving acknowledgement from
the DataNodes
 Client calls close() on the stream
www.protechskills.com
Protechskills
Anatomy of a File Read
 Client calls open() on the FileSystem
object
 DistributedFileSystem calls the
NameNode to determine the locations of
the blocks
 NameNode validates request & for each
block returns the list of DataNodes.
 DistributedFileSystem returns an input
stream that supports file seeks to the
client
www.protechskills.com
Protechskills
Anatomy of a File Read (cont..)
 Client calls read() on the stream
 When the end of the block is reached,
DFSInputStream will close the connection
to the DataNode, then find the best
DataNode for the next block.
 Client calls close() on the stream
www.protechskills.com
Protechskills
FileSystem Image and Edit Logs
 fsimage file is a persistent checkpoint of the file-system metadata
 When a client performs a write operation, it is first recorded in the edit log.
 The NameNode also has an in-memory representation of the files-ystem
metadata, which it updates after the edit log has been modified
 Secondary NameNode is used to produce checkpoints of the primary’s in-
memory files-ystem metadata
www.protechskills.com
Protechskills
Check Pointing Process
 Secondary NameNode asks the primary to roll its edits file,
so new edits go to a new file.
 NameNode sends the fsimage and edits (using HTTP GET).
 Secondary NameNode loads fsimage into memory, applies
each operation from edits, then creates a new consolidated
fsimage file.
 Secondary NameNode sends the new fsimage back to the
primary (using HTTP POST).
 Primary replaces the old fsimage with the new one.
Updates the fstime file to record the time for checkpoint.
www.protechskills.com
Protechskills
Directory Structure
NameNode (On NameNode only)
${dfs.name.dir}/current/VERSION
/edits
/fsimage
/fstime
Secondary NameNode (On SecNameNode Only)
${fs.checkpoint.dir}/current/VERSION
/edits
/fsimage
/fstime
/previous.checkpoint/VERSION
/edits
/fsimage
/fstime
DataNode (On all DataNodes)
${dfs.data.dir}/current/VERSION
/blk_<id_1>
/blk_<id_1>.meta
/blk_<id_2>
/blk_<id_2>.meta
/...
/blk_<id_64>
/blk_<id_64>.meta
/subdir0/
/subdir1/
/...
Block Count for a directory
dfs.datanode.numblocks property
www.protechskills.com
Protechskills
Safe Mode
 On start-up, NameNode loads its image file (fsimage) into memory and applies the edits from the edit
log (edits).
 It does the check pointing process itself. without recourse to the Secondary NameNode.
 Namenode is running in safe mode (offers only a read-only view to clients)
 The locations of blocks in the system are not persisted by the NameNode - this information resides with
the DataNodes, in the form of a list of the blocks it is storing.
 Safe mode is needed to give the DataNodes time to check in to the NameNode with their block lists
 Safe mode is exited when the minimal replication condition is reached, plus an extension time of 30
seconds.
www.protechskills.com
Protechskills
Safe Mode Properties
Property Type Default Value Description
dfs.safemode.threshold.pct float 0.999
The proportion of blocks in the system that must meet the minimum
replication level defined by dfs.replication.min before the namenode will exit
safe mode. Setting this value to 0 or less forces the namenode not to start in
safe mode. Setting this value to more than 1 means the namenode never exits
safe mode.
dfs.safemode.extension int 30, 000
The time, in milliseconds, to extend safe mode by after the minimum
replication condition defined by dfs.safemode.threshold.pct has been
satisfied. For small clusters (tens of nodes), it can be set to 0.
Syntax : hadoop dfsadmin –safemode [Options]
Options : get / wait / enter / leave
Safe Mode
Commands Options
www.protechskills.com
Protechskills
References
 Hadoop Apache Website – http://hadoop.apache.org/
 Hadoop The Definitive Guide 3rd Edition By Oreilly
www.protechskills.com
Protechskills

Contenu connexe

Tendances

Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDr. C.V. Suresh Babu
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsnehabsairam
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn
 
Weka presentation
Weka presentationWeka presentation
Weka presentationSaeed Iqbal
 
Dm from databases perspective u 1
Dm from databases perspective u 1Dm from databases perspective u 1
Dm from databases perspective u 1sakthyvel3
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 

Tendances (20)

Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...
 
Data mining
Data miningData mining
Data mining
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
RAID
RAIDRAID
RAID
 
Weka presentation
Weka presentationWeka presentation
Weka presentation
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Filepermissions in linux
Filepermissions in linuxFilepermissions in linux
Filepermissions in linux
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Dm from databases perspective u 1
Dm from databases perspective u 1Dm from databases perspective u 1
Dm from databases perspective u 1
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 7 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 

Similaire à Hadoop HDFS Concepts

Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiUnmesh Baile
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiUnmesh Baile
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemsrikanthhadoop
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 

Similaire à Hadoop HDFS Concepts (20)

Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
DFSNov1.pptx
DFSNov1.pptxDFSNov1.pptx
DFSNov1.pptx
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
Hadoop architecture by ajay
Hadoop architecture by ajayHadoop architecture by ajay
Hadoop architecture by ajay
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 

Dernier

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 

Dernier (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Hadoop HDFS Concepts

  • 2. www.protechskills.com Protechskills Topics Covered  Design Goals  Hadoop Blocks  Rack Awareness, Replica Placement & Selection  Permissions Model  Anatomy of a File Write / Read on HDFS  FileSystem Image and Edit Logs  HDFS Check Pointing Process  Directory Structure - NameNode, Secondary NameNode , DataNode  Safe Mode
  • 3. www.protechskills.com Protechskills HDFS Design Goals  Hardware Failure - Detection of faults and quick, automatic recovery  Streaming Data Access - High throughput of data access (Batch Processing of data)  Large Data Sets - Gigabytes to terabytes in size.  Simple Coherency Model – Write once read many access model for files  Moving computation is cheaper than moving data
  • 4. www.protechskills.com Protechskills NameNode Datanode_1 Datanode_2 Datanode_3 HDFS Block 1 HDFS Block 2 HDFS Block 3 Block 4 Storage & Replication of Blocks in HDFS Filedividedintoblocks Block 1 Block 2 Block 3 Block 4
  • 5. www.protechskills.com Protechskills Blocks  Minimum amount of data that can be read or write - 64 MB by default  Minimize the cost of seeks  A file can be larger than any single disk in the network  Simplifies the storage subsystem – Same size & eliminating metadata concerns  Provides fault tolerance and availability
  • 6. www.protechskills.com Protechskills Hadoop Rack Awareness  Get maximum performance out of Hadoop cluster  To provide reliable storage when dealing with DataNode failure issues.  Resolution of the slave's DNS name (also IP address) to a rack id.  Interface provided in Hadoop DNSToSwitchMapping, Default implementation is ScriptBasedMapping Rack Topology - /rack1 & /rack2
  • 7. www.protechskills.com Protechskills HADOOP_CONF={${HADOOP_HOME}/conf while [ $# -gt 0 ] ; do nodeArg=$1 exec< ${HADOOP_CONF}/topology.data result="" while read line ; do ar=( $line ) if [ "${ar[0]}" = "$nodeArg" ] ; then result="${ar[1]}" fi done shift if [ -z "$result" ] ; then echo -n "/default/rack " else echo -n "$result " fi done Sample Script 192.168.56.101 /dc1/rack1 192.168.56.102 /dc1/rack2 192.168.56.103 /dc1/rack2 Sample Data (topology.data) Rack Awareness Configuration File : hdfs-site.xml Property : topology.node.switch.mapping.impl Default Value : ScriptBasedMapping class Property : topology.script.file.name Value : <Absolute path to script file> Sample Command : ./topology.sh 192.168.56.101 Output : /dc1/rack1 Ref Hadoop Wiki http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
  • 8. www.protechskills.com Protechskills Replica Placement  Critical to HDFS reliability and performance  Improve data reliability, availability, and network bandwidth utilization Distance b/w Hadoop Client and DataNode Same Node : d=0 Same Rack : d=2 Same Data Centre different rack : d=4 Different Data Centre : d=6
  • 9. www.protechskills.com Protechskills Replica Placement cont.. Default Strategy : a) First replica on the same node as the client. b) Second replica is placed on a different rack from the first (off-rack) chosen at random c) Third replica is placed on the same rack as the second, but on a different node chosen at random. d) Further replicas are placed on random nodes on the cluster Replica Selection - HDFS tries to satisfy a read request from a replica that is closest to the reader.
  • 10. www.protechskills.com Protechskills Permissions Model  Directory or file flag, permissions, replication, owner, group, file size, modification date, and full path.  User Name : ‘whoami’  Group List : ‘bash -c groups’  Super-User & Web Server
  • 11. www.protechskills.com Protechskills Anatomy of a File Write  Client creates the file by calling create() method  NameNode validates & processes the create request  Spilt file into packets (DataQueue)  DataStreamer asks NameNode for block / node mapping & pipelines are created among nodes.
  • 12. www.protechskills.com Protechskills Anatomy of a File Write (cont..)  DataStreamer streams the packets to the first DataNode  DataNode forwards the copied packet to the next DataNode in the pipeline  DFSOutputStream also maintains the ack queue and removes the packets after receiving acknowledgement from the DataNodes  Client calls close() on the stream
  • 13. www.protechskills.com Protechskills Anatomy of a File Read  Client calls open() on the FileSystem object  DistributedFileSystem calls the NameNode to determine the locations of the blocks  NameNode validates request & for each block returns the list of DataNodes.  DistributedFileSystem returns an input stream that supports file seeks to the client
  • 14. www.protechskills.com Protechskills Anatomy of a File Read (cont..)  Client calls read() on the stream  When the end of the block is reached, DFSInputStream will close the connection to the DataNode, then find the best DataNode for the next block.  Client calls close() on the stream
  • 15. www.protechskills.com Protechskills FileSystem Image and Edit Logs  fsimage file is a persistent checkpoint of the file-system metadata  When a client performs a write operation, it is first recorded in the edit log.  The NameNode also has an in-memory representation of the files-ystem metadata, which it updates after the edit log has been modified  Secondary NameNode is used to produce checkpoints of the primary’s in- memory files-ystem metadata
  • 16. www.protechskills.com Protechskills Check Pointing Process  Secondary NameNode asks the primary to roll its edits file, so new edits go to a new file.  NameNode sends the fsimage and edits (using HTTP GET).  Secondary NameNode loads fsimage into memory, applies each operation from edits, then creates a new consolidated fsimage file.  Secondary NameNode sends the new fsimage back to the primary (using HTTP POST).  Primary replaces the old fsimage with the new one. Updates the fstime file to record the time for checkpoint.
  • 17. www.protechskills.com Protechskills Directory Structure NameNode (On NameNode only) ${dfs.name.dir}/current/VERSION /edits /fsimage /fstime Secondary NameNode (On SecNameNode Only) ${fs.checkpoint.dir}/current/VERSION /edits /fsimage /fstime /previous.checkpoint/VERSION /edits /fsimage /fstime DataNode (On all DataNodes) ${dfs.data.dir}/current/VERSION /blk_<id_1> /blk_<id_1>.meta /blk_<id_2> /blk_<id_2>.meta /... /blk_<id_64> /blk_<id_64>.meta /subdir0/ /subdir1/ /... Block Count for a directory dfs.datanode.numblocks property
  • 18. www.protechskills.com Protechskills Safe Mode  On start-up, NameNode loads its image file (fsimage) into memory and applies the edits from the edit log (edits).  It does the check pointing process itself. without recourse to the Secondary NameNode.  Namenode is running in safe mode (offers only a read-only view to clients)  The locations of blocks in the system are not persisted by the NameNode - this information resides with the DataNodes, in the form of a list of the blocks it is storing.  Safe mode is needed to give the DataNodes time to check in to the NameNode with their block lists  Safe mode is exited when the minimal replication condition is reached, plus an extension time of 30 seconds.
  • 19. www.protechskills.com Protechskills Safe Mode Properties Property Type Default Value Description dfs.safemode.threshold.pct float 0.999 The proportion of blocks in the system that must meet the minimum replication level defined by dfs.replication.min before the namenode will exit safe mode. Setting this value to 0 or less forces the namenode not to start in safe mode. Setting this value to more than 1 means the namenode never exits safe mode. dfs.safemode.extension int 30, 000 The time, in milliseconds, to extend safe mode by after the minimum replication condition defined by dfs.safemode.threshold.pct has been satisfied. For small clusters (tens of nodes), it can be set to 0. Syntax : hadoop dfsadmin –safemode [Options] Options : get / wait / enter / leave Safe Mode Commands Options
  • 20. www.protechskills.com Protechskills References  Hadoop Apache Website – http://hadoop.apache.org/  Hadoop The Definitive Guide 3rd Edition By Oreilly