SlideShare une entreprise Scribd logo
1  sur  25
www.protechskills.com
HDFS
Hadoop Distributed File System
www.protechskills.com
Topics Covered
 Design Goals
 Hadoop Blocks
 Rack Awareness, Replica Placement & Selection
 Permissions Model
 Anatomy of a File Write / Read on HDFS
 FileSystem Image and Edit Logs
 HDFS Check Pointing Process
 Directory Structure - NameNode, Secondary NameNode , DataNode
 Safe Mode, Trash, Name and Space Quota
www.protechskills.com
HDFS Design Goals
 Hardware Failure - Detection of faults and quick, automatic recovery
 Streaming Data Access - High throughput of data access (Batch Processing of data)
 Large Data Sets - Gigabytes to terabytes in size.
 Simple Coherency Model – Write once read many access model for files
 Moving computation is cheaper than moving data
www.protechskills.com
NameNode
Datanode_1 Datanode_2 Datanode_3
HDFS
Block 1
HDFS
Block 2
HDFS
Block 3 Block 4
Storage & Replication of Blocks in HDFS
Filedividedintoblocks
Block 1
Block 2
Block 3
Block 4
www.protechskills.com
Blocks
 Minimum amount of data that can be read or write - 64 MB by default
 Minimize the cost of seeks
 A file can be larger than any single disk in the network
 Simplifies the storage subsystem – Same size & eliminating metadata concerns
 Provides fault tolerance and availability
www.protechskills.com
Hadoop Rack Awareness
 Get maximum performance out of Hadoop
cluster
 To provide reliable storage when dealing with
DataNode failure issues.
 Resolution of the slave's DNS name (also IP
address) to a rack id.
 Interface provided in Hadoop
DNSToSwitchMapping, Default implementation
is ScriptBasedMapping
Rack Topology - /rack1 & /rack2
www.protechskills.com
HADOOP_CONF={${HADOOP_HOME}/conf
while [ $# -gt 0 ] ; do
nodeArg=$1
exec< ${HADOOP_CONF}/topology.data
result=""
while read line ; do
ar=( $line )
if [ "${ar[0]}" = "$nodeArg" ] ; then
result="${ar[1]}"
fi
done
shift
if [ -z "$result" ] ; then
echo -n "/default/rack "
else
echo -n "$result "
fi
done
Sample Script
192.168.56.101 /dc1/rack1
192.168.56.102 /dc1/rack2
192.168.56.103 /dc1/rack2
Sample Data (topology.data)
Rack Awareness Configuration
File : hdfs-site.xml
Property : topology.node.switch.mapping.impl
Default Value : ScriptBasedMapping class
Property : topology.script.file.name
Value : <Absolute path to script file>
Sample Command : ./topology.sh 192.168.56.101
Output : /dc1/rack1
Ref Hadoop Wiki
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
www.protechskills.com
Replica Placement
 Critical to HDFS reliability and
performance
 Improve data reliability,
availability, and network
bandwidth utilization
Distance b/w Hadoop
Client and DataNode
Same Node : d=0
Same Rack : d=2
Same Data Centre
different rack
: d=4
Different Data Centre : d=6
www.protechskills.com
Replica Placement cont..
Default Strategy :
a) First replica on the same node as the client.
b) Second replica is placed on a different rack from the first (off-rack) chosen at random
c) Third replica is placed on the same rack as the second, but on a different node chosen at random.
d) Further replicas are placed on random nodes on the cluster
Replica Selection - HDFS tries to satisfy a read request from a replica that is closest to the
reader.
www.protechskills.com
Permissions Model
 Directory or file flag, permissions, replication, owner, group, file size, modification date,
and full path.
 User Name : ‘whoami’
 Group List : ‘bash -c groups’
 Super-User & Web Server
www.protechskills.com
Anatomy of a File Write
 Client creates the file by calling create()
method
 NameNode validates & processes the
create request
 Spilt file into packets (DataQueue)
 DataStreamer asks NameNode for
block / node mapping & pipelines are
created among nodes.
www.protechskills.com
Anatomy of a File Write (cont..)
 DataStreamer streams the packets to
the first DataNode
 DataNode forwards the copied packet
to the next DataNode in the pipeline
 DFSOutputStream also maintains the
ack queue and removes the packets
after receiving acknowledgement from
the DataNodes
 Client calls close() on the stream
www.protechskills.com
Anatomy of a File Read
 Client calls open() on the FileSystem
object
 DistributedFileSystem calls the
NameNode to determine the locations of
the blocks
 NameNode validates request & for each
block returns the list of DataNodes.
 DistributedFileSystem returns an input
stream that supports file seeks to the
client
www.protechskills.com
Anatomy of a File Read (cont..)
 Client calls read() on the stream
 When the end of the block is reached,
DFSInputStream will close the connection
to the DataNode, then find the best
DataNode for the next block.
 Client calls close() on the stream
www.protechskills.com
FileSystem Image and Edit Logs
 fsimage file is a persistent checkpoint of the file-system metadata
 When a client performs a write operation, it is first recorded in the edit log.
 The NameNode also has an in-memory representation of the files-ystem
metadata, which it updates after the edit log has been modified
 Secondary NameNode is used to produce checkpoints of the primary’s in-
memory files-ystem metadata
www.protechskills.com
Check Pointing Process
 Secondary NameNode asks the primary to roll its edits file,
so new edits go to a new file.
 NameNode sends the fsimage and edits (using HTTP GET).
 Secondary NameNode loads fsimage into memory, applies
each operation from edits, then creates a new consolidated
fsimage file.
 Secondary NameNode sends the new fsimage back to the
primary (using HTTP POST).
 Primary replaces the old fsimage with the new one.
Updates the fstime file to record the time for checkpoint.
www.protechskills.com
Directory Structure
NameNode (On NameNode only)
${dfs.name.dir}/current/VERSION
/edits
/fsimage
/fstime
Secondary NameNode (On SecNameNode Only)
${fs.checkpoint.dir}/current/VERSION
/edits
/fsimage
/fstime
/previous.checkpoint/VERSION
/edits
/fsimage
/fstime
DataNode (On all DataNodes)
${dfs.data.dir}/current/VERSION
/blk_<id_1>
/blk_<id_1>.meta
/blk_<id_2>
/blk_<id_2>.meta
/...
/blk_<id_64>
/blk_<id_64>.meta
/subdir0/
/subdir1/
/...
Block Count for a directory
dfs.datanode.numblocks property
www.protechskills.com
Safe Mode
 On start-up, NameNode loads its image file (fsimage) into memory and applies the edits from the edit
log (edits).
 It does the check pointing process itself. without recourse to the Secondary NameNode.
 Namenode is running in safe mode (offers only a read-only view to clients)
 The locations of blocks in the system are not persisted by the NameNode - this information resides with
the DataNodes, in the form of a list of the blocks it is storing.
 Safe mode is needed to give the DataNodes time to check in to the NameNode with their block lists
 Safe mode is exited when the minimal replication condition is reached, plus an extension time of 30
seconds.
www.protechskills.com
HDFS Safe Mode
SafeMode Properties – Configure safe mode as per cluster recommendations.
SafeMode Admin Commands – A command option for dfsadmin command: hadoop dfsadmin -safemode
dfs.replication.min Default - 1 Minimum no. of replicas for file wrtie success
dfs.safemode.threshold.pct Default – 0.999 Min. portion of blocks statisfying the minimum replication
to leave safe mode. Value 0 – Forces NN not to start in
safemode Value 1 – Ensurs NN never leave safemode
dfs.safemode.extension Default – 30,000 Extension time in milliseconds before NN leaves safemode
get : Get the NameNode safemode status enter : NameNode enters safemode
Wait : Wait for NameNode to exit safemode leave : NameNode leaves the safemode
www.protechskills.com
HDFS Trash – Recycle Bin
When a file is deleted by a user, it is not immediately removed from HDFS. HDFS moves it to a file in the /trash directory.
A file remains in /trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the
file from the HDFS namespace.
Undelete a file: User needs to navigate the /trash directory and retrieve the file by using mv command.
File : core-site.xml
Property : fs.trash.interval
Description : Number of minutes after which the checkpoint gets deleted.
File : core-site.xml
Property : fs.trash.checkpoint.interval
Description : Number of minutes between trash checkpoints. Should be smaller or equal to fs.trash.interval.
www.protechskills.com
HDFS Quotas
Name Quota - a hard limit on the number of file and directory names in the tree rooted at that directory.
Space Quota - a hard limit on the number of bytes used by files in the tree rooted at that directory.
Reporting Quota - count command of the HDFS shell reports quota values and the current count of names and
bytes in use. With the -q option, also report the name quota value set for each directory, the available name quota
remaining, the space quota value set, and the available space quota remaining.
fs -count -q <directory>..
dfsadmin -setQuota <N> <directory>... Set the name quota to be N for each directory.
dfsadmin -clrQuota <directory>... Remove any name quota for each directory.
dfsadmin -setSpaceQuota <N> directory>.. Set the space quota to be N bytes for each directory.
dfsadmin -clrSpaceQuota <directory>... Remove any spce quota for each directory.
www.protechskills.com
FS Shell – Some Basic Commands
chgrp - Change group association of files.
Usage: hdfs dfs -chgrp [-R] GROUP URI [URI …]
chmod - Change the permissions of files.
Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI …]
chown - Change the owner of files.
Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]
du - Displays sizes of files and directories
Usage: hdfs dfs -du [-s] [-h] URI [URI …]
The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files.
The -h option will format file sizes in a "human-readable" fashion.
dus - Displays a summary of file lengths
Usage: hdfs dfs -dus <args>
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-
common/FileSystemShell.html
http://hadoop.apache.org/docs/r1.2.1/file_system_shell.html
www.protechskills.com
DfsAdmin Command
 bin/hadoop dfsadmin [Generic Options] [Command Options]
-safemode enter /
leave / get / wait
Safe mode maintenance command. Safe mode can also be entered manually, but then it can only be
turned off manually as well.
-report Reports basic filesystem information and statistics.
-refreshNodes Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the
Namenode and those that should be decommissioned or recommissioned.
-metasave filename Save Namenode's primary data structures to filename in the directory specified by hadoop.log.dir
property. filename is overwritten if it exists. filename will contain one line for each of the following
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currrently being replicated
4. Blocks waiting to be deleted
www.protechskills.com
References
 Hadoop Apache Website – http://hadoop.apache.org/
 Hadoop The Definitive Guide 3rd Edition By Oreilly
www.protechskills.com

Contenu connexe

Tendances

Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalitiesKrish_ver2
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Secondary storage structure
Secondary storage structureSecondary storage structure
Secondary storage structurePriya Selvaraj
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Tendances (20)

Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
IBM GPFS
IBM GPFSIBM GPFS
IBM GPFS
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Secondary storage structure
Secondary storage structureSecondary storage structure
Secondary storage structure
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similaire à Hadoop HDFS Concepts

Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiUnmesh Baile
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiUnmesh Baile
 
Docker: Aspects of Container Isolation
Docker: Aspects of Container IsolationDocker: Aspects of Container Isolation
Docker: Aspects of Container Isolationallingeek
 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxsunithachphd
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemsrikanthhadoop
 

Similaire à Hadoop HDFS Concepts (20)

Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop architecture by ajay
Hadoop architecture by ajayHadoop architecture by ajay
Hadoop architecture by ajay
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
 
Hadoop
HadoopHadoop
Hadoop
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
DFSNov1.pptx
DFSNov1.pptxDFSNov1.pptx
DFSNov1.pptx
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
Docker: Aspects of Container Isolation
Docker: Aspects of Container IsolationDocker: Aspects of Container Isolation
Docker: Aspects of Container Isolation
 
Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 

Dernier

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 

Dernier (20)

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

Hadoop HDFS Concepts

  • 2. www.protechskills.com Topics Covered  Design Goals  Hadoop Blocks  Rack Awareness, Replica Placement & Selection  Permissions Model  Anatomy of a File Write / Read on HDFS  FileSystem Image and Edit Logs  HDFS Check Pointing Process  Directory Structure - NameNode, Secondary NameNode , DataNode  Safe Mode, Trash, Name and Space Quota
  • 3. www.protechskills.com HDFS Design Goals  Hardware Failure - Detection of faults and quick, automatic recovery  Streaming Data Access - High throughput of data access (Batch Processing of data)  Large Data Sets - Gigabytes to terabytes in size.  Simple Coherency Model – Write once read many access model for files  Moving computation is cheaper than moving data
  • 4. www.protechskills.com NameNode Datanode_1 Datanode_2 Datanode_3 HDFS Block 1 HDFS Block 2 HDFS Block 3 Block 4 Storage & Replication of Blocks in HDFS Filedividedintoblocks Block 1 Block 2 Block 3 Block 4
  • 5. www.protechskills.com Blocks  Minimum amount of data that can be read or write - 64 MB by default  Minimize the cost of seeks  A file can be larger than any single disk in the network  Simplifies the storage subsystem – Same size & eliminating metadata concerns  Provides fault tolerance and availability
  • 6. www.protechskills.com Hadoop Rack Awareness  Get maximum performance out of Hadoop cluster  To provide reliable storage when dealing with DataNode failure issues.  Resolution of the slave's DNS name (also IP address) to a rack id.  Interface provided in Hadoop DNSToSwitchMapping, Default implementation is ScriptBasedMapping Rack Topology - /rack1 & /rack2
  • 7. www.protechskills.com HADOOP_CONF={${HADOOP_HOME}/conf while [ $# -gt 0 ] ; do nodeArg=$1 exec< ${HADOOP_CONF}/topology.data result="" while read line ; do ar=( $line ) if [ "${ar[0]}" = "$nodeArg" ] ; then result="${ar[1]}" fi done shift if [ -z "$result" ] ; then echo -n "/default/rack " else echo -n "$result " fi done Sample Script 192.168.56.101 /dc1/rack1 192.168.56.102 /dc1/rack2 192.168.56.103 /dc1/rack2 Sample Data (topology.data) Rack Awareness Configuration File : hdfs-site.xml Property : topology.node.switch.mapping.impl Default Value : ScriptBasedMapping class Property : topology.script.file.name Value : <Absolute path to script file> Sample Command : ./topology.sh 192.168.56.101 Output : /dc1/rack1 Ref Hadoop Wiki http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
  • 8. www.protechskills.com Replica Placement  Critical to HDFS reliability and performance  Improve data reliability, availability, and network bandwidth utilization Distance b/w Hadoop Client and DataNode Same Node : d=0 Same Rack : d=2 Same Data Centre different rack : d=4 Different Data Centre : d=6
  • 9. www.protechskills.com Replica Placement cont.. Default Strategy : a) First replica on the same node as the client. b) Second replica is placed on a different rack from the first (off-rack) chosen at random c) Third replica is placed on the same rack as the second, but on a different node chosen at random. d) Further replicas are placed on random nodes on the cluster Replica Selection - HDFS tries to satisfy a read request from a replica that is closest to the reader.
  • 10. www.protechskills.com Permissions Model  Directory or file flag, permissions, replication, owner, group, file size, modification date, and full path.  User Name : ‘whoami’  Group List : ‘bash -c groups’  Super-User & Web Server
  • 11. www.protechskills.com Anatomy of a File Write  Client creates the file by calling create() method  NameNode validates & processes the create request  Spilt file into packets (DataQueue)  DataStreamer asks NameNode for block / node mapping & pipelines are created among nodes.
  • 12. www.protechskills.com Anatomy of a File Write (cont..)  DataStreamer streams the packets to the first DataNode  DataNode forwards the copied packet to the next DataNode in the pipeline  DFSOutputStream also maintains the ack queue and removes the packets after receiving acknowledgement from the DataNodes  Client calls close() on the stream
  • 13. www.protechskills.com Anatomy of a File Read  Client calls open() on the FileSystem object  DistributedFileSystem calls the NameNode to determine the locations of the blocks  NameNode validates request & for each block returns the list of DataNodes.  DistributedFileSystem returns an input stream that supports file seeks to the client
  • 14. www.protechskills.com Anatomy of a File Read (cont..)  Client calls read() on the stream  When the end of the block is reached, DFSInputStream will close the connection to the DataNode, then find the best DataNode for the next block.  Client calls close() on the stream
  • 15. www.protechskills.com FileSystem Image and Edit Logs  fsimage file is a persistent checkpoint of the file-system metadata  When a client performs a write operation, it is first recorded in the edit log.  The NameNode also has an in-memory representation of the files-ystem metadata, which it updates after the edit log has been modified  Secondary NameNode is used to produce checkpoints of the primary’s in- memory files-ystem metadata
  • 16. www.protechskills.com Check Pointing Process  Secondary NameNode asks the primary to roll its edits file, so new edits go to a new file.  NameNode sends the fsimage and edits (using HTTP GET).  Secondary NameNode loads fsimage into memory, applies each operation from edits, then creates a new consolidated fsimage file.  Secondary NameNode sends the new fsimage back to the primary (using HTTP POST).  Primary replaces the old fsimage with the new one. Updates the fstime file to record the time for checkpoint.
  • 17. www.protechskills.com Directory Structure NameNode (On NameNode only) ${dfs.name.dir}/current/VERSION /edits /fsimage /fstime Secondary NameNode (On SecNameNode Only) ${fs.checkpoint.dir}/current/VERSION /edits /fsimage /fstime /previous.checkpoint/VERSION /edits /fsimage /fstime DataNode (On all DataNodes) ${dfs.data.dir}/current/VERSION /blk_<id_1> /blk_<id_1>.meta /blk_<id_2> /blk_<id_2>.meta /... /blk_<id_64> /blk_<id_64>.meta /subdir0/ /subdir1/ /... Block Count for a directory dfs.datanode.numblocks property
  • 18. www.protechskills.com Safe Mode  On start-up, NameNode loads its image file (fsimage) into memory and applies the edits from the edit log (edits).  It does the check pointing process itself. without recourse to the Secondary NameNode.  Namenode is running in safe mode (offers only a read-only view to clients)  The locations of blocks in the system are not persisted by the NameNode - this information resides with the DataNodes, in the form of a list of the blocks it is storing.  Safe mode is needed to give the DataNodes time to check in to the NameNode with their block lists  Safe mode is exited when the minimal replication condition is reached, plus an extension time of 30 seconds.
  • 19. www.protechskills.com HDFS Safe Mode SafeMode Properties – Configure safe mode as per cluster recommendations. SafeMode Admin Commands – A command option for dfsadmin command: hadoop dfsadmin -safemode dfs.replication.min Default - 1 Minimum no. of replicas for file wrtie success dfs.safemode.threshold.pct Default – 0.999 Min. portion of blocks statisfying the minimum replication to leave safe mode. Value 0 – Forces NN not to start in safemode Value 1 – Ensurs NN never leave safemode dfs.safemode.extension Default – 30,000 Extension time in milliseconds before NN leaves safemode get : Get the NameNode safemode status enter : NameNode enters safemode Wait : Wait for NameNode to exit safemode leave : NameNode leaves the safemode
  • 20. www.protechskills.com HDFS Trash – Recycle Bin When a file is deleted by a user, it is not immediately removed from HDFS. HDFS moves it to a file in the /trash directory. A file remains in /trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the file from the HDFS namespace. Undelete a file: User needs to navigate the /trash directory and retrieve the file by using mv command. File : core-site.xml Property : fs.trash.interval Description : Number of minutes after which the checkpoint gets deleted. File : core-site.xml Property : fs.trash.checkpoint.interval Description : Number of minutes between trash checkpoints. Should be smaller or equal to fs.trash.interval.
  • 21. www.protechskills.com HDFS Quotas Name Quota - a hard limit on the number of file and directory names in the tree rooted at that directory. Space Quota - a hard limit on the number of bytes used by files in the tree rooted at that directory. Reporting Quota - count command of the HDFS shell reports quota values and the current count of names and bytes in use. With the -q option, also report the name quota value set for each directory, the available name quota remaining, the space quota value set, and the available space quota remaining. fs -count -q <directory>.. dfsadmin -setQuota <N> <directory>... Set the name quota to be N for each directory. dfsadmin -clrQuota <directory>... Remove any name quota for each directory. dfsadmin -setSpaceQuota <N> directory>.. Set the space quota to be N bytes for each directory. dfsadmin -clrSpaceQuota <directory>... Remove any spce quota for each directory.
  • 22. www.protechskills.com FS Shell – Some Basic Commands chgrp - Change group association of files. Usage: hdfs dfs -chgrp [-R] GROUP URI [URI …] chmod - Change the permissions of files. Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI …] chown - Change the owner of files. Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ] du - Displays sizes of files and directories Usage: hdfs dfs -du [-s] [-h] URI [URI …] The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. The -h option will format file sizes in a "human-readable" fashion. dus - Displays a summary of file lengths Usage: hdfs dfs -dus <args> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop- common/FileSystemShell.html http://hadoop.apache.org/docs/r1.2.1/file_system_shell.html
  • 23. www.protechskills.com DfsAdmin Command  bin/hadoop dfsadmin [Generic Options] [Command Options] -safemode enter / leave / get / wait Safe mode maintenance command. Safe mode can also be entered manually, but then it can only be turned off manually as well. -report Reports basic filesystem information and statistics. -refreshNodes Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned. -metasave filename Save Namenode's primary data structures to filename in the directory specified by hadoop.log.dir property. filename is overwritten if it exists. filename will contain one line for each of the following 1. Datanodes heart beating with Namenode 2. Blocks waiting to be replicated 3. Blocks currrently being replicated 4. Blocks waiting to be deleted
  • 24. www.protechskills.com References  Hadoop Apache Website – http://hadoop.apache.org/  Hadoop The Definitive Guide 3rd Edition By Oreilly