Hadoop Architecture | Features and Objectives
What is Hadoop?
Hadoop is an Apache open-source framework written in Java that allows the distributed
processing of large datasets across clusters of computers using simple programming models.
A Hadoop application works in an environment that provides distributed storage and
computation across clusters of computers. Building on Google's solution, Doug Cutting and his
team developed an open-source project named Hadoop. Using the MapReduce algorithm, Hadoop
runs applications in which the data is processed in parallel on different nodes. In short,
Hadoop is used to develop applications that can perform a complete statistical analysis of
huge amounts of data.
Architecture of Hadoop
Hadoop has two major layers:
• Processing/computation layer (MapReduce), and
• Storage layer (Hadoop Distributed File System, HDFS).
MapReduce
MapReduce is a parallel programming model used for writing distributed applications. The
model was devised at Google for efficient processing of large amounts of data on large
clusters of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run
on the Hadoop framework.
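To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It does not use Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and exist only for illustration. On a real cluster, each input split would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Here the map calls run one after another; a cluster would run them in parallel.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data in parallel"])
print(counts)
```

Because the map step looks at one line at a time and the reduce step looks at one key at a time, both can be spread over many machines without changing the logic.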
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS). It
provides a distributed file system that is designed to run on commodity hardware. HDFS has
many similarities with existing distributed file systems. It is highly fault-tolerant and is
designed to be deployed on low-cost hardware. It provides high-throughput access to
application data.
The following two modules are also included in the Hadoop framework:
a. Hadoop Common
These are the Java libraries and utilities required by other Hadoop modules.
b. Hadoop YARN
This is a framework for job scheduling and cluster resource management.
How Does Hadoop Work?
It is quite expensive to build bigger servers with heavy configurations that handle
large-scale processing. As an alternative, we can use Hadoop to tie together many commodity
computers into a cluster, which is cheaper than one high-end server. The major factor behind
using Hadoop is therefore that it runs across clustered, low-cost machines.
Hadoop performs the following core tasks:
1. Data is first organized into directories and files. Files are further divided into
uniform-sized blocks of 128 MB or 64 MB (preferably 128 MB).
2. The blocks of these files are then distributed across various cluster nodes for further
processing.
3. HDFS, sitting on top of the local file system, supervises the processing.
4. All blocks are replicated to handle hardware failure.
5. Hadoop checks that the code was executed successfully.
6. It performs the sort that takes place between the map and reduce stages.
7. It sends the sorted data to a certain computer.
8. It writes the debugging logs for each job.
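Steps 1, 2, and 4 above (split, distribute, replicate) can be sketched as follows. This is a toy model, not HDFS code: the block size is a few bytes instead of megabytes, the node names are invented, the replication factor of 3 is an assumption of the sketch, and the round-robin placement is a simple stand-in for whatever placement policy a real cluster applies.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the block ids that still have a replica after one node fails."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


# Hypothetical toy input: 10 bytes, 4-byte blocks, four made-up nodes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
print("blocks readable after node-a fails:", surviving_blocks(placement, "node-a"))
```

Since every block lives on more than one node, losing a single machine leaves every block readable, which is the point of step 4.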
HDFS was developed using a distributed file system design. It runs on commodity hardware.
Compared to other distributed systems, HDFS is highly fault-tolerant and designed for
low-cost hardware.
HDFS holds very large amounts of data and provides easy access to it. To store such huge
data, the files are stored across multiple machines. HDFS also makes the data available to
applications for parallel processing.
Features of HDFS
1. Hadoop provides a command interface to interact with HDFS.
2. Users can easily check the status of the cluster with the help of the name node and data nodes.
3. HDFS provides streaming access to file system data.
4. HDFS provides file permissions and authentication.
HDFS Architecture
HDFS mainly follows a master-slave architecture.
Name node
The name node is commodity hardware that runs the GNU/Linux operating system and the name
node software. The name node software itself can run on commodity hardware. The name node
acts as the master and performs the following tasks:
a. It manages the file system namespace.
b. It regulates clients' access to files.
c. It executes file system operations such as renaming, closing, and opening files and
directories.
Data node
A data node is commodity hardware that runs the GNU/Linux operating system and the data node
software. There is a data node for every node in a cluster. These nodes manage the data
storage of their system.
a. Data nodes perform read and write operations on the file system, as per client request.
b. They also perform operations such as block creation, deletion, and replication.
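The division of labour between the name node and the data nodes can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the HDFS implementation: the class and function names (NameNode, DataNode, allocate, write_file, read_file) are hypothetical, there is no replication, and everything runs in one process.

```python
class DataNode:
    """Stores block contents; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.namespace = {}  # path -> list of (block_id, DataNode)
        self.next_block_id = 0

    def allocate(self, path, num_blocks):
        entries = []
        for _ in range(num_blocks):
            node = self.data_nodes[self.next_block_id % len(self.data_nodes)]
            entries.append((self.next_block_id, node))
            self.next_block_id += 1
        self.namespace[path] = entries
        return entries

    def rename(self, old_path, new_path):
        # A rename changes metadata only; no block is moved.
        self.namespace[new_path] = self.namespace.pop(old_path)

    def locate(self, path):
        return self.namespace[path]


def write_file(name_node, path, data, block_size):
    """Client write: get block locations from the name node, send data to data nodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for (block_id, node), chunk in zip(name_node.allocate(path, len(chunks)), chunks):
        node.write_block(block_id, chunk)


def read_file(name_node, path):
    """Client read: ask the name node where the blocks are, then read them in order."""
    return b"".join(node.read_block(bid) for bid, node in name_node.locate(path))


data_nodes = [DataNode("dn1"), DataNode("dn2")]
name_node = NameNode(data_nodes)
write_file(name_node, "/logs/a.txt", b"hello hadoop", block_size=5)
name_node.rename("/logs/a.txt", "/logs/b.txt")
content = read_file(name_node, "/logs/b.txt")
print(content)
```

Note that the file contents never pass through the name node in this model; it only answers the question of which data node holds which block.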
Block
A file in HDFS is divided into one or more segments, which are stored on individual data
nodes. These file segments are called blocks. In other words, a block is the minimum amount
of data that HDFS can read or write. The default block size is 64 MB in Hadoop 1.x and 128 MB
in Hadoop 2.x and later, and it can be changed as needed in the HDFS configuration.
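The effect of the block size is simple arithmetic, sketched below for the two sizes this article mentions (64 MB and 128 MB). The 300 MB file size is a made-up example, and block_layout is a hypothetical helper, not part of Hadoop.

```python
MB = 1024 * 1024


def block_layout(file_size, block_size):
    """Return (number of blocks, size of the last block) for a file of file_size bytes."""
    if file_size <= 0:
        return 0, 0
    num_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    last_block = file_size - (num_blocks - 1) * block_size
    return num_blocks, last_block


for block_mb in (64, 128):
    count, last = block_layout(300 * MB, block_mb * MB)
    print(f"{block_mb} MB blocks: {count} blocks, last block {last // MB} MB")
```

A 300 MB file therefore needs 5 blocks at 64 MB but only 3 blocks at 128 MB, and in both cases the final block holds just the remaining 44 MB.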
Objectives of HDFS
1. Fault detection and recovery
Because HDFS runs on a large number of commodity hardware components, component failures are
likely. HDFS should therefore have mechanisms for quick and automatic fault detection and
recovery.
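One way to picture automatic fault detection and recovery is a heartbeat check: each data node reports in periodically, a node that stays silent for too long is treated as failed, and the blocks it held are scheduled for re-replication. The sketch below assumes that scheme; the 30-second timeout, the node names, and the function names are hypothetical values chosen for illustration.

```python
def find_dead_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose last heartbeat is older than `timeout` seconds."""
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout)


def blocks_to_re_replicate(placement, dead_nodes, replication):
    """Map block id -> number of new replicas needed to get back to the target."""
    dead = set(dead_nodes)
    needed = {}
    for block_id, holders in placement.items():
        live = [node for node in holders if node not in dead]
        if len(live) < replication:
            needed[block_id] = replication - len(live)
    return needed


# Hypothetical cluster state: timestamps are seconds on an arbitrary clock.
last_heartbeat = {"dn1": 100.0, "dn2": 97.0, "dn3": 60.0}
placement = {0: ["dn1", "dn2"], 1: ["dn2", "dn3"], 2: ["dn3", "dn1"]}

dead = find_dead_nodes(last_heartbeat, now=100.0, timeout=30.0)
needed = blocks_to_re_replicate(placement, dead, replication=2)
print("dead nodes:", dead)
print("replicas to create:", needed)
```

Here dn3 has been silent for 40 seconds, so blocks 1 and 2 each need one new replica on a healthy node.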
2. Huge datasets
To manage applications with huge datasets, HDFS should scale to hundreds of nodes per
cluster.
3. Hardware at data
A requested task can be done efficiently when the computation takes place near the data.
This reduces network traffic and increases throughput.
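The idea of moving computation to the data can be sketched as a scheduling decision: prefer a node that already stores the block, and fall back to another node only when no local slot is free. This is a naive greedy sketch, not Hadoop's scheduler; assign_tasks, the block ids, and the slot counts are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring nodes that already hold the block.

    Returns block_id -> (node, is_local). Greedy and order-dependent by design.
    """
    slots = dict(free_slots)
    assignment = {}
    for block_id, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            # The block must be copied over the network to this node.
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


block_locations = {"b0": ["dn1", "dn2"], "b1": ["dn2", "dn3"], "b2": ["dn1"]}
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1}

assignment = assign_tasks(block_locations, free_slots)
local_tasks = sum(1 for _, is_local in assignment.values() if is_local)
print(assignment)
print(f"{local_tasks} of {len(assignment)} tasks read their block locally")
```

Every task that runs locally avoids shipping a whole block across the network, which is where the reduction in traffic comes from.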
Hadoop Advantages :-
1. Varied data sources
2. Availability
3. Scalable
4. Cost effective
5. Low network traffic
6. Ease of use
7. Performance
8. High throughput
9. Compatibility
10. Fault tolerant
11. Open source
12. Multi-Language support
Limitations of Hadoop:-
1. Issues with small files
2. Slow processing speed
3. Latency
4. Security
5. No real time data processing
6. Uncertainty
7. Lengthy code
8. Not easy to use
9. No caching
10. Supports only batch processing
Summary:-
This brings us to the end of this article on Hadoop. In this article you have learned what
Hadoop is, along with the architecture of Hadoop, the features of HDFS, and the HDFS
architecture. We have also come up with a curriculum that covers exactly what you would need
to become an expert in Hadoop development. You can have a look at the course details for Hadoop.