MAPREDUCE
Hadoop MapReduce paradigm
• Hadoop is an open-source software framework
for storing and processing large datasets ranging
in size from gigabytes to petabytes.
• Developed at the Apache Software Foundation.
• Hadoop has two basic components:
1. Massive data storage
2. Faster data processing
Hadoop MapReduce paradigm
• Hadoop Distributed File System (HDFS):
• It allows you to store data of various formats
across a cluster.
• MapReduce:
• The data processing layer of Hadoop (in Hadoop 1.x it
also handles resource management). It allows
parallel processing over the data stored across
HDFS.
History of Hadoop
Why Hadoop?
• Cost Effective System
• Computing power
• Scalability
• Storage flexibility
• Inherent data protection
• Varied Data Sources
• Fault-Tolerant
• Highly Available
• Low Network Traffic
• High Throughput
• Multiple Languages Supported
Disadvantages of Hadoop
• Issue With Small Files
• Vulnerable By Nature
• Processing Overhead
• Supports Only Batch Processing
• Iterative Processing
• Security
Traditional restaurant scenario
Traditional Scenario
Distributed Processing Scenario
Distributed Processing Scenario Failure
Solution of Restaurant problem
Hadoop in Restaurant Analogy
Map tasks
• Process independent chunks in a parallel manner
• Output of each map task is stored as intermediate data on
the local disk of that server
Reduce task
• Output of the mappers is automatically shuffled and stored
by the framework
• Sorts the output based on key
• Provides reduced output by combining the output
of various mappers
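The map and reduce tasks described above can be sketched in plain Java, without any Hadoop dependency. This is only an in-memory illustration under stated assumptions: the class name WordCountSketch, the word-count job, and the sample chunks are hypothetical, and the "parallel" map tasks simply run one after another here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; not the Hadoop API.
public class WordCountSketch {

    // Map task: turn one independent chunk into intermediate (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort: group the values of all mappers by key.
    // A TreeMap keeps the keys sorted.
    public static TreeMap<String, List<Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce task: combine all values of one key into a single result.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static TreeMap<String, Integer> run(List<String> chunks) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String chunk : chunks) {
            mapOutputs.add(map(chunk)); // on a cluster these would run in parallel
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffle(mapOutputs).forEach((key, values) -> result.put(key, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        // prints {bear=2, car=3, deer=2, river=2}
        System.out.println(run(List.of("deer bear river", "car car river", "deer car bear")));
    }
}
```

Each chunk is mapped independently, which is what lets a real cluster spread map tasks across servers; only the shuffle step needs to see the output of every mapper.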
Map-reduce daemons
1. JobTracker
2. TaskTracker
JobTracker
• Master daemon
• Single JobTracker per Hadoop cluster
• Provides connectivity between Hadoop and the client
application
• Creates the execution plan (which task to assign to
which node)
• Monitors all running tasks
• Reschedules a task if it fails
TaskTracker
• Responsible for executing the individual tasks
assigned to it by the JobTracker
• One TaskTracker per slave node
• Periodically sends heartbeat messages to the
JobTracker
• If the JobTracker stops receiving heartbeats, the tasks
are allocated to other TaskTrackers
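The heartbeat rule can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the class HeartbeatSketch, the timeout value, the tracker and task names, and the policy of handing orphaned tasks to the first live tracker. The real JobTracker scheduling logic is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of heartbeat-based failure detection; not Hadoop code.
public class HeartbeatSketch {
    private final long timeoutMillis; // assumed expiry interval
    private final TreeMap<String, Long> lastHeartbeat = new TreeMap<>();
    private final Map<String, List<String>> assigned = new HashMap<>();
    private final List<String> pending = new ArrayList<>();

    public HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a TaskTracker reports in.
    public void heartbeat(String tracker, long now) {
        lastHeartbeat.put(tracker, now);
    }

    public void assign(String tracker, String task) {
        assigned.computeIfAbsent(tracker, k -> new ArrayList<>()).add(task);
    }

    public List<String> tasksOf(String tracker) {
        return assigned.getOrDefault(tracker, List.of());
    }

    // Declare silent trackers dead and hand their tasks to a live tracker.
    public void checkTrackers(long now) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        for (String tracker : dead) {
            lastHeartbeat.remove(tracker);
            List<String> orphaned = assigned.remove(tracker);
            if (orphaned != null) {
                pending.addAll(orphaned);
            }
        }
        // Tasks stay pending until at least one tracker is alive.
        if (!pending.isEmpty() && !lastHeartbeat.isEmpty()) {
            String target = lastHeartbeat.firstKey();
            assigned.computeIfAbsent(target, k -> new ArrayList<>()).addAll(pending);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch jobTracker = new HeartbeatSketch(10_000);
        jobTracker.heartbeat("tt1", 0);
        jobTracker.heartbeat("tt2", 0);
        jobTracker.assign("tt1", "map-0");
        jobTracker.assign("tt2", "map-1");
        jobTracker.heartbeat("tt1", 9_000); // tt2 stays silent
        jobTracker.checkTrackers(12_000);
        System.out.println(jobTracker.tasksOf("tt1")); // prints [map-0, map-1]
    }
}
```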
Map-reduce execution pipeline
Mapper
• The mapper maps the input key-value pairs into a set of
intermediate key-value pairs
• Phases:
1. RecordReader:
• Converts the input split into key-value pairs
• <key, value> = <positional information, chunk of
data that constitutes the record>
2. Map:
• Generates zero or more intermediate key-value pairs
3. Combiner
• Optimization technique for a MapReduce job;
applies a user-specified aggregate function to the
output of only that mapper
• Also known as a local reducer
4. Partitioner
• Takes the intermediate key-value pairs and assigns
each one to a partition
• The number of partitions is equal to the
number of reducers
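A minimal sketch of the combiner and partitioner phases follows. The class and method names are hypothetical and the code uses plain Java strings. The partition formula mirrors the one used by Hadoop's default HashPartitioner, but because a Hadoop Text key hashes differently from a Java String, the partition numbers on a real cluster would not necessarily match.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the Hadoop Combiner or Partitioner classes.
public class CombinePartitionSketch {

    // Combiner: sum the counts emitted by ONE mapper before they leave
    // the node, so fewer pairs cross the network.
    public static Map<String, Integer> combine(List<String> wordsFromOneMapper) {
        Map<String, Integer> local = new TreeMap<>();
        for (String word : wordsFromOneMapper) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Partitioner: choose the reducer for a key. Masking the sign bit
    // keeps the result non-negative even when hashCode() is negative.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        Map<String, Integer> combined = combine(List.of("car", "car", "river"));
        combined.forEach((word, count) ->
                System.out.println(word + "\t" + count + "\t-> reducer " + partition(word, 2)));
    }
}
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer, regardless of which mapper produced it.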
Reducer
1. Shuffle and sort:
• Consumes the output of the mapping phase
• Consolidates the relevant records from the mapping
phase output
• In a word count, for example, identical words are
grouped together along with their respective frequencies
2. Reducer:
• Takes the grouped data produced by the shuffle and sort phase
• Applies the reduce function
• Processes one group at a time
• The reduce function iterates over all the values associated with that key
• Typical operations: aggregation, filtering, combining
3. Output format:
• Separates each key and value with a tab
• Writes the pairs out to a file using a record writer
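The reduce and output steps can be sketched in plain Java as follows. The class ReduceOutputSketch and the sample groups are hypothetical, the aggregation is assumed to be a sum, and the result is returned as a string instead of being written to a file by a record writer.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the Hadoop Reducer or OutputFormat classes.
public class ReduceOutputSketch {

    // Reduce function: iterate over all values of one key and aggregate them.
    public static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Process one group at a time and format each result as key<TAB>value.
    public static String run(Map<String, List<Integer>> grouped) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int result = reduce(group.getKey(), group.getValue().iterator());
            out.append(group.getKey()).append('\t').append(result).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("bear", List.of(1, 1));
        grouped.put("car", List.of(1, 1, 1));
        System.out.print(run(grouped));
    }
}
```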
API
• Main Class file Packages
• Mapper Class Packages
• Reducer Class Packages
Main class file packages
• import org.apache.hadoop.conf.Configured; (base class for objects that carry a Configuration)
• import org.apache.hadoop.fs.Path; (names a file or directory in a file system)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.Text; (to read and write text)
• import org.apache.hadoop.mapred.FileInputFormat; (MapRed file input format)
• import org.apache.hadoop.mapred.FileOutputFormat; (MapRed file output format)
• import org.apache.hadoop.mapred.JobClient; (submits the job and tracks its progress)
• import org.apache.hadoop.mapred.JobConf; (holds the configuration of a MapReduce job)
• import org.apache.hadoop.util.Tool; (interface for tools that support the generic command-line options)
• import org.apache.hadoop.util.ToolRunner; (utility class used to call the run function of a Tool)
Mapper File Packages
• import java.io.IOException; (exception handling)
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.LongWritable; (Writable wrapper for long values, beyond the int range)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer implementations)
• import org.apache.hadoop.mapred.Mapper; (Mapper interface)
• import org.apache.hadoop.mapred.OutputCollector; (collects the output key-value pairs)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status information)
Reducer File Packages
• import java.io.IOException; (exception handling)
• import java.util.Iterator; (iterates over the values of a key with hasNext() and next())
• import org.apache.hadoop.io.IntWritable; (Writable wrapper for int values)
• import org.apache.hadoop.io.Text; (input and output text)
• import org.apache.hadoop.mapred.MapReduceBase; (base class for Mapper and Reducer implementations)
• import org.apache.hadoop.mapred.OutputCollector; (collects the output key-value pairs)
• import org.apache.hadoop.mapred.Reducer; (Reducer interface)
• import org.apache.hadoop.mapred.Reporter; (reports progress and status information)
Hadoop 2.0 features
• HDFS Federation – horizontal scalability of
NameNode
• NameNode High Availability – NameNode is no
longer a Single Point of Failure
• YARN – ability to process terabytes and
petabytes of data available in HDFS using
non-MapReduce applications such as MPI and Giraph
• Resource Manager – splits up the two major
functionalities of the overburdened JobTracker
(resource management and job
scheduling/monitoring) into two separate
daemons: a global Resource Manager and a
per-application ApplicationMaster
• Capacity Scheduler
• Data Snapshot
• Support for Windows
NameNode high availability
• In Hadoop 1.x, the NameNode was a single point of failure
• Hadoop administrators had to recover the NameNode
manually using the Secondary NameNode.
• The Hadoop 2.0 architecture supports multiple
NameNodes to remove this single point of failure
• Passive Standby NameNode support.
• In case of Active NameNode failure, the passive
NameNode becomes the Active NameNode and
starts writing to the shared storage
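The failover behaviour can be pictured with a toy state machine. This is a sketch under loud assumptions: the class NameNodeFailoverSketch, the node names nn1 and nn2, and the in-memory list standing in for the shared storage are all hypothetical, and real HDFS failover involves additional coordination that is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of active/standby failover; not HDFS code.
public class NameNodeFailoverSketch {
    enum State { ACTIVE, STANDBY, FAILED }

    static class NameNode {
        final String name;
        State state;

        NameNode(String name, State state) {
            this.name = name;
            this.state = state;
        }
    }

    private final List<NameNode> nodes = List.of(
            new NameNode("nn1", State.ACTIVE),
            new NameNode("nn2", State.STANDBY));

    // Stand-in for the shared storage that both NameNodes can reach.
    private final List<String> sharedEditLog = new ArrayList<>();

    private NameNode active() {
        for (NameNode n : nodes) {
            if (n.state == State.ACTIVE) {
                return n;
            }
        }
        throw new IllegalStateException("no active NameNode");
    }

    public String activeName() {
        return active().name;
    }

    // Only the active NameNode writes to the shared storage.
    public void write(String edit) {
        sharedEditLog.add(active().name + ": " + edit);
    }

    // The active node fails and the standby is promoted.
    public void failActive() {
        active().state = State.FAILED;
        for (NameNode n : nodes) {
            if (n.state == State.STANDBY) {
                n.state = State.ACTIVE;
                return;
            }
        }
    }

    public List<String> editLog() {
        return sharedEditLog;
    }

    public static void main(String[] args) {
        NameNodeFailoverSketch cluster = new NameNodeFailoverSketch();
        cluster.write("mkdir /a");
        cluster.failActive();
        cluster.write("mkdir /b");
        System.out.println(cluster.editLog()); // prints [nn1: mkdir /a, nn2: mkdir /b]
    }
}
```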
YARN (Yet Another Resource Negotiator)
• The main idea is to split the JobTracker's
responsibilities of resource management and job
scheduling into separate daemons.
YARN daemons
1. Global Resource Manager:
a) Scheduler (allocates resources among the
various running applications)
b) Applications Manager (accepts job
submissions, restarts the Application Master in
case of failure)
2. Node Manager:
• Per-machine slave daemon
• Launches application containers for application
execution
• Reports resource usage to the global Resource
Manager
3. Application Master:
• Application-specific entity
• Negotiates the required resources for execution from
the Resource Manager
• Works with the Node Manager to execute and
monitor component tasks
YARN
YARN workflow
1. The client submits an application
2. The Resource Manager allocates a container to start the
Application Master
3. The Application Master registers itself with the Resource
Manager
4. The Application Master negotiates containers from the Resource
Manager
5. The Application Master notifies the Node Manager to launch
containers
6. Application code is executed in the container
7. The client contacts the Resource Manager/Application Master to
monitor the application's status
8. Once the processing is complete, the Application Master
unregisters with the Resource Manager
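The eight steps can be traced with a very small simulation. It is a simplified sketch, not the YARN API: the class YarnWorkflowSketch, the container counts, and the log messages are assumptions, all steps run in a single thread, and the application simply proceeds with however many containers it is granted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded trace of the workflow; not YARN code.
public class YarnWorkflowSketch {
    private final List<String> log = new ArrayList<>();
    private int freeContainers; // capacity tracked by the Resource Manager (RM)

    public YarnWorkflowSketch(int totalContainers) {
        this.freeContainers = totalContainers;
    }

    // RM side: grant as many containers as are free, up to the request.
    private int allocate(int wanted) {
        int granted = Math.min(wanted, freeContainers);
        freeContainers -= granted;
        return granted;
    }

    public List<String> runApplication(String app, int tasks) {
        log.add("1 client submits " + app);
        if (allocate(1) == 0) {
            log.add("rejected: no free container for the Application Master");
            return log;
        }
        log.add("2 RM allocates a container for the Application Master (AM)");
        log.add("3 AM registers with RM");
        int granted = allocate(tasks);
        log.add("4 AM negotiates " + granted + " container(s) from RM");
        log.add("5 AM asks Node Manager to launch " + granted + " container(s)");
        log.add("6 application code runs in the container(s)");
        log.add("7 client polls RM/AM for status");
        freeContainers += granted + 1; // task containers plus the AM container
        log.add("8 AM unregisters from RM");
        return log;
    }

    public int freeContainers() {
        return freeContainers;
    }

    public static void main(String[] args) {
        YarnWorkflowSketch cluster = new YarnWorkflowSketch(4);
        cluster.runApplication("wordcount", 5).forEach(System.out::println);
    }
}
```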
Hadoop Map-Reduce from the subject: Big Data Analytics
