SlideShare une entreprise Scribd logo
1  sur  13
MapReduce
Introduction to MapReduce
• MapReduce is a programming model and an
associated implementation for processing
and generating large data sets with a
parallel, distributed algorithm on a cluster.
• Map() procedure that performs filtering and
sorting.
• Reduce() procedure that performs a
summary operation (such as statistical
operations)
What is MapReduce in Hadoop?
• Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing
environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
• MapReduce is defined as the framework of Hadoop, which is used
to process a huge amount of data parallelly on large clusters of
commodity hardware in a reliable manner. It allows the application
to store the data in the distributed form and process large dataset
across groups of computers using simple programming models
Why use MapReduce?
MapReduce Architecture
How MapReduce in Hadoop works?
Steps of MapReduce in Hadoop
• Map stage − The map or mapper’s job is to process the input
data. Generally, the input data is in the form of file or directory
and is stored in the Hadoop file system (HDFS). The input file is
passed to the mapper function line by line. The mapper
processes the data and creates several small chunks of data.
• Reduce stage − This stage is the combination of
the Shuffle stage and the Reduce stage. The Reducer’s job is to
process the data that comes from the mapper. After processing,
it produces a new set of output, which will be stored in the
HDFS.
Algorithm Behind MapReduce
MapReduce Jobs
Application Master
• Responsible for the
execution of a single
application or MapReduce
job
• Divides job requests into
tasks and assigns them to
Node-Managers running on
the slave node
Node-Manager
• Has many dynamic
resource containers
which executes each
active map or reduce
task
• Communicates regularly
with the Application
Master
MapReduce jobs workflow
Advantage of MapReduce
• Fault tolerance: It can handle failures without downtime.
Speed: It splits, shuffles, and reduces the unstructured data in a short
time.
• Cost-effective: Hadoop MapReduce has a scale-out feature that enables
users to process or store the data in a cost-effective manner.
• Scalability: It provides a highly scalable framework. MapReduce allows
users to run applications from many nodes.
• Parallel processing: As Hadoop storage data in the distributed file system
and the MapReduce program’s working, it divides tasks task map and
reduce and that could execute in parallel. And again, because of the
parallel execution, it reduces the entire run time.
Limitations Of MapReduce
• MapReduce cannot cache the intermediate data in memory
for a further requirement which diminishes the performance
of Hadoop.
• It is only suitable for Batch Processing of a Huge amounts
of Data, and it is not flexible, means the MapReduce framework
is rigid.
• A lot of manual coding is required, even for common operations
such as join, filter, projection, aggregates, sorting, distinct...
• Semantics are hidden inside the map and reduce functions, so it
is difficult to maintain, extend and optimize them.
Mapreduce Hadop.pptx

Contenu connexe

Similaire à Mapreduce Hadop.pptx

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Nisanth Simon
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 

Similaire à Mapreduce Hadop.pptx (20)

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Anju
AnjuAnju
Anju
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Hadoop
HadoopHadoop
Hadoop
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Analytics 3
Analytics 3Analytics 3
Analytics 3
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Hadoop 80hr v1.0
Hadoop 80hr v1.0Hadoop 80hr v1.0
Hadoop 80hr v1.0
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Mapreduce Hadop.pptx

  • 2. Introduction to MapReduce • MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. • Map() procedure that performs filtering and sorting. • Reduce() procedure that performs a summary operation (such as statistical operations)
  • 3. What is MapReduce in Hadoop? • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. • MapReduce is defined as the framework of Hadoop, which is used to process a huge amount of data parallelly on large clusters of commodity hardware in a reliable manner. It allows the application to store the data in the distributed form and process large dataset across groups of computers using simple programming models
  • 6. How MapReduce in Hadoop works?
  • 7. Steps of MapReduce in Hadoop • Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data. • Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
  • 9. MapReduce Jobs Application Master • Responsible for the execution of a single application or MapReduce job • Divides job requests into tasks and assigns them to Node-Managers running on the slave node Node-Manager • Has many dynamic resource containers which executes each active map or reduce task • Communicates regularly with the Application Master
  • 11. Advantage of MapReduce • Fault tolerance: It can handle failures without downtime. Speed: It splits, shuffles, and reduces the unstructured data in a short time. • Cost-effective: Hadoop MapReduce has a scale-out feature that enables users to process or store the data in a cost-effective manner. • Scalability: It provides a highly scalable framework. MapReduce allows users to run applications from many nodes. • Parallel processing: As Hadoop storage data in the distributed file system and the MapReduce program’s working, it divides tasks task map and reduce and that could execute in parallel. And again, because of the parallel execution, it reduces the entire run time.
  • 12. Limitations Of MapReduce • MapReduce cannot cache the intermediate data in memory for a further requirement which diminishes the performance of Hadoop. • It is only suitable for Batch Processing of a Huge amounts of Data, and it is not flexible, means the MapReduce framework is rigid. • A lot of manual coding is required, even for common operations such as join, filter, projection, aggregates, sorting, distinct... • Semantics are hidden inside the map and reduce functions, so it is difficult to maintain, extend and optimize them.