SlideShare une entreprise Scribd logo
1  sur  13
MapReduce
Introduction to MapReduce
• MapReduce is a programming model and an
associated implementation for processing
and generating large data sets with a
parallel, distributed algorithm on a cluster.
• Map() procedure that performs filtering and
sorting.
• Reduce() procedure that performs a
summary operation (such as statistical
operations)
What is MapReduce in Hadoop?
• Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing
environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
• MapReduce is defined as the framework of Hadoop, which is used
to process a huge amount of data parallelly on large clusters of
commodity hardware in a reliable manner. It allows the application
to store the data in the distributed form and process large dataset
across groups of computers using simple programming models
Why use MapReduce?
MapReduce Architecture
How MapReduce in Hadoop works?
Steps of MapReduce in Hadoop
• Map stage − The map or mapper’s job is to process the input
data. Generally, the input data is in the form of file or directory
and is stored in the Hadoop file system (HDFS). The input file is
passed to the mapper function line by line. The mapper
processes the data and creates several small chunks of data.
• Reduce stage − This stage is the combination of
the Shuffle stage and the Reduce stage. The Reducer’s job is to
process the data that comes from the mapper. After processing,
it produces a new set of output, which will be stored in the
HDFS.
Algorithm Behind MapReduce
MapReduce Jobs
Application Master
• Responsible for the
execution of a single
application or MapReduce
job
• Divides job requests into
tasks and assigns them to
Node-Managers running on
the slave node
Node-Manager
• Has many dynamic
resource containers
which executes each
active map or reduce
task
• Communicates regularly
with the Application
Master
MapReduce jobs workflow
Advantage of MapReduce
• Fault tolerance: It can handle failures without downtime.
Speed: It splits, shuffles, and reduces the unstructured data in a short
time.
• Cost-effective: Hadoop MapReduce has a scale-out feature that enables
users to process or store the data in a cost-effective manner.
• Scalability: It provides a highly scalable framework. MapReduce allows
users to run applications from many nodes.
• Parallel processing: As Hadoop storage data in the distributed file system
and the MapReduce program’s working, it divides tasks task map and
reduce and that could execute in parallel. And again, because of the
parallel execution, it reduces the entire run time.
Limitations Of MapReduce
• MapReduce cannot cache the intermediate data in memory
for a further requirement which diminishes the performance
of Hadoop.
• It is only suitable for Batch Processing of a Huge amounts
of Data, and it is not flexible, means the MapReduce framework
is rigid.
• A lot of manual coding is required, even for common operations
such as join, filter, projection, aggregates, sorting, distinct...
• Semantics are hidden inside the map and reduce functions, so it
is difficult to maintain, extend and optimize them.
Mapreduce Hadop.pptx

Contenu connexe

Similaire à Mapreduce Hadop.pptx

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Nisanth Simon
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 

Similaire à Mapreduce Hadop.pptx (20)

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Anju
AnjuAnju
Anju
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Hadoop
HadoopHadoop
Hadoop
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Analytics 3
Analytics 3Analytics 3
Analytics 3
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Hadoop 80hr v1.0
Hadoop 80hr v1.0Hadoop 80hr v1.0
Hadoop 80hr v1.0
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Mapreduce Hadop.pptx

  • 2. Introduction to MapReduce • MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. • Map() procedure that performs filtering and sorting. • Reduce() procedure that performs a summary operation (such as statistical operations)
  • 3. What is MapReduce in Hadoop? • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. • MapReduce is defined as the framework of Hadoop, which is used to process a huge amount of data parallelly on large clusters of commodity hardware in a reliable manner. It allows the application to store the data in the distributed form and process large dataset across groups of computers using simple programming models
  • 6. How MapReduce in Hadoop works?
  • 7. Steps of MapReduce in Hadoop • Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data. • Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
  • 9. MapReduce Jobs Application Master • Responsible for the execution of a single application or MapReduce job • Divides job requests into tasks and assigns them to Node-Managers running on the slave node Node-Manager • Has many dynamic resource containers which executes each active map or reduce task • Communicates regularly with the Application Master
  • 11. Advantage of MapReduce • Fault tolerance: It can handle failures without downtime. Speed: It splits, shuffles, and reduces the unstructured data in a short time. • Cost-effective: Hadoop MapReduce has a scale-out feature that enables users to process or store the data in a cost-effective manner. • Scalability: It provides a highly scalable framework. MapReduce allows users to run applications from many nodes. • Parallel processing: As Hadoop storage data in the distributed file system and the MapReduce program’s working, it divides tasks task map and reduce and that could execute in parallel. And again, because of the parallel execution, it reduces the entire run time.
  • 12. Limitations Of MapReduce • MapReduce cannot cache the intermediate data in memory for a further requirement which diminishes the performance of Hadoop. • It is only suitable for Batch Processing of a Huge amounts of Data, and it is not flexible, means the MapReduce framework is rigid. • A lot of manual coding is required, even for common operations such as join, filter, projection, aggregates, sorting, distinct... • Semantics are hidden inside the map and reduce functions, so it is difficult to maintain, extend and optimize them.