MapReduce
Dr M Zunnun Khan
Map Reduce
 MapReduce is
◦ a processing technique
◦ a programming model
◦ for distributed computing of any type
◦ based on Java (in the Hadoop implementation).
 The MapReduce algorithm contains two important tasks,
◦ namely Map and Reduce.
◦ Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs).
◦ The Reduce task takes the output from a map as its input
and combines those data tuples into a smaller set of tuples.
◦ As the order of the words in the name MapReduce implies, the reduce task is
always performed after the map task.
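As a minimal sketch of these two tasks (plain Python, not Hadoop code), the snippet below uses hypothetical function names `map_task` and `reduce_task` and made-up sample records. Map breaks each record down into a (key, value) tuple, and reduce combines tuples that share a key into a smaller set of tuples.

```python
from collections import defaultdict

# Hypothetical sample records in "city,temperature" form (illustrative only).
records = ["delhi,31", "pune,27", "delhi,35", "pune,29", "agra,33"]

def map_task(record):
    """Break one input element down into a (key, value) tuple."""
    city, temp = record.split(",")
    return (city, int(temp))

def reduce_task(pairs):
    """Combine the mapped tuples into a smaller set: one tuple per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keep only the maximum value seen for each key.
    return sorted((key, max(values)) for key, values in grouped.items())

mapped = [map_task(r) for r in records]   # the map task runs first
reduced = reduce_task(mapped)             # reduce consumes the map output
print(reduced)
```

Five input records become three output tuples, one per key, which is the "smaller set of tuples" described above.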
Inputs and Outputs (Java Perspective)
 The MapReduce framework operates on <key, value> pairs:
 it views the input to the job as a set of <key, value> pairs
 and produces a set of <key, value> pairs as the output of the job,
 conceivably of different types.
 The key and value classes have to be serializable by
the framework and hence need to implement the Writable
interface.
 Additionally, the key classes have to implement the
WritableComparable interface to facilitate sorting by the framework.
 Input and output types of a MapReduce job:
 (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output).
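The type flow above can be sketched in plain Python with type hints. This is an illustration under stated assumptions, not the Hadoop Java API: `run_job`, `length_mapper`, and `count_reducer` are hypothetical names, and serialization through Writable is not modelled. The sketch only shows that the three key/value types may differ and that the intermediate keys get sorted, which is why they must be comparable.

```python
from itertools import groupby
from typing import Callable, Iterable, List, Tuple, TypeVar

K1, V1 = TypeVar("K1"), TypeVar("V1")
K2, V2 = TypeVar("K2"), TypeVar("V2")
K3, V3 = TypeVar("K3"), TypeVar("V3")

def run_job(
    inputs: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], Iterable[Tuple[K3, V3]]],
) -> List[Tuple[K3, V3]]:
    # <k1, v1> -> map -> <k2, v2>
    intermediate = [pair for k1, v1 in inputs for pair in mapper(k1, v1)]
    # Sort by intermediate key; this step requires comparable keys.
    intermediate.sort(key=lambda pair: pair[0])
    output = []
    for k2, group in groupby(intermediate, key=lambda pair: pair[0]):
        # <k2, [v2, ...]> -> reduce -> <k3, v3>
        output.extend(reducer(k2, [v2 for _, v2 in group]))
    return output

def length_mapper(line_no: int, line: str):
    # k1 = int (line number), v1 = str (line text)
    for word in line.split():
        yield (len(word), word)                    # k2 = int, v2 = str

def count_reducer(length: int, words: List[str]):
    yield (f"{length}-letter words", len(words))   # k3 = str, v3 = int

lines = [(0, "Welcome to Hadoop"), (1, "Hadoop is good")]
result = run_job(lines, length_mapper, count_reducer)
print(result)
```

Here the input pairs are (int, str), the intermediate pairs are (int, str) with a different meaning, and the output pairs are (str, int).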
 Example Scenario
 The sample input is the following arbitrary text.
Input Sequence
Welcome to Hadoop
Class Hadoop is good
Hadoop is bad
 The final output of the MapReduce task is
◦ bad 1
◦ Class 1
◦ good 1
◦ Hadoop 3
◦ is 2
◦ to 1
◦ Welcome 1
 Input Splits:
The input to a MapReduce job is divided into fixed-size pieces called input
splits. An input split is a chunk of the input that is consumed by a single map task.
 Mapping
This is the first phase in the execution of a MapReduce program. In this
phase, the data in each split is passed to a mapping function to produce output
values. In our example, the job of the mapping phase is to count the number of
occurrences of each word in the input splits and prepare a list in the form of
<word, frequency>.
 Shuffling
This phase consumes the output of the Mapping phase. Its task is to consolidate
the relevant records from the Mapping phase output. In our example, identical
words are grouped together along with their respective frequencies.
 Reducing
In this phase, the output values from the Shuffling phase are aggregated. This
phase combines the values from the Shuffling phase and returns a single output
value for each key. In short, this phase summarizes the complete dataset.
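The four phases can be traced on the sample input with a short plain-Python sketch. It is not Hadoop code: the function names `mapping`, `shuffling`, and `reducing` are hypothetical, each input line is assumed to be one input split, and the mapper emits <word, 1> per occurrence (a common simplification that gives the same totals here).

```python
from collections import defaultdict

# The three input lines from the example; each one is treated as an
# input split (an assumption made for illustration).
splits = ["Welcome to Hadoop", "Class Hadoop is good", "Hadoop is bad"]

def mapping(split):
    """Emit a <word, 1> pair for every word in one split."""
    return [(word, 1) for word in split.split()]

def shuffling(mapped_splits):
    """Group the values of identical words across all map outputs."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(grouped)

def reducing(grouped):
    """Aggregate each word's list of counts into a single frequency."""
    return {word: sum(counts) for word, counts in grouped.items()}

mapped = [mapping(s) for s in splits]
grouped = shuffling(mapped)
totals = reducing(grouped)

# Case-insensitive ordering, chosen to match the final output shown earlier.
for word in sorted(totals, key=str.lower):
    print(word, totals[word])
```

After shuffling, `grouped["Hadoop"]` holds `[1, 1, 1]`; reducing turns that list into the single value 3.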
 How Does MapReduce Organize Work?
 Hadoop divides the job into tasks. There are two types of tasks,
corresponding to the phases described above:
 Map tasks (Splits & Mapping)
 Reduce tasks (Shuffling, Reducing)
 The complete execution process (execution of both Map and Reduce tasks) is
controlled by two types of entities:
 JobTracker: acts like a master (responsible for the complete execution of
the submitted job)
 Multiple TaskTrackers: act like slaves, each of them performing part of the job
 For every job submitted for execution in the system, there is
one JobTracker, which resides on the NameNode, and there are multiple
TaskTrackers, which reside on the DataNodes.
 A job is divided into multiple tasks, which are then run on
multiple data nodes in a cluster.
 It is the responsibility of the JobTracker to coordinate the
activity by scheduling tasks to run on different data nodes.
 Execution of an individual task is then looked after by the
TaskTracker, which resides on every data node executing part of
the job.
 The TaskTracker's responsibility is to send progress reports
to the JobTracker.
 In addition, the TaskTracker periodically
sends a 'heartbeat' signal to the JobTracker to notify
it of the current state of the system.
 Thus the JobTracker keeps track of the overall progress of each
job. In the event of a task failure, the JobTracker can
reschedule the task on a different TaskTracker.
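The coordination described above can be illustrated with a toy, single-process simulation. It is a sketch under stated assumptions, not Hadoop's actual scheduler: the classes, the `fail_on` parameter, the node names, and the round-robin policy are all hypothetical and exist only to show heartbeats, progress tracking, and rescheduling after a failure.

```python
class TaskTracker:
    """Hypothetical stand-in for a TaskTracker on one data node."""
    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)   # tasks on which this node crashes
        self.healthy = True

    def heartbeat(self):
        # Reports the tracker's current state to the JobTracker.
        return {"tracker": self.name, "alive": self.healthy}

    def run(self, task):
        if task in self.fail_on:
            self.healthy = False      # simulate a crash during this task
            raise RuntimeError(f"{self.name} failed while running {task}")
        return f"{task} done on {self.name}"


class JobTracker:
    """Hypothetical master: schedules tasks and reschedules failed ones."""
    def __init__(self, trackers):
        self.trackers = trackers
        self.progress = {}            # task -> status string

    def live_trackers(self):
        return [t for t in self.trackers if t.heartbeat()["alive"]]

    def run_job(self, tasks):
        for i, task in enumerate(tasks):
            live = self.live_trackers()
            tracker = live[i % len(live)]   # simple round-robin choice
            try:
                self.progress[task] = tracker.run(task)
            except RuntimeError:
                # Task failure: reschedule on a different, live tracker.
                fallback = next(t for t in self.live_trackers() if t is not tracker)
                self.progress[task] = fallback.run(task)
        return self.progress


trackers = [TaskTracker("node-1"), TaskTracker("node-2", fail_on={"map-1"}), TaskTracker("node-3")]
job_tracker = JobTracker(trackers)
progress = job_tracker.run_job(["map-0", "map-1", "map-2", "reduce-0"])
for task, status in progress.items():
    print(task, "->", status)
```

In this run, node-2 crashes on `map-1`, its next heartbeat reports it as not alive, and the task is rerun on node-1; later tasks are scheduled only on the remaining live trackers.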
