HADOOP MAPREDUCE
Darwade Sandip
MNIT Jaipur
December 25, 2013
Darwade Sandip (MNIT) HADOOP MAPREDUCE December 25, 2013 1 / 21
Outline
What is HADOOP
What is MapReduce
Components of Hadoop
Architecture
Implementation
Bibliography
What is Hadoop?
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming
models.
Hadoop is best known for MapReduce, for its distributed
filesystem (HDFS), and for large-scale data processing.
MapReduce
Programming model for data processing
Hadoop can run MapReduce programs written in various
languages, such as Java and Python
Parallel processing makes MapReduce suitable for very
large-scale data analysis
Mappers produce intermediate results
Reducers aggregate the results
Components of Hadoop
Two Main Components of Hadoop
HDFS
MAPREDUCE
HDFS
Files stored in HDFS are divided into blocks, which are then
copied to multiple DataNodes
A Hadoop cluster contains a single NameNode and many
DataNodes
Data blocks are replicated for high availability and fast access
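The block and replica idea above can be sketched in a few lines of Python. This is an illustration only, not HDFS code: the function names, the DataNode names, and the round-robin placement are hypothetical, and the replication factor of 3 is an assumption made for the example. The 64 MB block size is the default quoted later in these slides.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted in these slides
REPLICATION = 3                # assumed replication factor for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is divided into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Hypothetical placement policy for illustration only; real HDFS placement
    also takes rack topology into account.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

Here the 200 MB file becomes three full blocks plus one 8 MB block, and each block lands on three different DataNodes, so losing any one node leaves two copies of every block.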
HDFS
NameNode
Runs on a separate machine
Manages the file system namespace and controls access by external clients
Stores file system metadata in memory: file information, the blocks that make up each file, and the DataNodes that hold each block
DataNode
Runs on a separate machine and is the basic unit of file storage
Periodically sends the NameNode a report of all the blocks it holds
Serves read and write requests from file system clients, and carries out
block creation, deletion, and replication commands from the NameNode
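A rough sketch of the bookkeeping described above: a toy NameNode that holds its metadata in memory and updates block locations from periodic DataNode reports. The class, method names, paths, and block IDs are hypothetical and chosen only for this example; they are not the Hadoop API.

```python
from collections import defaultdict


class NameNode:
    """Toy NameNode: keeps all metadata in memory (hypothetical class)."""

    def __init__(self):
        self.file_blocks = {}                    # file path -> ordered block IDs
        self.block_locations = defaultdict(set)  # block ID -> DataNodes holding it

    def add_file(self, path, block_ids):
        self.file_blocks[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        """Replace what is known about `datanode` with its latest report."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return, per block of the file, the sorted DataNodes that hold it."""
        return [sorted(self.block_locations[b]) for b in self.file_blocks[path]]


nn = NameNode()
nn.add_file("/logs/a.txt", ["blk_1", "blk_2"])
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_1"])
nn.receive_block_report("dn1", ["blk_2"])  # dn1 no longer reports blk_1
```

Because each report replaces the previous one for that DataNode, a block that disappears from a node stops being listed there, which is how the NameNode's view stays current in this sketch.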
MapReduce
Files are split into fixed-size blocks (64 MB by default)
and stored on DataNodes
Programs, once written, can run in parallel on distributed clusters
The input is a set of key/value pairs, and the output is also a set of
key/value pairs
Two main phases: Map and Reduce
MapReduce (continued)
Figure: MapReduce Process Architecture
MapReduce (continued)
Map
Map tasks process each block separately and in parallel
Each generates a set of intermediate key/value pairs
The results from these logical blocks are then reassembled, grouped by key
Reduce
Accepts an intermediate key and the values related to it
Processes the intermediate key and its values
Merges them to form a relatively small set of values
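The Map and Reduce phases above can be imitated in a single Python process. The sketch below is not Hadoop code: run_mapreduce, map_words, and reduce_counts are hypothetical names, and the grouping step stands in for what a real cluster does when it moves intermediate pairs between machines. The input keys are assumed to be line offsets, which the mapper ignores.

```python
from collections import defaultdict


def map_words(_, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: merge all values for one key into a single total."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run map, group the intermediate pairs by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


lines = [(0, "the quick brown fox"), (20, "the lazy dog"), (33, "The fox")]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
```

Each reducer call sees one key with all of its values, so the many (word, 1) pairs shrink to one count per word.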
How Hadoop Runs a MapReduce Job
The client, which submits the MapReduce job.
The jobtracker, which coordinates the job.
The tasktrackers, which run the tasks that the job has
been split into.
Tasktrackers are Java applications whose main class is
TaskTracker.
The distributed filesystem, which is used for sharing job
files between the other entities.
How Hadoop Runs a MapReduce Job
Job Submission
Job Initialization
Task Assignment
Task Execution
Job Completion
How Hadoop Runs a MapReduce Job
Figure: How Hadoop runs a MapReduce job using the classic framework
How Hadoop Runs a MapReduce Job
Job Submission
The submit() method on Job creates an internal JobSubmitter instance and calls
submitJobInternal() on it
After submitting the job, waitForCompletion() polls the job's progress once per
second
The JobSubmitter:
Asks the jobtracker for a new job ID (by calling getNewJobId() on
JobTracker)
Checks the output specification of the job
Computes the input splits for the job
Copies the resources needed to run the job
Tells the jobtracker that the job is ready for execution by calling
submitJob()
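The submission steps above can be walked through with a toy model. Everything in the sketch below is a stand-in: ToyJobTracker, submit, the dictionary used as a shared filesystem, and the file names are hypothetical, and the split computation simply cuts each file at a fixed size. It mirrors only the order of the steps, not Hadoop's actual classes.

```python
class ToyJobTracker:
    """Stand-in for the jobtracker; not the real Hadoop class."""

    def __init__(self):
        self.next_id = 1
        self.queue = []

    def get_new_job_id(self):
        job_id = f"job_{self.next_id:04d}"
        self.next_id += 1
        return job_id

    def submit_job(self, job_id):
        self.queue.append(job_id)


def submit(jobtracker, shared_fs, input_files, output_dir, split_size):
    """Walk through the submission steps in the order listed above."""
    job_id = jobtracker.get_new_job_id()           # 1. ask for a new job ID
    if output_dir in shared_fs["dirs"]:            # 2. check the output specification
        raise FileExistsError(f"output directory {output_dir} already exists")
    splits = []                                    # 3. compute the input splits
    for name, size in input_files.items():
        for offset in range(0, size, split_size):
            splits.append((name, offset, min(split_size, size - offset)))
    shared_fs["jobs"][job_id] = {"splits": splits, "jar": "job.jar"}  # 4. copy resources
    jobtracker.submit_job(job_id)                  # 5. tell the jobtracker it is ready
    return job_id


fs = {"dirs": {"/in"}, "jobs": {}}
jt = ToyJobTracker()
job_id = submit(jt, fs, {"/in/a.txt": 150, "/in/b.txt": 64}, "/out", 64)
```

With a split size of 64, the 150-unit file yields three splits and the 64-unit file yields one, and the job only reaches the jobtracker's queue after the output check has passed.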
How Hadoop Runs a MapReduce Job
Job Initialization
When the JobTracker receives a call to submitJob(), it puts the job into an
internal queue
The job scheduler retrieves the input splits computed by the client from the shared
filesystem
Task Assignment
Tasktrackers periodically send heartbeats to the jobtracker
The jobtracker assigns tasks to the tasktrackers
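A minimal sketch of heartbeat-driven task assignment, assuming each heartbeat reports how many task slots are free and the assignment travels back in the reply. HeartbeatJobTracker and the task names are hypothetical, and the first-come queue is a simplification; a real scheduler applies more rules when choosing tasks.

```python
from collections import deque


class HeartbeatJobTracker:
    """Toy jobtracker that hands out queued tasks on each heartbeat."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.assigned = {}  # task -> tasktracker name

    def heartbeat(self, tracker_name, free_slots):
        """Handle one heartbeat and return the tasks assigned in the reply."""
        reply = []
        while free_slots > 0 and self.pending:
            task = self.pending.popleft()
            self.assigned[task] = tracker_name
            reply.append(task)
            free_slots -= 1
        return reply


jt = HeartbeatJobTracker(["map_0", "map_1", "map_2", "reduce_0"])
first = jt.heartbeat("tt1", free_slots=2)
second = jt.heartbeat("tt2", free_slots=1)
third = jt.heartbeat("tt1", free_slots=2)
```

The tasktrackers never ask for specific tasks; they only report capacity, and the jobtracker decides what each one runs next.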
How Hadoop Runs a MapReduce Job
Task Execution
Once it has been assigned a task, the tasktracker's next step is to run it
It localizes the job JAR by copying it from the shared filesystem (HDFS) to its local filesystem
It creates an instance of TaskRunner to run the task
Job Completion
When the jobtracker receives a notification that the last task for a
job is complete, it changes the status of the job to "successful"
The Job learns of this when it polls for status, tells the user, and returns from the
waitForCompletion() method
The jobtracker cleans up its working state
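The polling behaviour on the client side can be sketched as a simple loop. This is not Hadoop's waitForCompletion(); the function below, its get_status callback, and the status strings are assumptions made for illustration. The one-second interval matches the polling rate given in the job submission slide.

```python
import time


def wait_for_completion(get_status, poll_interval=1.0, sleep=time.sleep):
    """Poll get_status() until the job finishes; return True on success."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in ("successful", "failed"):
            print(f"Job finished with status {status!r} after {polls} polls")
            return status == "successful"
        sleep(poll_interval)


# Simulated job that reports "running" twice before succeeding.
statuses = iter(["running", "running", "successful"])
ok = wait_for_completion(lambda: next(statuses), sleep=lambda seconds: None)
```

Passing the sleep function as a parameter keeps the example fast to run; with the default it would pause one second between polls.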
Implementation
Figure: Minimum Temperature
Implementation
Figure: Maximum Temperature
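The two figures are not reproduced in this transcript. As a stand-in, here is a minimal sketch of how a minimum or maximum temperature job could look, assuming input lines of the form "year,temperature". That record format, the sample readings, and all function names are hypothetical and are not taken from the original figures.

```python
from collections import defaultdict


def map_temperature(line):
    """Map: parse a 'year,temperature' line and emit (year, temperature)."""
    year, temp = line.split(",")
    yield year.strip(), int(temp)


def reduce_extreme(year, temps, pick):
    """Reduce: keep one value per year; pick is max or min."""
    return year, pick(temps)


def temperature_job(lines, pick):
    """Group readings by year, then reduce each group with pick."""
    groups = defaultdict(list)
    for line in lines:
        for year, temp in map_temperature(line):
            groups[year].append(temp)
    return dict(reduce_extreme(y, t, pick) for y, t in sorted(groups.items()))


readings = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
maximum = temperature_job(readings, max)
minimum = temperature_job(readings, min)
```

The map step is identical for both jobs; only the function applied in the reduce step changes.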
Implementation (continued)
Figure: Word Count
Implementation (continued)
Figure: Word Count
Bibliography I
G. Yang, “The application of mapreduce in the cloud computing,” Intelligence
Information Processing and Trusted Computing (IPTC) 2011, vol. 9,
pp. 154–156, Oct 2011.
X. Zhang, G. Wang, Z. Yang, and Y. Ding, “A two-phase execution engine of
reduce tasks in hadoop mapreduce,” 2012 International Conference on Systems
and Informatics (ICSAI 2012), pp. 858–864, May 2012.
T. White, Hadoop: The Definitive Guide, Third Edition.
1005 Gravenstein Highway North, Sebastopol, CA 95472: O'Reilly Media, Inc.,
2012.
J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large
clusters,” Operating System Design and Implementation (OSDI 2004), vol. 6,
pp. 137–150, 2004.
X. Lin, Z. Meng, C. Xu, and M. Wang, “A practical performance model for
hadoop mapreduce,” 2012 IEEE International Conference on Cluster
Computing Workshops, pp. 231–239, Sept 2012.
Bibliography II
Z. Gua, M. Pierce, G. Fox, and M. Zhou, “Automatic task re-organization in
mapreduce,” 2011 IEEE International Conference on Cluster Computing,
pp. 335–343, May 2011.
K. Wang, X. Lin, and W. Tang, “An experience guided configuration optimizer
for hadoop mapreduce,” Cloud Computing Technology and Science
(CloudCom), pp. 419–426, Dec 2012.