SlideShare une entreprise Scribd logo
1  sur  40
MapReduce
Programming
Model
OUTLINE
Motivation
Sales exemples
words count exemple
1.wordcount in Hadoop
using python
2.Arraysum Demo using
Java
MapReduce daemons
in Hadoop
Big Data
Map
Reduce
INTRODUCTION
Parallel Processing
01 02
04
Task tracker
Job tracker
MapReduce
03
Demo code
05
2
Summary
Conclusion
06
INTRODUCTION
01
3
Parallel
Processing
4
Task is broken up to multiple parts with a software tool
and each part is distributed to a processor, then each
processor will perform the assigned part.
5
Finally, the parts are reassembled to deliver
the final solution or execute the task.
Reminder !
6
Multiprocessing
Parallel Processing is not
Motivation
7
8
NO !
9
What is the proposed
solution to deal with
?
Motivation
10
• Motivations
● Large-scale data processing on clusters
● Massively parallel (hundreds or thousands of CPUs)
● Reliable execution with easy data access
• Functions
● Fault-tolerance
● Status and monitoring tools
● A clean abstraction for programmers
Inspired by LISP
Function
Programming
Map
11
Reduce
12
Lisp map function
● Input parameters: a function and a set of values
● This function is applied to each of the values.
Lisp reduce function
● given a binary function and a set of values.
● It combines all the values together using the
binary function.
(map ‘length ‘(() (a) (ab) (abc)))
(length(()) length(a) length(ab)
length(abc))
(0 1 2 3)
use the + (add) function to reduce the
list
(reduce #'+ '(0 1 2 3))
6
Example
MapReduce
02
13
14
Instead of browsing the file sequentially, it is divided into chunks that are browsed in
parallel.
Example 1 :
Principal
15
Calculate the total sales for the current year ?
Solution
16
+
++
Instead of having one person
cover the whole book
we hire several !
A first group is called mappers
the second is called reducers
Divide the book in several parts
and give one to each mapper .
17
18
(key , value)
(key , values)
Intermediate registration
Results
shuffle & sort
The Famous
words count
example
02
19
20
Example 2 :
More Details
21
Input/output specification of the WC mapreduce job
Input : a set of (key values) stored in files
key: document ID
value: a list of words as content of each document
Output: a set of (key values) stored in files
key: wordID
value: word frequency appeared in all documents
MapReduce function specification:
map(String input_key, String input_value):
reduce(String output_key, Iterator intermediate_values):
22
Pseudo-code
23
MapReduce
Daemons in
Hadoop
03
24
25
“MapReduce has been implemented in many
programming languages and frameworks, such
as Apache Hadoop, Pig, Hive, etc. “
26
Divides the work on mappers
and reducers
runs on each node to execute
the real mapreduce tasks
Brief introduction for later use
mapReduce daemons
Demo Code
1
05
27
Sum array elements using mapReduce
28
Map: Split the array of 1000
elements into 10 small data
chunks (each chunk will have 100
elements)
Each chunk will be processed by a
separate thread concurrently.
We will have 10 threads and each
thread will iterate 100 elements to
produce the sum of those 100
elements.
Reducer: takes the output of
these 10 threads and will be
summed again to produce the
final output.
Sum array elements using mapReduce with java
29
Project structure Main
Call map task and Reduce Task to
perform mapReduce fn
Environnement
30
create thread pool of 10
save each task
of each chunk
in queue
split array of 1k into
chunks each of
100
save map result
of each chunk
into mapOutput
31
getoutput of map and
aggregate results
For each element
in mapOut(
the result from
previous map)
source code link : https://github.com/HabibaAbderrahim/thread_mapReduce
Demo Code
2
32
Words count using Hadoop framework
33
Environnement
Pseudo Distributed environment
PS : This is a pseudo environment that simulate a fully distributed environment since
we have one server / one pc
java should be installed
create hadoop
sudo user
install hadoop
for the official
website
check hadoop
is installed
version : 3.2.1
34
Environnement
Pseudo Distributed environment
Files configuration
version : 3.2.1
java home and hadoop home
add java path
HDFS : hadoop file system
HDFS configuration : namenode/datanode/replication
mapReduce configuration
mapReduce runs on Yarn
Verify Hadoop daemons
35
We decided to work with python
just to test hadoop
streaming Features
Environnement
version : 3.2.1
version : 3.5.1
word count using mapReduce in Hadoop with python
36
Environnement
version : 3.2.1
version : 3.5.1
Mapper
Reducer
37
Environnement
version : 3.2.1
version : 3.5.1
see what is inside our file
data.txt
Words count in
data.txt
MapReduce
sort results alphabetic
Conclusion
06
38
The ideas, concepts and diagrams are taken from the following websites:
● http://www.metz.supelec.fr/metz/personnel/vialle/course/BigData-2A-CS/poly-
pdf/Poly-chap6.pdf
● https://sites.cs.ucsb.edu/~tyang/class/240a17/slides/CS240TopicMapReduce.
pdf
● https://fr.slideshare.net/LiliaSfaxi/bigdatachp2-hadoop-mapreduce
● https://algodaily.com/lessons/what-is-mapreduce-and-how-does-it-work
[References]
39
Thanks!
Do you have any questions?
40

Contenu connexe

Similaire à mapReduce.pptx

Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkDatabricks
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data trainingagiamas
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingcoolmirza143
 
Map reduce
Map reduceMap reduce
Map reducexydii
 

Similaire à mapReduce.pptx (20)

Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Hadoop
HadoopHadoop
Hadoop
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Unit 2
Unit 2Unit 2
Unit 2
 
Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
Map reduce
Map reduceMap reduce
Map reduce
 

Dernier

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 

Dernier (20)

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 

mapReduce.pptx

  • 2. OUTLINE Motivation Sales exemples words count exemple 1.wordcount in Hadoop using python 2.Arraysum Demo using Java MapReduce daemons in Hadoop Big Data Map Reduce INTRODUCTION Parallel Processing 01 02 04 Task tracker Job tracker MapReduce 03 Demo code 05 2 Summary Conclusion 06
  • 5. Task is broken up to multiple parts with a software tool and each part is distributed to a processor, then each processor will perform the assigned part. 5 Finally, the parts are reassembled to deliver the final solution or execute the task.
  • 9. 9 What is the proposed solution to deal with ?
  • 10. Motivation 10 • Motivations ● Large-scale data processing on clusters ● Massively parallel (hundreds or thousands of CPUs) ● Reliable execution with easy data access • Functions ● Fault-tolerance ● Status and monitoring tools ● A clean abstraction for programmers
  • 12. 12 Lisp map function ● Input parameters: a function and a set of values ● This function is applied to each of the values. Lisp reduce function ● given a binary function and a set of values. ● It combines all the values together using the binary function. (map ‘length ‘(() (a) (ab) (abc))) (length(()) length(a) length(ab) length(abc)) (0 1 2 3) use the + (add) function to reduce the list (reduce #'+ '(0 1 2 3)) 6 Example
  • 14. 14 Instead of browsing the file sequentially, it is divided into chunks that are browsed in parallel. Example 1 : Principal
  • 15. 15 Calculate the total sales for the current year ? Solution
  • 16. 16 + ++ Instead of having one person cover the whole book we hire several ! A first group is called mappers the second is called reducers Divide the book in several parts and give one to each mapper .
  • 17. 17
  • 18. 18 (key , value) (key , values) Intermediate registration Results shuffle & sort
  • 21. 21 Input/output specification of the WC mapreduce job Input : a set of (key values) stored in files key: document ID value: a list of words as content of each document Output: a set of (key values) stored in files key: wordID value: word frequency appeared in all documents MapReduce function specification: map(String input_key, String input_value): reduce(String output_key, Iterator intermediate_values):
  • 23. 23
  • 25. 25 “MapReduce has been implemented in many programming languages and frameworks, such as Apache Hadoop, Pig, Hive, etc. “
  • 26. 26 Divides the work on mappers and reducers runs on each node to execute the real mapreduce tasks Brief introduction for later use mapReduce daemons
  • 27. Demo Code 1 05 27 Sum array elements using mapReduce
  • 28. 28 Map: Split the array of 1000 elements into 10 small data chunks (each chunk will have 100 elements) Each chunk will be processed by a separate thread concurrently. We will have 10 threads and each thread will iterate 100 elements to produce the sum of those 100 elements. Reducer: takes the output of these 10 threads and will be summed again to produce the final output. Sum array elements using mapReduce with java
  • 29. 29 Project structure Main Call map task and Reduce Task to perform mapReduce fn Environnement
  • 30. 30 create thread pool of 10 save each task of each chunk in queue split array of 1k into chunks each of 100 save map result of each chunk into mapOutput
  • 31. 31 getoutput of map and aggregate results For each element in mapOut( the result from previous map) source code link : https://github.com/HabibaAbderrahim/thread_mapReduce
  • 32. Demo Code 2 32 Words count using Hadoop framework
  • 33. 33 Environnement Pseudo Distributed environment PS : This is a pseudo environment that simulate a fully distributed environment since we have one server / one pc java should be installed create hadoop sudo user install hadoop for the official website check hadoop is installed version : 3.2.1
  • 34. 34 Environnement Pseudo Distributed environment Files configuration version : 3.2.1 java home and hadoop home add java path HDFS : hadoop file system HDFS configuration : namenode/datanode/replication mapReduce configuration mapReduce runs on Yarn Verify Hadoop daemons
  • 35. 35 We decided to work with python just to test hadoop streaming Features Environnement version : 3.2.1 version : 3.5.1 word count using mapReduce in Hadoop with python
  • 37. 37 Environnement version : 3.2.1 version : 3.5.1 see what is inside our file data.txt Words count in data.txt MapReduce sort results alphabetic
  • 39. The ideas, concepts and diagrams are taken from the following websites: ● http://www.metz.supelec.fr/metz/personnel/vialle/course/BigData-2A-CS/poly- pdf/Poly-chap6.pdf ● https://sites.cs.ucsb.edu/~tyang/class/240a17/slides/CS240TopicMapReduce. pdf ● https://fr.slideshare.net/LiliaSfaxi/bigdatachp2-hadoop-mapreduce ● https://algodaily.com/lessons/what-is-mapreduce-and-how-does-it-work [References] 39
  • 40. Thanks! Do you have any questions? 40

Notes de l'éditeur

  1. should not be confused with Multiprocessing in where multiple processors or cores are working on solving different tasks, instead of parts of the same task as in parallel processing.
  2. before driven into detail , take a moment and ask yourself what does
  3. » Functional programming meets distributed computing » A batch data processing system
  4. Traditional approach In this approach we will iterate each element in an array and will add it to produce final sum.