Ch. 4: MapReduce Algorithm Design
Web Intelligence and Data Mining Laboratory
Presenter: Allen
2011/4/26

Outline
  • MapReduce Framework
  • Pairs Approach
  • Stripes Approach
  • Issues

MapReduce Framework
  • Combiners can be viewed as "mini-reducers" in the map phase.
  • Partitioners determine which reducer is responsible for a particular key.
  • Reducers are applied to all values associated with the same key.

Motivating Example
  • Term co-occurrence matrix for a text collection
      - M = N × N matrix (N = vocabulary size)
      - M_ij: number of times terms i and j co-occur in some context (for concreteness, let's say context = sentence)
  • Why?
      - Distributional profiles are a way of measuring semantic distance
      - Semantic distance is useful for many language processing tasks

MapReduce: Large counting problems
  • Term co-occurrence matrix for a text collection = a specific instance of a large counting problem
      - A large event space (number of terms)
      - A large number of observations (the collection itself)
      - Goal: keep track of interesting statistics about the events
  • Basic idea
      - Mappers generate partial counts
      - Reducers aggregate partial counts
  • How do we aggregate partial counts efficiently?

First try: “Pairs”
  • Each mapper takes a sentence:
      - Generate all co-occurring term pairs
      - For each pair, emit (a, b) → count
  • Reducers sum up the counts associated with these pairs
  • Use combiners!

“Pairs” Algorithm
  [Figure: pseudocode for the pairs approach]

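The pairs approach can be sketched in a few lines of Python. This is a minimal single-process simulation, not the pseudocode from the original slide: the function names (`pairs_map`, `pairs_reduce`, `run_pairs`) are hypothetical, whitespace tokenization is an assumption, and the shuffle phase is imitated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import permutations


def pairs_map(sentence):
    """Emit ((a, b), 1) for every ordered pair of distinct terms in the sentence."""
    terms = sentence.split()  # assumption: terms are whitespace-separated
    for a, b in permutations(terms, 2):
        if a != b:  # skip a term co-occurring with another copy of itself
            yield (a, b), 1


def pairs_reduce(key, counts):
    """Sum all partial counts that share the same (a, b) key."""
    return key, sum(counts)


def run_pairs(sentences):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for key, value in pairs_map(sentence):
            grouped[key].append(value)
    return dict(pairs_reduce(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    for pair, count in run_pairs(["a b", "a b c"]).items():
        print(pair, count)
```

Every co-occurrence becomes its own key-value pair, which is what makes the approach simple and also what makes the shuffle expensive.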
“Pairs” Analysis
  • Advantages
      - Easy to implement, easy to understand
  • Disadvantages
      - Lots of pairs to sort and shuffle around (upper bound?)

Another try: “Stripes”
  • Idea: group pairs together into an associative array

      (a, b) → 1
      (a, c) → 2
      (a, d) → 5      ⇒   a → {b:1, c:2, d:5, e:3, f:2}
      (a, e) → 3
      (a, f) → 2

  • Each mapper takes a sentence:
      - Generate all co-occurring term pairs
      - For each term, emit a → {b: count_b, c: count_c, d: count_d, …}
  • Reducers perform an element-wise sum of associative arrays

        a → {b:1,      d:5, e:3}
      + a → {b:1, c:2, d:2,      f:2}
      = a → {b:2, c:2, d:7, e:3, f:2}

“Stripes” Algorithm
  [Figure: pseudocode for the stripes approach]

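A comparable sketch for the stripes approach follows. As before, this is an illustrative single-process simulation under assumptions (whitespace tokenization, hypothetical names `stripes_map`, `stripes_reduce`, `run_stripes`), not the slide's own pseudocode. Each mapper output is one associative array per term occurrence, and the reducer adds the arrays element-wise.

```python
from collections import Counter, defaultdict


def stripes_map(sentence):
    """For each term occurrence, emit (term, stripe) where the stripe counts its neighbours."""
    terms = sentence.split()  # assumption: terms are whitespace-separated
    for i, a in enumerate(terms):
        stripe = Counter(b for j, b in enumerate(terms) if j != i and b != a)
        if stripe:
            yield a, stripe


def stripes_reduce(term, stripes):
    """Element-wise sum of all stripes emitted for the same term."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts rather than replacing them
    return term, dict(total)


def run_stripes(sentences):
    """Simulate map -> shuffle (group by term) -> reduce in one process."""
    grouped = defaultdict(list)
    for sentence in sentences:
        for term, stripe in stripes_map(sentence):
            grouped[term].append(stripe)
    return dict(stripes_reduce(t, s) for t, s in sorted(grouped.items()))


if __name__ == "__main__":
    # The element-wise sum shown on the slide.
    left = Counter(b=1, d=5, e=3)
    right = Counter(b=1, c=2, d=2, f=2)
    print(stripes_reduce("a", [left, right]))
```

The number of keys is bounded by the vocabulary size rather than by the number of term pairs, at the cost of larger values that must fit in memory.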
“Stripes” Analysis
  • Advantages
      - Far less sorting and shuffling of key-value pairs
      - Can make better use of combiners
  • Disadvantages
      - More difficult to implement
      - The underlying object is more heavyweight
      - Fundamental limitation in terms of the size of the event space

Running time of “Pairs” and “Stripes”
  [Figure: running-time comparison of the two approaches]

Conditional probabilities
  • How do we estimate conditional probabilities from counts?
  • Why do we want to do this?
  • How do we do this with MapReduce?

P(B|A): “Stripes”
      a → {b1:3, b2:12, b3:7, b4:1, …}
  • Easy!
      - One pass to compute (a, *)
      - Another pass to directly compute P(B|A)

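With a complete stripe in hand, the two passes reduce to a sum followed by a division. The sketch below assumes the stripe fits in memory as a Python dict; the function name `relative_frequencies` is hypothetical. Because the slide's stripe is elided ("…"), the example uses only the four listed counts, so the marginal here is 23 rather than the 32 that appears in the pairs example.

```python
from fractions import Fraction


def relative_frequencies(stripe):
    """Turn a stripe of co-occurrence counts for term a into estimates of P(b | a)."""
    marginal = sum(stripe.values())  # first pass: the (a, *) count
    # second pass: divide each joint count by the marginal
    return {b: Fraction(count, marginal) for b, count in stripe.items()}


if __name__ == "__main__":
    stripe = {"b1": 3, "b2": 12, "b3": 7, "b4": 1}
    for b, p in relative_frequencies(stripe).items():
        print(b, p)
```

`Fraction` is used only to keep the arithmetic exact in the example; floating-point division would work equally well.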
P(B|A): “Pairs”
      (a, *)  → 32         Reducer holds this value in memory
      (a, b1) → 3      ⇒   (a, b1) → 3/32
      (a, b2) → 12     ⇒   (a, b2) → 12/32
      (a, b3) → 7      ⇒   (a, b3) → 7/32
      (a, b4) → 1      ⇒   (a, b4) → 1/32
      …                    …
  • For this to work:
      - Must emit an extra (a, *) for every b_n in the mapper
      - Must make sure all a's get sent to the same reducer (use a partitioner)
      - Must make sure (a, *) comes first (define the sort order)
      - Must hold state in the reducer across different key-value pairs

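The four requirements above can be sketched as a small single-process simulation. Everything here is illustrative and assumed rather than taken from the slides: the names (`mapper`, `partition`, `sort_key`, `Reducer`, `run`), the use of `"*"` as the marginal marker (so no real term may be `"*"`), and the CRC32-based partitioning. It is not Hadoop's API.

```python
import zlib
from collections import defaultdict
from fractions import Fraction

MARGINAL = "*"  # assumption: "*" never occurs as a real term


def mapper(sentence):
    """Emit (a, b) -> 1 plus an extra (a, *) -> 1 for every co-occurring b."""
    terms = sentence.split()
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            if i != j and a != b:
                yield (a, b), 1
                yield (a, MARGINAL), 1


def partition(key, num_reducers):
    """Partition on the left term only, so every (a, ...) reaches the same reducer."""
    left, _ = key
    return zlib.crc32(left.encode("utf-8")) % num_reducers


def sort_key(key):
    """Order keys so that (a, *) sorts before every (a, b)."""
    left, right = key
    return (left, right != MARGINAL, right)


class Reducer:
    """Holds the most recent marginal in memory across successive keys."""

    def __init__(self):
        self.marginal = None

    def reduce(self, key, counts):
        _, right = key
        total = sum(counts)
        if right == MARGINAL:
            self.marginal = total  # remember the count of (a, *)
            return None
        return key, Fraction(total, self.marginal)


def run(sentences, num_reducers=2):
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for sentence in sentences:
        for key, value in mapper(sentence):
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        reducer = Reducer()
        for key in sorted(bucket, key=sort_key):
            out = reducer.reduce(key, bucket[key])
            if out is not None:
                result[out[0]] = out[1]
    return result


if __name__ == "__main__":
    for pair, p in sorted(run(["a b", "a c", "a b"]).items()):
        print(pair, p)
```

Because the marginal key arrives first for each left term, the reducer never needs to buffer the joint counts, which is the point of turning the problem into one of ordering.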
Synchronization in Hadoop
  • Approach 1: turn synchronization into an ordering problem
      - Sort keys into the correct order of computation
      - Partition the key space so that each reducer gets the appropriate set of partial results
      - Hold state in the reducer across multiple key-value pairs to perform the computation
      - Illustrated by the “pairs” approach

Synchronization in Hadoop
  • Approach 2: construct data structures that “bring the pieces together”
      - Each reducer receives all the data it needs to complete the computation
      - Illustrated by the “stripes” approach


Issues
  • Number of key-value pairs
      - Object creation overhead
      - Time for sorting and shuffling pairs across the network
  • Size of each key-value pair
      - De/serialization overhead
  • Combiners make a big difference!
      - RAM vs. disk vs. network
      - Arrange data to maximize opportunities to aggregate partial results

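The effect of a combiner on the number of key-value pairs can be illustrated with a short sketch. This is a simplified model under assumptions: `map_pairs` and `combine` are hypothetical names, one "mapper" processes a small list of sentences, and the combiner is modelled as local aggregation of that mapper's output before anything is shuffled.

```python
from collections import Counter


def map_pairs(sentence):
    """Emit ((a, b), 1) for every ordered pair of distinct co-occurring terms."""
    terms = sentence.split()
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            if i != j and a != b:
                yield (a, b), 1


def combine(records):
    """Locally sum counts per key, as a combiner would do on one mapper's output."""
    combined = Counter()
    for key, value in records:
        combined[key] += value
    return list(combined.items())


if __name__ == "__main__":
    split = ["a b", "a b", "a b"]  # sentences handled by a single mapper
    raw = [record for sentence in split for record in map_pairs(sentence)]
    combined = combine(raw)
    print("records without combiner:", len(raw))
    print("records with combiner:", len(combined))
```

Fewer records leaving the mapper means less data to serialize, sort, and move across the network, which is where the slide locates most of the cost.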
Thank you!