SlideShare une entreprise Scribd logo
1  sur  20
1 Mapreduce algorithm design Web Intelligence and Data Mining Laboratory Presenter / Allen 2011/4/26
Outline MapReduce Framework Pairs Approach Stripes Approach Issues 2011/4/26 2
MapReduce Framework 2011/4/26 3 ,[object Object]
Combiners can be viewed as mini-reducers" in the map phase.
Partitioners determine which reducer is responsible for a particular key.
Reducers are applied to all values associated with the same key.,[object Object]
Motivating Example Term co-occurrence matrix for a text collection M=NN matrix (N = vocabulary size) Mij: number of times term i and j co-occur in some context (for concreteness, let’s say context = sentence) Why? Distributional profiles as a way of measuring semantic distance Semantic distance useful for many language processing tasks 5 2011/4/26
MapReduce: Large counting problems Term co-occurrence matrix for a text collection = specific instance of a large counting problem A large event space (number of terms) A large number of observations (the collection itself) Goal: keep tracking of interesting statistics about the events Basic idea Mappers  generate partial counts Reducers aggregate partial counts How do we aggregate partial counts efficiently? 6 2011/4/26
First try “Pairs” Each mapper takes a sentence: Generate all co-occurring term pairs For all pairs, emit(a, b)  count Reducers sums up counts associated with these pairs Use combiners! 7 2011/4/26
“Pairs”Algorithm 2011/4/26 8
“Pairs” Analysis Advantages Easy to implement, easy to understand Disadvantages Lots of pairs to sort and shuffle around (upper bound?) 9 2011/4/26
Another try “Stripes” Idea: group together pairs into an associate array 	(a, b) 1 	(a, c) 2 (a, d) 5		a{b:1, c:2, d:5, e:3, f:2} (a, e) 3 	(a, f) 2 Each mapper takes a sentence: Generating all co-occurring term pairs For each term, emit a {b:countb, c:countc, d:countd,…} Reducers perform element-wise sum of associate arrays                              a{b:1,         d:5, e:3} + a{b:1, c:2, d:2,        f:2}                             a{b:2, c:2, d:7, e:3, f:2} 10 2011/4/26
“Stripes”Algorithm 2011/4/26 11
“Stripes” Analysis Advantages Far less sorting and shuffling of key-value pairs Can make better use of combiners Disadvantages More difficult to implement Underlying  objects is more heavyweight Fundamental limitation in terms of size of event space 12 2011/4/26
Running time of the “Pairs” and “Stripes” 13 2011/4/26
Conditional probabilities How do we estimate conditional probabilities from counts? Why do we want to do this? How do we do this with MapReduce? 14 2011/4/26
P(B|A) “Stripes” a{b1:3, b2:12, b3:7, b4:1,…} Easy! One pass to compute (a, *) Another pass to directly compute P(B|A)  15 2011/4/26
P(B|A) “Pairs” (a, *)  32 	Reducer holds this value in memory (a, b1)  3			 (a, b1)  3/32 (a, b2)  12			 (a, b2)  12/32 (a, b3)  7			 (a, b3)  7/32 (a, b4)  1			 (a, b1)  1/32 …						… For this to work: Must emit extra (a, *) for every bn in mapper. Must make sure all a’s get sent to same reducer (use partitioner) Must make sure (a, *) comes first (define sort order) Must hold state in reducer across different key-value pairs 16 2011/4/26
Synchronization in Hadoop Approach 1: turn synchronization into an ordering problem Sort keys into correct order of computation Partition key space so that each reducer gets the appropriate set of partial results Hold state in reducer across multiple key-value pairs to perform computation Illustrated by the “pairs” approach 17 2011/4/26
Synchronization in Hadoop Approach 2: construct data structures that “bring the pieces together” Each reducer receives all the data it needs to complete the computation  Illustrated by the “stripes” approach 18 2011/4/26

Contenu connexe

Tendances

Ppt 2 d ploting k10998
Ppt 2 d ploting k10998Ppt 2 d ploting k10998
Ppt 2 d ploting k10998Vinit Rajput
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduceThibault Debatty
 
Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFishAnushree Prasanna Kumar
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
search engine for images
search engine for imagessearch engine for images
search engine for imagesAnjani
 
Me 443 1 what is mathematica Erdi Karaçal Mechanical Engineer University of...
Me 443   1 what is mathematica Erdi Karaçal Mechanical Engineer University of...Me 443   1 what is mathematica Erdi Karaçal Mechanical Engineer University of...
Me 443 1 what is mathematica Erdi Karaçal Mechanical Engineer University of...Erdi Karaçal
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Matlab programming project
Matlab programming projectMatlab programming project
Matlab programming projectAssignmentpedia
 
Me 443 4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...
Me 443   4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...Me 443   4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...
Me 443 4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...Erdi Karaçal
 
PRAM algorithms from deepika
PRAM algorithms from deepikaPRAM algorithms from deepika
PRAM algorithms from deepikaguest1f4fb3
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLABRavikiran A
 

Tendances (20)

Ppt 2 d ploting k10998
Ppt 2 d ploting k10998Ppt 2 d ploting k10998
Ppt 2 d ploting k10998
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduce
 
Stack and Queue
Stack and QueueStack and Queue
Stack and Queue
 
Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFish
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
search engine for images
search engine for imagessearch engine for images
search engine for images
 
Project 2
Project 2Project 2
Project 2
 
Simulink
SimulinkSimulink
Simulink
 
Me 443 1 what is mathematica Erdi Karaçal Mechanical Engineer University of...
Me 443   1 what is mathematica Erdi Karaçal Mechanical Engineer University of...Me 443   1 what is mathematica Erdi Karaçal Mechanical Engineer University of...
Me 443 1 what is mathematica Erdi Karaçal Mechanical Engineer University of...
 
Presentation
PresentationPresentation
Presentation
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
working with matrices in r
working with matrices in rworking with matrices in r
working with matrices in r
 
Control Systems
Control SystemsControl Systems
Control Systems
 
Extrapolation
ExtrapolationExtrapolation
Extrapolation
 
Programming with matlab session 3 notes
Programming with matlab session 3 notesProgramming with matlab session 3 notes
Programming with matlab session 3 notes
 
Quiz 2
Quiz 2Quiz 2
Quiz 2
 
Matlab programming project
Matlab programming projectMatlab programming project
Matlab programming project
 
Me 443 4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...
Me 443   4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...Me 443   4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...
Me 443 4 plotting curves Erdi Karaçal Mechanical Engineer University of Gaz...
 
PRAM algorithms from deepika
PRAM algorithms from deepikaPRAM algorithms from deepika
PRAM algorithms from deepika
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLAB
 

En vedette

Fundraising PowerPoint
Fundraising PowerPointFundraising PowerPoint
Fundraising PowerPointjohnlwelday
 
Final Presentation
Final PresentationFinal Presentation
Final Presentationscottthorpe
 
Our Mobile Planet - Les chiffres France
Our Mobile Planet - Les chiffres FranceOur Mobile Planet - Les chiffres France
Our Mobile Planet - Les chiffres FranceDenis Verloes
 
Mars Mission of india (MANGALYAN)
Mars Mission of india (MANGALYAN)Mars Mission of india (MANGALYAN)
Mars Mission of india (MANGALYAN)Pravin Dahale
 
Cruise Missile Technology By Takalikar Mayur ppt
Cruise Missile Technology By Takalikar Mayur pptCruise Missile Technology By Takalikar Mayur ppt
Cruise Missile Technology By Takalikar Mayur pptmayur takalikar
 
Minerals And Energy Resources - Class 10 - Geography
Minerals And Energy Resources - Class 10 - GeographyMinerals And Energy Resources - Class 10 - Geography
Minerals And Energy Resources - Class 10 - GeographyAthira S
 
Night vision system in Automobiles
Night vision system in AutomobilesNight vision system in Automobiles
Night vision system in Automobilessarang Bire
 
NIGHT VISION TECHNOLOGY
NIGHT VISION TECHNOLOGYNIGHT VISION TECHNOLOGY
NIGHT VISION TECHNOLOGYMihika Shah
 
The human brain presentation
The human brain presentationThe human brain presentation
The human brain presentationSilvia Borba
 
PRESENTATION ON Polar Satellite Launch Vehicle
PRESENTATION ON Polar Satellite Launch VehiclePRESENTATION ON Polar Satellite Launch Vehicle
PRESENTATION ON Polar Satellite Launch VehicleBitan Dolai
 
Night vision technology ppt
Night vision technology pptNight vision technology ppt
Night vision technology pptEkta Singh
 
Mars orbiter mission (Mangalyaan)The govt. of INDIA
Mars orbiter mission (Mangalyaan)The govt. of INDIAMars orbiter mission (Mangalyaan)The govt. of INDIA
Mars orbiter mission (Mangalyaan)The govt. of INDIAArchit Jindal
 
Electrical Modalities
Electrical ModalitiesElectrical Modalities
Electrical ModalitiesWSSU
 
Bringing Design to Life
Bringing Design to LifeBringing Design to Life
Bringing Design to LifeBill Scott
 

En vedette (20)

Fundraising PowerPoint
Fundraising PowerPointFundraising PowerPoint
Fundraising PowerPoint
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Dif fft
Dif fftDif fft
Dif fft
 
Our Mobile Planet - Les chiffres France
Our Mobile Planet - Les chiffres FranceOur Mobile Planet - Les chiffres France
Our Mobile Planet - Les chiffres France
 
Svpwm
SvpwmSvpwm
Svpwm
 
Mars Mission of india (MANGALYAN)
Mars Mission of india (MANGALYAN)Mars Mission of india (MANGALYAN)
Mars Mission of india (MANGALYAN)
 
ISRO MARS MISSION
ISRO MARS MISSIONISRO MARS MISSION
ISRO MARS MISSION
 
Cruise Missile Technology By Takalikar Mayur ppt
Cruise Missile Technology By Takalikar Mayur pptCruise Missile Technology By Takalikar Mayur ppt
Cruise Missile Technology By Takalikar Mayur ppt
 
Minerals And Energy Resources - Class 10 - Geography
Minerals And Energy Resources - Class 10 - GeographyMinerals And Energy Resources - Class 10 - Geography
Minerals And Energy Resources - Class 10 - Geography
 
Night Vision Technology
Night Vision TechnologyNight Vision Technology
Night Vision Technology
 
Night vision system in Automobiles
Night vision system in AutomobilesNight vision system in Automobiles
Night vision system in Automobiles
 
NIGHT VISION TECHNOLOGY
NIGHT VISION TECHNOLOGYNIGHT VISION TECHNOLOGY
NIGHT VISION TECHNOLOGY
 
The human brain presentation
The human brain presentationThe human brain presentation
The human brain presentation
 
PRESENTATION ON Polar Satellite Launch Vehicle
PRESENTATION ON Polar Satellite Launch VehiclePRESENTATION ON Polar Satellite Launch Vehicle
PRESENTATION ON Polar Satellite Launch Vehicle
 
Night vision technology ppt
Night vision technology pptNight vision technology ppt
Night vision technology ppt
 
Mars orbiter mission (Mangalyaan)The govt. of INDIA
Mars orbiter mission (Mangalyaan)The govt. of INDIAMars orbiter mission (Mangalyaan)The govt. of INDIA
Mars orbiter mission (Mangalyaan)The govt. of INDIA
 
ISRO
ISROISRO
ISRO
 
Electrical Modalities
Electrical ModalitiesElectrical Modalities
Electrical Modalities
 
Space frames
Space framesSpace frames
Space frames
 
Bringing Design to Life
Bringing Design to LifeBringing Design to Life
Bringing Design to Life
 

Similaire à Ch4.mapreduce algorithm design

Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica"FENG "GEORGE"" YU
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analyticsAvinash Pandu
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReducePietro Michiardi
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceLeonidas Akritidis
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
An important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfAn important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfARORACOCKERY2111
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)anh tuan
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenderssscdotopen
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 

Similaire à Ch4.mapreduce algorithm design (20)

Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
An Introduction to MATLAB with Worked Examples
An Introduction to MATLAB with Worked ExamplesAn Introduction to MATLAB with Worked Examples
An Introduction to MATLAB with Worked Examples
 
Applying stratosphere for big data analytics
Applying stratosphere for big data analyticsApplying stratosphere for big data analytics
Applying stratosphere for big data analytics
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduceComputing Scientometrics in Large-Scale Academic Search Engines with MapReduce
Computing Scientometrics in Large-Scale Academic Search Engines with MapReduce
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
An important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfAn important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdf
 
Map reduce
Map reduceMap reduce
Map reduce
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
 
R studio
R studio R studio
R studio
 
New Directions in Mahout's Recommenders
New Directions in Mahout's RecommendersNew Directions in Mahout's Recommenders
New Directions in Mahout's Recommenders
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 

Plus de AllenWu

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAMAllenWu
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsAllenWu
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented dataAllenWu
 
地震知識
地震知識地震知識
地震知識AllenWu
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixAllenWu
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decompositionAllenWu
 
Information Theoretic Co Clustering
Information Theoretic Co ClusteringInformation Theoretic Co Clustering
Information Theoretic Co ClusteringAllenWu
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisAllenWu
 

Plus de AllenWu (9)

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAM
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented data
 
地震知識
地震知識地震知識
地震知識
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrix
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decomposition
 
Information Theoretic Co Clustering
Information Theoretic Co ClusteringInformation Theoretic Co Clustering
Information Theoretic Co Clustering
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual Analysis
 

Dernier

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 

Dernier (20)

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 

Ch4.mapreduce algorithm design

  • 1. 1 Mapreduce algorithm design Web Intelligence and Data Mining Laboratory Presenter / Allen 2011/4/26
  • 2. Outline MapReduce Framework Pairs Approach Stripes Approach Issues 2011/4/26 2
  • 3.
  • 4. Combiners can be viewed as mini-reducers" in the map phase.
  • 5. Partitioners determine which reducer is responsible for a particular key.
  • 6.
  • 7. Motivating Example Term co-occurrence matrix for a text collection M=NN matrix (N = vocabulary size) Mij: number of times term i and j co-occur in some context (for concreteness, let’s say context = sentence) Why? Distributional profiles as a way of measuring semantic distance Semantic distance useful for many language processing tasks 5 2011/4/26
  • 8. MapReduce: Large counting problems Term co-occurrence matrix for a text collection = specific instance of a large counting problem A large event space (number of terms) A large number of observations (the collection itself) Goal: keep tracking of interesting statistics about the events Basic idea Mappers generate partial counts Reducers aggregate partial counts How do we aggregate partial counts efficiently? 6 2011/4/26
  • 9. First try “Pairs” Each mapper takes a sentence: Generate all co-occurring term pairs For all pairs, emit(a, b)  count Reducers sums up counts associated with these pairs Use combiners! 7 2011/4/26
  • 11. “Pairs” Analysis Advantages Easy to implement, easy to understand Disadvantages Lots of pairs to sort and shuffle around (upper bound?) 9 2011/4/26
  • 12. Another try “Stripes” Idea: group together pairs into an associate array (a, b) 1 (a, c) 2 (a, d) 5 a{b:1, c:2, d:5, e:3, f:2} (a, e) 3 (a, f) 2 Each mapper takes a sentence: Generating all co-occurring term pairs For each term, emit a {b:countb, c:countc, d:countd,…} Reducers perform element-wise sum of associate arrays a{b:1, d:5, e:3} + a{b:1, c:2, d:2, f:2} a{b:2, c:2, d:7, e:3, f:2} 10 2011/4/26
  • 14. “Stripes” Analysis Advantages Far less sorting and shuffling of key-value pairs Can make better use of combiners Disadvantages More difficult to implement Underlying objects is more heavyweight Fundamental limitation in terms of size of event space 12 2011/4/26
  • 15. Running time of the “Pairs” and “Stripes” 13 2011/4/26
  • 16. Conditional probabilities How do we estimate conditional probabilities from counts? Why do we want to do this? How do we do this with MapReduce? 14 2011/4/26
  • 17. P(B|A) “Stripes” a{b1:3, b2:12, b3:7, b4:1,…} Easy! One pass to compute (a, *) Another pass to directly compute P(B|A) 15 2011/4/26
  • 18. P(B|A) “Pairs” (a, *)  32 Reducer holds this value in memory (a, b1)  3 (a, b1)  3/32 (a, b2)  12 (a, b2)  12/32 (a, b3)  7 (a, b3)  7/32 (a, b4)  1 (a, b1)  1/32 … … For this to work: Must emit extra (a, *) for every bn in mapper. Must make sure all a’s get sent to same reducer (use partitioner) Must make sure (a, *) comes first (define sort order) Must hold state in reducer across different key-value pairs 16 2011/4/26
  • 19. Synchronization in Hadoop Approach 1: turn synchronization into an ordering problem Sort keys into correct order of computation Partition key space so that each reducer gets the appropriate set of partial results Hold state in reducer across multiple key-value pairs to perform computation Illustrated by the “pairs” approach 17 2011/4/26
  • 20. Synchronization in Hadoop Approach 2: construct data structures that “bring the pieces together” Each reducer receives all the data it needs to complete the computation Illustrated by the “stripes” approach 18 2011/4/26
  • 21. Issues Number of key-value pairs Object creation overhead Times for sorting and shuffling pairs across the network Size of each key-value pair De/serialization overhead Combiners make a big difference! RAM vs. disk vs. network Arrange data to maximize opportunities to aggregate partial results 19 2011/4/26
  • 22. 20 Thank you! 2011/4/26