Big Data Problem
Map-Reduce
Computer Lab
Big Data processing using MapReduce
N. Venkatesh
Assistant Professor of CSE
JNTUK University College of Engineering, Vizianagaram
Email: nvenkatesh@jntukucev.ac.in
27 January 2015
N.Venkatesh Big Data processing using MapReduce
Table of contents
1 Big Data Problem
2 Map-Reduce
A Little Bit About Map-Reduce
Elements of MapReduce Programs
Hello World and More
3 Computer Lab
Data is growing much faster than computation speeds.
Reasons: more data sources (the web, sensors, telescopes, RFID, mobile devices) and
cheaper storage.
Motivation
Distribute the data across a set of nodes connected by a network.
The question is: how do we program such a system? Issues that arise while programming:
how to divide the work across nodes (scheduling), and how to deal with node failures
and stragglers ("My machine got stuck, buddy!").
Data-parallel model: the framework automatically takes care of scheduling and
node failures. A good example of such a model is Map-Reduce, which was invented
by engineers at Google as a system for building their search index.
Map-Reduce Programming Model
Data type: key-value records
Map function:
$(K_{in}, V_{in}) \Rightarrow \mathrm{list}(K_{inter}, V_{inter})$
Reduce function:
$(K_{inter}, \mathrm{list}(V_{inter})) \Rightarrow \mathrm{list}(K_{out}, V_{out})$
Keys and values can be of any type.
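The model above can be tried out without a cluster. The following is a minimal single-process sketch in plain Java (no Hadoop dependency), assuming Map.Entry as the key-value record type: it applies a map function to every input record, groups the intermediate values by key (standing in for the shuffle), and calls the reduce function once per key. The class and method names (MiniMapReduce, run, wordCount) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical toy runner (not a Hadoop API): map -> group by key -> reduce.
final class MiniMapReduce {

    // (K_in, V_in) => list(K_inter, V_inter), then (K_inter, list(V_inter)) => list(K_out, V_out)
    static <KIn, VIn, KInter extends Comparable<KInter>, VInter, KOut, VOut>
    List<Map.Entry<KOut, VOut>> run(
            List<Map.Entry<KIn, VIn>> input,
            BiFunction<KIn, VIn, List<Map.Entry<KInter, VInter>>> mapFn,
            BiFunction<KInter, List<VInter>, List<Map.Entry<KOut, VOut>>> reduceFn) {
        // Map phase plus "shuffle": collect every intermediate value under its key.
        // A TreeMap keeps keys sorted, loosely mimicking sorted reducer input.
        Map<KInter, List<VInter>> groups = new TreeMap<>();
        for (Map.Entry<KIn, VIn> record : input) {
            for (Map.Entry<KInter, VInter> kv : mapFn.apply(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce phase: exactly one call per distinct intermediate key.
        List<Map.Entry<KOut, VOut>> output = new ArrayList<>();
        for (Map.Entry<KInter, List<VInter>> group : groups.entrySet()) {
            output.addAll(reduceFn.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Hypothetical example instance: WordCount expressed with the runner above.
    static List<Map.Entry<String, Long>> wordCount(List<String> lines) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(Map.entry((long) i, lines.get(i)));   // key: line number (assumed)
        }
        return run(input,
                (Long lineNo, String line) -> {
                    List<Map.Entry<String, Long>> out = new ArrayList<>();
                    for (String w : line.split("\\s+")) {
                        if (!w.isEmpty()) out.add(Map.entry(w, 1L));   // emit (word, 1)
                    }
                    return out;
                },
                (String word, List<Long> ones) -> {
                    long sum = 0;
                    for (long one : ones) sum += one;                  // add up the 1s
                    List<Map.Entry<String, Long>> result = List.of(Map.entry(word, sum));
                    return result;
                });
    }
}
```

For instance, wordCount(List.of("to be or", "not to be")) returns (be, 2), (not, 1), (or, 1), (to, 2), in key order because the grouping map is a TreeMap.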
Writing MapReduce from Scratch
Whatever the data (data set) and whatever the application,
everything should be converted into <Key K, Value V> pairs.
InputFormat<K,V>
Defines the input splits and the RecordReader, which supplies the input to the Mapper
Mapper<K,V,K,V>
Uses the map function to produce intermediate <K,V> pairs
Combiner<K,V,K,V> and Partitioner<K,V>
The Combiner aggregates, on the same Mapper, multiple values associated with the same key
The Partitioner partitions the key space according to the number of Reducers
Reducer<K,V,K,V>
Uses the reduce function (executed once per key)
OutputFormat<K,V> and Driver (containing the main function
with the job details).
The beauty is that everything can be customized, including the key and value types.
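The Partitioner decides which Reducer receives each intermediate key. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The plain-Java sketch below reproduces that rule outside Hadoop to show that every occurrence of a key lands in the same bucket; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not a Hadoop class) of hash partitioning.
final class HashPartitionSketch {

    // Masking off the sign bit keeps the result non-negative even when
    // hashCode() is negative (Math.abs(Integer.MIN_VALUE) would stay negative).
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Distribute a stream of intermediate keys into one bucket per reducer.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }
}
```

Because the bucket depends only on the key, all values for a given word reach the same reduce call, which is what lets the Reducer see the complete list for that key.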
Deep-Dive into the MapReduce Hello World (WordCount)
Definition: find the number of occurrences of every word in a
document or a set of documents.
Solution (think!): how do we recast this problem so that it starts
from some <key,value> pairs and ends at <key,value> pairs,
i.e. each word and its count?
Select an InputFormat that fits your needs; if none does, write a
custom InputFormat.
Write the map function of the Mapper based on the records
returned by the RecordReader of the InputFormat.
Add an optional Combiner if local aggregation is possible.
Write the reduce function of the Reducer based on the intermediate
<key, list of values> pairs produced by the Mappers.
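To see why the optional Combiner helps, the sketch below simulates one map task of WordCount in plain Java (no Hadoop dependency; class and method names are hypothetical). The mapper emits one (word, 1) pair per token, and the combiner sums them locally before anything is shuffled. Reusing the summing logic as the combiner is valid here because addition is associative and commutative; the same would not hold for, say, averaging directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of a WordCount combiner inside ONE map task.
final class CombinerSketch {

    // Mapper output for one input split: one (word, 1) pair per token.
    static List<Map.Entry<String, Long>> mapSplit(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) pairs.add(Map.entry(w, 1L));
            }
        }
        return pairs;
    }

    // Combiner: the reducer's summing logic applied to this task's output only,
    // so fewer pairs have to cross the network to the reducers.
    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> partial = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            partial.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return new ArrayList<>(partial.entrySet());
    }
}
```

For the two lines "a b a" and "b a", mapSplit produces five pairs, while combine shrinks them to (a, 3) and (b, 2), so only two pairs would leave this map task.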
Mapper of WordCount
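import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: (byte offset, line) from TextInputFormat; output: (word, 1)
public class WordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final LongWritable one = new LongWritable(1);
    private final Text word = new Text();   // reused to avoid per-record allocation

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] wordsInLine = line.split("\\s+");
        for (int i = 0; i < wordsInLine.length; i++) {
            if (wordsInLine[i].isEmpty()) continue;   // skip empty tokens from leading spaces
            word.set(wordsInLine[i]);
            context.write(word, one);                 // emit intermediate (word, 1)
        }
    }
}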
Reducer of WordCount
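import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: (word, [counts...]); output: (word, total count)
public class WordReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable totalWC = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long wordCount = 0;
        for (LongWritable val : values) {
            wordCount += val.get();   // add 1s (or partial sums from a combiner)
        }
        totalWC.set(wordCount);
        context.write(key, totalWC);
    }
}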
Job Configuration and Job Submission
Jobs are controlled through Configuration and Job objects.
A Configuration is a map from property names to string values,
populated using either set or addResource:
conf.set(propName, propValue); or
conf.addResource(pathToPropertiesFile);
e.g. conf.set("mapreduce.job.jar", "/home/hadoop/x.jar");
A Job object takes the Configuration object plus parameters
such as the input path, output path, Mapper, Reducer, etc.
Job Driver Example
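import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.jar", "x.jar");      // jar shipped to the cluster
        conf.addResource(new Path("conf.xml"));      // extra properties from a file
        Job job = Job.getInstance(conf, "xyz");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(WordMapper.class);
        job.setCombinerClass(WordReducer.class);     // optional local aggregation
        job.setReducerClass(WordReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}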
Building a Word Co-Occurrence Matrix from a Large Corpus
In general, a co-occurrence matrix tracks an event and, within a given
window of time or space, which other events occur. In this context words
are the events, and the window is the relative position around the target word.
Ex: The way to love anything is to realize that it may be lost
The co-occurrences of the word love for window size 2 are [way, to, anything, is].
Solution (think!): how do we recast this problem so that it starts from
some <key,value> pairs and ends at <key,value> pairs, i.e. each pair of
words and its count?
It is similar to WordCount; the only difference is that the mapper's output
key is the pair <word,neighbor> instead of <word>.
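The co-occurrence job can be sketched in plain Java as follows, assuming whitespace tokenization and a symmetric window as in the example above. The string key "word|neighbor" is a stand-in for a custom pair key type (in Hadoop one would typically write a WritableComparable pair class); the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "pairs" sketch: map emits ((word, neighbor), 1), reduce sums per pair.
final class CoOccurrenceSketch {

    // Map step for one line; "word|neighbor" stands in for a real pair key type.
    static List<Map.Entry<String, Long>> mapLine(String line, int window) {
        String[] words = line.trim().split("\\s+");
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(words.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) out.add(Map.entry(words[i] + "|" + words[j], 1L));
            }
        }
        return out;
    }

    // Reduce step (grouping done in memory): sum the 1s for each pair, as in WordCount.
    static Map<String, Long> countPairs(List<String> lines, int window) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Long> kv : mapLine(line, window)) {
                counts.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
        }
        return counts;
    }

    // Neighbors of one word in sentence order, to check against the slide's example.
    static List<String> neighbors(String line, String word, int window) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> kv : mapLine(line, window)) {
            String[] pair = kv.getKey().split("\\|");
            if (pair[0].equals(word)) result.add(pair[1]);
        }
        return result;
    }
}
```

With the example sentence and window 2, neighbors(sentence, "love", 2) yields [way, to, anything, is], and the pair to|anything is counted twice because "to" occurs at two positions within two words of "anything".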
More Examples
Simple search: display all the lines that contain a given word
Map: keep only the lines that contain the word
Reduce: identity reducer
a typical map-only job.
Sort the list of words according to their count:
Nontrivial: use two jobs, or add a custom key (word, count)
Find the number of lines in a file:
Tricky one: ....
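The simple search can be sketched as a map-only job: each map call sees one line and either emits it or drops it, and with no reduce step the map output is the final output (in a Hadoop driver this would typically be configured with job.setNumReduceTasks(0)). The plain-Java sketch below assumes whole-token matching; a substring match would fit the slide equally well. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical map-only "grep" sketch: no reduce step at all.
final class GrepSketch {

    // Map step for one record: return the line to emit, or null to drop it.
    // Whole-token matching is an assumption; line.contains(word) is the alternative.
    static String map(String line, String word) {
        for (String token : line.split("\\s+")) {
            if (token.equals(word)) return line;
        }
        return null;
    }

    // Runs the map step over every line; the emitted lines are the job output.
    static List<String> search(List<String> lines, String word) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String emitted = map(line, word);
            if (emitted != null) hits.add(emitted);
        }
        return hits;
    }
}
```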
Interesting Examples
Social networking site, common friends list: when you visit
someone's profile, you see a list of the friends you have in
common. This list doesn't change frequently, so it would be wasteful to
recalculate it every time you visit the profile.
venky: priya mouni santosh suresh kumar
suresh: santi kumar santosh srinivas divakar ....
When venky visits suresh's profile, he should see two common friends:
santosh and kumar.
Map:
key (venky, suresh): venky's friends, emitted while processing venky's friend list
key (venky, suresh): suresh's friends, emitted while processing suresh's friend list
Reduce:
(venky, suresh): intersection of venky's and suresh's friends.
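A plain-Java sketch of the common-friends job, assuming the friend lists are symmetric (if venky lists suresh, suresh lists venky). The map step emits each user's friend list under a sorted pair key, so that (venky, suresh) and (suresh, venky) meet at the same reducer; the reduce step intersects the two lists. The class name and the comma-joined string key (a stand-in for a proper pair key type) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical common-friends sketch; the shuffle is simulated with a TreeMap.
final class CommonFriendsSketch {

    // Sorted pair key, so (venky, suresh) and (suresh, venky) become the same key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + "," + b : b + "," + a;
    }

    static Map<String, List<String>> commonFriends(Map<String, List<String>> friendLists) {
        // Map + shuffle: for each friend f of user u, emit (pairKey(u, f), u's list).
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendLists.entrySet()) {
            for (String friend : e.getValue()) {
                grouped.computeIfAbsent(pairKey(e.getKey(), friend), k -> new ArrayList<>())
                       .add(e.getValue());
            }
        }
        // Reduce: a key that received both users' lists gets their intersection.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> g : grouped.entrySet()) {
            if (g.getValue().size() == 2) {
                TreeSet<String> common = new TreeSet<>(g.getValue().get(0));
                common.retainAll(g.getValue().get(1));
                result.put(g.getKey(), new ArrayList<>(common));
            }
        }
        return result;
    }
}
```

With venky's list from the slide and suresh's visible list completed with venky (an assumption, since the original list is elided), the key suresh,venky maps to [kumar, santosh].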
Interesting Examples
Social networking sites, friend recommender: "People you may
know" systems based on common friends.
venky: priya mouni santosh kumar srinivas
sahaja: priya mouni santi kumar santosh srinivas divakar ....
The "People you may know" system should suggest sahaja to venky,
and venky to sahaja. Take Facebook as an example: there you are dealing
with about 1 billion records; in the context of India alone, about
100 million records.
Map: after processing each record, recommend every friend in the list to
every other friend in the list, e.g. (priya; mouni, c=venky),
(mouni; priya, c=venky) ...
Precaution: they might already be friends, so also emit (venky; priya, c=null).
Reduce: combine the values with the same key, e.g. (priya: mouni 2 (venky, sahaja)).
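The recommender can be sketched in plain Java as below. It follows the map and reduce steps described above, including the null marker for pairs who are already friends. The class name, the nested-map representation of the shuffle, and counting distinct mutual friends are illustrative assumptions, not a fixed recipe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "people you may know" sketch.
// Map, for each record "u: f1 f2 ...":
//   - for every pair (fi, fj) of u's friends, emit (fi; fj, c=u) and (fj; fi, c=u)
//   - for every friend fi, emit (u; fi, c=null) to mark an existing friendship
// Reduce, per person: drop candidates marked null, count distinct mutual friends.
final class FriendRecommenderSketch {

    static Map<String, Map<String, Integer>> recommend(Map<String, List<String>> friendLists) {
        // Simulated shuffle: person -> candidate -> mutual friends (null = already friends).
        Map<String, Map<String, List<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendLists.entrySet()) {
            String user = e.getKey();
            List<String> friends = e.getValue();
            for (String f : friends) {
                emit(grouped, user, f, null);   // precaution: already friends
            }
            for (int i = 0; i < friends.size(); i++) {
                for (int j = i + 1; j < friends.size(); j++) {
                    emit(grouped, friends.get(i), friends.get(j), user);
                    emit(grouped, friends.get(j), friends.get(i), user);
                }
            }
        }
        // Reduce: keep candidates never marked null; value = number of mutual friends.
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, List<String>>> person : grouped.entrySet()) {
            for (Map.Entry<String, List<String>> cand : person.getValue().entrySet()) {
                if (!cand.getValue().contains(null)) {
                    result.computeIfAbsent(person.getKey(), k -> new TreeMap<>())
                          .put(cand.getKey(), new HashSet<>(cand.getValue()).size());
                }
            }
        }
        return result;
    }

    // Records one intermediate value: "person may know candidate, via a mutual friend".
    private static void emit(Map<String, Map<String, List<String>>> grouped,
                             String person, String candidate, String via) {
        grouped.computeIfAbsent(person, k -> new TreeMap<>())
               .computeIfAbsent(candidate, k -> new ArrayList<>())
               .add(via);
    }
}
```

For example, with the records for venky and sahaja from the slide plus a hypothetical record priya: venky sahaja, the sketch recommends mouni to priya with 2 mutual friends (venky, sahaja) and sahaja to venky with 1 mutual friend (priya), while priya and venky are never suggested to each other because they are already friends.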
A Little Bit About the Lab Environment
Two clusters (23-node and 11-node)
Ubuntu
Windows compilation environment
Eclipse
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 

Mypreson 27

  • 1. Big DataProblem Map-Reduce Computer Lab Big Data processing using MapReduce N.Venkatesh 1 1Asst.Professor of CSE JNTUK University College of Engineering,Vizianagaram Email-id:nvenkatesh@jntukucev.ac.in 27 January 2015 N.Venkatesh Big Data processing using MapReduce
  • 2. Table of contents: 1 Big Data Problem; 2 Map-Reduce (A Little Bit About Map-Reduce, Elements of MapReduce Programs, Hello World and More); 3 Computer Lab
  • 3. Data is growing much faster than computation speeds. Reasons: new data sources (the web, sensors, telescopes, RFID, mobile devices) and cheaper storage.
  • 7. Motivation: distribute the data across a set of nodes connected by a network. The question is how to program such a system. Issues while programming: how to divide the work across nodes (scheduling), and how to deal with node failures and stragglers ("my machine got stuck, man"). A data-parallel model automatically takes care of scheduling and node failures. A good example of such a model is MapReduce, which was invented by engineers at Google as a system for building the search index.
  • 8. Map-Reduce Programming Model. Data type: key-value records. Map function: $(K_{in}, V_{in}) \Rightarrow \mathrm{list}(K_{inter}, V_{inter})$. Reduce function: $(K_{inter}, \mathrm{list}(V_{inter})) \Rightarrow \mathrm{list}(K_{out}, V_{out})$. Keys and values can be of any type.
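The map and reduce signatures on this slide can be made concrete with a small in-memory sketch. This is plain Java, not Hadoop. LocalMapReduce, MapFn and ReduceFn are hypothetical names, the shuffle is simulated with a TreeMap, and everything runs in one process. The sketch only shows how records flow through the job: each input record is mapped, the results are grouped by intermediate key, and reduce is called once per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process model of the MapReduce data flow (not the Hadoop API).
class LocalMapReduce {
    // map: (K_in, V_in) -> list(K_inter, V_inter)
    interface MapFn<KI, VI, K, V> {
        List<Map.Entry<K, V>> map(KI key, VI value);
    }

    // reduce: (K_inter, list(V_inter)) -> list(K_out, V_out)
    interface ReduceFn<K, V, KO, VO> {
        List<Map.Entry<KO, VO>> reduce(K key, List<V> values);
    }

    static <KI, VI, K extends Comparable<K>, V, KO, VO> List<Map.Entry<KO, VO>> run(
            List<Map.Entry<KI, VI>> input,
            MapFn<KI, VI, K, V> mapper,
            ReduceFn<K, V, KO, VO> reducer) {
        // Map phase + simulated shuffle: group every intermediate value under its key.
        // A TreeMap keeps keys sorted, mimicking the sort that precedes the reduce phase.
        TreeMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<KI, VI> record : input) {
            for (Map.Entry<K, V> kv : mapper.map(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce phase: exactly one call per distinct intermediate key.
        List<Map.Entry<KO, VO>> output = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            output.addAll(reducer.reduce(group.getKey(), group.getValue()));
        }
        return output;
    }
}
```

For WordCount in this model, the map function would emit (word, 1L) for every token of a line. The reduce function would then emit (word, number of values received).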
  • 9. Writing MapReduce from Scratch. Whatever the data set and whatever the application, everything must be converted into <Key K, Value V> pairs. InputFormat<K,V>: defines the input splits and the RecordReader, which produces the input to the Mapper. Mapper<K,V,K,V>: uses the map function to produce intermediate <K,V> pairs. Combiner<K,V,K,V> and Partitioner<K,V>: run on the same node as the Mapper; the Combiner merges multiple values associated with the same key, and the Partitioner partitions the key space based on the number of Reducers. Reducer<K,V,K,V>: uses the reduce function (executed once per key). OutputFormat<K,V>, and the Driver (containing the main function with the job details). The beauty is that everything can be customized, including the key and value types.
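The Partitioner splits the key space by the number of Reducers. A hash-based sketch illustrates how. It uses the same formula as Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, but it is plain Java over String keys. HashPartitionSketch and its methods are hypothetical names. Real Hadoop keys such as Text compute their hash codes differently from String, so the concrete partition numbers will not match a real job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of hash partitioning (not the Hadoop API).
// Same formula as Hadoop's default HashPartitioner; real keys such as Text
// hash differently from String, so the concrete partition numbers will differ.
class HashPartitionSketch {
    static int getPartition(Object key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes each intermediate key to a reducer bucket. Equal keys always share a bucket,
    // which is what lets one reduce call see every value for its key.
    static List<List<String>> partitionAll(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) buckets.add(new ArrayList<>());
        for (String key : keys) buckets.get(getPartition(key, numReduceTasks)).add(key);
        return buckets;
    }
}
```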
  • 14. Deep-Dive into MapReduce: Hello World (WordCount). Definition: find the number of occurrences of every word in a document or set of documents. Solution (think!): how do we convert this into a problem that starts from some <key, value> pairs and ends at <key, value> pairs, i.e. each word and its count? Steps: (1) select an InputFormat that fits your needs; if none does, write a custom InputFormat; (2) write the map function of the Mapper based on the records returned by the InputFormat's RecordReader; (3) add an optional Combiner if local aggregation is possible; (4) write the reduce function of the Reducer based on the intermediate <key, list(values)> pairs produced by the Mapper after the shuffle and sort.
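Step (3), the optional Combiner, is easiest to see by counting records. The sketch is plain Java, not the Hadoop API; CombinerSketch, mapSplit and combine are hypothetical names. It shows what one WordCount mapper would emit for its split, and how summing per word on the mapper side shrinks the data sent through the shuffle. Summing is safe as a combiner because addition is associative and commutative. Hadoop may run a combiner zero, one or several times, so the final result must not depend on whether it runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of local aggregation (not the Hadoop API).
class CombinerSketch {
    // What a WordCount mapper emits for one input split: one (word, 1) pair per token.
    static List<Map.Entry<String, Long>> mapSplit(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) pairs.add(Map.entry(w, 1L));
            }
        }
        return pairs;
    }

    // Combiner: the same "sum per key" logic as the reducer, applied locally to one
    // mapper's output before it is sent over the network.
    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }
}
```

For the two lines "a b a" and "a c", the mapper emits five pairs and the combiner forwards only three. In a Hadoop driver, the WordCount reducer is commonly reused as the combiner through job.setCombinerClass.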
  • 15. Mapper of WordCount:
      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      public class WordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final LongWritable one = new LongWritable(1);
        private final Text word = new Text(); // reused output key
        @Override
        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // key = byte offset of the line, value = the line itself
          String[] wordsInLine = value.toString().split("\\s+");
          for (String w : wordsInLine) {
            if (w.isEmpty()) continue; // split() yields "" for leading whitespace
            word.set(w);
            context.write(word, one); // emit (word, 1)
          }
        }
      }
  • 16. Reducer of WordCount:
      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;
      public class WordReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable totalWC = new LongWritable();
        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
          long wordCount = 0;
          for (LongWritable val : values) {
            wordCount += val.get(); // unwrap the LongWritable before adding
          }
          totalWC.set(wordCount);
          context.write(key, totalWC); // emit (word, total count)
        }
      }
  • 17. Job Configuration and Job Submission. Jobs are controlled through Configuration and Job objects. A Configuration is a map from attribute names to string values, populated with either set or addResource: conf.set(propName, propValue); or conf.addResource(PathContainsPropertiesfile); for example, conf.set("mapreduce.job.jar", "/home/hadoop/x.jar"); A Job object takes the Configuration object plus parameters such as the input path, output path, Mapper and Reducer.
  • 18. Job Driver Example:
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      public class WordCount {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          conf.set("mapreduce.job.jar", "x.jar");   // jar containing the job classes
          conf.addResource(new Path("conf.xml"));   // extra properties from an XML file
          Job job = Job.getInstance(conf, "xyz");
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(LongWritable.class);
          job.setMapperClass(WordMapper.class);
          job.setReducerClass(WordReducer.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
          FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }
  • 19. Building a Word Co-Occurrence Matrix from a Large Corpus. In general, a co-occurrence matrix tracks an event and, within a given window of time or space, records which other events occur. In this context, words are the events and the window is the relative position around the target word. Example: "The way to love anything is to realize that it may be lost". With a window size of 2, the co-occurrences of the word "love" are [way, to, anything, is]. Solution (think!): how do we convert this into a problem that starts from some <key, value> pairs and ends at <key, value> pairs, i.e. each pair of words and its count? It is similar to WordCount; the only difference is that the mapper's output key is <word, neighbor> instead of <word>.
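The <word, neighbor> idea on this slide can be sketched in plain Java before writing the Hadoop version. This is an illustration under assumptions. CoOccurrenceSketch, neighbors and countPairs are hypothetical names. The pair key is encoded as the string "word,neighbor"; a Hadoop job would more likely use a custom key type. Map and reduce are also collapsed into one loop that merges counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of the co-occurrence "pairs" job (not the Hadoop API).
class CoOccurrenceSketch {
    // Neighbours of words[i]: every word at most `window` positions to its left or right.
    static List<String> neighbors(String[] words, int i, int window) {
        List<String> out = new ArrayList<>();
        int from = Math.max(0, i - window);
        int to = Math.min(words.length - 1, i + window);
        for (int j = from; j <= to; j++) {
            if (j != i) out.add(words[j]);
        }
        return out;
    }

    // Map: emit (<word,neighbor>, 1) for each neighbour. Reduce: sum the 1s per pair.
    // Both phases are folded into one merge here for illustration.
    static Map<String, Long> countPairs(String line, int window) {
        String[] words = line.trim().split("\\s+");
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            for (String n : neighbors(words, i, window)) {
                counts.merge(words[i] + "," + n, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

Take the example sentence with window 2. The neighbours of "love" (position 3) come out as [way, to, anything, is], matching the slide. The pair "to,anything" gets count 2, because both occurrences of "to" lie within two positions of "anything".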
  • 20. More Examples. Simple search: display all lines that contain a given word. Map: filter the lines that contain the word. Reduce: identity reducer; this is a typical map-only job. Sort the list of words according to their counts: nontrivial; use two jobs, or add a custom key (word, count). Find the number of lines in a file: a tricky one: ....
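Two of these examples can be sketched in plain Java to show what each map step does. The code is not the Hadoop API, and MoreExamplesSketch, search and sortByCount are hypothetical names. search models the map-only job. sortByCount models only the second job of the two-job approach: it assumes the WordCount output is already available as a map from word to count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketches of two examples (not the Hadoop API).
class MoreExamplesSketch {
    // Simple search as a map-only job: the map function keeps a line only if it
    // contains the word as a whole token; with no reduce phase, map output is final.
    static List<String> search(List<String> lines, String word) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (Arrays.asList(line.split("\\s+")).contains(word)) out.add(line);
        }
        return out;
    }

    // Second job of "sort words by count": input is WordCount's (word, count) output.
    // The map swaps it to (count, word) so the framework's key sort orders by count;
    // a descending comparator puts the most frequent words first.
    static List<Map.Entry<Long, String>> sortByCount(Map<String, Long> wordCounts) {
        TreeMap<Long, List<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Long> e : wordCounts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Identity reduce: emit every (count, word) in key order.
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : byCount.entrySet()) {
            for (String w : e.getValue()) out.add(Map.entry(e.getKey(), w));
        }
        return out;
    }
}
```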
  • 21. Interesting Examples. Social networking site, common friends list: when you visit someone's profile, you see a list of friends that you have in common. This list doesn't change frequently, so it would be wasteful to recalculate it every time you visit the profile. venky: priya mouni santosh suresh kumar; suresh: santi kumar santosh srinivas divakar; .... When venky visits suresh's profile, he should get two common friends: santosh and kumar. Map: from venky's record, emit key (venky, suresh) with venky's friend list; from suresh's record, emit key (venky, suresh) with suresh's friend list. Reduce: for key (venky, suresh), output the intersection of venky's and suresh's friend lists.
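The common-friends job can be sketched in plain Java. CommonFriendsSketch, mapAndShuffle and reduce are hypothetical names, and the "shuffle" is simulated with a TreeMap. The key idea shown is sorting the two names, so that venky's record and suresh's record emit the same key and their lists meet in one reduce call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java illustration of the common-friends job (not the Hadoop API).
class CommonFriendsSketch {
    // Map: for person P with friend list L, emit (sortedPair(P, F), L) for every friend F.
    // Sorting the names makes (venky,suresh) and (suresh,venky) the same key.
    static Map<String, List<Set<String>>> mapAndShuffle(Map<String, List<String>> friendLists) {
        Map<String, List<Set<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendLists.entrySet()) {
            String person = e.getKey();
            Set<String> friends = new TreeSet<>(e.getValue());
            for (String friend : friends) {
                String key = person.compareTo(friend) < 0
                        ? person + "," + friend
                        : friend + "," + person;
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(friends);
            }
        }
        return grouped;
    }

    // Reduce: intersect the friend lists of the two people in the key.
    // A key seen from only one side (the other person's record is absent) yields nothing.
    static Set<String> reduce(List<Set<String>> lists) {
        if (lists.size() < 2) return new TreeSet<>();
        Set<String> common = new TreeSet<>(lists.get(0));
        for (Set<String> l : lists.subList(1, lists.size())) common.retainAll(l);
        return common;
    }
}
```

The sketch assumes friendship is symmetric, so suresh's list must also contain venky. That name is an addition to the slide's data. With it, the key "suresh,venky" reduces to {kumar, santosh}.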
  • 22. Interesting Examples. Social networking sites, friend recommender: "People you might know" systems based on common friends. venky: priya mouni santosh kumar srinivas; sahaja: priya mouni santi kumar santosh srinivas divakar; .... The "People you might know" system should recommend sahaja to venky, and venky to sahaja. Taking Facebook as an example, you are talking about 1 billion records; in the context of India, about 100 million records. Map: for every record, recommend each friend in the list to every other friend in the list: (priya; mouni, c=venky) (mouni; priya, c=venky) ... As a precaution, because they might already be friends, also emit (venky; priya, c=null). Reduce: combine the values with the same key, e.g. (priya: mouni 2 (venky, sahaja)).
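The map and reduce steps on this slide can be sketched in plain Java. FriendRecommenderSketch, Value and recommend are hypothetical names. The shuffle is simulated in memory, and a null "via" plays the role of the slide's c=null marker for an existing friendship.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of the "people you might know" job (not the Hadoop API).
class FriendRecommenderSketch {
    // One map output value: a candidate plus the mutual friend linking them.
    static final class Value {
        final String candidate;
        final String via; // null marks an existing friendship (c=null on the slide)
        Value(String candidate, String via) { this.candidate = candidate; this.via = via; }
    }

    static Map<String, Map<String, List<String>>> recommend(Map<String, List<String>> friendLists) {
        // Map phase + simulated shuffle, grouped by the person receiving recommendations.
        Map<String, List<Value>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendLists.entrySet()) {
            String user = e.getKey();
            List<String> friends = e.getValue();
            for (String f : friends) {
                // Precaution: mark existing friendships, e.g. (venky; priya, c=null).
                grouped.computeIfAbsent(user, k -> new ArrayList<>()).add(new Value(f, null));
                // Recommend every other friend of `user` to f, e.g. (priya; mouni, c=venky).
                for (String g : friends) {
                    if (!f.equals(g)) {
                        grouped.computeIfAbsent(f, k -> new ArrayList<>()).add(new Value(g, user));
                    }
                }
            }
        }
        // Reduce phase: per person, collect mutual friends per candidate, drop existing friends.
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, List<Value>> e : grouped.entrySet()) {
            Set<String> alreadyFriends = new HashSet<>();
            Map<String, List<String>> mutual = new TreeMap<>();
            for (Value v : e.getValue()) {
                if (v.via == null) alreadyFriends.add(v.candidate);
                else mutual.computeIfAbsent(v.candidate, k -> new ArrayList<>()).add(v.via);
            }
            mutual.keySet().removeAll(alreadyFriends);
            result.put(e.getKey(), mutual);
        }
        return result;
    }
}
```

Recommending venky and sahaja to each other needs records of people who are friends with both. For illustration, assume hypothetical records priya: venky sahaja and mouni: venky sahaja, alongside the slide's two records. With these inputs, venky is recommended sahaja through priya and mouni, and priya is recommended mouni with 2 mutual friends (venky, sahaja), as on the slide.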
  • 23. A Little Bit About the Lab Environment. Two clusters (23-node and 11-node). Ubuntu and Windows compilation environments. Eclipse.