Big Data Problem
Map-Reduce
Computer Lab
Big Data processing using MapReduce
N. Venkatesh
Asst. Professor of CSE
JNTUK University College of Engineering, Vizianagaram
Email: nvenkatesh@jntukucev.ac.in
27 January 2015
N.Venkatesh Big Data processing using MapReduce
Table of contents
1 Big Data Problem
2 Map-Reduce
   A Little Bit About Map-Reduce
   Elements of MapReduce Programs
   Hello World and More
3 Computer Lab
Data is growing much faster than computation speeds.
Reasons: data sources such as the web, sensors, telescopes, RFID and mobile devices, together with cheaper storage.
Motivation
Distribute the data across a set of nodes that are connected in a network.
The question is: how do we program them? Issues while programming: how to divide the work across nodes (scheduling), and how to deal with node failures and stragglers ("my machine got stuck, buddy").
Data-parallel model: automatically takes care of scheduling and node failures. A good example of a data-parallel model is Map-Reduce. It was invented by engineers at Google as a system for building the search index.
Map-Reduce Programming Model
Data type: key-value records
Map function:
(K_in, V_in) ⇒ list(K_inter, V_inter)
Reduce function:
(K_inter, list(V_inter)) ⇒ list(K_out, V_out)
Keys and values can be of any type.
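The two signatures above can be tried out in a single process to see how the pieces fit together. The following is a minimal sketch in plain Java, not Hadoop code: the class name MiniMapReduce and its methods are hypothetical, the grouping step stands in for the shuffle a real cluster would perform, and word counting is used only as sample input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class MiniMapReduce {
    // Runs map on every input record, groups the intermediate pairs by key,
    // then runs reduce once per intermediate key.
    public static <KI, VI, KM extends Comparable<KM>, VM, KO, VO>
    List<Map.Entry<KO, VO>> run(
            List<Map.Entry<KI, VI>> input,
            BiFunction<KI, VI, List<Map.Entry<KM, VM>>> map,
            BiFunction<KM, List<VM>, List<Map.Entry<KO, VO>>> reduce) {
        // Stand-in for the shuffle: build list(V_inter) for each K_inter.
        Map<KM, List<VM>> groups = new TreeMap<>();
        for (Map.Entry<KI, VI> rec : input) {
            for (Map.Entry<KM, VM> kv : map.apply(rec.getKey(), rec.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
            }
        }
        List<Map.Entry<KO, VO>> output = new ArrayList<>();
        for (Map.Entry<KM, List<VM>> g : groups.entrySet()) {
            output.addAll(reduce.apply(g.getKey(), g.getValue()));
        }
        return output;
    }

    // Sample use: (line number, line text) in, (word, count) out.
    public static List<Map.Entry<String, Integer>> wordCount(List<String> lines) {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(new SimpleEntry<>(i, lines.get(i)));
        }
        return MiniMapReduce.<Integer, String, String, Integer, String, Integer>run(
            input,
            (lineNo, text) -> {
                // Map: emit (word, 1) for every word in the line.
                List<Map.Entry<String, Integer>> out = new ArrayList<>();
                for (String w : text.split("\\s+")) {
                    if (!w.isEmpty()) out.add(new SimpleEntry<>(w, 1));
                }
                return out;
            },
            (word, ones) -> {
                // Reduce: sum the values collected for this word.
                int sum = 0;
                for (int c : ones) sum += c;
                List<Map.Entry<String, Integer>> out = new ArrayList<>();
                out.add(new SimpleEntry<>(word, sum));
                return out;
            });
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not", "to be")));
    }
}
```

The six type parameters of run correspond one to one to K_in, V_in, K_inter, V_inter, K_out and V_out in the signatures.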
Writing MapReduce from Scratch
Whatever the data set and whatever the application, everything must be converted into <Key K, Value V> pairs.
InputFormat<K,V>
Defines the input splits and the RecordReader, which supplies the input to the Mapper.
Mapper<K,V,K,V>
Uses the map function to produce intermediate <K,V> pairs.
Combiner<K,V,K,V> and Partitioner<K,V>
The Combiner locally aggregates multiple values associated with the same key on the same mapper.
The Partitioner partitions the key space based on the number of Reducers.
Reducer<K,V,K,V>
Uses the reduce function (executed once per key).
OutputFormat<K,V> and Driver (containing the main function with the job details).
The beauty is that everything can be customized, including the Key and Value types.
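To make the Partitioner step concrete, here is a minimal sketch of hash partitioning in plain Java. The class name HashPartitionSketch and its methods are hypothetical; the hash-modulo rule is the one Hadoop's default HashPartitioner is generally documented to use, but a custom Partitioner could apply any other rule.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitionSketch {
    // Maps a key to a reducer index in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
    public static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Splits the intermediate keys into one bucket per reducer.
    public static List<List<String>> assign(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String k : keys) {
            buckets.get(partition(k, numReducers)).add(k);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("way", "to", "love", "anything"), 3));
    }
}
```

Because the index depends only on the key, every occurrence of the same key reaches the same reducer, which is what lets reduce run once per key.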
Deep Dive into the MapReduce Hello World (WordCount)
Definition: find the number of occurrences of every word in a document or set of documents.
Solution (think!): how do we convert this problem into one that starts from some <key,value> pairs and ends at <key,value> pairs, i.e. a word and its count?
Select an InputFormat that fits your needs; if none does, write a customized InputFormat.
Write the map function of the Mapper based on the records returned by the RecordReader of the InputFormat.
Add an optional Combiner if there is any possibility of local aggregation.
Write the reduce function of the Reducer based on the intermediate <key, list(values)> groups produced by the map phase.
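The optional Combiner step is easiest to see with numbers. The following plain-Java sketch (hypothetical class CombinerSketch, no Hadoop dependency) shows one mapper's output being aggregated locally before the reducer sums the partial counts; it assumes words are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {
    // Map side: one entry per token, standing for the pair (word, 1).
    public static List<String> mapWords(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Combiner: sums the counts per word on the mapper's own node,
    // so fewer pairs have to cross the network.
    public static Map<String, Long> combine(List<String> words) {
        Map<String, Long> partial = new LinkedHashMap<>();
        for (String w : words) {
            partial.merge(w, 1L, Long::sum);
        }
        return partial;
    }

    // Reducer: sums the partial counts coming from all mappers.
    public static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> p : partials) {
            for (Map.Entry<String, Long> e : p.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> pairs = mapWords("to be or not to be");
        Map<String, Long> combined = combine(pairs);
        System.out.println(pairs.size() + " pairs before, " + combined.size() + " after");
        System.out.println(reduce(List.of(combined, combine(mapWords("to love")))));
    }
}
```

Local aggregation is safe here because addition is associative and commutative, so summing partial sums gives the same totals as summing the individual ones.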
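Mapper of WordCount
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Output objects are reused instead of being allocated per record.
    private final LongWritable one = new LongWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // value holds one line of input; emit (word, 1) for each word in it.
        String line = value.toString();
        String[] wordsInLine = line.split(" ");
        for (int i = 0; i < wordsInLine.length; i++) {
            if (wordsInLine[i].isEmpty()) continue; // skip gaps left by repeated spaces
            word.set(wordsInLine[i]);
            context.write(word, one);
        }
    }
}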
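Reducer of WordCount
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // Output value is reused across calls to reduce.
    private final LongWritable totalWC = new LongWritable();

    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long wordCount = 0;
        // Sum every count emitted for this word.
        for (LongWritable val : values) {
            wordCount += val.get();
        }
        totalWC.set(wordCount);
        context.write(key, totalWC);
    }
}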
Job Configuration and Job Submission
Jobs are controlled through Configuration and Job class objects.
A Configuration is a map from attribute names to string values, specified by using either set or addResource:
conf.set(propName, propValue); or
conf.addResource(PathContainsPropertiesfile)
conf.set("mapreduce.job.jar", "/home/hadoop/x.jar");
A Job object takes the Configuration object and parameters such as InputPath, OutputPath, Mapper, Reducer, etc.
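Job Driver Example
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.jar", "x.jar");
        conf.addResource(new Path("conf.xml"));
        Job job = Job.getInstance(conf, "xyz");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        // Input and output paths are set through the file format helper classes.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Block until the job finishes; exit non-zero if it fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}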
Building a Word Co-Occurrence Matrix from a Large Corpus
In general, a co-occurrence matrix tracks an event and, given a window of time or space, which other events occur within it. In this context the "events" are words and the "window" is the relative position around the target word.
Ex: "The way to love anything is to realize that it may be lost."
The co-occurrence list for the word "love" is [way, to, anything, is] for window size 2.
Solution (think!): how do we convert this problem into one that starts from some <key,value> pairs and ends at <key,value> pairs, i.e. a pair of words and its count?
It is similar to WordCount; the only difference is that <word,neighbor> should be the mapper output key instead of <word>.
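The window idea can be checked on the example sentence with a short plain-Java sketch. The class CoOccurrenceSketch is hypothetical, the string "word,neighbor" stands in for a custom pair key, and lower-casing plus whitespace splitting are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CoOccurrenceSketch {
    // Neighbors of words[i] within `window` positions to the left and right.
    public static List<String> neighbors(String[] words, int i, int window) {
        List<String> out = new ArrayList<>();
        int from = Math.max(0, i - window);
        int to = Math.min(words.length - 1, i + window);
        for (int j = from; j <= to; j++) {
            if (j != i) out.add(words[j]);
        }
        return out;
    }

    // Map and reduce in one process: count every "word,neighbor" key.
    public static Map<String, Integer> count(String sentence, int window) {
        String[] words = sentence.toLowerCase().split("\\s+");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < words.length; i++) {
            for (String n : neighbors(words, i, window)) {
                counts.merge(words[i] + "," + n, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String s = "The way to love anything is to realize that it may be lost";
        String[] words = s.toLowerCase().split("\\s+");
        System.out.println(neighbors(words, 3, 2)); // words[3] is "love"
        System.out.println(count(s, 2).get("to,anything"));
    }
}
```

In a real job the inner loop would be the body of map, emitting (<word,neighbor>, 1), and the summing would move to the Reducer exactly as in WordCount.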
More Examples
Simple search: display all the lines that contain a given word.
Map: keep only the lines that contain the word.
Reduce: identity Reducer.
A typical map-only job.
Sort the list of words according to their count:
Nontrivial: use two jobs, or add a custom key (word, count).
Find the number of lines in a file:
Tricky one: ....
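For the sorting example, the custom-key option amounts to giving the framework a key whose ordering puts higher counts first. A minimal plain-Java sketch of such an ordering follows; the class SortByCountSketch is hypothetical, and in a real job the comparison would live in a custom key type rather than in a Comparator applied to an in-memory list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortByCountSketch {
    // Orders (word, count) keys by count descending, then by word for ties.
    static final Comparator<Map.Entry<String, Long>> BY_COUNT_DESC = (a, b) -> {
        int byCount = Long.compare(b.getValue(), a.getValue());
        return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    };

    // Takes the output of WordCount and returns "word<TAB>count" lines in sorted order.
    public static List<String> sortedWords(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> keys = new ArrayList<>(counts.entrySet());
        keys.sort(BY_COUNT_DESC);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : keys) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedWords(Map.of("to", 3L, "be", 2L, "or", 1L, "not", 1L)));
    }
}
```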
Interesting Examples
Social networking site common friends list: when you visit someone's profile, you see a list of friends that you have in common. This list doesn't change frequently, so it would be wasteful to recalculate it every time you visit the profile.
venky: priya mouni santosh suresh kumar
suresh: santi kumar santosh srinivas divakar ....
When venky visits suresh's profile, he should see the two common friends in the lists above: santosh and kumar.
Map:
key (venky, suresh): venky's friend list, emitted while processing venky's record
key (venky, suresh): suresh's friend list, emitted while processing suresh's record
Reduce:
(venky, suresh): intersection of venky's and suresh's friend lists.
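The map and reduce steps for common friends can be sketched in plain Java as follows. The class CommonFriendsSketch is hypothetical, and two assumptions are made for illustration: the pair key is written in alphabetical order so that both records produce the same key, and suresh's record also lists venky (friendship is taken to be mutual), which the abbreviated lists on the slide do not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CommonFriendsSketch {
    // Key for a pair of people, ordered so that (a,b) and (b,a) give the same key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + "," + b : b + "," + a;
    }

    public static Map<String, Set<String>> commonFriends(Map<String, List<String>> friends) {
        // Map: for every friend f of person p, emit (pair(p, f), p's friend list).
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> rec : friends.entrySet()) {
            for (String f : rec.getValue()) {
                grouped.computeIfAbsent(pairKey(rec.getKey(), f), k -> new ArrayList<>())
                       .add(rec.getValue());
            }
        }
        // Reduce: a pair that received both friend lists gets their intersection.
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> g : grouped.entrySet()) {
            if (g.getValue().size() == 2) {
                Set<String> common = new TreeSet<>(g.getValue().get(0));
                common.retainAll(g.getValue().get(1));
                result.put(g.getKey(), common);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
            "venky", List.of("priya", "mouni", "santosh", "suresh", "kumar"),
            // "venky" is added here as an assumption of mutual friendship.
            "suresh", List.of("venky", "santi", "kumar", "santosh", "srinivas", "divakar"));
        System.out.println(commonFriends(friends));
    }
}
```

Pairs such as (priya, venky) receive only one list in this sample because priya's own record is not part of the input, so the sketch leaves them out.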
Interesting Examples
Social networking site friend recommender: "people you might know" systems are based on common friends.
venky: priya mouni santosh kumar srinivas
sahaja: priya mouni santi kumar santosh srinivas divakar ....
The system should recommend sahaja to venky and venky to sahaja. Taking Facebook as an example, you are talking about 1 billion records; in the context of India, 100 million records.
Map: after processing every record, recommend every friend in the list to every other friend in the list: (priya; mouni, c=venky), (mouni; priya, c=venky) ...
Precaution: they might already be friends, so also emit (venky; priya, c=null).
Reduce: combine the values that share the same key: (priya: mouni 2 (venky, sahaja)).
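A single-process sketch of this recommender in plain Java is given below. The class FriendRecommenderSketch is hypothetical; the c=null markers are represented by a separate map of direct friends, and that map is filled in both directions on the assumption that friendship is mutual.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class FriendRecommenderSketch {
    // Returns person -> (candidate -> common friends), leaving out existing friends.
    public static Map<String, Map<String, Set<String>>> recommend(
            Map<String, List<String>> friends) {
        Map<String, Map<String, Set<String>>> candidates = new TreeMap<>();
        Map<String, Set<String>> direct = new TreeMap<>();
        // Map: one pass over every record.
        for (Map.Entry<String, List<String>> rec : friends.entrySet()) {
            String owner = rec.getKey();
            List<String> list = rec.getValue();
            for (String a : list) {
                // Marker (owner; a, c=null): owner and a are already friends.
                direct.computeIfAbsent(owner, k -> new TreeSet<>()).add(a);
                direct.computeIfAbsent(a, k -> new TreeSet<>()).add(owner);
                for (String b : list) {
                    // (a; b, c=owner): a might know b through owner.
                    if (!a.equals(b)) {
                        candidates.computeIfAbsent(a, k -> new TreeMap<>())
                                  .computeIfAbsent(b, k -> new TreeSet<>())
                                  .add(owner);
                    }
                }
            }
        }
        // Reduce: per person, drop candidates who are already direct friends.
        for (Map.Entry<String, Map<String, Set<String>>> e : candidates.entrySet()) {
            Set<String> known = direct.getOrDefault(e.getKey(), Set.of());
            e.getValue().keySet().removeAll(known);
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
            "venky", List.of("priya", "mouni", "santosh", "kumar", "srinivas"),
            "sahaja", List.of("priya", "mouni", "santi", "kumar", "santosh",
                              "srinivas", "divakar"));
        Map<String, Set<String>> forPriya = recommend(friends).get("priya");
        System.out.println("priya: mouni " + forPriya.get("mouni").size()
            + " " + forPriya.get("mouni"));
    }
}
```

With only these two records as input, the recommendations are generated for the people who appear inside the lists (priya, mouni, and so on); recommending sahaja to venky would additionally need the records of their common friends.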
A Little Bit About the Lab Environment
Two clusters (23-node, 11-node)
Ubuntu
Windows compilation environment
Eclipse
 
Piping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdfPiping and instrumentation diagram p.pdf
Piping and instrumentation diagram p.pdf
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)Operating System chapter 9 (Virtual Memory)
Operating System chapter 9 (Virtual Memory)
 
Vip ℂall Girls Karkardooma Phone No 9999965857 High Profile ℂall Girl Delhi N...
Vip ℂall Girls Karkardooma Phone No 9999965857 High Profile ℂall Girl Delhi N...Vip ℂall Girls Karkardooma Phone No 9999965857 High Profile ℂall Girl Delhi N...
Vip ℂall Girls Karkardooma Phone No 9999965857 High Profile ℂall Girl Delhi N...
 
Final DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manualFinal DBMS Manual (2).pdf final lab manual
Final DBMS Manual (2).pdf final lab manual
 
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
NEWLETTER FRANCE HELICES/ SDS SURFACE DRIVES - MAY 2024
 
Intelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent ActsIntelligent Agents, A discovery on How A Rational Agent Acts
Intelligent Agents, A discovery on How A Rational Agent Acts
 
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
Activity Planning: Objectives, Project Schedule, Network Planning Model. Time...
 
analog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptxanalog-vs-digital-communication (concept of analog and digital).pptx
analog-vs-digital-communication (concept of analog and digital).pptx
 
Diploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdfDiploma Engineering Drawing Qp-2024 Ece .pdf
Diploma Engineering Drawing Qp-2024 Ece .pdf
 
Quiz application system project report..pdf
Quiz application system project report..pdfQuiz application system project report..pdf
Quiz application system project report..pdf
 
Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdf
 
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptxSLIDESHARE PPT-DECISION MAKING METHODS.pptx
SLIDESHARE PPT-DECISION MAKING METHODS.pptx
 

Mypreson 27

  • 1. Big Data processing using MapReduce. N. Venkatesh, Asst. Professor of CSE, JNTUK University College of Engineering, Vizianagaram. Email: nvenkatesh@jntukucev.ac.in. 27 January 2015.
  • 2. Table of contents: 1. Big Data Problem; 2. Map-Reduce (A Little Bit About Map-Reduce, Elements of MapReduce Programs, Hello World and More); 3. Computer Lab.
  • 3. Data is growing much faster than computation speeds. Reasons: data sources such as the web, sensors, telescopes, RFID and mobiles, together with cheaper storage.
  • 7. Motivation. Distribute the data across a set of nodes connected by a network. The question is how to program such a system. Issues while programming: how to divide the work across nodes (scheduling), and how to deal with node failures and stragglers ("my machine got stuck, friend"). A data-parallel model takes care of scheduling and node failures automatically. MapReduce is a good example of such a model; it was invented by engineers at Google as a system for building the search index.
  • 8. Map-Reduce Programming Model. Data type: key-value records. Map function: (K_in, V_in) ⇒ list(K_inter, V_inter). Reduce function: (K_inter, list(V_inter)) ⇒ list(K_out, V_out). Keys and values can be of any type.
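The two signatures on this slide can be tried out without a cluster. The following is only a sketch in plain Java, not Hadoop code: the class name `MiniMapReduce`, the `run` helper and the `wordCount` example are hypothetical names chosen for illustration, and a sorted in-memory map stands in for the framework's shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

/** In-memory sketch of the map -> shuffle -> reduce flow; no Hadoop involved. */
final class MiniMapReduce {

    static <KIn, VIn, KMid extends Comparable<KMid>, VMid, KOut, VOut>
    List<Map.Entry<KOut, VOut>> run(
            List<Map.Entry<KIn, VIn>> input,
            BiFunction<KIn, VIn, List<Map.Entry<KMid, VMid>>> map,
            BiFunction<KMid, List<VMid>, List<Map.Entry<KOut, VOut>>> reduce) {
        // Map: each (K_in, V_in) record yields a list of (K_inter, V_inter) pairs.
        // Shuffle: intermediate values are grouped by key; TreeMap keeps keys sorted.
        Map<KMid, List<VMid>> groups = new TreeMap<>();
        for (Map.Entry<KIn, VIn> record : input) {
            for (Map.Entry<KMid, VMid> mid : map.apply(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(mid.getKey(), k -> new ArrayList<>()).add(mid.getValue());
            }
        }
        // Reduce: one call per distinct key with (K_inter, list(V_inter)).
        List<Map.Entry<KOut, VOut>> output = new ArrayList<>();
        for (Map.Entry<KMid, List<VMid>> group : groups.entrySet()) {
            output.addAll(reduce.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Example use: (lineNumber, line) records in, (word, count) records out. */
    static List<Map.Entry<String, Integer>> wordCount(List<String> lines) {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(Map.entry(i, lines.get(i)));
        }
        BiFunction<Integer, String, List<Map.Entry<String, Integer>>> map = (lineNo, line) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) out.add(Map.entry(w, 1)); // emit (word, 1)
            }
            return out;
        };
        BiFunction<String, List<Integer>, List<Map.Entry<String, Integer>>> reduce =
                (word, ones) -> List.of(
                        Map.entry(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return run(input, map, reduce);
    }
}
```

Calling `MiniMapReduce.wordCount(List.of("a b a", "b c"))` returns the pairs (a, 2), (b, 2), (c, 1) in key order. A real framework runs the map and reduce calls on different nodes; the grouping step in the middle is the part it handles for the programmer.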
  • 9. Writing MapReduce from Scratch. Whatever the data set and whatever the application, everything must be converted into <key K, value V> pairs.
      - InputFormat<K,V>: defines the input splits and the RecordReader, which supplies the input to the Mapper.
      - Mapper<K,V,K,V>: uses the map function to produce intermediate <K,V> pairs.
      - Combiner<K,V,K,V> and Partitioner<K,V>: run on the same node as the Mapper. The Combiner aggregates multiple values associated with the same key; the Partitioner partitions the key space based on the number of Reducers.
      - Reducer<K,V,K,V>: uses the reduce function (executed once per key).
      - OutputFormat<K,V> and the Driver (containing the main function with the job details).
      The beauty is that everything can be customized, including the key and value types.
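Partitioning the key space by the number of Reducers usually comes down to hashing the key. The sketch below uses the same arithmetic as Hadoop's default HashPartitioner (clear the sign bit, then take the remainder), but as a plain static method; the class name `HashPartitionSketch` and method name `partitionFor` are hypothetical.

```java
/** Plain-Java sketch of hash partitioning; the names here are illustrative only. */
final class HashPartitionSketch {

    /**
     * Returns the index of the reducer that receives the given key.
     * Masking with Integer.MAX_VALUE clears the sign bit, so a negative
     * hashCode can never produce a negative partition index.
     */
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

Because the result depends only on the key, every intermediate pair with the same key lands on the same Reducer, which is what allows reduce to be executed once per key.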
  • 14. Deep Dive into MapReduce: Hello World (WordCount). Definition: find the number of occurrences of every word in a document or set of documents. Solution (think!): convert the problem into one that starts from some <key,value> pairs and ends at <key,value> pairs, i.e. a word and its count. Steps:
      - Select an InputFormat that fits your needs; if none does, write a customized InputFormat.
      - Write the map function of the Mapper based on the records returned by the RecordReader of the InputFormat.
      - Add an optional Combiner if there is any possibility of local aggregation.
      - Write the reduce function of the Reducer based on the intermediate <key, list(values)> groups produced by the Mappers (after the optional Combiner).
  • 15. Mapper of WordCount

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
          private final LongWritable one = new LongWritable(1);
          private final Text word = new Text();

          @Override
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // key is the byte offset of the line, value is the line itself
              String[] wordsInLine = value.toString().split("\\s+");
              for (String w : wordsInLine) {
                  if (w.isEmpty()) continue;   // skip the empty token of a blank or indented line
                  word.set(w);                 // keys must be Writable types, not String
                  context.write(word, one);    // emit (word, 1)
              }
          }
      }
  • 16. Reducer of WordCount

      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
          private final LongWritable totalWC = new LongWritable();

          @Override
          public void reduce(Text key, Iterable<LongWritable> values, Context context)
                  throws IOException, InterruptedException {
              long wordCount = 0;
              for (LongWritable val : values) {
                  wordCount += val.get();      // unwrap the Writable before adding
              }
              totalWC.set(wordCount);
              context.write(key, totalWC);     // emit (word, total count)
          }
      }
  • 17. Job Configuration and Job Submission. Jobs are controlled through Configuration and Job objects. A Configuration is a map from attribute names to string values, populated with either set or addResource: conf.set(propName, propValue); or conf.addResource(pathToPropertiesFile); for example conf.set("mapreduce.job.jar", "/home/hadoop/x.jar");. A Job object takes the Configuration object plus parameters such as the input path, output path, Mapper and Reducer.
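  • 18. Job Driver Example

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              conf.set("mapreduce.job.jar", "x.jar");
              conf.addResource(new Path("conf.xml"));
              Job job = Job.getInstance(conf, "xyz");
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(LongWritable.class);
              job.setMapperClass(WordMapper.class);
              job.setReducerClass(WordReducer.class);
              // Input and output paths are set through the file-based formats, not on Job itself.
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              // Block until the job finishes and report success or failure to the shell.
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }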
  • 19. Building a Word Co-Occurrence Matrix from a Large Corpus. In general, a co-occurrence matrix tracks an event and, given a window of time or space, which other events occur alongside it. In this context the "words" are the events and the "window" is the relative position around the target word. Example: "The way to love anything is to realize that it may be lost". The co-occurrences for the word "love" are [way, to, anything, is] for window size 2. Solution (think!): convert the problem into one that starts from some <key,value> pairs and ends at <key,value> pairs, i.e. a pair of words and its count. This is similar to WordCount; the only difference is that <word, neighbor> is the mapper output key instead of <word>.
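The <word, neighbor> key can be illustrated in plain Java, without Hadoop. This is a sketch under assumptions: the class `CoOccurrenceSketch` and its methods are hypothetical names, the pair key is encoded as the string "word,neighbor", and a sorted map plays the role of the shuffle and the summing reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the co-occurrence job; names are illustrative only. */
final class CoOccurrenceSketch {

    /** Map step: emit one "word,neighbor" key for every neighbor within the window. */
    static List<String> mapLine(String line, int window) {
        String[] words = line.trim().split("\\s+");
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(words.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) keys.add(words[i] + "," + words[j]);
            }
        }
        return keys;
    }

    /** Reduce step: the same as WordCount, adding 1 for every emitted pair key. */
    static Map<String, Integer> countPairs(List<String> lines, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String key : mapLine(line, window)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the example sentence and window size 2, `mapLine` emits the keys love,way, love,to, love,anything and love,is for the word "love", matching the list on the slide. In a Hadoop job the string key would more likely be a custom Writable pair type; the string encoding here is only a shortcut.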
  • 20. More Examples.
      - Simple search: display all the lines that contain a given word. Map: keep only the lines that contain the word. Reduce: identity reducer; this is a typical map-only job.
      - Sort the list of words according to their count. Nontrivial: use two jobs, or add a custom key (word, count).
      - Find the number of lines in a file. Tricky one: ....
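The "two jobs" route for sorting by count can be sketched as follows. The first job is ordinary WordCount; the second job's map swaps each (word, count) into (count, word), so the framework's sort-by-key during the shuffle does the ordering. The code is plain Java rather than Hadoop, `SortByCountSketch` and `secondJob` are hypothetical names, and a TreeMap stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the second (sorting) job; names are illustrative only. */
final class SortByCountSketch {

    /** Takes the output of WordCount and returns "count<TAB>word" lines, ascending by count. */
    static List<String> secondJob(Map<String, Integer> wordCounts) {
        // Map: swap (word, count) into (count, word).
        // Shuffle: grouping in a TreeMap sorts the new keys, i.e. the counts.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            shuffled.computeIfAbsent(e.getValue(), c -> new ArrayList<>()).add(e.getKey());
        }
        // Reduce: identity; write every word under its count.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : shuffled.entrySet()) {
            for (String word : e.getValue()) {
                out.add(e.getKey() + "\t" + word);
            }
        }
        return out;
    }
}
```

One caveat worth keeping in mind: with more than one reducer, each output file is sorted only within itself, so a single globally sorted list needs either one reducer or a partitioner that assigns contiguous key ranges to reducers.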
  • 21. Interesting Examples: Common Friends List on a Social Networking Site. When you visit someone's profile, you see a list of friends that you have in common. This list doesn't change frequently, so it would be wasteful to recalculate it every time you visit the profile.
      venky: priya mouni santosh suresh kumar
      suresh: santi kumar santosh srinivas divakar
      ....
      When venky visits suresh's profile, he should see the two friends common to the lists above, santosh and kumar.
      Map: after processing venky's record, emit key (venky, suresh) with venky's friends as the value; after processing suresh's record, emit key (venky, suresh) with suresh's friends as the value.
      Reduce: (venky, suresh): the intersection of venky's and suresh's friends.
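A plain-Java sketch of this map and reduce logic is given below; it is not Hadoop code. The class `CommonFriendsSketch` is a hypothetical name, the pair key is encoded as the two names in sorted order joined by a comma, and the sketch assumes friendship lists are symmetric (each person appears in the list of each of their friends), since a pair only reaches the intersection step when both records emit it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java sketch of the common-friends job; names are illustrative only. */
final class CommonFriendsSketch {

    /** Input: person -> friend list. Output: "a,b" (names sorted) -> common friends. */
    static Map<String, Set<String>> commonFriends(Map<String, List<String>> friendLists) {
        // Map + shuffle: for every friend f of a person p, emit key (p, f) in sorted
        // order with p's whole friend list as the value; group the values by key.
        Map<String, List<Set<String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> record : friendLists.entrySet()) {
            String person = record.getKey();
            for (String friend : record.getValue()) {
                String key = person.compareTo(friend) < 0
                        ? person + "," + friend
                        : friend + "," + person;
                grouped.computeIfAbsent(key, k -> new ArrayList<>())
                       .add(new TreeSet<>(record.getValue()));
            }
        }
        // Reduce: a key that received both friend lists gets their intersection.
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Set<String>>> e : grouped.entrySet()) {
            if (e.getValue().size() == 2) {
                Set<String> common = new TreeSet<>(e.getValue().get(0));
                common.retainAll(e.getValue().get(1));
                result.put(e.getKey(), common);
            }
        }
        return result;
    }
}
```

With venky's list from the slide and suresh's list extended by venky (an assumption made here so the friendship is symmetric), the only key that receives two lists is "suresh,venky", and its value is the set {kumar, santosh}. The result can be stored once and looked up on each profile visit instead of being recomputed.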
  • 22. Interesting Examples: Friend Recommender on Social Networking Sites. "People you might know" systems are based on common friends.
      venky: priya mouni santosh kumar srinivas
      sahaja: priya mouni santi kumar santosh srinivas divakar
      ....
      The "people you might know" system should recommend sahaja to venky and venky to sahaja. Taking Facebook as an example, you are talking about 1 billion records; in the context of India, 100 million records.
      Map: after processing every record, recommend every friend in the list to every other friend in the list, recording the list owner as the common friend: (priya; mouni, c=venky) (mouni; priya, c=venky) ... As a precaution, because two people might already be friends, also emit (venky; priya, c=null).
      Reduce: combine the values with the same key: (priya: mouni 2 (venky, sahaja)).
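The recommender can be sketched in plain Java along the same lines; this is not Hadoop code. The class `FriendRecommenderSketch` and method `recommend` are hypothetical names, and instead of carrying c=null values through a shuffle, the sketch keeps the "already friends" markers in a separate map and applies them in the reduce step.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java sketch of the friend recommender; names are illustrative only. */
final class FriendRecommenderSketch {

    /** Returns person -> (recommended person -> set of common friends). */
    static Map<String, Map<String, Set<String>>> recommend(Map<String, List<String>> friendLists) {
        Map<String, Map<String, Set<String>>> candidates = new TreeMap<>();
        Map<String, Set<String>> alreadyFriends = new TreeMap<>();
        for (Map.Entry<String, List<String>> record : friendLists.entrySet()) {
            String owner = record.getKey();
            List<String> friends = record.getValue();
            for (String a : friends) {
                // (owner; a, c=null): an existing friendship, never a recommendation.
                alreadyFriends.computeIfAbsent(owner, k -> new TreeSet<>()).add(a);
                for (String b : friends) {
                    if (!a.equals(b)) {
                        // (a; b, c=owner): a might know b through the list owner.
                        candidates.computeIfAbsent(a, k -> new TreeMap<>())
                                  .computeIfAbsent(b, k -> new TreeSet<>())
                                  .add(owner);
                    }
                }
            }
        }
        // Reduce: per person, drop every candidate who is already a friend.
        for (Map.Entry<String, Map<String, Set<String>>> e : candidates.entrySet()) {
            Set<String> known = alreadyFriends.getOrDefault(e.getKey(), Set.of());
            e.getValue().keySet().removeAll(known);
        }
        return candidates;
    }
}
```

For the two records on the slide, `recommend(...).get("priya").get("mouni")` is the set {sahaja, venky}, i.e. two common friends, which corresponds to (priya: mouni 2 (venky, sahaja)). The map step emits a number of pairs quadratic in the length of each friend list, which matters at the record counts mentioned on the slide.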
  • 23. A Little Bit About the Lab Environment. Two clusters (23-node and 11-node); Ubuntu; Windows compilation environment; Eclipse.