SlideShare une entreprise Scribd logo
1  sur  32
Big Data Project
on
Crystal Ball
Submitted By:
Sushil Sedai(984474)
Suvash Shah(984461)
Submitted to:
Prof. Prem Nair
Pair approach (Mapper) – pseudo
code
method map(docid id, doc d)
for each term w in doc d do
total = 0;
for each neighbor u in Neighbor(w) do
Emit(Pair(w, u), 1);
total++;
Emit(Pair(w, *), total);
Pair approach (Mapper) – Java
Code
Pair approach (Reducer) – pseudo
code
method reduce(Pair p, Iterable<Int> values)
if p.secondValue == *
if p.firstValue is new
currentvalue = p.firstvalue;
marginal = sum(values)
else
marginal += sum(values)
else
Emit(p, sum(values)/marginal);
Pair approach (Reducer) – Java
Code
Pair approach - input
Mapper1 input
18 29 12 34 79 18 56 12 34 92
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Pair approach – Output (Reducer1)
(10,12) 0.5
(10,34) 0.5
(12,10) 0.09090909090909091
(12,18) 0.09090909090909091
(12,34) 0.36363636363636365
(12,56) 0.18181818181818182
(12,79) 0.09090909090909091
(12,92) 0.18181818181818182
(18,12) 0.25
(18,29) 0.125
(18,34) 0.25
(18,56) 0.125
(18,79) 0.125
(18,92) 0.125
(29,10) 0.06666666666666667
(29,12) 0.26666666666666666
(29,18) 0.06666666666666667
(29,34) 0.26666666666666666
(29,56) 0.13333333333333333
(29,79) 0.06666666666666667
(29,92) 0.13333333333333333
(34,10) 0.08333333333333333
(34,12) 0.25
(34,18) 0.08333333333333333
(34,29) 0.08333333333333333
(34,56) 0.25
(34,79) 0.08333333333333333
(34,92) 0.16666666666666666
(56,10) 0.1
(56,12) 0.3
(56,29) 0.1
(56,34) 0.3
(56,92) 0.2
(92,10) 0.3333333333333333
(92,12) 0.3333333333333333
(92,34) 0.3333333333333333
Pair approach – Output (Reducer2)
(79,12) 0.2
(79,18) 0.2
(79,34) 0.2
(79,56) 0.2
(79,92)0.2
Stripe approach (Mapper) –
pseudo code
method map(docid id, doc d)
Stripe H;
for each term w in doc d do
clear(H);
for each neighbor u in Neighbor(w) do
if H.containsKey(u)
H{u} += 1;
else
H.add(u, 1);
Emit(w, H);
Stripe approach (Mapper) – Java
Code
Stripe approach (Reducer) –
pseudo code
total = 0;
method reduce(Text key, Stripe H [H1, H2, …])
total = sumValues(H);
for each Item h in H do
h.secondValue /= total;
Emit(key, H);
Stripe approach (Reducer) – Java
Code
Stripe appoach (Reducer) – Java
Code
Stripe approach – input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Stripe approach –
Output(Reducer1)
10 [ (34,0.5000) (12,0.5000) ]
12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ]
18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ]
29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667)
(12,0.2667) ]
34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833)
(12,0.2500) ]
56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ]
92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
Stripe approach –
Output(Reducer2)
79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000)
(12,0.2000) ]
Hybrid approach (Mapper) –
pseudo code
method map(docid id, doc d)
HashMap H;
for each term w in doc d do
for each neighbor u in Neighbor(w) do
if H.contains(Pair(w, u))
H{Pair(w, u)} += 1;
else
H.add(Pair(w, u));
for each Pair p in H do
Emit(p, H(p));
Hybrid approach (Mapper) – Java
Code
Hybrid approach (Reducer) –
pseudo codeprev = null;
HashMap H;
Method reduce(Pair p, Iterable<Int> values)
if p.firstValue != prev and not first
total = sumValues(H);
for each item h in H
h(prev.secondValue) /= total;
Emit(p.firstValue, H);
clear(H);
End if
prev = p.firstValue;
H.add(p.secondValue, sum(values));
Method close
//for last pair
total = sumValues(H);
for each item h in H
h(prev.secondValue) /= total;
Emit(p.firstValue, H);
Hybrid approach (Reducer) – Java
Code
Hybrid approach (Reducer) – Java
Code
Hybrid approach - Input
Mapper1 input
34 56 29 12 34 56 92 10 34 12
Mapper2 input
18 29 12 34 79 18 56 12 34 92
Hybrid approach –
Output(Reducer1)
10 (12,0.5) (34,0.5)
12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909)
(92,0.18181819)
18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125)
29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334)
(79,0.06666667) (92,0.13333334)
34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336)
(92,0.16666667)
56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2)
92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
Hybrid approach –
Output(Reducer2)
79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
Comparison
Apache Spark
Write a java program on spark to calculate total number of
students in MUM coming in different entries.This program
should display total number student by country.
Spark - Java Code
Spark - input
2014 Feb Nepal 20
2014 Feb India 15
2014 Oct Italy 2
2014 July France 1
2015 Feb Nepal 10
2015 Feb India 25
2015 Oct Italy 7
Spark - Output
(France,1)
(Italy,9)
(Nepal,30)
(India,40)
Tools Used
• VMPlayer Pro 7
• cloudera-quickstart-vm-5.4.0-0-vmware
• EclipseVersion: Luna Service Release 2 (4.4.2)
• Windows 8.1
References
• http://glebche.appspot.com/static/hadoop-
ecosystem/mapreduce-job-java.html
• https://hadoopi.wordpress.com/2013/06/05/hadoop-
implementing-the-tool-interface-for-mapreduce-driver/
• http://www.bogotobogo.com/Hadoop/BigData_hadoop_
Apache_Spark.php
ThankYou

Contenu connexe

Tendances

OPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHOPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHCool Guy
 
Bellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksBellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksSimranJain63
 
Lecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsLecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsAakash deep Singhal
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojurePaul Lam
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 
Monads from Definition
Monads from DefinitionMonads from Definition
Monads from DefinitionDierk König
 
Generative Adversarial Nets
Generative Adversarial NetsGenerative Adversarial Nets
Generative Adversarial NetsJinho Lee
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)Kobkrit Viriyayudhakorn
 
Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Takao Wada
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk TomasSri Ambati
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Introduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 HokkaidoIntroduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 Hokkaidoikegami__
 
Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Namgee Lee
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 
A Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman ProblemA Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman Problemvsubhashini
 

Tendances (20)

OPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCHOPTIMAL BINARY SEARCH
OPTIMAL BINARY SEARCH
 
Bellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer NetworksBellman Ford Routing Algorithm-Computer Networks
Bellman Ford Routing Algorithm-Computer Networks
 
Sharbani bhattacharya sacta 2014
Sharbani bhattacharya sacta 2014Sharbani bhattacharya sacta 2014
Sharbani bhattacharya sacta 2014
 
Lecture 15 data structures and algorithms
Lecture 15 data structures and algorithmsLecture 15 data structures and algorithms
Lecture 15 data structures and algorithms
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 
Monads from Definition
Monads from DefinitionMonads from Definition
Monads from Definition
 
Generative Adversarial Nets
Generative Adversarial NetsGenerative Adversarial Nets
Generative Adversarial Nets
 
R
RR
R
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
 
Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5Trident International Graphics Workshop 2014 5/5
Trident International Graphics Workshop 2014 5/5
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk Tomas
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Introduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 HokkaidoIntroduction to Haskell@Open Source Conference 2007 Hokkaido
Introduction to Haskell@Open Source Conference 2007 Hokkaido
 
Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303Numpy tutorial(final) 20160303
Numpy tutorial(final) 20160303
 
Aaex2 group2
Aaex2 group2Aaex2 group2
Aaex2 group2
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
A Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman ProblemA Signature Scheme as Secure as the Diffie Hellman Problem
A Signature Scheme as Secure as the Diffie Hellman Problem
 

Similaire à CrystalBall - Compute Relative Frequency in Hadoop

Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkJivan Nepali
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfShiwani Gupta
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeMagdi Mohamed
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practiceguest3550292
 
Sensors and Samples: A Homological Approach
Sensors and Samples:  A Homological ApproachSensors and Samples:  A Homological Approach
Sensors and Samples: A Homological ApproachDon Sheehy
 
Iaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd Iaetsd
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometryppd1961
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programsAmit Kapoor
 
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...Revolution Analytics
 
Dic rd theory_quantization_07
Dic rd theory_quantization_07Dic rd theory_quantization_07
Dic rd theory_quantization_07wtyru1989
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward
 
Algorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileAlgorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileKushagraChadha1
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkAshutosh Trivedi
 
Geolocation on Rails
Geolocation on RailsGeolocation on Rails
Geolocation on Railsnebirhos
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduceDavid Gleich
 
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationVector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationJanindu Arukgoda
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsLuc Brun
 

Similaire à CrystalBall - Compute Relative Frequency in Hadoop (20)

Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
 
module4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdfmodule4_dynamic programming_2022.pdf
module4_dynamic programming_2022.pdf
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and Practice
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
 
Jan 2012 HUG: RHadoop
Jan 2012 HUG: RHadoopJan 2012 HUG: RHadoop
Jan 2012 HUG: RHadoop
 
Sensors and Samples: A Homological Approach
Sensors and Samples:  A Homological ApproachSensors and Samples:  A Homological Approach
Sensors and Samples: A Homological Approach
 
Iaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detectionIaetsd vlsi implementation of gabor filter based image edge detection
Iaetsd vlsi implementation of gabor filter based image edge detection
 
Digital Distance Geometry
Digital Distance GeometryDigital Distance Geometry
Digital Distance Geometry
 
Cg my own programs
Cg my own programsCg my own programs
Cg my own programs
 
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
R + Hadoop = Big Data Analytics. How Revolution Analytics' RHadoop Project Al...
 
Dic rd theory_quantization_07
Dic rd theory_quantization_07Dic rd theory_quantization_07
Dic rd theory_quantization_07
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
 
Algorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical FileAlgorithm Design and Analysis - Practical File
Algorithm Design and Analysis - Practical File
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache Spark
 
Geolocation on Rails
Geolocation on RailsGeolocation on Rails
Geolocation on Rails
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot NavigationVector Distance Transform Maps for Autonomous Mobile Robot Navigation
Vector Distance Transform Maps for Autonomous Mobile Robot Navigation
 
Graph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & TrendsGraph Edit Distance: Basics & Trends
Graph Edit Distance: Basics & Trends
 

Dernier

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 

Dernier (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

CrystalBall - Compute Relative Frequency in Hadoop

  • 1. Big Data Project on Crystal Ball Submitted By: Sushil Sedai(984474) Suvash Shah(984461) Submitted to: Prof. Prem Nair
  • 2. Pair approach (Mapper) – pseudo code method map(docid id, doc d) for each term w in doc d do total = 0; for each neighbor u in Neighbor(w) do Emit(Pair(w, u), 1); total++; Emit(Pair(w, *), total);
  • 3. Pair approach (Mapper) – Java Code
  • 4. Pair approach (Reducer) – pseudo code method reduce(Pair p, Iterable<Int> values) if p.secondValue == * if p.firstValue is new currentvalue = p.firstvalue; marginal = sum(values) else marginal += sum(values) else Emit(p, sum(values)/marginal);
  • 5. Pair approach (Reducer) – Java Code
  • 6. Pair approach - input Mapper1 input 18 29 12 34 79 18 56 12 34 92 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 7. Pair approach – Output (Reducer1) (10,12) 0.5 (10,34) 0.5 (12,10) 0.09090909090909091 (12,18) 0.09090909090909091 (12,34) 0.36363636363636365 (12,56) 0.18181818181818182 (12,79) 0.09090909090909091 (12,92) 0.18181818181818182 (18,12) 0.25 (18,29) 0.125 (18,34) 0.25 (18,56) 0.125 (18,79) 0.125 (18,92) 0.125 (29,10) 0.06666666666666667 (29,12) 0.26666666666666666 (29,18) 0.06666666666666667 (29,34) 0.26666666666666666 (29,56) 0.13333333333333333 (29,79) 0.06666666666666667 (29,92) 0.13333333333333333 (34,10) 0.08333333333333333 (34,12) 0.25 (34,18) 0.08333333333333333 (34,29) 0.08333333333333333 (34,56) 0.25 (34,79) 0.08333333333333333 (34,92) 0.16666666666666666 (56,10) 0.1 (56,12) 0.3 (56,29) 0.1 (56,34) 0.3 (56,92) 0.2 (92,10) 0.3333333333333333 (92,12) 0.3333333333333333 (92,34) 0.3333333333333333
  • 8. Pair approach – Output (Reducer2) (79,12) 0.2 (79,18) 0.2 (79,34) 0.2 (79,56) 0.2 (79,92)0.2
  • 9. Stripe approach (Mapper) – pseudo code method map(docid id, doc d) Stripe H; for each term w in doc d do clear(H); for each neighbor u in Neighbor(w) do if H.containsKey(u) H{u} += 1; else H.add(u, 1); Emit(w, H);
  • 10. Stripe approach (Mapper) – Java Code
  • 11. Stripe approach (Reducer) – pseudo code total = 0; method reduce(Text key, Stripe H [H1, H2, …]) total = sumValues(H); for each Item h in H do h.secondValue /= total; Emit(key, H);
  • 12. Stripe approach (Reducer) – Java Code
  • 13. Stripe appoach (Reducer) – Java Code
  • 14. Stripe approach – input Mapper1 input 34 56 29 12 34 56 92 10 34 12 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 15. Stripe approach – Output(Reducer1) 10 [ (34,0.5000) (12,0.5000) ] 12 [ (56,0.1818) (92,0.1818) (34,0.3636) (18,0.0909) (79,0.0909) (10,0.0909) ] 18 [ (56,0.1250) (92,0.1250) (34,0.2500) (79,0.1250) (29,0.1250) (12,0.2500) ] 29 [ (56,0.1333) (92,0.1333) (34,0.2667) (18,0.0667) (79,0.0667) (10,0.0667) (12,0.2667) ] 34 [ (56,0.2500) (92,0.1667) (18,0.0833) (79,0.0833) (29,0.0833) (10,0.0833) (12,0.2500) ] 56 [ (92,0.2000) (34,0.3000) (29,0.1000) (10,0.1000) (12,0.3000) ] 92 [ (34,0.3333) (10,0.3333) (12,0.3333) ]
  • 16. Stripe approach – Output(Reducer2) 79 [ (56,0.2000) (92,0.2000) (34,0.2000) (18,0.2000) (12,0.2000) ]
  • 17. Hybrid approach (Mapper) – pseudo code method map(docid id, doc d) HashMap H; for each term w in doc d do for each neighbor u in Neighbor(w) do if H.contains(Pair(w, u)) H{Pair(w, u)} += 1; else H.add(Pair(w, u)); for each Pair p in H do Emit(p, H(p));
  • 18. Hybrid approach (Mapper) – Java Code
  • 19. Hybrid approach (Reducer) – pseudo codeprev = null; HashMap H; Method reduce(Pair p, Iterable<Int> values) if p.firstValue != prev and not first total = sumValues(H); for each item h in H h(prev.secondValue) /= total; Emit(p.firstValue, H); clear(H); End if prev = p.firstValue; H.add(p.secondValue, sum(values)); Method close //for last pair total = sumValues(H); for each item h in H h(prev.secondValue) /= total; Emit(p.firstValue, H);
  • 20. Hybrid approach (Reducer) – Java Code
  • 21. Hybrid approach (Reducer) – Java Code
  • 22. Hybrid approach - Input Mapper1 input 34 56 29 12 34 56 92 10 34 12 Mapper2 input 18 29 12 34 79 18 56 12 34 92
  • 23. Hybrid approach – Output(Reducer1) 10 (12,0.5) (34,0.5) 12 (10,0.09090909) (18,0.09090909) (34,0.36363637) (56,0.18181819) (79,0.09090909) (92,0.18181819) 18 (12,0.25) (29,0.125) (34,0.25) (56,0.125) (79,0.125) (92,0.125) 29 (10,0.06666667) (12,0.26666668) (18,0.06666667) (34,0.26666668) (56,0.13333334) (79,0.06666667) (92,0.13333334) 34 (10,0.083333336) (12,0.25) (18,0.083333336) (29,0.083333336) (56,0.25) (79,0.083333336) (92,0.16666667) 56 (10,0.1) (12,0.3) (29,0.1) (34,0.3) (92,0.2) 92 (10,0.33333334) (12,0.33333334) (34,0.33333334)
  • 24. Hybrid approach – Output(Reducer2) 79 (12,0.2) (18,0.2) (34,0.2) (56,0.2) (92,0.2)
  • 26. Apache Spark Write a java program on spark to calculate total number of students in MUM coming in different entries.This program should display total number student by country.
  • 27. Spark - Java Code
  • 28. Spark - input 2014 Feb Nepal 20 2014 Feb India 15 2014 Oct Italy 2 2014 July France 1 2015 Feb Nepal 10 2015 Feb India 25 2015 Oct Italy 7
  • 30. Tools Used • VMPlayer Pro 7 • cloudera-quickstart-vm-5.4.0-0-vmware • EclipseVersion: Luna Service Release 2 (4.4.2) • Windows 8.1