CrystalBall - Compute Relative Frequency in Hadoop
1. Big Data Project on Crystal Ball
Submitted By:
Sushil Sedai (984474)
Suvash Shah (984461)
Submitted to:
Prof. Prem Nair
2. Pair approach (Mapper) – pseudo code
method map(docid id, doc d)
    for each term w in doc d do
        total = 0;
        for each neighbor u in Neighbor(w) do
            Emit(Pair(w, u), 1);
            total++;
        Emit(Pair(w, *), total);
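The pair mapper above can be sketched in plain Java, without Hadoop types, to show which records it would emit. This is only an illustration: the class name `PairMapperSketch`, the string encoding of the emitted records, and the definition of `Neighbor(w)` (the terms following `w` up to its next occurrence) are assumptions, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;

class PairMapperSketch {
    // Assumption: the neighbors of terms[i] are the terms that follow it,
    // up to (not including) the next occurrence of the same term.
    static List<String> neighbors(String[] terms, int i) {
        List<String> out = new ArrayList<>();
        for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
            out.add(terms[j]);
        }
        return out;
    }

    // Returns the emitted records as "(w,u)<TAB>count" strings, in emit order.
    static List<String> map(String doc) {
        List<String> emitted = new ArrayList<>();
        String[] terms = doc.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            int total = 0;
            for (String u : neighbors(terms, i)) {
                emitted.add("(" + terms[i] + "," + u + ")\t1");
                total++;
            }
            // Marginal count for w, keyed by the special neighbor "*".
            emitted.add("(" + terms[i] + ",*)\t" + total);
        }
        return emitted;
    }

    public static void main(String[] args) {
        map("a b a c").forEach(System.out::println);
    }
}
```

In a real job the `(w, *)` record must reach the reducer before the other `(w, u)` pairs, which is usually arranged through the key sort order and a partitioner on `w`.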
9. Stripe approach (Mapper) – pseudo code
method map(docid id, doc d)
    Stripe H;
    for each term w in doc d do
        clear(H);
        for each neighbor u in Neighbor(w) do
            if H.containsKey(u)
                H{u} += 1;
            else
                H.add(u, 1);
        Emit(w, H);
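A minimal plain-Java sketch of the stripe mapper above, under the same assumed neighborhood definition (terms following `w` up to its next occurrence). The class name `StripeMapperSketch` and the string form of the emitted stripes are hypothetical; a Hadoop implementation would more likely emit a `MapWritable` or a custom `Writable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StripeMapperSketch {
    // Returns one "w<TAB>{u=count, ...}" record per term occurrence.
    static List<String> map(String doc) {
        List<String> emitted = new ArrayList<>();
        String[] terms = doc.trim().split("\\s+");
        Map<String, Integer> stripe = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            stripe.clear();
            // Assumption: neighbors run up to the next occurrence of terms[i].
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                stripe.merge(terms[j], 1, Integer::sum);
            }
            emitted.add(terms[i] + "\t" + stripe);
        }
        return emitted;
    }

    public static void main(String[] args) {
        map("a b a c").forEach(System.out::println);
    }
}
```

Compared with the pair approach, each term occurrence produces a single record, at the cost of a larger value.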
11. Stripe approach (Reducer) – pseudo code
method reduce(Text key, Stripes [H1, H2, …])
    Stripe H = elementWiseSum(H1, H2, …);
    total = sumValues(H);
    for each Item h in H do
        h.secondValue /= total;
    Emit(key, H);
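The reducer step can be illustrated with a short plain-Java sketch: the stripes received for one key are first summed element-wise, then each count is divided by the total. The class name `StripeReducerSketch` and the use of `Map<String, Integer>` for a stripe are assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StripeReducerSketch {
    // Merges all stripes for one key and returns relative frequencies.
    static Map<String, Double> reduce(List<Map<String, Integer>> stripes) {
        Map<String, Integer> merged = new TreeMap<>();
        int total = 0;
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
                total += e.getValue();
            }
        }
        Map<String, Double> freq = new TreeMap<>();
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            // total > 0 whenever merged is non-empty, so no division by zero.
            freq.put(e.getKey(), e.getValue() / (double) total);
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(Map.of("b", 1, "c", 1), Map.of("b", 2))));
    }
}
```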
17. Hybrid approach (Mapper) – pseudo code
method map(docid id, doc d)
    HashMap H;
    for each term w in doc d do
        for each neighbor u in Neighbor(w) do
            if H.contains(Pair(w, u))
                H{Pair(w, u)} += 1;
            else
                H.add(Pair(w, u), 1);
    for each Pair p in H do
        Emit(p, H(p));
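As a rough illustration of the hybrid mapper, the sketch below counts pairs in a local map for one document and returns the combined counts instead of emitting a 1 per pair. The class name `HybridMapperSketch`, the `"(w,u)"` string keys, and the neighborhood definition (terms following `w` up to its next occurrence) are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

class HybridMapperSketch {
    // Returns the combined (w,u) counts for one document.
    static Map<String, Integer> map(String doc) {
        Map<String, Integer> counts = new TreeMap<>();
        String[] terms = doc.trim().split("\\s+");
        for (int i = 0; i < terms.length; i++) {
            // Assumption: neighbors run up to the next occurrence of terms[i].
            for (int j = i + 1; j < terms.length && !terms[j].equals(terms[i]); j++) {
                counts.merge("(" + terms[i] + "," + terms[j] + ")", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(map("a b a b"));
    }
}
```

Local aggregation of this kind reduces the number of records shuffled to the reducers, but the map must fit in the mapper's memory.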
19. Hybrid approach (Reducer) – pseudo code
prev = null;
HashMap H;
method reduce(Pair p, Iterable<Int> values)
    if prev != null and p.firstValue != prev
        total = sumValues(H);
        for each item h in H
            h.secondValue /= total;
        Emit(prev, H);
        clear(H);
    end if
    prev = p.firstValue;
    H.add(p.secondValue, sum(values));
method close
    // for the last key
    total = sumValues(H);
    for each item h in H
        h.secondValue /= total;
    Emit(prev, H);
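The reducer logic above can be sketched in plain Java as a small stateful class. It assumes, as the pseudocode does, that pairs arrive sorted by their first element; it emits the stripe for the previous key `prev` when the first element changes and once more in `close`. The class name `HybridReducerSketch` and the `output` map standing in for `Emit` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HybridReducerSketch {
    private String prev = null;
    private final Map<String, Integer> stripe = new TreeMap<>();
    // Stands in for Emit(key, stripe) in this sketch.
    final Map<String, Map<String, Double>> output = new LinkedHashMap<>();

    // Called once per pair (w, u), with pairs sorted by w.
    void reduce(String w, String u, List<Integer> values) {
        if (prev != null && !prev.equals(w)) {
            flush();
        }
        prev = w;
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        stripe.merge(u, sum, Integer::sum);
    }

    // Called once after the last pair, to emit the final key.
    void close() {
        if (prev != null) {
            flush();
        }
    }

    private void flush() {
        int total = 0;
        for (int c : stripe.values()) {
            total += c;
        }
        Map<String, Double> freq = new TreeMap<>();
        for (Map.Entry<String, Integer> e : stripe.entrySet()) {
            freq.put(e.getKey(), e.getValue() / (double) total);
        }
        output.put(prev, freq);
        stripe.clear();
    }
}
```

A possible usage: call `reduce("a", "b", List.of(1, 2))`, `reduce("a", "c", List.of(1))`, `reduce("b", "a", List.of(2))`, then `close()`, and read the per-term frequencies from `output`.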
26. Apache Spark
Write a Java program on Spark to calculate the total number of
students at MUM arriving in different entries. The program
should display the total number of students by country.