Danairat T., 2013, danairat@gmail.com | Big Data Hadoop – Hands On Workshop
Big Data Hadoop
Local and Public Cloud
Hands On Workshop
Dr.Thanachart Numnonda
thanachart@imcinstitute.com
Danairat T.
Certified Java Programmer, TOGAF – Silver
danairat@gmail.com, +66-81-559-1446
Thanachart Numnonda and Danairat T, July 2013 | Big Data Hadoop – Hands On Workshop
Hands-On: Running Hadoop on Amazon Elastic MapReduce
Architecture Overview of Amazon EMR
Creating an AWS account
Signing up for the necessary services
● Simple Storage Service (S3)
● Elastic Compute Cloud (EC2)
● Elastic MapReduce (EMR)
Caution! This costs real money!
Creating Amazon EC2 Instance
Creating Amazon S3 bucket
Create access key using Security Credentials in the AWS Management Console
Creating a new Job Flow in EMR
View Result from the S3 bucket
Lecture: Understanding MapReduce Processing
[Diagram: Client; Name Node and Job Tracker; three worker nodes, each with a Data Node and a Task Tracker; labelled "Map Reduce".]
MapReduce Framework
map: (K1, V1) -> list(K2, V2)
reduce: (K2, list(V2)) -> list(K3, V3)
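The two signatures can be mimicked in plain Java without a cluster. What follows is a minimal single-process sketch, not Hadoop code: the class `MapReduceSketch` and its methods are hypothetical names, the three input lines are illustrative, and a `TreeMap` stands in for the framework's shuffle and sort. Here K1 is a line offset, V1 a line of text, K2 and K3 a word, and V2 and V3 a count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process illustration of map and reduce; not part of Hadoop.
final class MapReduceSketch {

    // map: (K1, V1) -> list(K2, V2); here (offset, line) -> list of (word, 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            Map.Entry<String, Integer> pair = new SimpleEntry<>(tokenizer.nextToken(), 1);
            out.add(pair);
        }
        return out;
    }

    // reduce: (K2, list(V2)) -> list(K3, V3); here (word, counts) -> list of (word, sum)
    static List<Map.Entry<String, Integer>> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        Map.Entry<String, Integer> total = new SimpleEntry<>(word, sum);
        return Collections.singletonList(total);
    }

    // Runs map on every line, groups the values by key (the stand-in for
    // shuffle and sort), then runs reduce once per key.
    static SortedMap<String, Integer> run(List<String> lines) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1;
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            for (Map.Entry<String, Integer> kv : reduce(group.getKey(), group.getValue())) {
                result.put(kv.getKey(), kv.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {Bear=2, Car=3, Deer=2, River=2}
        System.out.println(run(Arrays.asList("Deer Bear River", "Car Car River", "Deer Car Bear")));
    }
}
```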
MapReduce Processing – The Data Flow
1. InputFormat, InputSplits, RecordReader
2. Mapper - your focus is here
3. Partition, Shuffle & Sort
4. Reducer - your focus is here
5. OutputFormat, RecordWriter
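Step 3 decides which reducer receives each intermediate key. Below is a small plain-Java sketch of that step, assuming the hash-modulo rule used by Hadoop's default `HashPartitioner`. The class `PartitionSketch` and its methods are hypothetical, and because it hashes `String` rather than Hadoop's `Text`, the partition numbers will differ from those on a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the partition and sort steps; not Hadoop code.
final class PartitionSketch {

    // Hash-modulo rule: mask off the sign bit so the result is never negative,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assigns every key to a partition and sorts the keys inside each partition,
    // which is the order in which a reducer would see them.
    static Map<Integer, List<String>> shuffle(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            int partition = partitionFor(key, numReduceTasks);
            partitions.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        for (List<String> bucket : partitions.values()) {
            Collections.sort(bucket);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("Deer", "Bear", "River", "Car", "Car", "River");
        System.out.println(shuffle(keys, 2));
    }
}
```

All occurrences of the same key land in the same partition, which is what lets a single reducer see every value for that key.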
How does MapReduce work?
[Diagram labels: InputSplit and RecordReader feed the map phase; map output is a list of (Key, Value) in the intermediate file; after Partitioning and Sorting, the reduce input is a list of (Key, List of Values) in the intermediate file; RecordWriter writes the result.]
How does MapReduce work?
[Diagram labels: the same flow with a Combining step added to Partitioning and Sorting. InputSplit and RecordReader feed the map phase; map output is a list of (Key, Value) in the intermediate file, including the combined pair (Car, 2); reduce input is a list of (Key, List of Values) in the intermediate file: Bear {1,1}, Car {2,1}, River {1,1}, Deer {1,1}; RecordWriter writes the result.]
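The Combining step pre-aggregates one mapper's output locally, so that a single pair such as (Car, 2) leaves the map side instead of two (Car, 1) pairs. Here is a small plain-Java sketch of that idea. `CombinerSketch`, `pair`, and `combine` are hypothetical names, and the code simulates one mapper's output rather than using the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of a combiner; not Hadoop code.
final class CombinerSketch {

    // Convenience factory for one (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Sums the values of identical keys emitted by a single mapper.
    static SortedMap<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Output of one mapper for the illustrative line "Car Car River".
        List<Map.Entry<String, Integer>> mapOutput =
                Arrays.asList(pair("Car", 1), pair("Car", 1), pair("River", 1));
        // Prints {Car=2, River=1}
        System.out.println(combine(mapOutput));
    }
}
```

In a job written against the `org.apache.hadoop.mapred` API, a reducer whose operation is associative and commutative (such as summing counts) can usually be reused as the combiner through `JobConf.setCombinerClass`.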
Hands-On: Writing Your Own MapReduce Program
Wordcount (Hello World in Hadoop)

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Hands-On: Packaging MapReduce and Deploying to Hadoop Runtime Environment
Packaging Map Reduce Program
Usage
Here HADOOP_HOME (the root of the installation) is /usr/local/hadoop and HADOOP_VERSION (the installed Hadoop version) is 0.20.205.0. Compile WordCount.java, create a jar, and run the job:
$ mkdir /home/hduser/wordcount_classes
$ cd /home/hduser
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d wordcount_classes WordCount.java
$ jar -cvf ./wordcount.jar -C wordcount_classes/ .
$ hadoop jar ./wordcount.jar org.myorg.WordCount /input/* /output/wordcount_output_dir
Output:
…….
$ hadoop dfs -cat /output/wordcount_output_dir/part-00000
Hands-On: Running WordCount.jar on Amazon EMR
Upload .jar file and input file to Amazon S3
1. Select <yourbucket> in Amazon S3 service
2. Create folder : applications
3. Upload wordcount.jar to the applications folder
4. Create another folder: input
5. Upload input_test.txt to the input folder
Running a new Job Flow in EMR
Input JAR Location and Arguments
View the Result
Hands-On: Analytics Using MapReduce
Three Analytic MapReduce Examples
1. Simple analytics using MapReduce
2. Performing Group-By using MapReduce
3. Calculating frequency distributions and sorting using MapReduce
Preparing Example Data
The NASA weblog dataset, available from
http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
is a real-life dataset collected from the requests received by NASA web servers.
Download the weblog dataset from ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz and unzip it. We refer to the folder containing the extracted file as DATA_DIR.
$ hadoop dfs -mkdir /data
$ hadoop dfs -put <DATA_DIR>/NASA_access_log_Jul95 /data/input1
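Each record in this dataset is one line of a web server access log. The plain-Java sketch below shows how such a line can be split into fields with a regular expression before any MapReduce code is involved. The sample line is illustrative (the host name is made up), and `LogLineSketch` and `parse` are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying out the log-line pattern locally.
final class LogLineSketch {

    // Captured groups: 1 host, 2 timestamp, 3 HTTP method, 4 URL, 5 response size.
    // The status code is matched but not captured.
    static final Pattern HTTP_LOG_PATTERN = Pattern.compile(
            "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Returns {host, timestamp, method, url, size}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher matcher = HTTP_LOG_PATTERN.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        String[] fields = new String[5];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = matcher.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "host.example.com - - [01/Jul/1995:00:00:01 -0400] "
                + "\"GET /history/apollo/ HTTP/1.0\" 200 6245";
        // Prints [host.example.com, 01/Jul/1995:00:00:01 -0400, GET, /history/apollo/, 6245]
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

Lines whose size field is not numeric (for example a "-") do not match the pattern, so `parse` returns null for them.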
Simple analytics using MapReduce
Aggregate values (for example, mean, max, min, and standard deviation) provide basic analytics about a dataset.
Source: Hadoop MapReduce CookBook
WebLogMessageSizeAggregator.java

package analysis;

import java.io.IOException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;

public class WebLogMessageSizeAggregator {

    // Captured groups: 1 host, 2 timestamp, 3 HTTP method, 4 URL, 5 response size in bytes.
    public static final Pattern httplogPattern = Pattern
            .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Mapper: emits ("msgSize", size) for every log line that matches the pattern.
    public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            Matcher matcher = httplogPattern.matcher(value.toString());
            if (matcher.matches()) {
                int size = Integer.parseInt(matcher.group(5));
                output.collect(new Text("msgSize"), new IntWritable(size));
            }
        }
    }

    // Reducer: computes the mean, max, and min of all message sizes in one pass.
    public static class AReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            double tot = 0;
            int count = 0;
            int min = Integer.MAX_VALUE;
            int max = 0;
            while (values.hasNext()) {
                int value = values.next().get();
                tot = tot + value;
                count++;
                if (value < min) {
                    min = value;
                }
                if (value > max) {
                    max = value;
                }
            }
            // Divide before casting: "(int) tot / count" would narrow tot to int first.
            output.collect(new Text("Mean"), new IntWritable((int) (tot / count)));
            output.collect(new Text("Max"), new IntWritable(max));
            output.collect(new Text("Min"), new IntWritable(min));
        }
    }

    // Driver: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(WebLogMessageSizeAggregator.class);
        job.setJarByClass(WebLogMessageSizeAggregator.class);
        job.setMapperClass(AMapper.class);
        job.setReducerClass(AReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // The reducer also emits (Text, IntWritable) pairs.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        JobClient.runJob(job);
    }
}
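The reducer's single pass over the values can be tried out in plain Java. The sketch below is hypothetical (`SizeStats` is not part of the workshop code) and uses made-up sizes. It also shows why the placement of the `(int)` cast matters when the total is held in a `double`: a cast binds tighter than division, so `(int) tot / count` narrows the total to an `int` before dividing.

```java
// Hypothetical stand-alone version of the mean/min/max pass; not Hadoop code.
final class SizeStats {
    final int mean;
    final int min;
    final int max;

    private SizeStats(int mean, int min, int max) {
        this.mean = mean;
        this.min = min;
        this.max = max;
    }

    // Computes mean, min, and max in one pass over a non-empty array of sizes.
    static SizeStats of(int[] sizes) {
        if (sizes.length == 0) {
            throw new IllegalArgumentException("sizes must not be empty");
        }
        double tot = 0;
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int value : sizes) {
            tot += value;
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        // Divide first, then narrow the quotient to int.
        return new SizeStats((int) (tot / sizes.length), min, max);
    }

    public static void main(String[] args) {
        int[] sizes = {2_000_000_000, 2_000_000_000, 500};
        double tot = 4_000_000_500d;
        System.out.println(SizeStats.of(sizes).mean);   // prints 1333333500
        // Cast first: tot exceeds Integer.MAX_VALUE, so it saturates before the division.
        System.out.println((int) tot / sizes.length);   // prints 715827882
    }
}
```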
Compile, Build JAR, Submit Job, Review Result
$ cd /home/hduser
$ mkdir -p WebLog
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLog WebLogMessageSizeAggregator.java
$ jar -cvf ./weblog.jar -C WebLog .
$ hadoop jar ./weblog.jar analysis.WebLogMessageSizeAggregator /data/* /output/result_weblog
Output:
......
$ hadoop dfs -cat /output/result_weblog/part-00000
Performing Group-By using MapReduce
A MapReduce job that groups data into simple groups and calculates the analytics for each group.
Source: Hadoop MapReduce CookBook
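Conceptually this is the same operation as a SQL GROUP BY with COUNT(*). Below is a minimal in-memory sketch in plain Java, assuming the URLs have already been extracted from the log lines. `GroupBySketch` and `hitsByUrl` are hypothetical names, the URLs are made up, and the stream collector stands in for the shuffle plus the reducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical in-memory group-by; not Hadoop code.
final class GroupBySketch {

    // Groups identical URLs together and counts the members of each group.
    static Map<String, Long> hitsByUrl(List<String> urls) {
        return urls.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("/index.html", "/images/logo.gif", "/index.html");
        // Prints {/images/logo.gif=1, /index.html=2}
        System.out.println(hitsByUrl(urls));
    }
}
```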
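WeblogHitsByLinkProcessor.java

// The slides show only the mapper and reducer. The package, imports, and driver
// below are filled in following the pattern of WebLogMessageSizeAggregator.
package analysis;

import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WeblogHitsByLinkProcessor {

    public static final Pattern httplogPattern = Pattern
            .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Mapper: emits (url, 1) for every log line that matches the pattern.
    public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            Matcher matcher = httplogPattern.matcher(value.toString());
            if (matcher.matches()) {
                String linkUrl = matcher.group(4);
                word.set(linkUrl);
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the hits for each URL.
    public static class AReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    // Driver: args[0] is the input path, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(WeblogHitsByLinkProcessor.class);
        job.setJarByClass(WeblogHitsByLinkProcessor.class);
        job.setMapperClass(AMapper.class);
        job.setReducerClass(AReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        JobClient.runJob(job);
    }
}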
Compile, Build JAR, Submit Job, Review Result
$ cd /home/hduser
$ mkdir -p WebLogHit
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLogHit WeblogHitsByLinkProcessor.java
$ jar -cvf ./webloghit.jar -C WebLogHit .
$ hadoop jar ./webloghit.jar analysis.WeblogHitsByLinkProcessor /data/* /output/result_webloghit
Output:
......
$ hadoop dfs -cat /output/result_webloghit/part-00000
Calculating frequency distributions and sorting using MapReduce
The frequency distribution is the number of hits received by each URL, sorted in ascending order of hit count. We have already calculated the number of hits per URL in the previous program.
Source: Hadoop MapReduce CookBook
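The target ordering can be previewed in memory before running a job. Below is a minimal plain-Java sketch, assuming the per-URL hit counts are already available as a map. `FrequencySortSketch` and `sortByHits` are hypothetical names and the counts are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sort by hit count; not Hadoop code.
final class FrequencySortSketch {

    // Returns the entries ordered by ascending hit count, ties broken by URL.
    static List<Map.Entry<String, Integer>> sortByHits(Map<String, Integer> hitsByUrl) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(hitsByUrl.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        hits.put("/index.html", 3);
        hits.put("/images/logo.gif", 1);
        hits.put("/history/", 2);
        // Prints [/images/logo.gif=1, /history/=2, /index.html=3]
        System.out.println(sortByHits(hits));
    }
}
```

In MapReduce the same ordering is obtained without an explicit sort when the hit count is used as the intermediate key, because the framework sorts keys before the reduce phase.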
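WeblogFrequencyDistributionProcessor.java

// The package, imports, and driver are filled in; the slides show only the mapper and reducer.
package analysis;

import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WeblogFrequencyDistributionProcessor {
    // Not used by this job; kept from the earlier examples.
    public static final Pattern httplogPattern = Pattern
            .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Mapper: reads "url<TAB>hits" lines written by the previous job and emits (hits, url).
    // The slide version keyed on the URL, which sorts the output by URL rather than by hits.
    public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
        public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output,
                Reporter reporter) throws IOException {
            String[] tokens = value.toString().split("\\s");
            output.collect(new IntWritable(Integer.parseInt(tokens[1])), new Text(tokens[0]));
        }
    }

    /**
     * <p>Reduce function receives all the URLs that share the same hit count
     * and outputs each URL together with that count.</p>
     */
    public static class AReducer extends MapReduceBase implements Reducer<IntWritable, Text, Text, IntWritable> {
        public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            while (values.hasNext()) {
                output.collect(values.next(), key);
            }
        }
    }

    // Driver: args[0] is the output of the hits-by-link job, args[1] is the output path.
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(WeblogFrequencyDistributionProcessor.class);
        job.setJarByClass(WeblogFrequencyDistributionProcessor.class);
        job.setMapperClass(AMapper.class);
        job.setReducerClass(AReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1); // a single reducer keeps the whole output in one sorted file
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        JobClient.runJob(job);
    }
}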
Compile, Build JAR, Submit Job, Review Result
$ cd /home/hduser
$ mkdir -p WebLogFreq
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLogFreq WeblogFrequencyDistributionProcessor.java
$ jar -cvf ./weblogfreq.jar -C WebLogFreq .
$ hadoop jar ./weblogfreq.jar analysis.WeblogFrequencyDistributionProcessor /output/result_webloghit/* /output/result_weblogfreq
Output:
......
$ hadoop dfs -cat /output/result_weblogfreq/part-00000
Exercise: Running the analytic programs on Amazon EMR.
Thank you

Contenu connexe

Tendances

Hadoop Workshop on EC2 : March 2015
Hadoop Workshop on EC2 : March 2015Hadoop Workshop on EC2 : March 2015
Hadoop Workshop on EC2 : March 2015IMC Institute
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
Hadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdHadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdKevin Weil
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Hadoop for Java Professionals
Hadoop for Java ProfessionalsHadoop for Java Professionals
Hadoop for Java ProfessionalsEdureka!
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopChristopher Pezza
 
R Hadoop integration
R Hadoop integrationR Hadoop integration
R Hadoop integrationDzung Nguyen
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview QuestionsZaranTech LLC
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce FrameworkEdureka!
 
The hadoop 2.0 ecosystem and yarn
The hadoop 2.0 ecosystem and yarnThe hadoop 2.0 ecosystem and yarn
The hadoop 2.0 ecosystem and yarnMichael Joseph
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zingzingopen
 
Big data-denis-rothman
Big data-denis-rothmanBig data-denis-rothman
Big data-denis-rothmanDenis Rothman
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 

Tendances (20)

Hadoop Workshop on EC2 : March 2015
Hadoop Workshop on EC2 : March 2015Hadoop Workshop on EC2 : March 2015
Hadoop Workshop on EC2 : March 2015
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Hadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdHadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant bird
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Hadoop for Java Professionals
Hadoop for Java ProfessionalsHadoop for Java Professionals
Hadoop for Java Professionals
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
R Hadoop integration
R Hadoop integrationR Hadoop integration
R Hadoop integration
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 
10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions10 Popular Hadoop Technical Interview Questions
10 Popular Hadoop Technical Interview Questions
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
The hadoop 2.0 ecosystem and yarn
The hadoop 2.0 ecosystem and yarnThe hadoop 2.0 ecosystem and yarn
The hadoop 2.0 ecosystem and yarn
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
Big data-denis-rothman
Big data-denis-rothmanBig data-denis-rothman
Big data-denis-rothman
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 

Similaire à Big Data Hadoop Local and Public Cloud (Amazon EMR)

Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria López
 
2. Develop a MapReduce program to calculate the frequency of a given word in ...
2. Develop a MapReduce program to calculate the frequency of a given word in ...2. Develop a MapReduce program to calculate the frequency of a given word in ...
2. Develop a MapReduce program to calculate the frequency of a given word in ...Prof. Maulik Trivedi
 
Amazon-style shopping cart analysis using MapReduce on a Hadoop cluster
Amazon-style shopping cart analysis using MapReduce on a Hadoop clusterAmazon-style shopping cart analysis using MapReduce on a Hadoop cluster
Amazon-style shopping cart analysis using MapReduce on a Hadoop clusterAsociatia ProLinux
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceKrishna Sangeeth KS
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Rajat Mittal
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19Ahmed Elsayed
 
MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015Asaf Ben Gal
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn HadoopSilicon Halton
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed DatasetsGabriele Modena
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 

Similaire à Big Data Hadoop Local and Public Cloud (Amazon EMR) (20)

Hadoop Mapreduce
Hadoop MapreduceHadoop Mapreduce
Hadoop Mapreduce
 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
 
2. Develop a MapReduce program to calculate the frequency of a given word in ...
2. Develop a MapReduce program to calculate the frequency of a given word in ...2. Develop a MapReduce program to calculate the frequency of a given word in ...
2. Develop a MapReduce program to calculate the frequency of a given word in ...
 
Amazon-style shopping cart analysis using MapReduce on a Hadoop cluster
Amazon-style shopping cart analysis using MapReduce on a Hadoop clusterAmazon-style shopping cart analysis using MapReduce on a Hadoop cluster
Amazon-style shopping cart analysis using MapReduce on a Hadoop cluster
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and Mapreduce
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Sentiment Analysis using Big Data
Sentiment Analysis using Big Data Sentiment Analysis using Big Data
Sentiment Analysis using Big Data
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
Analyzing Big data in R and Scala using Apache Spark 17-7-19
Analyzing Big data in R and Scala using Apache Spark  17-7-19Analyzing Big data in R and Scala using Apache Spark  17-7-19
Analyzing Big data in R and Scala using Apache Spark 17-7-19
 
Data Science
Data ScienceData Science
Data Science
 
MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015MATLAB_BIg_Data_ds_Haddop_22032015
MATLAB_BIg_Data_ds_Haddop_22032015
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 

Plus de IMC Institute

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14IMC Institute
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019IMC Institute
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AIIMC Institute
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12IMC Institute
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital TransformationIMC Institute
 
IT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIMC Institute
 
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมมูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมIMC Institute
 
IT Trends eMagazine Vol 4. No.11
IT Trends eMagazine  Vol 4. No.11IT Trends eMagazine  Vol 4. No.11
IT Trends eMagazine Vol 4. No.11IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
บทความ The New Silicon Valley
บทความ The New Silicon Valleyบทความ The New Silicon Valley
บทความ The New Silicon ValleyIMC Institute
 
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)IMC Institute
 
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง IMC Institute
 
IT Trends eMagazine Vol 3. No.9
IT Trends eMagazine  Vol 3. No.9 IT Trends eMagazine  Vol 3. No.9
IT Trends eMagazine Vol 3. No.9 IMC Institute
 
Thailand software & software market survey 2016
Thailand software & software market survey 2016Thailand software & software market survey 2016
Thailand software & software market survey 2016IMC Institute
 
Developing Business Blockchain Applications on Hyperledger
Developing Business  Blockchain Applications on Hyperledger Developing Business  Blockchain Applications on Hyperledger
Developing Business Blockchain Applications on Hyperledger IMC Institute
 
Digital transformation @thanachart.org
Digital transformation @thanachart.orgDigital transformation @thanachart.org
Digital transformation @thanachart.orgIMC Institute
 
บทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgบทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgIMC Institute
 
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformationกลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital TransformationIMC Institute
 

Big Data Hadoop Local and Public Cloud (Amazon EMR)

  • 1. Danairat T., 2013, danairat@gmail.com. Big Data Hadoop – Hands On Workshop. Big Data Hadoop Local and Public Cloud Hands On Workshop. Dr. Thanachart Numnonda, thanachart@imcinstitute.com. Danairat T., Certified Java Programmer, TOGAF – Silver, danairat@gmail.com, +66-81-559-1446
  • 2. Thanachart Numnonda and Danairat T, July 2013. Big Data Hadoop – Hands On Workshop. Hands-On: Running Hadoop on Amazon Elastic MapReduce
  • 3. Architecture Overview of Amazon EMR
  • 4. Creating an AWS account
  • 5. Signing up for the necessary services
        ● Simple Storage Service (S3)
        ● Elastic Compute Cloud (EC2)
        ● Elastic MapReduce (EMR)
        Caution! This costs real money!
  • 6. Creating Amazon EC2 Instance
  • 7. Creating Amazon S3 bucket
  • 8. Create access key using Security Credentials in the AWS Management Console
  • 10. Creating a new Job Flow in EMR
  • 18. View Result from the S3 bucket
  • 19. Lecture: Understanding Map Reduce Processing [Diagram labels: Client; Name Node / Job Tracker; Data Node / Task Tracker (three nodes); Map; Reduce]
  • 20. MapReduce Framework
        map: (K1, V1) -> list(K2, V2)
        reduce: (K2, list(V2)) -> list(K3, V3)
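The two signatures on this slide can be written down as plain Java generic interfaces. The following is only a sketch to make the types concrete: it does not use the Hadoop API, and the names `MapReduceTypes`, `MapFn`, `ReduceFn`, `WORD_MAP`, `SUM_REDUCE`, and `pair` are hypothetical, introduced here for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MapReduceTypes {

    // map: (K1, V1) -> list(K2, V2)
    interface MapFn<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)
    interface ReduceFn<K2, V2, K3, V3> {
        List<Map.Entry<K3, V3>> reduce(K2 key, List<V2> values);
    }

    // Small helper to build a (key, value) pair.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new AbstractMap.SimpleEntry<K, V>(key, value);
    }

    // Word count mapper: K1 = line offset, V1 = line text, K2 = word, V2 = 1.
    static final MapFn<Long, String, String, Integer> WORD_MAP = (offset, line) -> {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(MapReduceTypes.<String, Integer>pair(word, 1));
            }
        }
        return out;
    };

    // Word count reducer: K2 = K3 = word, V2 = partial counts, V3 = total.
    static final ReduceFn<String, Integer, String, Integer> SUM_REDUCE = (word, counts) -> {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return Collections.singletonList(MapReduceTypes.<String, Integer>pair(word, sum));
    };

    public static void main(String[] args) {
        System.out.println(WORD_MAP.map(0L, "Car Car River"));
        System.out.println(SUM_REDUCE.reduce("Car", Arrays.asList(1, 1)));
    }
}
```

In the Hadoop API used later in this workshop, the same roles are played by the `Mapper` and `Reducer` interfaces, with `OutputCollector` receiving the emitted pairs instead of a returned list.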
  • 21. MapReduce Processing – The Data flow
        1. InputFormat, InputSplits, RecordReader
        2. Mapper - your focus is here
        3. Partition, Shuffle & Sort
        4. Reducer - your focus is here
        5. OutputFormat, RecordWriter
  • 22. How does MapReduce work? [Diagram labels: InputSplit; RecordReader; map output as a list of (Key, Value) in the intermediate file; Partitioning; Sorting; reduce input as a list of (Key, List of Values) in the intermediate file; RecordWriter]
  • 23. How does MapReduce work? [Diagram labels: InputSplit; RecordReader; map output as a list of (Key, Value) in the intermediate file; Combining, for example Car, 2; Partitioning; Sorting; reduce input as a list of (Key, List of Values) in the intermediate file: Bear, {1,1}; Car, {2,1}; River, {1,1}; Deer, {1,1}; RecordWriter]
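The combine, partition/sort, and reduce steps named on these two slides can be traced by hand in plain Java. This is a sketch only, not Hadoop code: the class `WordCountDataFlow` and its methods are hypothetical, and the three input splits are an assumption chosen so that the intermediate values match the ones shown on the slide (Car, {2,1} and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WordCountDataFlow {

    // Map + combine for one split: emit (word, 1) per token, then pre-sum locally.
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                combined.merge(word, 1, Integer::sum);
            }
        }
        return combined;
    }

    // Shuffle and sort: group the combined values by key, with keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<String> splits) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String split : splits) {
            for (Map.Entry<String, Integer> e : mapAndCombine(split).entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Reduce: sum the list of values for each key.
    static SortedMap<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Assumed input splits (not given in the transcript).
        List<String> splits = Arrays.asList("Deer Bear River", "Car Car River", "Deer Car Bear");
        SortedMap<String, List<Integer>> grouped = shuffle(splits);
        System.out.println(grouped);          // {Bear=[1, 1], Car=[2, 1], Deer=[1, 1], River=[1, 1]}
        System.out.println(reduce(grouped));  // {Bear=2, Car=3, Deer=2, River=2}
    }
}
```

The combiner is what turns the two `Car` tokens of the second split into a single (Car, 2) pair before the shuffle, which is why the reducer sees {2,1} rather than {1,1,1}.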
  • 24. Hands-On: Writing your own Map Reduce Program
  • 25. Wordcount (HelloWorld in Hadoop)
        package org.myorg;

        import java.io.IOException;
        import java.util.*;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.conf.*;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;
        import org.apache.hadoop.util.*;

        public class WordCount {

            // Mapper: emits (word, 1) for every token in the input line.
            public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
                private final static IntWritable one = new IntWritable(1);
                private Text word = new Text();

                public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    String line = value.toString();
                    StringTokenizer tokenizer = new StringTokenizer(line);
                    while (tokenizer.hasMoreTokens()) {
                        word.set(tokenizer.nextToken());
                        output.collect(word, one);
                    }
                }
            }
  • 26. Wordcount (HelloWorld in Hadoop)
            // Reducer: sums the counts collected for each word.
            public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
                public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    int sum = 0;
                    while (values.hasNext()) {
                        sum += values.next().get();
                    }
                    output.collect(key, new IntWritable(sum));
                }
            }
  • 27. Wordcount (HelloWorld in Hadoop)
            // Driver: args[0] is the input path, args[1] is the output path.
            public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");

                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);

                conf.setMapperClass(Map.class);

                conf.setReducerClass(Reduce.class);

                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);

                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));

                JobClient.runJob(conf);
            }
        }
  • 28. Hands-On: Packaging Map Reduce and Deploying to Hadoop Runtime Environment
  • 29. Packaging Map Reduce Program
        Usage: Assuming HADOOP_HOME is the root of the installation and HADOOP_VERSION is the Hadoop version installed, compile WordCount.java and create a jar:
        $ mkdir /home/hduser/wordcount_classes
        $ cd /home/hduser
        $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d wordcount_classes WordCount.java
        $ jar -cvf ./wordcount.jar -C wordcount_classes/ .
        $ hadoop jar ./wordcount.jar org.myorg.WordCount /input/* /output/wordcount_output_dir
        Output: …….
        $ hadoop dfs -cat /output/wordcount_output_dir/part-00000
  • 30. Hands-On: Running WordCount.jar on Amazon EMR
  • 31. Upload .jar file and input file to Amazon S3
        1. Select <yourbucket> in the Amazon S3 service
        2. Create a folder: applications
        3. Upload wordcount.jar to the applications folder
        4. Create another folder: input
        5. Upload input_test.txt to the input folder
  • 32. Running a new Job Flow in EMR
  • 33. Input JAR Location and Arguments
  • 38. View the Result
  • 39. Hands-On: Analytics Using MapReduce
  • 40. Three Analytic MapReduce Examples
        1. Simple analytics using MapReduce
        2. Performing Group-By using MapReduce
        3. Calculating frequency distributions and sorting using MapReduce
  • 41. Preparing Example Data
        The NASA weblog dataset, available from http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html, is a real-life dataset collected from the requests received by NASA web servers. Download the weblog dataset from ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz and unzip it. We refer to the extracted folder as DATA_DIR.
        $ hadoop dfs -mkdir /data
        $ hadoop dfs -put <DATA_DIR>/NASA_access_log_Jul95 /data/input1
  • 42. Simple analytics using MapReduce. Aggregate values (for example, mean, max, min, and standard deviation) provide the basic analytics about a dataset. Source: Hadoop MapReduce CookBook
  • 43. WebLogMessageSizeAggregator.java
        package analysis;

        import java.io.IOException;
        import java.util.*;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.*;
        import org.apache.hadoop.mapred.*;
        import org.apache.hadoop.util.*;
        import org.apache.hadoop.conf.*;

        public class WebLogMessageSizeAggregator {

            // Groups: 1 = host, 2 = timestamp, 3 = HTTP method, 4 = URL, 5 = response size in bytes.
            public static final Pattern httplogPattern = Pattern
                    .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

            public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
  • 44. WebLogMessageSizeAggregator.java
                // Emits ("msgSize", size) for every log line that matches the pattern.
                public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    Matcher matcher = httplogPattern.matcher(value.toString());
                    if (matcher.matches()) {
                        int size = Integer.parseInt(matcher.group(5));
                        output.collect(new Text("msgSize"), new IntWritable(size));
                    }
                }
            }
  • 45. WebLogMessageSizeAggregator.java
            public static class AReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
                public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    double tot = 0;
                    int count = 0;
                    int min = Integer.MAX_VALUE;
                    int max = 0;
                    // Single pass over all message sizes: running total, count, min, and max.
                    while (values.hasNext()) {
                        int value = values.next().get();
                        tot = tot + value;
                        count++;
                        if (value < min) {
                            min = value;
                        }
                        if (value > max) {
                            max = value;
                        }
                    }
  • 46. WebLogMessageSizeAggregator.java
                    output.collect(new Text("Mean"), new IntWritable((int) tot / count));
                    output.collect(new Text("Max"), new IntWritable(max));
                    output.collect(new Text("Min"), new IntWritable(min));
                }
            }

            // Driver: args[0] is the input path, args[1] is the output path.
            public static void main(String[] args) throws Exception {
                JobConf job = new JobConf(WebLogMessageSizeAggregator.class);
                job.setJarByClass(WebLogMessageSizeAggregator.class);
                job.setMapperClass(AMapper.class);
                job.setReducerClass(AReducer.class);
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(IntWritable.class);
                FileInputFormat.setInputPaths(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                JobClient.runJob(job);
            }
        }
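To check the log pattern and the mean/max/min logic without a cluster, the same steps can be run in plain Java. This is a sketch under assumptions: the class `WebLogSizeSketch` and the method `aggregate` are hypothetical, the sample lines are illustrative lines in the dataset's format, and the backslashes in the pattern are restored on the assumption that they were lost when the slide text was extracted. Unlike the slide, it accumulates the total in a `long` rather than a `double`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebLogSizeSketch {

    // Groups: 1 = host, 2 = timestamp, 3 = method, 4 = URL, 5 = response size.
    static final Pattern HTTP_LOG = Pattern.compile(
            "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Returns {mean, max, min} over the response sizes of all matching lines.
    static int[] aggregate(List<String> lines) {
        long total = 0;
        int count = 0;
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (String line : lines) {
            Matcher m = HTTP_LOG.matcher(line);
            if (!m.matches()) {
                continue; // like the mapper, skip lines that do not match
            }
            int size = Integer.parseInt(m.group(5));
            total += size;
            count++;
            min = Math.min(min, size);
            max = Math.max(max, size);
        }
        if (count == 0) {
            throw new IllegalArgumentException("no matching log lines");
        }
        return new int[] { (int) (total / count), max, min };
    }

    public static void main(String[] args) {
        // Illustrative sample lines; the third has "-" as its size and is skipped.
        List<String> sample = Arrays.asList(
                "199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] \"GET /history/apollo/ HTTP/1.0\" 200 6245",
                "unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] \"GET /shuttle/countdown/ HTTP/1.0\" 200 3985",
                "example.host - - [01/Jul/1995:00:00:09 -0400] \"GET /images/logo.gif HTTP/1.0\" 304 -");
        System.out.println(Arrays.toString(aggregate(sample))); // [5115, 6245, 3985]
    }
}
```

Lines whose size field is not numeric fail `matches()` and are ignored, which is the same behaviour as the `if (matcher.matches())` guard in the mapper.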
  • 47. Compile, Build JAR, Submit Job, Review Result
        $ cd /home/hduser
        $ mkdir WebLog
        $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLog WebLogMessageSizeAggregator.java
        $ jar -cvf ./weblog.jar -C WebLog .
        $ hadoop jar ./weblog.jar analysis.WebLogMessageSizeAggregator /data/* /output/result_weblog
        Output: ......
        $ hadoop dfs -cat /output/result_weblog/part-00000
  • 48. Performing Group-By using MapReduce. A MapReduce job that groups data into simple groups and calculates the analytics for each group. Source: Hadoop MapReduce CookBook
  • 49. WeblogHitsByLinkProcessor.java
        public class WeblogHitsByLinkProcessor {

            // Groups: 1 = host, 2 = timestamp, 3 = HTTP method, 4 = URL, 5 = response size in bytes.
            public static final Pattern httplogPattern = Pattern
                    .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

            public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
                private final static IntWritable one = new IntWritable(1);
                private Text word = new Text();

                // Emits (URL, 1) for every log line that matches the pattern.
                public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    Matcher matcher = httplogPattern.matcher(value.toString());
                    if (matcher.matches()) {
                        String linkUrl = matcher.group(4);
                        word.set(linkUrl);
                        output.collect(word, one);
                    }
                }
            }
  • 50. WeblogHitsByLinkProcessor.java
            public static class AReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
                private IntWritable result = new IntWritable();

                // Sums the hits collected for each URL.
                public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    int sum = 0;
                    while (values.hasNext()) {
                        sum += values.next().get();
                    }
                    result.set(sum);
                    output.collect(key, result);
                }
            }
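The group-by step amounts to counting log lines per URL (group 4 of the pattern). A minimal plain-Java sketch of that grouping follows; the class `HitsByLinkSketch`, the method `hitsByLink`, and the sample lines are assumptions for illustration, not part of the workshop code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HitsByLinkSketch {

    static final Pattern HTTP_LOG = Pattern.compile(
            "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

    // Counts matching log lines per URL; keys come back in sorted order,
    // as they would from a single reducer.
    static SortedMap<String, Integer> hitsByLink(List<String> lines) {
        SortedMap<String, Integer> hits = new TreeMap<>();
        for (String line : lines) {
            Matcher m = HTTP_LOG.matcher(line);
            if (m.matches()) {
                hits.merge(m.group(4), 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative sample lines in the dataset's format.
        List<String> sample = Arrays.asList(
                "host-a - - [01/Jul/1995:00:00:01 -0400] \"GET /history/apollo/ HTTP/1.0\" 200 6245",
                "host-b - - [01/Jul/1995:00:00:06 -0400] \"GET /shuttle/countdown/ HTTP/1.0\" 200 3985",
                "host-c - - [01/Jul/1995:00:00:09 -0400] \"GET /history/apollo/ HTTP/1.0\" 200 6245");
        System.out.println(hitsByLink(sample)); // {/history/apollo/=2, /shuttle/countdown/=1}
    }
}
```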
  • 51. Compile, Build JAR, Submit Job, Review Result
        $ cd /home/hduser
        $ mkdir WebLogHit
        $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLogHit WeblogHitsByLinkProcessor.java
        $ jar -cvf ./webloghit.jar -C WebLogHit .
        $ hadoop jar ./webloghit.jar analysis.WeblogHitsByLinkProcessor /data/* /output/result_webloghit
        Output: ......
        $ hadoop dfs -cat /output/result_webloghit/part-00000
  • 52. Calculating frequency distributions and sorting using MapReduce. The frequency distribution is the number of hits received by each URL, sorted in ascending order by the number of hits received by a URL. We have already calculated the number of hits in the previous program. Source: Hadoop MapReduce CookBook
  • 53. WeblogFrequencyDistributionProcessor.java
        public class WeblogFrequencyDistributionProcessor {

            public static final Pattern httplogPattern = Pattern
                    .compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

            public static class AMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
                // Input lines are the previous job's output: a URL, whitespace, then its hit count.
                public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    String[] tokens = value.toString().split("\\s");
                    output.collect(new Text(tokens[0]), new IntWritable(Integer.parseInt(tokens[1])));
                }
            }
  • 54. WeblogFrequencyDistributionProcessor.java
            /**
             * <p>The reduce function receives all the values that have the same key as the input, and it outputs the key
             * and the number of occurrences of the key as the output.</p>
             */
            public static class AReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
                public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                    if (values.hasNext()) {
                        output.collect(key, values.next());
                    }
                }
            }
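The ordering that the frequency distribution calls for, ascending by hit count, can be illustrated outside Hadoop with a plain sort over the "URL, whitespace, count" lines that the previous job writes. This is only a sketch: the class `FrequencySortSketch`, the method `sortByHits`, and the sample rows are hypothetical. Keep in mind that a MapReduce job orders its reducer input by key, so the field used as the key determines the order of the job's output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class FrequencySortSketch {

    // Parses "url<whitespace>count" lines and returns them ordered by count, ascending.
    static List<Map.Entry<String, Integer>> sortByHits(List<String> lines) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.split("\\s");
            rows.add(new AbstractMap.SimpleEntry<String, Integer>(
                    tokens[0], Integer.parseInt(tokens[1])));
        }
        // List.sort is stable, so URLs with equal counts keep their input order.
        rows.sort(Map.Entry.comparingByValue());
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative rows in the tab-separated form written by TextOutputFormat.
        List<String> sample = Arrays.asList("/a\t5", "/b\t2", "/c\t9");
        System.out.println(sortByHits(sample)); // [/b=2, /a=5, /c=9]
    }
}
```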
  • 55. Compile, Build JAR, Submit Job, Review Result
        $ cd /home/hduser
        $ mkdir WebLogFreq
        $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.205.0.jar -d WebLogFreq WeblogFrequencyDistributionProcessor.java
        $ jar -cvf ./weblogfreq.jar -C WebLogFreq .
        $ hadoop jar ./weblogfreq.jar analysis.WeblogFrequencyDistributionProcessor /output/result_webloghit/* /output/result_weblogfreq
        Output: ......
        $ hadoop dfs -cat /output/result_weblogfreq/part-00000
  • 56. Exercise: Running the analytic programs on Amazon EMR.
  • 57. Thank you