12/10/2014 Testing MultiOutputFormat based MapReduce | Ashok Agarwal
Ashok Agarwal
Testing MultiOutputFormat based MapReduce
11 Thursday Sep 2014
POSTED BY ASHOK AGARWAL IN BIG DATA
Big Data, Hadoop, MapReduce
In one of our projects, we were required to generate a per-client file as the output of a MapReduce job, so that each client could see and analyze their own data.
Suppose you receive daily stock price files.
For 9/8/2014: 9_8_2014.csv
9/8/14,MSFT,47
9/8/14,ORCL,40
9/8/14,GOOG,577
9/8/14,AAPL,100.4
For 9/9/2014: 9_9_2014.csv
9/9/14,MSFT,46
9/9/14,ORCL,41
9/9/14,GOOG,578
9/9/14,AAPL,101
And so on:
9/10/14,MSFT,48
9/10/14,ORCL,39.5
9/10/14,GOOG,577
9/10/14,AAPL,100
9/11/14,MSFT,47.5
9/11/14,ORCL,41
9/11/14,GOOG,588
9/11/14,AAPL,99.8
9/12/14,MSFT,46.69
9/12/14,ORCL,40.5
9/12/14,GOOG,576
9/12/14,AAPL,102.5
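Each record follows the layout date,symbol,price, so a single line can be broken into its fields with a plain split. A minimal standalone sketch (illustration only, not part of the MapReduce code):

```java
public class RecordParseDemo {
    public static void main(String[] args) {
        // A sample record in the date,symbol,price layout shown above
        String line = "9/8/14,MSFT,47";
        String[] tokens = line.split(",");
        String symbol = tokens[1]; // second field: stock symbol
        String price = tokens[2];  // third field: closing price
        System.out.println(symbol + " " + price); // MSFT 47
    }
}
```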
We want to analyze each stock's weekly trend. To do that, we need to produce per-stock data.
The mapper code below reads the records supplied line by line by TextInputFormat and splits each CSV record. The map output key is the stock symbol and the value is the price.
package com.jbksoft;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] tokens = line.split(",");
        // tokens[0] = date, tokens[1] = symbol, tokens[2] = price
        context.write(new Text(tokens[1]), new Text(tokens[2]));
    }
}
The reducer code below creates a file for each stock.
package com.jbksoft;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import java.io.IOException;

public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> mos;

    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The key (stock symbol) becomes the base name of the output file
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}
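To see what the reducer achieves, here is a small in-memory analogue with no Hadoop dependency (an illustration, not the original post's code): values are grouped under their stock symbol, just as the shuffle groups them and MultipleOutputs routes each group into a file named after the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupBySymbolDemo {
    public static void main(String[] args) {
        String[] records = {"9/8/14,MSFT,47", "9/8/14,GOOG,577", "9/9/14,MSFT,46"};
        // Group prices by symbol, mimicking shuffle + per-key output files
        Map<String, List<String>> bySymbol = new TreeMap<>();
        for (String rec : records) {
            String[] t = rec.split(",");
            bySymbol.computeIfAbsent(t[1], k -> new ArrayList<>()).add(t[2]);
        }
        System.out.println(bySymbol); // {GOOG=[577], MSFT=[47, 46]}
    }
}
```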
The driver for the code is below. Note that LazyOutputFormat is used so that the default (empty) part files are not created: output files come into existence only when MultipleOutputs actually writes to them.
package com.jbksoft;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public class MyMultiOutputTest {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Path inputDir = new Path(args[0]);
        Path outputDir = new Path(args[1]);

        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(MyMultiOutputTest.class);
        job.setJobName("My MultipleOutputs Demo");

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setMapperClass(MyMultiOutputMapper.class);
        job.setReducerClass(MyMultiOutputReducer.class);

        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);

        // LazyOutputFormat avoids creating empty default part-r-NNNNN files
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        job.waitForCompletion(true);
    }
}
The command for executing the above code (compiled and packaged as a jar):
aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest
aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
total 32
-rwxr-xr-x 1 ashok.agarwal 1816361533 25 Sep 11 11:32 AAPL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 GOOG-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 MSFT-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 19 Sep 11 11:32 ORCL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 0 Sep 11 11:32 _SUCCESS
aagarwal‐mbpro:~ ashok.agarwal$
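The file names in the listing follow the MultipleOutputs convention baseName-r-NNNNN, where the base name is the string passed as the third argument to mos.write (here, the stock symbol) and NNNNN is the zero-padded reducer partition number. A quick sketch of the naming scheme (the format string is an illustration of the pattern, not Hadoop's actual code):

```java
public class OutputNameDemo {
    public static void main(String[] args) {
        // MultipleOutputs appends "-r-" plus the 5-digit reducer number
        String name = String.format("%s-r-%05d", "AAPL", 0);
        System.out.println(name); // AAPL-r-00000
    }
}
```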
A test case for the above code can be created using MRUnit; in it, the MultipleOutputs instance used by the reducer needs to be mocked.
Source: https://erashokagarwal.wordpress.com/2014/09/11/testing-multioutputformat-based-mapreduce/