12/10/2014 Testing MultiOutputFormat based MapReduce | Ashok Agarwal 
Ashok Agarwal 
Testing MultiOutputFormat based MapReduce 
Posted by Ashok Agarwal in Big Data, Thursday 11 September 2014
Tags: Big Data, Hadoop, MapReduce
In one of our projects, we were required to generate one output file per client from a MapReduce job, so
that each client could see and analyze their own data.
Suppose you receive a file of stock prices every day.
For 9/8/2014: 9_8_2014.csv 
9/8/14,MSFT,47 
9/8/14,ORCL,40 
9/8/14,GOOG,577 
9/8/14,AAPL,100.4 
For 9/9/2014: 9_9_2014.csv 
9/9/14,MSFT,46 
9/9/14,ORCL,41 
9/9/14,GOOG,578 
9/9/14,AAPL,101 
And so on for the following days:
9/10/14,MSFT,48 
9/10/14,ORCL,39.5 
9/10/14,GOOG,577 
9/10/14,AAPL,100 
9/11/14,MSFT,47.5 
9/11/14,ORCL,41 
9/11/14,GOOG,588 
9/11/14,AAPL,99.8 
9/12/14,MSFT,46.69 
9/12/14,ORCL,40.5 
9/12/14,GOOG,576 
9/12/14,AAPL,102.5 
We want to analyze the weekly trend of each stock. To do that, we first need to regroup the
data by stock, with one file per stock.
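To make the goal concrete, here is a minimal plain-Java sketch (no Hadoop involved) of the regrouping the job is meant to perform: records that arrive grouped by day end up as one list of prices per stock. The names `StockGrouping` and `groupBySymbol` are illustrative only and do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics in memory what the MapReduce job does at scale.
class StockGrouping {

    // Groups "date,symbol,price" lines by symbol, keeping prices in input order.
    static Map<String, List<String>> groupBySymbol(List<String> csvLines) {
        Map<String, List<String>> bySymbol = new LinkedHashMap<>();
        for (String line : csvLines) {
            String[] tokens = line.split(",");
            bySymbol.computeIfAbsent(tokens[1], k -> new ArrayList<>()).add(tokens[2]);
        }
        return bySymbol;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "9/8/14,MSFT,47", "9/8/14,ORCL,40",
                "9/9/14,MSFT,46", "9/9/14,ORCL,41");
        System.out.println(groupBySymbol(lines)); // {MSFT=[47, 46], ORCL=[40, 41]}
    }
}
```

Unlike this sketch, MapReduce does not by default guarantee the order in which a reducer sees the values for a key, so the per-stock files are not necessarily sorted by date.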
The mapper below reads the CSV records through TextInputFormat and splits each line on commas. It
emits the stock symbol as the key and the price as the value.
    package com.jbksoft;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    import java.io.IOException;

    public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "9/8/14,MSFT,47": date, symbol, price.
            String line = value.toString();
            String[] tokens = line.split(",");
            // Emit (symbol, price); the date is dropped.
            context.write(new Text(tokens[1]), new Text(tokens[2]));
        }
    }
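The mapper assumes every line has exactly three comma-separated fields; a blank or truncated line would throw an `ArrayIndexOutOfBoundsException` and fail the task. One way to guard against that, sketched below in plain Java so it can be tried without a cluster, is to parse each line into an optional result and skip lines that do not fit. `StockLineParser` and `parse` are hypothetical names introduced for this illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of the original job.
class StockLineParser {

    // Returns {symbol, price} for a well-formed "date,symbol,price" line, else empty.
    static Optional<String[]> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] tokens = line.split(",");
        if (tokens.length != 3 || tokens[1].trim().isEmpty() || tokens[2].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {tokens[1].trim(), tokens[2].trim()});
    }

    public static void main(String[] args) {
        System.out.println(parse("9/8/14,MSFT,47").isPresent()); // true
        System.out.println(parse("").isPresent());               // false
    }
}
```

In the mapper, the call to `context.write` would then sit behind an `isPresent()` check, so malformed lines are skipped (or counted) instead of aborting the task.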
The reducer below creates one file per stock.
    package com.jbksoft;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    import java.io.IOException;

    public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {

        // Not private: the unit test subclasses this reducer and replaces the field.
        protected MultipleOutputs<NullWritable, Text> mos;

        @Override
        public void setup(Context context) {
            mos = new MultipleOutputs<NullWritable, Text>(context);
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The stock symbol (the key) becomes the base name of the output file.
            for (Text value : values) {
                mos.write(NullWritable.get(), value, key.toString());
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Closing flushes and closes every per-stock writer.
            mos.close();
        }
    }
The driver for the code: 
    package com.jbksoft;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    import java.io.IOException;

    public class MyMultiOutputTest {

        public static void main(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException {
            Path inputDir = new Path(args[0]);
            Path outputDir = new Path(args[1]);

            Configuration conf = new Configuration();
            // Deprecated in Hadoop 2, where Job.getInstance(conf) replaces it.
            Job job = new Job(conf);
            job.setJarByClass(MyMultiOutputTest.class);
            job.setJobName("My MultipleOutputs Demo");

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);

            job.setMapperClass(MyMultiOutputMapper.class);
            job.setReducerClass(MyMultiOutputReducer.class);

            FileInputFormat.setInputPaths(job, inputDir);
            FileOutputFormat.setOutputPath(job, outputDir);

            // LazyOutputFormat creates an output file only when a record is written to it,
            // which avoids empty default part-r-* files next to the per-stock files.
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

            job.waitForCompletion(true);
        }
    }
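The driver reads `args[0]` and `args[1]` directly, so running it without both paths ends in an `ArrayIndexOutOfBoundsException`. A small guard, sketched here in plain Java with hypothetical names (`DriverArgs`, `validate`), could produce a usage message instead. It is a suggestion, not something the original driver does.

```java
// Hypothetical argument check for the driver; not part of the original code.
class DriverArgs {

    // Returns null when the arguments look usable, otherwise a message to print.
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: MyMultiOutputTest <input dir> <output dir>";
        }
        if (args[0].isEmpty() || args[1].isEmpty()) {
            return "Input and output directories must not be empty";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            return;
        }
        System.out.println("input=" + args[0] + ", output=" + args[1]);
    }
}
```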
The command for running the code above (compiled and packaged as a jar) takes the input and output directories as its two arguments, shown here as placeholders. It is followed by a listing of the output directory:

    aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest <input dir> <output dir>
    aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
    total 32
    -rwxr-xr-x 1 ashok.agarwal 1816361533 25 Sep 11 11:32 AAPL-r-00000
    -rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 GOOG-r-00000
    -rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 MSFT-r-00000
    -rwxr-xr-x 1 ashok.agarwal 1816361533 19 Sep 11 11:32 ORCL-r-00000
    -rwxr-xr-x 1 ashok.agarwal 1816361533 0 Sep 11 11:32 _SUCCESS
    aagarwal-mbpro:~ ashok.agarwal$
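The file names in the listing follow a pattern: the base name passed to `mos.write` (the stock symbol), then `-r-` for reduce output, then the reduce task's partition number padded to five digits. The helper below is a plain-Java illustration of that pattern, assuming no file extension is configured; `OutputNames` and `reduceFileName` are made-up names, not Hadoop API.

```java
import java.util.Locale;

// Illustration of the naming pattern only; Hadoop builds these names internally.
class OutputNames {

    // Mirrors the naming seen in the listing: <base>-r-<5-digit partition>.
    static String reduceFileName(String baseName, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseName, partition);
    }

    public static void main(String[] args) {
        System.out.println(reduceFileName("AAPL", 0)); // AAPL-r-00000
    }
}
```

With more than one reducer, each symbol still lands in a single file, because all values for a key go to the same reduce task; the numeric suffix simply records which reduce task handled that symbol.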
The code above can be unit tested with MRUnit.
For the reducer, MultipleOutputs has to be mocked so that the test can capture what would otherwise be written to files:
    package com.jbksoft.test;

    import com.jbksoft.MyMultiOutputReducer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Before;
    import org.junit.Test;

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    public class MyMultiOutputReducerTest {

        MockOSReducer reducer;
        ReduceDriver<Text, Text, NullWritable, Text> reduceDriver;
        Configuration config;
        // Filled by the mock: output file name -> values written to that file.
        Map<String, List<Text>> outputCSVFiles;

        static String[] CSV = {
            "9/8/14,MSFT,47",
            "9/8/14,ORCL,40",
            "9/8/14,GOOG,577",
            "9/8/14,AAPL,100.4",
            "9/9/14,MSFT,46",
            "9/9/14,ORCL,41",
            "9/9/14,GOOG,578"
        };

        // Reducer whose MultipleOutputs records writes in a map instead of creating files.
        class MockOSReducer extends MyMultiOutputReducer {

            private Map<String, List<Text>> multipleOutputMap;

            public MockOSReducer(Map<String, List<Text>> map) {
                super();
                multipleOutputMap = map;
            }

            @Override
            public void setup(Context context) {
                mos = new MultipleOutputs<NullWritable, Text>(context) {
                    @Override
                    public void write(NullWritable key, Text value, String outputFileName)
                            throws java.io.IOException, java.lang.InterruptedException {
                        List<Text> outputs = multipleOutputMap.get(outputFileName);
                        if (outputs == null) {
                            outputs = new ArrayList<Text>();
                            multipleOutputMap.put(outputFileName, outputs);
                        }
                        // Copy the value so later writes cannot alter what was recorded.
                        outputs.add(new Text(value));
                    }
                };
                config = context.getConfiguration();
            }
        }

        @Before
        public void setup()
                throws Exception {
            config = new Configuration();
            outputCSVFiles = new HashMap<String, List<Text>>();
            reducer = new MockOSReducer(outputCSVFiles);
            reduceDriver = ReduceDriver.newReduceDriver(reducer);
            reduceDriver.setConfiguration(config);
        }

        @Test
        public void testReduceInput1Output()
                throws Exception {
            List<Text> list = new ArrayList<Text>();
            list.add(new Text("47"));
            list.add(new Text("46"));
            list.add(new Text("48"));
            reduceDriver.withInput(new Text("MSFT"), list);
            reduceDriver.runTest();

            // The reducer writes only through MultipleOutputs, so check the recorded map.
            Map<String, List<Text>> expectedCSVOutput = new HashMap<String, List<Text>>();
            List<Text> outputs = new ArrayList<Text>();
            outputs.add(new Text("47"));
            outputs.add(new Text("46"));
            outputs.add(new Text("48"));
            expectedCSVOutput.put("MSFT", outputs);
            validateOutputList(outputCSVFiles, expectedCSVOutput);
        }

        // Debugging helper: prints every recorded value with its output file name.
        static void print(Map<String, List<Text>> outputCSVFiles) {
            for (String key : outputCSVFiles.keySet()) {
                List<Text> valueList = outputCSVFiles.get(key);
                for (Text pair : valueList) {
                    System.out.println("OUTPUT " + key + " = " + pair.toString());
                }
            }
        }

        // Compares the recorded values with the expected ones, file by file and in order.
        protected void validateOutputList(Map<String, List<Text>> actuals,
                                          Map<String, List<Text>> expects) {
            List<String> removeList = new ArrayList<String>();
            for (String key : expects.keySet()) {
                removeList.add(key);
                List<Text> expectedValues = expects.get(key);
                List<Text> actualValues = actuals.get(key);
                int expectedSize = expectedValues.size();
                int actualSize = actualValues.size();
                int i = 0;
                assertEquals("Number of output CSV files is " + actualSize
                        + " but expected " + expectedSize, expectedSize, actualSize);
                while (expectedSize > i || actualSize > i) {
                    if (expectedSize > i && actualSize > i) {
                        Text expected = expectedValues.get(i);
                        Text actual = actualValues.get(i);
                        assertTrue("Expected CSV content is " + expected.toString()
                                + " but got " + actual.toString(), expected.equals(actual));
                    }
                    i++;
                }
            }
        }
    }
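The core idea of the mock is to swap file output for an in-memory map keyed by output name, which the test can then inspect. Stripped of the Hadoop and MRUnit types, the pattern looks roughly like this; `RecordingOutputs` is a hypothetical stand-in for the overridden `MultipleOutputs`, and it uses `String` where the real test uses `Text`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mocked MultipleOutputs; plain Java, no Hadoop types.
class RecordingOutputs {

    private final Map<String, List<String>> written = new HashMap<>();

    // Same shape as the overridden write(): remember the value under its output name.
    void write(String value, String outputFileName) {
        written.computeIfAbsent(outputFileName, k -> new ArrayList<>()).add(value);
    }

    // Values recorded for one output name, or an empty list if nothing was written.
    List<String> valuesFor(String outputFileName) {
        return written.getOrDefault(outputFileName, Collections.emptyList());
    }

    public static void main(String[] args) {
        RecordingOutputs outputs = new RecordingOutputs();
        for (String price : new String[] {"47", "46", "48"}) {
            outputs.write(price, "MSFT");
        }
        System.out.println(outputs.valuesFor("MSFT")); // [47, 46, 48]
    }
}
```

The `new Text(value)` copy in the real mock matters: Hadoop may reuse a single `Text` instance while iterating over reducer values, so storing the reference itself could leave the list holding the same object several times.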
The mapper unit test can look like this:
    package com.jbksoft.test;

    import com.jbksoft.MyMultiOutputMapper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.types.Pair;
    import org.junit.Before;
    import org.junit.Test;

    import java.util.ArrayList;
    import java.util.List;

    public class MyMultiOutputMapperTest {

        MyMultiOutputMapper mapper;
        MapDriver<LongWritable, Text, Text, Text> mapDriver;
        Configuration config;

        static String[] CSV = {
            "9/8/14,MSFT,47",
            "9/8/14,ORCL,40",
            "9/8/14,GOOG,577"
        };

        @Before
        public void setup()
                throws Exception {
            config = new Configuration();
            mapper = new MyMultiOutputMapper();
            mapDriver = MapDriver.newMapDriver(mapper);
            mapDriver.setConfiguration(config);
        }

        // One input line should produce one (symbol, price) pair.
        @Test
        public void testMapInput1Output()
                throws Exception {
            mapDriver.withInput(new LongWritable(), new Text(CSV[0]));
            mapDriver.withOutput(new Text("MSFT"), new Text("47"));
            mapDriver.runTest();
        }

        @Test
        public void testMapInput2Output()
                throws Exception {
            final List<Pair<LongWritable, Text>> inputs = new ArrayList<Pair<LongWritable, Text>>();
            inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[0])));
            inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[1])));

            final List<Pair<Text, Text>> outputs = new ArrayList<Pair<Text, Text>>();
            outputs.add(new Pair<Text, Text>(new Text("MSFT"), new Text("47")));
            outputs.add(new Pair<Text, Text>(new Text("ORCL"), new Text("40")));

            // Left disabled as in the original post; enable it if your MRUnit
            // version supports multiple map inputs.
            // mapDriver.withAll(inputs).withAllOutput(outputs).runTest();
        }
    }
Source: https://erashokagarwal.wordpress.com/2014/09/11/testing-multioutputformat-based-mapreduce/

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 

Testing MultiOutputFormat based MapReduce

  • 1. 12/10/2014 Testing MultiOutputFormat based MapReduce | Ashok Agarwal
    https://erashokagarwal.wordpress.com/2014/09/11/testing-multioutputformat-based-mapreduce/

    Testing MultiOutputFormat based MapReduce
    Posted by Ashok Agarwal in Big Data, Thursday 11 Sep 2014
    Tags: Big Data, Hadoop, MapReduce

    In one of our projects, we were required to generate one file per client as the output of a MapReduce job, so that each client could see and analyze their own data.

    Suppose you receive a file of stock prices every day.

    For 9/8/2014: 9_8_2014.csv

        9/8/14,MSFT,47
        9/8/14,ORCL,40
        9/8/14,GOOG,577
        9/8/14,AAPL,100.4

    For 9/9/2014: 9_9_2014.csv

        9/9/14,MSFT,46
        9/9/14,ORCL,41
        9/9/14,GOOG,578
        9/9/14,AAPL,101

    And so on:

        9/10/14,MSFT,48
        9/10/14,ORCL,39.5
        9/10/14,GOOG,577
        9/10/14,AAPL,100
        9/11/14,MSFT,47.5
        9/11/14,ORCL,41
        9/11/14,GOOG,588
        9/11/14,AAPL,99.8
        9/12/14,MSFT,46.69
        9/12/14,ORCL,40.5
  • 2.
        9/12/14,GOOG,576
        9/12/14,AAPL,102.5

    We want to analyze each stock's weekly trend. To do that, we need the data grouped by stock.

    The mapper below splits each CSV record read through TextInputFormat. It emits the stock symbol as the key and the price as the value.

        package com.jbksoft;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        import java.io.IOException;

        public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Each record looks like: date,stock,price
                String line = value.toString();
                String[] tokens = line.split(",");
                // Emit (stock, price); the date is not used downstream.
                context.write(new Text(tokens[1]), new Text(tokens[2]));
            }
        }

    The reducer below writes one file per stock.

        package com.jbksoft;

        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

        import java.io.IOException;

        public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {
            // Protected so that a test subclass in another package can replace it with a mock.
            protected MultipleOutputs<NullWritable, Text> mos;

            @Override
            public void setup(Context context) {
                mos = new MultipleOutputs<NullWritable, Text>(context);
            }

            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                // The third argument is the base output path, so each stock gets its own file.
                for (Text value : values) {
                    mos.write(NullWritable.get(), value, key.toString());
                }
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                mos.close();
            }
        }

    The driver for the code:
  • 3.
        package com.jbksoft;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
        import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

        import java.io.IOException;

        public class MyMultiOutputTest {
            public static void main(String[] args)
                    throws IOException, InterruptedException, ClassNotFoundException {
                Path inputDir = new Path(args[0]);
                Path outputDir = new Path(args[1]);

                Configuration conf = new Configuration();

                Job job = new Job(conf);
                job.setJarByClass(MyMultiOutputTest.class);
                job.setJobName("My MultipleOutputs Demo");

                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(Text.class);

                job.setMapperClass(MyMultiOutputMapper.class);
                job.setReducerClass(MyMultiOutputReducer.class);

                FileInputFormat.setInputPaths(job, inputDir);
                FileOutputFormat.setOutputPath(job, outputDir);

                // LazyOutputFormat creates an output file only when a record is written to it,
                // which avoids empty default part files next to the per-stock files.
                LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

                job.waitForCompletion(true);
            }
        }

    The command to run the code above (compiled and packaged as a jar), followed by a listing of the output directory:

        aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest
        aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
        total 32
        -rwxr-xr-x 1 ashok.agarwal 1816361533 25 Sep 11 11:32 AAPL-r-00000
        -rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 GOOG-r-00000
        -rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 MSFT-r-00000
        -rwxr-xr-x 1 ashok.agarwal 1816361533 19 Sep 11 11:32 ORCL-r-00000
        -rwxr-xr-x 1 ashok.agarwal 1816361533 0 Sep 11 11:32 _SUCCESS
        aagarwal-mbpro:~ ashok.agarwal$

    The test case for the code above can be written with MRUnit. The reducer needs to be mocked: a subclass replaces MultipleOutputs with a version that records every write in a map keyed by output file name, as below:
  • 4.
        package com.jbksoft.test;

        import com.jbksoft.MyMultiOutputReducer;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
        import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
        import org.apache.hadoop.mrunit.types.Pair;
        import org.junit.Before;
        import org.junit.Test;

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        import static org.junit.Assert.assertEquals;
        import static org.junit.Assert.assertTrue;

        public class MyMultiOutputReducerTest {

            MockOSReducer reducer;
            ReduceDriver<Text, Text, NullWritable, Text> reduceDriver;
            Configuration config;
            // Captured output: file name -> values written to that file.
            Map<String, List<Text>> outputCSVFiles;
            static String[] CSV = {
                    "9/8/14,MSFT,47",
                    "9/8/14,ORCL,40",
                    "9/8/14,GOOG,577",
                    "9/8/14,AAPL,100.4",
                    "9/9/14,MSFT,46",
                    "9/9/14,ORCL,41",
                    "9/9/14,GOOG,578"
            };

            // Reducer whose MultipleOutputs stores writes in a map instead of creating files.
            class MockOSReducer extends MyMultiOutputReducer {

                private Map<String, List<Text>> multipleOutputMap;

                public MockOSReducer(Map<String, List<Text>> map) {
                    super();
                    multipleOutputMap = map;
                }

                @Override
                public void setup(Reducer.Context context) {
                    mos = new MultipleOutputs<NullWritable, Text>(context) {
                        @Override
                        public void write(NullWritable key, Text value, String outputFileName)
                                throws java.io.IOException, java.lang.InterruptedException {
                            List<Text> outputs = multipleOutputMap.get(outputFileName);
                            if (outputs == null) {
                                outputs = new ArrayList<Text>();
  • 5.
                                multipleOutputMap.put(outputFileName, outputs);
                            }
                            // Copy the value, because Hadoop may reuse the same Text instance.
                            outputs.add(new Text(value));
                        }
                    };
                    config = context.getConfiguration();
                }
            }

            @Before
            public void setup() throws Exception {
                config = new Configuration();
                outputCSVFiles = new HashMap<String, List<Text>>();
                reducer = new MockOSReducer(outputCSVFiles);
                reduceDriver = ReduceDriver.newReduceDriver(reducer);
                reduceDriver.setConfiguration(config);
            }

            @Test
            public void testReduceInput1Output() throws Exception {
                List<Text> list = new ArrayList<Text>();
                list.add(new Text("47"));
                list.add(new Text("46"));
                list.add(new Text("48"));
                reduceDriver.withInput(new Text("MSFT"), list);
                reduceDriver.runTest();

                Map<String, List<Text>> expectedCSVOutput = new HashMap<String, List<Text>>();

                List<Text> outputs = new ArrayList<Text>();
                outputs.add(new Text("47"));
                outputs.add(new Text("46"));
                outputs.add(new Text("48"));

                expectedCSVOutput.put("MSFT", outputs);

                validateOutputList(outputCSVFiles, expectedCSVOutput);
            }

            // Debugging helper: prints every captured value with its file name.
            static void print(Map<String, List<Text>> outputCSVFiles) {
                for (String key : outputCSVFiles.keySet()) {
                    List<Text> valueList = outputCSVFiles.get(key);
                    for (Text pair : valueList) {
                        System.out.println("OUTPUT " + key + " = " + pair.toString());
                    }
                }
            }
  • 6.
            // Compares the captured values with the expected values, file by file.
            protected void validateOutputList(Map<String, List<Text>> actuals,
                                              Map<String, List<Text>> expects) {
                List<String> removeList = new ArrayList<String>();

                for (String key : expects.keySet()) {
                    removeList.add(key);
                    List<Text> expectedValues = expects.get(key);
                    List<Text> actualValues = actuals.get(key);

                    int expectedSize = expectedValues.size();
                    int actualSize = actualValues.size();
                    int i = 0;

                    assertEquals("Number of output CSV files is " + actualSize,
                            expectedSize, actualSize);

                    while (expectedSize > i || actualSize > i) {
                        if (expectedSize > i && actualSize > i) {
                            Text expected = expectedValues.get(i);
                            Text actual = actualValues.get(i);

                            assertTrue("Expected CSV content is " + expected.toString(),
                                    expected.equals(actual));
                        }
                        i++;
                    }
                }
            }
        }

    The mapper unit test can be written as below:

        package com.jbksoft.test;

        import com.jbksoft.MyMultiOutputMapper;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mrunit.mapreduce.MapDriver;
        import org.apache.hadoop.mrunit.types.Pair;
        import org.junit.Before;
        import org.junit.Test;

        import java.util.ArrayList;
        import java.util.List;

        public class MyMultiOutputMapperTest {
            MyMultiOutputMapper mapper;
            MapDriver<LongWritable, Text, Text, Text> mapDriver;
            Configuration config;
            static String[] CSV = {
                    "9/8/14,MSFT,47",
                    "9/8/14,ORCL,40",
                    "9/8/14,GOOG,577"
  • 7.
            };

            @Before
            public void setup() throws Exception {
                config = new Configuration();
                mapper = new MyMultiOutputMapper();
                mapDriver = MapDriver.newMapDriver(mapper);
                mapDriver.setConfiguration(config);
            }

            @Test
            public void testMapInput1Output() throws Exception {
                // One input record should produce one (stock, price) pair.
                mapDriver.withInput(new LongWritable(), new Text(CSV[0]));
                mapDriver.withOutput(new Text("MSFT"), new Text("47"));
                mapDriver.runTest();
            }

            @Test
            public void testMapInput2Output() throws Exception {
                final List<Pair<LongWritable, Text>> inputs = new ArrayList<Pair<LongWritable, Text>>();
                inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[0])));
                inputs.add(new Pair<LongWritable, Text>(new LongWritable(), new Text(CSV[1])));

                final List<Pair<Text, Text>> outputs = new ArrayList<Pair<Text, Text>>();
                outputs.add(new Pair<Text, Text>(new Text("MSFT"), new Text("47")));
                outputs.add(new Pair<Text, Text>(new Text("ORCL"), new Text("40")));

                mapDriver.withAll(inputs).withAllOutput(outputs).runTest();
            }
        }
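The idea behind the mocked reducer can be tried without Hadoop or MRUnit on the classpath. The following is only a minimal sketch in plain Java, not the author's code: `NamedSink`, `RecordingSink`, `PerStockRouting` and `partFileName` are hypothetical names introduced for illustration. `NamedSink` stands in for the single `MultipleOutputs.write(key, value, baseOutputPath)` call the reducer uses, and `partFileName` assumes the `<base>-r-<five-digit partition>` pattern seen in the directory listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerStockRouting {

    // Hypothetical stand-in for MultipleOutputs: a value plus the base name of its file.
    interface NamedSink {
        void write(String value, String baseOutputPath);
    }

    // Test double that keeps every write in memory, keyed by base name.
    static class RecordingSink implements NamedSink {
        final Map<String, List<String>> written = new LinkedHashMap<>();

        @Override
        public void write(String value, String baseOutputPath) {
            written.computeIfAbsent(baseOutputPath, k -> new ArrayList<>()).add(value);
        }
    }

    // Same routing rule as the reducer: each value goes to the file named after its key.
    static void reduce(String stock, List<String> prices, NamedSink sink) {
        for (String price : prices) {
            sink.write(price, stock);
        }
    }

    // Assumed naming pattern for reducer output files, e.g. AAPL-r-00000.
    static String partFileName(String base, int partition) {
        return String.format("%s-r-%05d", base, partition);
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        reduce("MSFT", Arrays.asList("47", "46", "48"), sink);
        reduce("ORCL", Arrays.asList("40", "41"), sink);

        // Print what would have ended up in each per-stock file.
        for (Map.Entry<String, List<String>> e : sink.written.entrySet()) {
            System.out.println(partFileName(e.getKey(), 0) + " <- " + e.getValue());
        }
    }
}
```

Because the routing logic only sees the sink interface, a test can assert on `sink.written` directly, which is the same trick the MRUnit test plays by overriding `write` on an anonymous `MultipleOutputs` subclass.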