SlideShare une entreprise Scribd logo
1  sur  52
Testing Big Data:
Unit Test in Hadoop (Part II)
Jongwook Woo (PhD)
High-Performance Internet Computing Center (HiPIC)
Educational Partner with Cloudera and Grants Awardee of Amazon AWS
Computer Information Systems Department
California State University, Los Angeles

Seoul Software Testing Conference
Contents

• Test in General
• Use Cases: Big Data in Hadoop and Ecosystems
• Unit Test in Hadoop

Seoul Software Testing Conference
Test in general

• Quality Assurance
– TDD (Test Driven Development)
• Unit Test
– Test functional units of the S/W

– BDD (Behavior Driven Development)
• Based on TDD
• Test behavior of the S/W

– Integration Test:
• integrated components
– Group of unit tests

• CI (Continuous Integration) Server
– Hudson, Jenkins etc
Seoul Software Testing Conference
CI Server

• Continuous Integration Server
– TDD (Test Driven Development) based
• All developers commit the update everyday
• CI server compile and run the unit tests
• If a test fails, all receive the failure email
– Know who committed a bad code

– Hudson, Jenkins etc
• Supports SCM version control tools
– CVS, Subversion, Git

Seoul Software Testing Conference
Test in Hadoop

• Much harder
– JUnit cannot be used in Hadoop
– Cluster
– Server
– Parallel Computing

Seoul Software Testing Conference
Use Cases: Shopzilla

• Hadoop’s Elephant In The Room
– Hadoop testing
• Quality Assurance
– Unit Test: functional units of the S/W
– Integration Test: integrated components
– BDD Test: Behavior of the S/W

• Augmented Development
– Use a dev cluster?
» Too long per day
– Hadoop-In-A-Box

Seoul Software Testing Conference
Use Cases: Shopzilla

• Hadoop-In-A-Box
– Fully compatible Mock Environment
• Without a cluster
• Mock cluster state

– Test Locally
• Single Node Pseudo Cluster
• MiniMRCluster
• => can test HDFS, Pig

Seoul Software Testing Conference
Use Cases: Yahoo

• Developer
– Wants to run Hadoop codes in the local machine
• Does not want to run Hadoop codes at the Hadoop cluster

• Yahoo HIT
– Hadoop Integration Test
– Run Hadoop tests in the Hadoop Ecosystems
• Deploy HIT on a Hadoop single or cluster
• Run tests in Hadoop, Pig, Hive, Oozie,…

Seoul Software Testing Conference
Unit Test in Hadoop

• MRUnit testing framework
– is based on JUnit
– Cloudera donated to Apache
– can test Map Reduce programs
• written on 0.20 , 0.23.x , 1.0.x , 2.x version of Hadoop

– Can test Mapper, Reducer, MapperReducer

Seoul Software Testing Conference
Unit Test in Hadoop

• WordCount Example
– reads text files and counts how often words occur.
• The input and the output are text files,

– Need three classes
• WordCount.java
– Driver class with main function

• WordMapper.java
– Mapper class with map method

• SumReducer.java
– Reducer class with reduce method

Seoul Software Testing Conference
WordCount Example

• WordMapper.java
– Mapper class with map function
– For the given sample input
• assuming two map nodes
– The sample input is distributed to the maps

• the first map emits:
– <Hello, 1> <World, 1> <Bye, 1> <World, 1>

• The second map emits:
– <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>

Seoul Software Testing Conference
WordCount Example

• SumReducer.java
– Reducer class with reduce function
– For the input from two Mappers
• the reduce method just sums up the values,
– which are the occurrence counts for each key

• Thus the output of the job is:
– <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>

Seoul Software Testing Conference
WordCount.java (Driver)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}

Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}

Check Input and Output files

Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

Set output (key, value) types

job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

Set Mapper/Reducer classes

job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);

Set Input/Output format classes

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

Set Input/Output paths

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
Set
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Driver class

job.setJarByClass(WordCount.class);
job.submit();
}
}

Seoul Software Testing Conference
WordCount.java
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration());
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Submit the job
job.setJarByClass(WordCount.class);

to the master node

job.submit();
}
}

Seoul Software Testing Conference
WordMapper.java (Mapper class)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
Seoul Software Testing Conference
WordMapper.java

Extends mapper class with input/
output keys and values

public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
Seoul Software Testing Conference
WordMapper.java
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
Output (key, value) types
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
Seoul Software Testing Conference
WordMapper.java
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
Input (key, value) types
Output as Context type

@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}

Seoul Software Testing Conference
WordMapper.java
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Read words from each line
Context contex) throws IOException, InterruptedException { input file
of the
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
Seoul Software Testing Conference
WordMapper.java
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
Count each word

while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
Seoul Software Testing Conference
Shuffler/Sorter

• Maps emit (key, value) pairs
• Shuffler/Sorter of Hadoop framework
– Sort (key, value) pairs by key
– Then, append the value to make (key, list of values) p
air
– For example,
• The first, second maps emit:
– <Hello, 1> <World, 1> <Bye, 1> <World, 1>
– <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>

• Shuffler produces and it becomes the input of the reducer
– <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1, 1>>
, <World, <1,1>>

Seoul Software Testing Conference
SumReducer.java (Reducer class)
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}

Seoul Software Testing Conference
SumReducer.java

Extends Reducer class with input/
output keys and values

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Seoul Software Testing Conference
SumReducer.java
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
Set output value type

private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Seoul Software Testing Conference
SumReducer.java
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
Set input (key, list of values) type
and output as Context class

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}

Seoul Software Testing Conference
SumReducer.java
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
For each word,
Iterator<IntWritable> it=values.iterator(); Count/sum the number of values
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Seoul Software Testing Conference
SumReducer.java
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
For each word,
Total count becomes the value
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Seoul Software Testing Conference
SumReducer

• Reducer
– Input: Shuffler produces and it becomes the input of
the reducer
• <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1, 1>>
, <World, <1,1>>

– Output
• <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World,
2>

Seoul Software Testing Conference
MRUnit Test

• How to UnitTest in Hadoop
– Extending JUnit test
• With org.apache.hadoop.mrunit.* API

– Needs to test Driver, Mapper, Reducer
• MapReduceDriver, MapDriver, ReduceDriver
• Add input with expected output

Seoul Software Testing Conference
MRUnit Test
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapDriver;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.apache.hadoop.mrunit.ReduceDriver;
import org.junit.Before;
import org.junit.Test;
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();
mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
@Test
public void testMapper() {
mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("dog"), new IntWritable(1));
mapDriver.runTest();
}
@Test
public void testReducer() {
List<IntWritable> values = new ArrayList<IntWritable>();
values.add(new IntWritable(1));
values.add(new IntWritable(1));
reduceDriver.withInput(new Text("cat"), values);
reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
reduceDriver.runTest();
}
@Test
public void testMapReduce() {
mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
mapReduceDriver.runTest();
}
}

Seoul Software Testing Conference
MRUnit Test

Using MRUnit API, declare MapReduce,
Mapper, Reducer drivers
with input/output (key, value)

public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();
mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
Run setUp() before executing each test method
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();
mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
Instantiate WordCount Mapper, Reducer
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();
mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();

Instantiate and set Mapper
driver
with input/output (key, value)

mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();

Instantiate and set Reducer
driver
with input/output (key, value)

mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
public class TestWordCount {
MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr
iver;
MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
@Before
public void setUp() {
WordMapper mapper = new WordMapper();
SumReducer reducer = new SumReducer();

Instantiate and set
MapperReducer driver
with input/output (key, value)

mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
mapDriver.setMapper(mapper);
reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>();
reduceDriver.setReducer(reducer);
mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I
ntWritable>();
mapReduceDriver.setMapper(mapper);
mapReduceDriver.setReducer(reducer);
}

Seoul Software Testing Conference
MRUnit Test
Mapper test:
Define sample input with expected output

@Test
public void testMapper() {
mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("dog"), new IntWritable(1));
mapDriver.runTest();
}
@Test
public void testReducer() {
List<IntWritable> values = new ArrayList<IntWritable>();
values.add(new IntWritable(1));
values.add(new IntWritable(1));
reduceDriver.withInput(new Text("cat"), values);
reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
reduceDriver.runTest();
}

Seoul Software Testing Conference
MRUnit Test
@Test
public void testMapper() {
mapDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("cat"), new IntWritable(1));
mapDriver.withOutput(new Text("dog"), new IntWritable(1));
mapDriver.runTest();
}
Reducer test:
Define sample input with expected output

@Test
public void testReducer() {
List<IntWritable> values = new ArrayList<IntWritable>();
values.add(new IntWritable(1));
values.add(new IntWritable(1));
reduceDriver.withInput(new Text("cat"), values);
reduceDriver.withOutput(new Text("cat"), new IntWritable(2));
reduceDriver.runTest();
}

Seoul Software Testing Conference
MRUnit Test

MapperReducer test:
Define sample input with expected output

@Test
public void testMapReduce() {
mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog"));
mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2));
mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1));
mapReduceDriver.runTest();
}
}

Seoul Software Testing Conference
MRUnit Test in real

• Need to implement unit tests
– How many?
• all Map, Reduce, Driver

– Problems?
• Mostly work
– But it does not support complicated Map, Reduce APIs

– How many problems you can detect
• Depends on how well you implement MRUnit code

Seoul Software Testing Conference
Conclusion

•
•
•
•

MRUnit for Hadoop Unit Test
Development
Integrate with QA site with CI server
Need to use it

Seoul Software Testing Conference
Question?

Seoul Software Testing Conference
References
1. Hadoop WordCount example with new map reduce api (http://codesfusion.blogspot.com/2013/1
0/hadoop-wordcount-with-new-map-reduce-api.html)
2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount )
3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/clouder
a-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html )
4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count)
5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial )
6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-sui
te )
7. Hadoop’s Elepahnt in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hado
ops-elephant-in-the-room/ )
8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/
org/apache/hadoop/mapreduce/TestMapReduceLocal.java )
9. Yahoo HIT Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1 )

Seoul Software Testing Conference
Seoul Software Testing Conference
Seoul Software Testing Conference

Contenu connexe

Tendances

Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessRTTS
 
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...RTTS
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargBig Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargQA or the Highway
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryRTTS
 
Test Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTest Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTobias Trelle
 
Managing Millions of Tests Using Databricks
Managing Millions of Tests Using DatabricksManaging Millions of Tests Using Databricks
Managing Millions of Tests Using DatabricksDatabricks
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Optimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansOptimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansDatabricks
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Databricks
 
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...DataStax
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestDatabricks
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareDatabricks
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...Databricks
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 

Tendances (20)

Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya GargBig Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
 
Test Automation for NoSQL Databases
Test Automation for NoSQL DatabasesTest Automation for NoSQL Databases
Test Automation for NoSQL Databases
 
Managing Millions of Tests Using Databricks
Managing Millions of Tests Using DatabricksManaging Millions of Tests Using Databricks
Managing Millions of Tests Using Databricks
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
Optimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex PlansOptimizing the Catalyst Optimizer for Complex Plans
Optimizing the Catalyst Optimizer for Complex Plans
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
 
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
Building Highly Available Apps on Cassandra (Robbie Strickland, Weather Compa...
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why Care
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 

En vedette

Big Data, Big Trouble: Getting into the Flow of Hadoop Testing
Big Data, Big Trouble: Getting into the Flow of Hadoop TestingBig Data, Big Trouble: Getting into the Flow of Hadoop Testing
Big Data, Big Trouble: Getting into the Flow of Hadoop TestingTechWell
 
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?Dmitri Shiryaev
 
Big data testing (1)
Big data testing (1)Big data testing (1)
Big data testing (1)vodqancr
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

En vedette (6)

Big Data, Big Trouble: Getting into the Flow of Hadoop Testing
Big Data, Big Trouble: Getting into the Flow of Hadoop TestingBig Data, Big Trouble: Getting into the Flow of Hadoop Testing
Big Data, Big Trouble: Getting into the Flow of Hadoop Testing
 
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?Hadoop:  Big Data Stacks validation w/ iTest  How to tame the elephant?
Hadoop: Big Data Stacks validation w/ iTest How to tame the elephant?
 
Big data testing (1)
Big data testing (1)Big data testing (1)
Big data testing (1)
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
How to be an awesome test automation professional
How to be an awesome test automation professionalHow to be an awesome test automation professional
How to be an awesome test automation professional
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similaire à 2014 International Software Testing Conference in Seoul

Test in action – week 1
Test in action – week 1Test in action – week 1
Test in action – week 1Yi-Huan Chan
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applicationsKnoldus Inc.
 
May: Automated Developer Testing: Achievements and Challenges
May: Automated Developer Testing: Achievements and ChallengesMay: Automated Developer Testing: Achievements and Challenges
May: Automated Developer Testing: Achievements and ChallengesTriTAUG
 
Testing in Craft CMS
Testing in Craft CMSTesting in Craft CMS
Testing in Craft CMSJustinHolt20
 
Structured Functional Automated Web Service Testing
Structured Functional Automated Web Service TestingStructured Functional Automated Web Service Testing
Structured Functional Automated Web Service Testingrdekleijn
 
Useful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmUseful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmAnton Shapin
 
Stopping the Rot - Putting Legacy C++ Under Test
Stopping the Rot - Putting Legacy C++ Under TestStopping the Rot - Putting Legacy C++ Under Test
Stopping the Rot - Putting Legacy C++ Under TestSeb Rose
 
Automated Developer Testing: Achievements and Challenges
Automated Developer Testing: Achievements and ChallengesAutomated Developer Testing: Achievements and Challenges
Automated Developer Testing: Achievements and ChallengesTao Xie
 
Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Joachim Baumann
 
Joomla! Testing - J!DD Germany 2016
Joomla! Testing - J!DD Germany 2016Joomla! Testing - J!DD Germany 2016
Joomla! Testing - J!DD Germany 2016Yves Hoppe
 
Testing basics for developers
Testing basics for developersTesting basics for developers
Testing basics for developersAnton Udovychenko
 
Performance testing jmeter
Performance testing jmeterPerformance testing jmeter
Performance testing jmeterBhojan Rajan
 
Netserv Software Testing
Netserv Software TestingNetserv Software Testing
Netserv Software Testingsthicks14
 
Renaissance of JUnit - Introduction to JUnit 5
Renaissance of JUnit - Introduction to JUnit 5Renaissance of JUnit - Introduction to JUnit 5
Renaissance of JUnit - Introduction to JUnit 5Jimmy Lu
 
2014 11 20 Drupal 7 -> 8 test migratie
2014 11 20 Drupal 7 -> 8 test migratie2014 11 20 Drupal 7 -> 8 test migratie
2014 11 20 Drupal 7 -> 8 test migratiehcderaad
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessYahoo Developer Network
 

Similaire à 2014 International Software Testing Conference in Seoul (20)

Test in action – week 1
Test in action – week 1Test in action – week 1
Test in action – week 1
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
May: Automated Developer Testing: Achievements and Challenges
May: Automated Developer Testing: Achievements and ChallengesMay: Automated Developer Testing: Achievements and Challenges
May: Automated Developer Testing: Achievements and Challenges
 
Testing in Craft CMS
Testing in Craft CMSTesting in Craft CMS
Testing in Craft CMS
 
Structured Functional Automated Web Service Testing
Structured Functional Automated Web Service TestingStructured Functional Automated Web Service Testing
Structured Functional Automated Web Service Testing
 
Useful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvmUseful practices of creation automatic tests by using cucumber jvm
Useful practices of creation automatic tests by using cucumber jvm
 
Stopping the Rot - Putting Legacy C++ Under Test
Stopping the Rot - Putting Legacy C++ Under TestStopping the Rot - Putting Legacy C++ Under Test
Stopping the Rot - Putting Legacy C++ Under Test
 
Automated Developer Testing: Achievements and Challenges
Automated Developer Testing: Achievements and ChallengesAutomated Developer Testing: Achievements and Challenges
Automated Developer Testing: Achievements and Challenges
 
Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)Gradle For Beginners (Serbian Developer Conference 2013 english)
Gradle For Beginners (Serbian Developer Conference 2013 english)
 
Joomla! Testing - J!DD Germany 2016
Joomla! Testing - J!DD Germany 2016Joomla! Testing - J!DD Germany 2016
Joomla! Testing - J!DD Germany 2016
 
Presentation
PresentationPresentation
Presentation
 
Testing
TestingTesting
Testing
 
Testing basics for developers
Testing basics for developersTesting basics for developers
Testing basics for developers
 
Performance testing jmeter
Performance testing jmeterPerformance testing jmeter
Performance testing jmeter
 
End_to_End_DevOps.pptx
End_to_End_DevOps.pptxEnd_to_End_DevOps.pptx
End_to_End_DevOps.pptx
 
Netserv Software Testing
Netserv Software TestingNetserv Software Testing
Netserv Software Testing
 
Renaissance of JUnit - Introduction to JUnit 5
Renaissance of JUnit - Introduction to JUnit 5Renaissance of JUnit - Introduction to JUnit 5
Renaissance of JUnit - Introduction to JUnit 5
 
2014 11 20 Drupal 7 -> 8 test migratie
2014 11 20 Drupal 7 -> 8 test migratie2014 11 20 Drupal 7 -> 8 test migratie
2014 11 20 Drupal 7 -> 8 test migratie
 
July 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification ProcessJuly 2012 HUG: Overview of Oozie Qualification Process
July 2012 HUG: Overview of Oozie Qualification Process
 
Unit tests and TDD
Unit tests and TDDUnit tests and TDD
Unit tests and TDD
 

Plus de Jongwook Woo

Machine Learning in Quantum Computing
Machine Learning in Quantum ComputingMachine Learning in Quantum Computing
Machine Learning in Quantum ComputingJongwook Woo
 
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost PlatformsComparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost PlatformsJongwook Woo
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIJongwook Woo
 
Introduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionIntroduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionJongwook Woo
 
Introduction to Big Data and its Trends
Introduction to Big Data and its TrendsIntroduction to Big Data and its Trends
Introduction to Big Data and its TrendsJongwook Woo
 
Rating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkRating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkJongwook Woo
 
History and Trend of Big Data and Deep Learning
History and Trend of Big Data and Deep LearningHistory and Trend of Big Data and Deep Learning
History and Trend of Big Data and Deep LearningJongwook Woo
 
The Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraThe Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraJongwook Woo
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataJongwook Woo
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive AnalysisJongwook Woo
 
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLPredictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLJongwook Woo
 
Introduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryIntroduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryJongwook Woo
 
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon SungjaeWhose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon SungjaeJongwook Woo
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017Jongwook Woo
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open PlatformJongwook Woo
 
Big Data Trend and Open Data
Big Data Trend and Open DataBig Data Trend and Open Data
Big Data Trend and Open DataJongwook Woo
 
Big Data Platform adopting Spark and Use Cases with Open Data
Big Data  Platform adopting Spark and Use Cases with Open DataBig Data  Platform adopting Spark and Use Cases with Open Data
Big Data Platform adopting Spark and Use Cases with Open DataJongwook Woo
 
Big Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLBig Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLJongwook Woo
 
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and SparkAlphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and SparkJongwook Woo
 

Plus de Jongwook Woo (20)

Machine Learning in Quantum Computing
Machine Learning in Quantum ComputingMachine Learning in Quantum Computing
Machine Learning in Quantum Computing
 
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost PlatformsComparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AI
 
Introduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionIntroduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and Prediction
 
Introduction to Big Data and its Trends
Introduction to Big Data and its TrendsIntroduction to Big Data and its Trends
Introduction to Big Data and its Trends
 
Rating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and SparkRating Prediction using Deep Learning and Spark
Rating Prediction using Deep Learning and Spark
 
History and Trend of Big Data and Deep Learning
History and Trend of Big Data and Deep LearningHistory and Trend of Big Data and Deep Learning
History and Trend of Big Data and Deep Learning
 
The Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraThe Importance of Open Innovation in AI era
The Importance of Open Innovation in AI era
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big Data
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive Analysis
 
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark MLPredictive Analysis of Financial Fraud Detection using Azure and Spark ML
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML
 
Introduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryIntroduction to Big Data: Smart Factory
Introduction to Big Data: Smart Factory
 
AI on Big Data
AI on Big DataAI on Big Data
AI on Big Data
 
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon SungjaeWhose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
Whose tombs are so called Nakrang tombs in Pyungyang? By Moon Sungjae
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
Big Data Trend and Open Data
Big Data Trend and Open DataBig Data Trend and Open Data
Big Data Trend and Open Data
 
Big Data Platform adopting Spark and Use Cases with Open Data
Big Data  Platform adopting Spark and Use Cases with Open DataBig Data  Platform adopting Spark and Use Cases with Open Data
Big Data Platform adopting Spark and Use Cases with Open Data
 
Big Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure MLBig Data Analysis in Hydrogen Station using Spark and Azure ML
Big Data Analysis in Hydrogen Station using Spark and Azure ML
 
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and SparkAlphago vs Lee Se-Dol: Tweeter Analysis using Hadoop and Spark
Alphago vs Lee Se-Dol : Tweeter Analysis using Hadoop and Spark
 

Dernier

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Dernier (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

2014 International Software Testing Conference in Seoul

  • 1. Testing Big Data: Unit Test in Hadoop (Part II) Jongwook Woo (PhD) High-Performance Internet Computing Center (HiPIC) Educational Partner with Cloudera and Grants Awardee of Amazon AWS Computer Information Systems Department California State University, Los Angeles Seoul Software Testing Conference
  • 2. Contents • Test in General • Use Cases: Big Data in Hadoop and Ecosystems • Unit Test in Hadoop Seoul Software Testing Conference
  • 3. Test in general • Quality Assurance – TDD (Test Driven Development) • Unit Test – Test functional units of the S/W – BDD (Behavior Driven Development) • Based on TDD • Test behavior of the S/W – Integration Test: • integrated components – Group of unit tests • CI (Continuous Integration) Server – Hudson, Jenkins etc Seoul Software Testing Conference
  • 4. CI Server • Continuous Integration Server – TDD (Test Driven Development) based • All developers commit the update everyday • CI server compile and run the unit tests • If a test fails, all receive the failure email – Know who committed a bad code – Hudson, Jenkins etc • Supports SCM version control tools – CVS, Subversion, Git Seoul Software Testing Conference
  • 5. Test in Hadoop • Much harder – JUnit cannot be used in Hadoop – Cluster – Server – Parallel Computing Seoul Software Testing Conference
  • 6. Use Cases: Shopzilla • Hadoop’s Elephant In The Room – Hadoop testing • Quality Assurance – Unit Test: functional units of the S/W – Integration Test: integrated components – BDD Test: Behavior of the S/W • Augmented Development – Use a dev cluster? » Too long per day – Hadoop-In-A-Box Seoul Software Testing Conference
  • 7. Use Cases: Shopzilla • Hadoop-In-A-Box – Fully compatible Mock Environment • Without a cluster • Mock cluster state – Test Locally • Single Node Pseudo Cluster • MiniMRCluster • => can test HDFS, Pig Seoul Software Testing Conference
  • 8. Use Cases: Yahoo • Developer – Wants to run Hadoop codes in the local machine • Does not want to run Hadoop codes at the Hadoop cluster • Yahoo HIT – Hadoop Integration Test – Run Hadoop tests in the Hadoop Ecosystems • Deploy HIT on a Hadoop single or cluster • Run tests in Hadoop, Pig, Hive, Oozie,… Seoul Software Testing Conference
  • 9. Unit Test in Hadoop • MRUnit testing framework – is based on JUnit – Cloudera donated to Apache – can test Map Reduce programs • written on 0.20 , 0.23.x , 1.0.x , 2.x version of Hadoop – Can test Mapper, Reducer, MapperReducer Seoul Software Testing Conference
  • 10. Unit Test in Hadoop • WordCount Example – reads text files and counts how often words occur. • The input and the output are text files, – Need three classes • WordCount.java – Driver class with main function • WordMapper.java – Mapper class with map method • SumReducer.java – Reducer class with reduce method Seoul Software Testing Conference
  • 11. WordCount Example • WordMapper.java – Mapper class with map function – For the given sample input • assuming two map nodes – The sample input is distributed to the maps • the first map emits: – <Hello, 1> <World, 1> <Bye, 1> <World, 1> • The second map emits: – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1> Seoul Software Testing Conference
  • 12. WordCount Example • SumReducer.java – Reducer class with reduce function – For the input from two Mappers • the reduce method just sums up the values, – which are the occurrence counts for each key • Thus the output of the job is: – <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2> Seoul Software Testing Conference
  • 13. WordCount.java (Driver) import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 14. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Check Input and Output files Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 15. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); Set output (key, value) types job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 16. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); Set Mapper/Reducer classes job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 17. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); Set Input/Output format classes job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 18. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); Set Input/Output paths FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 19. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); Set FileOutputFormat.setOutputPath(job, new Path(args[1])); Driver class job.setJarByClass(WordCount.class); job.submit(); } } Seoul Software Testing Conference
  • 20. WordCount.java public class WordCount { public static void main(String[] args) throws Exception { if (args.length != 2) { System.out.println("usage: [input] [output]"); System.exit(-1); } Job job = Job.getInstance(new Configuration()); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); Submit the job job.setJarByClass(WordCount.class); to the master node job.submit(); } } Seoul Software Testing Conference
  • 21. WordMapper.java (Mapper class) import java.io.IOException; import java.util.StringTokenizer; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private Text word = new Text(); private final static IntWritable one = new IntWritable(1); @Override public void map(Object key, Text value, Context contex) throws IOException, InterruptedException { // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 22. WordMapper.java Extends mapper class with input/ output keys and values public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private Text word = new Text(); private final static IntWritable one = new IntWritable(1); @Override public void map(Object key, Text value, Context contex) throws IOException, InterruptedException { // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 23. WordMapper.java public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { Output (key, value) types private Text word = new Text(); private final static IntWritable one = new IntWritable(1); @Override public void map(Object key, Text value, Context contex) throws IOException, InterruptedException { // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 24. WordMapper.java public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private Text word = new Text(); private final static IntWritable one = new IntWritable(1); Input (key, value) types Output as Context type @Override public void map(Object key, Text value, Context contex) throws IOException, InterruptedException { // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 25. WordMapper.java public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private Text word = new Text(); private final static IntWritable one = new IntWritable(1); @Override public void map(Object key, Text value, Read words from each line Context contex) throws IOException, InterruptedException { input file of the // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 26. WordMapper.java public class WordMapper extends Mapper<Object, Text, Text, IntWritable> { private Text word = new Text(); private final static IntWritable one = new IntWritable(1); @Override public void map(Object key, Text value, Context contex) throws IOException, InterruptedException { // Break line into words for processing StringTokenizer wordList = new StringTokenizer(value.toString()); Count each word while (wordList.hasMoreTokens()) { word.set(wordList.nextToken()); contex.write(word, one); } } } Seoul Software Testing Conference
  • 27. Shuffler/Sorter • Maps emit (key, value) pairs • Shuffler/Sorter of Hadoop framework – Sort (key, value) pairs by key – Then, append the value to make (key, list of values) p air – For example, • The first, second maps emit: – <Hello, 1> <World, 1> <Bye, 1> <World, 1> – <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1> • Shuffler produces and it becomes the input of the reducer – <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1, 1>> , <World, <1,1>> Seoul Software Testing Conference
  • 28. SumReducer.java (Reducer class) import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer; public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable totalWordCount = new IntWritable(); @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; Iterator<IntWritable> it=values.iterator(); while (it.hasNext()) { wordCount += it.next().get(); } totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 29. SumReducer.java Extends Reducer class with input/ output keys and values public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable totalWordCount = new IntWritable(); @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; Iterator<IntWritable> it=values.iterator(); while (it.hasNext()) { wordCount += it.next().get(); } totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 30. SumReducer.java public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { Set output value type private IntWritable totalWordCount = new IntWritable(); @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; Iterator<IntWritable> it=values.iterator(); while (it.hasNext()) { wordCount += it.next().get(); } totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 31. SumReducer.java public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable totalWordCount = new IntWritable(); Set input (key, list of values) type and output as Context class @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; Iterator<IntWritable> it=values.iterator(); while (it.hasNext()) { wordCount += it.next().get(); } totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 32. SumReducer.java public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable totalWordCount = new IntWritable(); @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; For each word, Iterator<IntWritable> it=values.iterator(); Count/sum the number of values while (it.hasNext()) { wordCount += it.next().get(); } totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 33. SumReducer.java public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { private IntWritable totalWordCount = new IntWritable(); @Override public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int wordCount = 0; Iterator<IntWritable> it=values.iterator(); while (it.hasNext()) { wordCount += it.next().get(); } For each word, Total count becomes the value totalWordCount.set(wordCount); context.write(key, totalWordCount); } } Seoul Software Testing Conference
  • 34. SumReducer • Reducer – Input: Shuffler produces and it becomes the input of the reducer • <Bye, 1>, <Goodbye, 1>, <Hadoop, <1,1>>, <Hello, <1, 1>> , <World, <1,1>> – Output • <Bye, 1>, <Goodbye, 1>, <Hadoop, 2>, <Hello, 2>, <World, 2> Seoul Software Testing Conference
  • 35. MRUnit Test • How to UnitTest in Hadoop – Extending JUnit test • With org.apache.hadoop.mrunit.* API – Needs to test Driver, Mapper, Reducer • MapReduceDriver, MapDriver, ReduceDriver • Add input with expected output Seoul Software Testing Conference
  • 36. MRUnit Test import java.util.ArrayList; import java.util.List; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mrunit.MapDriver; import org.apache.hadoop.mrunit.MapReduceDriver; import org.apache.hadoop.mrunit.ReduceDriver; import org.junit.Before; import org.junit.Test; public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 37. MRUnit Test @Test public void testMapper() { mapDriver.withInput(new LongWritable(1), new Text("cat cat dog")); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("dog"), new IntWritable(1)); mapDriver.runTest(); } @Test public void testReducer() { List<IntWritable> values = new ArrayList<IntWritable>(); values.add(new IntWritable(1)); values.add(new IntWritable(1)); reduceDriver.withInput(new Text("cat"), values); reduceDriver.withOutput(new Text("cat"), new IntWritable(2)); reduceDriver.runTest(); } @Test public void testMapReduce() { mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog")); mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2)); mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1)); mapReduceDriver.runTest(); } } Seoul Software Testing Conference
  • 38. MRUnit Test Using MRUnit API, declare MapReduce, Mapper, Reducer drivers with input/output (key, value) public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 39. MRUnit Test public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; Run setUp() before executing each test method @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 40. MRUnit Test public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before Instantiate WordCount Mapper, Reducer public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 41. MRUnit Test public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); Instantiate and set Mapper driver with input/output (key, value) mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 42. MRUnit Test public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); Instantiate and set Reducer driver with input/output (key, value) mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 43. MRUnit Test public class TestWordCount { MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDr iver; MapDriver<LongWritable, Text, Text, IntWritable> mapDriver; ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver; @Before public void setUp() { WordMapper mapper = new WordMapper(); SumReducer reducer = new SumReducer(); Instantiate and set MapperReducer driver with input/output (key, value) mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>(); mapDriver.setMapper(mapper); reduceDriver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(); reduceDriver.setReducer(reducer); mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, I ntWritable>(); mapReduceDriver.setMapper(mapper); mapReduceDriver.setReducer(reducer); } Seoul Software Testing Conference
  • 44. MRUnit Test Mapper test: Define sample input with expected output @Test public void testMapper() { mapDriver.withInput(new LongWritable(1), new Text("cat cat dog")); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("dog"), new IntWritable(1)); mapDriver.runTest(); } @Test public void testReducer() { List<IntWritable> values = new ArrayList<IntWritable>(); values.add(new IntWritable(1)); values.add(new IntWritable(1)); reduceDriver.withInput(new Text("cat"), values); reduceDriver.withOutput(new Text("cat"), new IntWritable(2)); reduceDriver.runTest(); } Seoul Software Testing Conference
  • 45. MRUnit Test @Test public void testMapper() { mapDriver.withInput(new LongWritable(1), new Text("cat cat dog")); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("cat"), new IntWritable(1)); mapDriver.withOutput(new Text("dog"), new IntWritable(1)); mapDriver.runTest(); } Reducer test: Define sample input with expected output @Test public void testReducer() { List<IntWritable> values = new ArrayList<IntWritable>(); values.add(new IntWritable(1)); values.add(new IntWritable(1)); reduceDriver.withInput(new Text("cat"), values); reduceDriver.withOutput(new Text("cat"), new IntWritable(2)); reduceDriver.runTest(); } Seoul Software Testing Conference
  • 46. MRUnit Test MapperReducer test: Define sample input with expected output @Test public void testMapReduce() { mapReduceDriver.withInput(new LongWritable(1), new Text("cat cat dog")); mapReduceDriver.addOutput(new Text("cat"), new IntWritable(2)); mapReduceDriver.addOutput(new Text("dog"), new IntWritable(1)); mapReduceDriver.runTest(); } } Seoul Software Testing Conference
  • 47. MRUnit Test in real • Need to implement unit tests – How many? • all Map, Reduce, Driver – Problems? • Mostly work – But it does not support complicated Map, Reduce APIs – How many problems you can detect • Depends on how well you implement MRUnit code Seoul Software Testing Conference
  • 48. Conclusion • • • • MRUnit for Hadoop Unit Test Development Integrate with QA site with CI server Need to use it Seoul Software Testing Conference
  • 50. References 1. Hadoop WordCount example with new map reduce api (http://codesfusion.blogspot.com/2013/1 0/hadoop-wordcount-with-new-map-reduce-api.html) 2. Hadoop Word Count Example (http://wiki.apache.org/hadoop/WordCount ) 3. Example: WordCount v1.0, Cloudera Hadoop Tutorial (http://www.cloudera.com/content/clouder a-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_walk_through.html ) 4. Testing Word Count (https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count) 5. Apache MRUnit Tutorial (https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial ) 6. Hadoop Integration Test Suite, Shopzilla (https://github.com/shopzilla/hadoop-integration-test-sui te ) 7. Hadoop’s Elepahnt in the Room, Jeremy Lucas, Shopzilla (http://tech.shopzilla.com/2013/04/hado ops-elephant-in-the-room/ ) 8. Facebook Test MapReduce Local (https://github.com/facebook/hadoop-20/blob/master/src/test/ org/apache/hadoop/mapreduce/TestMapReduceLocal.java ) 9. Yahoo HIT Hadoop Integrated Testing (http://www.slideshare.net/ydn/hi-tv3?from_search=1 ) Seoul Software Testing Conference