1. Unit Testing MapReduce Jobs in Hadoop
Speaker Details:
Anirudh Bhatnagar
Senior Consultant, Xebia India
abhatnagar@xebia.com
Sanchit Agarwal
Senior Consultant, Xebia India
sagarwal@xebia.com
2. Agenda
● Hadoop Introduction
● What Is MapReduce? [Sample Code]
● MapReduce Testing Using Mockito [Sample Code]
● Shortcomings of Mockito
● MRUnit Test Harness [Sample Code]
● Advantages of MRUnit
● What Lies Ahead
8. Sample MapReduce Code
● All examples and setup are done on a single-node cluster
- Mapper class: map(LongWritable key, Text value, Context context)
- Reducer class: reduce(Text key, Iterable<IntWritable> values, Context context)
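Assuming the sample is the classic word-count job, its logic can be sketched in plain Java (9 or later). In this sketch `long`, `String` and `Integer` stand in for Hadoop's `LongWritable`, `Text` and `IntWritable`, and each method returns its output instead of writing to a `Context`; `WordCountSketch` is a hypothetical name and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a word-count job; no Hadoop types are used.
class WordCountSketch {
    // Mirrors map(LongWritable key, Text value, Context context):
    // emits (word, 1) for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Mirrors reduce(Text key, Iterable<IntWritable> values, Context context):
    // sums all the counts collected for one word.
    static Map.Entry<String, Integer> reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return Map.entry(word, sum);
    }
}
```

Keeping the core logic free of framework types in this way also lets it be checked with ordinary JUnit assertions.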
11. Shortcomings of Mockito
● Not very intuitive for the MapReduce style of programming
● MapReduce semantics differ in subtle ways from the way they are modelled with Mockito
● Adequate for simple scenarios, but may fail to cover more complex ones
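One example of such a subtle difference: a Mockito mock of the mapper's `Context` simply records each `write()` call, whereas a real job sorts and groups map output by key before the reducer sees it. So that it runs without dependencies, the sketch below uses a hypothetical hand-rolled `RecordingContext` in place of a Mockito mock; the recorded writes come back in raw emission order, neither sorted nor grouped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Mockito mock of Mapper.Context:
// it only records write() calls, in the order they happen.
class RecordingContext {
    final List<String> writes = new ArrayList<>();

    void write(String key, int value) {
        writes.add(key + "=" + value);
    }
}

class MockStyleDemo {
    // Toy mapper: emits (word, 1) for each space-separated token.
    static void map(String line, RecordingContext context) {
        for (String token : line.split(" ")) {
            context.write(token, 1);
        }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        map("b a b", ctx);
        // Nothing here sorts or groups by key, unlike the shuffle
        // a real job performs before the reducer runs.
        System.out.println(ctx.writes); // prints [b=1, a=1, b=1]
    }
}
```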
12. MRUnit Test Harness
● Very intuitive for the MapReduce style of programming
● MRUnit helps bridge the gap between MapReduce programs
and JUnit by providing a set of interfaces and test harnesses,
which allow MapReduce programs to be more easily tested
using standard tools and practices.
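● Provides four drivers for testing the parts of a MapReduce job separately or together
– MapDriver: tests a Mapper on its own
– ReduceDriver: tests a Reducer on its own
– MapReduceDriver: tests a Mapper and Reducer together
– PipelineMapReduceDriver: tests a chain of MapReduce jobs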
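The drivers share a fluent "given this input, expect this output" style. As a dependency-free illustration of the ReduceDriver idea, `MiniReduceDriver` below is a hypothetical class, not part of MRUnit: it feeds one key and its grouped values to a reducer function and fails if the result differs from the expected output. Only the method names `withInput`, `withOutput` and `runTest` echo MRUnit's drivers (Java 9+ assumed for `List.of`/`Map.entry`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical imitation of a ReduceDriver-style harness; NOT the MRUnit API.
class MiniReduceDriver<K, V, OUT> {
    private final BiFunction<K, List<V>, OUT> reducer;
    private K key;
    private List<V> values = new ArrayList<>();
    private OUT expected;

    MiniReduceDriver(BiFunction<K, List<V>, OUT> reducer) {
        this.reducer = reducer;
    }

    // One reduce call: a key plus all the values grouped under it.
    MiniReduceDriver<K, V, OUT> withInput(K key, List<V> values) {
        this.key = key;
        this.values = values;
        return this;
    }

    MiniReduceDriver<K, V, OUT> withOutput(OUT expected) {
        this.expected = expected;
        return this;
    }

    // Runs the reducer and fails if the result differs from the expectation.
    void runTest() {
        OUT actual = reducer.apply(key, values);
        if (!Objects.equals(actual, expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}

class ReduceDriverDemo {
    public static void main(String[] args) {
        new MiniReduceDriver<String, Integer, Map.Entry<String, Integer>>(
                (word, counts) -> Map.entry(word, counts.stream().mapToInt(Integer::intValue).sum()))
            .withInput("be", List.of(1, 1))
            .withOutput(Map.entry("be", 2))
            .runTest(); // passes: the reducer emits (be, 2)
        System.out.println("reduce test passed");
    }
}
```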
13. Sample Code with MRUnit
● Used in combination with JUnit to get better control over log messages
● Integrates easily with JUnit
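A MapReduceDriver test runs the mapper and reducer together, with the sort-and-group step performed in between. The dependency-free sketch below models that end-to-end flow for an assumed word-count job; `MiniMapReducePipeline` is a hypothetical name, and a `TreeMap` stands in for the shuffle phase (Java 9+ assumed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of what a MapReduceDriver-style test exercises:
// map every input, sort and group intermediate pairs by key, then reduce.
class MiniMapReducePipeline {
    static Map<String, Integer> run(List<String> lines) {
        // Map phase: emit (word, 1) for each token.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    intermediate.add(Map.entry(token, 1));
                }
            }
        }
        // Shuffle/sort phase: group values per key, keys in sorted order.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Reduce phase: sum the grouped counts for each key.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int c : group.getValue()) {
                sum += c;
            }
            output.put(group.getKey(), sum);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or", "not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```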
14. Gotchas with MRUnit
● In the MRUnit version used here, MapDriver.withInput() accepts only a single input: calling it multiple times replaces the earlier input, so only the last one is used
● Use runTest() and run() with care: runTest() runs the test, checks the actual output against the expected output and returns void, while run() executes the driver and returns the list of output (key, value) pairs without checking them
● PipelineMapReduceDriver supports only the old Hadoop API (org.apache.hadoop.mapred)
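Both MapDriver gotchas can be reproduced with a small model. `SingleInputDriver` is a hypothetical, dependency-free class (not MRUnit) with a single input slot, so a second `withInput()` call silently replaces the first. It offers `run()`, which returns the actual output, alongside `runTest()`, which checks the expected output and returns nothing (Java 9+ assumed for `List.of`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical single-input driver modelling the gotchas; NOT MRUnit code.
class SingleInputDriver {
    private final Function<String, List<String>> mapper;
    private String input;                          // one slot only
    private final List<String> expected = new ArrayList<>();

    SingleInputDriver(Function<String, List<String>> mapper) {
        this.mapper = mapper;
    }

    SingleInputDriver withInput(String value) {
        this.input = value;                        // an earlier input is overwritten
        return this;
    }

    SingleInputDriver withOutput(String value) {
        expected.add(value);
        return this;
    }

    // Like run(): executes the mapper and returns the actual output, unchecked.
    List<String> run() {
        return mapper.apply(input);
    }

    // Like runTest(): executes, compares with the expectations, returns void.
    void runTest() {
        List<String> actual = run();
        if (!Objects.equals(actual, expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}

class GotchaDemo {
    public static void main(String[] args) {
        SingleInputDriver driver = new SingleInputDriver(line -> List.of(line.toUpperCase()));
        driver.withInput("first").withInput("second");
        System.out.println(driver.run()); // prints [SECOND]: only the last input was used
    }
}
```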
15. What Lies Ahead
● MiniMRCluster and MiniDFSCluster classes
offer full-blown in-memory MapReduce and
HDFS clusters, and can launch multiple
MapReduce and HDFS nodes
● Best practices and debugging techniques for MapReduce