Unit Testing Map Reduce Jobs in Hadoop


Speaker Details:

Anirudh Bhatnagar
     Senior Consultant-Xebia India
     abhatnagar@xebia.com
Sanchit Agarwal
     Senior Consultant-Xebia India
     sagarwal@xebia.com
Agenda
●   Hadoop Introduction
●   What is Map Reduce [Sample Code]
●   Map-Reduce Testing using Mockito [Sample Code]
●   Shortcomings with Mockito
●   MRUnit Test Harness [Sample Code]
●   Advantages of MRUnit
●   What Lies Ahead
What is Hadoop?
Why Hadoop?
How does Hadoop work?
What is Map Reduce
Map Reduce Execution
Sample Map Reduce Code
 ●   All examples and setup are for a single-node
     cluster

- map(LongWritable key, Text value, Context
context) {Mapper Class}

- reduce(Text key, Iterable<IntWritable>
values, Context context) {Reducer Class}
Problem Statement
Find the top-trending tag among all the tags that
        appear in different user logs
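
The problem statement can be sketched without a cluster. The block below is a plain-Java stand-in for the map, shuffle, and reduce steps; it does not use any Hadoop classes. The class name `TopTagSketch`, the log format (one line per log entry, tags separated by whitespace), and the tie-breaking rule are assumptions made for illustration, not details from the talk.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the map -> shuffle -> reduce flow (no Hadoop types).
class TopTagSketch {

    // "Map" step: emit (tag, 1) for every whitespace-separated tag on a log line.
    static List<Map.Entry<String, Integer>> map(String logLine) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String tag : logLine.trim().split("\\s+")) {
            if (!tag.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(tag, 1));
            }
        }
        return out;
    }

    // "Reduce" step: sum all the counts emitted for one tag.
    static int reduce(String tag, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs both steps and returns the tag with the highest total.
    // Ties go to the alphabetically first tag (an assumed rule).
    static String topTag(List<String> logLines) {
        // "Shuffle": group emitted values by key; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : logLines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int total = reduce(e.getKey(), e.getValue());
            if (total > bestCount) {
                best = e.getKey();
                bestCount = total;
            }
        }
        return best; // null when there is no input
    }
}
```

In a real job the map and reduce bodies would live in Mapper and Reducer subclasses with the signatures shown on the previous slide; only the per-record logic is modelled here.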
Sample Code: Unit Testing with Mockito
●   No MRUnit code used
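
The sample code from the talk is not part of these slides. As a rough illustration of the mock-based style, the sketch below tests a mapper by recording its calls to a context and then checking those interactions. It uses only the JDK: `EmitContext` is a hypothetical, simplified stand-in for Hadoop's `Mapper.Context`, `TagMapperSketch` is an assumed mapper, and `RecordingContext` is a hand-written test double that plays the role a Mockito mock plus `verify(...)` would play.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Hadoop's Mapper.Context: only write() is modelled.
interface EmitContext {
    void write(String key, int value);
}

// Assumed mapper under test: emits (tag, 1) for each whitespace-separated tag.
class TagMapperSketch {
    void map(long offset, String line, EmitContext context) {
        for (String tag : line.trim().split("\\s+")) {
            if (!tag.isEmpty()) {
                context.write(tag, 1);
            }
        }
    }
}

// Hand-written recording double; with Mockito this would be a mocked context.
class RecordingContext implements EmitContext {
    final List<String> calls = new ArrayList<>();

    @Override
    public void write(String key, int value) {
        calls.add(key + "=" + value);
    }

    // Interaction-style check: was write(key, value) called exactly 'times' times?
    boolean wasWritten(String key, int value, int times) {
        int n = 0;
        for (String c : calls) {
            if (c.equals(key + "=" + value)) {
                n++;
            }
        }
        return n == times;
    }
}
```

A test then calls `new TagMapperSketch().map(0L, "hadoop java hadoop", ctx)` and asserts `ctx.wasWritten("hadoop", 1, 2)`. The test speaks in terms of calls on a collaborator rather than input records and output records.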
Shortcomings with Mockito
●   Not very intuitive for the Map-Reduce style of
    programming
●   Map-Reduce semantics differ in subtle ways
    from what a Mockito-based test expresses
●   Works equally well in some scenarios, but can
    fail to cover more complex ones
MRUnit Test Harness
●   Very intuitive for the Map-Reduce style of programming
●   MRUnit helps bridge the gap between MapReduce programs
    and JUnit by providing a set of interfaces and test harnesses,
    which allow MapReduce programs to be more easily tested
    using standard tools and practices.
●   Provides 4 drivers for separately testing Map-Reduce code
    –   MapDriver
    –   ReduceDriver
    –   MapReduceDriver
    –   PipelineMapReduceDriver
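
The driver idea can be illustrated with a small plain-Java harness: declare the input, declare the expected output, then run. The class `MiniReduceDriver` below is a hypothetical miniature written for this purpose. It is not MRUnit's `ReduceDriver`, and the summing reducer is an assumed example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of a driver-style harness; NOT MRUnit's ReduceDriver.
class MiniReduceDriver {

    // Shape of the reducer under test: one key plus its values in, one pair out.
    interface Reducer {
        Map.Entry<String, Integer> reduce(String key, List<Integer> values);
    }

    // Assumed example reducer: sums the counts for a key.
    static final Reducer SUM = (key, values) -> {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<String, Integer>(key, sum);
    };

    private final Reducer reducer;
    private String inputKey;
    private List<Integer> inputValues = new ArrayList<>();
    private Map.Entry<String, Integer> expected;

    MiniReduceDriver(Reducer reducer) {
        this.reducer = reducer;
    }

    // Declares the key and grouped values handed to the reducer.
    MiniReduceDriver withInput(String key, List<Integer> values) {
        this.inputKey = key;
        this.inputValues = values;
        return this;
    }

    // Declares the single output pair the reducer is expected to emit.
    MiniReduceDriver withOutput(String key, int value) {
        this.expected = new SimpleEntry<String, Integer>(key, value);
        return this;
    }

    // Runs the reducer and throws AssertionError when the output differs.
    void runTest() {
        Map.Entry<String, Integer> actual = reducer.reduce(inputKey, inputValues);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

Usage reads as input, expectation, run: `new MiniReduceDriver(MiniReduceDriver.SUM).withInput("hadoop", Arrays.asList(1, 1, 1)).withOutput("hadoop", 3).runTest();`. The test is stated in terms of records in and records out, which is what makes the driver style fit Map-Reduce code.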
Sample Code with MRUnit
●   Used in combination with JUnit to get better
    control over log messages
●   Integrates easily with JUnit
Gotchas With MRUnit
●   MapDriver.withInput supports only one input:
    if it is called several times, each call replaces
    the previous input and only the last one is used
●   Handle runTest() and run() with care:
    runTest() runs the test, checks the expected
    output, and returns void, while run() executes
    the code and returns the list of output pairs
●   PipelineMapReduceDriver supports only the old
    Hadoop API
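
The first two gotchas can be reproduced in a small plain-Java sketch. `SingleInputMapDriverSketch` is a hypothetical class that mimics the behaviour described above. It is not MRUnit's `MapDriver`, and the built-in tag-splitting mapper is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical miniature that mirrors the gotchas; NOT MRUnit's MapDriver.
class SingleInputMapDriverSketch {
    private String input; // only one input is kept
    private final List<Map.Entry<String, Integer>> expected = new ArrayList<>();

    // A second call replaces the earlier input instead of adding to it.
    SingleInputMapDriverSketch withInput(String line) {
        this.input = line;
        return this;
    }

    // Expected output pairs accumulate in order.
    SingleInputMapDriverSketch withOutput(String key, int value) {
        expected.add(new SimpleEntry<String, Integer>(key, value));
        return this;
    }

    // run(): executes the (assumed) mapper and returns the emitted pairs.
    // Nothing is checked here; the caller has to assert on the result.
    List<Map.Entry<String, Integer>> run() {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        if (input == null) {
            return out;
        }
        for (String tag : input.trim().split("\\s+")) {
            if (!tag.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(tag, 1));
            }
        }
        return out;
    }

    // runTest(): executes the mapper, compares with the expectations, returns nothing.
    void runTest() {
        List<Map.Entry<String, Integer>> actual = run();
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

Calling `withInput("hadoop java").withInput("hive")` and then `run()` yields only the pair for `hive`, which is easy to miss in a test that appears to feed two records. A test that calls `run()` and ignores the returned list also checks nothing.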
What Lies Ahead
●   MiniMRCluster and MiniDFSCluster classes
    offer full-blown in-memory MapReduce and
    HDFS clusters, and can launch multiple
    MapReduce and HDFS nodes
●   Best Practices and Debugging techniques for
    Map-Reduce
Questions??
Bibliography
●   Books
    –   Hadoop in Practice
    –   Hadoop: The Definitive Guide
    –   Hadoop in Action
●   Links
    –   http://hadoop.apache.org/
●   Blogs
    –   http://codingjunkie.net/testing-hadoop-programs-with-mrunit/
    –   http://java.dzone.com/articles/effective-testing-strategies
    –   https://github.com/alexholmes/blog/blob/master/_posts/2012-10-20-hadoop-unit-testing-with-minimrcluster.markdown

Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
