Testing Hadoop jobs with MRUnit

Boulder/Denver Hadoop Users Group
05.12.2010

© 2010 Eric Wendelin
Eric Wendelin
Hadooper @returnpath
Blog: eriwen.com
Twitter: @eriwen
What is MRUnit?

• Testing library for MapReduce
• Developed by Cloudera
• Easy integration between MapReduce
  and standard testing tools (e.g. JUnit)

  cloudera.com/hadoop-mrunit
Why do I need that?
Testing without MRUnit
• Write tests that create JobConf or Configuration objects
  • conf.set('mapred.job.tracker', 'local')
• Developing new test input files stored alongside MapReduce test code
• Lots of work to validate output files
  • External file I/O makes tests slooooow
MRUnit makes testing Hadoop jobs easier
Testing with MRUnit

• No external test input or output files
  • Programmatically specified
• Less test harness code (but also perhaps less control)
• Concise, fast tests
Example
class ExampleTest {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

  @Before void setUp() {
    mapper = new Example.MyMapper()
    reducer = new Example.MyReducer()
    driver = new MapReduceDriver(mapper, reducer)
  }

  @Test void testMapReduce() {
    driver.withInput(new Text('a'), new Text('b'))
    driver.withOutput(new Text('c'), new Text('d'))
    driver.runTest()
  }
}
Example
class ExampleTest {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

  @Before void setUp() {
    mapper = new Example.MyMapper()
    reducer = new Example.MyReducer()
    driver = new MapReduceDriver(mapper, reducer)
  }

  @Test void testMapReduce() {
    // Same test as before, with the driver calls chained
    driver.withInput(new Text('a'), new Text('b'))
        .withOutput(new Text('c'), new Text('d'))
        .runTest()
  }
}
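For intuition about what a driver like MapReduceDriver does when input and expected output are specified in code, here is a rough plain-Java sketch of an in-memory map, sort-and-group, reduce pass. This is not MRUnit's implementation: the Mapper and Reducer interfaces, the run method, and the word-count job are hypothetical names made up for illustration, and String stands in for Hadoop's Text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

class InMemoryDriverSketch {
    /** Hypothetical mapper shape: one input record in, zero or more (key, value) pairs out. */
    interface Mapper {
        void map(String key, String value, BiConsumer<String, String> out);
    }

    /** Hypothetical reducer shape: one key plus all of its values in, pairs out. */
    interface Reducer {
        void reduce(String key, List<String> values, BiConsumer<String, String> out);
    }

    /** Runs map, an in-memory sort-and-group step, then reduce. No files are touched. */
    static List<String> run(Mapper mapper, Reducer reducer, List<String[]> input) {
        // TreeMap keeps keys sorted, mimicking the sort that precedes the reduce phase.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] record : input) {
            mapper.map(record[0], record[1],
                (k, v) -> grouped.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // Each output record is rendered as "key<TAB>value".
            reducer.reduce(e.getKey(), e.getValue(), (k, v) -> output.add(k + "\t" + v));
        }
        return output;
    }

    /** Word count wired through the sketch; input is given in code, not in a file. */
    static List<String> wordCount(List<String[]> input) {
        Mapper mapper = (key, line, out) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.accept(word, "1");
                }
            }
        };
        Reducer reducer = (word, ones, out) -> out.accept(word, String.valueOf(ones.size()));
        return run(mapper, reducer, input);
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"0", "b a b"});
        input.add(new String[] {"1", "a c"});
        System.out.println(wordCount(input));
    }
}
```

Feeding wordCount the records ("0", "b a b") and ("1", "a c") should give a 2, b 2, c 1 in key order, without reading or writing a single file, which is the property the slides credit for fast tests.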
Test map and reduce separately
class ExampleTest {
  private Example.MyMapper mapper
  private MapDriver driver

  @Before void setUp() {
    mapper = new Example.MyMapper()
    driver = new MapDriver(mapper)
  }

  @Test void testMap() {
    driver.withInput(new Text('a'), new Text('b'))
    driver.withOutput(new Text('c'), new Text('d'))
    driver.runTest()
  }
}
class ExampleTest {
  private Example.MyReducer reducer
  private ReduceDriver driver

  @Before void setUp() {
    reducer = new Example.MyReducer()
    driver = new ReduceDriver(reducer)
  }

  @Test void testReduce() {
    // A reducer receives one key together with the list of its values
    driver.withInput(new Text('a'),
        [new Text('foo'), new Text('bar')])
    driver.withOutput(new Text('c'), new Text('d'))
    driver.runTest()
  }
}
Counters!
driver.withInput(...)
driver.run()

def counters = driver.getCounters()

assertEquals(1, counters.findCounter('foo', 'bar').getValue())
Verifying logging
def messages = []
def appender = [
    append: { messages.add(it) },
    requiresLayout: { false }
  ] as AppenderSkeleton
Logger.getRootLogger().addAppender(appender)

driver.runTest()

assertTrue messages.any {
    it.getLevel().toString() == 'WARN' &&
    it.getMessage().contains('My err') }

Logger.getRootLogger().removeAppender(appender)
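The same capture-then-assert pattern can be sketched in plain Java. To keep the sketch dependent on the JDK only, it swaps in java.util.logging for the log4j appender used above, so it illustrates the idea and is not a drop-in replacement. LogCaptureSketch, CapturingHandler, capture and hasWarning are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogCaptureSketch {
    /** Handler that stores every record it receives instead of printing it. */
    static class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Runs the action with a capturing handler attached and returns what was logged. */
    static List<LogRecord> capture(Logger logger, Runnable action) {
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        try {
            action.run();
        } finally {
            // Always detach, even if the action throws, so later tests start clean.
            logger.removeHandler(handler);
        }
        return handler.records;
    }

    /** True if any captured record is a WARNING whose message contains the fragment. */
    static boolean hasWarning(List<LogRecord> records, String fragment) {
        for (LogRecord r : records) {
            if (r.getLevel() == Level.WARNING
                    && r.getMessage() != null
                    && r.getMessage().contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch.demo");
        List<LogRecord> records = capture(log, () -> log.warning("My err: bad record"));
        System.out.println(hasWarning(records, "My err"));
    }
}
```

Detaching the handler in a finally block plays the role of the removeAppender call on the slide: a failing test should not leave the capturing handler attached for the next one.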
Cool stuff I haven’t tried...
• The PipelineMapReduceDriver - allows testing a series of MapReduce passes
  • Just call addMapReduce(mapper, reducer)
• Mock objects - MockReporter, MockInputSplit, and MockOutputCollector
• Test combiners with myMapReduceDriver.setCombiner(myCombiner)
Problems with MRUnit
Not useful for streaming jobs
shell$ ./myMapper.py < test.input |
sort | ./myReducer.py > actual.out

shell$ diff expected.out actual.out
runTest() does not give meaningful information on failure
Better to use run() and then assert
driver.setInput(new Text('foo'), new Text('bar'))

def output = driver.run()

// Compare as strings: a String never equals a Text
assertEquals 'baz', output[0].first.toString()
assertEquals 'jy', output[0].second.toString()
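One way to get a more informative failure after calling run() is to compare the records one at a time and report the first difference. The sketch below does that in plain Java over records rendered as strings. firstMismatch is a hypothetical helper and not part of MRUnit, and the tab-separated record format is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class OutputDiffSketch {
    /**
     * Returns null when the lists match, otherwise a message naming the first
     * differing record or the difference in record count.
     */
    static String firstMismatch(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!Objects.equals(expected.get(i), actual.get(i))) {
                return "record " + i + ": expected <" + expected.get(i)
                        + "> but was <" + actual.get(i) + ">";
            }
        }
        if (expected.size() != actual.size()) {
            return "expected " + expected.size() + " records but got " + actual.size();
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("baz\tjy");
        List<String> actual = Arrays.asList("baz\tjx");
        System.out.println(firstMismatch(expected, actual));
    }
}
```

In a test, a non-null result can be passed straight to a fail call, so the report names the offending record rather than just stating that the run did not match.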
Documentation is severely lacking
runXxx() calls setup() with the new Hadoop API, but not with the old API
Tests are not executed in a distributed way
In Summary, MRUnit...

• Makes testing your Hadoop jobs easier
• Abstracts away a lot of the boilerplate test setup you need
• Has its problems
  • but they are outweighed by the benefits
?
cloudera.com/hadoop-mrunit


Blog: eriwen.com
Twitter: @eriwen
Email: eric.wendelin@returnpath.net
