HIT, which stands for Hadoop Integration Testing, is a Yahoo! framework for assembling Hadoop components into a full Stack and running integration tests to make sure that the components can inter-operate with each other. HIT aims to:
- build fully automated, modular, scalable and flexible Hadoop stack deployment and test framework
- develop integration processes and tools for development, quality engineering, operations, and customers
- grow participation and evolve into a comprehensive self-service stack deployment and test solution
HIT is designed as an open system to plug in any type of testing. We will also share new developments around HIT and how it can be a Platform for all testing and automation.
Speaker: Baljit Deot, Technical Yahoo!, Cloud Engineering Group, Yahoo!
3. Overview
Hadoop Grid != HDFS and MR
› It is the whole ecosystem, including monitoring
Set up of the Hadoop ecosystem for testing is a complex problem
› Consistently repeatable
› Co-ordinated install of multiple packages
› Setup necessary schemas and do the “wiring”
› Make the packages work with other internal proprietary systems
› Automated!
Run Hadoop tests for everything in the Hadoop ecosystem
Record and report results across builds/releases
Yahoo! Confidential & Proprietary. 3 2/22/2013
4. Solution
Yahoo’s HIT (Hadoop Integrated Testing)
› Deploy and certify the complete Hadoop ecosystem with one click
• Assemble all products on a given cluster
• Run integration/functional tests
• Record and report results
› Repeatable automated environment setup
• Select available test environment
• Wipe clean and install Hadoop and all components every time
• Setup necessary database schemas
Yahoo! Confidential & Proprietary. 4 2/22/2013
5. Legend:
HIT Deployment Hadoop-related HIT-related
Hadoop cluster
Test driver NN dn
Hudson
job Isolated virtual environment
Core tests Pig tests NN2
dn
HIT
HIT Oozie tests Nova tests JT
DAQ tests …..
dn
results
HDFS
proxy
Pig Vaidya …..
Hadoop client
Oozie
distcp Oozie cli ….. dn
Results
storage HDFS MR Hadoop core
dn DAQ
Nova
6. Legend:
Workflow 2 Deploy cluster
Hadoop-related
1 Start certification HIT-related
Hudson HIT job
3 Deploy HIT
TAG: H22.rc2.5
Click to run HIT
Results page
5 View results
TAG: H22.rc2.5
Hadoop: pass
Pig: pass
Hive: pass
Oozie: pass
…
4 Run tests
6
8. HIT Key Values
HIT system benefits
› Enable integration testing
› Provide safety net for deployment to sandbox and production grids
› Provide comprehensive reference platform
› Empower engineering teams through self-service Hadoop stack integration and
testing.
› Provide test solution for teams with no test environment
› Enforce repeatable automated environment in QE teams
› Give QE a way to deploy a full Hadoop Stack
› Standard tool for entry/exit criterion for QE
› Viable framework for CI – nightly
8
9. Future direction
Command line interface
› Enable developer pre-commit testing “on the box”
› Develop a fully automated commit-to-deployment process
Contributions to the community
Yahoo! Confidential & Proprietary. 9 2/22/2013
Editor's Notes
Introduction of tools like Igor or yinst should have been done already, but was not. And therefore it falls to HIT project to do it.