July 2012 HUG: Overview of Oozie Qualification Process
1. Overview of Oozie QE Qualification Process
Michelle Chiang
07/18/2012
2. Agenda
• What is Oozie
• Qualification stages
• Challenges
• Future tasks
• Q&A
3. What is Oozie?
• Scalable, secure workflow scheduling system for Hadoop.
– Three levels of jobs
• Workflow job
– Supports actions such as MapReduce, Pig, Java, and Distcp
• Coordinator job
– Scheduling
• Bundle job
– Monitor status of coordinator jobs
4. Job Submission to Hadoop
• Oozie client submits jobs to the Oozie server via:
– 1. CLI
– 2. Java Client API
– 3. WS API
• Oozie server submits to the Hadoop cluster:
– Job Tracker → Launcher Mapper → Actual M/R Job
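The CLI and WS API submission paths can be sketched as below. This is a minimal Python sketch under stated assumptions, not the team's tooling: the helper names, the server URL, and the job ID are hypothetical, and the `oozie job ... -run` flags and the `/v1/job/<id>?show=info` path should be checked against the Oozie version in use. The Java Client API path is not shown.

```python
def cli_submit_command(oozie_url, properties_file):
    """Return the argv list that submits and starts a job with the Oozie CLI."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", properties_file, "-run"]


def ws_job_info_url(oozie_url, job_id):
    """Return the web-services URL that reports a job's information."""
    return f"{oozie_url.rstrip('/')}/v1/job/{job_id}?show=info"


# Hypothetical server URL and job ID, for illustration only.
print(" ".join(cli_submit_command("http://localhost:11000/oozie", "job.properties")))
print(ws_job_info_url("http://localhost:11000/oozie", "0000001-oozie-W"))
```

Returning an argv list rather than a single string lets a test driver hand it to a process launcher without shell quoting problems.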
5. QE Qualification Process
• Develop test plan in design cycle
• Design and implement test cases
• Execute tests
• Prepare release notes & certification
• Support production deployment and answer customers’ FAQs
6. Develop Test Plan
• Prepare test plan for new features defined in the PRD, or
• Prepare test plan for the selected new features checked into the Apache source
• Define test strategy
• Test plan is reviewed by QE and Dev
7. Test plan example
• Test plan for the “shell” action

Case ID  | Execution     | Expected results                     | Comment
Ticket # | Shell action  | 1. Read env var, compare action data | Pass/Fail, bug#/JIRA#
         |               | 2. Read config env var               |
         |               | 3. hadoop fs -ls; hadoop fs -cp      |
Test_sh* | Bash shell    | 1, 2, 3                              |
         | Perl script   | 1, 2                                 |
         | Python script | 1, 2                                 |
         | Java          | 1, 2                                 |
         | C++           | 1, 2                                 |
8. Design and implement test cases
• Design test case → Prepare test data → Build → Verify/Bug → Automate → Demo
9. Unit tests
• Unit tests
– 784 unit tests
– code coverage: 72%
– Checked in with code by developers
– Executed by CI build as a Jenkins job
10. Functional tests
• Functional tests (including regression tests) as of 3.2.0:
– Use real systems (hadoop, oozie), not minicluster or minioozie
– 1129 shell-based tests
– 146 Java OozieClient API tests (in TestNG)
– Runtime: 36 hours, on 2 servers/clusters
• Manual setup time: 20 min
11. Shell-based tests
• Assumptions:
– secure hadoop cluster is up
– oozie server is configured and up.
• 2 types of tests
– Individualized feature tests
• Customized validation
• Self-contained
– 1 script drives many tests
• Good for repetitive testing, e.g., schema tests
12. Example: run.sh
• Prepare: generate job property file based on given conf and template
• Upload: delete existing data, and upload application/data to hdfs
• Submit: submit oozie jobs
• Verify: check jobs finish successfully
• Flow: prepare → upload → submit → verify
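The prepare step, generating a job property file from a conf and a template, might look like the sketch below. It is written in Python for illustration (the slide's script is shell); the template keys, host names, and paths are hypothetical. Only `oozie.wf.application.path` is a standard Oozie job property.

```python
from string import Template

# Hypothetical template; the real test suite's template is not shown in the slides.
JOB_PROPERTIES_TEMPLATE = Template(
    "nameNode=${name_node}\n"
    "jobTracker=${job_tracker}\n"
    "queueName=${queue}\n"
    "oozie.wf.application.path=${name_node}${app_path}\n"
)


def render_job_properties(conf):
    """Fill the template from conf; a missing key raises KeyError."""
    return JOB_PROPERTIES_TEMPLATE.substitute(conf)


# Illustrative values only.
example_conf = {
    "name_node": "hdfs://nn.example.com:8020",
    "job_tracker": "jt.example.com:8021",
    "queue": "default",
    "app_path": "/user/qe/apps/shell",
}
print(render_job_properties(example_conf))
```

Using `substitute` rather than `safe_substitute` makes a missing conf value fail during prepare instead of surfacing later as a rejected job submission.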
13. Test validation (1)
• Add validation into the workflow.xml
– Apply decision node to check
• wf:actionData
• fs:exists
• Other EL functions
– Apply Java action to verify
• capture-output
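A decision node that validates output with an EL function could be generated as in the sketch below. The node names (`check-output`, `end`, `fail`) and the `outputDir` property are hypothetical; `fs:exists` and `wf:conf` are Oozie EL functions, and the `decision`/`switch`/`case`/`default` layout follows the Oozie workflow schema as commonly documented.

```python
import xml.etree.ElementTree as ET


def decision_node(name, predicate, ok_to, fail_to):
    """Build a <decision> node that goes to ok_to when the EL predicate is true."""
    decision = ET.Element("decision", {"name": name})
    switch = ET.SubElement(decision, "switch")
    case = ET.SubElement(switch, "case", {"to": ok_to})
    case.text = predicate
    ET.SubElement(switch, "default", {"to": fail_to})
    return ET.tostring(decision, encoding="unicode")


# Hypothetical check: pass only if the configured output directory exists.
node_xml = decision_node(
    "check-output", "${fs:exists(wf:conf('outputDir'))}", "end", "fail"
)
print(node_xml)
```

Putting the check inside the workflow means a failed validation shows up as a killed job, which the driver script already knows how to detect.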
14. Test validation (2)
• Add validation into run.sh
– Apply oozie client commands to check
• Job status, log, configurations, definition, dryrun
– Apply shell commands to parse results
• Download output data, parse and compare
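Both kinds of script-side validation can be sketched in a few lines. The sample text below is an assumption about what a job info listing looks like (a `Status : VALUE` line), not captured output, and the helper names are hypothetical.

```python
import re


def parse_status(info_output):
    """Return the value of the 'Status' line, or None if there is none."""
    match = re.search(r"^Status\s*:\s*(\S+)", info_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def same_records(expected_lines, actual_lines):
    """Compare two outputs, ignoring line order and surrounding whitespace."""
    def normalize(lines):
        return sorted(line.strip() for line in lines if line.strip())
    return normalize(expected_lines) == normalize(actual_lines)


# Illustrative input only; the real listing has more fields.
SAMPLE_INFO = "Job ID : 0000001-oozie-W\nStatus        : SUCCEEDED\n"
print(parse_status(SAMPLE_INFO))
print(same_records(["a\t1", "b\t2"], ["b\t2\n", "a\t1\n"]))
```

Ignoring line order matters for MapReduce output, where the order of part files and records is not guaranteed between runs.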
15. Integration
• Integration tests:
– 15 tests, within the Hadoop ecosystem
• Including Hadoop, Pig, Hcatalog, Distcp.
– Runtime: ~5 hours (oozie tests only)
• Manual setup time: 30min
• Plus, test package preparation & test run: 3 hr
– Examples
• Oozie and MapReduce
• Oozie and Pig
• Oozie and Hcatalog
16. Stress tests (1)
• Performance/stress/longevity tests:
– 10 tests
– Runtime:
• 12 hours for performance/stress tests
• 7 days for longevity testing.
• Manual setup & analysis time: ~ 10min per test
17. Stress tests (2)
• Performance metrics:
– job submission rate
– status update
– no failed jobs
– number of jobs submitted vs. completed
• Longevity tests:
– 300 wf jobs/min for 7 days ~= 3M jobs
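The longevity figure follows from simple arithmetic: 300 jobs/min × 1,440 min/day × 7 days = 3,024,000 jobs, which the slide rounds to 3M. A small sketch, with hypothetical helper names, that also covers the submitted-versus-completed metric:

```python
MINUTES_PER_DAY = 24 * 60


def total_jobs(jobs_per_minute, days):
    """Jobs submitted at a constant rate over the given number of days."""
    return jobs_per_minute * MINUTES_PER_DAY * days


def completion_ratio(submitted, completed):
    """Fraction of submitted jobs that completed; 1.0 means none are missing."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return completed / submitted


print(total_jobs(300, 7))  # 3024000
```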
18. Memory tests
• Memory/stress tests:
– 3 tests
– Runtime: ~ 10 hours.
• Manual setup & analysis: 30min per test
– Examples:
• Purge a large number of wf/coord/bundle jobs
• Query a coord job with 100k actions
• Query a coord job with 8k actions by N threads
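The "query by N threads" case can be driven by a small harness like the one below. This is a sketch under stated assumptions: `query` is a caller-supplied function (for example, one that asks the server for a coordinator job's actions), and the stand-in used here simply returns a number so the harness runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def query_concurrently(query, job_id, threads):
    """Call query(job_id) once per worker thread and return all results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(query, job_id) for _ in range(threads)]
        return [future.result() for future in futures]


def fake_query(job_id):
    """Stand-in for a real server query; returns a pretend action count."""
    return 8000


# Hypothetical job ID.
print(query_concurrently(fake_query, "0000001-oozie-C", threads=4))
```

Calling `result()` on each future re-raises any exception from a worker, so a failed query fails the test instead of being silently dropped.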
19. Upgrade/installation tests
• Upgrade tests:
– 14 tests
– Runtime: 4 hours (manual setup: 2hr)
– steps:
• Submit wf/coord/bundle jobs
• Shut down oozie server
• Upgrade database schema, oozie version, oozie config
• Restart oozie server
20. Release notes and certification
• Release notes
– New features
– Package version and new settings
– New db schema
• Certification
– Number of tests executed and pass rate
– Known issues
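The pass rate reported in a certification is the share of executed tests that passed. A minimal sketch with a hypothetical helper name and made-up counts:

```python
def pass_rate(executed, passed):
    """Return the pass rate as a percentage, rounded to one decimal place."""
    if executed <= 0:
        raise ValueError("no tests were executed")
    if not 0 <= passed <= executed:
        raise ValueError("passed must be between 0 and executed")
    return round(100.0 * passed / executed, 1)


# Illustrative counts, not figures from any release.
print(pass_rate(200, 199))
```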
21. Production and customer support
• Document FAQs, e.g., usage of new
features
• Support production deployment issues
• Meet customers’ SLA requirements
22. Lessons learned (1)
• Add a “time-out” to the test script
– Needed when a test never reaches the expected status
• Carefully time the verification step to catch transient states.
– Job status transitions, e.g., from PREP to RUNNING to PAUSED
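Both points, the time-out and the transient states, fit into one polling loop. The sketch below assumes a caller-supplied `get_status` function (for example, one that asks the oozie server for the job's status); all names are hypothetical.

```python
import time


def wait_for_status(get_status, expected, timeout_s, poll_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until get_status() returns expected or the time-out expires.

    Returns (reached, seen), where seen lists each distinct status in the
    order it was observed, so transient states can be checked afterwards.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == expected:
            return True, seen
        if clock() >= deadline:
            return False, seen
        sleep(poll_s)


# Stand-in status source; a real test would query the server instead.
statuses = iter(["PREP", "RUNNING", "RUNNING", "PAUSED"])
reached, seen = wait_for_status(lambda: next(statuses), "PAUSED",
                                timeout_s=5, sleep=lambda s: None)
print(reached, seen)
```

A transient state is only recorded if a poll happens while the job is in it, so the poll interval has to be shorter than the shortest state the test needs to observe.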
24. Lessons learned (3)
• Accumulate a large number of jobs for testing
– Increase materialization window
– Reduce materialization lookup interval
– Adjust coordinator job’s frequency and duration
• Also, check database memory usage
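How frequency and duration determine the number of coordinator actions can be estimated with a quick calculation. The sketch assumes one action per frequency interval, starting at the job's start time and stopping before its end time; the helper name is hypothetical.

```python
def coordinator_action_count(duration_minutes, frequency_minutes):
    """Estimate how many actions a coordinator job materializes in total."""
    if duration_minutes < 0 or frequency_minutes <= 0:
        raise ValueError("duration must be >= 0 and frequency must be > 0")
    # Integer ceiling division: a partial final interval still yields an action.
    return (duration_minutes + frequency_minutes - 1) // frequency_minutes


print(coordinator_action_count(24 * 60, 5))  # 288 actions for one day at 5-minute frequency
```

Under this assumption, a shorter frequency or a longer duration is the direct lever for piling up actions, while the materialization settings control how quickly the server creates them.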
25. Lessons learned (4)
• Check the oozie job log, tomcat server log, and hadoop jobtracker log for debugging
• Dev adds debugging statements
26. Challenges - production issues
• Reproduce and debug issues in the QE environment.
• Set up the QE environment as close to production as possible.
– Recent story: using CNAME for oozie URL.
27. Challenges – backward compatibility
• Oozie always guarantees backward compatibility
– Web-service API
– Job definitions
– Client API
• Verify old jobs continue to run in the new release
28. Challenges – multiple versions
• Compatibility with multiple versions of other components
– Hadoop API
– Pig
– Hcatalog
29. Work in Progress (1)
• Increase test coverage
– Java based, testNG framework
– Server-side oozie white box testing
– Improved web service API testing
30. Work in Progress (2)
• Hadoop 2.x integration testing, including HDFS federation.
• Memory monitoring framework
• Performance benchmark framework
• Of course, new oozie releases
31. Open sourcing
• Short term: Shell based tests
– Review file/data structure
– Add readme, copyright, etc
– Work in progress
• Long term: Java based tests
– oozie-core, oozie-client, oozie-ws
32. Y! Oozie QE team
QE Architect
Jane Q. Chen qianchen@yahoo-inc.com
QE Engineer
Marcy Chen marchen@yahoo-inc.com
QE Engineer
Michelle Chiang mchiang@yahoo-inc.com
33. Acknowledgement
• All oozie developers in the community!
http://incubator.apache.org/oozie/
oozie-dev@incubator.apache.org
36. An Oozie Workflow
[Diagram: an example workflow. Nodes: start, FS job (mkdir), fork, MapReduce Streaming job, Pig job, join, Decision (Case1, Case2), MapReduce job, Java Action, FS job (chmod), end. Transitions between nodes are taken on OK.]
37. Oozie ‘Wordcount’ Workflow Example
• Non-Oozie (single map-reduce job)
From Gateway,
[yourid@gwgd2211 ~]$ hadoop jar hadoop-examples.jar wordcount
-Dmapred.job.queue.name=queue_name inputDir outputDir
• Oozie (Workflow.xml)
– Start → MapReduce wordcount → (OK) → End
– MapReduce wordcount → (ERROR) → Kill
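The control flow of such a workflow definition can be written out and checked as below. This is a sketch, not the presenter's Workflow.xml: the workflow name, the `${...}` parameter names, and the schema version are assumptions, and the mapper and reducer class properties are left out, so the fragment shows only the start, OK, and ERROR transitions.

```python
import xml.etree.ElementTree as ET

NS = {"wf": "uri:oozie:workflow:0.2"}

# Hypothetical workflow definition; mapper/reducer settings are omitted.
WORDCOUNT_WORKFLOW = """
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
        <property><name>mapred.input.dir</name><value>${inputDir}</value></property>
        <property><name>mapred.output.dir</name><value>${outputDir}</value></property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill"><message>wordcount failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def transitions(xml_text):
    """Return where the workflow goes from start, and on OK and ERROR."""
    root = ET.fromstring(xml_text.strip())
    action = root.find("wf:action", NS)
    return {
        "start": root.find("wf:start", NS).get("to"),
        "ok": action.find("wf:ok", NS).get("to"),
        "error": action.find("wf:error", NS).get("to"),
    }


print(transitions(WORDCOUNT_WORKFLOW))
```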
39. Integration tests
• Compatible with other components
• No system failures, e.g., NN, JT, Hcat_server
• Run standalone utility to narrow down issues
– For example, pig, distcp
• Check oozie’s launcher log on Jobtracker
40. Production environment
• Total number of nodes: 42K+
• Total number of Clusters: 25+
– 1 oozie server per cluster
• Total number of processed jobs ≈ 750K/month