2. Introductions
• Who I am
• Technical Lead at Yahoo!
• Oozie Team
• Architecture, Development, Management
– Mayank Bansal
– Angelo Huang
– Mohammad Islam
– Amol Kekre
– Andreas Newman
– Lei Zhang
• External contributors.
• QE
– Marcy Chang
– Michelle Chiang
5. Overview: Coordinator
• Oozie executes workflows based on:
– Time Dependency (Frequency)
– Data Dependency
• Introduced in Oozie 2.x.
[Diagram: Oozie Client, WS API, Oozie Server (Oozie Coordinator, Oozie Workflow), Check Data Availability, Hadoop]
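The two triggers above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oozie code: the function names `nominal_times` and `ready_actions`, the `data_available` callback, and the sample dates are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List


def nominal_times(start: datetime, end: datetime, frequency: timedelta) -> Iterator[datetime]:
    """Yield each scheduled time in [start, end), one per frequency step."""
    current = start
    while current < end:
        yield current
        current += frequency


def ready_actions(
    start: datetime,
    end: datetime,
    frequency: timedelta,
    now: datetime,
    data_available: Callable[[datetime], bool],
) -> List[datetime]:
    """Return scheduled times whose time AND data dependencies are both met."""
    return [
        t
        for t in nominal_times(start, end, frequency)
        if t <= now and data_available(t)
    ]


# Hypothetical example: an hourly schedule where input data has
# arrived only for the first two hours.
start = datetime(2011, 1, 1, 0)
end = datetime(2011, 1, 1, 4)
arrived = {datetime(2011, 1, 1, 0), datetime(2011, 1, 1, 1)}
ready = ready_actions(
    start, end, timedelta(hours=1), datetime(2011, 1, 1, 3), lambda t: t in arrived
)
```

Here the 02:00 and 03:00 runs are due by time but stay pending because their data has not arrived.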
6. Bundle
• What is Bundle?
– A new abstraction layer on top of Coordinator.
– Users can define and execute a bunch of
coordinator applications.
– Introduced in Oozie 3.x.
• Why is it required?
– Data pipeline: A set of interrelated coordinator
applications is required for large data processing.
– Operational nightmare: These pipelines are hard for
the Service Engineering team to maintain and
control.
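As a rough sketch of why a bundle eases operations, the Python below groups several coordinators so that one call controls the whole pipeline. The classes `Coordinator` and `Bundle`, their methods, and the coordinator names are hypothetical and do not reflect the Oozie API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Coordinator:
    name: str
    status: str = "PREP"


@dataclass
class Bundle:
    """A named group of coordinators that is controlled as one unit."""
    name: str
    coordinators: List[Coordinator] = field(default_factory=list)

    def start(self) -> None:
        # Start every coordinator that has not started yet.
        for coord in self.coordinators:
            if coord.status == "PREP":
                coord.status = "RUNNING"

    def suspend(self) -> None:
        # Suspend only the coordinators that are currently running.
        for coord in self.coordinators:
            if coord.status == "RUNNING":
                coord.status = "SUSPENDED"

    def kill(self) -> None:
        # Kill every coordinator in the bundle regardless of state.
        for coord in self.coordinators:
            coord.status = "KILLED"


# Hypothetical three-stage data pipeline managed as a single bundle.
pipeline = Bundle("daily-pipeline", [Coordinator("ingest"), Coordinator("aggregate"), Coordinator("report")])
pipeline.start()
pipeline.suspend()
statuses = [c.status for c in pipeline.coordinators]
```

Without the grouping, an operator would have to issue each command once per coordinator.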
10. Improved Usability
• Issue: The Coordinator job’s status is not intuitive and
confuses Oozie users.
• Impact: User confusion and a related Oozie
support load.
• Reason:
– Status SUCCEEDED doesn’t mean the job was successful!
– Status PREMATER is for Oozie internal use only, but it
was exposed to users.
• Resolution:
– Redesign Coordinator status
11. Coordinator Status Redesign
• Current: PREP, PREMATER, RUNNING, SUCCEEDED, FAILED, SUSPENDED, KILLED
• New: PREP, RUNNING, PAUSED, SUSPENDED, SUCCEEDED, DONE_WITH_ERROR, KILLED, FAILED
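To illustrate how a separate DONE_WITH_ERROR status removes the SUCCEEDED ambiguity, here is a minimal Python sketch. The enum lists the new statuses from this slide. The rule in `completed_status` is an assumption made for illustration, not the transition logic Oozie actually uses.

```python
from enum import Enum
from typing import Iterable


class CoordStatus(Enum):
    PREP = "PREP"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    SUSPENDED = "SUSPENDED"
    SUCCEEDED = "SUCCEEDED"
    DONE_WITH_ERROR = "DONE_WITH_ERROR"
    KILLED = "KILLED"
    FAILED = "FAILED"


def completed_status(action_succeeded: Iterable[bool]) -> CoordStatus:
    """Pick the status for a job whose actions have all finished.

    Assumed rule (hypothetical): SUCCEEDED only if every action
    succeeded; otherwise DONE_WITH_ERROR.
    """
    results = list(action_succeeded)
    if not results:
        raise ValueError("a completed job must have at least one action")
    if all(results):
        return CoordStatus.SUCCEEDED
    return CoordStatus.DONE_WITH_ERROR


all_ok = completed_status([True, True, True])
one_bad = completed_status([True, False, True])
```

Under this rule a finished job with any unsuccessful action no longer reports SUCCEEDED.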
12. Future Plan
• Higher Scalability: Change the polling‐based data‐dependency
check to a push model through HCatalog
and a Notification system.
• Adaptability: Graceful handling of Hadoop downtime:
– If Hadoop is down, block submission.
– When Hadoop becomes available
• Submit the blocked job
• Auto‐resubmit the untraced job.
• Monitoring: Rich WS API for application Monitoring/
Alerting.
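The block-then-submit behaviour planned for Hadoop downtime could look roughly like the Python sketch below. The class `SubmissionGate`, its methods, and the job IDs are hypothetical, and the sketch covers only the blocked-job path, not the auto-resubmit of untraced jobs.

```python
from collections import deque
from typing import Callable, Deque, List


class SubmissionGate:
    """Hold job submissions while Hadoop is down; release them when it is back."""

    def __init__(self, is_hadoop_up: Callable[[], bool], submit: Callable[[str], None]) -> None:
        self._is_up = is_hadoop_up
        self._submit = submit
        self._blocked: Deque[str] = deque()

    def request(self, job_id: str) -> bool:
        """Submit now if Hadoop is up, otherwise queue the job. True if submitted."""
        if self._is_up():
            self._submit(job_id)
            return True
        self._blocked.append(job_id)
        return False

    def on_available(self) -> int:
        """Submit queued jobs in arrival order; return how many were submitted."""
        count = 0
        while self._blocked and self._is_up():
            self._submit(self._blocked.popleft())
            count += 1
        return count


# Hypothetical run: two jobs arrive during downtime, then Hadoop recovers.
state = {"up": False}
submitted: List[str] = []
gate = SubmissionGate(lambda: state["up"], submitted.append)
gate.request("job-1")
gate.request("job-2")
state["up"] = True
flushed = gate.on_available()
```

Queuing in arrival order keeps the original submission order once the cluster returns.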