Azkaban is a batch workflow job scheduler created by LinkedIn to run Hadoop jobs. It features a simple web UI, scheduling of workflows, tracking of user actions, and email alerts. While it has a small community and only time-based scheduling, it can run command-line processes, making it a good alternative to cron. The architecture includes a solo server mode for investigation and a two server mode for production.
1. Azkaban from
Solve the problem of Hadoop job dependencies
Now Voldemort can easily manage his Hadoop jobs
Anatoliy Nikulin
2. Overview
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs
Features:
● Compatible with any version of Hadoop
● Easy to use web UI
● Simple web and HTTP workflow uploads
● Project workspaces
● Scheduling of workflows
● Modular and pluggable
● Authentication and Authorization
● Tracking of user actions
● Email alerts on failure and success
● SLA alerting and auto killing
● Retrying of failed jobs
4. Azkaban Pros/Cons
Pros:
● Simple workflow configuration
● Rich DAG visualization
● User-friendly Web UI
● Jobs history
● Easy access to log files
Cons:
● Small community (mostly LinkedIn)
● Only time-based scheduling
● Unable to run non-Hadoop tasks in distributed mode
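As an illustration of the "simple workflow configuration" point: in Azkaban's classic flow format each job is a small key=value `.job` file, and a `dependencies` line names the jobs that must finish first. The sketch below is not Azkaban code. It only shows, under that assumption, how such definitions imply a DAG and a run order. The job names and commands (`extract`, `transform`, `load`, `report`) are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical job definitions keyed by job name. Each value mimics the
# key=value text that would live in a file such as extract.job.
JOBS = {
    "extract": "type=command\ncommand=echo extract",
    "transform": "type=command\ncommand=echo transform\ndependencies=extract",
    "load": "type=command\ncommand=echo load\ndependencies=transform",
    "report": "type=command\ncommand=echo report\ndependencies=transform",
}


def parse_job(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def dependency_graph(jobs):
    """Map each job name to the set of jobs it depends on."""
    graph = {}
    for name, text in jobs.items():
        deps = parse_job(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    return graph


def run_order(jobs):
    """Return one valid execution order (dependencies come first)."""
    return list(TopologicalSorter(dependency_graph(jobs)).static_order())


if __name__ == "__main__":
    print(run_order(JOBS))
```

Several orders can be valid here, since `load` and `report` depend only on `transform` and are independent of each other.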
5. Architecture
There are two versions:
● solo server mode - All in one process (H2 instead of MySQL). Good choice for investigation
● two server mode - For production work
6. What about non-Hadoop jobs?
Azkaban is able to handle them
● It can run command-line processes
● Good alternative to crontab
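A minimal sketch of what a cron-style, non-Hadoop workflow could look like, assuming Azkaban's classic `.job` format (`type=command`, `command=...`, optional `dependencies=...`) and that a project is uploaded as a zip archive of those files. The job names `cleanup` and `backup` and their commands are made up for illustration, and the helper functions are not part of Azkaban.

```python
import io
import zipfile


def job_file(command, dependencies=()):
    """Render the text of a command-type .job file."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


def build_project_zip(jobs):
    """Pack {job_name: job_text} into an in-memory zip, one .job file per job."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in jobs.items():
            zf.writestr(f"{name}.job", text)
    return buf.getvalue()


# Hypothetical two-step flow: backup runs only after cleanup succeeds.
jobs = {
    "cleanup": job_file("echo 'removing old files'"),
    "backup": job_file("echo 'running backup'", dependencies=["cleanup"]),
}
archive = build_project_zip(jobs)

if __name__ == "__main__":
    with open("nightly.zip", "wb") as fh:  # file name is an arbitrary choice
        fh.write(archive)
```

The resulting archive would then be uploaded to a project and given a time-based schedule in Azkaban itself, which is where it takes over the role that crontab entries would otherwise play.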
8. What about native Hadoop scheduler?
Oozie - Scheduler framework. Also a good tool
Pros:
● Rich and very powerful configuration abilities for Workflow
● Rich API (REST, command-line)
● Integrated with Cloudera
● Large community
● Good documentation
Cons:
● Complex configuration with XML hell!
● Poor visualization of workflow