Ce diaporama a bien été signalé.
Le téléchargement de votre SlideShare est en cours. ×

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Chargement dans…3
×

Consultez-les par la suite

1 sur 26 Publicité

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

Télécharger pour lire hors ligne

Running scheduled, long-running or repetitive workflows on Hadoop clusters, especially secure clusters, is the domain of Apache Oozie. Oozie, however, suffers from XML for job configuration and a dated UI -- very bad usability in all. Apache Ambari, in its quest to make cluster management easier, has branched out to offering views for user services. This talk covers the Ambari Workflow Manager view which provides a GUI to author and visualize Oozie jobs.

To provide an example of Workflow Manager, Oozie jobs for log management and HBase compactions will be demonstrated showing off how easy Oozie can now be and what the exciting future for Oozie and Workflow Manager holds.

Apache Oozie is the long-time incumbent in big data processing. It is known to be hard to use and the interface is not aesthetically pleasing -- Oozie suffers from a dated UI. However, for secure Hadoop clusters, Oozie is the most readily available, obvious and full featured solution.

Apache Ambari is a deployment and configuration management tool used to deploy Hadoop clusters. Ambari Workflow Manager is a new Ambari view that helps address the usability and UI appeal of Apache Oozie.

In this talk, we’re going to leverage the stable foundation of Apache Oozie and clarity of Workflow Manager to demonstrate how one can build powerful batch workflows on top of Apache Hadoop. We’re also going to cover future roadmap and vision for both Apache Oozie and Workflow Manager. We will finish off with a live demo of Workflow Manager in action.

Speaker
Artem Ervits, Solutions Engineer, Hortonworks
Clay Baenziger, Hadoop Infrastructure, Bloomberg

Running scheduled, long-running or repetitive workflows on Hadoop clusters, especially secure clusters, is the domain of Apache Oozie. Oozie, however, suffers from XML for job configuration and a dated UI -- very bad usability in all. Apache Ambari, in its quest to make cluster management easier, has branched out to offering views for user services. This talk covers the Ambari Workflow Manager view which provides a GUI to author and visualize Oozie jobs.

To provide an example of Workflow Manager, Oozie jobs for log management and HBase compactions will be demonstrated showing off how easy Oozie can now be and what the exciting future for Oozie and Workflow Manager holds.

Apache Oozie is the long-time incumbent in big data processing. It is known to be hard to use and the interface is not aesthetically pleasing -- Oozie suffers from a dated UI. However, for secure Hadoop clusters, Oozie is the most readily available, obvious and full featured solution.

Apache Ambari is a deployment and configuration management tool used to deploy Hadoop clusters. Ambari Workflow Manager is a new Ambari view that helps address the usability and UI appeal of Apache Oozie.

In this talk, we’re going to leverage the stable foundation of Apache Oozie and clarity of Workflow Manager to demonstrate how one can build powerful batch workflows on top of Apache Hadoop. We’re also going to cover future roadmap and vision for both Apache Oozie and Workflow Manager. We will finish off with a live demo of Workflow Manager in action.

Speaker
Artem Ervits, Solutions Engineer, Hortonworks
Clay Baenziger, Hadoop Infrastructure, Bloomberg

Publicité
Publicité

Plus De Contenu Connexe

Diaporamas pour vous (20)

Similaire à Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager (20)

Publicité

Plus par DataWorks Summit (20)

Publicité

Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager

  1. 1. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 April 19, 2018 Artem Ervits – Hortonworks Clay Baenziger – Bloomberg Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
  2. 2. © 2018 Bloomberg Finance L.P. All rights reserved. Poll: • Who here uses Oozie? — In production? — Do you use HUE with Oozie? — How many workflows have you in production? 1-10? 10-50? 50+? — How many actions does the largest workflow contain? 1-10? 10-50? 50+? — Do you use Oozie with (or want to)? HBase? Spark? Python? Deployment Automation? • Do you like XML? — Do you have a favorite editor for Oozie workflows?
  3. 3. © 2018 Bloomberg Finance L.P. All rights reserved. Open Source Workflow Managers • Apache Airflow (Incubating) • Luigi by Spotify • Azkaban by LinkedIn • (And of course) Apache Oozie
  4. 4. © 2018 Bloomberg Finance L.P. All rights reserved. Introduction to Oozie • Oozie is a workflow scheduler system to manage Apache Hadoop jobs. • Oozie workflow jobs are Directed Acyclical Graphs (DAGs) of actions. • Oozie coordinator jobs are recurrent Oozie workflow jobs triggerd by time and data availability. • Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as system specific jobs out of the box. • Oozie is a scalable, reliable and extensible system. - Paraphrased from http://oozie.apache.org Actions: • Map/Reduce • Hive • Pig • HDFS • Java • Shell • Spark • Sub-Workflow • E-Mail • Decision • Fork • Join
  5. 5. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Release Timeline • 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs. • 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs. • 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials. • 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability. • 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, YARN AM instead of MR launcher, new actions, code clean up. - Adopted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
  6. 6. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints • Launcher jobs as map tasks • Dated UI • Confusing object model – workflows, coordinators, bundles • Complicated setup • XML • DAG visualization • SLA alerting • Fine grained authorization • Easy access to log files
  7. 7. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Complaints Improvements • Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770 • Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today) • Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339 • Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666 • XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339 • DAG visualization – solved by Oozie 5.0.0, OOZIE-2406 • SLA alerting – since Oozie 4.0.0, OOZIE-1294 • Fine grained authorization – targeted for Oozie 5.X, OOZIE-3196 • Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
  8. 8. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Prior to Release 5.0 • MR launcher job
  9. 9. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Launcher – Release 5.0 • OYA: OOZIE-1770: Create Oozie Application Master for YARN — Removes MR launcher job • Design Doc
  10. 10. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Documentation – Before Release 5.0 and After Documentation redesign OOZIE-3163: Improve documentation rendering: use fluido skin and better config
  11. 11. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Workflow Visualiztion – Prior to 5.0 and After Jung GraphViz OOZIE-2406: Completely rewrite GraphGenerator code
  12. 12. © 2018 Bloomberg Finance L.P. All rights reserved. Oozie Fluent Job API – Apache Oozie 5.X (Preview) OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
  13. 13. © 2018 Bloomberg Finance L.P. All rights reserved. Apache Ambari Ambari Provides: • Provisioning of a Hadoop Cluster • Management of a Hadoop Cluster • Monitoring of a Hadoop Cluster — A Metrics System for metrics collection — An Alert Framework — A dashboard for monitoring the Hadoop cluster -Paraphrased from http://ambari.apache.org
  14. 14. © 2018 Bloomberg Finance L.P. All rights reserved. Ambari Views • Ambari Views ”offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A "view" is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.” • Key take-aways: — One does not need an Ambari managed (administrated) cluster — Third parties can build views packages to run in the Ambari framework too — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager • Alternatives: Cloudera Hue, bespoke applications
  15. 15. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Motivation • Oozie workflows are defined in XML – too verbose — Provide GUI workflow builder and editor — Reduce possibility of user introduced errors — Provide browser based workflow manager • Integration with File Browser (includes S3 support) — Tighter integration with Ambari in future — Can replace existing Oozie web UI • Oozie is hard-coded to display only 25 actions — WFM doesn’t have this limit; tested with 300+ action nodes • Oozie is scalable — Can scale WFM by standing-up multiple Ambari Views servers
  16. 16. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Designer Example Workflow Manager: • Available as an Ambari View • Enables visual editing of Oozie workflows • Integrated with file browser • Reduces user input errors • Minimal input required
  17. 17. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Execution View • Integrated Dashboard with Workflow Manager View • Manage Oozie jobs • Drill down to logs
  18. 18. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Design Component
  19. 19. © 2018 Bloomberg Finance L.P. All rights reserved. Workflow Manager – Workflow Dashboard Component Good Documentation: HDP 2.6 – Workflow Manager Basics
  20. 20. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 • Setup • Data Definition • Compactions Workflow Manager Examples with HBase
  21. 21. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Setup Oozie needs HBase Confguration: • Oozie Server Code (to support Hbase delegation tokens) — In libexec (see Server JARs list) — In oozie-site.xml <name>oozie.credentials.credentialclasses</name> <value>hbase=org.apache.oozie.action.hadoop.HbaseCredentials,…</value> </name> • Client Workflow Code: — Add to workflow.xml: <credentials> <credential name=”myhbase_creds” type=”hbase”> […] </credential> </credentials> — All your normal HBase security settings in the credential section • Server JARs: (Copy the following to Oozie’s libexec) — hbase-common.jar — hbase-client.jar — hbase-server.jar — hbase-protocol.jar — hbase-hadoop2-compat.jar
  22. 22. © 2018 Bloomberg Finance L.P. All rights reserved. create_my_table.rb: tables = list tables.select { |table| table.eql?('my_table') } if tables.empty? create 'my_table', {NAME => 'my_col'} end exit HBase – Data Definition HBase Shell: <action name="HBASE-Shell" cred="hbase_creds"> <shell xmlns="uri:oozie:shell-action:0.1"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <exec>hbase</exec> <argument>shell</argument> <argument>-n</argument> <argument>create_my_table.rb</argument> </shell> <ok to="do_more_things"/> <error to="fail"/> </action>
  23. 23. © 2018 Bloomberg Finance L.P. All rights reserved. HBase – Compactions HBASE-19528: Major Compaction Tool • Automatically scales compaction to selected number of servers • Requires read ability to /hbase usage: MajorCompactor [-cf <arg>] [-dryRun] -servers <arg> -table <arg> [...] Usage instructions -cf <arg> column families: comma separated eg: a,b,c -dryRun Dry run, will just output a list of regions that require compaction based on parameters passed -minModTime <arg> Compact if store files have modification time < minModTime -servers <arg> Concurrent servers compacting -table <arg> table name ...
  24. 24. © 2018 Bloomberg Finance L.P. All rights reserved. More Resources • Oozie Examples: https://github.com/dbist/oozie-examples • Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html • Artem’s 12 Part Series on WFM: http://bit.ly/2syKUIh • Clay’s Past Oozie presentations: — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
  25. 25. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Demo!
  26. 26. © 2018 Bloomberg Finance L.P. All rights reserved. DataWorks Summit Berlin 2018 Questions?

×