SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Intro to Airflow
Agenda
● Objective
● Workflow Concept
● What is Airflow
● DAG
● Task, Operator, Dependencies
● Workflow Airflow
● User Interface
● Hands On
Pre-Test
1. What is workflow ?
2. How we scheduled workflow?
● a sequence of tasks
● started on a schedule or triggered by an event
● frequently used to handle data / big data processing pipeline
Typical Workflow
1. download data from source
2. send data somewhere else to process
3. Monitor when the process is completed
4. Get the result and generate the report
5. Send the report out by email
Workflow Concept
4
● Writing a script to pull data from database and send it to HDFS to process.
● Schedule the script as a cronjob.
Traditional Workflow
5
Traditional Workflow (2)
6
Then Become Problem
7
Scalability in terms of managing → How do you manually manage scripts & Cron expressions for
hundreds of workflows?
Scalability in terms of execution → For the consideration of performance, you may want to run
your jobs on multiple worker nodes, how do you manage them?
Environment Dependencies → Different jobs may have different dependencies, e.g., Spark, or
network proxy, etc.
Connections to different systems → like RDBMS, AWS, Hive, HDFS, etc. All of them come together
with configurations like host address, port, id, password, schema, etc. How to manage them in a
centralised fashion?
Monitoring → How do we monitor the status of each step? Which batch job failed? Due to which
step? For what reason?
Re-running → How can we re-run a specific step? Manually do it or make ad-hoc change on the
script? Neither is ideal.
8
Apache Airflow is a platform to
programmaticaly author, schedule and
monitor workflows or data pipelines.
More like cronjobs but different
History Airflow
● The project joined the Apache Software Foundation’s incubation
program in 2016.
● Written in Python
● A workflow (data-pipeline) management system developed by Airbnb
○ A framework to define tasks & dependencies in python
○ Executing, scheduling, distributing tasks accross worker nodes.
○ View of present and past runs, logging feature
○ Extensible through plugins
○ Nice UI, possibility to define REST interface
○ Interact well with database
● Used by more than 200+ companies: Airbnb, Yahoo, Paypal, Intel,
Stripe,…
● 500+ contributors (according to GitHub history)
9
Why Airflow
10
What if? there are tools to manage collection of all the tasks/workflow that you want to run which is
organized and shows the relationship between different tasks and also show the status running
What makes Airflow Great
11
● Can handle upstream/downstream
dependencies gracefully (Example:
upstream missing tables)
● Easy to reprocess historical jobs by
date, or re-run for specific intervals
● Jobs can pass parameters to other
jobs downstream
● Handle errors and failures gracefully.
Automatically retry when a task fails.
● Ease of deployment of workflow
changes (continuous integration)
● Integrations with a lot of infrastructure
(Hive, Presto, Druid, AWS, Google
cloud, etc)
● Data sensors to trigger a DAG when
data arrives
● Job testing through airflow itself
● Accessibility of log files and other
metadata through the web GUI
● Implement trigger rules for tasks
● Monitoring all jobs status in real time
plus Email alerts
● Community support
Airflow Use Case Implementation
12
● Data warehousing: cleanse, organize, data quality check, and publish data into our
growing data warehouse
● Machine Learning: automate machine learning workflows
● Growth analytics: compute metrics around guest and host engagement as well as
growth accounting
● Experimentation: compute A/B testing experimentation frameworks logic and
aggregates
● Email targeting: apply rules to target and engage users through email campaigns
● Sessionization: compute clickstream and time spent datasets
● Search: compute search ranking related metrics
● Data infrastructure maintenance: database scrapes, folder cleanup, applying data
retention policies, …
● Workflows in Airflow are collections of tasks that have directional dependencies.
● Airflow uses DAG to represent a workflow.
● The graph is enforced to be acyclic so that there are no circular dependencies that can cause
infinite execution loops
● Each DAG has a set of properties:
○ dag_id, a unique identifier amongst all DAGs
○ start_date, the point in time at which the DAG’s tasks are to begin executing
○ schedule_interval, how often the tasks are to be executed.
● Nodes: A step in the data pipeline process/Task.
● Edges: The dependencies or relationships other between nodes.
DAG (Directed Acyclic Graph)
13
Task Operator
14
An Operator is conceptually a template for a predefined Task, that you can just define declaratively inside your DAG
● BashOperator - executes a bash command
● PythonOperator - calls an arbitrary Python function
● EmailOperator - sends an email
● SimpleHttpOperator
● MySqlOperator
● PostgresOperator
● MsSqlOperator
● OracleOperator
● JdbcOperator
● DockerOperator
● HiveOperator
● S3FileTransformOperator
● PrestoToMySqlOperator
● SlackAPIOperator
Airflow has a very extensive set of operators available, with some built-in
to the core or pre-installed providers. Some popular operators from core
include:
Task Instances
15
● none: The Task has not yet been queued
for execution (its dependencies are not
yet met)
● scheduled: The scheduler has
determined the Task’s dependencies are
met and it should run
● queued: The task has been assigned to an
Executor and is awaiting a worker
● running: The task is running on a worker
(or on a local/synchronous executor)
● success: The task finished running
without errors
● shutdown: The task was externally
requested to shut down when it was
running
● restarting: The task was externally requested to restart when it
was running
● failed: The task had an error during execution and failed to run
● skipped: The task was skipped due to branching, LatestOnly, or
similar.
● upstream_failed: An upstream task failed and the Trigger Rule
says we needed it
● up_for_retry: The task failed, but has retry attempts left and will
be rescheduled.
● up_for_reschedule: The task is a Sensor that is in reschedule
mode
● sensing: The task is a Smart Sensor
● deferred: The task has been deferred to a trigger
● removed: The task has vanished from the DAG since the run
started
Task Instances (2)
16
● Upstream task: A task that must reach a specified state before a dependent task can run
● Downstream task: A dependent task that cannot run until an upstream task reaches a specified
state
Basic dependencies between Airflow tasks can be set in two ways:
● Using bitshift operators (<< and >>)
● Using the set_upstream and set_downstream methods
Task Dependencies
17
Task Dependencies (2)
18
Dependencies with tuples & list
DAG (Example)
19
● Scheduler orchestrates the execution of jobs on a trigger or
schedule. The Scheduler chooses how to prioritize the running and
execution of tasks within the system.
● Work Queue is used by the scheduler in most Airflow installations to
deliver tasks that need to be run to the Workers.
● Worker processes execute the operations defined in each DAG. In
most Airflow installations, workers pull from the work queue when it
is ready to process a task. When the worker completes the execution
of the task, it will attempt to process more work from the work queue
until there is no further work remaining. When work in the queue
arrives, the worker will begin to process it.
● Database saves credentials, connections, history, and configuration.
The database, often referred to as the metadata database, also stores
the state of all tasks in the system. Airflow components interact with
the database with the Python ORM, SQLAlchemy.
● Web Interface provides a control dashboard for users and
maintainers. Throughout this course you will see how the web
interface allows users to perform tasks such as stopping and starting
DAGs, retrying failed tasks, configuring credentials, The web interface
is built using the Flask web-development microframework.
Airflow Core & Components
20
Airflow Workflow
21
UI Grid View
22
A bar chart and grid representation of the DAG that spans across
time. The top row is a chart of DAG Runs by duration, and below,
task instances. If a pipeline is late, you can quickly see where the
different steps are and identify the blocking ones.
The details panel wlil update when selecting a DAG Run by clicking
on a duration bar:
UI Graph View
23
The graph view is perhaps the most comprehensive. Visualize your DAG’s dependencies and their current
status for a specific run.
UI Tree View
24
The tree view also represents the DAG. If you think your pipeline took a longer time to execute than
expected then you can check which part is taking a long time to execute and then you can work on it.
Documentation
https://airflow.apache.org/docs/
Hands On
● install airflow on docker →
https://www.applydatascience.com/airflow/set-up-airflow-
env-with-docker/
● create DAG →
https://medium.com/analytics-vidhya/getting-started-with
-airflow-b0042cae3ebf
Documentation & Hands On
25
Exercise
26
1) Which graph(s) shown above are directed acyclic graphs (DAG)?
a) Graph 1
b) Graph 2
c) Graph 3
Exercise
27
2) What is the component name that pointed by an arrow? 3) Explain these operators
● BashOperator
● PythonOperator
● EmailOperator
● SimpleHttpOperator
● MySqlOperator
4) Which of the following constructs a DAG that runs task “B”, then “C”, then “A”?
a. A >> C >> B
b. B >> C >> A
c. C >> A >> B
d. B >> A >> C
5) Write the dependencies from this DAG Graph
Exercise
28
6) What do the Airflow Workers do?
Exercise
29

Contenu connexe

Tendances

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow ArchitectureGerard Toonstra
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSDerrick Qin
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow IntroductionLiangjun Jiang
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_onpko89403
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowPyData
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowSid Anand
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Kaxil Naik
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Sid Anand
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in ProductionRobert Sanders
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Tendances (20)

Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Similaire à Airflow Intro-1.pdf

Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Noam Elfanbaum
 
Managing transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache AirflowManaging transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache AirflowMichael Ghen
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxJohn J Zhao
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Kaxil Naik
 
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiao
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiaoadaidoadaoap9dapdadadjoadjoajdoiajodiaoiao
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiaolyvanlinh519
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowLaura Lorenz
 
airflow web UI and CLI.pptx
airflow web UI and CLI.pptxairflow web UI and CLI.pptx
airflow web UI and CLI.pptxVIJAYAPRABAP
 
airflowpresentation1-180717183432.pptx
airflowpresentation1-180717183432.pptxairflowpresentation1-180717183432.pptx
airflowpresentation1-180717183432.pptxVIJAYAPRABAP
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyftTao Feng
 
Introduce Airflow.ppsx
Introduce Airflow.ppsxIntroduce Airflow.ppsx
Introduce Airflow.ppsxManKD
 
Data Engineer's Lunch #44: Prefect
Data Engineer's Lunch #44: PrefectData Engineer's Lunch #44: Prefect
Data Engineer's Lunch #44: PrefectAnant Corporation
 
Serverless - DevOps Lessons Learned From Production
Serverless - DevOps Lessons Learned From ProductionServerless - DevOps Lessons Learned From Production
Serverless - DevOps Lessons Learned From ProductionSteve Hogg
 
Bots on guard of sdlc
Bots on guard of sdlcBots on guard of sdlc
Bots on guard of sdlcAlexey Tokar
 
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...Quantyca - Data at Core
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Bolke de Bruin
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...NETWAYS
 

Similaire à Airflow Intro-1.pdf (20)

Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
 
Managing transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache AirflowManaging transactions on Ethereum with Apache Airflow
Managing transactions on Ethereum with Apache Airflow
 
DataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptxDataPipelineApacheAirflow.pptx
DataPipelineApacheAirflow.pptx
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiao
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiaoadaidoadaoap9dapdadadjoadjoajdoiajodiaoiao
adaidoadaoap9dapdadadjoadjoajdoiajodiaoiao
 
Airflow 4 manager
Airflow 4 managerAirflow 4 manager
Airflow 4 manager
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
airflow web UI and CLI.pptx
airflow web UI and CLI.pptxairflow web UI and CLI.pptx
airflow web UI and CLI.pptx
 
Dataflow.pptx
Dataflow.pptxDataflow.pptx
Dataflow.pptx
 
airflowpresentation1-180717183432.pptx
airflowpresentation1-180717183432.pptxairflowpresentation1-180717183432.pptx
airflowpresentation1-180717183432.pptx
 
Airflow at lyft
Airflow at lyftAirflow at lyft
Airflow at lyft
 
Introduce Airflow.ppsx
Introduce Airflow.ppsxIntroduce Airflow.ppsx
Introduce Airflow.ppsx
 
Data Engineer's Lunch #44: Prefect
Data Engineer's Lunch #44: PrefectData Engineer's Lunch #44: Prefect
Data Engineer's Lunch #44: Prefect
 
GoDocker presentation
GoDocker presentationGoDocker presentation
GoDocker presentation
 
Spring batch overivew
Spring batch overivewSpring batch overivew
Spring batch overivew
 
Serverless - DevOps Lessons Learned From Production
Serverless - DevOps Lessons Learned From ProductionServerless - DevOps Lessons Learned From Production
Serverless - DevOps Lessons Learned From Production
 
Bots on guard of sdlc
Bots on guard of sdlcBots on guard of sdlc
Bots on guard of sdlc
 
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
Create a One Click Migration (OCM) process to Automate Repeatable Infrastruct...
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
 
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
OSMC 2018 | Stream connector: Easily sending events and/or metrics from the C...
 

Dernier

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 

Dernier (20)

Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 

Airflow Intro-1.pdf

  • 2. Agenda ● Objective ● Workflow Concept ● What is Airflow ● DAG ● Task, Operator, Dependencies ● Workflow Airflow ● User Interface ● Hands On
  • 3. Pre-Test 1. What is workflow ? 2. How we scheduled workflow?
  • 4. ● a sequence of tasks ● started on a schedule or triggered by an event ● frequently used to handle data / big data processing pipeline Typical Workflow 1. download data from source 2. send data somewhere else to process 3. Monitor when the process is completed 4. Get the result and generate the report 5. Send the report out by email Workflow Concept 4
  • 5. ● Writing a script to pull data from database and send it to HDFS to process. ● Schedule the script as a cronjob. Traditional Workflow 5
  • 7. Then Become Problem 7 Scalability in terms of managing → How do you manually manage scripts & Cron expressions for hundreds of workflows? Scalability in terms of execution → For the consideration of performance, you may want to run your jobs on multiple worker nodes, how do you manage them? Environment Dependencies → Different jobs may have different dependencies, e.g., Spark, or network proxy, etc. Connections to different systems → like RDBMS, AWS, Hive, HDFS, etc. All of them come together with configurations like host address, port, id, password, schema, etc. How to manage them in a centralised fashion? Monitoring → How do we monitor the status of each step? Which batch job failed? Due to which step? For what reason? Re-running → How can we re-run a specific step? Manually do it or make ad-hoc change on the script? Neither is ideal.
  • 8. 8 Apache Airflow is a platform to programmaticaly author, schedule and monitor workflows or data pipelines. More like cronjobs but different
  • 9. History Airflow ● The project joined the Apache Software Foundation’s incubation program in 2016. ● Written in Python ● A workflow (data-pipeline) management system developed by Airbnb ○ A framework to define tasks & dependencies in python ○ Executing, scheduling, distributing tasks accross worker nodes. ○ View of present and past runs, logging feature ○ Extensible through plugins ○ Nice UI, possibility to define REST interface ○ Interact well with database ● Used by more than 200+ companies: Airbnb, Yahoo, Paypal, Intel, Stripe,… ● 500+ contributors (according to GitHub history) 9
  • 10. Why Airflow 10 What if? there are tools to manage collection of all the tasks/workflow that you want to run which is organized and shows the relationship between different tasks and also show the status running
  • 11. What makes Airflow Great 11 ● Can handle upstream/downstream dependencies gracefully (Example: upstream missing tables) ● Easy to reprocess historical jobs by date, or re-run for specific intervals ● Jobs can pass parameters to other jobs downstream ● Handle errors and failures gracefully. Automatically retry when a task fails. ● Ease of deployment of workflow changes (continuous integration) ● Integrations with a lot of infrastructure (Hive, Presto, Druid, AWS, Google cloud, etc) ● Data sensors to trigger a DAG when data arrives ● Job testing through airflow itself ● Accessibility of log files and other metadata through the web GUI ● Implement trigger rules for tasks ● Monitoring all jobs status in real time plus Email alerts ● Community support
  • 12. Airflow Use Case Implementation 12 ● Data warehousing: cleanse, organize, data quality check, and publish data into our growing data warehouse ● Machine Learning: automate machine learning workflows ● Growth analytics: compute metrics around guest and host engagement as well as growth accounting ● Experimentation: compute A/B testing experimentation frameworks logic and aggregates ● Email targeting: apply rules to target and engage users through email campaigns ● Sessionization: compute clickstream and time spent datasets ● Search: compute search ranking related metrics ● Data infrastructure maintenance: database scrapes, folder cleanup, applying data retention policies, …
  • 13. ● Workflows in Airflow are collections of tasks that have directional dependencies. ● Airflow uses DAG to represent a workflow. ● The graph is enforced to be acyclic so that there are no circular dependencies that can cause infinite execution loops ● Each DAG has a set of properties: ○ dag_id, a unique identifier amongst all DAGs ○ start_date, the point in time at which the DAG’s tasks are to begin executing ○ schedule_interval, how often the tasks are to be executed. ● Nodes: A step in the data pipeline process/Task. ● Edges: The dependencies or relationships other between nodes. DAG (Directed Acyclic Graph) 13
  • 14. Task Operator 14 An Operator is conceptually a template for a predefined Task, that you can just define declaratively inside your DAG ● BashOperator - executes a bash command ● PythonOperator - calls an arbitrary Python function ● EmailOperator - sends an email ● SimpleHttpOperator ● MySqlOperator ● PostgresOperator ● MsSqlOperator ● OracleOperator ● JdbcOperator ● DockerOperator ● HiveOperator ● S3FileTransformOperator ● PrestoToMySqlOperator ● SlackAPIOperator Airflow has a very extensive set of operators available, with some built-in to the core or pre-installed providers. Some popular operators from core include:
  • 15. Task Instances 15 ● none: The Task has not yet been queued for execution (its dependencies are not yet met) ● scheduled: The scheduler has determined the Task’s dependencies are met and it should run ● queued: The task has been assigned to an Executor and is awaiting a worker ● running: The task is running on a worker (or on a local/synchronous executor) ● success: The task finished running without errors ● shutdown: The task was externally requested to shut down when it was running ● restarting: The task was externally requested to restart when it was running ● failed: The task had an error during execution and failed to run ● skipped: The task was skipped due to branching, LatestOnly, or similar. ● upstream_failed: An upstream task failed and the Trigger Rule says we needed it ● up_for_retry: The task failed, but has retry attempts left and will be rescheduled. ● up_for_reschedule: The task is a Sensor that is in reschedule mode ● sensing: The task is a Smart Sensor ● deferred: The task has been deferred to a trigger ● removed: The task has vanished from the DAG since the run started
  • 17. ● Upstream task: A task that must reach a specified state before a dependent task can run ● Downstream task: A dependent task that cannot run until an upstream task reaches a specified state Basic dependencies between Airflow tasks can be set in two ways: ● Using bitshift operators (<< and >>) ● Using the set_upstream and set_downstream methods Task Dependencies 17
  • 20. ● Scheduler orchestrates the execution of jobs on a trigger or schedule. The Scheduler chooses how to prioritize the running and execution of tasks within the system. ● Work Queue is used by the scheduler in most Airflow installations to deliver tasks that need to be run to the Workers. ● Worker processes execute the operations defined in each DAG. In most Airflow installations, workers pull from the work queue when it is ready to process a task. When the worker completes the execution of the task, it will attempt to process more work from the work queue until there is no further work remaining. When work in the queue arrives, the worker will begin to process it. ● Database saves credentials, connections, history, and configuration. The database, often referred to as the metadata database, also stores the state of all tasks in the system. Airflow components interact with the database with the Python ORM, SQLAlchemy. ● Web Interface provides a control dashboard for users and maintainers. Throughout this course you will see how the web interface allows users to perform tasks such as stopping and starting DAGs, retrying failed tasks, configuring credentials, The web interface is built using the Flask web-development microframework. Airflow Core & Components 20
  • 22. UI Grid View 22 A bar chart and grid representation of the DAG that spans across time. The top row is a chart of DAG Runs by duration, and below, task instances. If a pipeline is late, you can quickly see where the different steps are and identify the blocking ones. The details panel wlil update when selecting a DAG Run by clicking on a duration bar:
  • 23. UI Graph View 23 The graph view is perhaps the most comprehensive. Visualize your DAG’s dependencies and their current status for a specific run.
  • 24. UI Tree View 24 The tree view also represents the DAG. If you think your pipeline took a longer time to execute than expected then you can check which part is taking a long time to execute and then you can work on it.
  • 25. Documentation https://airflow.apache.org/docs/ Hands On ● install airflow on docker → https://www.applydatascience.com/airflow/set-up-airflow- env-with-docker/ ● create DAG → https://medium.com/analytics-vidhya/getting-started-with -airflow-b0042cae3ebf Documentation & Hands On 25
  • 26. Exercise 26 1) Which graph(s) shown above are directed acyclic graphs (DAG)? a) Graph 1 b) Graph 2 c) Graph 3
  • 27. Exercise 27 2) What is the component name that pointed by an arrow? 3) Explain these operators ● BashOperator ● PythonOperator ● EmailOperator ● SimpleHttpOperator ● MySqlOperator
  • 28. 4) Which of the following constructs a DAG that runs task “B”, then “C”, then “A”? a. A >> C >> B b. B >> C >> A c. C >> A >> B d. B >> A >> C 5) Write the dependencies from this DAG Graph Exercise 28
  • 29. 6) What do the Airflow Workers do? Exercise 29