Apache Airflow is a Python-based workflow management system that can be used to actively monitor and execute transactions on blockchain networks like Ethereum. This presentation is an introduction to Apache Airflow followed by a demonstration of a production deployment. Apache Airflow is an excellent tool for anyone already familiar with Python, and its ability to process jobs and handle errors makes it a good choice for managing activity on blockchain networks. The goal of this talk is to demonstrate how Apache Airflow can be used for environmental scanning and for batch processing transactions. The demonstration covers using Airflow and Python to monitor and execute ERC20 token transactions on the Ethereum blockchain.
2. Managing Transactions on Ethereum with Apache Airflow
Current:
● Mining Pool Operator
● Ph.D. Student at Drexel University
Previous:
● Data Architect at Benefits Data Trust
● Data Platform Engineer at Cohealo
● Systems Engineer at Brandeis University
● Introduction to Ethereum
● Introduction to Apache Airflow
○ Core Ideas
● Airflow in Action
○ Complete Example
● Journey to Airflow
3. Ethereum is a Public Computing Platform
● Ethereum can be viewed as a transaction-based state machine
● Begin with a genesis state and incrementally execute transactions to morph it into some final state
6. Ether (ETH) is the currency for purchasing resources
Ether is meant to be used to pay for running smart contracts, which are computer programs that run on an emulated computer called the Ethereum Virtual Machine (EVM)
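As a concrete illustration of interacting with Ethereum from Python, here is a minimal sketch using the web3.py library. The library choice, node URL, and address are assumptions for illustration; the slides do not name the tooling.

# A minimal sketch using web3.py (assumed tooling, not named in the
# slides) to read an account's Ether balance from a local node.
from web3 import Web3

# The node URL and the all-zeros address below are placeholders.
w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
balance_wei = w3.eth.getBalance("0x0000000000000000000000000000000000000000")
print(Web3.fromWei(balance_wei, "ether"), "ETH")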
8. Apache Airflow is a Workflow Management System
● A DAG, or Directed Acyclic Graph, is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies
● While DAGs describe how to run a workflow, Operators determine what actually gets done
● Once an operator is instantiated, it is referred to as a task
Airflow is a platform to programmatically author, schedule, and monitor workflows. Workflows are authored using Python.
12. Core Ideas: DAG
● A DAG describes how you want to carry out your workflow
● DAGs are defined in standard Python files that are placed in Airflow's DAG_FOLDER
● You can have as many DAGs as you want, each describing an arbitrary number of tasks
● In general, each one should correspond to a single logical workflow (see the sketch below)
https://airflow.apache.org/concepts.html#core-ideas
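For illustration, a minimal DAG definition file might look like the following sketch. The dag_id, schedule, and task are hypothetical, not taken from the talk.

# A minimal sketch of a DAG file placed in Airflow's DAG_FOLDER
# (names like "monitor_eth_wallet" are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="monitor_eth_wallet",      # one DAG per logical workflow
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",      # cron-style schedule
)

# Instantiating an operator with a dag makes it a task (a node in the DAG).
check_balance = BashOperator(
    task_id="check_balance",
    bash_command="echo 'checking wallet balance'",
    dag=dag,
)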
13. Core Ideas: Operators
● An operator describes a single task in a workflow: what that task does
● In general, if two operators need to share information, like a filename or a small amount of data, you should consider combining them into a single operator
● Airflow does have a feature for operator cross-communication called XCom (sketched after the list below)
https://airflow.apache.org/concepts.html#core-ideas
BashOperator - executes a bash command
PythonOperator - calls an arbitrary Python function
EmailOperator - sends an email
SimpleHttpOperator - sends an HTTP request
MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. - executes a SQL command
Sensor - waits for a certain time, file, database row, S3 key, etc.
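Here is a sketch of two PythonOperator tasks sharing a small value via XCom. The task names, values, and the `dag` object (as in the earlier sketch) are assumptions for illustration.

# A sketch of XCom: the upstream task's return value is pushed to
# XCom automatically, and the downstream task pulls it.
from airflow.operators.python_operator import PythonOperator

def fetch_gas_price(**context):
    gas_price = 42  # placeholder for a real lookup
    return gas_price  # return values are pushed to XCom

def report_gas_price(**context):
    # Pull the small value pushed by the upstream task.
    price = context["ti"].xcom_pull(task_ids="fetch_gas_price")
    print("current gas price:", price)

fetch = PythonOperator(
    task_id="fetch_gas_price",
    python_callable=fetch_gas_price,
    provide_context=True,  # needed on Airflow 1.x for **context
    dag=dag,               # assumes a `dag` object as sketched earlier
)

report = PythonOperator(
    task_id="report_gas_price",
    python_callable=report_gas_price,
    provide_context=True,
    dag=dag,
)

fetch >> report  # run report after fetch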
14. Core Ideas: Hooks
● Hooks implement a common interface when possible, and act as a building block for operators
● Hooks keep authentication code and information out of pipelines, centralized in the metadata database
https://airflow.apache.org/concepts.html#core-ideas
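A hook in action might look like the following sketch. The connection id "eth_metrics_db" and the query are hypothetical; the point is that credentials come from the Connection stored in the metadata database, so none appear in the pipeline code.

# A sketch of a hook inside a task callable: PostgresHook resolves
# credentials from the metadata database by connection id.
from airflow.hooks.postgres_hook import PostgresHook

def record_latest_block(**context):
    # "eth_metrics_db" is a hypothetical Connection id; no credentials
    # live in this file.
    hook = PostgresHook(postgres_conn_id="eth_metrics_db")
    rows = hook.get_records("SELECT max(block_number) FROM blocks")
    print(rows)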
15. Core Ideas: Tasks and Task Instances
● Once an operator is instantiated, it is referred to as a “task”
● The instantiation defines specific values when calling the abstract operator, and the parameterized task becomes a node in a DAG
● A task instance represents a specific run of a task and is characterized as the combination of a DAG, a task, and a point in time
● Task instances also have an indicative state, which could be “running”, “success”, “failed”, “skipped”, “up for retry”, etc.
https://airflow.apache.org/concepts.html#core-ideas
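To make “a DAG, a task, and a point in time” concrete, here is a sketch using the {{ ds }} template variable. The task name is hypothetical, and a `dag` object like the earlier sketch is assumed.

# One task; each scheduled run yields a separate task instance
# identified by (dag, task, execution date). {{ ds }} renders the
# run's execution date.
from airflow.operators.bash_operator import BashOperator

print_run = BashOperator(
    task_id="print_execution_date",
    bash_command="echo 'task instance for {{ ds }}'",
    dag=dag,  # assumes a `dag` object as sketched earlier
)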
16. Centralized Monitoring, Alerting, and Logging
● Airflow is an improvement over running tasks with cron because it has features to support task monitoring, alerting, and logging
● Task failures can be retried automatically
● Failures can trigger email alerts (or Slack, Datadog, etc.)
● Logs generated from tasks can be stored in an S3 or Google Cloud Storage bucket
● Task failures can be easily identified, investigated, and resolved
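Retries and failure alerting are typically wired up through default_args. A minimal sketch, with placeholder values and email address:

# A sketch of default_args enabling automatic retries and failure
# emails for every task in the DAG (all values are placeholders).
from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    "retries": 3,                        # retry failed tasks automatically
    "retry_delay": timedelta(minutes=5),
    "email": ["oncall@example.com"],     # placeholder address
    "email_on_failure": True,            # alert once retries are exhausted
}

dag = DAG(
    dag_id="monitored_workflow",         # hypothetical name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
)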
39. Relevant Alternatives
● Apache Nifi
● Apache Beam
● Apache Camel
● Spotify’s Luigi
● Many other awesome projects
Streaming and Batching
Airflow is not a data streaming solution. Tasks do not easily move data from one to another.
40. Apache Airflow for IT Stakeholders
1. Integrate with any Information System using Python
2. Automate the Development of Workflows (Config as Code)
3. Centralize Workflow Monitoring, Alerting, Logging