Apache Beam is a key technology for building scalable end-to-end ML pipelines: it is the data preparation and model analysis engine for TensorFlow Extended (TFX), a framework for horizontally scalable Machine Learning (ML) pipelines based on TensorFlow. In this talk, we present TFX on Hopsworks, a fully open-source platform for running TFX pipelines on any cloud or on-premises. Hopsworks is a project-based, multi-tenant platform for both data-parallel programming and horizontally scalable machine learning pipelines. Hopsworks supports Apache Flink as a runner for Beam jobs, and TFX pipelines are orchestrated with Airflow, which Hopsworks supports. We will demonstrate how to build an ML pipeline with TFX, Beam’s Python API, and the Flink Runner using Jupyter notebooks, explain how security is transparently enabled with short-lived TLS certificates, and walk through all the pipeline steps, from Data Validation to Transformation, Model Training with TensorFlow, Model Analysis, Model Serving, and Monitoring with Kubernetes.
To the best of our knowledge, Hopsworks is the first fully open-source on-premise platform that supports both TFX pipelines and Apache Beam.
2. BERLIN 2019
1. End-to-end ML pipelines
2. What is Hopsworks
3. Beam Portable Runner with Flink in Hopsworks
4. ML Pipelines with Beam and TensorFlow Extended
5. Demo
7.
“this one paper could repay your investment” … “HopsFS is a huge win.”
[Timeline slide, milestones 2017–2019:]
● World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify
● Winner of IEEE Scale Challenge 2017 with HopsFS (1.2M ops/sec)
● World’s first Hadoop platform to support GPUs-as-a-Resource
● World’s first Open Source Feature Store for Machine Learning
● World’s first distributed filesystem to store small files in metadata on NVMe disks
● World’s most scalable filesystem with Multi Data Center Availability
● World’s first open-source platform to support TensorFlow Extended (TFX) on Beam
9.
● Manage Hopsworks resources via the REST API
○ Projects
○ Datasets
○ Jobs
○ Users
○ FeatureStore
○ Kafka
○ ..
● Documented with Swagger and hosted on SwaggerHub
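As a sketch of how a client might address these resources (the path layout and host below are assumptions for illustration, not the documented Hopsworks API; see the Swagger docs for the real endpoints), resources hang off a project-scoped base URL:

```python
from urllib.parse import urljoin

# Hypothetical sketch: build the REST URL for a resource inside a project.
# The "hopsworks-api/api/project/..." layout is an assumption.
def resource_url(base, project_id, resource):
    """Return the URL for a project-scoped resource (jobs, datasets, ...)."""
    return urljoin(base, f"hopsworks-api/api/project/{project_id}/{resource}")

url = resource_url("https://hopsworks.example.com/", 42, "jobs")
# url == "https://hopsworks.example.com/hopsworks-api/api/project/42/jobs"
```

The same pattern would apply to the other resources listed above (datasets, users, featurestore, kafka).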
11.
[Diagram: the Beam portability architecture. Pipelines are built with the Beam Java SDK, the Beam Python SDK, or other language SDKs (Beam Model: Pipeline Construction), and executed through Fn Runners (Beam Model: Fn Runners) on Apache Flink, Apache Spark, or Cloud Dataflow.]
1. End users: who want to write pipelines in a language that’s familiar.
2. SDK writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
https://s.apache.org/apache-beam-project-overview
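To make the Beam model concrete without a Beam dependency, the two core primitives every runner has to execute, ParDo and GroupByKey, can be sketched in plain Python (a minimal word count; illustrative only, not Beam’s implementation):

```python
from collections import defaultdict

def par_do(pcollection, fn):
    """ParDo/Map: apply fn to every element, flattening the results."""
    out = []
    for element in pcollection:
        out.extend(fn(element))
    return out

def group_by_key(pcollection):
    """GroupByKey: gather all values that share a key."""
    groups = defaultdict(list)
    for key, value in pcollection:
        groups[key].append(value)
    return dict(groups)

words = par_do(["to be or not to be"], lambda line: line.split())
counts = {k: len(v) for k, v in group_by_key((w, 1) for w in words).items()}
# counts["to"] == 2, counts["be"] == 2
```

A runner’s job is to execute exactly these primitives, but partitioned and distributed across workers.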
12.
● Develop Beam pipelines in Python from Jupyter notebooks
● Tooling to simplify deployment and execution
● Manage lifecycle of Job Service
● SDK Workers (harness) with conda env
● Scalable execution on Flink clusters
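A minimal sketch of the pipeline options such a notebook passes to Beam’s portable runner (the Job Service endpoint and default environment type here are illustrative assumptions; in Hopsworks, hops-util-py fills these in):

```python
def portable_runner_args(job_endpoint="localhost:8099",
                         environment_type="DOCKER"):
    """argv-style options for beam.Pipeline(options=PipelineOptions(...)).

    job_endpoint points at the Beam Job Service started for the Flink
    session; environment_type selects how SDK workers (the harness) run,
    e.g. DOCKER or PROCESS. Values here are illustrative defaults.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]
```

With Beam 2.13, these flags route the pipeline through the Job Service to the Flink cluster instead of the local DirectRunner.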
14.
● hops-util-py (Python) and HopsUtil (Java)
● Simplify development by:
○ Setting security config
○ Discovering cluster services
○ Helper methods for the Hopsworks REST API
○ ML Experiments
● Manage Beam Job Service
def start_beam_job_service(
        flink_session_name,
        artifacts_dir="Resources",
        job_server_path="hdfs:///user/flink/",
        job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar",
        sdk_worker_parallelism=1)
https://github.com/logicalclocks/hops-util-py/ https://github.com/logicalclocks/hops-util
15.
● SDK Worker (Harness): SDK-provided program responsible for executing user code
● How to manage the user’s dependencies, libraries, …?
● Docker:
○ Build an image with all your dependencies
○ Update or modify? Build new containers
○ Additional infrastructure components
● Process:
○ Install dependencies on all servers
○ Management of dependencies?
○ Easy to update and modify libraries
○ Challenge? Multi-tenancy & keeping servers in sync
18.
● Execute a Beam Python pipeline
○ With the Python kernel either in a Docker container managed by Kubernetes or as a local Python process.
○ In a PySpark executor in the cluster.
25.
● Flink JobManager and TaskManager
● Beam Job service
○ Local mode - logs in project’s Jupyter staging dir
○ Cluster mode - logs in the PySpark container where the process is running.
● SDK Worker
○ Logs are in the Flink TaskManager container
● Collect and visualize with the ELK stack
○ Logs are accessible only by project members
40.
[Pipeline diagram: Raw Data / Event Data → Ingest → Data Prep (Feature Store / TFX Transform) → Experiment / Train → Model Analysis → Deploy → Serving → Monitor, with serving logs fed back as event data, an external FeatureStore, and a Metadata Store shared across the stages.]
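The stage ordering implied by the diagram can be sketched as a small dependency graph and checked with a topological sort (the edge list below is my reading of the figure, not an artifact of the platform):

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (illustrative only).
deps = {
    "Ingest":           {"Raw/Event Data"},
    "Data Prep":        {"Ingest"},            # Feature Store / TFX Transform
    "Experiment/Train": {"Data Prep"},
    "Model Analysis":   {"Experiment/Train"},
    "Deploy":           {"Model Analysis"},
    "Serving":          {"Deploy"},
    "Monitor":          {"Serving"},
}

# A valid execution order: ingestion first, monitoring last.
order = list(TopologicalSorter(deps).static_order())
```

The Metadata Store sits outside this ordering: every stage reads from and writes to it.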
41.
● Beam 2.13.0
● Flink 1.8.0
● TensorFlow 1.14.0
● TFX 0.14.0dev
● TensorFlow Model Analysis 0.13.2
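Pinned as a requirements file, the Python side of this stack might look as follows (a sketch; TFX 0.14.0dev was a pre-release and would need a dev build rather than a PyPI pin, and Flink 1.8.0 is a cluster dependency, not a pip package):

```
apache-beam==2.13.0
tensorflow==1.14.0
tensorflow-model-analysis==0.13.2
```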
43.
● Summary
○ Hopsworks v1.0 is the first on-prem, open-source, horizontally scalable platform to support the Beam Portable Runner with the Flink runner
○ Develop and manage the lifecycle of horizontally scalable End-to-End ML Pipelines with Beam and TFX
● Future Work
○ Add support for Spark Runner
○ Export metrics to InfluxDB and monitor them with Grafana