End-to-End ML pipelines with Beam,
Flink, TensorFlow, and Hopsworks
Theofilos Kakantousis
Software Engineer & COO
@theofiloskak
3rd Apache Beam meetup, Stockholm, July 2019
Agenda
1. End-to-end ML pipelines
2. What is Hopsworks
3. Beam Portable Runner with Flink in Hopsworks
4. ML Pipelines with Beam and TensorFlow Extended
5. Demo
ML Pipelines
End-to-end ML Pipeline
[Diagram: pipeline stages Data Ingest, Data Prep, Train, Serve, Online Monitor; underlying layers Distributed Storage (Raw Data, Data Lake) and Resource Manager]
Typical Feature Store pipeline
Hopsworks Timeline
“If you’re working with big data and Hadoop, this one paper could repay your investment in the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
● World’s first Hadoop platform to support GPUs-as-a-Resource
● World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify
● World’s first open source Feature Store for Machine Learning
● World’s first distributed filesystem to store small files in metadata on NVMe disks
● Winner of IEEE Scale Challenge 2017 with HopsFS - 1.2m ops/sec
● World’s most scalable filesystem with multi data center availability
● World’s first open source platform to support TensorFlow Extended (TFX) on Beam
Timeline axis: 2017, 2018, 2019
What is Hopsworks
True Project-based multi-tenancy
[Diagram labels: Proj-X, Project-42, Project-All, Kafka Topic, Resources, /Projs/My/Data, Experiments, Models]
Hopsworks REST API
● Manage Hopsworks resources via the REST API
○ Projects
○ Datasets
○ Jobs
○ FeatureStore
○ Experiments
○ ModelServing
○ Kafka
○ ...
● Documented with Swagger and hosted on SwaggerHub
○ https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api/0.10.0
Beam on Hopsworks
Beam Portable Runner
[Diagram: Beam Model: Pipeline Construction (Beam Java, Beam Python, Other Languages); Beam Model: Fn Runners (Apache Flink, Apache Spark, Cloud Dataflow), each with its own Execution layer]
1. End users: who want to write pipelines in a language that’s familiar.
2. SDK writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
https://s.apache.org/apache-beam-project-overview
Beam-as-a-Service in Hopsworks
● Develop Beam pipelines in Python from Jupyter notebooks
● Tooling to simplify deployment and execution
● Manage lifecycle of the Beam Portability Job Service (JobServer)
● Logging and monitoring of Beam jobs
● SDK Workers (harness) with conda env
● Scalable execution on Flink/Spark clusters
Hopsworks API
● hops-util-py (Python) and HopsUtil (Java)
● Simplifies development:
○ Sets security config
○ Discovers cluster services
○ Helper methods for the Hopsworks REST API
○ ML Experiments
● Manages Beam Runners and Job Service
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
Beam Portability - Process vs Docker
● Docker:
○ Build image with all your dependencies
○ Update or modify? Build new containers
○ Additional infrastructure components
● Process:
○ Install dependencies on all servers
○ Management of dependencies?
○ Easy to update and modify libraries
○ Challenge? Multi-tenancy and keeping servers in sync
● SDK Worker: SDK-provided program responsible for executing user code
● How to manage the user’s dependencies, libraries, ...?
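The Docker and Process choices above correspond to how a Beam Python pipeline tells the portable runner where to run its SDK Worker. The sketch below only assembles the command-line style options and does not import Beam. The flag names are Beam's portable pipeline options as commonly documented. The host, port, boot path, and image name are placeholders, and the slides do not say that Hopsworks passes exactly these values.

```python
import json


def portable_pipeline_args(host, port, environment_type, environment_config):
    """Build option strings for a Beam Python pipeline on the portable runner.

    host/port point at the Job Service; environment_type selects how the
    SDK Worker is launched (DOCKER or PROCESS).
    """
    if environment_type not in ("DOCKER", "PROCESS"):
        raise ValueError("unsupported environment type: %s" % environment_type)
    if environment_type == "PROCESS":
        # PROCESS takes a JSON document; "command" is the SDK Worker boot executable.
        config = json.dumps({"command": environment_config})
    else:
        # DOCKER takes the name of the container image that holds the SDK Worker.
        config = environment_config
    return [
        "--runner=PortableRunner",
        "--job_endpoint=%s:%d" % (host, port),
        "--environment_type=%s" % environment_type,
        "--environment_config=%s" % config,
    ]


# Placeholder values, for illustration only.
process_args = portable_pipeline_args(
    "localhost", 8099, "PROCESS", "/path/to/sdk_worker_boot")
docker_args = portable_pipeline_args(
    "localhost", 8099, "DOCKER", "example/beam-python-sdk:latest")
print(process_args)
print(docker_args)
```

With PROCESS, the boot executable and its Python environment must already exist on every worker host, which is the dependency-management challenge listed on the slide.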
First class Python: Conda in the Cluster
[Diagram: Conda Repo, Hopsworks Cluster]
No need to write Dockerfiles
Jupyter dashboard in Hopsworks
● Manage notebook settings from dashboard
● Execute a Beam Python pipeline
● With the Python kernel either in a Docker container managed by Kubernetes or as a local Python process.
● In a PySpark executor in the cluster.
Notebooks as Beam jobs in ML pipelines
Beam portability architecture in Hopsworks
https://www.slideshare.net/ThomasWeise/python-streaming-pipelines-on-flink-beam-meetup-at-lyft-2019
[Diagram labels: Hopsworks; HopsFS; Session cluster on YARN; Local/YARN/K8s; hops-util.py]
Compiled and shipped with HopsFS dependencies
# Creates and starts the runner
# Localizes the Job Service jar file from HopsFS
# Provides arguments (ports, artifacts_dir, etc.)
# Starts the Job Service and returns host, port
# Job Service automatically shuts down when the Python pipeline shuts down
host, port = start_runner()
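The shutdown behaviour described in the comments above (the Job Service stops when the Python pipeline stops) can be illustrated with a child process and an exit hook. This is a minimal sketch, not the hops-util-py implementation. The function name `start_job_service` is hypothetical, and a sleeping Python process stands in for the real Job Service jar.

```python
import atexit
import subprocess
import sys


def start_job_service(command):
    """Start a child process and stop it when this interpreter exits.

    Returns the process handle and a function that stops it early.
    """
    proc = subprocess.Popen(command)

    def stop():
        # Only signal the child if it is still running.
        if proc.poll() is None:
            proc.terminate()
            proc.wait(timeout=10)

    # The exit hook ties the child's lifetime to the parent Python process.
    atexit.register(stop)
    return proc, stop


# Stand-in for the Job Service: a child process that just sleeps.
dummy_command = [sys.executable, "-c", "import time; time.sleep(60)"]
proc, stop = start_job_service(dummy_command)
print("running:", proc.poll() is None)

stop()
print("stopped:", proc.poll() is not None)
```

An exit hook only covers a normal interpreter shutdown. If the parent is killed abruptly the child can outlive it, so a production service would need an additional cleanup mechanism.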
The Python conda env and Hopsworks environment variables are set for the SDK Worker
Hopsworks API
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
def start_runner(
    runner="flink",
    runner_name="session",
    runner_config=config)

def start_jobservice(
    runner="Resources",
    artifacts_dir="Resources",
    job_server_path="hdfs:///user/flink/",
    job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar",
    sdk_worker_parallelism=1)

hops.beam.start_runner()
hops.beam.start_jobservice()
Logging
● Flink JobManager and TaskManager
● Beam Job service
○ Local mode - logs in project’s Jupyter staging dir
○ Cluster - logs in the PySpark container where process is running.
● SDK Worker
○ Logs are in the Flink TaskManager container
● Collect and visualize with the ELK stack
○ Logs are accessible only by project members
Secure Beam with TLS certificates
TensorFlow Extended (TFX)
Hidden Technical Debt in Machine Learning Systems
[Figure labels: Data Collection, Data validation, Feature Engineering, Hardware Management, Distributed Training, HyperParameter Tuning, Model Serving, A/B Testing, Monitoring, Pipeline Management; centre: Data, Model φ(x), Prediction]
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
TFX on a Flink Cluster with Portable Runner
Distributed Deep Learning in Hopsworks
[Diagram labels: Driver; Executor 1 to Executor N; HopsFS (HDFS); TensorBoard; Model Serving]
Experiments - TensorBoard
● Repeatable experiments
● Manage experiment metadata
● Integration with TensorBoard
Orchestration
Apache Airflow-as-a-Service
● Airflow available as a multi-tenant service in Hopsworks
● Develop pipelines with Hopsworks operators and sensors
Apache Airflow-as-a-Service - TFX pipeline
Putting it all together
Horizontally Scalable ML Pipelines
[Diagram labels: Raw Data, Event Data; Ingest, Data Prep, Experiment / Train, Deploy; Feature Store / TFX Transform; FeatureStore; HopsFS; Serving; Monitor; Model Analysis; Metadata Store; External; logs]
Compatibility...
● Hopsworks-1.0
● Beam 2.13.0
● Flink 1.8.0
● TensorFlow 1.14.0
● TFX 0.13
● TensorFlow Model Analysis 0.13.2
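A version list like the one above can be checked against an environment before running a pipeline. The sketch below is illustrative only: the mapping from product names to PyPI package names is an assumption, the `installed` dictionary is a made-up environment, and Hopsworks and Flink are left out because they are not Python packages.

```python
# Versions from the slide, keyed by PyPI package name (the package-name
# mapping is an assumption; the slide lists product names only).
EXPECTED_VERSIONS = {
    "apache-beam": "2.13.0",
    "tensorflow": "1.14.0",
    "tfx": "0.13",
    "tensorflow-model-analysis": "0.13.2",
}


def find_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed)} for each package that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # A short pin such as "0.13" is treated as matching any 0.13.x release.
        matches = have is not None and (
            have == want or have.startswith(want + "."))
        if not matches:
            mismatches[name] = (want, have)
    return mismatches


# Hypothetical environment, for illustration only.
installed = {
    "apache-beam": "2.13.0",
    "tensorflow": "1.15.0",
    "tfx": "0.13.0",
}
report = find_mismatches(installed)
for name, (want, have) in sorted(report.items()):
    print("%s: expected %s, found %s" % (name, want, have))
```

In a real environment the `installed` mapping would come from the package manager (for example the conda environment attached to the project) rather than a hand-written dictionary.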
Demo
Conclusions & Future Work
● Summary
○ Hopsworks v1.0 is the first on-prem, open source, horizontally scalable platform to support the Beam Portable Runner with the Flink runner
○ Develop and manage the lifecycle of horizontally scalable end-to-end ML pipelines with Beam and TFX
● Future Work
○ Add support for Spark Runner
○ Export metrics for Flink runner to InfluxDB and visualize with Grafana
Contributors
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan,
Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami,
Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer,
Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre
Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
How to get started with Hopsworks?
@hopsworks
Register for a free account at: www.hops.site
Images available for AWS, GCE, and VirtualBox.
https://www.logicalclocks.com/
https://github.com/logicalclocks/hopsworks
https://www.meetup.com/HopsML-Stockholm
Reach us
@logicalclocks
