The Data Phoenix Events team invites everyone to the first webinar in the "The A-Z of Data" series, on 17 August at 19:00. The webinar is devoted to MLOps. In this introductory session we will look at what MLOps is, its core principles and practices, the best tools, and possible architectures. We will start with a simple development life cycle for ML solutions and finish with the complex, maximally automated cycle that MLOps makes possible.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
3. The A-Z of Data
MLOps, Natural Language Processing, Computer Vision, Time-Series Forecasting
17 August – Introduction to MLOps
25 August – Monitoring Machine Learning Models in Production
31 August – From research to product with Hydrosphere
8 September – Kubeflow
DVC / use case webinar and expert panel discussion
5. About Me
Dmitry Spodarets
● Head of R&D & ML competency at VITech
● Founder and chief editor of Data Phoenix
● Active participant of the ODS.ai community
7. Agenda
● What is MLOps?
● Principles and Practices
● ML processes and tools
10. Core phases for ML solution
● Experimental phase
● QA phase
● Prod phase
11. Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
15. The goal of MLOps is to reduce technical friction to get the model from an idea
into production in the shortest possible time with as little risk as possible.
17. MLOps is about agreeing to do ML the right way and then supporting it.
18. A few shared principles will take you a long way…
ML should be collaborative
19. A few shared principles will take you a long way…
ML should be reproducible
20. A few shared principles will take you a long way…
ML should be continuous
21. A few shared principles will take you a long way…
ML should be tested & monitored
22. Continuous X
MLOps is an ML engineering culture that includes the following practices:
● Continuous Integration (CI) extends the testing and validation of code and
components by adding the testing and validation of data and models.
● Continuous Delivery (CD) concerns the delivery of an ML training pipeline
that automatically deploys another service, the ML model prediction service.
● Continuous Training (CT) is a property unique to ML systems: it
automatically retrains ML models for re-deployment.
● Continuous Monitoring (CM) concerns monitoring production data and
model performance metrics, which are bound to business metrics.
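
To make the CI item concrete, here is a minimal sketch of a check that tests data and a model in addition to code. The schema (`EXPECTED_COLUMNS`), the threshold (`MIN_ACCURACY`), and all function names are hypothetical and chosen only for illustration; a real project would more likely use a dedicated validation tool.

```python
from typing import Dict, List

# Hypothetical schema and quality threshold, for illustration only.
EXPECTED_COLUMNS = {"age": float, "income": float}
MIN_ACCURACY = 0.8


def validate_rows(rows: List[Dict[str, object]]) -> List[str]:
    """Return a list of human-readable data problems (empty means OK)."""
    problems = []
    for i, row in enumerate(rows):
        for name, expected_type in EXPECTED_COLUMNS.items():
            if name not in row:
                problems.append(f"row {i}: missing column '{name}'")
            elif not isinstance(row[name], expected_type):
                problems.append(f"row {i}: '{name}' is not {expected_type.__name__}")
    return problems


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Fraction of predictions that match the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def ci_gate(rows: List[Dict[str, object]],
            predictions: List[int],
            labels: List[int]) -> bool:
    """Pass only if the data is valid and the model meets the threshold."""
    return not validate_rows(rows) and accuracy(predictions, labels) >= MIN_ACCURACY
```

A CI job could call `ci_gate` on a held-out sample and fail the build when it returns `False`, so that a bad dataset blocks a merge in the same way a failing unit test does.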
23. And tooling will help implement your process
ML should be collaborative
Shared Infrastructure
24. And tooling will help implement your process
ML should be reproducible
Versioning for Code, Data and Metadata
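
One way to picture versioning for code, data and metadata together is a single fingerprint per training run. The sketch below is an assumption made for illustration, not a tool from the talk: `fingerprint` and its parameters are hypothetical names, and the code version is taken to be something like a Git commit hash.

```python
import hashlib
import json


def fingerprint(code_version: str, data: bytes, params: dict) -> str:
    """Hash the code version, data content, and parameters into one run ID."""
    digest = hashlib.sha256()
    digest.update(code_version.encode("utf-8"))
    digest.update(b"\0")  # separator between the components
    digest.update(hashlib.sha256(data).digest())
    digest.update(b"\0")
    # sort_keys keeps the JSON stable regardless of dict insertion order.
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

Two runs with the same fingerprint used the same code, data, and parameters, so a difference in their results points to nondeterminism rather than to a changed input.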
25. And tooling will help implement your process
ML should be continuous
Machine Learning Pipelines
26. And tooling will help implement your process
ML should be tested & monitored
Model Deployment and Monitoring
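
As a rough illustration of monitoring, the sketch below flags drift when the mean of a production feature moves too far from the training data. The mean-shift statistic, the default threshold of 2.0, and the function names are assumptions made for this example; production systems usually rely on richer statistical tests.

```python
from statistics import mean, stdev
from typing import Sequence


def mean_shift(reference: Sequence[float], production: Sequence[float]) -> float:
    """Shift of the production mean, in reference standard deviations."""
    spread = stdev(reference)  # needs at least two reference values
    if spread == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(production) - mean(reference)) / spread


def drifted(reference: Sequence[float],
            production: Sequence[float],
            threshold: float = 2.0) -> bool:
    """True when the production mean moved more than `threshold` deviations."""
    return mean_shift(reference, production) > threshold
```

A scheduled job could run `drifted` for each input feature and raise an alert, or trigger retraining, when it returns `True`.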
29. Machine Learning Process
● Research & Discovery
● Data storage
● Data validation
● Data extraction & collection
● Data labeling
● Model validation
● Model training
● Feature engineering / feature storage
● Model evaluation
● Data preparation
● Model storage
● Model serving
● Model optimization
● Monitoring
● Predictions
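
Some of the stages listed above can be chained into an automated pipeline. The following is a toy sketch under stated assumptions: `run_pipeline` and the three stage functions are hypothetical, and the "model" is just the mean of the data, standing in for real training.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def run_pipeline(steps: List[Step], context: Context) -> Context:
    """Run steps in order; each step receives and returns the shared context."""
    for name, step in steps:
        context = step(context)
        # Record which stages ran, as minimal run metadata.
        context.setdefault("history", []).append(name)
    return context


def prepare_data(ctx: Context) -> Context:
    """Drop missing values from the raw data."""
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]
    return ctx


def train_model(ctx: Context) -> Context:
    """Toy 'training': the model is the mean of the cleaned data."""
    ctx["model"] = sum(ctx["clean"]) / len(ctx["clean"])
    return ctx


def evaluate_model(ctx: Context) -> Context:
    """Toy 'evaluation': the largest absolute error on the cleaned data."""
    ctx["error"] = max(abs(x - ctx["model"]) for x in ctx["clean"])
    return ctx


if __name__ == "__main__":
    result = run_pipeline(
        [
            ("data preparation", prepare_data),
            ("model training", train_model),
            ("model evaluation", evaluate_model),
        ],
        {"raw": [1.0, None, 3.0]},
    )
    print(result["history"], result["model"], result["error"])
```

Keeping every stage behind the same interface is what lets an orchestrator rerun, cache, or replace a single stage without touching the others.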
34. MLOps Stack
● LF AI & Data landscape - https://landscape.lfai.foundation/
● THE 2020 DATA & AI LANDSCAPE - https://mattturck.com/data2020/
35. MLOps Stack
● All-in-one tools
● CI/CD
● Data & model registry, experiment tracking
● Model serving
● Model monitoring
36. Jupyter Notebooks
Notebooks have become fundamental to data science.
Problems with notebooks:
● Hard to version
● Very hard to test
● Out-of-order execution artifacts
● Hard to run long or distributed tasks
37. Netflix bases all ML workflows on Jupyter Notebooks
https://netflixtechblog.com/notebook-innovation-591ee3221233