From idea to production in a day – Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly

1
From idea to production in a day
Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly
Florian Roscheck
PyCon DE & PyData Berlin 2024

4
What are we building?
What is our tech stack?
How do we use it to build + test quickly?

5
Hi, I’m Florian!
Florian Roscheck, Sr. Data Scientist
• Sr. Data Scientist at Henkel
• Instructor for Apache Spark with 7k+ students
• Vice President NumFOCUS Affiliated Project Selection Committee
• Active on LinkedIn

6
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
• Enough features to be usable
• Ability to collect user feedback

7
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
BUILD, MEASURE, LEARN
To learn about users quickly, we want to implement a build-measure-learn loop.
To make users happier over time, we aim to create a data flywheel.

9
Ready?
• Data not in place
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration

10
BUILD, MEASURE, LEARN
• Data not in place
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration
GET DATA BEFOREHAND

11
HOW TO BUILD IN ONE DAY?
A Time-Saving Stack
Issues:
• Environment issues
• Lost in modeling
• Inappropriate user interface
• No feedback about use
• Difficult collaboration
Stack:
• Azure ML Notebooks
• Automated ML on Azure
• Streamlit
• Azure Application Insights + Streamlit

13
Example: Trash Recognizer App
bottle
• Customer: Waste management company
• Need: Want to evaluate computer vision solutions for recognizing trash
• Idea: Waste management professionals manually evaluate performance through app with feedback functionality

14
Our Plan
BUILD, MEASURE, LEARN
1 Get Data
2 Train Model
3 Build App
4 Deploy App with Model
5 Collect Feedback

16
Training Data: TACO Trash Image Dataset
• TACO: Trash Annotations in Context
• Dataset of 1.5k images with 4.7k+ annotations
• Annotations for 60 categories, incl. backgrounds
• Open source
Proença, P. F., & Simões, P. (2020). TACO: Trash Annotations in Context for Litter Detection. arXiv preprint arXiv:2003.06975.
Source: tacodataset.org

18
What is Azure ML?
• Cloud-based ML platform by Microsoft
• Run ad-hoc analyses with Jupyter Notebooks
• Run and track machine learning experiments through tight integration with MLflow
• Version data and models
• Build complex and reproducible modeling pipelines
• Deploy models as APIs
Screenshot of Azure ML web app

19
Basics of Getting Data
Source: TACO GitHub repo
Azure ML Workspace
Azure ML Notebook: 0_prepare_dataset.ipynb
• Like Jupyter Notebook
• Managed environment
• Sharable
• Runs on compute in workspace
Data Asset: TACO-annotations
• Azure ML-managed
• Like a mask for files
• Sharable
• Version controlled
• Interactively explorable
Storage: Azure blob storage
ENABLERS: Reproducible environment, Collaboration-ready
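As a rough illustration of what a data preparation notebook like 0_prepare_dataset.ipynb might do, the sketch below converts COCO-style annotations (the format TACO ships its annotations in) into JSON Lines records with normalized bounding boxes. The record layout (image_url, label, topX/topY/bottomX/bottomY) follows what Azure's automated ML for images is understood to expect for object detection; check it against the Azure ML documentation before relying on it. The function name, the URL prefix, and the sample data are hypothetical and not taken from the talk.

```python
import json


def coco_to_jsonl_lines(coco, image_url_prefix):
    """Convert a COCO-style dict into JSONL strings, one per image.

    Assumption: object detection with boxes normalized to [0, 1].
    """
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    images = {img["id"]: img for img in coco["images"]}
    labels_by_image = {img_id: [] for img_id in images}

    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        # COCO boxes are [x, y, width, height] in absolute pixels.
        x, y, w, h = ann["bbox"]
        labels_by_image[ann["image_id"]].append({
            "label": categories[ann["category_id"]],
            "topX": x / img["width"],
            "topY": y / img["height"],
            "bottomX": (x + w) / img["width"],
            "bottomY": (y + h) / img["height"],
            "isCrowd": ann.get("iscrowd", 0),
        })

    lines = []
    for img_id, img in images.items():
        record = {
            "image_url": image_url_prefix + img["file_name"],
            "label": labels_by_image[img_id],
        }
        lines.append(json.dumps(record))
    return lines


# Made-up sample in COCO layout, only to show the shape of the output.
sample = {
    "images": [{"id": 1, "file_name": "batch_1/000006.jpg",
                "width": 100, "height": 200}],
    "categories": [{"id": 5, "name": "Clear plastic bottle"}],
    "annotations": [{"image_id": 1, "category_id": 5,
                     "bbox": [10, 20, 30, 40], "iscrowd": 0}],
}
# Hypothetical datastore path; replace with the workspace's own.
jsonl_lines = coco_to_jsonl_lines(sample, "azureml://datastores/<name>/paths/taco/")
print(jsonl_lines[0])
```

Writing the returned lines to a file and registering that file as a data asset would then give a versioned, shareable input for training.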

24
Automated Machine Learning on Azure ML
• Automated ML: Try different models and hyperparameters that are automatically selected
• We have very little time for modeling!
• Depending on data and problem type, automated machine learning can provide a reasonable starting point for modeling with a high return on time investment
• Azure ML offers automated ML pipelines for several common tasks, incl. classification, regression, forecasting, NLP, and computer vision
ENABLERS: Modeling time saver

25
Setting Up AutoML Through Code
1 Create compute cluster
2 Define training job
3 Submit job to compute cluster
Azure ML Workspace
Azure ML Notebooks: 1_training.ipynb
TACO-annotations
TACO-training

26
Compute Cluster Creation Tips & Tricks
Save Money
• Shut down unused instances: 120 seconds to auto shutdown
• Pick auto-evictable machine (own case: 80% cheaper)
Pick a Fitting Compute
• 4 experiments can run in parallel
• Tesla T4 GPU w/ 16 GB memory, 56 GB RAM, 8 vCPUs, but many options available!

27
Increasing Efficiency for Automated ML on Azure
• Set ML parameters based on your data science knowledge
  • Train/test/validation split, cross-validation, etc.
  • Hyperparameter selection strategy
  • Restrict hyperparameter search space
• Set job limits
  • Max no. of trials
  • Max runtime per trial or of all trials
  • Termination based on score
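To make the effect of job limits concrete, here is a toy search loop that stops on whichever limit is hit first: a maximum number of trials, a total runtime budget, or a target score. This is a plain-Python illustration of the idea, not Azure ML's implementation or API; JobLimits, run_search, and the sample scores are hypothetical names and values.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobLimits:
    max_trials: int                     # max no. of trials
    timeout_seconds: float              # max runtime of all trials together
    exit_score: Optional[float] = None  # stop once this score is reached


def run_search(candidates, evaluate, limits, clock=time.monotonic):
    """Evaluate candidates in order until a limit stops the search."""
    start = clock()
    best_params, best_score, trials_run = None, float("-inf"), 0
    for params in candidates:
        if trials_run >= limits.max_trials:
            break
        if clock() - start >= limits.timeout_seconds:
            break
        score = evaluate(params)
        trials_run += 1
        if score > best_score:
            best_params, best_score = params, score
        if limits.exit_score is not None and best_score >= limits.exit_score:
            break
    return best_params, best_score, trials_run


# Made-up scores standing in for real training runs.
fake_scores = {0.1: 0.5, 0.01: 0.8, 0.001: 0.9, 0.0001: 0.7}
candidates = [{"lr": lr} for lr in fake_scores]


def evaluate(params):
    return fake_scores[params["lr"]]


limits = JobLimits(max_trials=10, timeout_seconds=60, exit_score=0.8)
result = run_search(candidates, evaluate, limits)
print(result)
```

With the target score of 0.8, the search stops after the second trial even though a better candidate exists further down the list, which is the trade-off these limits make: less compute spent in exchange for a possibly lower final score.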

29
Our annotations, linked to the job
MLflow model!

30
Models ordered by performance
Azure AutoML experimented with a single model type

32
How to Dig Deeper
• Metrics are comprehensive and look great – but what are we looking at?
• More details in the logs (section of std_log.txt file in Outputs + logs tab)
• Tip: Read Azure ML documentation!
• You still need data science knowledge to understand what Azure ML is doing here.

35
The Power of ONNX for Model Packaging
• Great: Azure ML packaged model in MLflow format
• The Issue: Tight MLflow model dependencies restrict platforms where model can be used
  • 204 (!) pinned dependencies, incl. 31 Azure-specific packages
  • Experienced issues installing some (azureml-dataprep-native) on macOS (M1)
• The Solution: Use ONNX model file (byproduct of Azure AutoML training) and use it with a single dependency: onnxruntime
  • ONNX (Open Neural Network Exchange): Open standard for deep learning models, makes models work across frameworks
  • ONNX Runtime: Cross-platform, open source ML model accelerator
You can now use your AutoML-trained model outside of Azure!
ENABLERS: Flexible model deployment

37
Building an App with Streamlit
• Streamlit is an open-source app framework for creating data-based web apps in Python
• My experience with Streamlit:
  • The Good: Very easy and fast to code and use, apps look great and work – Wow!
  • The Good-to-Know: Complex workflows with state management harder to program, may be perceived as slow by users in comparison to “professional” web apps
• Streamlit is perfect for getting a user-facing app off the ground and testing your data-based product ideas!
Streamlit logo, see streamlit.io

39
Streamlit App Blueprint
What the user sees
Trash Recognizer
Upload image(s)
Detected Trash
- 2 items for yellow trash can
- 1 item for blue trash can
No trash detected.
Detected Trash
- 1 item for other trash can
[Model + data description]
What the app does
1 Load ONNX model
2 Preprocess images
3 Run model inference
4 Postprocess images
5 Display results
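The postprocessing and display steps of the blueprint can be sketched as a small pure-Python function that turns model detections into the summary lines shown in the mock-up. The label-to-trash-can mapping, the score threshold, and the function name are hypothetical; the real app would derive labels and scores from the ONNX model's output.

```python
from collections import Counter

# Hypothetical mapping from model labels to trash can colors.
LABEL_TO_CAN = {"bottle": "yellow", "can": "yellow", "paper": "blue"}


def summarize_detections(labels, scores, min_score=0.5):
    """Group confident detections by trash can and format result lines."""
    counts = Counter(
        LABEL_TO_CAN.get(label, "other")
        for label, score in zip(labels, scores)
        if score >= min_score
    )
    if not counts:
        return ["No trash detected."]
    lines = []
    for can in ("yellow", "blue", "other"):
        n = counts.get(can, 0)
        if n:
            noun = "item" if n == 1 else "items"
            lines.append(f"{n} {noun} for {can} trash can")
    return lines


# Made-up detections; the last one falls below the score threshold.
summary = summarize_detections(
    ["bottle", "can", "paper", "bottle"], [0.9, 0.8, 0.7, 0.2]
)
for line in summary:
    print("-", line)
```

In a Streamlit app, each returned line could be passed to a text element under a "Detected Trash" header, once per uploaded image.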

40
ENABLERS: Easy-to-use interface
[Movie]

42
Thanks, Data Science Engineering Team!

43
Deployment Pipeline
• Henkel Data Science Engineering team developed pipeline for one-click deployment of data science infrastructure, incl. Streamlit apps, on secure Azure cloud infrastructure
• Open sourced via article series “Kickstarting Data Science Projects in Azure DevOps” by Roberto Alonso
• Part 1 and 2 already available on Henkel Data & Analytics Blog: medium.com/henkel-data-and-analytics

46
Collecting Feedback
• streamlit-feedback: Open-source feedback plugin for Streamlit
• Python logging: Use AzureLogHandler through opencensus logging extension
• Azure Application Insights: Collect logs from application, query with Kusto language
• Azure Dashboards: Interactive live dashboards on Azure for application metrics
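The feedback chain above can be sketched with the standard library alone: user feedback is written as a structured log record, and a handler forwards it. In the talk's setup that handler would be AzureLogHandler from the opencensus Azure extension, which reads the custom_dimensions key passed via extra; here a small in-memory handler stands in so the sketch runs without Azure credentials. The shape of the feedback dict (type, score, text) is an assumption about what the streamlit-feedback component returns, and the logger name, function name, and sample values are hypothetical.

```python
import logging

logger = logging.getLogger("trash_recognizer.feedback")
logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Stand-in for a remote handler: keeps records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def log_feedback(feedback, image_name, predicted_labels):
    """Log one piece of user feedback with queryable dimensions."""
    dimensions = {
        "feedback_type": feedback.get("type"),
        "score": feedback.get("score"),
        "text": feedback.get("text") or "",
        "image_name": image_name,
        "predicted_labels": ",".join(predicted_labels),
    }
    # "custom_dimensions" is the key AzureLogHandler picks up from `extra`.
    logger.info("user_feedback", extra={"custom_dimensions": dimensions})
    return dimensions


capture = ListHandler()
logger.addHandler(capture)

# Assumed feedback shape and made-up values, for illustration only.
sample_feedback = {"type": "thumbs", "score": "👍", "text": None}
log_feedback(sample_feedback, "upload_001.jpg", ["bottle", "can"])
print(capture.records[0].custom_dimensions)
```

Once such records arrive in Application Insights, the dimensions become fields that can be filtered and aggregated with Kusto queries and charted on an Azure dashboard.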

48
ENABLERS: Easy and fast measurement

49
bottle
BUILD, MEASURE, LEARN
• Reproducible environment
• Collaboration-ready
• Modeling time saver
• Flexible model deployment
• Easy-to-use interface
• Easy and fast measurement
• Learning culture

50
Code, Slides, Details: github.com/flrs/build_and_test_ml_quickly
Thanks
PyData team + sponsors, Henkel, incl. Henkel Data Science CoE team, open-source contributors for TACO, ONNX, ONNX Runtime, Streamlit, streamlit-feedback, Streamlit for reaching out before talk
Learn More
• Build-Measure-Learn Loop: The Lean Startup | Methodology
• Data Flywheel: Data Flywheel: Scaling a world-class data strategy
• Dataset: Tacodataset.org
• Automated Machine Learning on Azure: What is automated ML?
• ONNX: ONNX Runtime, ONNX File Format
• Streamlit: Get started with Streamlit, streamlit-feedback
• Azure Tricks for Data Science: Henkel Data & Analytics Blog
• Logging to Azure from Python: Monitor Python applications
• Azure Dashboards: Dashboards of Azure Log Analytics data
• A similar project: Instance Segmentation with Azure Machine Learning
Photo credits, in order of appearance: Greg Rakozy, Canva Studio, Sewupari Studio, Massimo Botturi, Charlotte Coneybeer, Desola Landre Ologun, Studio Saiz, Claudio Schwarz, NASA, Alena Darmel, Anna Shvetz, Vadim B, The Lucky Neko, Visual Tag Mx; User icon from “Redefining Women” icon collection by Iconathon

51
What are your questions?
Let’s connect!
Florian Roscheck, Sr. Data Scientist
linkedin.com/in/florianroscheck
github.com/flrs
