How can you leverage Azure ML, automated machine learning, and Streamlit to build and test machine learning apps quickly? Find out about our favorite hackathon stack and walk away with some code to build and user-test your own machine learning ideas fast.
Experimentation, that is, getting machine learning ideas in front of users, is essential to innovation. Yet in our corporate hackathons, our data science team has often struggled to build and deploy user-facing machine learning ideas in just a single day.
Over the past 2+ years, we have developed a routine around using Azure Machine Learning, automated machine learning, and Streamlit to build and user-test machine learning ideas quickly. The aim of this talk is to pass on practical, technical knowledge to fellow data scientists about how to use this stack to build and user-test quickly.
During the talk, we will walk through the process of building a computer vision system that identifies trash in images and serving it through an app, using the open-source TACO dataset (http://tacodataset.org/). Working through a Jupyter notebook, we will load the data into Azure Machine Learning and trigger an automated machine learning run on the data. In this context, we will briefly look at the training and testing metrics available in Azure ML to evaluate the model. We will then download the machine learning model as a file packaged in the open-source ONNX format (https://onnx.ai/). Using the open-source Python web application framework Streamlit (https://github.com/streamlit/streamlit), we will build an application that lets users upload images and that embeds the machine learning model to identify trash in these images. Using a to-be-published infrastructure-as-code pipeline on Azure DevOps, we will deploy the application to the public internet on the Azure platform. From there, users can test it.
The stack and code presented in this talk will enable fellow data scientists to accelerate their data science development, leading to quicker experimentation and, therefore, to faster innovation of products with machine learning at their core.
From idea to production in a day – Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly
1. 1
From idea to production in a day
Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly
Florian Roscheck
PyCon DE & PyData Berlin 2024
4. 4
How do we use it to build + test quickly?
What is our tech stack?
What are we building?
5. 5
Hi, I’m Florian!
Sr. Data Scientist
Florian Roscheck
• Sr. Data Scientist at Henkel
• Instructor for Apache Spark with 7k+ students
• Vice President NumFOCUS Affiliated Project Selection Committee
• Active on LinkedIn
6. 6
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
• Enough features to be usable
• Ability to collect user feedback
7. 7
WHAT TO BUILD IN ONE DAY?
A Minimum Viable Product
BUILD → MEASURE → LEARN
To learn about users quickly, we want to implement a build-measure-learn loop.
To make users happier over time, we aim to create a data flywheel.
11. 11
HOW TO BUILD IN ONE DAY?
A Time-Saving Stack
Environment issues
Lost in modeling
Inappropriate user interface
No feedback about use
Difficult collaboration
Azure ML Notebooks
Automated ML on Azure
Streamlit
Azure Application Insights + Streamlit
18. 18
What is Azure ML?
• Cloud-based ML platform by Microsoft
• Run ad-hoc analyses with Jupyter Notebooks
• Run and track machine learning experiments through tight integration with MLflow
• Version data and models
• Build complex and reproducible modeling pipelines
• Deploy models as APIs
Screenshot of Azure ML web app
19. 19
Basics of Getting Data
Data Asset
TACO-annotations
• Azure ML-managed
• Like a mask for files
• Sharable
• Version controlled
• Interactively explorable
TACO GitHub repo
Azure ML Workspace
Azure blob storage
Azure ML Notebook
0_prepare_dataset.ipynb
• Like Jupyter Notebook
• Managed environment
• Sharable
• Runs on compute in workspace
Reproducible environment
Collaboration-ready
ENABLERS
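As a rough illustration of the data-asset step, the sketch below registers a local annotations file as an Azure ML data asset with the `azure-ai-ml` SDK (v2). The local path and the description text are assumptions made for this example and are not taken from the talk; only the asset name `TACO-annotations` appears on the slide.

```python
# Minimal sketch, assuming the azure-ai-ml (SDK v2) package is installed.
ASSET_NAME = "TACO-annotations"                     # asset name shown on the slide
LOCAL_ANNOTATIONS = "./TACO/data/annotations.json"  # assumed local path (hypothetical)


def register_annotations(ml_client, path=LOCAL_ANNOTATIONS, name=ASSET_NAME):
    """Upload a file and register it as a data asset in the workspace."""
    # Imported lazily so this module loads even without the Azure SDK.
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data

    asset = Data(
        name=name,
        path=path,
        type=AssetTypes.URI_FILE,
        description="Annotations of the TACO dataset",  # illustrative text
    )
    return ml_client.data.create_or_update(asset)
```

Here `ml_client` would be an `azure.ai.ml.MLClient` connected to the workspace, for instance one created with `MLClient.from_config(credential=DefaultAzureCredential())`.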
24. 24
Automated Machine Learning on Azure ML
• Automated ML: Try different models and hyperparameters that are automatically selected
• We have very little time for modeling!
• Depending on data and problem type, automated machine learning can provide a reasonable starting point for modeling with a high return on time investment
• Azure ML offers automated ML pipelines for several common tasks, incl. classification, regression, forecasting, NLP, and computer vision
Modeling time saver
ENABLERS
25. 25
Setting Up AutoML Through Code
1. Create compute cluster
2. Define training job
3. Submit job to compute cluster
Azure ML Notebooks
1_training.ipynb
Azure ML Workspace
TACO-annotations
TACO-training
26. 26
Compute Cluster Creation Tips & Tricks
Save Money
• Shut down unused instances: 120 seconds to auto shutdown
• Pick auto-evictable machine (own case: 80% cheaper)
Pick a Fitting Compute
• 4 experiments can run in parallel
• Tesla T4 GPU w/ 16 GB memory, 56 GB RAM, 8 vCPUs, but many options available!
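The tips above could translate into a cluster definition along the following lines. This is a sketch using the `azure-ai-ml` SDK (v2), not the code shown in the talk: the cluster name is hypothetical, and the VM size `Standard_NC8as_T4_v3` is an assumption chosen because it matches the T4 specification listed on the slide.

```python
# Minimal sketch, assuming the azure-ai-ml (SDK v2) package is installed.
CLUSTER_SETTINGS = {
    "name": "gpu-cluster",               # hypothetical cluster name
    "size": "Standard_NC8as_T4_v3",      # assumed SKU: 1x T4, 8 vCPUs, 56 GB RAM
    "min_instances": 0,                  # scale down to zero nodes when idle
    "max_instances": 4,                  # lets up to 4 trials run in parallel
    "idle_time_before_scale_down": 120,  # seconds before idle nodes are released
    "tier": "low_priority",              # evictable machines at a lower price
}


def create_cluster(ml_client, settings=CLUSTER_SETTINGS):
    """Create (or update) the compute cluster and wait until it is ready."""
    # Imported lazily so this module loads even without the Azure SDK.
    from azure.ai.ml.entities import AmlCompute

    cluster = AmlCompute(**settings)
    return ml_client.compute.begin_create_or_update(cluster).result()
```

Low-priority nodes can be evicted at any time, so this setup fits short, restartable training trials better than long-running jobs.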
27. 27
Increasing Efficiency for Automated ML on Azure
• Set ML parameters based on your data science knowledge
  • Train/test/validation split, cross-validation, etc.
  • Hyperparameter selection strategy
  • Restrict hyperparameter search space
• Set job limits
  • Max no. of trials
  • Max runtime per trial or of all trials
  • Termination based on score
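To make the job limits concrete, here is a sketch of how an automated ML image job could be defined with the `azure-ai-ml` SDK (v2). It assumes an object detection task, which may differ from the task type used in the talk; the experiment name, compute name, target column, and limit values are all illustrative assumptions.

```python
# Minimal sketch, assuming the azure-ai-ml (SDK v2) package is installed.
JOB_LIMITS = {
    "timeout_minutes": 120,      # max runtime of all trials together (assumed value)
    "max_trials": 10,            # max number of trials (assumed value)
    "max_concurrent_trials": 4,  # matches a cluster with up to 4 nodes
}


def build_automl_job(training_data_path, compute_name="gpu-cluster",
                     limits=JOB_LIMITS):
    """Define an AutoML image object detection job with explicit limits."""
    # Imported lazily so this module loads even without the Azure SDK.
    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    job = automl.image_object_detection(
        compute=compute_name,
        experiment_name="taco-automl",  # hypothetical experiment name
        training_data=Input(type=AssetTypes.MLTABLE, path=training_data_path),
        target_column_name="label",     # assumed label column name
        primary_metric="mean_average_precision",
    )
    job.set_limits(**limits)
    return job
```

The returned job would then be submitted with `ml_client.jobs.create_or_update(job)`, where `training_data_path` points to training data in MLTable format.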
32. 32
How to Dig Deeper
• Metrics are comprehensive and look great, but what are we looking at?
• More details in the logs:
• Tip: Read Azure ML documentation!
• You still need data science knowledge to understand what Azure ML is doing here.
Section of std_log.txt file in Outputs + logs tab
35. 35
The Power of ONNX for Model Packaging
• Great: Azure ML packaged model in MLflow format
• The Issue: Tight MLflow model dependencies restrict platforms where model can be used
  • 204 (!) pinned dependencies, incl. 31 Azure-specific packages
  • Experienced issues installing some (azureml-dataprep-native) on macOS (M1)
• The Solution: Use ONNX model file (byproduct of Azure AutoML training) and use it with a single dependency: onnxruntime
  • ONNX (Open Neural Network Exchange): Open standard for deep learning models, makes models work across frameworks
  • ONNX Runtime: Cross-platform, open source ML model accelerator
You can now use your AutoML-trained model outside of Azure!
Flexible model deployment
ENABLERS
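Running the downloaded ONNX file with `onnxruntime` as the only dependency might look like the sketch below. The function names are hypothetical, and the expected input layout (shape, dtype, normalization) depends on the exported model, so the preprocessing is deliberately left out.

```python
# Minimal sketch, assuming the onnxruntime package is installed.
def load_session(model_path):
    """Open an ONNX model file for CPU inference."""
    # Imported lazily so this module loads even without onnxruntime.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_model(session, batch):
    """Feed one preprocessed batch to the model and return all its outputs."""
    # Read the input name from the model instead of hard-coding it.
    input_name = session.get_inputs()[0].name
    # Passing None as the first argument requests every model output.
    return session.run(None, {input_name: batch})
```

Here `batch` would typically be a NumPy array prepared in the layout the model expects, and `session.get_outputs()` can be used to inspect what each returned array represents.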
37. 37
Building an App with Streamlit
• Streamlit is an open-source app framework for creating data-based web apps in Python
• My experience with Streamlit:
  • The Good: Very easy and fast to code and use, apps look great and work. Wow!
  • The Good-to-Know: Complex workflows with state management harder to program, may be perceived as slow by users in comparison to “professional” web apps
• Streamlit is perfect for getting a user-facing app off the ground and testing your data-based product ideas!
Streamlit logo, see streamlit.io
39. 39
Streamlit App Blueprint
What the user sees
Trash Recognizer
Upload image(s)
Detected Trash
- 2 items for yellow trash can
- 1 item for blue trash can
No trash detected.
Detected Trash
- 1 item for other trash can
[Model + data description]
What the app does
1. Load ONNX model
2. Preprocess images
3. Run model inference
4. Postprocess images
5. Display results
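The blueprint above can be sketched as a small Streamlit script. The label-to-trash-can mapping and the `detect` callable (standing in for steps 1 to 4: load model, preprocess, infer, postprocess) are hypothetical placeholders, not the code from the talk.

```python
# Minimal sketch, assuming the streamlit package is installed.
from collections import Counter

# Hypothetical mapping from detected labels to trash can colors.
BIN_FOR_LABEL = {"Plastic bottle": "yellow", "Can": "yellow", "Paper": "blue"}


def summarize(labels, bin_for_label=BIN_FOR_LABEL):
    """Turn detected labels into lines such as '2 items for yellow trash can'."""
    counts = Counter(bin_for_label.get(label, "other") for label in labels)
    return [
        f"{n} item{'s' if n != 1 else ''} for {color} trash can"
        for color, n in counts.most_common()
    ]


def main(detect):
    """Render the app; `detect` maps image bytes to a list of labels."""
    # Imported lazily so summarize() can be used without Streamlit.
    import streamlit as st

    st.title("Trash Recognizer")
    uploads = st.file_uploader(
        "Upload image(s)", type=["jpg", "jpeg", "png"], accept_multiple_files=True
    )
    for upload in uploads or []:
        st.image(upload)
        lines = summarize(detect(upload.getvalue()))
        if lines:
            st.subheader("Detected Trash")
            for line in lines:
                st.write(f"- {line}")
        else:
            st.write("No trash detected.")
```

Saved as, say, `app.py` with a final call such as `main(my_detect_function)`, the script would be started with `streamlit run app.py`.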
43. 43
Deployment Pipeline
• Henkel Data Science Engineering team developed pipeline for one-click deployment of data science infrastructure, incl. Streamlit apps, on secure Azure cloud infrastructure
• Open sourced via article series “Kickstarting Data Science Projects in Azure DevOps” by Roberto Alonso
• Part 1 and 2 already available on Henkel Data & Analytics Blog
medium.com/henkel-data-and-analytics
46. 46
Collecting Feedback
• streamlit-feedback: Open-source feedback plugin for Streamlit
• Python logging: Use AzureLogHandler through opencensus logging extension
• Azure Application Insights: Collect logs from application, query with Kusto language
• Azure Dashboards: Interactive live dashboards on Azure for application metrics
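As an illustration of the logging step, the sketch below sends user feedback to Azure Application Insights through Python's standard `logging` module and the `AzureLogHandler` from `opencensus-ext-azure`. The logger name, the event name, and the shape of the `feedback` dictionary (keys `score` and `text`) are assumptions made for this example.

```python
# Minimal sketch, assuming opencensus-ext-azure is installed when a
# connection string is provided.
import logging


def get_logger(connection_string=None, name="trash_recognizer"):
    """Return a logger that forwards records to Application Insights."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the handler only once; Streamlit reruns the script on each
    # interaction, which would otherwise add duplicate handlers.
    if connection_string and not logger.handlers:
        from opencensus.ext.azure.log_exporter import AzureLogHandler

        logger.addHandler(AzureLogHandler(connection_string=connection_string))
    return logger


def log_feedback(logger, feedback):
    """Log one feedback event with queryable custom dimensions."""
    dims = {
        "score": str(feedback.get("score")),
        "text": feedback.get("text") or "",
    }
    # AzureLogHandler picks up 'custom_dimensions' from the record's extras.
    logger.info("user_feedback", extra={"custom_dimensions": dims})
    return dims
```

Records logged this way should end up in the `traces` table of Application Insights, where the custom dimensions can be filtered and aggregated with Kusto queries.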
50. 50
Code, Slides, Details
github.com/flrs/build_and_test_ml_quickly
Thanks
PyData team + sponsors; Henkel, incl. Henkel Data Science CoE team; open-source contributors for TACO, ONNX, ONNX Runtime, Streamlit, streamlit-feedback; Streamlit for reaching out before talk
Learn More
• Build-Measure-Learn Loop: The Lean Startup | Methodology
• Data Flywheel: Data Flywheel: Scaling a world-class data strategy
• Dataset: Tacodataset.org
• Automated Machine Learning on Azure: What is automated ML?
• ONNX: ONNX Runtime, ONNX File Format
• Streamlit: Get started with Streamlit, streamlit-feedback
• Azure Tricks for Data Science: Henkel Data & Analytics Blog
• Logging to Azure from Python: Monitor Python applications
• Azure Dashboards: Dashboards of Azure Log Analytics data
• A similar project: Instance Segmentation with Azure Machine Learning
Photo credits, in order of appearance: Greg Rakozy, Canva Studio, Sewupari Studio, Massimo Botturi, Charlotte Coneybeer, Desola Landre Ologun, Studio Saiz, Claudio Schwarz, NASA, Alena Darmel, Anna Shvetz, Vadim B, The Lucky Neko, Visual Tag Mx; User icon from “Redefining Women” icon collection by Iconathon
51. 51
What are your questions?
Let’s connect!
Florian Roscheck
Sr. Data Scientist
linkedin.com/in/florianroscheck
github.com/flrs