SlideShare une entreprise Scribd logo
1  sur  64
Télécharger pour lire hors ligne
DevDays Asia 2019
Services
Infrastructure
Tools
But ML is hard!
Building a model
Building a
model
Building
a model
Data ingestion Data analysis
Data
transformation
Data validation Data splitting
Trainer
Model
validation
Training
at scale
LoggingRoll-out Serving Monitoring
End-to-End Machine Learning
What is the E2E ML lifecycle?
• Develop & train model with reusable ML pipelines
• Package model using containers to capture runtime dependencies for inference
• Validate model behavior—functionally, in terms of responsiveness, in terms of regulatory compliance
• Deploy model—to cloud & edge, for use in real-time/streaming/batch processing
• Monitor model behavior & business value, know when to replace/deprecate a stale model
Train Model Validate Model Deploy ModelPackage Model Monitor Model
Retrain Model
MLOps = ML + DEV + OPS
Experiment
Data Acquisition
Business Understanding
Initial Modeling
Develop
Modeling
Operate
Continuous Delivery
Data Feedback Loop
System + Model Monitoring
ML
+ Testing
Continuous Integration
Continuous Deployment
Azure Machine Learning:
Azure ML service
Key Artifacts
Workspace
Azure ML service Workspace Taxonomy
AML Python API
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
subscription_id='<azure-subscription-id>',
resource_group='myresourcegroup',
create_resource_group=True,
location='eastus2' # or other supported Azure region
)
# see workspace details
ws.get_details()
Step 2 – Create an Experiment
experiment_name = ‘my-experiment-1'
from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)
Step 1 – Create a workspace
Step 3 – Create remote compute target
# choose a name for your cluster, specify min and max nodes
compute_name = os.environ.get("BATCHAI_CLUSTER_NAME", "cpucluster")
compute_min_nodes = os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 0)
compute_max_nodes = os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 4)
# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_D2_V2")
provisioning_config = AmlCompute.provisioning_configuration(
vm_size = vm_size,
min_nodes = compute_min_nodes,
max_nodes = compute_max_nodes)
# create the cluster
print(‘ creating a new compute target... ')
compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
# You can poll for a minimum number of nodes and for a specific timeout.
# if no min node count is provided it will use the scale settings for the cluster
compute_target.wait_for_completion(show_output=True,
min_node_count=None, timeout_in_minutes=20)
# note that while loading, we are shrinking the intensity values (X) from 0-255 to 0-1 so that the
model converge faster.
X_train = load_data('./data/train-images.gz', False) / 255.0
y_train = load_data('./data/train-labels.gz', True).reshape(-1)
X_test = load_data('./data/test-images.gz', False) / 255.0
y_test = load_data('./data/test-labels.gz', True).reshape(-1)
First load the compressed files into numpy arrays. Note the ‘load_data’ is a custom function that simply parses
the compressed files into numpy arrays.
Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be
accessed for remote training. The files are uploaded into a directory named mnist at the root of the datastore.
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)
We now have everything you need to start training a model.
Step 4 – Upload data to the cloud
%%time from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Next, make predictions using the test set and calculate the accuracy
y_hat = clf.predict(X_test)
print(np.average(y_hat == y_test))
You should see the local model accuracy displayed. [It should be a number like
0.915]
Train a simple logistic regression model using scikit-learn locally. This should take a minute or two.
Step 5 – Train a local model
To submit a training job to a remote you have to perform the following tasks:
• 6.1: Create a directory
• 6.2: Create a training script
• 6.3: Create an estimator object
• 6.4: Submit the job
Step 6.1 – Create a directory
Create a directory to deliver the required code from your computer to the remote resource.
import os
script_folder = './sklearn-mnist' os.makedirs(script_folder, exist_ok=True)
Step 6 – Train model on remote cluster
%%writefile $script_folder/train.py
# load train and test set into numpy arrays
# Note: we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can
# converge faster.
# ‘data_folder’ variable holds the location of the data files (from datastore)
Reg = 0.8 # regularization rate of the logistic regression model.
X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = 'n’)
# get hold of the current run
run = Run.get_context()
#Train a logistic regression model with regularizaion rate of’ ‘reg’
clf = LogisticRegression(C=1.0/reg, random_state=42)
clf.fit(X_train, y_train)
Step 6.2 – Create a Training Script (1/2)
print('Predict the test set’)
y_hat = clf.predict(X_test)
# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print('Accuracy is', acc)
run.log('regularization rate', np.float(args.reg))
run.log('accuracy', np.float(acc)) os.makedirs('outputs', exist_ok=True)
# The training script saves the model into a directory named ‘outputs’. Note files saved in the
# outputs folder are automatically uploaded into experiment record. Anything written in this
# directory is automatically uploaded into the workspace.
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
Step 6.2 – Create a Training Script (2/2)
MLOps = ML + DEV + OPS
Experiment
Data Acquisition
Business Understanding
Initial Modeling
Develop
Modeling
Operate
Continuous Delivery
Data Feedback Loop
System + Model Monitoring
ML
+ Testing
Continuous Integration
Continuous Deployment
App developer
using Azure DevOps
Build appCollaborate Test app Release app Monitor app
Model reproducibility Model retrainingModel deploymentModel validation
Data scientist using
Azure Machine Learning
Code, dataset, and
environment
versioning
Model reproducibility Model retrainingModel deploymentModel validation
Build appCollaborate Test app Release app Monitor app
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
Model reproducibility Model retrainingModel deploymentModel validation
Automated ML
ML Pipelines
Hyperparameter
tuning
Train model
Build appCollaborate Test app Release app Monitor app
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
Model
validation &
certification
Model reproducibility Model retrainingModel deploymentModel validation
Train model Validate
model
Build appCollaborate Test app Release app Monitor app
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
Model packaging
Simple deployment
Model reproducibility Model retrainingModel deploymentModel validation
Train model Validate
model
Deploy
model
Build appCollaborate Test app Release app Monitor app
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
Model
management
& monitoring
Model performance
analysis
Model reproducibility Model retrainingModel deploymentModel validation
Train model Validate
model
Deploy
model
Monitor
model
Retrain model
Build appCollaborate Test app Release app Monitor app
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
Train model Validate
model
Deploy
model
Monitor
model
Retrain model
Model reproducibility Model retrainingModel deploymentModel validation
Build appCollaborate Test app Release app Monitor app
Azure DevOps integration
App developer
using Azure DevOps
Data scientist using
Azure Machine Learning
© Microsoft Corporation
Reproducibility /
Auditability
MLOps Benefits
Validation
Automation /
Observability
• Code drives generation
and deployments
• Pipelines are
reproducible and
verifiable
• All artifacts can be
tagged and audited
• SWE best practices for
quality control
• Offline comparisons of
model quality
• Minimize bias and
enable explainability
• Controlled rollout
capabilities
• Live comparison of
predicted vs. expected
performance
• Results fed back to
watch for drift and
improve model
Real World Examples
Customer Demo
Automation of Energy Demand Forecasting
Digital Vault built on Microsoft Azure, is to transform the Bee'ah building with
intelligent edge systems, devices and software designed to optimize energy efficiency, make the
best use of available space and help the building’s occupants be more productive through a
virtual AI persona
What is Johnson Controls Digital Vault?
§ A platform to bring together the
data of the built environment to
create smart scenarios
§ Spaces, assets, people can have a
digital representation in the platform
§ Digital Vault enables complete
digitization of the building wherever
it is in the building lifecycle
Who are we helping?
Bee’ah Solution
§ AI-based building with real-time energy, asset
and space management
§ Natural language user interface
(AI-enabled)
§ Unified cross-domain integration between BMS,
Security, Lighting, HR, FM, and other
subsystems.
§ Building life-cycle Management
§ Improved productivity and comfort
§ Friction-free building access and security
Train & Validate Model with Azure ML + Azure DevOps
MLOps on Azure
Recap—E2E ML Lifecycle
• Develop & train model with reusable ML pipelines
• Package model using containers to capture runtime dependencies for inference
• Validate model behavior—functionally, in terms of responsiveness, in terms of regulatory compliance
• Deploy model—to cloud & edge, for use in real-time/streaming/batch processing
• Monitor model behavior & business value, know when to replace/deprecate a stale model
Train Model Validate Model Deploy ModelPackage Model Monitor Model
Retrain Model
Orchestration Services
New Capabilities - Azure MLOps
Monitoring
Real-Time
Azure Kubernetes Service
ML Data Drift
Experimentation Monitoring
Batch
Azure ML Compute
Inference Monitoring
Compute
Azure DevOps ML
Extension
Storage
Asset management & orchestration services to help manage the ML lifecycle.
Model Packaging
Model Validation
Run History
Model Deployment
Asset Management
Environments **
Code **
Datasets **
ML Audit Trail
Training Services
Edge
Azure IoT Hub
New capabilities
Advanced ML pipelines
Continuous model deployment with Azure ML + Azure
DevOps
Automated, data-driven retraining
Model auditability & interpretability
New capabilities
Advanced ML pipelines
Continuous model deployment with Azure ML +
Azure DevOps
Automated, data-driven retraining
Model auditability & interpretability
Step 1 – Connect your Workspace to Azure DevOps
Step 2 – set up a release pipeline for your
model
Step 2 – Create a Release to Deploy a Model!
New capabilities
Advanced ML pipelines
Continuous model deployment with Azure ML + Azure
DevOps
Automated, data-driven retraining
Model auditability & interpretability
Model’s deployed…what’s next?
Let’s set up automated retraining!
How are we going to do that?
How can we scale this up / out?
We can speed up retraining by re-using our previous model!
Transfer learning will leverage the current PROD model, prune the
graph, and train a model using fresh data
We can customize our “model re-use” depending on the scenario!
- New video dataset for the same video feed
- New video dataset, same general scenario
- Totally different dataset
Recap: What did MLOps give us here?
Model reproducibility
Model validation
Model deployment
Automated retraining
New capabilities
Advanced ML pipelines
Continuous model deployment with Azure ML + Azure
DevOps
Automated, data-driven retraining
Model auditability & interpretability
Create a dataset
Azure ML now lets you track, profile, and snapshot DataSets.
Train a model
We’re going to submit a training job from a notebook!
(but you can also submit from the CLI)
Let’s see what we captured in the run…
We can explore a trained model’s behavior
Let’s deploy the model
Show off the new Python SDK in the notebook
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
memory_gb=1,
tags={"data": "IBM_Attrition",
"method" : "local_explanation"},
description='Explain predictions on employee attrition')
inference_config = InferenceConfig(entry_script='score.py',
extra_docker_file_steps='Dockerfile',
runtime='python',
conda_file='myenv.yml')
service = Model.deploy(ws, name='predictattritionsvc', models=[scoring_explainer_model, attrition_model], inference_config=
inference_config, deployment_config=aciconfig)
How do we know when to retrain?
MLOps Recipes
MLOps accelerates going to Production
• We have pre-built solutions you can customize for your business
• We will continue to flesh these recipes out as we go
• Everything we build, we will build in the open!
Conclusions
MLOps makes it possible to do production-grade ML
Azure is investing heavily in MLOps
Try Azure ML for MLOps today!
https://aka.ms/mlops
Breaking the Wall between Data Scientists and App Developers
with MLOps
https://mybuild.techcommunity.microsoft.com/sessions/76973
APPENDIX
Bring your own base container for inference
• Use an optimized inference container (TensorRT + ONNX)
• Define your dependencies inside of the container, instead of installing with pip
and conda
• Re-use the same base image you used for training (which you know works,
because you trained your model over there)
• Coming soon:
– Support for bringing your own base container + serve stack
(Tfserving/ONNXserving/R/Scala support)
Automatic Schema Generation
• Provide a sample input &
output in score.py
• Swagger schema generated
automatically
• Enables integration with Excel +
PowerBI

Contenu connexe

Tendances

MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
Edge AI and Vision Alliance
 

Tendances (20)

MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
ML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production ApplicationML-Ops: From Proof-of-Concept to Production Application
ML-Ops: From Proof-of-Concept to Production Application
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOps
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOps
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Introducing MLOps.pdf
Introducing MLOps.pdfIntroducing MLOps.pdf
Introducing MLOps.pdf
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
Apply MLOps at Scale
Apply MLOps at ScaleApply MLOps at Scale
Apply MLOps at Scale
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 

Similaire à ML-Ops how to bring your data science to production

Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 
B2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draftB2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draft
Steve Feldman
 

Similaire à ML-Ops how to bring your data science to production (20)

Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
 
Oleksandr Krakovetskyi: Azure Machine Learning Servise for Data Specialists
Oleksandr Krakovetskyi: Azure Machine Learning Servise for Data SpecialistsOleksandr Krakovetskyi: Azure Machine Learning Servise for Data Specialists
Oleksandr Krakovetskyi: Azure Machine Learning Servise for Data Specialists
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)
 
Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark Use MLflow to manage and deploy Machine Learning model on Spark
Use MLflow to manage and deploy Machine Learning model on Spark
 
[AWS Dev Day] 실습워크샵 | 모두를 위한 컴퓨터 비전 딥러닝 툴킷, GluonCV 따라하기
[AWS Dev Day] 실습워크샵 | 모두를 위한 컴퓨터 비전 딥러닝 툴킷, GluonCV 따라하기[AWS Dev Day] 실습워크샵 | 모두를 위한 컴퓨터 비전 딥러닝 툴킷, GluonCV 따라하기
[AWS Dev Day] 실습워크샵 | 모두를 위한 컴퓨터 비전 딥러닝 툴킷, GluonCV 따라하기
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
B2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draftB2 2005 introduction_load_testing_blackboard_primer_draft
B2 2005 introduction_load_testing_blackboard_primer_draft
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlowTensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
 
Oracle to Azure PostgreSQL database migration webinar
Oracle to Azure PostgreSQL database migration webinarOracle to Azure PostgreSQL database migration webinar
Oracle to Azure PostgreSQL database migration webinar
 
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
 
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
Amazon SageMaker를 통한 대용량 모델 훈련 방법 살펴보기 - 김대근 AWS AI/ML 스페셜리스트 솔루션즈 아키텍트 / 최영준...
 

Plus de Herman Wu

好的Windows Phone App 主要特色 (注意事項)
好的Windows Phone App 主要特色 (注意事項)好的Windows Phone App 主要特色 (注意事項)
好的Windows Phone App 主要特色 (注意事項)
Herman Wu
 

Plus de Herman Wu (14)

Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
運用MMLSpark 來加速Spark 上 機器學習專案
運用MMLSpark 來加速Spark 上機器學習專案運用MMLSpark 來加速Spark 上機器學習專案
運用MMLSpark 來加速Spark 上 機器學習專案
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 
運用對話機器人提供線上客服服務
運用對話機器人提供線上客服服務運用對話機器人提供線上客服服務
運用對話機器人提供線上客服服務
 
Bot Framework & Azure cognitive service簡介
Bot Framework & Azure cognitive service簡介Bot Framework & Azure cognitive service簡介
Bot Framework & Azure cognitive service簡介
 
選擇正確的Solution 來建置現代化的雲端資料倉儲
選擇正確的Solution 來建置現代化的雲端資料倉儲選擇正確的Solution 來建置現代化的雲端資料倉儲
選擇正確的Solution 來建置現代化的雲端資料倉儲
 
Azure Data Lake 簡介
Azure Data Lake 簡介Azure Data Lake 簡介
Azure Data Lake 簡介
 
Azure HDInsight 介紹
Azure HDInsight 介紹Azure HDInsight 介紹
Azure HDInsight 介紹
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Azure Machine Learning using R
Azure Machine Learning using RAzure Machine Learning using R
Azure Machine Learning using R
 
貫通物聯網每一哩路 with Microsfot Azure IoT Sutie
貫通物聯網每一哩路 with Microsfot Azure IoT Sutie貫通物聯網每一哩路 with Microsfot Azure IoT Sutie
貫通物聯網每一哩路 with Microsfot Azure IoT Sutie
 
物聯網應用全貌以及微軟全球案例
物聯網應用全貌以及微軟全球案例物聯網應用全貌以及微軟全球案例
物聯網應用全貌以及微軟全球案例
 
Windows phone發展概況 2013Q3
Windows phone發展概況 2013Q3Windows phone發展概況 2013Q3
Windows phone發展概況 2013Q3
 
好的Windows Phone App 主要特色 (注意事項)
好的Windows Phone App 主要特色 (注意事項)好的Windows Phone App 主要特色 (注意事項)
好的Windows Phone App 主要特色 (注意事項)
 

Dernier

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 

Dernier (20)

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

ML-Ops how to bring your data science to production

  • 1.
  • 3.
  • 5. But ML is hard!
  • 7. Building a model Building a model Data ingestion Data analysis Data transformation Data validation Data splitting Trainer Model validation Training at scale LoggingRoll-out Serving Monitoring
  • 9. What is the E2E ML lifecycle? • Develop & train model with reusable ML pipelines • Package model using containers to capture runtime dependencies for inference • Validate model behavior—functionally, in terms of responsiveness, in terms of regulatory compliance • Deploy model—to cloud & edge, for use in real-time/streaming/batch processing • Monitor model behavior & business value, know when to replace/deprecate a stale model Train Model Validate Model Deploy ModelPackage Model Monitor Model Retrain Model
  • 10. MLOps = ML + DEV + OPS Experiment Data Acquisition Business Understanding Initial Modeling Develop Modeling Operate Continuous Delivery Data Feedback Loop System + Model Monitoring ML + Testing Continuous Integration Continuous Deployment
  • 12. Azure ML service Key Artifacts Workspace
  • 13. Azure ML service Workspace Taxonomy
  • 15. from azureml.core import Workspace ws = Workspace.create(name='myworkspace', subscription_id='<azure-subscription-id>', resource_group='myresourcegroup', create_resource_group=True, location='eastus2' # or other supported Azure region ) # see workspace details ws.get_details() Step 2 – Create an Experiment experiment_name = ‘my-experiment-1' from azureml.core import Experiment exp = Experiment(workspace=ws, name=experiment_name) Step 1 – Create a workspace
  • 16. Step 3 – Create remote compute target # choose a name for your cluster, specify min and max nodes compute_name = os.environ.get("BATCHAI_CLUSTER_NAME", "cpucluster") compute_min_nodes = os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 0) compute_max_nodes = os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 4) # This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6 vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_D2_V2") provisioning_config = AmlCompute.provisioning_configuration( vm_size = vm_size, min_nodes = compute_min_nodes, max_nodes = compute_max_nodes) # create the cluster print(‘ creating a new compute target... ') compute_target = ComputeTarget.create(ws, compute_name, provisioning_config) # You can poll for a minimum number of nodes and for a specific timeout. # if no min node count is provided it will use the scale settings for the cluster compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
  • 17. # note that while loading, we are shrinking the intensity values (X) from 0-255 to 0-1 so that the model converge faster. X_train = load_data('./data/train-images.gz', False) / 255.0 y_train = load_data('./data/train-labels.gz', True).reshape(-1) X_test = load_data('./data/test-images.gz', False) / 255.0 y_test = load_data('./data/test-labels.gz', True).reshape(-1) First load the compressed files into numpy arrays. Note the ‘load_data’ is a custom function that simply parses the compressed files into numpy arrays. Now make the data accessible remotely by uploading that data from your local machine into Azure so it can be accessed for remote training. The files are uploaded into a directory named mnist at the root of the datastore. ds = ws.get_default_datastore() print(ds.datastore_type, ds.account_name, ds.container_name) ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True) We now have everything you need to start training a model. Step 4 – Upload data to the cloud
  • 18. %%time from sklearn.linear_model import LogisticRegression clf = LogisticRegression() clf.fit(X_train, y_train) # Next, make predictions using the test set and calculate the accuracy y_hat = clf.predict(X_test) print(np.average(y_hat == y_test)) You should see the local model accuracy displayed. [It should be a number like 0.915] Train a simple logistic regression model using scikit-learn locally. This should take a minute or two. Step 5 – Train a local model
  • 19. To submit a training job to a remote you have to perform the following tasks: • 6.1: Create a directory • 6.2: Create a training script • 6.3: Create an estimator object • 6.4: Submit the job Step 6.1 – Create a directory Create a directory to deliver the required code from your computer to the remote resource. import os script_folder = './sklearn-mnist' os.makedirs(script_folder, exist_ok=True) Step 6 – Train model on remote cluster
  • 20. %%writefile $script_folder/train.py # load train and test set into numpy arrays # Note: we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can # converge faster. # ‘data_folder’ variable holds the location of the data files (from datastore) Reg = 0.8 # regularization rate of the logistic regression model. X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0 X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0 y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1) y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1) print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = 'n’) # get hold of the current run run = Run.get_context() #Train a logistic regression model with regularizaion rate of’ ‘reg’ clf = LogisticRegression(C=1.0/reg, random_state=42) clf.fit(X_train, y_train) Step 6.2 – Create a Training Script (1/2)
  • 21. print('Predict the test set’) y_hat = clf.predict(X_test) # calculate accuracy on the prediction acc = np.average(y_hat == y_test) print('Accuracy is', acc) run.log('regularization rate', np.float(args.reg)) run.log('accuracy', np.float(acc)) os.makedirs('outputs', exist_ok=True) # The training script saves the model into a directory named ‘outputs’. Note files saved in the # outputs folder are automatically uploaded into experiment record. Anything written in this # directory is automatically uploaded into the workspace. joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl') Step 6.2 – Create a Training Script (2/2)
  • 22. MLOps = ML + DEV + OPS Experiment Data Acquisition Business Understanding Initial Modeling Develop Modeling Operate Continuous Delivery Data Feedback Loop System + Model Monitoring ML + Testing Continuous Integration Continuous Deployment
  • 23. App developer using Azure DevOps Build appCollaborate Test app Release app Monitor app Model reproducibility Model retrainingModel deploymentModel validation Data scientist using Azure Machine Learning
  • 24. Code, dataset, and environment versioning Model reproducibility Model retrainingModel deploymentModel validation Build appCollaborate Test app Release app Monitor app App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 25. Model reproducibility Model retrainingModel deploymentModel validation Automated ML ML Pipelines Hyperparameter tuning Train model Build appCollaborate Test app Release app Monitor app App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 26. Model validation & certification Model reproducibility Model retrainingModel deploymentModel validation Train model Validate model Build appCollaborate Test app Release app Monitor app App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 27. Model packaging Simple deployment Model reproducibility Model retrainingModel deploymentModel validation Train model Validate model Deploy model Build appCollaborate Test app Release app Monitor app App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 28. Model management & monitoring Model performance analysis Model reproducibility Model retrainingModel deploymentModel validation Train model Validate model Deploy model Monitor model Retrain model Build appCollaborate Test app Release app Monitor app App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 29. Train model Validate model Deploy model Monitor model Retrain model Model reproducibility Model retrainingModel deploymentModel validation Build appCollaborate Test app Release app Monitor app Azure DevOps integration App developer using Azure DevOps Data scientist using Azure Machine Learning
  • 30.
  • 31. © Microsoft Corporation Reproducibility / Auditability MLOps Benefits Validation Automation / Observability • Code drives generation and deployments • Pipelines are reproducible and verifiable • All artifacts can be tagged and audited • SWE best practices for quality control • Offline comparisons of model quality • Minimize bias and enable explainability • Controlled rollout capabilities • Live comparison of predicted vs. expected performance • Results fed back to watch for drift and improve model
  • 33. Customer Demo Automation of Energy Demand Forecasting Digital Vault built on Microsoft Azure, is to transform the Bee'ah building with intelligent edge systems, devices and software designed to optimize energy efficiency, make the best use of available space and help the building’s occupants be more productive through a virtual AI persona
  • 34. What is Johnson Controls Digital Vault? § A platform to bring together the data of the built environment to create smart scenarios § Spaces, assets, people can have a digital representation in the platform § Digital Vault enables complete digitization of the building wherever it is in the building lifecycle
  • 35. Who are we helping? Bee’ah Solution § AI-based building with real-time energy, asset and space management § Natural language user interface (AI-enabled) § Unified cross-domain integration between BMS, Security, Lighting, HR, FM, and other subsystems. § Building life-cycle Management § Improved productivity and comfort § Friction-free building access and security
  • 36.
  • 37.
  • 38. Train & Validate Model with Azure ML + Azure DevOps
  • 39.
  • 41. Recap—E2E ML Lifecycle • Develop & train model with reusable ML pipelines • Package model using containers to capture runtime dependencies for inference • Validate model behavior—functionally, in terms of responsiveness, in terms of regulatory compliance • Deploy model—to cloud & edge, for use in real-time/streaming/batch processing • Monitor model behavior & business value, know when to replace/deprecate a stale model Train Model Validate Model Deploy ModelPackage Model Monitor Model Retrain Model
  • 42. Orchestration Services New Capabilities - Azure MLOps Monitoring Real-Time Azure Kubernetes Service ML Data Drift Experimentation Monitoring Batch Azure ML Compute Inference Monitoring Compute Azure DevOps ML Extension Storage Asset management & orchestration services to help manage the ML lifecycle. Model Packaging Model Validation Run History Model Deployment Asset Management Environments ** Code ** Datasets ** ML Audit Trail Training Services Edge Azure IoT Hub
  • 43. New capabilities Advanced ML pipelines Continuous model deployment with Azure ML + Azure DevOps Automated, data-driven retraining Model auditability & interpretability
  • 44. New capabilities Advanced ML pipelines Continuous model deployment with Azure ML + Azure DevOps Automated, data-driven retraining Model auditability & interpretability
  • 45. Step 1 – Connect your Workspace to Azure DevOps
  • 46. Step 2 – set up a release pipeline for your model
  • 47. Step 2 – Create a Release to Deploy a Model!
  • 48. New capabilities Advanced ML pipelines Continuous model deployment with Azure ML + Azure DevOps Automated, data-driven retraining Model auditability & interpretability
  • 49. Model’s deployed…what’s next? Let’s set up automated retraining! How are we going to do that?
  • 50. How can we scale this up / out? We can speed up retraining by re-using our previous model! Transfer learning will leverage the current PROD model, prune the graph, and train a model using fresh data We can customize our “model re-use” depending on the scenario! - New video dataset for the same video feed - New video dataset, same general scenario - Totally different dataset
  • 51. Recap: What did MLOps give us here? Model reproducibility Model validation Model deployment Automated retraining
  • 52. New capabilities Advanced ML pipelines Continuous model deployment with Azure ML + Azure DevOps Automated, data-driven retraining Model auditability & interpretability
  • 53. Create a dataset Azure ML now lets you track, profile, and snapshot DataSets.
  • 54. Train a model We’re going to submit a training job from a notebook! (but you can also submit from the CLI)
  • 55. Let’s see what we captured in the run…
  • 56. We can explore a trained model’s behavior
  • 57. Let’s deploy the model Show off the new Python SDK in the notebook aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, tags={"data": "IBM_Attrition", "method" : "local_explanation"}, description='Explain predictions on employee attrition') inference_config = InferenceConfig(entry_script='score.py', extra_docker_file_steps='Dockerfile', runtime='python', conda_file='myenv.yml') service = Model.deploy(ws, name='predictattritionsvc', models=[scoring_explainer_model, attrition_model], inference_config= inference_config, deployment_config=aciconfig)
  • 58. How do we know when to retrain?
  • 60. MLOps accelerates going to Production • We have pre-built solutions you can customize for your business • We will continue to flesh these recipes out as we go • Everything we build, we will build in the open!
  • 61. Conclusions MLOps makes it possible to do production-grade ML Azure is investing heavily in MLOps Try Azure ML for MLOps today! https://aka.ms/mlops Breaking the Wall between Data Scientists and App Developers with MLOps https://mybuild.techcommunity.microsoft.com/sessions/76973
  • 63. Bring your own base container for inference • Use an optimized inference container (TensorRT + ONNX) • Define your dependencies inside of the container, instead of installing with pip and conda • Re-use the same base image you used for training (which you know works, because you trained your model over there) • Coming soon: – Support for bringing your own base container + serve stack (Tfserving/ONNXserving/R/Scala support)
  • 64. Automatic Schema Generation • Provide a sample input & output in score.py • Swagger schema generated automatically • Enables integration with Excel + PowerBI