Waking the Data Scientist at 2am: Detect Model Degradation on Production Models with Amazon SageMaker Endpoints & Model Monitor

In this talk, I describe how to deploy a model into production and monitor its performance using SageMaker Model Monitor. With Model Monitor, I can detect if a model's predictive performance has degraded - and alert an on-call data scientist to take action and improve the model at 2am while the DevOps folks sleep soundly through the night.

Topics: AI and Machine Learning, Model Deployment, Anomaly Detection, Amazon SageMaker Endpoints, and Model Monitor


  1. Waking the Data Scientist @ 2am: Detect Model Degradation on Production Models with Amazon SageMaker Endpoints & Model Monitor Chris Fregly Developer Advocate @ AWS AI and Machine Learning https://datascienceonaws.com github.com/data-science-on-aws @cfregly linkedin.com/in/cfregly
  2. Who am I? • Former Netflix, Databricks • Organizer, Advanced Kubeflow Meetup (global) • Co-Author @ Data Science on AWS (O’Reilly 2021)
  3. Data Science on AWS – Book Outline https://www.datascienceonaws.com/
  4. Amazon SageMaker: a fully managed service that covers the entire machine learning workflow
  5. Amazon SageMaker re:Invent 2019 announcements First fully integrated development environment (IDE) for machine learning Amazon SageMaker Studio Enhanced notebook experience with quick-start & easy collaboration Amazon SageMaker Notebooks Automatic debugging, analysis, and alerting Amazon SageMaker Debugger Experiment management system to organize, track, & compare thousands of experiments Amazon SageMaker Experiments Model monitoring to detect deviation in quality & take corrective actions Amazon SageMaker Model Monitor Automatic generation of ML models with full visibility & control Amazon SageMaker Autopilot
  6. Amazon SageMaker focus of this session First fully integrated development environment (IDE) for machine learning Amazon SageMaker Studio Enhanced notebook experience with quick-start & easy collaboration Amazon SageMaker Notebooks Automatic debugging, analysis, and alerting Amazon SageMaker Debugger Experiment management system to organize, track, & compare thousands of experiments Amazon SageMaker Experiments Model monitoring to detect deviation in quality & take corrective actions Amazon SageMaker Model Monitor Automatic generation of ML models with full visibility & control Amazon SageMaker Autopilot
  7. Amazon SageMaker Debugger
  8. Challenges with Machine Learning Training: debugging machine learning training is painful. Large neural networks with many layers + many connections + computationally intensive workloads = extraordinarily difficult to inspect, debug, and profile the ‘black box’.
  9. Challenges with Machine Learning Training: debugging machine learning training is painful. Manually printing debug data + manually analyzing the debug data + using open source tools for charting = valuable data scientist/ML practitioner time wasted.
  10. Example Issues While Training ML Models • Vanishing gradients • Exploding gradients • Loss not decreasing across steps • Weight update ratios are either too small or too large • Tensor values are all zeros. All these issues impact the learning process, and debugging them is hard, even harder when running distributed training.
  11. An example: vanishing gradients. Weight update rule: $W_{new} = W - \eta \cdot \nabla_W L$. Intuition: gradients vanish when they take on very small values, so there is almost no weight update during backpropagation. Why does this happen? Consider a network input → hidden1 → hidden2 → output with sigmoid activations $\sigma(z) = \frac{1}{1 + e^{-z}}$, so $\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \sigma} \frac{\partial \sigma}{\partial z}$. Backpropagation applies the chain rule, $\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial \text{output}} \cdot \frac{\partial \text{output}}{\partial \text{hidden}_2} \cdot \frac{\partial \text{hidden}_2}{\partial \text{hidden}_1} \cdot \frac{\partial \text{hidden}_1}{\partial w_1}$, a product in which each factor can be small. A numeric sketch follows.
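To make the intuition concrete, here is a tiny numeric sketch (not from the slides): the sigmoid derivative is $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which is at most 0.25, so the chain-rule product reaching early layers shrinks roughly geometrically with depth.

```python
# Tiny numeric sketch (not from the slides): the sigmoid derivative peaks at
# 0.25, so a chain-rule product across layers shrinks geometrically.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # maximum value 0.25, attained at z = 0

z = 0.0  # best case for the sigmoid derivative
for depth in (2, 5, 10, 20):
    grad_factor = sigmoid_grad(z) ** depth
    print(f"{depth:2d} layers -> gradient factor ~ {grad_factor:.2e}")
# 20 layers -> gradient factor ~ 9.09e-13: almost no weight update
```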
  12. An example: loss not decreasing with XGBoost • Overfitting is a problem with non-linear algorithms such as XGBoost • By monitoring the loss over the last several steps, training can be stopped early once the loss stops decreasing, or stops decreasing at the expected rate • In this example, training could have been stopped somewhere between 20 and 40 epochs (see the sketch below)
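As a concrete illustration of stopping when the loss stalls, here is a minimal sketch using XGBoost’s built-in early stopping; the synthetic dataset and parameters are illustrative, not from the talk.

```python
# Minimal sketch of early stopping in XGBoost (synthetic, illustrative data).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dval = xgb.DMatrix(X[800:], label=y[800:])

booster = xgb.train(
    {"objective": "binary:logistic", "eval_metric": "logloss"},
    dtrain,
    num_boost_round=200,
    evals=[(dval, "validation")],
    early_stopping_rounds=10,  # stop once validation loss stops improving
)
print("best iteration:", booster.best_iteration)
```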
  13. Introducing Amazon SageMaker Debugger: training data analysis, debugging, & alert generation. • Automatic data analysis: debug data with no code changes • Relevant data capture: data is automatically captured for analysis • Automatic error detection: errors are automatically detected and alerts are sent • Faster training: analyze and debug across distributed clusters • Amazon SageMaker Studio integration: analyze & debug from Amazon SageMaker Studio
  14. How does Amazon SageMaker Debugger work? Training in progress, analysis in progress: customer’s S3 bucket, Amazon CloudWatch Event, Amazon SageMaker, Amazon SageMaker Studio visualization, Amazon SageMaker notebook. Action → stop the training; action → analyze using the Debugger SDK; action → visualize tensors using charts. • No code change is necessary to emit debug data with built-in algorithms and custom training scripts • Analysis occurs in real time as data is emitted, making real-time alerts possible
  15. Add Debugger to a training job: initialize your hook to save tensors in a specified path, and initialize your rules, which will read data for analysis from the path specified in the hook. A sketch of this setup follows.
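A minimal sketch of that setup with the SageMaker Python SDK; the container image, role ARN, and S3 paths below are placeholder assumptions.

```python
# Minimal sketch: attach a Debugger hook config and rules to a training job.
# Image URI, role ARN, and S3 paths below are placeholder assumptions.
from sagemaker.debugger import (CollectionConfig, DebuggerHookConfig, Rule,
                                rule_configs)
from sagemaker.estimator import Estimator

hook_config = DebuggerHookConfig(
    s3_output_path="s3://my-bucket/debugger-tensors",  # hypothetical path
    collection_configs=[
        CollectionConfig(name="gradients", parameters={"save_interval": "100"}),
    ],
)

rules = [
    # Built-in rules read the saved tensors and flag the issues listed above.
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
]

estimator = Estimator(
    image_uri="<training-image-uri>",   # assumption: your training container
    role="<execution-role-arn>",        # assumption: your SageMaker role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    debugger_hook_config=hook_config,
    rules=rules,
)
estimator.fit("s3://my-bucket/train")   # hypothetical training data location
```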
  16. Amazon SageMaker Model Monitor
  17. Deploying a model is not the end. You need to continuously monitor models in production and iterate. Concept drift due to divergence of data + model performance changing due to unknown factors + continuous monitoring involving a lot of tooling and expense = model monitoring is cumbersome but critical.
  18. Introducing Amazon SageMaker Model Monitor: continuous monitoring of models in production. • Automatic data collection: data collected from endpoints is stored in Amazon S3 • Continuous monitoring: define a monitoring schedule and detect changes in quality against a pre-defined baseline • CloudWatch integration: metrics emitted to Amazon CloudWatch make it easy to alarm and automate corrective actions • Visual data analysis: see monitoring results, data statistics, and violation reports in Amazon SageMaker Studio; analyze in notebooks • Flexibility with rules: use built-in rules to detect data drift or write your own rules for custom analysis
  19. How Does Model Monitor Work?
  20. 1. Create/Update Amazon SageMaker Endpoint (diagram: Amazon SageMaker training job → model → Amazon SageMaker endpoint → applications)
  21. 2. Enable Data Collection for the SageMaker Endpoint (diagram: Amazon SageMaker training job → model → Amazon SageMaker endpoint ↔ applications, capturing requests and predictions)
  22. Enable data capture (a sketch follows)
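A minimal sketch of enabling capture at deploy time; the model object, endpoint name, and bucket are placeholder assumptions.

```python
# Minimal sketch: turn on request/response capture when deploying a model.
# `model`, the endpoint name, and the S3 prefix are placeholder assumptions.
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,                          # capture every request
    destination_s3_uri="s3://my-bucket/datacapture",  # hypothetical prefix
)

predictor = model.deploy(                             # `model` built earlier
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="xgb-churn-monitor-demo",           # hypothetical name
    data_capture_config=data_capture_config,
)
```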
  23. View data collected from the endpoint. Captured files are written to s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl, for example: sagemaker/UC-DEMO-ModelMonitor/datacapture/UC-DEMO-xgb-churn-pred-model-monitor-2019-12-01-21-09-29/AllTraffic/2019/12/01/21/28-45-917-ae917300-350f-4482-ac73-4e838d9d6115.jsonl and sagemaker/UC-DEMO-ModelMonitor/datacapture/UC-DEMO-xgb-churn-pred-model-monitor-2019-12-01-21-09-29/AllTraffic/2019/12/01/21/29-45-951-27c7035d-87f8-45f9-9993-8008abc43aaa.jsonl
  24. Example saved prediction request & response
  25. 3. Create baseline with train/validation dataset (diagram: training job → model → endpoint ↔ applications with requests and predictions; baseline processing job → baseline statistics and constraints)
  26. Create a baseline (a sketch follows)
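A minimal sketch of the baselining call; the role, instance sizing, and S3 locations are placeholder assumptions.

```python
# Minimal sketch: suggest a baseline (statistics + constraints) from the
# training dataset. Role and S3 locations are placeholder assumptions.
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_monitor = DefaultModelMonitor(
    role="<execution-role-arn>",   # assumption: your SageMaker role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/training-dataset.csv",  # hypothetical
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/baselining/results",             # hypothetical
    wait=True,
)
```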
  27. Under the hood: 1. SageMaker Model Monitor runs a ProcessingJob on your behalf • On-demand, distributed job • Fully managed, ideal for data processing and custom analysis • Pay only for the duration the job runs 2. Analyzes the data collected • SageMaker provides a pre-built container for analysis • The pre-built container runs Deequ on Spark • Custom analysis is also supported
  28. Baselining results: statistics (baselining/results/statistics.json)
  29. Baselining results: suggested constraints (baselining/results/constraints.json)
  30. 4. Create a monitoring schedule (diagram: training job → model → endpoint ↔ applications with requests and predictions; baseline processing job → baseline statistics and constraints; scheduled monitoring job)
  31. Schedule a monitoring job (a sketch follows)
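A minimal sketch of creating an hourly schedule, reusing `my_monitor` and `predictor` from the sketches above; the schedule name and S3 locations are placeholder assumptions.

```python
# Minimal sketch: schedule an hourly monitoring job against the endpoint,
# comparing captured traffic to the suggested baseline.
from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name="xgb-churn-monitor-schedule",  # hypothetical name
    endpoint_input=predictor.endpoint_name,              # endpoint from earlier
    output_s3_uri="s3://my-bucket/monitoring/reports",   # hypothetical prefix
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```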
  32. Under the hood: 1. SageMaker Model Monitor runs a ProcessingJob on your behalf at the schedule you select (i.e., monitoring jobs) 2. Analyzes the data collected using your choice of analysis container (pre-built or custom), compares results against the baseline, and generates results for each monitoring job: • a violations report for each job in Amazon S3 • a statistics report for data collected during the run • summary metrics and statistics emitted to Amazon CloudWatch
  33. View monitoring jobs (a sketch follows)
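A minimal sketch of inspecting executions from a notebook, assuming `my_monitor` already has an active schedule.

```python
# Minimal sketch: list monitoring executions and print any constraint
# violations. Assumes `my_monitor` already has an active schedule.
executions = my_monitor.list_executions()
if executions:
    latest = executions[-1]  # most recent monitoring job
    print(latest.describe()["ProcessingJobStatus"])

violations = my_monitor.latest_monitoring_constraint_violations()
if violations is not None:
    for v in violations.body_dict["violations"]:
        print(v["feature_name"], v["constraint_check_type"], v["description"])
```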
  34. 5. View monitoring results (diagram: the monitoring pipeline above, with each scheduled monitoring job producing results, statistics and violations, plus Amazon CloudWatch metrics)
  35. 6. Get alerted and take corrective actions (diagram: the monitoring pipeline above, with analysis of results and notifications feeding corrective actions: model updates, training data updates, retraining)
  36. Take corrective actions: 1. Set alarms in Amazon CloudWatch and triggers for retraining (a sketch follows)
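A minimal sketch of one such alarm via boto3. Model Monitor emits per-feature metrics to the aws/sagemaker/Endpoints/data-metrics namespace; the specific metric name, dimensions, threshold, and SNS topic below are illustrative assumptions.

```python
# Minimal sketch: CloudWatch alarm on a Model Monitor drift metric that
# notifies an SNS topic (which could trigger retraining). The metric name,
# dimensions, threshold, and topic ARN are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="xgb-churn-feature-drift",                 # hypothetical
    Namespace="aws/sagemaker/Endpoints/data-metrics",
    MetricName="feature_baseline_drift_account_length",  # assumed feature metric
    Dimensions=[
        {"Name": "Endpoint", "Value": "xgb-churn-monitor-demo"},
        {"Name": "MonitoringSchedule", "Value": "xgb-churn-monitor-schedule"},
    ],
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.1,                                       # assumed drift tolerance
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:retrain-alerts"],  # hypothetical
)
```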
  37. SageMaker Model Monitor Summary (diagram: training job → model → endpoint ↔ applications with requests and predictions; baseline processing job → baseline statistics and constraints; scheduled monitoring job → results: statistics and violations → Amazon CloudWatch metrics → analysis of results and notifications → model updates, training data updates, retraining)
  38. References https://github.com/aws-samples/reinvent2019-aim362-sagemaker-debugger-model-monitor/ https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/
  39. Thank You! Waking the Data Scientist @ 2am: Detect Model Degradation on Production Models with Amazon SageMaker Endpoints & Model Monitor Chris Fregly Developer Advocate @ AWS AI and Machine Learning https://datascienceonaws.com github.com/data-science-on-aws @cfregly linkedin.com/in/cfregly
