© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
SageMaker Automatic Model
Tuning
Roadmap
• Hyperparameters
• Search Based HPO
• Bayesian HPO
• Amazon SageMaker AMT
Hyperparameters
What is a Hyperparameter?
• Hyperparameter = algorithm parameter
• The training algorithm accepts hyperparameter(s) and returns model parameters
• Hyperparameters affect how an algorithm behaves during the model training process
• “Any decision an algorithm author can’t make for you”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Examples of Hyperparameters
Model:
Number of layers: 1, 2, 3, …
Activation functions: Sigmoid, tanh, ReLU, …
Optimization:
Method: SGD, Adam, AdaGrad, …
Learning Rate: 0.01 to 2
Data:
Batch Size: 8, 16, 32 …
Augmentation: Resize, Normalize, Color Jitter, …
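The hyperparameters above can be collected into a single configuration passed to the training algorithm. A minimal sketch; all names and values here are illustrative, not a fixed API:

```python
# Illustrative grouping of the hyperparameters listed above into one
# configuration dict. Names and values are examples only.
hyperparameters = {
    # Model
    "num_layers": 2,                 # 1, 2, 3, ...
    "activation": "relu",            # "sigmoid", "tanh", "relu", ...
    # Optimization
    "optimizer": "adam",             # "sgd", "adam", "adagrad", ...
    "learning_rate": 0.01,           # continuous, e.g. 0.01 to 2
    # Data
    "batch_size": 32,                # 8, 16, 32, ...
    "augmentation": ["resize", "normalize", "color_jitter"],
}

def train(hp):
    """Stand-in for a training algorithm: accepts hyperparameters and
    returns model parameters (here just a dummy weight list)."""
    return {"weights": [0.0] * hp["num_layers"]}

model_params = train(hyperparameters)
```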
Model vs Hyperparameter Optimization
ℓ∗ = min_𝜃 ℎ(𝜃)   ← optimize hyperparams (𝜃)
ℎ(𝜃) = min_𝑤 𝑓(𝑤 | 𝑋, 𝑦, 𝜃)   ← optimize model params (𝑤)
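The nested optimization above can be sketched on a toy objective. `f`, `h`, and the grids below are illustrative stand-ins: the inner minimum runs over model parameters 𝑤, the outer minimum over the hyperparameter 𝜃.

```python
def f(w, theta):
    # Toy inner objective f(w | theta): training loss for model
    # parameters w under hyperparameter theta (illustrative formula).
    return (w - theta) ** 2 + 0.1 * theta ** 2

def h(theta, w_grid):
    # h(theta) = min_w f(w | theta): best achievable loss for this theta.
    return min(f(w, theta) for w in w_grid)

w_grid = [i / 10 for i in range(-20, 21)]       # candidate model params
theta_grid = [i / 10 for i in range(-10, 11)]   # candidate hyperparams

# l* = min_theta h(theta): the outer optimization over hyperparameters.
l_star = min(h(t, w_grid) for t in theta_grid)
```

Each evaluation of `h` is itself a full (toy) training run, which is why the outer problem is expensive.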
Blackbox Optimization
• We aim to minimize the objective function ℎ(𝜃).
• We have no knowledge of what the objective function is.
• We don’t have access to the gradients of the objective function.
• All we know is what goes into the function and what comes out.
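A minimal sketch of the black-box setting: the optimizer may only query input→output pairs of ℎ(𝜃). The formula inside `h` is a made-up stand-in for a real training-and-validation run and is hidden from the search.

```python
def h(theta):
    """Black-box objective: we can evaluate it (e.g. run a training job
    and read the validation loss), but the optimizer sees no formula and
    no gradients. The quadratic below is an illustrative stand-in."""
    return (theta - 0.3) ** 2 + 0.05

# All the optimizer may do is collect (input, output) pairs:
evaluations = [(theta, h(theta)) for theta in (0.0, 0.5, 1.0)]
best_theta, best_loss = min(evaluations, key=lambda p: p[1])
```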
Search Based HPO
Grid Search
[Figure: 5×3 grid of trials over Learning Rate (0, 0.5, 1, 1.5, 2) and Activation (Sigmoid, ReLU, tanh)]
Grid Search - Shortcomings
• In grid search, the user specifies a finite set of values for each hyperparameter.
• Each additional hyperparameter adds a degree of freedom, so the number of combinations explodes combinatorially.
• Assume each hyper-param has 5 options
e.g. Learning Rate: 0, 0.5, 1, 1.5, 2
1 HP = 5 combinations
2 HPs = 5*5 = 25 combinations
3 HPs = 5*5*5 = 125 combinations
…
10 HPs = 5^10 = 9,765,625 combinations
N HPs = 5^N combinations
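The combinatorial explosion above can be checked directly with a Cartesian product; the hyperparameter names and value lists are illustrative:

```python
from itertools import product

# Five candidate values per hyperparameter, as in the example above.
grids = {
    "learning_rate": [0.0, 0.5, 1.0, 1.5, 2.0],
    "batch_size": [8, 16, 32, 64, 128],
    "num_layers": [1, 2, 3, 4, 5],
}

# Grid search enumerates the full Cartesian product of all value lists.
combinations = list(product(*grids.values()))
assert len(combinations) == 5 ** len(grids)  # 5^N grows exponentially

# With 10 hyperparameters at 5 values each:
print(5 ** 10)  # 9765625 training jobs
```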
Grid Search - Shortcomings
[Figure: the same grid; every trial in a column shares one of only 5 learning-rate values]
Some hyper-params are more important than others.
Grid Search
[Figure: the same grid; trials in unpromising regions of the search space are still evaluated]
Wasted compute.
Random Grid Search
[Figure: randomly sampled trials over Learning Rate (0–2) and Activation (Sigmoid, ReLU, tanh); each trial uses a distinct learning rate]
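A sketch of random search over the same two hyperparameters: unlike the 5-column grid, 25 random trials try about 25 distinct learning rates, which helps when one hyperparameter matters more than the others. The sampling ranges below are illustrative.

```python
import random

def sample_config(rng):
    # Random search draws each hyperparameter independently, so an
    # important parameter like learning rate gets many distinct values
    # instead of a few fixed grid lines.
    return {
        "learning_rate": rng.uniform(0.0, 2.0),
        "activation": rng.choice(["sigmoid", "relu", "tanh"]),
    }

rng = random.Random(0)  # seeded for reproducibility
trials = [sample_config(rng) for _ in range(25)]
distinct_lrs = {t["learning_rate"] for t in trials}
# 25 trials explore ~25 distinct learning rates; a 5x5 grid explores only 5.
```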
Bayesian HPO
Model based Bayesian HPO
[Figure: surrogate model fit along the learning-rate axis, Activation = ReLU]
ℎ(𝜃): true objective (hidden)
𝐷: evaluated samples
ℎ′(𝜃): surrogate approximation
𝑐: next candidate
• ℎ(𝜃) is expensive to evaluate, so use an approximation, or surrogate model, ℎ′(𝜃) instead
• Use an acquisition function 𝔼[𝐼(𝜆)] to select the next points
Model based Bayesian HPO
• Keeps track of previous evaluations and infers expected behaviour.
• It is Bayesian in the sense that the surrogate model uses a prior probability
distribution to make predictions about the posterior:
𝑃(𝑌|𝑋) ∝ 𝑃(𝑋|𝑌) 𝑃(𝑌)
• Improves our beliefs about the objective function through iterative learning.
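The iterative loop can be sketched end to end. This toy uses a crude nearest-neighbour surrogate and a lower-confidence-bound acquisition, not the Gaussian process and expected improvement discussed next, purely to show the evaluate → update-beliefs → select-next-point cycle; every name and constant is illustrative.

```python
import random

def h(theta):
    # Expensive black-box objective (hidden from the optimizer).
    return (theta - 0.7) ** 2

def surrogate_predict(theta, data):
    # Crude stand-in for a surrogate model h'(theta): predict the value
    # of the nearest evaluated point, with uncertainty growing with
    # distance from it.
    nearest_t, nearest_y = min(data, key=lambda p: abs(p[0] - theta))
    return nearest_y, abs(nearest_t - theta)  # (mean, uncertainty)

def acquisition(theta, data, kappa=1.0):
    # Lower-confidence-bound style score: prefer low predicted mean
    # and/or high uncertainty (exploitation vs exploration).
    mean, unc = surrogate_predict(theta, data)
    return mean - kappa * unc

rng = random.Random(0)
data = [(t, h(t)) for t in (0.0, 1.0)]  # initial evaluations
for _ in range(20):
    candidates = [rng.uniform(0.0, 1.0) for _ in range(100)]
    # Pick the candidate that minimizes the acquisition function...
    c = min(candidates, key=lambda t: acquisition(t, data))
    # ...evaluate the expensive objective there, and update our beliefs.
    data.append((c, h(c)))

best_theta, best_y = min(data, key=lambda p: p[1])
```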
Surrogate Model - Gaussian Process
• A Gaussian Process is a distribution over functions; any finite set of function values is jointly
Gaussian:
𝑓: 𝒳 → ℝ
(𝑓(𝑥₁), 𝑓(𝑥₂), …, 𝑓(𝑥ₙ)) ~ 𝒩(𝝁, 𝜮)
• A Gaussian distribution over a random variable is described by its mean 𝜇 and variance 𝜎².
• Each point corresponds to a set of hyperparameters 𝜆 ∈ Λ = Λ₁ × … × Λₙ, the product of the
individual hyperparameter ranges.
• A Gaussian process is fully specified by a mean function 𝜇(𝜆) and a covariance function 𝑘(𝜆, 𝜆′):
𝒢𝒫(𝜇(𝜆), 𝑘(𝜆, 𝜆′))
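The closed-form posterior of a GP surrogate can be sketched with NumPy. For brevity this sketch uses a squared-exponential kernel rather than the Matérn kernel SageMaker uses, assumes a zero prior mean, and all the data values are made up.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential covariance: nearby hyperparameters are assumed
    # to give similar losses.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Closed-form GP posterior mean and variance at the query points.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    Kss = rbf_kernel(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mean = Ks.T @ K_inv @ y_train
    var = np.diag(Kss - Ks.T @ K_inv @ Ks)
    return mean, var

x_train = np.array([0.0, 0.5, 1.0])      # evaluated hyperparams
y_train = np.array([0.49, 0.04, 0.09])   # observed losses (made up)
mean, var = gp_posterior(x_train, y_train, np.array([0.5, 0.25]))
# At an evaluated point the GP is nearly certain; between points the
# variance grows -- this is what the acquisition function exploits.
```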
Gaussian Process as a model of the model loss
Covariance Matrix
Measures the similarity between two points and controls the ‘smoothness’ of the fitted functions.
SageMaker uses a Matérn kernel with 𝜈 = 5/2
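A sketch of the Matérn 𝜈 = 5/2 covariance as a function of the distance 𝑟 between two points, with an assumed length scale of 1:

```python
import math

def matern52(r, length_scale=1.0):
    # Matérn covariance with nu = 5/2: functions drawn from this GP are
    # twice differentiable -- smoother than nu = 1/2, rougher than the
    # infinitely smooth squared-exponential (nu -> infinity) limit.
    s = math.sqrt(5.0) * abs(r) / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

# Identical points are perfectly correlated; similarity decays with distance.
print(matern52(0.0))  # 1.0
print(matern52(0.5) > matern52(2.0))  # closer points are more similar
```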
Acquisition Function
• Given the posterior distribution over functions:
𝔼[𝕀(𝜆)] = 𝔼[max(𝑓_min − 𝑌, 0)]
• Used as the criterion for selecting the next candidate hyperparams for evaluation.
• Often depends on the best hyperparams seen so far in the search.
• Controls exploration vs exploitation in the search.
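The expected-improvement quantity above has a closed form when the surrogate posterior at a candidate is Gaussian, 𝑌 ~ 𝒩(𝜇, 𝜎²). A sketch with illustrative numbers:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_min):
    # EI = E[max(f_min - Y, 0)] for Y ~ N(mu, sigma^2), in closed form:
    # (f_min - mu) * Phi(z) + sigma * phi(z), with z = (f_min - mu) / sigma.
    if sigma == 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm_cdf(z) + sigma * norm_pdf(z)

f_min = 0.3  # best (lowest) loss observed so far
# A candidate with lower predicted mean scores higher than one with a
# higher mean at the same uncertainty:
ei_low_mean = expected_improvement(mu=0.2, sigma=0.1, f_min=f_min)
ei_high_mean = expected_improvement(mu=0.5, sigma=0.1, f_min=f_min)
```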
Acquisition Function: Expected Improvement
[Figure: posterior over two candidates 𝑥₁ and 𝑥₂ with 𝐸𝐼(𝑥₁) > 𝐸𝐼(𝑥₂); improvement is measured relative to the current best observation]
Using Acquisition Function
• Expected improvement
[maximizing the dashed line] has
two components:
• One depends on the negative mean −𝜇 [solid line]
• The other depends on the uncertainty, or
variance, 𝑘(𝜆, 𝜆′) [blue line]
• Therefore we maximize the
acquisition function wherever:
• the mean, 𝜇, is low, or
• the uncertainty, 𝑘(𝜆, 𝜆′), is high.
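The two conditions above can be illustrated with a simple acquisition. This sketch uses a lower-confidence-bound score instead of expected improvement, but it rewards exactly the same two things, a low predicted mean or a high uncertainty; all candidate values are made up.

```python
# Candidate hyperparameter values with surrogate predictions (mean, sigma).
# The numbers are invented to illustrate the two ways a point can win.
candidates = {
    "low_mean":  (0.10, 0.05),  # confidently good region -> exploit
    "uncertain": (0.40, 0.50),  # unexplored region -> explore
    "known_bad": (0.60, 0.02),  # confidently bad -> ignore
}

def ucb_score(mu, sigma, kappa=2.0):
    # Lower-confidence bound: small when the mean is low OR the
    # uncertainty is high -- the two conditions on this slide.
    return mu - kappa * sigma

# The next point to evaluate minimizes the score.
next_point = min(candidates, key=lambda k: ucb_score(*candidates[k]))
```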
Part 2: Hands On with Amazon SageMaker AMT

Editor's notes

  1. Various data types: continuous, integer, categorical. Various ranges.
  2. 𝑓 and ℎ return the loss (e.g. cross-entropy loss). We can find the gradient with respect to the model parameters 𝑤 (1st-order optimization) but not with respect to 𝜃 (0th-order). Often no closed form.
  3. The underlying true relationship is hidden. Evaluations cost time and money, so we must sample.
  4. Discretize.
  5. ~1,000 years for a model that takes 1 hour to train.
  6. Often some hyper-params are more important than others.
  7. Wasted compute.
  8. Can limit the number of samples.
  9. Use a quick surrogate model to choose the next point to evaluate, via an acquisition function.
  10. Assumes similar points give similar results (covariance function). Gives probabilistic estimates. Closed-form expressions for mean and variance.
  11. The most common kernel is the Squared Exponential (Gaussian radial basis function); the Matérn kernel generalizes it. 𝜈 = ∞ recovers the Squared Exponential kernel (infinitely differentiable). 𝜈 = 5/2 gives functions that can be differentiated twice but not three times: a good default that is robust and works on a wide range of problems. Closed-form simplifications exist for these cases.