© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Automatic Model Tuning
Using Amazon SageMaker
Cyrus M Vahid
Principal Evangelist – AWS AI Labs
CyrusMV@amazon.com
AIM412
Agenda
Hyperparameters
Search-based HPO
Bayesian HPO
Amazon SageMaker Automatic Model Tuning (AMT)
What is a hyperparameter
• Hyperparameter = algorithm parameter
• It affects how an algorithm behaves during the model learning process
• “Any decision an algorithm author can’t make for you”
Examples of hyperparameters
Model
Number of layers: 1, 2, 3, . . .
Activation functions: Sigmoid, tanh, RELU, . . .
Optimization
Method: SGD, Adam, AdaGrad, . . .
Learning rate: 0.01 to 2
Data
Batch size: 8, 16, 32, . . .
Augmentation: Resize, Normalize, Color Jitter, . . .
Optimizing loss function for a model
[Figure: loss curve with the gradient marked]
Blackbox optimization
• We aim to minimize the objective function 𝐽
• We have no knowledge of what the objective function is
• We do not have access to the gradients of the objective function
• All we know is what goes into the function and what comes out
𝐽(𝑊, 𝜃|𝜆)
Grid search
[Figure: grid of activation (Sigmoid, RELU, tanh) against learning rate (0, 0.5, 1, 1.5, 2)]
Curse of dimensionality
• In grid search the user specifies a finite set of values for each hyperparameter, and the search then evaluates every combination of those values
• Each hyperparameter adds a degree of freedom, which results in combinatorial explosion
• Example
  • One HP with five options → 5 combinations
  • Two HPs with five options each → 5 × 5 = 25 combinations
  • Ten HPs with five options each → 5^10 = 9,765,625 ≈ 10^7 combinations
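The growth in grid size can be checked directly. The sketch below is illustrative only: the five candidate values are hypothetical placeholders, and `grid_size` is a helper name introduced here, not part of any library.

```python
from itertools import product
from math import prod

def grid_size(options_per_hp):
    """Number of configurations in a full grid over the given option lists."""
    return prod(len(options) for options in options_per_hp)

# Hypothetical search space: every hyperparameter has five candidate values.
five_options = [0.01, 0.1, 0.5, 1.0, 2.0]

one_hp = [five_options]
two_hp = [five_options] * 2
ten_hp = [five_options] * 10

print(grid_size(one_hp))  # 5
print(grid_size(two_hp))  # 25
print(grid_size(ten_hp))  # 9765625

# Enumerating the grid explicitly is only practical for small spaces.
small_grid = list(product(*two_hp))
print(len(small_grid))    # 25
```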
Random grid search
• Random search samples configurations at random until a certain budget B for the search is exhausted
• With a budget of B evaluations, grid search can afford only B^(1/N) distinct values for each of N hyperparameters, whereas random search explores B distinct values for each
Bergstra, Bengio. 2012. “Random Search for Hyper-Parameter Optimization”
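A small sketch of the budget argument, assuming every hyperparameter is scaled to the interval [0, 1) and sampled uniformly. The function names and the budget of 81 are choices made for this illustration.

```python
import random

def grid_values_per_hp(budget, n_hp):
    """Largest k with k**n_hp <= budget: distinct values per hyperparameter on a full grid."""
    k = 1
    while (k + 1) ** n_hp <= budget:
        k += 1
    return k

def random_search(budget, n_hp, seed=0):
    """Draw `budget` configurations uniformly from the unit hypercube."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_hp)] for _ in range(budget)]

budget, n_hp = 81, 4
configs = random_search(budget, n_hp)
distinct_first_hp = len({config[0] for config in configs})

print(grid_values_per_hp(budget, n_hp))  # 3, because 3**4 == 81
print(distinct_first_hp)                 # far more than 3 distinct values for the same budget
```

This matters most when only a few hyperparameters strongly influence the result, because random search tries many more values of each important one.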
Model optimization
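• The aim of model optimization is to find a set of model parameters, $\theta^*$, that returns the most accurate results for a given dataset
Optimize model-params: $\theta$
$\theta^* = \arg\min_{\theta \in \Theta} f(\lambda, \theta, \mathcal{D}_t, \mathcal{D}_v), \quad \mathcal{D}_t, \mathcal{D}_v \sim \mathcal{D}$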
Hyperparameter optimization
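• The aim of HPO is to find a set of hyperparameters, $\lambda^*$, that, given algorithm $\mathcal{A}$ and dataset $\mathcal{D}$, returns the most accurate results
Optimize model-params: $\theta$
$\theta^* = \arg\min_{\theta \in \Theta} f(\lambda, \theta, \mathcal{D}_t, \mathcal{D}_v), \quad \mathcal{D}_t, \mathcal{D}_v \sim \mathcal{D}$
Optimize hyperparameters: $\lambda$
$\lambda^* = \arg\min_{\lambda \in \Lambda} \mathbb{E}_{(\mathcal{D}_t, \mathcal{D}_v) \sim \mathcal{D}} \, V(\mathcal{L}, \mathcal{A}_\lambda, \mathcal{D}_t, \mathcal{D}_v)$
where $V$ measures the loss of a model generated by algorithm $\mathcal{A}$ with hyperparameters $\lambda$ on dataset $\mathcal{D}$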
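The two nested problems can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the algorithm is one-dimensional ridge regression (so the inner problem has a closed form), the hyperparameter is the ridge strength, the data are synthetic, and a single train/validation split stands in for the expectation over splits.

```python
import random

def fit_theta(lam, train):
    """Inner problem: closed-form 1-D ridge regression weight for a fixed lambda."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def validation_loss(theta, valid):
    """V: mean squared error of the fitted model on held-out data."""
    return sum((y - theta * x) ** 2 for x, y in valid) / len(valid)

def tune(lams, train, valid):
    """Outer problem: pick the lambda whose fitted model has the lowest validation loss."""
    scored = [(validation_loss(fit_theta(lam, train), valid), lam) for lam in lams]
    return min(scored)

# Synthetic data: y = 3x plus Gaussian noise (hypothetical, for illustration only).
rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.5)) for x in inputs]
train, valid = data[:30], data[30:]

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
best_loss, best_lam = tune(lams, train, valid)
print(best_lam, best_loss)
```

Each candidate hyperparameter value requires a full training run of the inner problem, which is why the number of outer evaluations is the cost that HPO methods try to reduce.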
Bayesian optimization
• Keeps track of previous evaluations and infers expected behavior
• It is Bayesian in the sense that the surrogate model uses a prior probability distribution to make predictions about the posterior:
$P(Y \mid X) \propto P(X \mid Y) \, P(Y)$
• An acquisition function selects the next points to be evaluated. A common choice is expected improvement:
$\mathbb{E}[\mathbb{I}(\lambda)] = \mathbb{E}[\max(f_{\min} - Y, 0)]$
• Each iteration improves our beliefs about the objective function
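For reference, expected improvement has a well-known closed form under one extra assumption that the slide does not state: that the surrogate's prediction $Y$ at $\lambda$ is Gaussian with mean $\mu(\lambda)$ and standard deviation $\sigma(\lambda) > 0$. Here $\Phi$ and $\varphi$ denote the standard normal CDF and PDF.

```latex
\mathbb{E}[\mathbb{I}(\lambda)]
  = \bigl(f_{\min} - \mu(\lambda)\bigr)\,\Phi(z) + \sigma(\lambda)\,\varphi(z),
\qquad
z = \frac{f_{\min} - \mu(\lambda)}{\sigma(\lambda)}
```

The first term rewards a low predicted mean and the second rewards high predictive uncertainty.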
Bayesian optimization
1: for n = 1, 2, 3, … do
2:   select new $x_{n+1}$ by optimizing the acquisition function $\alpha$: $x_{n+1} = \arg\max_x \alpha(x; \mathcal{D}_n)$
3:   query the objective function to obtain $y_{n+1}$
4:   augment the data: $\mathcal{D}_{n+1} = \{\mathcal{D}_n, (x_{n+1}, y_{n+1})\}$
5:   update the surrogate model
https://ieeexplore.ieee.org/document/7352306
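The loop can be sketched end to end in plain Python. Everything specific here is an assumption made for illustration: a one-dimensional search space, a zero-mean Gaussian process surrogate with a squared-exponential kernel of fixed length scale, expected improvement as the acquisition function maximized over a fixed candidate grid, and a hypothetical quadratic `loss` standing in for a real training job.

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def posterior(xs, ys, x_new, noise=1e-6):
    """GP posterior mean and standard deviation at x_new (zero prior mean)."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(gram, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(gram, k_star)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Closed-form EI for minimization under a Gaussian prediction."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, candidates, init, steps):
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(steps):
        best = min(ys)

        def acquisition(c):
            mean, std = posterior(xs, ys, c)
            return expected_improvement(mean, std, best)

        # Select the next point by maximizing the acquisition function.
        x_next = max((c for c in candidates if c not in xs), key=acquisition)
        # Query the objective and augment the data; the surrogate is
        # refit from (xs, ys) on the next pass through the loop.
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs, ys

def loss(x):
    """Hypothetical objective; a real one would train and validate a model."""
    return (x - 0.7) ** 2

candidates = [i / 50 for i in range(51)]
xs, ys = bayes_opt(loss, candidates, init=[0.0, 1.0], steps=8)
best_y, best_x = min(zip(ys, xs))
print(best_x, best_y)
```

Refitting the surrogate from scratch on every pass keeps the sketch short; a practical implementation would update a factorization of the kernel matrix incrementally and tune the kernel hyperparameters.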
Gaussian process
• A Gaussian distribution is a probability distribution over a random variable, fully described by its mean $\mu$ and variance $\sigma^2$
• A Gaussian process is a distribution over functions $f: \mathcal{X} \to \mathbb{R}$ in which the values at any finite set of inputs are jointly Gaussian:
$f(X_{t_1}), f(X_{t_2}), \ldots, f(X_{t_n}) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
Gaussian process – continued . . .
• Each distribution corresponds to a set of hyperparameters $\lambda \in \Lambda = \prod_{i=1}^{n} \Lambda_i$
• A Gaussian process is fully specified by a mean $\mu(\lambda)$ and a covariance function $k(\lambda, \lambda')$:
$\mathcal{G}(\mu(\lambda), k(\lambda, \lambda'))$
Gaussian process for model of model loss
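• A Gaussian process $\mathcal{G}(\mu(\lambda), k(\lambda, \lambda'))$ is fully specified by a mean $\mu(\lambda)$ and a covariance function $k(\lambda, \lambda')$
$f(X_{t_1}), f(X_{t_2}), \ldots, f(X_{t_n}) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$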
Covariance function
• The mean function is usually constant in Bayesian optimization
• The quality of the Gaussian process is solely dependent on the quality of the covariance function
• A common choice is the Matérn 5/2 kernel, with its hyperparameters integrated out by Markov chain Monte Carlo
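A minimal sketch of the Matérn 5/2 kernel in its standard form, k(r) = σ² (1 + √5 r/ℓ + 5r²/(3ℓ²)) exp(−√5 r/ℓ), where r is the Euclidean distance between two points. The length scale ℓ and variance σ² are fixed arguments here, which is a simplification: the slide describes integrating them out with Markov chain Monte Carlo, and that step is not shown.

```python
import math

def matern52(x, y, length_scale=1.0, variance=1.0):
    """Matern 5/2 covariance between two points given as equal-length sequences."""
    r = math.dist(x, y) / length_scale
    root5_r = math.sqrt(5.0) * r
    return variance * (1.0 + root5_r + (5.0 / 3.0) * r * r) * math.exp(-root5_r)

# Covariance is maximal for identical configurations and decays with distance.
print(matern52([0.1, 6.0], [0.1, 6.0]))  # 1.0
print(matern52([0.0], [1.0]) > matern52([0.0], [2.0]))  # True
```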
Acquisition function: Probability of improvement
[Figure: probability of improvement over the current best for two candidates, x1 (50%) and x2 (70%), so PI(x1) < PI(x2)]
Acquisition function: Expected improvement
[Figure: expected improvement relative to the current best for two candidates, x1 and x2, with EI(x1) > EI(x2)]
Using acquisition function
• Expected improvement [maximizing the dashed line] has two components
  • One depends on −μ [solid line]
  • The other depends on the uncertainty, or variance, k(λ, λ′) [blue line]
• Therefore we maximize the acquisition function wherever
  • the mean, μ, is low, or
  • the uncertainty, k(λ, λ′), is high
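The two effects can be checked numerically. This sketch assumes a Gaussian prediction with mean `mean` and standard deviation `std` at a candidate point and a minimization problem with current best value `best`; the numbers are made up for illustration.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probability_of_improvement(mean, std, best):
    """Probability that the candidate's loss falls below the current best."""
    return normal_cdf((best - mean) / std)

def expected_improvement(mean, std, best):
    """Expected amount by which the candidate's loss falls below the current best."""
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)

best = 0.5
# Lower predicted mean, same uncertainty: higher expected improvement.
print(expected_improvement(0.2, 0.1, best) > expected_improvement(0.4, 0.1, best))  # True
# Same predicted mean, higher uncertainty: higher expected improvement.
print(expected_improvement(0.6, 0.5, best) > expected_improvement(0.6, 0.1, best))  # True
# A candidate predicted exactly at the current best improves with probability one half.
print(probability_of_improvement(0.5, 0.1, best))  # 0.5
```

Probability of improvement ignores how large the improvement is, which is one reason expected improvement is often preferred.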
Parallelism through Thompson sampling (TS)
• TS is a decades-old heuristic for choosing actions that maximize the expected reward
$a_{n+1} = \arg\max_a f_{\tilde{w}}(a)$, where $\tilde{w} \sim p(w \mid \mathcal{D}_n)$
• Sequentiality makes sampling computationally intractable
• We can use multiple workers to parallelize TS
https://www.cs.cmu.edu/~kkandasa/misc/automl-slides.pdf
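The sample-then-maximize rule is easiest to see on a simpler problem than hyperparameter tuning. The sketch below applies Thompson sampling to a Bernoulli bandit with Beta posteriors; the bandit setting, the success probabilities, and the function name are assumptions chosen for illustration, and the loop is sequential rather than parallel.

```python
import random

def thompson_sampling(success_probs, rounds, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pulls per arm."""
    rng = random.Random(seed)
    wins = [0] * len(success_probs)
    losses = [0] * len(success_probs)
    pulls = [0] * len(success_probs)
    for _ in range(rounds):
        # Draw one plausible reward rate per arm from its Beta posterior.
        samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        # Act greedily with respect to the sampled model.
        arm = samples.index(max(samples))
        if rng.random() < success_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Because each decision only needs an independent posterior sample, several workers can draw their own samples and evaluate different actions at the same time.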
Automatic Model Tuning — Workflow
[Diagram: clients (console, notebook, IDEs, CLI) create a hyperparameter tuning job, configured with a tuning strategy and objective metrics, which runs multiple training jobs]
Model name | Objective metric | eta  | max_depth | …
model1     | 0.8              | 0.07 | 6         | …
model2     | 0.75             | 0.09 | 5         | …
…          | …                | …    | …         | …
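As a rough illustration of what such a job definition contains, the dictionary below follows the shape of the `HyperParameterTuningJobConfig` argument to the boto3 SageMaker call `create_hyper_parameter_tuning_job`. The metric name, ranges, and job limits are hypothetical values picked for this sketch, and nothing here contacts AWS; check the current API reference before relying on any field.

```python
# Hypothetical tuning configuration for an XGBoost-style training job.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",  # assumed objective metric
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,   # total training jobs in the tuning job
        "MaxParallelTrainingJobs": 2,    # training jobs run concurrently
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
        ],
        "CategoricalParameterRanges": [],
    },
}

def tuned_names(config):
    """Names of all hyperparameters the tuning job is allowed to vary."""
    ranges = config["ParameterRanges"]
    return [entry["Name"] for group in ranges.values() for entry in group]

print(tuned_names(tuning_job_config))  # ['eta', 'max_depth']
```

Each completed training job then contributes one row of the kind shown in the table: the sampled hyperparameter values and the objective metric they achieved.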
Automatic Model Tuning — Architecture
Training code
• Amazon SageMaker built-in algorithms: factorization machine, regression/classification, principal component analysis, K-means clustering, XGBoost, DeepAR, and more
• Bring Your Own Script (prebuilt containers)
• Bring Your Own Algorithm
[Diagram: Automatic Model Tuning, fully managed and secured, fetches training data and saves model artifacts]
Input mapping
• Continuous – e.g., learning rate: [0, 2] → [0, 1]
• Integral – e.g., number of epochs: {1, 2, …, 50} → [0, 1]
• Categorical – e.g., type of activation: {Sigmoid, RELU, tanh} → [0, 1]^3
Create a unit hypercube from all input hyperparameters
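One way to realize this mapping is sketched below, using the three example ranges from the slide. Linear min-max scaling and one-hot encoding are assumptions of this sketch; the slide gives only the source and target sets, not the exact transformation.

```python
def scale(value, low, high):
    """Map a continuous or integer value from [low, high] linearly onto [0, 1]."""
    return (value - low) / (high - low)

def one_hot(value, choices):
    """Map a categorical value onto a corner of the unit cube [0, 1]^len(choices)."""
    return [1.0 if value == choice else 0.0 for choice in choices]

ACTIVATIONS = ["Sigmoid", "RELU", "tanh"]

def encode(config):
    """Encode one hyperparameter configuration as a point in the unit hypercube."""
    return (
        [scale(config["learning_rate"], 0.0, 2.0), scale(config["epochs"], 1, 50)]
        + one_hot(config["activation"], ACTIVATIONS)
    )

point = encode({"learning_rate": 0.5, "epochs": 50, "activation": "tanh"})
print(point)  # [0.25, 1.0, 0.0, 0.0, 1.0]
```

With every input on the same scale, a single kernel length scale per dimension is comparable across hyperparameters.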
Thank you!
Cyrus Vahid
cyrusmv@amazon.com
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Automatic Model Tuning Using Amazon SageMaker (AIM412) - AWS re:Invent 2018

  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Automatic Model Tuning Using Amazon SageMaker • Cyrus M Vahid, Principal Evangelist – AWS AI Labs, CyrusMV@amazon.com • AIM412
  • 3. Agenda • Hyperparameters • Search-based HPO • Bayesian HPO • Amazon SageMaker Automatic Model Tuning (AMT)
  • 5. What is a hyperparameter? • Hyperparameter = algorithm parameter • It affects how an algorithm behaves during the model learning process • “Any decision an algorithm author can’t make for you”
  • 6. Examples of hyperparameters • Model: number of layers (1, 2, 3, ...); activation functions (Sigmoid, tanh, ReLU, ...) • Optimization: method (SGD, Adam, AdaGrad, ...); learning rate (0.01 to 2) • Data: batch size (8, 16, 32, ...); augmentation (Resize, Normalize, Color Jitter, ...)
  • 7. Optimizing loss function for a model [figure label: gradient]
  • 8. Optimizing loss function for a model [figure label: gradient]
  • 9. Optimizing loss function for a model
  • 10. Blackbox optimization • We aim to minimize the objective function $J(W, \theta \mid \lambda)$ • We have no knowledge of what the objective function is • We do not have access to the gradients of the objective function • All we know is what goes into the function and what comes out
  • 12. Grid search [Figure: grid of candidate configurations; axes: learning rate (0, 0.5, 1, 1.5, 2) and activation (Sigmoid, ReLU, tanh)]
  • 13. Curse of dimensionality • In grid search the user specifies a finite set of values for each hyperparameter, and the search then evaluates every combination of those values • Each additional hyperparameter adds a degree of freedom and results in combinatorial explosion • Example: one HP with five options → 5 combinations; two HPs with five options each → 5 × 5 = 25 combinations; ten HPs with five options each → $5^{10} = 9{,}765{,}625 \approx 10^7$ combinations
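The growth in grid size can be checked directly. The sketch below is illustrative only: the option lists are hypothetical, and `grid_size` is a helper introduced here, not something from the talk.

```python
from itertools import product
from math import prod

def grid_size(options_per_hp):
    """Number of configurations in a full grid: the product of the option counts."""
    return prod(options_per_hp)

# Hypothetical two-hyperparameter grid, enumerated explicitly.
learning_rates = [0.01, 0.5, 1.0, 1.5, 2.0]
activations = ["sigmoid", "relu", "tanh"]
grid = list(product(learning_rates, activations))

print(len(grid))            # 15 configurations
print(grid_size([5]))       # 5
print(grid_size([5, 5]))    # 25
print(grid_size([5] * 10))  # 9765625
```

Every extra hyperparameter multiplies the number of training runs, so a full grid stops being practical after a handful of dimensions.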
  • 14. Random grid search • Random search samples configurations at random until a certain budget $B$ for the search is exhausted • With a budget of $B$ evaluations over $N$ hyperparameters, grid search can afford only $B^{1/N}$ distinct values per hyperparameter, whereas random search tries $B$ distinct values for each • Bergstra, Bengio. 2012. “Random Search for Hyper-Parameter Optimization”
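A minimal sketch of the budget argument, assuming every hyperparameter has been rescaled to [0, 1]. The function names and the budget of 81 are made up for illustration.

```python
import random

def grid_values_per_hp(budget, n_hp):
    """Largest k with k**n_hp <= budget: distinct values a full grid can afford per axis."""
    k = 1
    while (k + 1) ** n_hp <= budget:
        k += 1
    return k

def random_search(budget, n_hp, rng):
    """Draw `budget` configurations uniformly from the unit hypercube [0, 1]^n_hp."""
    return [[rng.random() for _ in range(n_hp)] for _ in range(budget)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
budget, n_hp = 81, 4
configs = random_search(budget, n_hp, rng)
distinct_first_axis = len({c[0] for c in configs})

print(grid_values_per_hp(budget, n_hp))  # 3, because 3**4 == 81
print(distinct_first_axis)               # distinct values random search tried on one axis
```

With the same 81 runs, the grid probes only three values of each hyperparameter, while random search probes a fresh value on every run. This matters most when only a few hyperparameters really affect the loss.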
  • 16. Model optimization • The aim of model optimization is to find a set of model parameters, $\theta^*$, that returns the most accurate results for a given dataset • Optimize model parameters $\theta$: $\theta^* = \arg\min_{\theta \in \Theta} f(\lambda, \theta, \mathcal{D}_t, \mathcal{D}_v)$, where the training and validation splits $\mathcal{D}_t, \mathcal{D}_v \sim \mathcal{D}$
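  • 17. Hyperparameter optimization • The aim of HPO is to find a set of hyperparameters, $\lambda^*$, that, given algorithm $\mathcal{A}$ and dataset $\mathcal{D}$, returns the most accurate results • Optimize model parameters $\theta$: $\theta^* = \arg\min_{\theta \in \Theta} f(\lambda, \theta, \mathcal{D}_t, \mathcal{D}_v)$, where $\mathcal{D}_t, \mathcal{D}_v \sim \mathcal{D}$ • Optimize hyperparameters $\lambda$: $\lambda^* = \arg\min_{\lambda \in \Lambda} \mathbb{E}_{(\mathcal{D}_t, \mathcal{D}_v) \sim \mathcal{D}}\, V(\mathcal{L}, \mathcal{A}_\lambda, \mathcal{D}_t, \mathcal{D}_v)$, where $V$ measures the loss of a model generated by algorithm $\mathcal{A}$ with hyperparameters $\lambda$ on dataset $\mathcal{D}$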
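The two nested problems can be made concrete with a toy case. The sketch below assumes a one-dimensional ridge regression, where the inner fit of $\theta$ has a closed form and $\lambda$ is the regularization strength. It uses a single train/validation split in place of the expectation over splits, and all data and names are synthetic.

```python
import random

def fit_theta(lam, train):
    """Inner problem: closed-form minimizer of sum (y - theta*x)^2 + lam * theta^2."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def validation_loss(theta, valid):
    """V: mean squared error of the fitted model on held-out data."""
    return sum((y - theta * x) ** 2 for x, y in valid) / len(valid)

def tune(lambdas, train, valid):
    """Outer problem: pick the lambda whose fitted theta has the lowest validation loss."""
    scored = [(validation_loss(fit_theta(lam, train), valid), lam) for lam in lambdas]
    best_loss, best_lam = min(scored)
    return best_lam, fit_theta(best_lam, train), best_loss

rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in inputs]  # synthetic: y = 2x + noise
train, valid = data[:30], data[30:]

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, theta_star, loss = tune(lambdas, train, valid)
print(best_lam, theta_star, loss)
```

Here the outer search is a plain enumeration over five candidates. The search strategies in this deck differ only in how they choose which $\lambda$ to evaluate next.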
  • 18. Bayesian optimization • Keeps track of previous evaluations and infers expected behavior • It is Bayesian in the sense that the surrogate model uses a prior probability distribution to make predictions about the posterior: $P(Y \mid X) \propto P(X \mid Y)\,P(Y)$ • An acquisition function selects the next points to be evaluated; a common choice is expected improvement, $\mathbb{E}[\mathbb{I}(\lambda)] = \mathbb{E}[\max(f_{\min} - Y, 0)]$ • Each iteration improves our beliefs about the objective function through iterative learning
  • 19. Bayesian optimization • 1: for $n = 1, 2, 3, \ldots$ do • 2: select new $x_{n+1}$ by optimizing the acquisition function $\alpha$: $x_{n+1} = \arg\max_x \alpha(x; \mathcal{D}_n)$ • 3: query the objective function to obtain $y_{n+1}$ • 4: augment data: $\mathcal{D}_{n+1} = \{\mathcal{D}_n, (x_{n+1}, y_{n+1})\}$ • 5: update the surrogate model • Source: https://ieeexplore.ieee.org/document/7352306
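The five steps can be sketched in plain Python. This is an illustration under stated assumptions, not the method used by any particular service: a one-dimensional search space, a zero-mean Gaussian process with a squared-exponential kernel as the surrogate, expected improvement as the acquisition function, a fixed candidate grid in place of a continuous optimizer, and a cheap quadratic standing in for a training run.

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def posterior(xs, ys, x_new, noise=1e-4):
    """GP posterior mean and standard deviation at x_new (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, f_min):
    """Closed-form E[max(f_min - Y, 0)] for Y ~ N(mean, std^2)."""
    z = (f_min - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_min - mean) * cdf + std * pdf

def objective(x):
    """Stand-in black box; in HPO this would be a full training run."""
    return (x - 0.65) ** 2

candidates = [i / 100 for i in range(101)]
xs, ys = [0.0, 1.0], [objective(0.0), objective(1.0)]  # initial data
for n in range(10):                                     # step 1
    f_min = min(ys)
    pool = [x for x in candidates if x not in xs]       # skip evaluated points
    scores = [expected_improvement(*posterior(xs, ys, x), f_min) for x in pool]
    x_next = pool[scores.index(max(scores))]            # step 2
    y_next = objective(x_next)                          # step 3
    xs.append(x_next)                                   # step 4
    ys.append(y_next)
    # step 5: the surrogate is refit from (xs, ys) on the next posterior() call

best_y, best_x = min(zip(ys, xs))
print(best_x, best_y)
```

The loop is inherently sequential: each new point depends on all earlier results, which is why parallelizing it needs a separate mechanism.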
  • 20. Gaussian process • A Gaussian distribution is a probability distribution over real numbers that is fully described by its mean $\mu$ and variance $\sigma^2$ • A Gaussian process is a distribution over functions $f: \mathcal{X} \to \mathbb{R}$ whose values at any finite set of points are jointly Gaussian: $(f(x_{t_1}), f(x_{t_2}), \ldots, f(x_{t_n})) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
  • 21. Gaussian process, continued • Each distribution corresponds to a hyperparameter configuration $\lambda$ in the search space $\Lambda = \prod_{i=1}^{n} \Lambda_i$ • A Gaussian process $\mathcal{G}(\mu(\lambda), k(\lambda, \lambda'))$ is fully specified by a mean $\mu(\lambda)$ and a covariance function $k(\lambda, \lambda')$
  • 22-24. Gaussian process as a model of model loss • A Gaussian process $\mathcal{G}(\mu(\lambda), k(\lambda, \lambda'))$ is fully specified by a mean $\mu(\lambda)$ and a covariance function $k(\lambda, \lambda')$: $(f(x_{t_1}), f(x_{t_2}), \ldots, f(x_{t_n})) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
  • 25. Covariance function • The mean function is usually constant in Bayesian optimization • With a constant mean, the quality of the Gaussian process depends on the choice of covariance function • A common choice is the Matérn 5/2 kernel, with its hyperparameters integrated out by Markov chain Monte Carlo
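For reference, the Matérn 5/2 kernel has the closed form $k(r) = \sigma^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5 r^2}{3 \ell^2}\right) \exp\left(-\frac{\sqrt{5}\,r}{\ell}\right)$, where $r$ is the distance between two configurations. The sketch below fixes the length scale $\ell$ and variance $\sigma^2$ rather than integrating them out with MCMC, and the example points are made up.

```python
import math

def matern52(x, y, length_scale=1.0, variance=1.0):
    """Matern 5/2 covariance between two points given as equal-length sequences."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    s = math.sqrt(5.0) * r / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

# Hypothetical configurations already mapped to the unit square.
points = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]
cov = [[matern52(p, q, length_scale=0.5) for q in points] for p in points]

for row in cov:
    print([round(v, 3) for v in row])
```

Nearby configurations get a covariance close to the variance and distant ones a covariance close to zero, which is how the surrogate shares information between similar hyperparameter settings.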
  • 26. Acquisition function: Probability of improvement [Figure: candidates $x_1$ and $x_2$ compared against the current best; labels in the figure: 70% (current best), 50%, 70%] • $PI(x_1) < PI(x_2)$
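Under a Gaussian posterior, probability of improvement has a one-line closed form. The sketch below assumes the minimization convention used for expected improvement earlier in the deck (improvement means a value below the current best $f_{\min}$). The numbers are hypothetical and unrelated to the figure.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mean, std, f_min):
    """P(Y < f_min) for Y ~ N(mean, std^2): the chance of beating the incumbent."""
    return normal_cdf((f_min - mean) / std)

f_min = 0.30  # hypothetical best validation loss so far
print(probability_of_improvement(mean=0.30, std=0.05, f_min=f_min))  # 0.5
print(probability_of_improvement(mean=0.28, std=0.05, f_min=f_min))  # higher: the mean is below f_min
```

Probability of improvement ignores how large the improvement would be, which is the gap that expected improvement fills.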
  • 27. Acquisition function: Expected improvement [Figure: candidates $x_1$ and $x_2$ compared against the current best (70%); labels in the figure: 0.3, 0.2, 1, -1] • $EI(x_1) > EI(x_2)$
  • 28. Using acquisition function • Expected improvement [maximizing the dashed line] has two components • One depends on $-\mu$ [solid line] • The other depends on the uncertainty, or variance, $k(\lambda, \lambda')$ [blue line] • Therefore the acquisition function is maximized wherever • the mean, $\mu$, is low, or • the uncertainty, $k(\lambda, \lambda')$, is high
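For a Gaussian posterior $Y \sim \mathcal{N}(\mu, \sigma^2)$, expected improvement has the closed form $(f_{\min} - \mu)\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (f_{\min} - \mu)/\sigma$, which makes the two components explicit. The sketch below returns them separately; the candidate values are made up.

```python
import math

def ei_terms(mean, std, f_min):
    """Split E[max(f_min - Y, 0)], Y ~ N(mean, std^2), into its two components."""
    z = (f_min - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    exploit = (f_min - mean) * cdf  # driven by a low predicted mean
    explore = std * pdf             # driven by high predictive uncertainty
    return exploit, explore

def expected_improvement(mean, std, f_min):
    """Sum of the exploitation and exploration components."""
    return sum(ei_terms(mean, std, f_min))

f_min = 0.30  # hypothetical best loss so far
for label, mean, std in [("low mean", 0.25, 0.02),
                         ("high uncertainty", 0.35, 0.20),
                         ("neither", 0.35, 0.02)]:
    exploit, explore = ei_terms(mean, std, f_min)
    print(label, round(exploit, 4), round(explore, 4), round(exploit + explore, 4))
```

A candidate scores well either because the surrogate predicts a low loss there or because the surrogate is unsure, so the search balances exploitation and exploration without a separate schedule.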
  • 29. Parallelism through Thompson sampling (TS) • TS is a decades-old heuristic for choosing actions that maximize the expected reward: $a_{n+1} = \arg\max_a f_{\tilde{w}}(a)$, where $\tilde{w} \sim p(w \mid \mathcal{D}_n)$ • Sequentiality makes sampling computationally intractable • We can use multiple workers to parallelize TS • Source: https://www.cs.cmu.edu/~kkandasa/misc/automl-slides.pdf
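The idea of giving each worker its own posterior sample can be sketched on a much simpler problem than hyperparameter tuning. The example below swaps the Gaussian process for a Bernoulli bandit with Beta posteriors, so it shows the sampling pattern rather than the tuning algorithm itself. The reward rates, worker count, and function name are all hypothetical.

```python
import random

def parallel_thompson(true_rates, n_rounds, n_workers, rng):
    """Batch Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    n_arms = len(true_rates)
    wins = [1] * n_arms    # Beta alpha parameter per arm
    losses = [1] * n_arms  # Beta beta parameter per arm
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Each worker draws its own posterior sample and acts greedily on it.
        batch = []
        for _ in range(n_workers):
            sample = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
            batch.append(sample.index(max(sample)))
        # Results from all workers come back, then the shared posterior is updated.
        for arm in batch:
            if rng.random() < true_rates[arm]:
                wins[arm] += 1
            else:
                losses[arm] += 1
            pulls[arm] += 1
    return pulls

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pulls = parallel_thompson([0.2, 0.5, 0.8], n_rounds=200, n_workers=4, rng=rng)
print(pulls)
```

Because every worker samples independently from the same posterior, the batch is naturally diverse: workers spread out while the posterior is uncertain and converge on the best arm as evidence accumulates.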
  • 31. Automatic Model Tuning: Workflow [Diagram components: clients (console, notebook, IDEs, CLI); hyperparameter tuning job; tuning strategy; objective metrics; training jobs]
      Model name | Objective metric | eta  | max_depth | …
      model1     | 0.8              | 0.07 | 6         | …
      model2     | 0.75             | 0.09 | 5         | …
      …          | …                | …    | …         | …
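As a rough illustration of what a client submits, the sketch below builds a tuning-job configuration as a plain dictionary and then picks the best row from the results table above. The field names follow the SageMaker CreateHyperParameterTuningJob request as I understand it and should be checked against the current API reference. The metric name, ranges, job limits, and the choice to maximize the metric are assumptions. Nothing here calls AWS.

```python
import json

# Request fragment in the shape of HyperParameterTuningJobConfig (assumed; verify
# against the API reference). All values are illustrative.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",  # hypothetical metric name
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 3,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.5"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "1", "MaxValue": "10"},
        ],
        "CategoricalParameterRanges": [],
    },
}

# Results table from the slide; the best job is the one with the highest metric,
# assuming the objective is maximized.
results = [
    {"model": "model1", "objective": 0.8, "eta": 0.07, "max_depth": 6},
    {"model": "model2", "objective": 0.75, "eta": 0.09, "max_depth": 5},
]
best = max(results, key=lambda row: row["objective"])

print(json.dumps(tuning_job_config, indent=2))
print(best["model"])  # model1
```

The two resource limits map onto the trade-off in this deck: the total job count is the search budget, and the parallel job count controls how much of the search runs without feedback from earlier results.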
  • 32. Automatic Model Tuning: Architecture • Training code options: Amazon SageMaker built-in algorithms (factorization machine, regression/classification, principal component analysis, k-means clustering, XGBoost, DeepAR, and more); Bring Your Own Script (prebuilt containers); Bring Your Own Algorithm • Fetch training data • Save model artifacts • Fully managed, secured, Automatic Model Tuning
  • 33. Input mapping • Continuous, e.g., learning rate: $[0, 2] \to [0, 1]$ • Integer, e.g., number of epochs: $\{1, 2, \ldots, 50\} \to [0, 1]$ • Categorical, e.g., type of activation: $\{\text{Sigmoid}, \text{ReLU}, \text{tanh}\} \to [0, 1]^3$ • Create a unit hypercube from all input hyperparameters
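One plausible way to build the unit hypercube is linear scaling for numeric ranges and one-hot encoding for categories. The sketch below uses the three ranges from the slide. The function names are invented here, and one-hot encoding is an assumption that is consistent with the $[0, 1]^3$ target but not confirmed by the source.

```python
def scale_numeric(value, low, high):
    """Linearly map a continuous or integer value from [low, high] to [0, 1]."""
    return (value - low) / (high - low)

def one_hot(value, choices):
    """Map a categorical value to a corner of [0, 1]^len(choices)."""
    return [1.0 if value == choice else 0.0 for choice in choices]

ACTIVATIONS = ["sigmoid", "relu", "tanh"]

def to_unit_hypercube(config):
    """Encode one configuration as a point in [0, 1]^5 (1 + 1 + 3 dimensions)."""
    return ([scale_numeric(config["learning_rate"], 0.0, 2.0),
             scale_numeric(config["epochs"], 1, 50)]
            + one_hot(config["activation"], ACTIVATIONS))

point = to_unit_hypercube({"learning_rate": 0.5, "epochs": 50, "activation": "relu"})
print(point)  # [0.25, 1.0, 0.0, 1.0, 0.0]
```

Putting every hyperparameter on the same [0, 1] scale lets a single covariance function compare configurations without one wide range dominating the distance.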
  • 34. Thank you! Cyrus Vahid, cyrusmv@amazon.com