USING BAYESIAN OPTIMIZATION
TO TUNE MACHINE LEARNING MODELS
Scott Clark
Co-founder and CEO of SigOpt
scott@sigopt.com @DrScottClark
TRIAL AND ERROR WASTES EXPERT TIME
● Machine Learning is extremely powerful
● Tuning Machine Learning systems is extremely non-intuitive
UNRESOLVED PROBLEM IN ML
https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3
What is the most important unresolved problem in machine learning?
“...we still don't really know why some configurations of deep neural networks work
in some case and not others, let alone having a more or less automatic approach
to determining the architectures and the hyperparameters.”
Xavier Amatriain, VP Engineering at Quora
(former Director of Research at Netflix)
LOTS OF TUNABLE PARAMETERS
COMMON APPROACH
Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012
1. Random search or grid search
2. Expert defined grid search near “good” points
3. Refine domain and repeat steps - “grad student descent”
● Expert intensive
● Computationally intensive
● May find only local optima
● Does not fully exploit useful information
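The random search and grid search named in step 1 can be compared with a small sketch. It is only an illustration, not the setup from Bergstra et al.: `toy_score`, its two parameters, and the unit-square bounds are hypothetical stand-ins for an expensive train-and-validate run.

```python
import itertools
import random


def toy_score(learning_rate, momentum):
    """Hypothetical stand-in for an expensive train-and-validate run."""
    return -((learning_rate - 0.1) ** 2) - (momentum - 0.9) ** 2


def grid_search(points_per_axis):
    """Evaluate every combination on an evenly spaced grid over [0, 1]^2."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return max(itertools.product(axis, axis), key=lambda p: toy_score(*p))


def random_search(n_samples, seed=0):
    """Evaluate n_samples points drawn uniformly from [0, 1]^2."""
    rng = random.Random(seed)
    samples = [(rng.random(), rng.random()) for _ in range(n_samples)]
    return max(samples, key=lambda p: toy_score(*p))


if __name__ == "__main__":
    # Both calls spend the same budget of 25 evaluations.
    print("grid best:  ", grid_search(points_per_axis=5))
    print("random best:", random_search(n_samples=25))
```

Neither routine uses earlier results to decide where to sample next, which is the "does not fully exploit useful information" drawback listed above.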
OPTIMAL LEARNING
“… the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.”
Prof. Warren Powell - Princeton
“What is the most efficient way to collect information?”
Prof. Peter Frazier - Cornell
“How do we make the most money, as fast as possible?”
Me - @DrScottClark
BAYESIAN GLOBAL OPTIMIZATION
● Optimize some Overall Evaluation Criterion (OEC)
○ Loss, Accuracy, Likelihood, Revenue
● Given tunable parameters
○ Hyperparameters, feature parameters
● In an efficient way
○ Sample function as few times as possible
○ Training on big data is expensive
Details at https://sigopt.com/research
[Figure: Grid Search vs. Random Search]
GRID SEARCH SCALES EXPONENTIALLY
[Figure: 4D]
BAYESIAN OPT SCALES LINEARLY
[Figure: 6D]
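The exponential growth of grid search is simple arithmetic: a full grid with k points per axis in d dimensions needs k^d evaluations. The sketch below assumes 10 points per axis, a value chosen only for illustration. It does not demonstrate the linear scaling claimed for Bayesian optimization, which is the slides' own claim.

```python
def grid_evaluations(points_per_axis, n_dims):
    """Number of model trainings a full grid needs: points_per_axis ** n_dims."""
    return points_per_axis ** n_dims


if __name__ == "__main__":
    for n_dims in (1, 2, 4, 6):
        print(n_dims, "dims ->", grid_evaluations(10, n_dims), "evaluations")
```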
HOW DOES IT FIT IN THE STACK?
[Diagram: Big Data → Machine Learning Models with tunable parameters → Objective Metric → Optimally suggests new parameters → New parameters → Machine Learning Models; output: Better Models]
QUICK EXAMPLES
Ex: LOAN CLASSIFICATION (xgboost)
Loan Applications
● Income
● Credit Score
● Loan Amount
[Diagram: Loan Applications → Default Prediction with tunable ML parameters → Prediction Accuracy → Optimally suggests new parameters → New parameters → Default Prediction; output: Better Accuracy]
COMPARATIVE PERFORMANCE
● 8.2% Better Accuracy than baseline
● 100x faster than standard tuning methods
[Figure: AUC (axis ticks .675, .683, .690, .698) versus Iterations (1,000; 10,000; 100,000) for Grid Search and Random Search, annotated with Accuracy and Cost]
EXAMPLE: ALGORITHMIC TRADING
Market Data
● Closing Prices
● Day of Week
● Market Volatility
[Diagram: Market Data → Trading Strategy with tunable weights and thresholds → Expected Revenue → Optimally suggests new parameters → New parameters → Trading Strategy; output: Higher Returns]
COMPARATIVE PERFORMANCE
[Figure: Standard Method vs. Expert]
● 200% Higher model returns than expert
● 10x faster than standard methods
HOW BAYESIAN OPTIMIZATION WORKS
1. Build Gaussian Process (GP) with points sampled so far
2. Optimize the fit of the GP (covariance hyperparameters)
3. Find the point(s) of highest Expected Improvement within parameter domain
4. Return optimal next best point(s) to sample
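The four steps can be sketched in plain Python for a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not SigOpt's implementation: the kernel is a squared exponential with unit signal variance, "optimizing the fit" is reduced to choosing among four length scales by marginal likelihood, Expected Improvement is maximized over a fixed grid of 201 candidates, and all function names are hypothetical. Only the standard library is used so the sketch runs anywhere.

```python
import math
import random


def rbf(a, b, ls):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, ls, noise=1e-4):
    """Condition a zero-mean GP on (xs, ys); also return the log likelihood."""
    K = [[rbf(a, b, ls) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))
    log_lik = (-0.5 * sum(y * a for y, a in zip(ys, alpha))
               - sum(math.log(L[i][i]) for i in range(len(xs)))
               - 0.5 * len(xs) * math.log(2.0 * math.pi))
    return L, alpha, log_lik


def predict(x, xs, L, alpha, ls):
    """Posterior mean and standard deviation of the GP at x."""
    k = [rbf(x, xi, ls) for xi in xs]
    v = solve_lower(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return sum(ki * ai for ki, ai in zip(k, alpha)), math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected Improvement for maximization under a Gaussian posterior."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayes_opt(f, lo, hi, n_init=3, n_iter=12, seed=0):
    """Maximize f on [lo, hi] with a GP surrogate and Expected Improvement."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [f(x) for x in xs]
    span = hi - lo
    candidates = [lo + span * i / 200 for i in range(201)]
    for _ in range(n_iter):
        mean = sum(ys) / len(ys)
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mean) / std for y in ys]  # standardized observations
        # Steps 1-2: fit a GP per length scale, keep the most likely one.
        fits = [(ls, fit_gp(xs, zs, ls))
                for ls in (0.05 * span, 0.1 * span, 0.2 * span, 0.4 * span)]
        ls, (L, alpha, _) = max(fits, key=lambda fit: fit[1][2])
        # Step 3: maximize Expected Improvement over the candidate points.
        best = max(zs)
        x_next = max(candidates, key=lambda x: expected_improvement(
            *predict(x, xs, L, alpha, ls), best))
        # Step 4: evaluate the suggested point and add it to the data.
        xs.append(x_next)
        ys.append(f(x_next))
    i = ys.index(max(ys))
    return xs[i], ys[i]
```

A call such as `bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)` returns the best sampled point and its value after 15 evaluations. A practical implementation would typically rely on a numerical library for the linear algebra and optimize the covariance hyperparameters continuously rather than over a short list.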
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model (Gaussian Process)
3. SigOpt finds the points of highest Expected Improvement
4. SigOpt suggests best parameters to test next
5. User tests those parameters and reports results to SigOpt
6. Repeat
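From the user's side, this loop is a suggest-and-report cycle. The sketch below shows only the shape of that cycle. It does not use the SigOpt client: `RandomSuggester`, its methods, and `evaluate` are hypothetical, and the stand-in suggests random points where a real service would fit a statistical model first.

```python
import random


class RandomSuggester:
    """Hypothetical local stand-in for a remote optimization service.

    It suggests uniformly random points; a real service would fit a
    statistical model to the reported data before suggesting.
    """

    def __init__(self, bounds, seed=0):
        self.bounds = bounds          # {"name": (low, high)}
        self.observations = []        # [(params, value)]
        self._rng = random.Random(seed)

    def suggest(self):
        return {name: self._rng.uniform(lo, hi)
                for name, (lo, hi) in self.bounds.items()}

    def report(self, params, value):
        self.observations.append((params, value))

    def best(self):
        return max(self.observations, key=lambda obs: obs[1])


def evaluate(params):
    """Hypothetical objective; in practice, train and validate a model."""
    return -(params["learning_rate"] - 0.1) ** 2


def run(n_rounds=20):
    optimizer = RandomSuggester({"learning_rate": (0.0, 1.0)})
    for _ in range(n_rounds):              # step 6: repeat
        params = optimizer.suggest()       # step 4: receive a suggestion
        value = evaluate(params)           # step 5: test those parameters
        optimizer.report(params, value)    # steps 1 and 5: report the result
    return optimizer.best()
```

Keeping the objective behind a single `evaluate` function means the model code never needs to know which optimizer is producing the suggestions.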
EXTENDED EXAMPLE: EFFICIENTLY BUILDING CONVNETS
PROBLEM
● Classify house numbers with more training data and more sophisticated model
CONVNET STRUCTURE
● TensorFlow makes it easier to design DNN architectures, but what structure works best on a given dataset?
STOCHASTIC GRADIENT DESCENT
● Per parameter adaptive SGD variants like RMSProp and Adagrad seem to work best
● Still require careful selection of learning rate (α), momentum (β), decay (γ) terms
● Comparison of several RMSProp SGD parametrizations
● Not obvious which configurations will work best on a given dataset without experimentation
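To make the three terms concrete, here is a sketch of one common formulation of RMSProp with momentum, applied to a one-dimensional function. The mapping of α, β, and γ onto `alpha`, `beta`, and `gamma` is an assumption about how the slide uses the symbols, and the default values are illustrative rather than tuned.

```python
import math


def rmsprop_minimize(grad, w, alpha=0.01, beta=0.5, gamma=0.9,
                     steps=500, eps=1e-8):
    """Minimize a 1-D function given its gradient, using RMSProp with momentum.

    alpha: learning rate, beta: momentum, gamma: decay of the running
    average of squared gradients.
    """
    mean_square = 0.0   # running average of squared gradients
    velocity = 0.0      # momentum accumulator
    for _ in range(steps):
        g = grad(w)
        mean_square = gamma * mean_square + (1.0 - gamma) * g * g
        velocity = beta * velocity + alpha * g / math.sqrt(mean_square + eps)
        w -= velocity
    return w


if __name__ == "__main__":
    # Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.
    print(rmsprop_minimize(lambda w: 2.0 * w, 1.0))
```

Because the gradient is divided by its running magnitude, the step size is set almost entirely by `alpha` and `beta`, which is one reason these terms need careful selection.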
RESULTS
PERFORMANCE
● Avg Hold out accuracy after 5 optimization runs consisting of 80 objective evaluations
● Optimized single 80/20 CV fold on training set, ACC reported on test set as hold out

Method                         Hold Out ACC
SigOpt (TensorFlow CNN)        0.8130 (+315.2% over the untuned TensorFlow CNN)
Rnd Search (TensorFlow CNN)    0.5690
No Tuning (sklearn RF)         0.5278
No Tuning (TensorFlow CNN)     0.1958
COST ANALYSIS

Model Performance      Random Search   SigOpt   SigOpt Cost   Potential Savings In
(CV Acc. threshold)    Cost            Cost     Savings       Production (50 GPUs)
87%                    $275            $42      84%           $12,530
85%                    $195            $23      88%           $8,750
80%                    $46             $21      55%           $1,340
70%                    $29             $21      27%           $400
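The savings column appears to be the relative cost reduction, (random search cost - SigOpt cost) / random search cost. That formula is an assumption inferred from the numbers, not stated on the slide. Recomputing it from the dollar figures agrees with the reported percentages to within about one percentage point, presumably because the costs shown are rounded.

```python
# (CV accuracy threshold %, random search cost $, SigOpt cost $, reported savings %)
ROWS = [
    (87, 275, 42, 84),
    (85, 195, 23, 88),
    (80, 46, 21, 55),
    (70, 29, 21, 27),
]


def savings_percent(random_cost, sigopt_cost):
    """Relative cost reduction of SigOpt versus random search, in percent."""
    return 100.0 * (random_cost - sigopt_cost) / random_cost


if __name__ == "__main__":
    for threshold, random_cost, sigopt_cost, reported in ROWS:
        computed = savings_percent(random_cost, sigopt_cost)
        print(f"{threshold}%: computed {computed:.1f}%, reported {reported}%")
```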
EXAMPLE: TUNING DNN CLASSIFIERS
CIFAR10 Dataset
● Photos of objects
● 10 classes
● Metric: Accuracy
○ [0.1, 1.0]
Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
USE CASE: ALL CONVOLUTIONAL
http://arxiv.org/pdf/1412.6806.pdf
● All convolutional neural network
● Multiple convolutional and dropout layers
● Hyperparameter optimization: mixture of domain expertise and grid search (brute force)
MANY TUNABLE PARAMETERS...
● epochs: “number of epochs to run fit” - int [1, ∞)
● learning rate: influence on current value of weights at each step - double (0, 1]
● momentum coefficient: “the coefficient of momentum” - double (0, 1]
● weight decay: parameter affecting how quickly weight decays - double (0, 1]
● depth: parameter affecting number of layers in net - int [1, 20(?)]
● gaussian scale: standard deviation of initialization normal dist. - double (0, ∞)
● momentum step change: multiplicative amount to decrease momentum - double (0, 1]
● momentum step schedule start: epoch to start decreasing momentum - int [1, ∞)
● momentum schedule width: epoch stride for decreasing momentum - int [1, ∞)
...optimal values non-intuitive
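A parameter list like this one can be written down as a search space that any tuner can sample from. The sketch below is hypothetical: the `Param` type and `sample_config` are illustrative names, the small positive lower bounds stand in for the open ends of the intervals, and every finite cap that replaces "∞" (100 epochs, a scale of 1.0) is an assumption made so the space can be sampled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    kind: str      # "int" or "double"
    low: float
    high: float    # finite cap chosen for illustration where the slide says "∞"


# Names follow the slide; every finite upper bound replacing "∞" is an assumption.
SEARCH_SPACE = [
    Param("epochs", "int", 1, 100),
    Param("learning_rate", "double", 1e-6, 1.0),
    Param("momentum_coefficient", "double", 1e-6, 1.0),
    Param("weight_decay", "double", 1e-6, 1.0),
    Param("depth", "int", 1, 20),
    Param("gaussian_scale", "double", 1e-6, 1.0),
    Param("momentum_step_change", "double", 1e-6, 1.0),
    Param("momentum_step_schedule_start", "int", 1, 100),
    Param("momentum_schedule_width", "int", 1, 100),
]


def sample_config(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    config = {}
    for p in space:
        if p.kind == "int":
            config[p.name] = rng.randint(int(p.low), int(p.high))
        else:
            config[p.name] = rng.uniform(p.low, p.high)
    return config


if __name__ == "__main__":
    print(sample_config(SEARCH_SPACE, random.Random(0)))
```

With nine parameters, even three grid points per parameter would already require 3^9 = 19,683 training runs.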
COMPARATIVE PERFORMANCE
● Expert baseline: 0.8995
○ (using neon)
● SigOpt best: 0.9011
○ 1.6% relative reduction in error rate
○ No expert time wasted in tuning
USE CASE: DEEP RESIDUAL
http://arxiv.org/pdf/1512.03385v1.pdf
● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
● Variable depth
● Hyperparameter optimization: mixture of domain expertise and grid search (brute force)
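The residual reformulation in the first bullet means a block outputs x + F(x) instead of an unreferenced mapping H(x). A toy sketch on plain lists follows. `scale_by` is a hypothetical one-parameter layer used only to make the example runnable, not a layer from the paper.

```python
def residual_block(x, residual_fn):
    """Return x + F(x), where F = residual_fn is the learned residual function."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def scale_by(weight):
    """Hypothetical one-parameter layer: multiplies each input by `weight`."""
    return lambda x: [weight * xi for xi in x]


if __name__ == "__main__":
    # With a zero residual the block reduces to the identity mapping.
    print(residual_block([1.0, 2.0], scale_by(0.0)))
    print(residual_block([1.0, 2.0], scale_by(0.5)))
```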
COMPARATIVE PERFORMANCE
[Figure: comparison with the Standard Method]
● Expert baseline: 0.9339
○ (from paper)
● SigOpt best: 0.9436
○ 15% relative error rate reduction
○ No expert time wasted in tuning
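The error rate reductions quoted for the two CIFAR10 use cases can be reproduced from the accuracies, assuming error rate = 1 - accuracy and that the reduction is measured relative to the expert baseline's error rate. The function name is illustrative.

```python
def relative_error_reduction(baseline_acc, tuned_acc):
    """Fraction by which the error rate (1 - accuracy) shrinks after tuning."""
    baseline_err = 1.0 - baseline_acc
    tuned_err = 1.0 - tuned_acc
    return (baseline_err - tuned_err) / baseline_err


if __name__ == "__main__":
    print(f"All convolutional: {relative_error_reduction(0.8995, 0.9011):.1%}")
    print(f"Deep residual:     {relative_error_reduction(0.9339, 0.9436):.1%}")
```

Under these assumptions the all convolutional result comes to about 1.6% and the deep residual result to a little under 15%, matching the rounded figures on the slides.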
Questions?
scott@sigopt.com
@DrScottClark
https://sigopt.com
@SigOpt
TRY OUT SIGOPT FOR FREE
https://sigopt.com/getstarted
● Quick example and intro to SigOpt
● No signup required
● Visual and code examples
MORE EXAMPLES
https://github.com/sigopt/sigopt-examples
Examples of using SigOpt in a variety of languages and contexts.
Tuning Machine Learning Models (with code)
A comparison of different hyperparameter optimization methods.
Using Model Tuning to Beat Vegas (with code)
Using SigOpt to tune a model for predicting basketball scores.
Learn more about the technology behind SigOpt at
https://sigopt.com/research
GPs: FUNCTIONAL VIEW
GPs: FITTING THE GP
[Figure: overfit, good fit, underfit]
USE CASE: CLASSIFICATION MODELS
Problem: Machine Learning models have many non-intuitive tunable hyperparameters
Before: Standard methods use high resources for low performance
After: SigOpt finds better parameters with 10x fewer evaluations than standard methods
USE CASE: SIMULATIONS
BETTER RESULTS, +450% FASTER
Problem: Expensive simulations require high resources for every run
Before: Brute force tuning approach prohibitively expensive
After: SigOpt finds better results with fewer required simulations
