Competition Winning Learning Rates
Leslie N. Smith
Naval Center for Applied Research in Artificial Intelligence
US Naval Research Laboratory, Washington, DC 20375
leslie.smith@nrl.navy.mil; Phone: (202) 767-9532
MLConf 2018
November 14, 2018
UNCLASSIFIED
Introduction
• My story of enlightenment about learning rates (LR)
• My first steps: Cyclical learning rates (CLR)
– What are cyclical learning rates? Why do they matter?
• A new LR schedule and Super-Convergence
– Fast training of networks with large learning rates
• Competition winning learning rates
– Stanford’s DAWNBench competition
– Kaggle’s iMaterialist Challenge
• Enlightenment is a never-ending story
– Is weight decay more important than LR?
2
Outline
Deep Learning Basics Background
– Uses a neural network composed of many “hidden” layers; each layer l contains trainable weights W_l, biases b_l, and a non-linear function σ
– The image x is the input; the network output y_L is compared to the label y, which defines the loss
[Figure: feed-forward network diagram; the input image x passes through weighted hidden layers to output classes (Speed limit 20, Yield, Pedestrian crossing); the output is compared to the label to define the loss, and back-propagation carries gradients backward through the weights, where they can vanish]
$y_l = F(y_{l-1}) = \sigma(W_l\, y_{l-1} + b_l)$
$y_L = \sigma(W_L\, \sigma(W_{L-1}\, \sigma(\dots W_1 x + b_1 \dots)))$
3
Loss: $\lVert y_L - y \rVert$
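To make the layer recursion and loss above concrete, here is a minimal NumPy sketch; the layer sizes, the use of tanh as σ, and the one-hot label are illustrative assumptions, not values from the slides.

```python
import numpy as np

def forward(x, weights, biases, sigma=np.tanh):
    """Apply y_l = sigma(W_l y_{l-1} + b_l) layer by layer, with y_0 = x."""
    y = x
    for W, b in zip(weights, biases):
        y = sigma(W @ y + b)
    return y

# Illustrative network: input dim 8, two hidden layers, 3 output classes.
rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]
weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=sizes[0])        # the flattened input "image"
label = np.array([1.0, 0.0, 0.0])    # one-hot label y
y_L = forward(x, weights, biases)
loss = np.linalg.norm(y_L - label)   # || y_L - y ||
print(loss)
```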
4
What are learning rates?
• The learning rate is the step size in Stochastic Gradient
Descent’s (SGD) weight update, applied to the back-propagated gradient
• Long known that the learning rate (LR) is the most
important hyper-parameter to tune
– Too large: the training diverges
– Too small: trains slowly to a sub-optimal solution
• How to find an optimal LR?
– Grid or random search
– Time-consuming and inaccurate
Cyclical learning rates (CLR)
$w_{t+1} = w_t - \varepsilon\, \nabla L(\theta, x)$
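A minimal sketch of the update above, $w_{t+1} = w_t - \varepsilon \nabla L(\theta, x)$, on a toy quadratic loss; the loss function, learning rates, and iteration count are illustrative, but they show the too-large / too-small behavior described on this slide.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """One SGD update: w_{t+1} = w_t - lr * grad."""
    return w - lr * grad

# Toy loss L(w) = 0.5 * ||w - w_star||^2, so the gradient is simply w - w_star.
w_star = np.array([1.0, -2.0])
for lr in (2.5, 0.01, 0.5):          # too large, too small, reasonable
    w = np.zeros(2)
    for _ in range(20):
        w = sgd_step(w, w - w_star, lr)
    # too large -> distance blows up; too small -> barely moves; 0.5 -> near zero
    print(f"lr={lr}: distance to optimum = {np.linalg.norm(w - w_star):.4f}")
```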
5
What is CLR? And who cares?
• Cyclical learning rates (CLR)
– Learning rate schedule that varies between
min and max values
• LR range test: One stepsize of
increasing LR
– Quick and easy way to find an optimal
learning rate
– The peak defines the max_lr
– The optimal LR is a bit less than max_lr
– min_lr ≈ max_lr / 3
Cyclical learning rates (CLR)
[Figure: LR range test / CLR schedule plot with max_lr and min_lr marked]
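A sketch of a triangular CLR schedule in the spirit of this slide, using the min_lr ≈ max_lr / 3 rule of thumb; the stepsize and max_lr values are illustrative assumptions (max_lr would come from an LR range test, which ramps the LR up over one stepsize while watching the loss or accuracy).

```python
def triangular_clr(iteration, stepsize, min_lr, max_lr):
    """Triangular cyclical LR: ramp linearly from min_lr up to max_lr over one
    stepsize, then back down over the next stepsize, and repeat."""
    cycle = iteration // (2 * stepsize)
    x = abs(iteration / stepsize - 2 * cycle - 1)   # goes 1 -> 0 -> 1 within a cycle
    return min_lr + (max_lr - min_lr) * (1 - x)

# Illustrative values: max_lr from an LR range test, min_lr ~ max_lr / 3.
max_lr, min_lr, stepsize = 0.3, 0.1, 1000
lrs = [triangular_clr(t, stepsize, min_lr, max_lr) for t in range(4000)]
print(lrs[0], lrs[1000], lrs[2000])   # 0.1 (min), 0.3 (max), 0.1 (min again)
```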
6
Super-convergence
• What if there’s no peak?
– Implies the ability to train at very large
learning rates
• Super-convergence
– Start with a small LR
– Grow to a large learning rate maximum
• 1cycle learning rate schedule
– One CLR cycle
Super-convergence
8
Super-convergence
• What if there’s no peak?
– Implies the ability to train at very large
learning rates
• Super-convergence
– Start with a small LR
– Grow to a large learning rate maximum
• 1cycle learning rate schedule
– One CLR cycle, ending with a smaller LR
than the min
Super-convergence
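A sketch of the 1cycle idea described above: a single CLR cycle followed by a short final phase that anneals well below min_lr. The linear ramps, the 90/10 phase split, and the final LR of min_lr / 100 are illustrative assumptions; the fastai implementation differs in detail (for example, it also cycles momentum).

```python
def one_cycle_lr(iteration, total_iters, min_lr, max_lr, final_frac=0.1):
    """1cycle sketch: one CLR cycle (min_lr -> max_lr -> min_lr) over the first
    90% of training, then anneal well below min_lr for the remainder."""
    ramp_iters = int(total_iters * (1 - final_frac))
    half = ramp_iters // 2
    if iteration < half:                              # rising half of the cycle
        return min_lr + (max_lr - min_lr) * iteration / half
    if iteration < ramp_iters:                        # falling half, back to min_lr
        return max_lr - (max_lr - min_lr) * (iteration - half) / half
    frac = (iteration - ramp_iters) / max(1, total_iters - ramp_iters)
    return min_lr + (min_lr / 100 - min_lr) * frac    # final anneal to min_lr / 100

lrs = [one_cycle_lr(t, 10000, 0.08, 0.8) for t in range(10000)]
print(max(lrs), lrs[0], lrs[-1])   # peak 0.8, start 0.08, end well below 0.08
```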
9
ImageNet Super-convergence
Introduction
• My story of enlightenment about learning rates (LR)
• My first steps: Cyclical learning rates (CLR)
– What are cyclical learning rates? Why do they matter?
• A new LR schedule and Super-Convergence
– Fast training of networks with large learning rates
• Competition winning learning rates
– Stanford’s DAWNBench competition
– Kaggle’s iMaterialist Challenge
• Enlightenment is a never-ending story
– Is weight decay more important than LR?
10
Outline
11
Competition Winning Learning Rates
• DAWNBench challenge
– “Howard explains that in order to create an algorithm for solving CIFAR,
Fast.AI’s group turned to a relatively unknown technique known as “super
convergence.” This wasn’t developed by a well-funded tech company or
published in a big journal, but was created and self-published by a single
engineer named Leslie Smith working at the Naval Research Laboratory.”
The Verge, 5/7/2018, on the fast.ai team’s 1st-place finish
12
Competition Winning Learning Rates
• Kaggle iMaterialist Challenge (Fashion)
– “For training I used Adam initially but I switched to the 1cycle policy with SGD very early on. You can read more about this training regime in a paper by Leslie Smith and you can find details on how to use it by Sylvain Gugger, the author of the implementation in the fastai library here.”
1st place winner, Radek Osmulski
13
Relevant publications
• Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464-472. IEEE, 2017.
• Smith, Leslie N., and Nicholay Topin. "Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates." arXiv preprint arXiv:1708.07120 (2017).
• Smith, Leslie N. "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay." arXiv preprint arXiv:1803.09820 (2018).
• There’s a large batch of literature on Large Batch Training
Competition Winning Learning Rates
Introduction
• My story of enlightenment about learning rates (LR)
• My first steps: Cyclical learning rates (CLR)
– What are cyclical learning rates? Why do they matter?
• A new LR schedule and Super-Convergence
– Fast training of networks with large learning rates
• Competition winning learning rates
– Stanford’s DAWNBench competition
– Kaggle’s iMaterialist Challenge
• Enlightenment is a never-ending story
– Is weight decay more important than LR?
14
Outline
15
What is Weight Decay?
• L2 regularization (an L2 penalty on the weights)
• The “effective weight
decay” is a combination of
the WD coefficient, λ, and
the learning rate, ε
• Which term does the
learning rate schedule
impact more?
Weight decay (WD)
$L(\theta, x) = \lVert f(\theta, x) - y \rVert^2 + \tfrac{1}{2}\, \lambda \lVert w \rVert^2$
Effective WD
$w_{t+1} = w_t - \varepsilon\, \nabla L(\theta, x) - \varepsilon \lambda\, w_t$
$w_{t+1} = (1 - \varepsilon \lambda)\, w_t - \varepsilon\, \nabla L(\theta, x)$
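A small sketch checking that the two update forms above are algebraically the same, and that the per-step “effective weight decay” is ελ, so any learning rate schedule rescales the weight decay as well; the weight, gradient, and hyper-parameter values are illustrative.

```python
import numpy as np

def sgd_with_weight_decay(w, grad, lr, wd):
    """w_{t+1} = (1 - lr*wd) * w_t - lr * grad."""
    return (1.0 - lr * wd) * w - lr * grad

w = np.array([1.0, -0.5])      # illustrative weights
grad = np.array([0.2, 0.1])    # illustrative gradient of the data loss
lr, wd = 0.4, 1e-3

form_a = w - lr * grad - lr * wd * w          # w_t - eps*grad - eps*lambda*w_t
form_b = sgd_with_weight_decay(w, grad, lr, wd)
print(np.allclose(form_a, form_b))            # True: the two forms are the same update
print("effective WD this step:", lr * wd)     # eps * lambda, so halving lr also halves it
```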
16
What is the optimal WD?
• Decreasing weight decay has a much greater effect than
decreasing the learning rate
• The maximum weight decay, maxWD, can be found in the same way as maxLR (i.e., via a WD range test)
• Use a large WD early in training and let it decay
Weight decay (WD)
[Figure: WD range test plot with maxWD marked]
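A sketch of the “use a large WD early in training and let it decay” recipe above, written as a simple linear schedule that reaches zero by the end of training; the maxWD value and the linear shape are illustrative assumptions, and maxWD would come from a WD range test.

```python
def decaying_wd(iteration, total_iters, max_wd):
    """WD schedule sketch: start at max_wd, decay linearly to zero by the end."""
    return max_wd * (1.0 - iteration / total_iters)

max_wd = 3e-3          # illustrative; in practice found with a WD range test
total_iters = 10000
print(decaying_wd(0, total_iters, max_wd),              # 0.003 at the start
      decaying_wd(total_iters, total_iters, max_wd))    # 0.0 at the end
```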
17
Dynamic weight decay
• Hyper-parameter relationship:
where LR = learning rate, WD = weight decay coefficient, TBS = total batch size,
and α = momentum
– Large TBS and smaller values of LR and α permit larger maxWD
• Large batch training can be improved by a large WD if WD decays
during training
Weight decay (WD)
(LR × WD) / (TBS × (1 − α)) ≈ constant
[Figure: accuracy comparison; values shown are 83.7%, 82.2%, 84%, and 83.5%]
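A sketch of using the relationship above to rescale the weight decay when the total batch size changes while LR and momentum are held fixed; the baseline hyper-parameter values are illustrative assumptions.

```python
def rescale_wd(wd, lr, tbs, alpha, new_lr, new_tbs, new_alpha):
    """Hold (LR * WD) / (TBS * (1 - alpha)) constant while other values change."""
    constant = (lr * wd) / (tbs * (1.0 - alpha))
    return constant * new_tbs * (1.0 - new_alpha) / new_lr

# Illustrative baseline: LR=0.1, WD=5e-4, TBS=128, momentum alpha=0.9.
wd_large_batch = rescale_wd(5e-4, 0.1, 128, 0.9,
                            new_lr=0.1, new_tbs=1024, new_alpha=0.9)
print(wd_large_batch)   # 8x the batch size suggests 8x the weight decay (4e-3)
```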
18
Conclusions (for now)
• Takeaways
– Enlightenment is a never-ending story
– A new diet plan for your network
• Decaying weight decay in large batch training; set WD large in the
early phase of training and zero near the end
– Hyper-parameters are tightly coupled and must be tuned
together
Competition Winning Learning Rates
(LR × WD) / (TBS × (1 − α)) ≈ constant
Competition Winning Learning Rates
Questions and comments?
The End
19