An Introduction to
Neural Architecture Search
Colin White, RealityEngines.AI
Deep learning
● Explosion of interest since 2012
● Very powerful machine learning technique
● Huge variety of neural networks for different tasks
● Key ingredients took years to develop
● Algorithms are getting increasingly more specialized and complicated
Deep learning
Algorithms are getting increasingly more specialized and complicated
E.g., accuracy on ImageNet has steadily improved over 10 years
Neural architecture search
● What if an algorithm could do this for us?
● Neural architecture search (NAS) is a hot area of research
● Given a dataset, define a search space of architectures, then use a search strategy to find the best architecture for your dataset
Outline
● Introduction to NAS
● Background on deep learning
● Automated machine learning
● Optimization techniques
● NAS Framework
○ Search space
○ Search strategy
○ Evaluation method
● Conclusions
Background - deep learning
Source: https://www.nextplatform.com/2017/03/21/can-fpgas-beat-gpus-accelerating-next-generation-deep-learning/
● Studied since the 1940s - simulate the human brain
● “Neural networks are the second-best way to do almost anything” - JS Denker, 2000s
● Breakthrough: 2012 ImageNet competition [Krizhevsky, Sutskever, and Hinton]
Automated Machine Learning
● Automated machine learning
○ Data cleaning, model selection, HPO, NAS, ...
● Hyperparameter optimization (HPO)
○ Learning rate, dropout rate, batch size, ...
● Neural architecture search
○ Finding the best neural architecture
Source: https://determined.ai/blog/neural-architecture-search/
Optimization
● Zeroth-order optimization (used for HPO, NAS)
○ Bayesian optimization
● First-order optimization (used to train neural nets)
○ Gradient descent / stochastic gradient descent
● Second-order optimization
○ Newton’s method
Source: https://ieeexplore.ieee.org/document/8422997
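A toy illustration of first- versus second-order updates on a made-up 1-D function (zeroth-order methods, which use only function evaluations, are sketched after the next slide):

```python
# Toy example (not from the slides): minimize f(x) = (x - 3)^2 + 1.
def f(x):
    return (x - 3) ** 2 + 1

def grad(x):       # first derivative, used by first-order methods
    return 2 * (x - 3)

def hess(x):       # second derivative, used by second-order methods
    return 2.0

# First-order: gradient descent takes many small steps along the negative gradient.
x_gd, lr = 0.0, 0.1
for _ in range(50):
    x_gd -= lr * grad(x_gd)

# Second-order: Newton's method solves this quadratic exactly in one step.
x_newton = 0.0 - grad(0.0) / hess(0.0)

print(round(x_gd, 4), x_newton)   # both approach the minimizer x = 3
```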
Zeroth-order optimization
● Grid search
● Random search
● Bayesian Optimization
○ Use the results of the previous guesses to make the next guess
Source: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
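A minimal random-search sketch; the search space and the train_and_evaluate objective are hypothetical stand-ins for an expensive training run:

```python
import math
import random

# Toy stand-in for "train a model with these hyperparameters and return
# validation accuracy"; in practice this is an expensive training run.
def train_and_evaluate(config):
    return 1.0 / (1.0 + abs(math.log10(config["learning_rate"]) + 3) + config["dropout"])

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.25, 0.5],
    "batch_size": [32, 64, 128],
}

# Random search is zeroth-order: it uses only function evaluations, no gradients.
best_config, best_acc = None, float("-inf")
for _ in range(20):
    config = {name: random.choice(choices) for name, choices in search_space.items()}
    acc = train_and_evaluate(config)
    if acc > best_acc:
        best_config, best_acc = config, acc

print(best_config, round(best_acc, 3))
```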
Outline
● Introduction to NAS
● Background on deep learning
● Automated machine learning
● Optimization techniques
● NAS Framework
○ Search space
○ Search strategy
○ Evaluation method
● Conclusions
Neural Architecture Search
Evaluation method
● Full training
● Partial training
● Training with shared weights
Search space
● Cell-based search space
● Macro vs micro search
Search strategy
● Reinforcement learning
● Continuous methods
● Bayesian optimization
● Evolutionary algorithm
Search space
● Macro vs micro search
● Progressive search
● Cell-based search space
[Zoph, Le ‘16]
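A toy sketch of sampling from a cell-based search space; the op set and cell size are illustrative, not the exact space of [Zoph, Le ‘16]:

```python
import random

# Hypothetical cell-based search space: a cell is a small DAG where each new node
# applies an operation to one earlier node. Op set and cell size are illustrative.
OPS = ["conv3x3", "conv5x5", "max_pool", "skip_connect"]

def sample_cell(num_nodes=4):
    cell = []
    for node in range(num_nodes):
        input_node = random.randrange(node + 2)   # one of the 2 cell inputs or an earlier node
        cell.append((input_node, random.choice(OPS)))
    return cell

# Micro search: search for one cell and stack copies of it to build the network.
# Macro search would instead choose every layer of the full network directly.
print(sample_cell())   # e.g. [(1, 'conv3x3'), (0, 'max_pool'), (2, 'skip_connect'), (3, 'conv5x5')]
```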
Bayesian optimization
● Popular method [Golovin et al. ‘17], [Jin et al. ‘18], [Kandasamy et al. ‘18]
● Great method to optimize an expensive function
● Fix a dataset (e.g. CIFAR-10, MNIST)
● Define a search space A (e.g., 20 layers of {conv, pool, ReLU, dense})
● Define an objective function f:A→[0,1]
○ f(a) = validation accuracy of a after training
● Define a distance function d(a₁, a₂) between architectures
○ Quick to evaluate. If d(a₁, a₂) is small, then |f(a₁) - f(a₂)| is small
Bayesian optimization
Goal: find a∈A which maximizes f(a)
● Choose several random architectures a and evaluate f(a)
● In each iteration i:
○ Use f(a₁) … f(aᵢ₋₁) to choose a new aᵢ
○ Evaluate f(aᵢ)
● Fix a dataset (e.g. CIFAR-10, MNIST)
● Define a search space A (e.g., 20 layers of {conv, pool, ReLU, dense})
● Define an objective function f:A→[0,1]
○ f(a) = validation accuracy of a after training
● Define a distance function d(a₁, a₂) between architectures
○ Quick to evaluate. If d(a₁, a₂) is small, then |f(a₁) - f(a₂)| is small
Gaussian process
● Assume f varies smoothly over the search space A
● The deviations look like Gaussian noise
● Update as we get more information
Source: http://keyonvafa.com/gp-tutorial/,
https://katbailey.github.io/post/gaussian-processes-for-dummies/
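A minimal Gaussian-process sketch using scikit-learn on toy 1-D data; in NAS the inputs would be architecture encodings, or a kernel derived from the distance d above:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D data standing in for (architecture, accuracy) observations.
X_train = np.array([[0.0], [1.0], [2.5], [4.0]])
y_train = np.sin(X_train).ravel()                 # pretend these are observed f(a) values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_train, y_train)

X_test = np.linspace(0.0, 5.0, 6).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)   # posterior mean and uncertainty
print(np.round(mean, 2))
print(np.round(std, 2))                           # small near observed points, large far away
```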
Acquisition function
● In each iteration, find the architecture with the largest expected improvement (sketch below)
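A sketch of the expected-improvement acquisition (maximization form), with made-up posterior values for three candidates:

```python
import numpy as np
from scipy.stats import norm

# Expected improvement for maximization: how much a candidate is expected to
# improve on the best observed value, given the GP posterior mean and std.
def expected_improvement(mean, std, best_so_far, xi=0.01):
    std = np.maximum(std, 1e-9)                   # avoid division by zero
    z = (mean - best_so_far - xi) / std
    return (mean - best_so_far - xi) * norm.cdf(z) + std * norm.pdf(z)

# Made-up posterior for three candidate architectures:
mean = np.array([0.70, 0.74, 0.72])
std = np.array([0.01, 0.05, 0.10])
ei = expected_improvement(mean, std, best_so_far=0.73)
print(ei.argmax())   # the high-uncertainty candidate (index 2) wins this round
```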
DARTS: Differentiable Architecture Search
● Relax NAS to a continuous problem
● Use gradient descent (just like normal parameters)
[Liu et al. ‘18]
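A rough PyTorch sketch of the idea (a simplification of [Liu et al. ‘18], not their implementation): each edge computes a softmax-weighted mixture of candidate ops, so the architecture weights alpha become ordinary differentiable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One edge of the cell: instead of picking a single op, compute a softmax-weighted
# mixture of all candidate ops, so the architecture weights (alpha) can be learned
# by gradient descent alongside the ordinary network weights.
class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Identity(),                                    # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)                # continuous relaxation of the choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(1, 16, 32, 32)
edge = MixedOp(channels=16)
print(edge(x).shape)   # after search, each edge keeps only its argmax(alpha) op
```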
DARTS: Differentiable Architecture Search
● Upside: “one-shot” - the whole search costs roughly as much as training a single network
● Downside: may only work in “micro” search setting
[Liu et al. ‘18]
Reinforcement Learning
● Controller recurrent neural network
○ Chooses a new architecture in each round
○ Architecture is trained and evaluated
○ Controller receives feedback
[Zoph, Le ‘16]
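A heavily simplified PyTorch sketch of this loop; the paper uses an LSTM controller and real training runs, replaced here by one learnable logit vector per decision and a toy reward:

```python
import torch
import torch.nn as nn

# The real controller is an LSTM; here it is reduced to one logit vector per layer
# decision, and the toy reward stands in for trained validation accuracy.
NUM_LAYERS, NUM_OPS = 6, 4
logits = nn.Parameter(torch.zeros(NUM_LAYERS, NUM_OPS))
optimizer = torch.optim.Adam([logits], lr=0.05)

def reward(ops):                        # stand-in for "train this architecture, return accuracy"
    return (ops == 2).float().mean()    # pretend op 2 is the best choice in every layer

baseline = 0.0
for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    ops = dist.sample()                                # controller proposes an architecture
    r = reward(ops)                                    # architecture is "trained and evaluated"
    baseline = 0.9 * baseline + 0.1 * r.item()         # moving-average baseline reduces variance
    loss = -dist.log_prob(ops).sum() * (r - baseline)  # REINFORCE: feedback to the controller
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(logits.argmax(dim=1))   # should drift toward op 2 for every layer
```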
Reinforcement Learning
● Upside: much more powerful than BayesOpt, gradient descent
● Downsides: a whole new network (the controller) must be trained on top of the candidate architectures; RL could be overkill
[Zoph, Le ‘16]
Evaluation Strategy
● Full training
○ Simple
○ Accurate
● Partial training
○ Less computation
○ Less accurate
● Shared weights
○ Least computation
○ Least accurate
Source: https://www.automl.org/blog-2nd-automl-challenge/
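A toy sketch of the trade-off: cheap, noisy partial-training scores rank many candidates, and expensive full training is spent only on a shortlist (the scoring function is made up):

```python
import random

# Toy stand-in: partial training gives a cheap but noisy estimate of quality,
# full training an expensive but accurate one. All numbers here are made up.
def evaluate(arch, epochs):
    true_quality = arch / 100                 # pretend higher arch id = better network
    noise = random.gauss(0, 1.0 / epochs)     # fewer epochs -> noisier estimate
    return true_quality + noise

candidates = list(range(100))                 # 100 hypothetical architectures

# Partial training: rank everything cheaply and keep a shortlist.
shortlist = sorted(candidates, key=lambda a: evaluate(a, epochs=10), reverse=True)[:5]

# Full training: spend the expensive budget only on the shortlist.
best = max(shortlist, key=lambda a: evaluate(a, epochs=100))
print(best)                                   # usually one of the genuinely good architectures
```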
Is NAS ready for widespread adoption?
● Hard to reproduce results [Li, Talwalkar ‘19]
● Hard to compare different papers
● Search spaces have been getting smaller
● Random search is a strong baseline [Li, Talwalkar ‘19], [Sciuto et al. ‘19]
● Recent papers are giving fair comparisons
● NAS cannot yet consistently beat human engineers
● Auto-Keras tutorial: https://www.pyimagesearch.com/2019/01/07/auto-keras-and-automl-a-getting-started-guide/
● DARTS repo: https://github.com/quark0/darts
Conclusion
● NAS: find the best neural architecture for a given dataset
● Search space, search strategy, evaluation method
● Search strategies: RL, BayesOpt, continuous optimization
● Not yet at the point of widespread adoption in industry
Thanks! Questions?