4. “... a high-performance network architecture generally
possesses a tremendous number of possible configurations
with respect to the number of layers, the hyperparameters in
each layer, and the type of each layer.”
5. “Manual exhaustive search is hence infeasible, and the
design of successful hand-crafted networks relies heavily on
expert knowledge and experience.”
7. “Although some recent works have attempted computer-aided
or automated network design [2, 37], several challenges
remain unsolved:”
● huge search space and heavy computational costs
● hard to transfer to other tasks or to generalize to datasets
with different input sizes.
9. “... we introduce the block-wise network generation, i.e., we
construct the network architecture as a flexible stack of
personalized blocks rather than tedious per-layer network piling.”
● the generated network generalizes well to other task
domains and to different datasets.
14. Q-learning process
“We model the layer selection process as a Markov Decision Process ... To find
the optimal architecture, we ask our agent to maximize its expected reward over
all possible trajectories ...”
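In standard RL notation (a reconstruction; the paper's exact symbols may differ), the agent maximizes the expected total reward over trajectories τ, i.e., over sequences of layer choices, sampled under its policy π:

```latex
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\left[ \mathcal{R}_{\tau} \right]
```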
15. Q-learning process
“For this maximization problem, it is common to use the recursive Bellman
equation to reach optimality.”
“... we define the maximum total expected reward to be Q*(s_t, a) ...”
The standard Q-learning update these symbols belong to (reconstructed from the legend):
Q(s_t, a) ← (1 − α) · Q(s_t, a) + α · (r_t + γ · max_{a'} Q(s_{t+1}, a'))
α: learning rate
γ: discount factor
r_t: intermediate reward
s_t: current state
s_T: final state
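A minimal tabular sketch of this update in Python (illustrative, not the authors' code): states are layer indices, actions are layer choices, and `ACTIONS`, `ALPHA`, `GAMMA`, and the trajectory format are assumptions made for the example.

```python
from collections import defaultdict

ACTIONS = ["conv3x3", "conv5x5", "maxpool", "identity", "terminal"]  # assumed action set
ALPHA, GAMMA = 0.01, 1.0        # learning rate and discount factor
Q = defaultdict(float)          # Q(s, a), initialized to zero

def update_q(trajectory, rewards):
    """Apply the Bellman update along one sampled block.

    trajectory: list of (s_t, a_t, s_next) tuples, s_next = None at the final state s_T
    rewards:    list of per-step rewards r_t
    """
    for (s, a, s_next), r in zip(trajectory, rewards):
        if s_next is None:             # transition into the final state s_T
            target = r                 # terminal reward, e.g. validation accuracy
        else:
            target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
```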
16. Shaped Intermediate Reward
“Since the reward r_t cannot be explicitly measured in our task, we use reward
shaping [22] to speed up training.”
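As we read the paper, the shaped intermediate reward simply spreads the final reward R_T (the block's validation accuracy) evenly over the T steps of the trajectory:

```latex
r_t = \frac{R_T}{T}
```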
18. Early Stop Strategy
“... which means that some good blocks unfortunately perform worse than bad
blocks when training is stopped early.”
19. Early Stop Strategy
“we notice that the FLOPs and density of the corresponding blocks have a
negative correlation with the final accuracy. Thus, we redefine the reward function
as:”
reward = ACC_EarlyStop − µ · log(FLOPs) − ρ · log(Density)   (reconstructed from the legend below)
FLOPs: computational complexity of the block.
Density: the edge number divided by the node number in the block’s DAG.
µ and ρ: hyperparameters that balance the weights of FLOPs and Density.
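A minimal sketch of this redefined reward, following the reconstructed equation above; the default values of `mu` and `rho` are illustrative placeholders, since the paper treats them as tunable hyperparameters.

```python
import math

def early_stop_reward(acc_early_stop, flops, density, mu=1.0, rho=1.0):
    """Early-stop accuracy penalized by block complexity.

    acc_early_stop: validation accuracy after early-stopped training
    flops:          computational complexity of the block (must be > 0)
    density:        edge count divided by node count in the block's DAG
    mu, rho:        balancing weights (illustrative defaults)
    """
    return acc_early_stop - mu * math.log(flops) - rho * math.log(density)
```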
20. Training Details
● Epsilon-greedy Strategy
○ “With the epsilon-greedy strategy, a random action is taken with probability ϵ and the greedy
action is chosen with probability 1 − ϵ.” (both this and experience replay are sketched in code after this list)
● Experience Replay
○ “Within a given interval, i.e., each training iteration, the agent samples 64 blocks with their
corresponding validation accuracies from the memory and updates the Q-value 64 times.”
● BlockQNN Configuration
○ learning rate α, discount factor γ, hyperparameters µ and ρ, maximum layer index, ...
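The two mechanisms above can be sketched as follows, reusing `Q`, `ACTIONS`, and `update_q` from the earlier Q-learning sketch; the batch size of 64 follows the quoted description, while the buffer format and everything else is illustrative.

```python
import random

def choose_action(state, epsilon):
    """Epsilon-greedy: random action w.p. epsilon, greedy action w.p. 1 - epsilon."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

replay_memory = []   # stores (trajectory, validation_accuracy) pairs

def replay_step(batch_size=64):
    """One training iteration: sample 64 stored blocks and update Q for each."""
    batch = random.sample(replay_memory, min(batch_size, len(replay_memory)))
    for trajectory, acc in batch:
        T = len(trajectory)
        # shaped intermediate reward r_t = R_T / T; the final step keeps acc itself
        rewards = [acc / T] * (T - 1) + [acc]
        update_q(trajectory, rewards)
```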
22. Q-learning performance
“The accuracy goes up as epsilon decreases, and the top models are all found
in the final stage, showing that our agent learns to generate better block structures
rather than searching randomly.”