EXPLORING RANDOMLY WIRED
NEURAL NETWORKS FOR
IMAGE RECOGNITION
2019.5.7
Yongsu Baek
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Automation of Feature Engineering
• Moves to “Architecture Engineering”
• Manually developed architectures (e.g., AlexNet, DenseNet, …)
DEEP LEARNING
Neural Architecture Search(NAS)?
• Automation of Architecture Engineering
• Methods
1) Search Space
• Prior Knowledge: Efficient Search vs. Human bias
• E.g.) Cell/Block repetition
2) Search Strategy
• E.g.) RL, Evolutionary methods, Random, …
3) Performance Estimation Strategy
NEURAL ARCHITECTURE SEARCH
Neural Architecture Search(NAS)?
DEEP LEARNING TO NAS
Neural Architecture Search(NAS)?
Design an Individual Network → Design a Network Generator
Automation of Feature Engineering → Automation of Architecture Engineering
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• (NAS) Neural Architecture Search with Reinforcement Learning
• Architecture Generator: LSTM
• Architecture Search: Reinforcement Learning
• 800 GPUs, 28 days
• (NASNet) Learning transferable architectures for scalable image
recognition
• Cell Concept
• Transferable architectures
• 500 GPUs, 4 days
• ENAS, MnasNet, etc.
PREVIOUS WORKS
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Network Generator 𝑔
𝑔 ∶ Θ ↦ 𝒩
where Θ : a parameter space, 𝒩 : a family of related networks
• e.g., for a ResNet generator, 𝒩 is the family of ResNets, and 𝜃 ∈ Θ specifies the number of
stages, the number of residual blocks per stage, depth/width/filter sizes,
activation types, etc.
• Deterministic
• Stochastic network generator 𝑔
𝑔 ∶ Θ × 𝑆 ↦ 𝒩
where Θ : a parameter space, 𝑆 : seeds of a pseudo-random
number generator, 𝒩 : a family of related networks
• e.g., NAS: 𝜃 is the weight matrices of the LSTM controller, and the output of each LSTM
time-step is a probability distribution conditioned on 𝜃 (a minimal generator-interface sketch follows this slide)
STOCHASTIC NETWORK GENERATOR
Randomly Wired Neural Networks
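To make the distinction concrete, a minimal sketch in plain Python (not the paper's code; the dictionary-based network description and the parameter names are purely illustrative) of a deterministic generator versus a stochastic generator g(θ, s):

import random


def resnet_generator(theta):
    # Deterministic generator: theta alone fully determines the architecture.
    return {"stages": theta["stages"], "blocks_per_stage": theta["blocks_per_stage"]}


def er_wiring_generator(theta, seed):
    # Stochastic generator g(theta, s): the same theta with different seeds s
    # yields different, but statistically related, wirings.
    rng = random.Random(seed)
    n, p = theta["N"], theta["P"]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    return {"nodes": n, "edges": edges}


fixed = resnet_generator({"stages": 4, "blocks_per_stage": [3, 4, 6, 3]})
net_a = er_wiring_generator({"N": 8, "P": 0.4}, seed=0)
net_b = er_wiring_generator({"N": 8, "P": 0.4}, seed=1)  # same theta, different wiring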
• Turing’s unorganized machines, an early form of randomly connected neural
network
• Infant human’s cortex: Small-world properties
• Random graph modeling has been used as a tool to study the neural
networks of human brains
• Random graph models are an effective tool for modeling and
analyzing real-world graphs, e.g., social networks, world wide web,
citation networks
MOTIVATION
Randomly Wired Neural Networks
1. Generate a general graph (DAG)
2. Map the general graph to neural network
operations
• Edge operations
- Data flow only
• Node operations (a minimal sketch follows this slide)
- Aggregation: combine the input data via a weighted sum with learnable,
positive weights
- Transformation: a ReLU-convolution-BN triplet
- Distribution: the same copy of the transformed data
is sent out along every output edge
3. Attach input and output nodes
4. Assemble stages
METHODOLOGY
Randomly Wired Neural Networks
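A hedged PyTorch sketch of one node's operation as described above: weighted-sum aggregation with learnable positive weights, a ReLU-convolution-BN transformation, and the same output distributed to every out-edge. The plain 3x3 convolution and the module name NodeOp are illustrative choices, not necessarily the paper's exact ones.

import torch
import torch.nn as nn


class NodeOp(nn.Module):
    def __init__(self, in_degree, channels):
        super().__init__()
        # One learnable scalar per incoming edge; sigmoid keeps the weights positive.
        self.edge_weights = nn.Parameter(torch.zeros(in_degree))
        self.transform = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, inputs):
        # inputs: list of tensors, one per in-edge, all with `channels` channels.
        w = torch.sigmoid(self.edge_weights)
        aggregated = sum(wi * x for wi, x in zip(w, inputs))  # additive aggregation
        out = self.transform(aggregated)                      # ReLU-conv-BN triplet
        return out                                            # same copy sent to every out-edge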
• Additive aggregation maintains the same number of output channels
as input channels.
• Transformed data can be combined with the data from any other
nodes.
• Fixing the channel count keeps the FLOPs and parameter count
unchanged for each node, regardless of its input and output degrees.
• The overall FLOPs and parameter count of a graph are roughly
proportional to the number of nodes and nearly independent of the
number of edges
• This enables the comparison of different graphs without
inflating/deflating model complexity. Differences in task
performance are therefore reflective of the properties of the
wiring pattern.
NICE PROPERTIES OF NODE OPERATION
Randomly Wired Neural Networks
• Extra input node: sends the same copy of the input data to
every original input node.
• Extra output node: takes the unweighted average of all original
output nodes.
ATTACHING INPUT AND OUTPUT NODES
Randomly Wired Neural Networks
• An entire network consists of multiple stages.
• One random graph represents one stage.
• For all nodes that are directly connected to the input node, their
transformations are modified to have a stride of 2.
• The channel count is doubled when going from one stage to
the next stage.
STAGES
Randomly Wired Neural Networks
RandWire Architecture
RANDOMLY WIRED NEURAL NETWORKS
• Erdős-Rényi (ER), 1959.
• ER(N, P)
• Has N nodes.
• Each pair of nodes is connected by an edge with probability P.
• The ER generation model has only a single parameter P and is denoted
ER(P).
• Any graph with N nodes has a non-zero probability of being generated by
the ER model.
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
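A small sketch, assuming networkx is available, of sampling an ER(N, P) graph and converting it into a DAG by fixing a node ordering and orienting every edge from the lower to the higher index (one simple way to realize step 1 of the methodology; the paper may use a different ordering):

import networkx as nx


def er_dag(n, p, seed=0):
    g = nx.erdos_renyi_graph(n, p, seed=seed)  # undirected ER(N, P) graph
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes)
    # Orient each edge from the smaller to the larger node index -> acyclic by construction.
    dag.add_edges_from((min(u, v), max(u, v)) for u, v in g.edges)
    return dag


graph = er_dag(32, 0.2)
assert nx.is_directed_acyclic_graph(graph)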
• Barabási-Albert (BA), 1999.
• BA(N, M)
• 1 ≤ M < N
initialize the graph G as M nodes without any edges
repeat until G has N nodes:
    add a new node v_t
    repeat until v_t has M edges:
        pick a node v in G with P(v_t and v are connected) ∝ degree(v), and connect v and v_t
• The generated graph has exactly M(N-M) edges → BA spans only a subset of all graphs with N nodes
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
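A quick check, assuming networkx, that BA(N, M) preferential attachment indeed produces exactly M(N-M) edges, as noted above; the values of N and M are only an example:

import networkx as nx

N, M = 32, 5
g = nx.barabasi_albert_graph(N, M, seed=0)
assert g.number_of_edges() == M * (N - M)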
• Watts-Strogatz(WS), 1998.
• WS(N, K, P)
• “Small World” model → high clustering, small diameter
0. Place the N nodes in a ring.
1. Connect each node to its K/2 nearest neighbors on each side.
2. Going clockwise, rewire each edge with probability P (to a uniformly chosen target node).
• Has N⋅K/2 edges (fixed) → an even smaller subset of all N-node graphs
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
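A small sketch, assuming networkx, of WS(N, K, P); connected_watts_strogatz_graph simply resamples until the graph stays connected, and the values N=32, K=4, P=0.25 are only an example:

import networkx as nx

# WS(N, K, P): ring of N nodes, each linked to its K nearest neighbours, edges rewired with prob. P.
g = nx.connected_watts_strogatz_graph(n=32, k=4, p=0.25, seed=0)
print(g.number_of_edges())                 # fixed by N and K, independent of P
print(nx.average_clustering(g))            # clustering coefficient
print(nx.average_shortest_path_length(g))  # characteristic path length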
• RandWire is a stochastic network generator 𝑔(𝜃, 𝑠).
• The random graph parameters, P, M, and (K, P) for ER, BA, and WS
respectively, are part of the parameters 𝜃.
• The “optimization” of such a 1- or 2-parameter space is essentially
done by trial and error by human designers (line/grid search).
• The accuracy variation across different seeds 𝑠 is small, so the authors
perform no random search and report the mean accuracy of
multiple random network instances.
DESIGN AND OPTIMIZATION
Randomly Wired Neural Networks
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• ImageNet Classification
• A small computation regime – MobileNet & ShuffleNet
• A regular computation regime – ResNet-50/101
• N nodes, C channels determine network complexity.
• N = 32, C = 79 for the small regime.
• N = 32, C = 109 or 154 for the regular regime.
• Random Seeds
• Randomly sample 5 network instances and train them from scratch.
• Report the classification accuracy with “mean±std” for all 5 network instances.
• Implementation Details (a minimal training-recipe sketch follows this slide)
• Train for 100 epochs
• Half-period-cosine shaped learning rate decay and initial learning rate 0.1
• The weight decay is 5e-5
• Momentum 0.9
• Label smoothing regularization with a coefficient of 0.1
ARCHITECTURE DETAILS
Experiments
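A hedged PyTorch sketch of the reported training recipe (100 epochs, half-period-cosine learning-rate decay from 0.1, weight decay 5e-5, momentum 0.9, label smoothing 0.1). The model and the loop body are placeholders; batch size, data loading and distributed details are omitted, and label_smoothing in CrossEntropyLoss assumes a reasonably recent PyTorch:

import torch
import torch.nn as nn

model = nn.Linear(3 * 224 * 224, 1000)  # placeholder for a RandWire network
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-5)
# Half-period cosine decay over the full 100 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one pass over the ImageNet training set would go here ...
    scheduler.step()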
• All sampled networks train successfully.
• ER, BA, and WS all reach mean accuracy > 73% under suitable settings.
• The variance of accuracy is small (std: 0.2 ~ 0.4%).
• Mean accuracy differs across the random generators.
IMAGENET CLASSIFICATION
Experiments
• Node removal
• WS
→ The mean degradation of accuracy is larger when the output degree
of the removed node is higher.
→ “Hub” nodes in WS that send information to many nodes are
influential.
GRAPH DAMAGE
Experiments
• Edge removal
• If the input degree of an edge’s target node is smaller, removing that
edge tends to change a larger portion of the target node’s inputs.
• ER
→ Less sensitive to edge removal, possibly because in ER’s definition
the wiring of every edge is independent.
GRAPH DAMAGE
Experiments
• The same convolution type is used in all nodes.
• The factor C is adjusted to keep the complexity of all alternative networks
comparable.
• The Pearson correlation between any two series in the figure is 0.91 ~ 0.98.
NODE OPERATIONS
Experiments
• Small computation regime
COMPARISONS
Experiments
*250 epochs for fair comparisons
• Regular computation regime
• Uses a regularization method inspired by the edge removal analysis:
randomly remove one edge whose target node has input degree > 1,
with probability 0.1 (a small sketch follows this slide).
COMPARISONS
Experiments
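A hedged sketch of that edge-removal regularization, assuming the wiring is stored as a networkx DiGraph; applying it once per training iteration, and simply skipping the chosen edge during aggregation, are assumptions of this sketch rather than details stated on the slide:

import random
import networkx as nx


def sample_dropped_edge(dag: nx.DiGraph, rng: random.Random, p: float = 0.1):
    # With probability p, pick one edge whose target node has input degree > 1;
    # the forward pass would then ignore this edge for the current iteration.
    if rng.random() >= p:
        return None
    candidates = [(u, v) for u, v in dag.edges if dag.in_degree(v) > 1]
    return rng.choice(candidates) if candidates else None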
• Larger computation
• Increase the test image size to 320 x 320 without retraining
COMPARISONS
Experiments
• Object detection
• The features learned by randomly wired networks can also transfer.
COMPARISONS
Experiments
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• The mean accuracy of these models is competitive with hand-designed
architectures and with architectures optimized by NAS (e.g., NASNet).
• The authors hope that future work exploring new generator designs
may yield new, powerful network designs.
• Contribution
• Defines a good search space by focusing on the wiring pattern rather than on layer types.
• Shows that finding a good search space alone can already yield strong results.
• Introduces the concept of a (stochastic) network generator.
CONCLUSION
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Search Space
• Ideas for finding an even better search space
• Search Methods
• Prior knowledge: the random wiring carries no design intent or interpretation
• Studies of the properties of high-performing networks would be valuable (a general
problem of AutoML)
DISCUSSION
[1] Xie, S., Kirillov, A., Girshick, R., & He, K. (2019). Exploring Randomly
Wired Neural Networks for Image Recognition. arXiv preprint
arXiv:1904.01569.
[2] Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural Architecture
Search: A Survey. Journal of Machine Learning Research, 20(55), 1-21.
[3] Zoph, B., & Le, Q. V. (2017). Neural architecture search with
reinforcement learning. ICLR 2017.
[4] Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning
transferable architectures for scalable image recognition. In
Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 8697-8710).
[5] Lee, J. PR-155: Exploring Randomly Wired Neural Networks for
Image Recognition. https://www.youtube.com/watch?v=qnGm1h365tc
REFERENCES
ANY QUESTIONS?
Editor's Notes
  1. Deep learning contributed to automating feature engineering, but this soon turned into “architecture engineering”, i.e., manually designing network architectures. Many architectures have been developed, but doing so is time-consuming and error-prone.
  2. The space can be restricted, e.g., to DNNs with 7 layers. Encoding prior knowledge (convolutions work well, 3x3 convolutions work well, BN helps, etc.) can shrink the search space and make the search more efficient, but it can also become a human bias that blocks the discovery of novel architectures. A good example: learning a cell/block and repeating it transfers to other data, reduces the space, and still keeps performance high. RL, random picking, evolutionary methods, etc. can be used. The search space is usually exponentially large or unbounded, so it must be searched carefully (exploration-exploitation trade-off). Performance is usually estimated by plain training and validation, but recent work tries to make this step more efficient.
  3. “Connectionist” Approach
  4. Such searches have also discovered activation functions like Swish and augmentations like AutoAugment; in the end, the found architectures tend to use a single type of convolution and a few layer sizes. Previous works mostly tuned the search space at the layer level or focused on the search strategy. The authors instead ask how the wiring matters. (From here on, “NAS” refers to Neural Architecture Search with Reinforcement Learning.)
  5. If ReLU came last, the aggregation weights (being positive) would keep adding positive values and the activations would keep growing; putting BN last keeps this under control.
  6. Issue: ER tends to produce graphs that look similar on average, with low probability of any particular special structure. If P > ln(N)/N, the graph is connected (a single component) with high probability.
  7. This gives one example of how an underlying prior can be introduced by the graph generator in spite of randomness: nodes that already have many “friends” (high degree) are more likely to receive new connections.
  8. “Rewiring” is defined as uniformly choosing a random node that is not v and that is not a duplicate edge.
  9. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  10. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  11. It would have been nice to also see an experiment comparing the effect of this regularization, but it is too hard for me to reproduce.