EXPLORING RANDOMLY WIRED
NEURAL NETWORKS FOR
IMAGE RECOGNITION
2019.5.7
Yongsu Baek
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Automation of Feature Engineering
• Moves to “Architecture Engineering”
• Manually developed architectures (e.g., AlexNet, DenseNet, …)
DEEP LEARNING
Neural Architecture Search(NAS)?
• Automation of Architecture Engineering
• Methods
1) Search Space
• Prior Knowledge: Efficient Search vs. Human bias
• E.g.) Cell/Block repetition
2) Search Strategy
• E.g.) RL, Evolutionary methods, Random, …
3) Performance Estimation Strategy
NEURAL ARCHITECTURE SEARCH
Neural Architecture Search(NAS)?
DEEP LEARNING TO NAS
Neural Architecture Search(NAS)?
Design an Individual Network → Design a Network Generator
Automation of Feature Engineering → Automation of Architecture Engineering
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• (NAS) Neural Architecture Search with Reinforcement Learning
• Architecture Generator: LSTM
• Architecture Search: Reinforcement Learning
• 800 GPUs, 28 days
• (NASNet) Learning transferable architectures for scalable image
recognition
• Cell Concept
• Transferable architectures
• 500 GPUs, 4 days
• ENAS, MnasNet, etc.
PREVIOUS WORKS
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Network Generator 𝑔
𝑔 ∶ Θ ↦ 𝒩
where Θ : a parameter space, 𝒩 : a family of related networks
• e.g., for a ResNet generator, 𝒩 is the family of ResNets, and 𝜃 ∈ Θ specifies the number of
stages, the number of residual blocks per stage, depth/width/filter sizes,
activation types, etc.
• Deterministic
• Stochastic network generator 𝑔
𝑔 ∶ Θ × 𝑆 ↦ 𝒩
where Θ : a parameter space, 𝑆 : seeds of a pseudo-random
number generator, 𝒩 : a family of related networks
• e.g., NAS: 𝜃 is the weight matrices of the LSTM controller, and the output of each LSTM
time-step is a probability distribution conditioned on 𝜃 (a minimal generator-interface sketch follows this slide)
STOCHASTIC NETWORK GENERATOR
Randomly Wired Neural Networks
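To make the distinction concrete, a minimal sketch in plain Python (not the paper's code; the dictionary-based network description and the parameter names are purely illustrative) of a deterministic generator versus a stochastic generator g(θ, s):

import random


def resnet_generator(theta):
    # Deterministic generator: theta alone fully determines the architecture.
    return {"stages": theta["stages"], "blocks_per_stage": theta["blocks_per_stage"]}


def er_wiring_generator(theta, seed):
    # Stochastic generator g(theta, s): the same theta with different seeds s
    # yields different, but statistically related, wirings.
    rng = random.Random(seed)
    n, p = theta["N"], theta["P"]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    return {"nodes": n, "edges": edges}


fixed = resnet_generator({"stages": 4, "blocks_per_stage": [3, 4, 6, 3]})
net_a = er_wiring_generator({"N": 8, "P": 0.4}, seed=0)
net_b = er_wiring_generator({"N": 8, "P": 0.4}, seed=1)  # same theta, different wiring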
• Turing’s unorganized machines, an early form of randomly connected neural
network
• Infant human’s cortex: Small-world properties
• Random graph modeling has been used as a tool to study the neural
networks of human brains
• Random graph models are an effective tool for modeling and
analyzing real-world graphs, e.g., social networks, world wide web,
citation networks
MOTIVATION
Randomly Wired Neural Networks
1. Generate a general graph (DAG)
2. Map the general graph to neural network
operations
• Edge operations
- Data flow only
• Node operations (a minimal sketch follows this slide)
- Aggregation: combine the input data via a weighted sum with learnable,
positive weights
- Transformation: a ReLU-convolution-BN triplet
- Distribution: the same copy of the transformed data
is sent out along every output edge
3. Attach input and output nodes
4. Assemble stages
METHODOLOGY
Randomly Wired Neural Networks
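A hedged PyTorch sketch of one node's operation as described above: weighted-sum aggregation with learnable positive weights, a ReLU-convolution-BN transformation, and the same output distributed to every out-edge. The plain 3x3 convolution and the module name NodeOp are illustrative choices, not necessarily the paper's exact ones.

import torch
import torch.nn as nn


class NodeOp(nn.Module):
    def __init__(self, in_degree, channels):
        super().__init__()
        # One learnable scalar per incoming edge; sigmoid keeps the weights positive.
        self.edge_weights = nn.Parameter(torch.zeros(in_degree))
        self.transform = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, inputs):
        # inputs: list of tensors, one per in-edge, all with `channels` channels.
        w = torch.sigmoid(self.edge_weights)
        aggregated = sum(wi * x for wi, x in zip(w, inputs))  # additive aggregation
        out = self.transform(aggregated)                      # ReLU-conv-BN triplet
        return out                                            # same copy sent to every out-edge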
• Additive aggregation maintains the same number of output channels
as input channels.
• Transformed data can be combined with the data from any other
nodes.
• Fixing the channel count keeps the FLOPs and parameter count
unchanged for each node, regardless of its input and output degrees.
• The overall FLOPs and parameter count of a graph are roughly
proportional to the number of nodes and nearly independent of the
number of edges
• This enables the comparison of different graphs without
inflating/deflating model complexity. Differences in task
performance are therefore reflective of the properties of the
wiring pattern.
NICE PROPERTIES OF NODE OPERATION
Randomly Wired Neural Networks
• Extra input node: sends the same copy of the input data to
every original input node.
• Extra output node: takes the unweighted average of all original
output nodes.
ATTACHING INPUT AND OUTPUT NODES
Randomly Wired Neural Networks
• An entire network consists of multiple stages.
• One random graph represents one stage.
• For all nodes that are directly connected to the input node, their
transformations are modified to have a stride of 2.
• The channel count is doubled when going from one stage to
the next stage.
STAGES
Randomly Wired Neural Networks
RandWire Architecture
RANDOMLY WIRED NEURAL NETWORKS
• Erdős-Rényi (ER), 1959.
• ER(N, P)
• Has N nodes.
• Each pair of nodes is connected by an edge with probability P.
• The ER generation model has only a single parameter P and is denoted
ER(P).
• Any graph with N nodes has a non-zero probability of being generated by
the ER model.
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
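A small sketch, assuming networkx is available, of sampling an ER(N, P) graph and converting it into a DAG by fixing a node ordering and orienting every edge from the lower to the higher index (one simple way to realize step 1 of the methodology; the paper may use a different ordering):

import networkx as nx


def er_dag(n, p, seed=0):
    g = nx.erdos_renyi_graph(n, p, seed=seed)  # undirected ER(N, P) graph
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes)
    # Orient each edge from the smaller to the larger node index -> acyclic by construction.
    dag.add_edges_from((min(u, v), max(u, v)) for u, v in g.edges)
    return dag


graph = er_dag(32, 0.2)
assert nx.is_directed_acyclic_graph(graph)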
• Barabási-Albert (BA), 1999.
• BA(N, M)
• 1 ≤ M < N
initialize the graph G as M nodes without any edges
repeat until G has N nodes:
    add a new node v_t
    repeat until v_t has M edges:
        pick a node v in G with P(v_t and v are connected) ∝ degree(v), and connect v and v_t
• The generated graph has exactly M(N-M) edges → BA spans only a subset of all graphs with N nodes
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
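A quick check, assuming networkx, that BA(N, M) preferential attachment indeed produces exactly M(N-M) edges, as noted above; the values of N and M are only an example:

import networkx as nx

N, M = 32, 5
g = nx.barabasi_albert_graph(N, M, seed=0)
assert g.number_of_edges() == M * (N - M)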
• Watts-Strogatz(WS), 1998.
• WS(N, K, P)
• “Small World” model → high clustering, small diameter
0. Place the N nodes in a ring.
1. Connect each node to its K/2 nearest neighbors on each side.
2. Going clockwise, rewire each edge with probability P (to a uniformly chosen target node).
• Has N⋅K/2 edges (fixed) → an even smaller subset of all N-node graphs
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
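A small sketch, assuming networkx, of WS(N, K, P); connected_watts_strogatz_graph simply resamples until the graph stays connected, and the values N=32, K=4, P=0.25 are only an example:

import networkx as nx

# WS(N, K, P): ring of N nodes, each linked to its K nearest neighbours, edges rewired with prob. P.
g = nx.connected_watts_strogatz_graph(n=32, k=4, p=0.25, seed=0)
print(g.number_of_edges())                 # fixed by N and K, independent of P
print(nx.average_clustering(g))            # clustering coefficient
print(nx.average_shortest_path_length(g))  # characteristic path length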
• RandWire is a stochastic network generator 𝑔(𝜃, 𝑠).
• The random graph parameters, P, M, and (K, P) for ER, BA, and WS
respectively, are part of the parameters 𝜃.
• The “optimization” of such a 1- or 2-parameter space is essentially
done by trial and error by human designers (line/grid search).
• The accuracy variation across different seeds 𝑠 is small, so the authors
perform no random search and report the mean accuracy of
multiple random network instances.
DESIGN AND OPTIMIZATION
Randomly Wired Neural Networks
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• ImageNet Classification
• A small computation regime – MobileNet & ShuffleNet
• A regular computation regime – ResNet-50/101
• N nodes, C channels determine network complexity.
• N = 32, C = 79 for the small regime.
• N = 32, C = 109 or 154 for the regular regime.
• Random Seeds
• Randomly sample 5 network instances and train them from scratch.
• Report the classification accuracy with “mean±std” for all 5 network instances.
• Implementation Details (a minimal training-recipe sketch follows this slide)
• Train for 100 epochs
• Half-period-cosine shaped learning rate decay and initial learning rate 0.1
• The weight decay is 5e-5
• Momentum 0.9
• Label smoothing regularization with a coefficient of 0.1
ARCHITECTURE DETAILS
Experiments
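A hedged PyTorch sketch of the reported training recipe (100 epochs, half-period-cosine learning-rate decay from 0.1, weight decay 5e-5, momentum 0.9, label smoothing 0.1). The model and the loop body are placeholders; batch size, data loading and distributed details are omitted, and label_smoothing in CrossEntropyLoss assumes a reasonably recent PyTorch:

import torch
import torch.nn as nn

model = nn.Linear(3 * 224 * 224, 1000)  # placeholder for a RandWire network
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-5)
# Half-period cosine decay over the full 100 epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one pass over the ImageNet training set would go here ...
    scheduler.step()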
• All sampled networks train successfully.
• ER, BA, and WS all reach mean accuracy > 73% under suitable settings.
• The variance of accuracy is small (std: 0.2 ~ 0.4%).
• Mean accuracy differs across the random generators.
IMAGENET CLASSIFICATION
Experiments
• Node removal
• WS
→ The mean degradation of accuracy is larger when the output degree
of the removed node is higher.
→ “Hub” nodes in WS that send information to many nodes are
influential.
GRAPH DAMAGE
Experiments
• Edge removal
• If the input degree of an edge’s target node is smaller, removing that
edge tends to change a larger portion of the target node’s inputs.
• ER
→ Less sensitive to edge removal, possibly because in ER’s definition
the wiring of every edge is independent.
GRAPH DAMAGE
Experiments
• The same convolution type is used in all nodes.
• The factor C is adjusted to keep the complexity of all alternative networks
comparable.
• The Pearson correlation between any two series in the figure is 0.91 ~ 0.98.
NODE OPERATIONS
Experiments
• Small computation regime
COMPARISONS
Experiments
*250 epochs for fair comparisons
• Regular computation regime
• Uses a regularization method inspired by the edge removal analysis:
randomly remove one edge whose target node has input degree > 1,
with probability 0.1 (a small sketch follows this slide).
COMPARISONS
Experiments
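A hedged sketch of that edge-removal regularization, assuming the wiring is stored as a networkx DiGraph; applying it once per training iteration, and simply skipping the chosen edge during aggregation, are assumptions of this sketch rather than details stated on the slide:

import random
import networkx as nx


def sample_dropped_edge(dag: nx.DiGraph, rng: random.Random, p: float = 0.1):
    # With probability p, pick one edge whose target node has input degree > 1;
    # the forward pass would then ignore this edge for the current iteration.
    if rng.random() >= p:
        return None
    candidates = [(u, v) for u, v in dag.edges if dag.in_degree(v) > 1]
    return rng.choice(candidates) if candidates else None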
• Larger computation
• Increase the test image size to 320 x 320 without retraining
COMPARISONS
Experiments
• Object detection
• The features learned by randomly wired networks can also transfer.
COMPARISONS
Experiments
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• The mean accuracy of these models is competitive with hand-designed
architectures and with architectures optimized by NAS (e.g., NASNet).
• The authors hope that future work exploring new generator designs
may yield new, powerful network designs.
• Contribution
• Defines a good search space by focusing on the wiring pattern rather than on layer types.
• Shows that finding a good search space alone can already yield strong results.
• Introduces the concept of a (stochastic) network generator.
CONCLUSION
1.Neural Architecture Search(NAS)?
2.Previous Works
3.Randomly Wired Neural Networks
4.Experiments
5.Conclusion
6.Discussion
• Search Space
• Ideas for finding an even better search space
• Search Methods
• Prior knowledge: the random wiring carries no design intent or interpretation
• Studies of the properties of high-performing networks would be valuable (a general
problem of AutoML)
DISCUSSION
[1] Xie, S., Kirillov, A., Girshick, R., & He, K. (2019). Exploring Randomly
Wired Neural Networks for Image Recognition. arXiv preprint
arXiv:1904.01569.
[2] Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural Architecture
Search: A Survey. Journal of Machine Learning Research, 20(55), 1-21.
[3] Zoph, B., & Le, Q. V. (2017). Neural architecture search with
reinforcement learning. ICLR 2017.
[4] Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning
transferable architectures for scalable image recognition. In
Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 8697-8710).
[5] Lee, J. PR-155: Exploring Randomly Wired Neural Networks for
Image Recognition. https://www.youtube.com/watch?v=qnGm1h365tc
REFERENCES
ANY QUESTIONS?
Editor's Notes
  1. Deep learning contributed to automating feature engineering, but this soon turned into “architecture engineering”, i.e., manually designing network architectures. Many architectures have been developed, but doing so is time-consuming and error-prone.
  2. The space can be restricted, e.g., to DNNs with 7 layers. Encoding prior knowledge (convolutions work well, 3x3 convolutions work well, BN helps, etc.) can shrink the search space and make the search more efficient, but it can also become a human bias that blocks the discovery of novel architectures. A good example: learning a cell/block and repeating it transfers to other data, reduces the space, and still keeps performance high. RL, random picking, evolutionary methods, etc. can be used. The search space is usually exponentially large or unbounded, so it must be searched carefully (exploration-exploitation trade-off). Performance is usually estimated by plain training and validation, but recent work tries to make this step more efficient.
  3. “Connectionist” Approach
  4. Such searches have also discovered activation functions like Swish and augmentations like AutoAugment; in the end, the found architectures tend to use a single type of convolution and a few layer sizes. Previous works mostly tuned the search space at the layer level or focused on the search strategy. The authors instead ask how the wiring matters. (From here on, “NAS” refers to Neural Architecture Search with Reinforcement Learning.)
  5. If ReLU came last, the aggregation weights (being positive) would keep adding positive values and the activations would keep growing; putting BN last keeps this under control.
  6. Issue: ER tends to produce graphs that look similar on average, with low probability of any particular special structure. If P > ln(N)/N, the graph is connected (a single component) with high probability.
  7. This gives one example of how an underlying prior can be introduced by the graph generator in spite of randomness: nodes that already have many “friends” (high degree) are more likely to receive new connections.
  8. “Rewiring” is defined as uniformly choosing a random node that is not v and that is not a duplicate edge.
  9. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  10. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  11. It would have been nice to also see an experiment comparing the effect of this regularization, but it is too hard for me to reproduce.