[MIRU2018] Global Average Poolingの特性を用いたAttention Branch Network
(Attention Branch Network exploiting the characteristics of Global Average Pooling)
Hiroshi Fukui
[Figure: "Revolution of Depth" layer diagrams. Left: AlexNet-style network (conv 96, /4, pool/2; conv 256, pool/2; conv 384; conv 384; conv 256, pool/2; fc 4096; fc 4096; fc 1000). Right: VGG, 19 layers (ILSVRC 2014) — stacked 3x3 convolutions (64, 64; 128, 128; 256 x4; 512 x4; 512 x4, with pooling between stages) followed by fc 4096, fc 4096, fc 1000.]
[Figure: GoogLeNet, 22 layers (ILSVRC 2014) — stem (7x7+2 conv, 3x3+2 max pool, LocalRespNorm, 1x1 and 3x3 convs), stacked Inception modules (parallel 1x1 / 3x3 / 5x5 convolutions and max pooling joined by DepthConcat), two auxiliary classifiers (AveragePool 5x5+3, 1x1 conv, FC, FC, softmax0 / softmax1), and a final head (AveragePool 7x7+1, FC, softmax2).]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv 2015.
[Figure: ResNet layer diagram — a 7x7 conv, 64, /2 stem with pooling, followed by stages of bottleneck blocks (1x1 conv reduce, 3x3 conv, 1x1 conv expand: 64/64/256, then 128/128/512, 256/256/1024, and 512/512/2048, each stage beginning with a /2 downsampling block), ending in average pooling and fc 1000.]
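The bottleneck pattern listed in the diagram is compact enough to sketch directly. Below is a minimal PyTorch sketch of one such block (1x1 reduce, 3x3, 1x1 expand, plus the identity shortcut), using the 64/64/256 sizes of the first stage; names and BatchNorm/ReLU placement are illustrative, not taken from the slides.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """One ResNet bottleneck block: 1x1 reduce -> 3x3 -> 1x1 expand + shortcut."""
    def __init__(self, in_ch=256, mid_ch=64, out_ch=256):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)  # residual (shortcut) connection

x = torch.randn(1, 256, 56, 56)
print(Bottleneck()(x).shape)  # torch.Size([1, 256, 56, 56])
```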
Excerpt from "Learning of Occlusion-Aware Attention for Pedestrian Detection":

…outputting the classification scores using global average pooling or global max pooling from the feature map f(·). However, global average pooling raises the response value of the entire feature map for a specific class, because it averages all pixels of a feature map. Global max pooling, on the other hand, does not raise the entire feature map for a specific class, because it uses only the maximum pixel value in a feature map. The response score for each class under global average pooling and global max pooling is calculated as in Eq. (1):

$$
v_i^c =
\begin{cases}
\dfrac{1}{M \times N} \displaystyle\sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^{c}(x_i) & \text{(global average pooling)},\\[2ex]
\max_{m,n} f_{m,n}^{c}(x_i) & \text{(global max pooling)}.
\end{cases}
\tag{1}
$$
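As a concrete reading of Eq. (1), both pooling variants reduce each M×N channel of the class-wise feature map to a single score per class. A minimal PyTorch sketch, with a random feature map standing in for f(x_i):

```python
import torch

f = torch.randn(1, 3, 7, 7)  # (batch, C classes, M, N); random stand-in for f(x_i)

v_gap = f.mean(dim=(2, 3))   # (1/(M*N)) * sum_{m,n} f^c_{m,n}(x_i)  -> shape (1, 3)
v_gmp = f.amax(dim=(2, 3))   # max_{m,n} f^c_{m,n}(x_i)              -> shape (1, 3)

print(v_gap.shape, v_gmp.shape)
```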
After outputting the score for each class, the attention maps for the pedestrian and occlusion regions are generated. First, we fuse the multi-channel feature map into a single channel. In this work, we evaluate the three fusion types shown in Fig. 1(b)–(d): 1) standard fusion, 2) softmax-weighting fusion, and 3) squeeze-and-excitation (SE) block fusion. Standard fusion is simply the summation of the feature maps. In softmax weighting, the feature map of each channel is weighted by its softmax score, as in Eq. (2); this weighting can mask unnecessary channels of the feature map. In SE block fusion, the feature map of each channel is weighted by the attention of an SE block, as in the Squeeze-and-Excitation Network. After fusing to a single channel, the pedestrian-classification and occlusion-state attentions are combined: we calculate the attention by subtracting the occlusion attention from the pedestrian-classification attention. We call the result the attention map, since it contains both positive and negative values.

$$
\mathrm{Attention}_i = \sum_{c=1}^{C} f^{c}(x_i) \cdot \frac{\exp(v_i^{c})}{\sum_{j=1}^{J} \exp(v_i^{j})}
\tag{2}
$$
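A short sketch of how Eq. (2) and the subtraction step fit together; the tensor shapes and the two-branch setup are assumptions for illustration, not the paper's exact implementation:

```python
import torch

def softmax_weighted_fusion(f, v):
    # f: (B, C, M, N) feature maps; v: (B, C) per-class response scores
    w = torch.softmax(v, dim=1)                  # exp(v^c) / sum_j exp(v^j)
    return (f * w[:, :, None, None]).sum(dim=1)  # fuse C channels -> (B, M, N)

f_ped, v_ped = torch.randn(1, 2, 7, 7), torch.randn(1, 2)  # pedestrian branch
f_occ, v_occ = torch.randn(1, 2, 7, 7), torch.randn(1, 2)  # occlusion branch

# subtract occlusion attention from pedestrian-classification attention;
# the result can contain both positive and negative values
attention_map = (softmax_weighted_fusion(f_ped, v_ped)
                 - softmax_weighted_fusion(f_occ, v_occ))
```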
3.4 Perception branch

The perception branch outputs the final result score using the attention map and the feature map from RoI pooling. The attention map can refine the RoI-pooled feature map, masking unnecessary background features and enhancing the important locations. The converted feature map is made by taking the inner product of the attention map and the feature map from RoI pooling. The perception branch is composed of two fully connected layers, like Fast R-CNN. The structure of the perception branch is the same as in conventional Fast R-CNN; however, our model …
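A minimal sketch of the refinement step described above, assuming the single-channel attention map is broadcast over the channels of the RoI-pooled feature map:

```python
import torch

roi_feat = torch.randn(1, 512, 7, 7)  # feature map from RoI pooling (shape assumed)
attention_map = torch.randn(1, 7, 7)  # attention map from the attention branch

# element-wise product, broadcast over channels: masks background,
# enhances important locations
refined = roi_feat * attention_map[:, None, :, :]  # (1, 512, 7, 7)
```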
[Slide: screenshot of an anonymous CVPR draft (Paper ID ****; Abstract, 1. Introduction, 2. Conclusion) showing the working equations: the binary cross-entropy loss $t \log y + (1 - t)\log(1 - y)$, the GAP response score $v_i^c = \frac{1}{M \times N} \sum_{m=1}^{M} \sum_{n=1}^{N} f_{m,n}^{c}(x_i)$, the per-class scores $v_i^1, v_i^2, v_i^3, v_i^c$, and the feature maps $f(x_i)$ and $f(x_i, y_i)$.]
[Slide: screenshot of a second anonymous CVPR draft, "How Small Network Can Detect Ped…", showing the same working equations together with the feature-map dimensions $M, N$ and the channel count $C$.]
Table 1. Classification error on the ILSVRC validation set.

    Networks          top-1 val. error    top-5 val. error
    VGGnet-GAP        33.4                12.2
    GoogLeNet-GAP     35.0                13.2
    AlexNet*-GAP      44.9                20.9
    AlexNet-GAP       51.1                26.3
    GoogLeNet         31.9                11.3
    VGGnet            31.2                11.4
    AlexNet           42.6                19.5
    NIN               41.9                19.6
    GoogLeNet-GMP     35.6                13.9

Table 2. Localization error on the ILSVRC validation set. Backprop refers to using [23] for localization instead of CAM.

    Method                   top-1 val. error    top-5 val. error
    GoogLeNet-GAP            56.40               43.00
    VGGnet-GAP               57.20               45.14
    GoogLeNet                60.09               49.34
    AlexNet*-GAP             63.75               49.53
    AlexNet-GAP              67.19               52.16
    NIN                      65.47               54.19
    Backprop on GoogLeNet    61.31               50.55
$L_{\mathrm{all}}(x) = E_{\mathrm{att}}(x) + E_{\mathrm{per}}(x)$, where $E_{\mathrm{att}}(x)$ is the loss of the attention branch and $E_{\mathrm{per}}(x)$ the loss of the perception branch.
$g'(x) = (1 + M(x)) \cdot g(x)$, where $g(x)$ is the feature map, $M(x)$ the attention map, and $g'(x)$ the attention-refined feature map.
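A minimal sketch of this mechanism: the $(1 + M(x))$ form lets regions with $M(x) \approx 0$ pass through unchanged instead of being zeroed out. Shapes, the sigmoid on M, and the placeholder losses are assumptions for illustration:

```python
import torch

def apply_attention(g, M):
    # g: (B, C, H, W) feature map; M: (B, 1, H, W) attention map in [0, 1]
    return (1.0 + M) * g  # g'(x) = (1 + M(x)) . g(x)

g = torch.randn(2, 256, 14, 14)
M = torch.sigmoid(torch.randn(2, 1, 14, 14))
g_prime = apply_attention(g, M)

E_att = torch.tensor(0.7)  # placeholder attention-branch loss
E_per = torch.tensor(0.5)  # placeholder perception-branch loss
L_all = E_att + E_per      # L_all(x) = E_att(x) + E_per(x)
```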
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie¹, Ross Girshick², Piotr Dollár², Zhuowen Tu¹, Kaiming He²
¹UC San Diego  ²Facebook AI Research
{s9xie,ztu}@ucsd.edu  {rbg,pdollar,kaiminghe}@fb.com
Abstract

We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call "cardinality" (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online¹.
1. Introduction
[Figure 1. Left: A block of ResNet [14] (256-d in; 1x1, 64 → 3x3, 64 → 1x1, 256; shortcut sum; 256-d out). Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity (32 parallel paths of 256, 1x1, 4 → 4, 3x3, 4 → 4, 1x1, 256, summed together with the shortcut). A layer is shown as (# in channels, filter size, # out channels).]
…ing blocks of the same shape. This strategy is inherited by ResNets [14] which stack modules of the same topology. This simple rule reduces the free choices of hyper-parameters, and depth is exposed as an essential dimension in neural networks. Moreover, we argue that the simplicity of this rule may reduce the risk of over-adapting the hyper-parameters to a specific dataset. The robustness of VGG-nets and ResNets has been proven by various visual recognition tasks [7, 10, 9, 28, 31, 14] and by non-visual tasks involving speech [42, 30] and language [4, 41, 20]. Unlike VGG-nets, the family of Inception models [38, 17, 39, 37] have demonstrated that carefully designed…
arXiv:1611.05431v2 [cs.CV] 11 Apr 2017
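The ResNeXt block of Figure 1 can be written compactly with a grouped convolution (groups = cardinality), which the paper presents as an equivalent form of the 32 parallel paths. A minimal PyTorch sketch, with BatchNorm omitted for brevity:

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """32 paths of 1x1(256->4) -> 3x3(4->4) -> 1x1(4->256), as one grouped conv."""
    def __init__(self, channels=256, cardinality=32, width_per_group=4):
        super().__init__()
        mid = cardinality * width_per_group  # 32 * 4 = 128
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),  # 32 parallel 4->4 paths
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # aggregated transforms + shortcut

x = torch.randn(1, 256, 14, 14)
print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 14, 14])
```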
Densely Connected Convolutional Networks
Gao Huang* (Cornell University, gh349@cornell.edu), Zhuang Liu* (Tsinghua University, liuzhuang13@mails.tsinghua.edu.cn), Laurens van der Maaten (Facebook AI Research, lvdmaaten@fb.com), Kilian Q. Weinberger (Cornell University, kqw4@cornell.edu)
Abstract

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
1. Introduction

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway

*Authors contributed equally
[Figure 1: A 5-layer dense block with a growth rate of k = 4 (input x0; layers H1–H4 producing feature-maps x1–x4). Each layer takes all preceding feature-maps as input.]
Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and "wash out" by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different number of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

arXiv:1608.06993v5 [cs.CV] 28 Jan 2018
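A minimal PyTorch sketch of the dense block in Figure 1, where each layer receives the concatenation of all preceding feature-maps and contributes k = 4 new ones; the composite function H_l is simplified to a single convolution here:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block: layer l takes all preceding feature-maps, emits growth_rate new ones."""
    def __init__(self, in_ch, num_layers=4, growth_rate=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth_rate, growth_rate,
                      kernel_size=3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # each H_l sees the concatenation of all preceding feature-maps
            out = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(1, 8, 32, 32)
print(DenseBlock(8)(x).shape)  # torch.Size([1, 24, 32, 32]): 8 + 4 layers * k=4
```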
[Figure: attention gating diagram — g′(s_t) is produced from f(s_t) and g(s_t) via tanh, element-wise product (×), and summation (Σ).]