Dan Hendrycks, Thomas Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR2019
https://openreview.net/forum?id=HJz6tiCqYm
2. Introduction
■ Image classification ability
  • Human vision system
    • Copes even with small changes to the query image
  • Computer vision system
    • Not robust
■ Proposal
  • Establish evaluation criteria
  • Create test datasets
    • ImageNet-C
      • Robustness to corruptions
    • ImageNet-P
      • Robustness to perturbations
  • Evaluation metrics
    • Measure performance under common corruptions and perturbations
Published as a conference paper at ICLR 2019
Figure 1: The IMAGENET-C dataset consists of 15 types of corruptions from noise, blur, weather, and digital categories; each corruption type appears at five severity levels, resulting in 75 distinct corruptions.
Figure 2: Example frames from the beginning (T = 0) to end (T = 30) of some Tilt and Brightness perturbation sequences.
IMAGENET-P Design. The second benchmark that we
propose tests the classifier’s perturbation robustness. Models
lacking in perturbation robustness produce erratic predictions
which undermines user trust. When perturbations have a high
propensity to change the model’s response, then perturbations
could also misdirect or destabilize iterative image optimization
procedures appearing in style transfer (Gatys et al., 2016), deci-
sion explanations (Fong & Vedaldi, 2017), feature visualization
(Olah et al., 2017), and so on. Like IMAGENET-C, IMAGENET-
P consists of noise, blur, weather, and digital distortions. Also
as before, the dataset has validation perturbations; has difficulty
levels; has CIFAR-10, Tiny ImageNet, ImageNet 64 × 64,
standard, and Inception-sized editions; and has been designed
for benchmarking not training networks. IMAGENET-P
departs from IMAGENET-C by having perturbation sequences
generated from each ImageNet validation image; examples are
in Figure 2. Each sequence contains more than 30 frames, so
we counteract an increase in dataset size and evaluation time
by using only 10 common perturbations.
Common Perturbations. Appearing more subtly than
the corruption from IMAGENET-C, the Gaussian noise
perturbation sequence begins with the clean ImageNet image.
The following frames in the sequence consist in the same
image but with minute Gaussian noise perturbations applied.
This sequence design is similar for the shot noise perturbation
sequence.
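The sequence construction described above can be sketched as follows. This is a minimal NumPy illustration, not the benchmark's actual generation code; the noise scale `sigma` is an assumed value for demonstration.

```python
import numpy as np

def gaussian_noise_sequence(image, n_frames=30, sigma=0.01, seed=0):
    """Yield a perturbation sequence: the clean image first, then
    frames with small, independent Gaussian noise applied to it."""
    rng = np.random.default_rng(seed)
    yield image  # T = 0: the clean ImageNet image
    for _ in range(n_frames):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        yield np.clip(noisy, 0.0, 1.0)  # keep pixel values in [0, 1]

# Example: a dummy 4x4 grayscale "image" with all pixels at 0.5
frames = list(gaussian_noise_sequence(np.full((4, 4), 0.5), n_frames=30))
print(len(frames))  # 31: the clean frame plus 30 perturbed frames
```

Each frame perturbs the same clean image independently, matching the Gaussian and shot noise sequence designs; the remaining sequence types instead perturb frame-to-frame.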
4. Robustness Benchmarks (Datasets)
■ ImageNet-C
  • Composed of 15 corruption types
  • Each corruption type has 5 severity levels
  • Corruptions occur at a range of intensities
■ ImageNet-P
  • Composed of diverse perturbation types
  • Provides perturbation sequences
Common Corruptions. The first corruption type is Gaussian noise. This corruption can appear
in low-lighting conditions. Shot noise, also called Poisson noise, is electronic noise caused by the
discrete nature of light itself. Impulse noise is a color analogue of salt-and-pepper noise and can be
caused by bit errors. Defocus blur occurs when an image is out of focus. Frosted Glass Blur appears
with “frosted glass” windows or panels. Motion blur appears when a camera is moving quickly. Zoom
blur occurs when a camera moves toward an object rapidly. Snow is a visually obstructive form of
precipitation. Frost forms when lenses or windows are coated with ice crystals. Fog shrouds objects
and is rendered with the diamond-square algorithm. Brightness varies with daylight intensity. Contrast
can be high or low depending on lighting conditions and the photographed object’s color. Elastic
transformations stretch or contract small image regions. Pixelation occurs when upsampling a low-
resolution image. JPEG is a lossy image compression format which introduces compression artifacts.
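Two of the noise corruptions above can be sketched concretely in NumPy. The severity-to-strength tables here are invented for illustration; the benchmark's actual constants differ.

```python
import numpy as np

# Illustrative severity -> strength tables (NOT the benchmark's constants).
GAUSSIAN_SIGMA = {1: 0.04, 2: 0.06, 3: 0.08, 4: 0.09, 5: 0.10}
SHOT_PHOTONS = {1: 500, 2: 250, 3: 100, 4: 75, 5: 50}

def gaussian_noise(image, severity=1, rng=None):
    """Additive Gaussian noise, as can appear in low-light conditions."""
    rng = rng or np.random.default_rng(0)
    sigma = GAUSSIAN_SIGMA[severity]
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def shot_noise(image, severity=1, rng=None):
    """Poisson (shot) noise from the discrete nature of light:
    fewer photons per pixel means a noisier image."""
    rng = rng or np.random.default_rng(0)
    lam = SHOT_PHOTONS[severity]
    return np.clip(rng.poisson(image * lam) / lam, 0.0, 1.0)

img = np.full((8, 8), 0.5)           # dummy grayscale image in [0, 1]
corrupted = shot_noise(img, severity=3)
```

Higher severity means a larger Gaussian sigma, or fewer simulated photons for shot noise, mirroring how each corruption type spans five severity levels.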
Noise
Blur
Weather
Digital
9. Experiments (Improving Robustness)
■ Feature aggregation and larger networks
  • DenseNets, ResNeXts (Xie+, CVPR2017)
  • Improved robustness to corruptions and perturbations
■ Stylized ImageNet
  • Training on ImageNet plus Stylized ImageNet (Geirhos+, ICLR2019) improves performance
  • mCE improves from 76.7% to 69.3%
       ResNet-50  DenseNet-121  DenseNet-161 | ResNet-50  ResNeXt-50  ResNeXt-101
mCE    76.7%      73.4%         66.4%        | 76.7%      68.2%       62.2%
mFR    58.0%      56.4%         46.9%        | 58.0%      52.4%       43.2%
mT5D   78.3%      76.8%         69.5%        | 78.3%      74.2%       65.9%
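The mCE values above are AlexNet-normalized: for each corruption type, the classifier's top-1 error is summed over the five severities and divided by AlexNet's summed error, and the mean over all corruption types gives mCE. A sketch of that computation, with invented error rates for two corruption types:

```python
def mean_corruption_error(errors, alexnet_errors):
    """errors / alexnet_errors: dicts mapping corruption type ->
    list of top-1 error rates at severities 1..5."""
    ces = []
    for corruption, errs in errors.items():
        ce = sum(errs) / sum(alexnet_errors[corruption])
        ces.append(ce)
    return 100.0 * sum(ces) / len(ces)  # mCE as a percentage

# Invented example numbers (two corruption types, five severities each):
model = {"gaussian_noise": [0.3, 0.4, 0.5, 0.6, 0.7],
         "motion_blur":    [0.2, 0.3, 0.4, 0.5, 0.6]}
alexnet = {"gaussian_noise": [0.5, 0.6, 0.7, 0.8, 0.9],
           "motion_blur":    [0.4, 0.5, 0.6, 0.7, 0.8]}
print(round(mean_corruption_error(model, alexnet), 1))  # 69.0
```

Normalizing by AlexNet keeps corruption types with intrinsically high error rates from dominating the average.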
ImageNet image / Stylized ImageNet
12. Backup Slides
■ Multigrid Network
  • Convolutional filters act across scale space, providing a communication mechanism between coarse and fine grids
  • Reduces the depth required to mix distant contextual cues
Figure 1. Multigrid networks. Top: Standard CNN architectures conflate scale with abstraction (depth). Filters with limited receptive field propagate information slowly across the spatial grid, necessitating the use of very deep networks to fully integrate contextual cues. Bottom: In multigrid networks, convolutional filters act across scale space (x, y, c, s), thereby providing a communication mechanism between coarse and fine grids. (Ke+, CVPR2017)
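The cross-scale communication idea can be shown with a toy NumPy sketch: each scale of a feature pyramid is mixed with its coarser and finer neighbors resampled to its own resolution. This is an illustration of the mechanism only (single channel, simple averaging, no learned convolutions), not the Ke et al. architecture.

```python
import numpy as np

def downsample(x):  # 2x average pooling (H, W must be even)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):    # 2x nearest-neighbor upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def multigrid_exchange(pyramid):
    """One cross-scale communication step: each scale is averaged with
    its finer and coarser neighbors, resampled to its own resolution."""
    out = []
    for i, x in enumerate(pyramid):
        parts = [x]
        if i > 0:                      # finer neighbor, pooled down
            parts.append(downsample(pyramid[i - 1]))
        if i < len(pyramid) - 1:       # coarser neighbor, upsampled
            parts.append(upsample(pyramid[i + 1]))
        out.append(np.mean(parts, axis=0))
    return out

# Information placed on the coarsest 4x4 grid reaches the 8x8 grid
# after a single exchange step, without any deep stack of layers.
pyramid = [np.zeros((16, 16)), np.zeros((8, 8)), np.ones((4, 4))]
pyramid = multigrid_exchange(pyramid)
```

After one step the middle scale already carries signal from the coarse grid, which is the depth-reduction effect the slide describes.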
13. Backup Slides
■ Stylized ImageNet
  • Created by applying AdaIN style transfer to ImageNet images
Figure 3: Visualisation of Stylized-ImageNet (SIN), created by applying AdaIN style transfer to
ImageNet images. Left: randomly selected ImageNet image of class ring-tailed lemur.
Right: ten examples of images with content/shape of left image and style/texture from different
paintings. After applying AdaIN style transfer, local texture cues are no longer highly predictive
of the target class, while the global shape tends to be retained. Note that within SIN, every source
image is stylized only once.
(Geirhos+, ICLR2019)
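The core AdaIN operation (Huang & Belongie, 2017) aligns the per-channel mean and standard deviation of content features to those of style features. A minimal sketch; in the full style-transfer pipeline this is applied to VGG encoder features and followed by a decoder, both omitted here.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization:
    AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y),
    with statistics computed per channel over spatial positions.
    content, style: (C, H, W) feature arrays."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sd = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sd = style.std(axis=(1, 2), keepdims=True)
    return s_sd * (content - c_mu) / (c_sd + eps) + s_mu

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, (3, 8, 8))   # stand-in content features
style = rng.normal(5.0, 2.0, (3, 8, 8))     # stand-in style features
out = adain(content, style)
```

The output keeps the content's spatial layout but adopts the style's channel statistics, which is why local texture cues in Stylized-ImageNet stop being predictive of the class while global shape survives.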