Dan Hendrycks, Thomas Dietterich, Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, ICLR2019
https://openreview.net/forum?id=HJz6tiCqYm
2. Introduction
■ Image classification ability
  • Human vision system
    • Copes even with small changes to the query image
  • Computer vision system
    • Not robust
■ Proposal
  • Establish evaluation criteria
  • Create test datasets
    • ImageNet-C
      • Robustness to corruptions
    • ImageNet-P
      • Robustness to perturbations
  • Evaluation metrics
    • Measure performance under common corruptions and perturbations
Published as a conference paper at ICLR 2019
Figure 1: The IMAGENET-C dataset consists of 15 types of corruptions from noise, blur, weather, and digital categories; each corruption type appears at five severity levels, resulting in 75 distinct corruptions.
Figure 2: Example frames from the beginning (T = 0) to end (T = 30) of some Tilt and Brightness perturbation sequences.
IMAGENET-P Design. The second benchmark that we
propose tests the classifier’s perturbation robustness. Models
lacking in perturbation robustness produce erratic predictions
which undermines user trust. When perturbations have a high
propensity to change the model’s response, then perturbations
could also misdirect or destabilize iterative image optimization
procedures appearing in style transfer (Gatys et al., 2016), deci-
sion explanations (Fong & Vedaldi, 2017), feature visualization
(Olah et al., 2017), and so on. Like IMAGENET-C, IMAGENET-
P consists of noise, blur, weather, and digital distortions. Also
as before, the dataset has validation perturbations; has difficulty
levels; has CIFAR-10, Tiny ImageNet, ImageNet 64 × 64,
standard, and Inception-sized editions; and has been designed
for benchmarking not training networks. IMAGENET-P
departs from IMAGENET-C by having perturbation sequences
generated from each ImageNet validation image; examples are
in Figure 2. Each sequence contains more than 30 frames, so
we counteract an increase in dataset size and evaluation time
by using only 10 common perturbations.
Common Perturbations. Appearing more subtly than
the corruption from IMAGENET-C, the Gaussian noise
perturbation sequence begins with the clean ImageNet image.
The following frames in the sequence consist in the same
image but with minute Gaussian noise perturbations applied.
This sequence design is similar for the shot noise perturbation
sequence.
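The sequence construction described above can be sketched as follows. This is a minimal NumPy illustration, not the benchmark's actual generation code; the noise scale `sigma` is an assumed value for demonstration.

```python
import numpy as np

def gaussian_noise_sequence(image, n_frames=30, sigma=0.01, seed=0):
    """Yield a perturbation sequence: the clean image first, then
    frames with small, independent Gaussian noise applied to it."""
    rng = np.random.default_rng(seed)
    yield image  # T = 0: the clean ImageNet image
    for _ in range(n_frames):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        yield np.clip(noisy, 0.0, 1.0)  # keep pixel values in [0, 1]

# Example: a dummy 4x4 grayscale "image" with all pixels at 0.5
frames = list(gaussian_noise_sequence(np.full((4, 4), 0.5), n_frames=30))
print(len(frames))  # 31: the clean frame plus 30 perturbed frames
```

Each frame perturbs the same clean image independently, matching the Gaussian and shot noise sequence designs; the remaining sequence types instead perturb frame-to-frame.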
4. Robustness Benchmarks (Datasets)
■ ImageNet-C
  • Composed of 15 corruption types
  • Each corruption type has 5 severity levels
  • Corruptions occur at a range of intensities
■ ImageNet-P
  • Composed of diverse perturbation types
  • Provides perturbation sequences
Common Corruptions. The first corruption type is Gaussian noise. This corruption can appear
in low-lighting conditions. Shot noise, also called Poisson noise, is electronic noise caused by the
discrete nature of light itself. Impulse noise is a color analogue of salt-and-pepper noise and can be
caused by bit errors. Defocus blur occurs when an image is out of focus. Frosted Glass Blur appears
with “frosted glass” windows or panels. Motion blur appears when a camera is moving quickly. Zoom
blur occurs when a camera moves toward an object rapidly. Snow is a visually obstructive form of
precipitation. Frost forms when lenses or windows are coated with ice crystals. Fog shrouds objects
and is rendered with the diamond-square algorithm. Brightness varies with daylight intensity. Contrast
can be high or low depending on lighting conditions and the photographed object’s color. Elastic
transformations stretch or contract small image regions. Pixelation occurs when upsampling a low-
resolution image. JPEG is a lossy image compression format which introduces compression artifacts.
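Two of the noise corruptions above can be sketched concretely in NumPy. The severity-to-strength tables here are invented for illustration; the benchmark's actual constants differ.

```python
import numpy as np

# Illustrative severity -> strength tables (NOT the benchmark's constants).
GAUSSIAN_SIGMA = {1: 0.04, 2: 0.06, 3: 0.08, 4: 0.09, 5: 0.10}
SHOT_PHOTONS = {1: 500, 2: 250, 3: 100, 4: 75, 5: 50}

def gaussian_noise(image, severity=1, rng=None):
    """Additive Gaussian noise, as can appear in low-light conditions."""
    rng = rng or np.random.default_rng(0)
    sigma = GAUSSIAN_SIGMA[severity]
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def shot_noise(image, severity=1, rng=None):
    """Poisson (shot) noise from the discrete nature of light:
    fewer photons per pixel means a noisier image."""
    rng = rng or np.random.default_rng(0)
    lam = SHOT_PHOTONS[severity]
    return np.clip(rng.poisson(image * lam) / lam, 0.0, 1.0)

img = np.full((8, 8), 0.5)           # dummy grayscale image in [0, 1]
corrupted = shot_noise(img, severity=3)
```

Higher severity means a larger Gaussian sigma, or fewer simulated photons for shot noise, mirroring how each corruption type spans five severity levels.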
Noise
Blur
Weather
Digital
9. Experiments (Improving Robustness)
■ Feature aggregation and larger networks
  • DenseNets, ResNeXts (Xie+, CVPR2017)
  • Improved robustness to corruptions and perturbations
■ Stylized ImageNet
  • Training on ImageNet plus Stylized ImageNet (Geirhos+, ICLR2019) improves performance
  • mCE improves from 76.7% to 69.3%
       ResNet-50  DenseNet-121  DenseNet-161 | ResNet-50  ResNeXt-50  ResNeXt-101
mCE    76.7%      73.4%         66.4%        | 76.7%      68.2%       62.2%
mFR    58.0%      56.4%         46.9%        | 58.0%      52.4%       43.2%
mT5D   78.3%      76.8%         69.5%        | 78.3%      74.2%       65.9%
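The mCE values above are AlexNet-normalized: for each corruption type, the classifier's top-1 error is summed over the five severities and divided by AlexNet's summed error, and the mean over all corruption types gives mCE. A sketch of that computation, with invented error rates for two corruption types:

```python
def mean_corruption_error(errors, alexnet_errors):
    """errors / alexnet_errors: dicts mapping corruption type ->
    list of top-1 error rates at severities 1..5."""
    ces = []
    for corruption, errs in errors.items():
        ce = sum(errs) / sum(alexnet_errors[corruption])
        ces.append(ce)
    return 100.0 * sum(ces) / len(ces)  # mCE as a percentage

# Invented example numbers (two corruption types, five severities each):
model = {"gaussian_noise": [0.3, 0.4, 0.5, 0.6, 0.7],
         "motion_blur":    [0.2, 0.3, 0.4, 0.5, 0.6]}
alexnet = {"gaussian_noise": [0.5, 0.6, 0.7, 0.8, 0.9],
           "motion_blur":    [0.4, 0.5, 0.6, 0.7, 0.8]}
print(round(mean_corruption_error(model, alexnet), 1))  # 69.0
```

Normalizing by AlexNet keeps corruption types with intrinsically high error rates from dominating the average.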
ImageNet image / Stylized ImageNet
12. Backup Slides
■ Multigrid Network
  • Convolutional filters act across scale space, providing a communication mechanism between coarse and fine grids
  • Reduces the depth required to mix distant contextual cues
Figure 1. Multigrid networks. Top: Standard CNN architectures conflate scale with abstraction (depth). Filters with limited receptive field propagate information slowly across the spatial grid, necessitating the use of very deep networks to fully integrate contextual cues. Bottom: In multigrid networks, convolutional filters act across scale space (x, y, c, s), thereby providing a communication mechanism between coarse and fine grids. (Ke+, CVPR2017)
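The cross-scale communication idea can be shown with a toy NumPy sketch: each scale of a feature pyramid is mixed with its coarser and finer neighbors resampled to its own resolution. This is an illustration of the mechanism only (single channel, simple averaging, no learned convolutions), not the Ke et al. architecture.

```python
import numpy as np

def downsample(x):  # 2x average pooling (H, W must be even)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):    # 2x nearest-neighbor upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def multigrid_exchange(pyramid):
    """One cross-scale communication step: each scale is averaged with
    its finer and coarser neighbors, resampled to its own resolution."""
    out = []
    for i, x in enumerate(pyramid):
        parts = [x]
        if i > 0:                      # finer neighbor, pooled down
            parts.append(downsample(pyramid[i - 1]))
        if i < len(pyramid) - 1:       # coarser neighbor, upsampled
            parts.append(upsample(pyramid[i + 1]))
        out.append(np.mean(parts, axis=0))
    return out

# Information placed on the coarsest 4x4 grid reaches the 8x8 grid
# after a single exchange step, without any deep stack of layers.
pyramid = [np.zeros((16, 16)), np.zeros((8, 8)), np.ones((4, 4))]
pyramid = multigrid_exchange(pyramid)
```

After one step the middle scale already carries signal from the coarse grid, which is the depth-reduction effect the slide describes.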
13. Backup Slides
■ Stylized ImageNet
  • Created by applying AdaIN style transfer to ImageNet images
Figure 3: Visualisation of Stylized-ImageNet (SIN), created by applying AdaIN style transfer to
ImageNet images. Left: randomly selected ImageNet image of class ring-tailed lemur.
Right: ten examples of images with content/shape of left image and style/texture from different
paintings. After applying AdaIN style transfer, local texture cues are no longer highly predictive
of the target class, while the global shape tends to be retained. Note that within SIN, every source
image is stylized only once.
(Geirhos+, ICLR2019)
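The core AdaIN operation (Huang & Belongie, 2017) aligns the per-channel mean and standard deviation of content features to those of style features. A minimal sketch; in the full style-transfer pipeline this is applied to VGG encoder features and followed by a decoder, both omitted here.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization:
    AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y),
    with statistics computed per channel over spatial positions.
    content, style: (C, H, W) feature arrays."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sd = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sd = style.std(axis=(1, 2), keepdims=True)
    return s_sd * (content - c_mu) / (c_sd + eps) + s_mu

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, (3, 8, 8))   # stand-in content features
style = rng.normal(5.0, 2.0, (3, 8, 8))     # stand-in style features
out = adain(content, style)
```

The output keeps the content's spatial layout but adopts the style's channel statistics, which is why local texture cues in Stylized-ImageNet stop being predictive of the class while global shape survives.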