Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks, Thomas Dietterich, ICLR2019
大谷碧生(名工大玉木研)
Paper introduction, 2021/10/15
Introduction
■ Image classification ability
  • Human vision system
    • Handles even small changes in the query image
  • Computer vision system
    • Not robust
■ Proposal
  • Establish evaluation criteria
  • Build test datasets
    • ImageNet-C
      • Robustness to corruptions
    • ImageNet-P
      • Robustness to perturbations
  • Evaluation metrics
    • Measure performance under common corruptions and perturbations
[Figure 1 (from the paper): the IMAGENET-C dataset consists of 15 types of corruptions from noise, blur, weather, and digital categories; each corruption type has five severity levels, resulting in 75 distinct corruptions.]
ImageNet-C / ImageNet-P
Related Work
■ Vulnerability of convolutional networks
  • Applying impulse noise shows that the Cloud Vision API is not robust (Hosseini+, arXiv2017)
  • Human vision outperforms fine-tuned convolutional networks (Dodge&Karam, arXiv2017)
■ Improving robustness
  • Fine-tuning to reduce classifier vulnerability
    • Fine-tuning on a single type of blur turns out to be insufficient (Vasiljevic+, arXiv2016)
    • Fine-tuning on noisy images causes underfitting (Zheng+, CVPR2016)
Robustness Benchmarks (Datasets)
■ ImageNet-C
  • Consists of 15 corruption types
  • Each corruption type has 5 severity levels
    • Corruptions occur at a range of intensities
■ ImageNet-P
  • Consists of a variety of distortion types
  • Provides perturbation sequences
Common Corruptions. The first corruption type is Gaussian noise. This corruption can appear in low-lighting conditions. Shot noise, also called Poisson noise, is electronic noise caused by the discrete nature of light itself. Impulse noise is a color analogue of salt-and-pepper noise and can be caused by bit errors. Defocus blur occurs when an image is out of focus. Frosted Glass Blur appears with "frosted glass" windows or panels. Motion blur appears when a camera is moving quickly. Zoom blur occurs when a camera moves toward an object rapidly. Snow is a visually obstructive form of precipitation. Frost forms when lenses or windows are coated with ice crystals. Fog shrouds objects and is rendered with the diamond-square algorithm. Brightness varies with daylight intensity. Contrast can be high or low depending on lighting conditions and the photographed object's color. Elastic transformations stretch or contract small image regions. Pixelation occurs when upsampling a low-resolution image. JPEG is a lossy image compression format which introduces compression artifacts.
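As a concrete illustration of how one of these corruptions could scale with severity, here is a minimal sketch of Gaussian noise at five levels; the per-severity standard deviations below are made-up values, not the benchmark's official constants:

```python
import numpy as np

# Hypothetical noise scales for severities 1-5 (ImageNet-C's constants differ).
GAUSSIAN_SIGMA = [0.04, 0.06, 0.08, 0.10, 0.12]

def gaussian_noise(image, severity=1, seed=0):
    """Add zero-mean Gaussian noise to an image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=np.float32)
    x = x + rng.normal(0.0, GAUSSIAN_SIGMA[severity - 1], size=x.shape)
    return np.clip(x, 0.0, 1.0)
```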
[Figure 2: Example frames from the beginning (T = 0) to end (T = 30) of some Tilt and Brightness perturbation sequences.]

IMAGENET-P Design. The second benchmark that we propose tests the classifier's perturbation robustness. Models lacking in perturbation robustness produce erratic predictions which undermines user trust. When perturbations have a high propensity to change the model's response, then perturbations could also misdirect or destabilize iterative image optimization procedures appearing in style transfer (Gatys et al., 2016), decision explanations (Fong & Vedaldi, 2017), feature visualization (Olah et al., 2017), and so on. Like IMAGENET-C, IMAGENET-P consists of noise, blur, weather, and digital distortions. Also as before, the dataset has validation perturbations; has difficulty levels; has CIFAR-10, Tiny ImageNet, ImageNet 64 × 64, standard, and Inception-sized editions; and has been designed for benchmarking not training networks. IMAGENET-P departs from IMAGENET-C by having perturbation sequences generated from each ImageNet validation image; examples are in Figure 2. Each sequence contains more than 30 frames, so we counteract an increase in dataset size and evaluation time by using only 10 common perturbations.
Common Perturbations. Appearing more subtly than the corruption from IMAGENET-C, the Gaussian noise perturbation sequence begins with the clean ImageNet image. The following frames in the sequence consist in the same image but with minute Gaussian noise perturbations applied. This sequence design is similar for the shot noise perturbation sequence.
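A minimal sketch of how such a noise perturbation sequence could be produced; the frame count and noise scale here are illustrative assumptions, not the benchmark's settings. Each frame re-perturbs the same clean image with small, independent Gaussian noise:

```python
import numpy as np

def gaussian_noise_sequence(clean, n_frames=31, sigma=0.02, seed=0):
    """Yield the clean image, then frames with independent, minute noise.
    clean: float array with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    yield clean
    for _ in range(n_frames - 1):
        yield np.clip(clean + rng.normal(0.0, sigma, size=clean.shape), 0.0, 1.0)
```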
Noise / Blur / Weather / Digital (the four corruption categories)
15 Corruption Types at 5 Severity Levels
■ Gaussian Noise: appears in low-light conditions
■ Shot Noise: caused by the discrete nature of light itself
■ Impulse Noise: caused by bit errors
■ Defocus Blur: when the image is out of focus
■ Frosted Glass Blur: caused by frosted-glass windows or panels
■ Motion Blur: when the camera moves quickly
■ Zoom Blur: when the camera moves rapidly toward an object
■ Snow: a visually obstructive form of precipitation
■ Frost: ice crystals forming on lenses or windows
■ Fog: fog shrouding the scene
■ Brightness: varies with daylight intensity
■ Contrast: varies with the photographed object's color
■ Elastic: stretches or contracts small image regions
■ Pixelate: from upsampling a low-resolution image
■ JPEG: a lossy image compression format

[Paper appendix A, "Example of ImageNet-C severities": Impulse Noise applied at the 5 severity levels]
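Since the appendix example uses impulse noise, here is a minimal sketch of salt-and-pepper style impulse noise at increasing severities; the per-severity pixel fractions are illustrative assumptions, not the benchmark's constants:

```python
import numpy as np

# Hypothetical fraction of corrupted pixels per severity level.
IMPULSE_P = [0.03, 0.06, 0.09, 0.17, 0.27]

def impulse_noise(image, severity=1, seed=0):
    """Salt-and-pepper style impulse noise on an (H, W, 3) image in [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.array(image, dtype=np.float32)
    mask = rng.random(x.shape[:2]) < IMPULSE_P[severity - 1]
    salt = rng.integers(0, 2, size=(int(mask.sum()), 1)).astype(np.float32)
    x[mask] = salt  # broadcast 0/1 across the three channels
    return x
```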
Robustness Benchmarks (Evaluation Metrics)
■ Metrics for ImageNet-C
  • Corruption Error (CE)
    • mCE: the mean of the 15 Corruption Errors
  • Relative Corruption Error
■ Metrics for ImageNet-P
  • Flip rate
  • mT5D (Orhan, 2019): the mean top-5 distance
$$\mathrm{CE}_c^f = \frac{\sum_{s=1}^{5} E_{s,c}^f}{\sum_{s=1}^{5} E_{s,c}^{\mathrm{AlexNet}}}, \qquad \mathrm{mCE} = \frac{1}{15} \sum_{c} \mathrm{CE}_c^f$$

$$\text{Relative } \mathrm{CE}_c^f = \frac{\sum_{s=1}^{5} \left( E_{s,c}^f - E_{\mathrm{clean}}^f \right)}{\sum_{s=1}^{5} \left( E_{s,c}^{\mathrm{AlexNet}} - E_{\mathrm{clean}}^{\mathrm{AlexNet}} \right)}$$

$$\mathrm{FP}_p^f = \frac{1}{m(n-1)} \sum_{i=1}^{m} \sum_{j=2}^{n} \mathbb{1}\!\left( f\big(x_j^{(i)}\big) \neq f\big(x_{j-1}^{(i)}\big) \right) = \mathbb{P}_{x \sim \mathcal{S}}\big( f(x_j) \neq f(x_{j-1}) \big)$$

$$\mathrm{FR}_p^f = \frac{\mathrm{FP}_p^f}{\mathrm{FP}_p^{\mathrm{AlexNet}}}$$
f: classifier
s: severity level (5 levels)
c: corruption type
S: perturbation sequence (m sequences of n frames each)
p: perturbation type
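To make the metrics concrete, here is a minimal sketch (not the authors' official code; the error values, clean-error constants, and array shapes are hypothetical) that computes CE/mCE, Relative CE, and FP/FR from precomputed top-1 error rates:

```python
import numpy as np

def corruption_error(err, err_alex):
    """CE_c^f: errors summed over the 5 severities, normalized by AlexNet.
    err, err_alex: arrays of shape (15 corruptions, 5 severities)."""
    return err.sum(axis=1) / err_alex.sum(axis=1)

def relative_corruption_error(err, clean, err_alex, clean_alex):
    """Relative CE_c^f: degradation relative to each model's clean error."""
    return (err - clean).sum(axis=1) / (err_alex - clean_alex).sum(axis=1)

def flip_probability(preds):
    """FP_p^f: fraction of consecutive-frame prediction flips.
    preds: int array of shape (m, n) holding f(x_j^(i))."""
    return float((preds[:, 1:] != preds[:, :-1]).mean())

# Hypothetical numbers for illustration only (not measured values).
rng = np.random.default_rng(0)
E_f = rng.uniform(0.4, 0.9, size=(15, 5))       # E_{s,c}^f
E_alex = rng.uniform(0.6, 1.0, size=(15, 5))    # E_{s,c}^AlexNet
print("mCE:", corruption_error(E_f, E_alex).mean())
print("relative mCE:",
      relative_corruption_error(E_f, 0.24, E_alex, 0.43).mean())

preds = rng.integers(0, 1000, size=(100, 31))   # m=100 sequences, n=31 frames
FP_alex = 0.05                                  # hypothetical AlexNet flip prob.
print("FR:", flip_probability(preds) / FP_alex)
```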
Experiments (Architecture Robustness)
■ Evaluation (corruption)
  • mCE decreases as architectures improve
  • VGGNet vs. ResNet
    • VGGNet fares worse
  • Using batch normalization (BN)
    • Improves robustness
■ Evaluation (perturbation)
  • VGGNet vs. ResNet
    • Comparable performance
  • Using batch normalization
    • Reduces robustness
[Figure: comparison with and without BN]
Experiments (Improving Robustness)
■ Histogram equalization
  • Fine-tune ResNet-50 on images preprocessed with Contrast Limited Adaptive Histogram Equalization (CLAHE) (Pizer+, 1987)
  • mCE improves (see the table below)
■ Multiscale architectures
  • Propagate features across scales at each layer
  • Multigrid Network (Ke+, CVPR2017)
    • Each layer holds a pyramid of grids, and subsequent layers operate across scales
  • MSDNets (Multi-Scale Dense Networks) (Huang+, ICLR2018)
    • Uses information across scales, like the Multigrid Network
  • mCE improves
       ResNet-50   ResNet-50 + CLAHE   ResNet-50   Multigrid network   MSDNet
mCE    76.7%       74.5%               76.7%       73.3%               73.6%
Experiments (Improving Robustness)
■ Feature aggregation and larger networks
  • DenseNets, ResNeXts (Xie+, CVPR2017)
  • Improve robustness to corruptions and perturbations
■ Stylized ImageNet
  • Training on ImageNet plus Stylized ImageNet (Geirhos+, ICLR2019) improves performance
  • mCE improves from 76.7% to 69.3%
        ResNet-50   DenseNet-121   DenseNet-161   ResNet-50   ResNeXt-50   ResNeXt-101
mCE     76.7%       73.4%          66.4%          76.7%       68.2%        62.2%
mFR     58.0%       56.4%          46.9%          58.0%       52.4%        43.2%
mT5D    78.3%       76.8%          69.5%          78.3%       74.2%        65.9%
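A minimal sketch of how one might obtain per-severity error rates E_{s,c}^f behind a table like this, assuming an ImageNet-C-style directory layout (corruption/severity/class/...); the paths and batch settings are hypothetical, while the torchvision calls are the library's current pretrained-weights API:

```python
import torch
from torch.utils.data import DataLoader
import torchvision as tv

weights = tv.models.ResNet50_Weights.IMAGENET1K_V1
model = tv.models.resnet50(weights=weights).eval()

@torch.no_grad()
def top1_error(root):
    """Top-1 error of the pretrained model over one corruption/severity folder."""
    ds = tv.datasets.ImageFolder(root, transform=weights.transforms())
    wrong, total = 0, 0
    for x, y in DataLoader(ds, batch_size=64, num_workers=4):
        wrong += (model(x).argmax(dim=1) != y).sum().item()
        total += y.numel()
    return wrong / total

# E_{s,c}^f for one corruption at each severity (hypothetical directory layout).
errors = [top1_error(f"ImageNet-C/gaussian_noise/{s}") for s in range(1, 6)]
print(errors)
```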
[Figure 3 (Geirhos+, ICLR2019): visualisation of Stylized-ImageNet; the full caption is reproduced on the last backup slide.]
ImageNet images / Stylized ImageNet
Summary
■ Built a comprehensive benchmark for robustness to corruptions and perturbations
  • Test datasets
    • ImageNet-C
    • ImageNet-P
  • Evaluation criteria
■ Confirmed robustness improvements against corruptions and perturbations from
  • Histogram equalization
  • Multiscale architectures
  • Feature aggregation and larger networks
  • Stylized ImageNet
Backup Slides
■ Contrast Limited Adaptive Histogram Equalization (CLAHE)
  • Stretches the histogram flat toward both ends
■ Bright image
  • Pixel values concentrated in the high range
■ Good image
  • Pixel values spread evenly across the full range
(The MathWorks)
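A minimal sketch of CLAHE preprocessing with OpenCV; the clip limit and tile size are typical defaults, not necessarily the values used in the paper's experiments, and the file paths are hypothetical:

```python
import cv2

def clahe_rgb(bgr, clip_limit=2.0, tile=(8, 8)):
    """Apply CLAHE to the luminance channel of a BGR image."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

img = cv2.imread("input.jpg")  # hypothetical path
cv2.imwrite("input_clahe.jpg", clahe_rgb(img))
```

Equalizing only the luminance channel keeps colors stable while spreading pixel intensities across the full range, matching the "good image" description above.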
Backup Slides
■ Multigrid Network
  • Convolutional filters act across scale space, providing a communication mechanism between coarse and fine grids
  • Reduces the depth needed to mix distant contextual cues
[Figure 1 from Ke+ (CVPR2017): Multigrid networks. Top: standard CNN architectures conflate scale with abstraction (depth); filters with limited receptive field propagate information slowly across the spatial grid, necessitating the use of very deep networks to fully integrate contextual cues. Bottom: in multigrid networks, convolutional filters act across scale space (x, y, c, s), thereby providing a communication mechanism between coarse and fine grids.]
(Ke+, CVPR2017)
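A minimal sketch of the idea, as an assumption-laden simplification rather than the paper's exact block: each scale's output mixes its own grid with its downsampled finer neighbor and upsampled coarser neighbor before a 3×3 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultigridConv(nn.Module):
    """One layer acting on a pyramid of grids with `ch` channels each."""
    def __init__(self, ch, n_scales=3):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(n_scales):
            n_inputs = 1 + (i > 0) + (i < n_scales - 1)  # self + neighbors
            self.convs.append(nn.Conv2d(n_inputs * ch, ch, 3, padding=1))

    def forward(self, grids):  # grids: list of tensors, fine -> coarse
        outs = []
        for i, conv in enumerate(self.convs):
            parts = [grids[i]]
            if i > 0:                    # finer neighbor, downsampled
                parts.append(F.avg_pool2d(grids[i - 1], 2))
            if i < len(grids) - 1:       # coarser neighbor, upsampled
                parts.append(F.interpolate(grids[i + 1], scale_factor=2))
            outs.append(torch.relu(conv(torch.cat(parts, dim=1))))
        return outs

# Example: a 3-scale pyramid of 16-channel grids (sizes 32, 16, 8).
pyr = [torch.randn(1, 16, s, s) for s in (32, 16, 8)]
pyr = MultigridConv(16)(pyr)
```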
Backup Slides
■ Stylized ImageNet
  • Created by applying AdaIN style transfer to ImageNet images

[Figure 3: Visualisation of Stylized-ImageNet (SIN), created by applying AdaIN style transfer to ImageNet images. Left: randomly selected ImageNet image of class ring-tailed lemur. Right: ten examples of images with content/shape of the left image and style/texture from different paintings. After applying AdaIN style transfer, local texture cues are no longer highly predictive of the target class, while the global shape tends to be retained. Note that within SIN, every source image is stylized only once.]
(Geirhos+, ICLR2019)
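Stylized ImageNet relies on AdaIN style transfer (Huang & Belongie, 2017), which renormalizes content features to match style feature statistics: AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y). A minimal sketch of that core operation on feature maps; the VGG encoder and decoder that surround it in the full method are omitted, and the tensors here are random stand-ins:

```python
import torch

def adain(content, style, eps=1e-5):
    """AdaIN on feature maps of shape (N, C, H, W): shift/scale the
    content features to the style features' per-channel statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(1, 512, 32, 32)  # e.g. VGG relu4_1 features
style = torch.randn(1, 512, 32, 32)
stylized = adain(content, style)
```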