!"#$%&'()(*%+,%-&./0(
)12%+3/+4/0($%&56+7(86+(
9:/;%(9<-/4<&4<;
!"#$%&!'()&*+,,'% -$,.#"/0)1",/$,2 3#+4'#.)&-$5",&6'70$#)&
8$$9:$#, ;<4&',.&=+"&;++, >'?)&6@*1ABAC
大谷碧生 (Tamaki Lab, Nagoya Institute of Technology)
Paper reading, 2021/5/31
Introduction
■ What matters in image inpainting
• Preserving the realism of the image
■ Proposed method
• R-MNet
• WGAN
• Reverse masking operator
• Keeps only the pixels useful for inpainting and forwards them to the end of the encoder-decoder network
• Reverse mask loss
• A loss function in feature space that measures the distance between the predicted output for the masked region and the corresponding original pixels of that region on the input image
!"#$%"&'()*+
nニューラルネットワークでの初めてのアプローチ J!'",K)&I**1ABBLM
• 誤差逆伝播法のパラメータ学習で定式化された画像ノイズ除去
n敵対的学習の導入 JN'2%'0K)&*1NOABCPM
• ピクセル単位の再構成損失と敵対的学習を組み合わせた*+,2$Q29R,/+.$#
nN'#2"'7&*+,5+742"+,&JS"4K)&R**1ABCTM
• 非正規化された畳み込みを組み合わせた自動マスク更新
n8'2$.&*+,5+742"+, J>4K)&I**1ABCUM
• データから<+V2&('<0を自動的に学習
!,-."%
n68@Gをベースに構築
nマスクされた入力画像
• ℐ! = ℐ ⨀ 𝛭
n反転したマスク
• 𝑀" = 1 − 𝑀
n生成器による予測画像
• ℐ#"$% = 𝐺&(ℐ!M
n?#$."/2$.&('<0$.&'#$'&"('H$
• ℐ! #"$% = ℐ#"$% ⨀ 𝑀"
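These four quantities are plain elementwise (Hadamard) products, so the data flow is easy to sketch. Below is a minimal numpy illustration; `generator` is a hypothetical stand-in for the trained $G_\theta$, and the mask is assumed to be 1 on visible pixels and 0 on holes (the paper's convention may differ).

```python
import numpy as np

def reverse_mask_quantities(image, mask, generator):
    """Compute the four tensors used by R-MNet from an image and a binary mask.

    image:     (H, W, 3) float array, the ground-truth image I.
    mask:      (H, W, 1) float array, assumed 1 on visible pixels and 0 on holes.
    generator: any callable taking the masked image and returning a prediction
               (a stand-in for the trained G_theta).
    """
    masked_image = image * mask                         # I_M     = I ⊙ M
    reversed_mask = 1.0 - mask                          # M_r     = 1 − M
    predicted = generator(masked_image)                 # I_pred  = G_θ(I_M)
    predicted_masked_area = predicted * reversed_mask   # I_Mpred = I_pred ⊙ M_r
    return masked_image, reversed_mask, predicted, predicted_masked_area

# Toy check with an identity "generator", just to confirm the shapes broadcast.
rng = np.random.default_rng(0)
img = rng.random((256, 256, 3)).astype(np.float32)
m = (rng.random((256, 256, 1)) > 0.3).astype(np.float32)
outs = reverse_mask_quantities(img, m, generator=lambda x: x)
print([o.shape for o in outs])
```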
!"/"*0"'-$0+
Figure 2: Illustration of partial convolution (left), gated convolution (middle), and reverse masking (right).
■ The mask is reversed
■ Convolution focuses on the missing regions
■ Missing pixel values are predicted
■ Visible pixels are preserved
Loss functions
■ Generator
• Perceptual loss using features extracted from a pretrained model
• $\mathcal{L}_p = \frac{1}{k} \sum_{i \in k} \left\| \phi_i(\mathcal{I}) - \phi_i(\mathcal{I}_{pred}) \right\|_2$
• Reversed mask loss
• $\mathcal{L}_{rm} = \frac{1}{k} \sum_{i \in k} \left\| \phi_i(\mathcal{I}_{M_{pred}}) - \phi_i(\mathcal{I} \odot M_r) \right\|_2$
• Generator loss
• $\mathcal{L}_G = (1 - \lambda)\mathcal{L}_p + \lambda \mathcal{L}_{rm}$
■ Discriminator
• Wasserstein distance loss function
• $\mathcal{L}_D = \mathbb{E}_{\mathcal{I} \sim \mathbb{P}_r}\left[ D_\theta(\mathcal{I}) \right] - \mathbb{E}_{\mathcal{I}_{pred} \sim \mathbb{P}_g}\left[ D_\theta(\mathcal{I}_{pred}) \right]$
Experiments
■ Dataset
• CelebA-HQ [Karras+, ICLR2018]
• Resized to 256×256
■ Masking dataset
• Quick Draw dataset [Iskakov, 2018]
• Resized to 256×256
■ Compared methods
• Context-Encoder
• PConv
• GC
Figure 3: Sample images from the Quick Draw dataset.
■ Evaluation metrics
• FID [Heusel+, 2017]
• MAE
• PSNR
• SSIM [Wang+, 2004]
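MAE and PSNR are simple to compute directly; below is a small numpy sketch for 8-bit images (FID needs a pretrained Inception network and SSIM a structural-similarity implementation, so those are left to the respective libraries).

```python
import numpy as np

def mae(gt, pred):
    """Mean absolute error between two images of the same shape (lower is better)."""
    return float(np.mean(np.abs(gt.astype(np.float64) - pred.astype(np.float64))))

def psnr(gt, pred, data_range=255.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))
```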
Results
■ Qualitative evaluation
• More realistic results than the other methods
(a) Masked (b) CE [26] (c) PConv [23] (d) GC [41] (e) R-MNet (f) GT
Figure 4: Visual comparison of the inpainted results by CE, PConv, GC and R-MNet on CelebA-HQ [23], where the Quick Draw dataset [14] is used as the masking method with mask hole-to-image ratios [0.01, 0.6].
5.1. Qualitative Comparison
We carried out experiments based on similar implementations by Liu et al. [23], Pathak et al. [26] and a pre-trained model for the state of the art [41], and compared our results.
Masked image / CE / PConv / GC / R-MNet / GT
■ Quantitative evaluation
• Better performance than the other state-of-the-art methods
!"#$%
(a) Masked (b) R-MNet (c) GT
Figure 6: Results of image inpainting using R-MNet-0.4 on Places2 [43] and Paris Street View [5], where images in column (a) are the masked image generated using the Quick-Draw dataset [14]; images in column (b) are the results of inpainting by our proposed method; and images in column (c) are the ground truth.
Table 1: Results from the CelebA-HQ test dataset, where the Quick Draw dataset by Iskakov et al. [14] is used as the masking method with mask hole-to-image ratios between [0.01, 0.6]. ↓ lower is better, ↑ higher is better.

Inpainting Method     FID ↓   MAE ↓    PSNR ↑   SSIM ↑
R-MNet-0.1            26.95   33.40    38.46    0.88
Pathak et al. [26]    29.96   123.54   32.61    0.69
Liu et al. [23]       15.86   98.01    33.03    0.81
Yu et al. [41]        4.29    43.10    39.69    0.92
R-MNet-0.4            3.09    31.91    40.40    0.94
"&
'"
λ
*"/"*0"'1$0+'#)00の有効性
n生成器の損失関数
• ℒ. = 1 − 𝜆 ℒ# + 𝜆ℒ"-
n考察
• 𝜆 ]&B^[の時に最良の結果が得られた
• マスクが大きいほど知覚的な類似性を得るのに時間がかかる
• 画像全体をカバーする損失の割合が小さくなるため,ネットワークに時間がかかる
Table 3: Results from Paris Street View and Places2 using the Quick Draw dataset by Iskakov et al. [14] as the masking method with mask hole-to-image ratios [0.01, 0.6]. ↓ lower is better, ↑ higher is better.

Lrm weight            FID ↓   MAE ↓   PSNR ↑   SSIM ↑
λ=0.1, R-MNet-0.1     26.95   33.40   38.46    0.88
λ=0.3, R-MNet-0.3     4.14    31.57   40.20    0.93
λ=0.4, R-MNet-0.4     3.09    31.91   40.40    0.94
λ=0.5, R-MNet-0.5     4.14    31.0    40.0     0.93
(a) Masked (b) λ = 0 (c) λ = 0.1 (d) λ = 0.4 (e) GT
(masked-image), which in turn share the same pixel level. Additionally, when using irregular masks, these regions to be inpainted can be seen by the algorithm as a square bounding box that contains both visible and missing pixels, which can cause the GAN to generate images that contain structural and textural artefacts.
To overcome this issue, we modify the usual approach by having two inputs to our model during training: the image and its associated binary mask. This allows us to access the mask at the end of our encoding/decoding network through the spatial preserving operation, and gives us the ability to compute the following, which will be used during the loss computation:
• A reversed mask applied to the output image.
• Use a spatial preserving operation to get the original masked-image.
• Use matrix operands to add the reversed mask image back on the masked-image.
By using this method of internal masking and restoring, our network can inpaint only the required features while
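Read literally, the three bullets in this excerpt amount to compositing the generator output into the hole region while keeping the visible pixels of the masked input. A small numpy sketch of that final composition, under the same mask convention assumed earlier (an interpretation of the excerpt, not the released code):

```python
import numpy as np

def compose_output(masked_image, predicted, mask):
    """Restore a full image: visible pixels come from the masked input,
    hole pixels come from the generator prediction.

    mask is assumed to be 1 on visible pixels and 0 on holes.
    """
    reversed_mask = 1.0 - mask                # select the hole region
    hole_content = predicted * reversed_mask  # reversed mask applied to the output
    return masked_image + hole_content        # add it back onto the masked image
```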
Summary
■ Proposed a new method that combines WGAN with reverse masking
■ Performs inpainting with binary masks of various shapes and sizes
■ Reconstructs structurally and texturally consistent images
■ Training the model while applying reversed-mask matrix operations is beneficial
■ Better performance than state-of-the-art methods