!"#$%&'()(*%+,%-&./0(
)12%+3/+4/0($%&56+7(86+(
9:/;%(9<-/4<&4<;
!"#$%&!'()&*+,,'% -$,.#"/0)1",/$,2 3#+4'#.)&-$5",&6'70$#)&
8$$9:$#, ;<4&',.&=+"&;++, >'?)&6@*1ABAC
大谷碧生 (Tamaki Lab, Nagoya Institute of Technology)
Paper reading, 2021/5/31
Introduction
■ What matters in image inpainting
• Preserving the realism of the image
■ Proposed method
• R-MNet
• WGAN
• Reverse masking operator
• Keeps only the pixels useful for inpainting and forwards them to the end of the encoder-decoder network
• Reverse mask loss
• A loss function in feature space that measures the distance between the predicted output for the masked region and the corresponding original pixels of that region on the input image
!"#$%"&'()*+
nニューラルネットワークでの初めてのアプローチ J!'",K)&I**1ABBLM
• 誤差逆伝播法のパラメータ学習で定式化された画像ノイズ除去
n敵対的学習の導入 JN'2%'0K)&*1NOABCPM
• ピクセル単位の再構成損失と敵対的学習を組み合わせた*+,2$Q29R,/+.$#
nN'#2"'7&*+,5+742"+,&JS"4K)&R**1ABCTM
• 非正規化された畳み込みを組み合わせた自動マスク更新
n8'2$.&*+,5+742"+, J>4K)&I**1ABCUM
• データから<+V2&('<0を自動的に学習
!,-."%
n68@Gをベースに構築
nマスクされた入力画像
• ℐ! = ℐ ⨀ 𝛭
n反転したマスク
• 𝑀" = 1 − 𝑀
n生成器による予測画像
• ℐ#"$% = 𝐺&(ℐ!M
n?#$."/2$.&('<0$.&'#$'&"('H$
• ℐ! #"$% = ℐ#"$% ⨀ 𝑀"
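These four quantities are plain elementwise (Hadamard) products, so the data flow is easy to sketch. Below is a minimal numpy illustration; `generator` is a hypothetical stand-in for the trained $G_\theta$, and the mask is assumed to be 1 on visible pixels and 0 on holes (the paper's convention may differ).

```python
import numpy as np

def reverse_mask_quantities(image, mask, generator):
    """Compute the four tensors used by R-MNet from an image and a binary mask.

    image:     (H, W, 3) float array, the ground-truth image I.
    mask:      (H, W, 1) float array, assumed 1 on visible pixels and 0 on holes.
    generator: any callable taking the masked image and returning a prediction
               (a stand-in for the trained G_theta).
    """
    masked_image = image * mask                         # I_M     = I ⊙ M
    reversed_mask = 1.0 - mask                          # M_r     = 1 − M
    predicted = generator(masked_image)                 # I_pred  = G_θ(I_M)
    predicted_masked_area = predicted * reversed_mask   # I_Mpred = I_pred ⊙ M_r
    return masked_image, reversed_mask, predicted, predicted_masked_area

# Toy check with an identity "generator", just to confirm the shapes broadcast.
rng = np.random.default_rng(0)
img = rng.random((256, 256, 3)).astype(np.float32)
m = (rng.random((256, 256, 1)) > 0.3).astype(np.float32)
outs = reverse_mask_quantities(img, m, generator=lambda x: x)
print([o.shape for o in outs])
```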
!"/"*0"'-$0+
Figure 2: Illustration of partial convolution (left), gated convolution (middle), and reverse masking (right).
■ The mask is reversed
■ Convolution focuses on the missing regions
■ Missing pixel values are predicted
■ Visible pixels are preserved
Loss functions
■ Generator
• Perceptual loss using features extracted from a pretrained model
• $\mathcal{L}_p = \frac{1}{k} \sum_{i \in k} \left\| \phi_i(\mathcal{I}) - \phi_i(\mathcal{I}_{pred}) \right\|_2$
• Reversed mask loss
• $\mathcal{L}_{rm} = \frac{1}{k} \sum_{i \in k} \left\| \phi_i(\mathcal{I}_{M_{pred}}) - \phi_i(\mathcal{I} \odot M_r) \right\|_2$
• Generator loss
• $\mathcal{L}_G = (1 - \lambda)\mathcal{L}_p + \lambda \mathcal{L}_{rm}$
■ Discriminator
• Wasserstein distance loss function
• $\mathcal{L}_D = \mathbb{E}_{\mathcal{I} \sim \mathbb{P}_r}\left[ D_\theta(\mathcal{I}) \right] - \mathbb{E}_{\mathcal{I}_{pred} \sim \mathbb{P}_g}\left[ D_\theta(\mathcal{I}_{pred}) \right]$
Experiments
■ Dataset
• CelebA-HQ [Karras+, ICLR2018]
• Resized to 256×256
■ Masking dataset
• Quick Draw dataset [Iskakov, 2018]
• Resized to 256×256
■ Compared methods
• Context-Encoder
• PConv
• GC
Figure 3: Sample images from the Quick Draw dataset.
■ Evaluation metrics
• FID [Heusel+, 2017]
• MAE
• PSNR
• SSIM [Wang+, 2004]
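MAE and PSNR are simple to compute directly; below is a small numpy sketch for 8-bit images (FID needs a pretrained Inception network and SSIM a structural-similarity implementation, so those are left to the respective libraries).

```python
import numpy as np

def mae(gt, pred):
    """Mean absolute error between two images of the same shape (lower is better)."""
    return float(np.mean(np.abs(gt.astype(np.float64) - pred.astype(np.float64))))

def psnr(gt, pred, data_range=255.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))
```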
Results
■ Qualitative evaluation
• More realistic results than the other methods
(a) Masked (b) CE [26] (c) PConv [23] (d) GC [41] (e) R-MNet (f) GT
Figure 4: Visual comparison of the inpainted results by CE, PConv, GC and R-MNet on CelebA-HQ [23], where the Quick Draw dataset [14] is used as the masking method with mask hole-to-image ratios [0.01, 0.6].
5.1. Qualitative Comparison
We carried out experiments based on similar implementations by Liu et al. [23], Pathak et al. [26] and a pre-trained model for the state of the art [41], and compared our results.
Masked image / CE / PConv / GC / R-MNet / GT
■ Quantitative evaluation
• Better performance than the other state-of-the-art methods
!"#$%
(a) Masked (b) R-MNet (c) GT
Figure 6: Results of image inpainting using R-MNet-0.4 on Places2 [43] and Paris Street View [5], where images in column (a) are the masked image generated using the Quick-Draw dataset [14]; images in column (b) are the results of inpainting by our proposed method; and images in column (c) are the ground truth.
Table 1: Results from the CelebA-HQ test dataset, where the Quick Draw dataset by Iskakov et al. [14] is used as the masking method with mask hole-to-image ratios between [0.01, 0.6]. ↓ lower is better, ↑ higher is better.

Inpainting Method     FID ↓   MAE ↓    PSNR ↑   SSIM ↑
R-MNet-0.1            26.95   33.40    38.46    0.88
Pathak et al. [26]    29.96   123.54   32.61    0.69
Liu et al. [23]       15.86   98.01    33.03    0.81
Yu et al. [41]        4.29    43.10    39.69    0.92
R-MNet-0.4            3.09    31.91    40.40    0.94
"&
'"
λ
*"/"*0"'1$0+'#)00の有効性
n生成器の損失関数
• ℒ. = 1 − 𝜆 ℒ# + 𝜆ℒ"-
n考察
• 𝜆 ]&B^[の時に最良の結果が得られた
• マスクが大きいほど知覚的な類似性を得るのに時間がかかる
• 画像全体をカバーする損失の割合が小さくなるため,ネットワークに時間がかかる
Table 3: Results from Paris Street View and Places2 using the Quick Draw dataset by Iskakov et al. [14] as the masking method with mask hole-to-image ratios [0.01, 0.6]. ↓ lower is better, ↑ higher is better.

Lrm weight            FID ↓   MAE ↓   PSNR ↑   SSIM ↑
λ=0.1, R-MNet-0.1     26.95   33.40   38.46    0.88
λ=0.3, R-MNet-0.3     4.14    31.57   40.20    0.93
λ=0.4, R-MNet-0.4     3.09    31.91   40.40    0.94
λ=0.5, R-MNet-0.5     4.14    31.0    40.0     0.93
(a) Masked (b) λ = 0 (c) λ = 0.1 (d) λ = 0.4 (e) GT
(masked-image), which in turn share the same pixel level. Additionally, when using irregular masks, these regions to be inpainted can be seen by the algorithm as a square bounding box that contains both visible and missing pixels, which can cause the GAN to generate images that contain structural and textural artefacts.
To overcome this issue, we modify the usual approach by having two inputs to our model during training: the image and its associated binary mask. This allows us to access the mask at the end of our encoding/decoding network through the spatial preserving operation, and gives us the ability to compute the following, which will be used during the loss computation:
• A reversed mask applied to the output image.
• Use a spatial preserving operation to get the original masked-image.
• Use matrix operands to add the reversed mask image back on the masked-image.
By using this method of internal masking and restoring, our network can inpaint only the required features while
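Read literally, the three bullets in this excerpt amount to compositing the generator output into the hole region while keeping the visible pixels of the masked input. A small numpy sketch of that final composition, under the same mask convention assumed earlier (an interpretation of the excerpt, not the released code):

```python
import numpy as np

def compose_output(masked_image, predicted, mask):
    """Restore a full image: visible pixels come from the masked input,
    hole pixels come from the generator prediction.

    mask is assumed to be 1 on visible pixels and 0 on holes.
    """
    reversed_mask = 1.0 - mask                # select the hole region
    hole_content = predicted * reversed_mask  # reversed mask applied to the output
    return masked_image + hole_content        # add it back onto the masked image
```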
Summary
■ Proposed a new method that combines WGAN with reverse masking
■ Performs inpainting with binary masks of various shapes and sizes
■ Reconstructs structurally and texturally consistent images
■ Training the model while applying reversed-mask matrix operations is beneficial
■ Better performance than state-of-the-art methods