Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
1. Multi-Domain Image Completion
for Random Missing Input Data
Stanford University, NVIDIA, and National Institutes of Health
Yonsei University Severance Hospital CCIDS
Choi Dongmin
2. Introduction
• Multi-domain images could provide complementary knowledge
- ex. Four MRI modalities (T1, T1CE, T2, FLAIR) provide distinct features to locate tumor
boundaries from different diagnostic perspectives
- ex. Person re-identification across different cameras or times
• However, some image domains might be missing in practice
- Solution 1. Nearest neighbor approach : lacks semantic consistency
- Solution 2. Generative models
• ReMIC (Representational disentanglement schemes for Multi-domain
Image Completion)
- n-to-n image completion framework
- can be utilized for a high-level task (ex. segmentation) via joint training
- completes the missing domains given randomly distributed numbers of visible domains
- consistent performance improvement on three datasets
3. Related Works
Image-to-Image
Translation
J.Y Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017
- Impressive performance via cycle-consistency loss
- only 1-to-1 mapping
• CycleGAN
4. Related Works
Image-to-Image
Translation
Y Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
J Yoon et al. RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using GANs. ICML 2018
- Multi-domain image generation
- only 1-to-n mapping (generation is always conditioned on the single input
image as the only source domain)
• StarGAN & RadialGAN
StarGAN RadialGAN
5. Related Works
Image-to-Image
Translation
D Lee et al. CollaGAN: Collaborative GAN for Missing Image Data Imputation. CVPR 2019
- Collaborative model to incorporate multiple domains for generating one
missing domain
- only n-to-1 mapping
• CollaGAN
7. Related Works
Learning
Disentangled
Representations
• Learning Disentangled Representations
- to capture the full distribution of possible outputs by introducing a random style code
- to transfer information across domains for adaptation
- InfoGAN and β-VAE learn disentangled representations in an unsupervised manner
Xi Chen et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016
I Higgins et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017
https://www.slideshare.net/NaverEngineering/ss-96581209
8. Related Works
Learning
Disentangled
Representations
• DRIT and MUNIT
- disentangle content and attribute features in image translation
- However, only 1-to-1 translation
H.Y Lee et al. Diverse Image-to-Image Translation via Disentangled Representations. ECCV 2018
X Huang et al. Multimodal Unsupervised Image-to-Image Translation. ECCV 2018
9. Related Works
Learning
Disentangled
Representations
• Liu et al.
- tackle multi-domain learning with a cross-domain latent code
- but offer less discussion of the domain-specific style code
Liu et al. A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation. NIPS 2018
10. Related Works Medical
Image Synthesis
• Previous works also discuss how to extract representations from multiple
modalities, especially for segmentation with missing modalities
- However, they fuse the features from multiple modalities, not from the perspective of
representation disentanglement
V Nguyen et al. Cross-domain synthesis of medical images using efficient location-sensitive deep network. MICCAI 2015
M Havaei et al. HeMIS: Hetero-Modal Image Segmentation. MICCAI 2016
A Chartsias et al. Multimodal MR synthesis via modality-invariant latent representation. IEEE Transactions on Medical Imaging 2017
12. Method
- Image decomposition
: Shared content structure (skeleton) + Unique characteristics (flesh)
- Missing image reconstruction at test time
: Shared skeleton from available domains + Sampled flesh from the learned model
Style code (domain-specific)
- Style encoder : E^s_i(x_i) = s_i (1 ≤ i ≤ N)
Content code (shared)
- Content encoder : E^c(x_1, x_2, …, x_N) = c
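The skeleton/flesh decomposition above can be illustrated with a deliberately simplified toy model (not the paper's networks): assume each domain image is the shared content plus a scalar domain-specific offset. Under that linear assumption, a "content encoder" that works from any subset of visible domains, and a "style encoder" for the residual, are a few lines of numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
content = rng.normal(size=(8, 8))
content -= content.mean()               # shared skeleton c (zero-mean by construction)
styles = [1.5, -0.7, 0.3, 2.0]          # scalar style offset per domain, s_i
images = [content + s for s in styles]  # toy domain images x_i = c + s_i

def content_encode(visible_images):
    """E^c sketch: recover the shared content from any visible subset."""
    stack = np.stack(visible_images)
    # per-pixel mean is c plus the average visible style offset; subtracting
    # the global scalar mean removes that offset (c itself is zero-mean)
    return stack.mean(axis=0) - stack.mean()

def style_encode(image, content_est):
    """E^s_i sketch: the domain-specific residual after removing content."""
    return (image - content_est).mean()

# Completion of a "missing" domain 3 from three visible domains:
c_est = content_encode(images[:3])
completed = c_est + styles[3]
```

This is only meant to convey the intuition that the shared part is recoverable from whichever domains happen to be visible; in ReMIC both encoders are learned convolutional networks.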
13. Method
- Content code visualization (randomly selected 8 of 256 channels) on BraTS
: Different channel-wise feature maps focus on different anatomical structures (ex. tumor, brain, skull)
(Figure: input images and channel-wise content code feature maps)
14. Method
- Generation : style codes s_i sampled from a prior distribution + content code c
- Generator : G_i(c, s_i) = x̃_i
(Figure: image generation process)
15. Method
Segmentation Branch
- Segmentation generator G_S applied after the content codes
- Assumption : the content codes contain essential image structure information
- Joint training (generation loss + segmentation Dice loss)
: adaptively learns how to generate missing images
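The Dice loss in the joint objective can be sketched in numpy as the standard soft Dice formulation (the paper's exact implementation may differ in smoothing and multi-class handling):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for a binary mask: 1 - 2|P∩T| / (|P| + |T|).

    pred and target are arrays of the same shape; pred may be soft
    probabilities in [0, 1]. eps avoids division by zero on empty masks.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Perfect overlap gives a loss near 0; disjoint masks give a loss near 1.
mask = np.array([1, 1, 0, 0])
perfect = dice_loss(mask, mask)
disjoint = dice_loss(np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]))
```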
17. Method
Total Loss
- Training loss = weighted sum of six terms :
- Adversarial (λ_adv = 1)
- Image consistency (λ^x_cyc = 10)
- Style latent consistency (λ^s_cyc = 1)
- Content latent consistency (λ^c_cyc = 1)
- Reconstruction (λ_rec = 20)
- Segmentation (λ_seg = 1)
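A minimal sketch of how the six terms combine into the total objective, using the λ weights from this slide; the per-term loss values below are placeholders, since each term is computed from network outputs in practice:

```python
# Loss weights reported for ReMIC's training objective.
LAMBDAS = {
    "adv": 1.0,     # adversarial loss
    "x_cyc": 10.0,  # image consistency
    "s_cyc": 1.0,   # style latent consistency
    "c_cyc": 1.0,   # content latent consistency
    "rec": 20.0,    # reconstruction
    "seg": 1.0,     # segmentation Dice loss
}

def total_loss(losses):
    """Weighted sum of the six ReMIC loss terms."""
    return sum(LAMBDAS[name] * value for name, value in losses.items())

# Dummy per-term values purely to show the weighting:
dummy = {"adv": 0.5, "x_cyc": 0.1, "s_cyc": 0.2,
         "c_cyc": 0.2, "rec": 0.05, "seg": 0.3}
print(total_loss(dummy))  # ≈ 3.2: 0.5 + 1.0 + 0.2 + 0.2 + 1.0 + 0.3
```

Note how the image-consistency and reconstruction terms dominate (weights 10 and 20), pushing the model toward faithful completion rather than merely realistic samples.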
18. Experiments
• BraTS 2018 dataset
- Multi-modal brain MRI with four modalities : T1, T1Gd, T2, FLAIR
- Following CollaGAN, 218 training and 28 testing samples randomly selected
- A set of 2D slices (40,148 training / 5,340 test) extracted from 3D volumes
- Resized to 256 × 256
- Three tumor categories
: Enhancing tumor (ET), tumor core (TC), and whole tumor (WT)
D Lee et al. CollaGAN: Collaborative GAN for Missing Image Data Imputation. CVPR 2019
B.H Menze et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE TMI
19. Experiments
• ProstateX dataset
- Multi-parametric prostate MR scans for 98 subjects : T2, ADC, HighB
- 78 training and 20 testing samples randomly selected
- A set of 2D slices (3,540 training / 840 test) extracted from 3D volumes
- Resized to 256 × 256
- Prostate regions are manually labeled as the whole prostate (WP)
G Litjens et al. Computer-aided detection of prostate cancer in MRI. IEEE TMI
20. Experiments
• RaFD (Radboud Faces Database)
- Eight facial expressions
: neutral, angry, contemptuous, disgusted, fearful, happy, sad, and surprised
- Following StarGAN, adopt images from three camera angles with three gaze
directions
- 3,888 training (54 participants) / 936 test (13 participants)
- Cropped with the face in the center and resized to 128 × 128
Y Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
http://www.socsci.ru.nl:8180/RaFD2/RaFD
21. Results
• Multi-Domain Image Completion on N Domains
- Only One Missing Domain (n-to-1)
* Training : the one missing domain is randomly distributed
* Testing : fix the one missing domain and generate outputs only for it
- More than One Missing Domain (n-to-n)
* Training : k randomly selected visible domains (k ∈ {1, …, N − 1})
* Testing : fix k while these k visible domains are randomly selected;
evaluate all the generated images
- Evaluation metrics
* NRMSE (Normalized Root Mean Squared Error)
* SSIM (Structural Similarity)
* PSNR (Peak Signal-to-Noise Ratio)
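Two of these metrics are simple enough to sketch directly in numpy. NRMSE's normalization convention varies between implementations; the version below uses the Euclidean-norm convention (scikit-image's default). SSIM requires windowed local statistics and is usually taken from a library such as scikit-image, so it is omitted:

```python
import numpy as np

def nrmse(ref, pred):
    """Normalized RMSE: ||ref - pred||_2 / ||ref||_2 (Euclidean convention)."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.linalg.norm(ref - pred) / np.linalg.norm(ref)

def psnr(ref, pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given data range."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mse = np.mean((ref - pred) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# A uniform 10% error on a unit-valued image:
ref = np.ones((4, 4))
pred = 0.9 * ref
err_nrmse = nrmse(ref, pred)   # 0.1
err_psnr = psnr(ref, pred)     # 20 dB (mse = 0.01, range = 1)
```

Lower NRMSE and higher SSIM/PSNR indicate better completion quality, which is how the comparison tables on the following slides should be read.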
22. Results
• Multi-Domain Image Completion
- Comparison with MUNIT, StarGAN, and CollaGAN
- ReMIC w/o Recon : ReMIC without reconstruction loss (single missing domain)
- ReMIC-Random : k random visible domains (multiple missing domains, k = *)
28. Results
• Multi-Domain Segmentation
- Oracle : fully supervised 2D U-Net variant trained without missing images
- Oracle+* : the missing images are generated by the “*” method, then evaluated
with the pre-trained “Oracle” model (All : without any missing domains)
- ReMIC+Seg : separate content encoders for image generation and
segmentation tasks
- ReMIC+Joint : sharing the weights of content encoder for the two tasks
29. Conclusion
• A general framework for multi-domain image completion, given
that one or more input domains are missing
• Learning shared content and domain-specific style encoding
across multiple domains
• Well generalized to both natural and medical images
• Extended for a unified image generation and segmentation
framework for missing-domain segmentation task
30. Question
• According to this paper, “different modalities provide distinct
features to locate tumor boundaries from differential diagnosis
perspectives”.
But ReMIC uses a content code, which encodes the shared
skeleton, as an input for the connected segmentation generator.
Isn’t it a contradiction?
31. ICLR 2020 Reviews
• The main contribution is representational disentanglement,
namely the content and style separation, but there is no explicit
evidence that this separation actually happens
• Evaluation on a high-resolution dataset such as CelebA-HQ and with
other conventional metrics such as FID is requested
https://openreview.net/forum?id=rkg_wREYDS
32. Appendix. A : Implementation Details
• A.1 Hyperparameters
- Adam optimizer (β1 = 0.5, β2 = 0.999)
- Batch size 1 and 100,000 iterations
- Style code dimension : 8
- During testing, a fixed style code of 0.5 in each dimension
• A.2 Network Architectures (check details in the paper)
- ReMIC is developed on the backbone of MUNIT
- Unified Content Encoder : Down-sampling module + Residual Blocks (IN)
- Style Encoder : Down-sampling module + Residual Blocks + GAP + FC
- Generator : Four residual blocks + Up-sampling + AdaIN*
- Discriminator : Four convolutional blocks
- Segmentor : U-Net shaped network
X Huang et al. Arbitrary style transfer in real-time with adaptive instance normalization. ICCV 2017
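The AdaIN operation cited above can be sketched in numpy: normalize each channel of the content feature map, then rescale and shift with style-derived statistics. In MUNIT/ReMIC those statistics are produced by an MLP on the style code; here they are passed in directly as arrays for illustration:

```python
import numpy as np

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization on a (C, H, W) feature map.

    Each channel is normalized to zero mean / unit std over its spatial
    dimensions, then rescaled by style_std and shifted by style_mean
    (both of shape (C,)), injecting the style statistics.
    """
    mu = content_feat.mean(axis=(1, 2), keepdims=True)
    sigma = content_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - mu) / (sigma + eps)
    return style_std[:, None, None] * normalized + style_mean[:, None, None]

# After AdaIN, each channel carries the style's mean and std:
rng = np.random.default_rng(1)
feat = rng.normal(size=(3, 8, 8))
out = adain(feat, np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 2.0]))
```

This is why the generator can produce different domains from one shared content code: only the per-channel statistics change with the style input.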
33. Appendix. C : Extended Ablation Study and
Results for Multi-domain Segmentation
• C.4 Analysis of missing-domain segmentation results