Paper Summary:
Disentangling by Factorising
Jun-sik Choi
Department of Brain and Cognitive Engineering,
Korea University
November 26, 2019
Overview of paper [2]
To enhance disentangled representation learning, Factor-VAE is
proposed.
Factor-VAE enhances disentanglement by encouraging the
distribution of representations to be factorial (independent
across the dimensions).
Factor-VAE provides a better trade-off between
disentanglement and reconstruction quality than β-VAE [1].
Also, a new disentanglement metric is proposed.
Unsupervised Disentangled Representation
Disentangled Representation
a representation where a change in one dimension corresponds
to a change in one factor of variation, while being relatively
invariant to changes in other factors. [3]
Why do disentangled representations matter? [4]
Data can be represented in a more interpretable and semantic
manner.
Learned disentangled representations are more transferable.
Why learn disentangled representations in an unsupervised manner?
1. Humans are able to learn factors of variation unsupervised.
2. Labels are costly as obtaining them requires a human in the
loop.
3. Labels assigned by humans might be inconsistent or leave out
the factors that are difficult for humans to identify.
Factor-VAE
Goal
Obtain a better trade-off between disentanglement and
reconstruction quality, addressing a known drawback of β-VAE [1].
How?
Factor-VAE augments the VAE objective with a penalty that
encourages the marginal distribution of representations to be
factorial without substantially affecting the quality of
reconstructions.
The penalty is expressed as a KL divergence between the
marginal distribution of representations and the product of its
marginals, and is estimated and minimised with a discriminator
network, following the divergence-minimisation view of GANs.
Trade-off between Disentanglement and Reconstruction in
beta-VAE I
Notations and assumptions
- Observations: $x^{(i)} \in X,\ i = 1, \dots, N$
- Underlying generative factors: $f = (f_1, \dots, f_K)$
- Latent code that models $f$: $z \in \mathbb{R}^d$
- Prior $p(z) = \mathcal{N}(0, I)$, decoder: $p_\theta(x|z)$, encoder: $q_\theta(z|x)$
Disentanglement of Representation
- The variational posterior for an observation,
$$q_\theta(z|x) = \prod_{j=1}^{d} \mathcal{N}\!\left(z_j \mid \mu_j(x), \sigma_j^2(x)\right),$$
can be seen as the distribution of the representation corresponding
to the data point $x$.
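As a concrete reading of this factorised posterior, here is a minimal sketch of drawing a representation from it via the reparameterisation trick (PyTorch assumed; `mu` and `logvar` are hypothetical encoder outputs, not names from the paper):

```python
import torch

def sample_representation(mu, logvar):
    # q_theta(z|x) is a diagonal Gaussian: each z_j ~ N(mu_j(x), sigma_j^2(x)).
    std = (0.5 * logvar).exp()          # sigma_j(x) from log-variance
    return mu + std * torch.randn_like(std)
```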
Trade-off between Disentanglement and Reconstruction in
beta-VAE II
- Marginal posterior and disentanglement
$$q(z) = \mathbb{E}_{p_{\text{data}}(x)}[q(z|x)] = \frac{1}{N}\sum_{i=1}^{N} q\!\left(z \mid x^{(i)}\right)$$
A disentangled representation would have each $z_j$ correspond to
precisely one underlying factor $f_k$, so we want $q(z)$ to be
independently factorized:
$$q(z) = \prod_{j=1}^{d} q(z_j)$$
Trade-off between Disentanglement and Reconstruction in
beta-VAE III
Further decomposition of the β-VAE objective
- The β-VAE objective,
$$\frac{1}{N}\sum_{i=1}^{N}\left[\mathbb{E}_{q(z|x^{(i)})}\!\left[\log p(x^{(i)}|z)\right] - \beta\, \mathrm{KL}\!\left(q(z|x^{(i)})\,\|\,p(z)\right)\right],$$
is a lower bound of $\mathbb{E}_{p_{\text{data}}(x)}[\log p(x)]$ for $\beta \geq 1$, where
$\mathbb{E}_{q(z|x^{(i)})}[\log p(x^{(i)}|z)]$ is the negative reconstruction error and
$\mathrm{KL}(q(z|x^{(i)})\,\|\,p(z))$ is the complexity penalty.
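A hedged sketch of this objective for binary images (PyTorch assumed; the Bernoulli decoder likelihood is an illustrative choice, not mandated by the paper):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, recon_logits, mu, logvar, beta):
    # Negative of the reconstruction term E_q[log p(x|z)], Bernoulli decoder.
    recon = F.binary_cross_entropy_with_logits(
        recon_logits, x, reduction="sum") / x.size(0)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return recon + beta * kl  # minimising this maximises the beta-VAE lower bound
```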
Trade-off between Disentanglement and Reconstruction in
beta-VAE IV
- The KL term can be further decomposed as:
$$\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x)\,\|\,p(z))] = I_q(x; z) + \mathrm{KL}(q(z)\,\|\,p(z))$$
Proof:
$$\begin{aligned}
\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x)\,\|\,p(z))]
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{p(z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{q(z)}\cdot\frac{q(z)}{p(z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{q(z)} + \log \frac{q(z)}{p(z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x)\,\|\,q(z))] + \mathbb{E}_{q(x,z)}\left[\log \frac{q(z)}{p(z)}\right] \\
&= I_q(x; z) + \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{p(z)}\right] \\
&= I_q(x; z) + \mathrm{KL}(q(z)\,\|\,p(z))
\end{aligned}$$
Trade-off between Disentanglement and Reconstruction in
beta-VAE V
$$\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x)\,\|\,p(z))] = I_q(x; z) + \mathrm{KL}(q(z)\,\|\,p(z))$$
- When increasing the complexity penalty by setting β > 1, both
$\mathrm{KL}(q(z)\,\|\,p(z))$ and $I_q(x; z)$ are penalized.
- Penalizing $\mathrm{KL}(q(z)\,\|\,p(z))$ pushes $q(z)$ toward the factorial
prior $p(z)$, encouraging disentanglement.
- Penalizing $I_q(x; z)$ reduces the amount of information about $x$
stored in $z$, which leads to poor reconstruction.
Total Correlation Penalty I
Factor-VAE objective
$$\frac{1}{N}\sum_{i=1}^{N}\left[\mathbb{E}_{q(z|x^{(i)})}\!\left[\log p(x^{(i)}|z)\right] - \mathrm{KL}\!\left(q(z|x^{(i)})\,\|\,p(z)\right)\right] - \gamma\, \mathrm{KL}(q(z)\,\|\,\bar{q}(z)),$$
where $\bar{q}(z) := \prod_{j=1}^{d} q(z_j)$. This objective is a lower bound on the
marginal log likelihood $\mathbb{E}_{p_{\text{data}}(x)}[\log p(x)]$ and directly encourages
independence in the code distribution.
The penalty $\mathrm{KL}(q(z)\,\|\,\bar{q}(z))$ is the total correlation [5], a popular
measure of dependence for multiple random variables (see the
Total Correlation appendix).
As both $q(z)$ and $\bar{q}(z)$ are intractable, an alternative approach
for optimizing the total correlation is required.
Total Correlation Penalty II
Alternative way to optimize the total correlation
1. Sample $q(z \mid x^{(i)})$ with uniformly sampled $x^{(i)}$ to obtain
samples from $q(z)$.
2. Generate $d$ samples from $q(z)$ and ignore all but one
dimension for each sample.
Or, more efficiently:
1. Sample a batch from $q(z)$.
2. Randomly permute across the batch for each latent
dimension.
As long as the batch is large enough, the distribution of these
samples will closely approximate $\bar{q}(z)$ (a sketch of this
permutation trick follows).
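A minimal sketch of the permutation trick, assuming PyTorch and a batch `z` of latent samples with shape (B, d):

```python
import torch

def permute_dims(z):
    """Approximate sampling from q̄(z) = Π_j q(z_j).

    For each latent dimension, shuffle the values across the batch,
    breaking dependencies between dimensions while keeping each marginal.
    """
    B, d = z.size()
    permuted = torch.zeros_like(z)
    for j in range(d):
        idx = torch.randperm(B, device=z.device)
        permuted[:, j] = z[idx, j]
    return permuted
```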
Total Correlation Penalty III
Minimization of the KL divergence
By training a classifier (discriminator), approximate the density
ratio that arises in the KL term (the density-ratio trick [6]):
$$\mathrm{TC}(z) = \mathrm{KL}(q(z)\,\|\,\bar{q}(z)) = \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{\bar{q}(z)}\right] \approx \mathbb{E}_{q(z)}\left[\log \frac{D(z)}{1 - D(z)}\right]$$
The discriminator and the VAE are trained jointly.
The discriminator $D$ is trained to classify between samples from
$q(z)$ and $\bar{q}(z)$.
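A hedged sketch of how the discriminator-based TC estimate could plug into the VAE update (PyTorch assumed; the layer sizes and the two-logit convention are illustrative choices, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

d = 10  # latent dimensionality (illustrative)
# Discriminator outputs two logits: index 0 for "z ~ q(z)", index 1 for "z ~ q̄(z)".
D = nn.Sequential(
    nn.Linear(d, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 2),
)

def tc_estimate(z):
    # With a softmax over two logits, log D(z)/(1 - D(z)) is the logit difference.
    logits = D(z)
    return (logits[:, 0] - logits[:, 1]).mean()

# VAE update: total_vae_loss = recon_loss + kl_to_prior + gamma * tc_estimate(z)
# Discriminator update: cross-entropy separating z ~ q(z)
# from permute_dims(z) ~ q̄(z).
```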
Total Correlation Penalty IV / V
[Figure-only slides.]
Metric for Disentanglement I
Disentanglement metric proposed in [1]
Weaknesses
1. The metric is sensitive to the hyperparameters of the linear
classifier optimization.
2. Learned representations can be linear combinations of several
dimensions, so using a linear classifier is inappropriate.
3. The metric has a failure mode: when only K − 1 out of K
factors are disentangled, the classifier can still reach 100%
accuracy.
Metric for Disentanglement II
Proposed metric for disentanglement
1. Choose a factor $k$ and generate data with this factor fixed, but
all other factors varying randomly.
2. Obtain their representations.
3. Normalize each dimension by its empirical standard deviation $s_d$
over the full data (or a large enough random subset).
4. Take the empirical variance $\mathrm{Var}_l\!\left[z_d^{(l)}/s_d\right]$ in each dimension of
the normalized representations.
5. The target index $k$ and the index of the dimension with the lowest
variance are fed to the majority-vote classifier (see the sketch
after this list).
If the representation is perfectly disentangled, the variance of the
dimension corresponding to the fixed factor will be 0.
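A minimal sketch of steps 1–5 for producing one vote, assuming hypothetical helpers `encode` (returning latent means as a NumPy array) and `generate_batch(fixed_factor=k)`; both names are assumptions, not the paper's code:

```python
import numpy as np

def one_vote(k, encode, generate_batch, s):
    """Produce one (argmin-dimension, factor) vote for the metric.

    k: index of the generative factor held fixed
    s: per-dimension empirical std of representations over the dataset
    """
    x = generate_batch(fixed_factor=k)   # L samples with factor k fixed
    z = encode(x)                        # representations, shape (L, d)
    var = np.var(z / s, axis=0)          # variance of each normalized dimension
    return int(np.argmin(var)), k        # the vote (a_i, b_i)
```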
Metric for Disentanglement III
As the representations are normalized, $\arg\min_d \mathrm{Var}_l\!\left[z_d^{(l)}/s_d\right]$ is
invariant to rescaling of the representations in each dimension.
Majority-vote classification¹
1. For each batch of $L$ samples, one vote $(a_i, b_i)$ with
$a_i \in \{1, \dots, D\}$, $b_i \in \{1, \dots, K\}$ is obtained.
2. Given $M$ votes $(a_i, b_i)_{i=1}^{M}$, the voting matrix
$V_{jk} = \sum_{i=1}^{M} \mathbb{I}(a_i = j, b_i = k)$ is formed.
3. Then the majority-vote classifier is defined as
$C(j) = \arg\max_k V_{jk}$.
4. In other words, $C(j)$ is the generative factor $k$ that most often
yields the lowest variance in latent dimension $j$.
5. The metric is the accuracy of this classifier,
$$\frac{\sum_{j=1}^{D} V_{jC(j)}}{\sum_j \sum_k V_{jk}}.$$
Note that the majority-vote classifier has no optimisation
hyperparameters to tune, and the resulting classifier is a
deterministic function of the training data (see the sketch below).
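A sketch of the voting matrix and the resulting accuracy, assuming M votes collected with the hypothetical `one_vote` helper above (0-indexed for convenience):

```python
import numpy as np

def metric_accuracy(votes, D, K):
    """votes: list of (a_i, b_i) pairs with a_i in [0, D), b_i in [0, K)."""
    V = np.zeros((D, K), dtype=int)
    for a, b in votes:
        V[a, b] += 1                 # V_jk counts factor-k wins for dimension j
    C = V.argmax(axis=1)             # majority-vote classifier C(j)
    return V[np.arange(D), C].sum() / V.sum()
```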
Metric for Disentanglement IV
Comparison between metrics ([1, 2])
1. The new disentanglement metric of [2] is much less sensitive to
hyperparameters than the old metric of [1].
2. The old metric is very sensitive to the number of iterations, and
it constantly improves with more iterations.
¹ Please refer to the code [Link] for more details.
Experiments I
Datasets
Datasets with known generative factors
1. 2D Shapes dataset [7]: n = 737,280, dim = 64 × 64
f_k: shape (3), scale (6), orientation (40), x-position (32),
y-position (32)
2. 3D Shapes dataset [8]: n = 480,000, dim = 64 × 64 × 3
f_k: shape (4), scale (8), orientation (15), floor color (10), wall
color (10), object color (10)
Datasets with unknown generative factors
1. 3D Faces dataset [9]: n = 239,840, dim = 64 × 64 × 3
2. 3D Chairs dataset [10]: n = 86,366, dim = 64 × 64 × 3
3. CelebA dataset (cropped) [11]: n = 202,599,
dim = 64 × 64 × 3
Experiments II
Effect of γ compared to β in β-VAE
Experiments III
Relationship between γ and reconstruction error
Experiments IV
Total correlation
Experiments V
Latent Traversal - 2D Shapes Dataset
Experiments VI
Latent Traversal - 3D Shapes Dataset
Experiments VII
Latent Traversal - 3D Chairs Dataset
Experiments VIII
Latent Traversal - 3D Faces and CelebA
Conclusion
This work introduces FactorVAE, a novel method for learning
disentangled representations.
A new disentanglement metric is proposed.
Limitations
Low total correlation is necessary but not sufficient for
disentangling independent factors of variation (if all but one
latent dimension collapsed to the prior, TC would be 0, yet the
representation would not be disentangled).
The proposed metric requires generating samples with one
factor held fixed, which is not always possible (e.g., when the
training set does not cover all combinations of factors).
The metric is also unsuitable for data with non-independent
factors of variation.
References
[1] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot,
M. Botvinick, S. Mohamed, and A. Lerchner, "beta-VAE:
Learning basic visual concepts with a constrained variational
framework," ICLR, 2017.
[2] H. Kim and A. Mnih, "Disentangling by factorising," arXiv
preprint arXiv:1802.05983, 2018.
[3] Y. Bengio, A. Courville, and P. Vincent, "Representation
learning: A review and new perspectives," IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 35, no. 8,
pp. 1798–1828, 2013.
[4] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J.
Gershman, "Building machines that learn and think like
people," Behavioral and Brain Sciences, vol. 40, 2017.
[5] S. Watanabe, "Information theoretical analysis of multivariate
correlation," IBM Journal of Research and Development, vol. 4,
no. 1, pp. 66–82, 1960.
Total Correlation
Definition
For given $n$ random variables $\{X_1, X_2, \dots, X_n\}$, the total
correlation is defined as the KL divergence from the joint
distribution $p(X_1, \dots, X_n)$ to the independent distribution
$p(X_1)p(X_2)\cdots p(X_n)$:
$$\mathrm{TC}(X_1, X_2, \dots, X_n) \equiv D_{\mathrm{KL}}\!\left[p(X_1, \dots, X_n)\,\|\,p(X_1)\,p(X_2)\cdots p(X_n)\right]$$
$$\mathrm{TC}(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, X_2, \dots, X_n),$$
i.e., the amount of information shared among the variables in the set.
A near-zero TC indicates that the variables in the group are
essentially statistically independent.
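A tiny numerical check of the entropy identity, using a hypothetical pair of dependent binary variables (a sketch, not from the paper):

```python
import numpy as np

# Joint distribution of two binary variables (rows: X1, cols: X2).
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

def H(dist):
    dist = dist[dist > 0]                 # entropy in nats, skipping zero cells
    return -(dist * np.log(dist)).sum()

# TC = H(X1) + H(X2) - H(X1, X2)
tc = H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.ravel())
print(tc)  # ≈ 0.193 > 0: X1 and X2 are dependent; an independent joint gives 0.
```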