2. ● Given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution P(X, Y).
● X and Y can be anything from number sequences to images.
Generative Models
4. ● Generating new samples from the target distribution.
● Generating samples with particular properties.
● Style transfer.
● Artifact removal.
● … many more.
Generative Models: use cases
7. ● A neural network that aims to learn an efficient data representation.
● Usually composed of two networks: an encoder and a decoder.
● Typically we aim to have Decoder(Encoder(x)) be as close to x as possible.
● Encoder(x) is the “compressed” representation of x.
Autoencoder
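The encoder/decoder pair above can be sketched concretely: for a *linear* autoencoder the optimal solution coincides with PCA, so a minimal numpy toy (all names illustrative) can show the bottleneck and the Decoder(Encoder(x)) ≃ x goal:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data that lies exactly in a 2-D subspace of R^5.
basis = rng.normal(size=(2, 5))
codes = rng.normal(size=(100, 2))
X = codes @ basis

# A linear autoencoder's optimum coincides with PCA, so the optimal
# encoder/decoder can be read off the SVD instead of trained.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # bottleneck ("compressed") size
encode = lambda x: x @ Vt[:k].T   # Encoder(x): R^5 -> R^2
decode = lambda z: z @ Vt[:k]     # Decoder(z): R^2 -> R^5

X_hat = decode(encode(X))
print(np.allclose(X, X_hat))  # True: rank-2 data is reconstructed exactly
```

Real autoencoders replace these linear maps with deep networks trained by gradient descent, but the compression-then-reconstruction structure is the same.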
8. ● Since the encoding is deliberately made small, the network is forced to capture the most important details.
● Example: noise removal on MNIST.
● Unlike GANs, autoencoders are easy to train, but they can produce blurry images, especially with L2 losses.
● What about generating new samples?
Autoencoder
9. ● An autoencoder where the latent representation is split into a mean and a variance.
● Enforce the latent distribution to be as close to N(0, 1) as possible.
● VAE_loss = content_loss(generated, real) + KL(latent, N(0, 1))
Variational Autoencoder
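The KL term of the loss above has a standard closed form for a diagonal Gaussian against the standard normal N(0, 1); a numpy sketch (function names are illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal
    Gaussian, parameterized by log_var = log(sigma^2)."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)

def vae_loss(generated, real, mu, log_var, beta=1.0):
    # content_loss here is plain L2; beta weights the KL term.
    content = np.sum((generated - real) ** 2)
    return content + beta * kl_to_standard_normal(mu, log_var)

# Sanity check: the KL term vanishes exactly when the latent already
# matches N(0, 1), i.e. mu = 0 and sigma^2 = 1.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0
```

In a trained VAE, `mu` and `log_var` are the two heads of the encoder, and sampling uses the reparameterization trick; the loss itself is exactly this sum.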
18. ● Two networks - a generator and a discriminator - that compete in a two-player zero-sum game.
● The generator aims to generate realistic samples; the discriminator aims to distinguish them from real ones.
GAN
19. ● GAN objective (D(x) represents the probability that x came from the real data):
● TL;DR: GANs make it possible to learn a good loss function in cases where one is hard to specify explicitly.
GAN
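The objective itself did not survive extraction from the slide; presumably it is the standard minimax objective from Goodfellow et al. (2014), which with D(x) as defined above reads:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator maximizes this value; the generator minimizes it by making D(G(z)) approach 1 on generated samples.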
20. ● Problem: given two domains, find a way to map between them.
● Combine GANs with an idea of cycle consistency.
● Given two domains X and Y and two functions, F mapping X -> Y and G mapping Y -> X, ensure
that G(F(x)) ≃ x and F(G(y)) ≃ y.
CycleGAN
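The cycle-consistency constraint can be illustrated with a toy pair of invertible maps standing in for the two networks (numpy; the names F and G match the slide, everything else is illustrative):

```python
import numpy as np

# Toy stand-ins for the two mappings: in a real CycleGAN these are neural
# networks trained jointly with adversarial losses; here they are simple
# maps that happen to be exact inverses.
F = lambda x: 2.0 * x + 1.0      # F: X -> Y
G = lambda y: (y - 1.0) / 2.0    # G: Y -> X

def cycle_consistency_loss(x_batch, y_batch):
    # L1 cycle losses: G(F(x)) should recover x, and F(G(y)) should recover y.
    loss_x = np.mean(np.abs(G(F(x_batch)) - x_batch))
    loss_y = np.mean(np.abs(F(G(y_batch)) - y_batch))
    return loss_x + loss_y

x = np.linspace(-1.0, 1.0, 5)
y = np.linspace(0.0, 3.0, 5)
print(cycle_consistency_loss(x, y))  # 0.0 for exact inverses
```

During training this loss is added to the two adversarial losses, pushing F and G toward being (approximate) inverses of each other without any paired supervision.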
26. ● UNSUPERVISED!!!
● Highly dependent on the network architecture - e.g., when I trained a CycleGAN on faces, results
were much worse with several sequential ResNet blocks than with a U-Net.
● Can be finicky to train (w.r.t. hyperparameters).
● There are better (and simpler) alternatives for style transfer.
CycleGAN
28. ● Solves the task of makeup transfer (but can be applied to other variants of style/attribute transfer).
● Key idea: unlike a regular CycleGAN, where we expect G(F(x)) ≃ x and F(G(y)) ≃ y, here the functions are
“asymmetric”: we train a function G(x, y) that transfers the makeup from image y onto image x, and a
function F that “cleans” the face. We expect G( F(y), G(x, y) ) ≃ y and F(G(x, y)) ≃ x.
● Two discriminators for faces with and without makeup - regular adversarial loss.
PairCycleGAN (CVPR 18)
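The asymmetric cycle constraints can be sanity-checked on a toy representation where an “image” is just a (face, makeup) pair (purely illustrative, not the paper's formulation):

```python
# Toy model: an "image" is (face_id, makeup_id), where makeup_id 0 = no makeup.
def G(x, y):
    """Transfer y's makeup onto x's face (keeps x's identity)."""
    return (x[0], y[1])

def F(z):
    """'Clean' the face, i.e. remove any makeup."""
    return (z[0], 0)

x = ("alice", 0)  # face without makeup
y = ("bob", 7)    # face with makeup style 7

transferred = G(x, y)              # alice wearing bob's makeup
assert G(F(y), transferred) == y   # re-applying the transferred makeup to
                                   # bob's cleaned face reconstructs y
assert F(transferred) == x         # cleaning the result reconstructs x
print("asymmetric cycle constraints hold")
```

The real networks of course operate on pixels and are only pushed toward these identities by L1 cycle losses; the toy just shows why the two constraints pin down the asymmetric pair (G, F).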
29. ● Network used for G:
● Losses:
1) Adversarial losses for G and F
2) “Cycle” losses using L1
3) Using the L1 cycle loss alone for the “style” part leads to blurry results, so an extra adversarial loss is added.
The final loss is a linear combination of the above (with weighting coefficients).
PairCycleGAN
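Assembling the final loss from the three groups above is just a weighted sum; a sketch (coefficient names and values are illustrative, not the paper's):

```python
def pair_cyclegan_loss(adv_G, adv_F, cycle_l1, style_adv,
                       lam_adv=1.0, lam_cyc=10.0, lam_style=1.0):
    # Linear combination of: 1) adversarial losses for G and F,
    # 2) L1 cycle losses, 3) the extra adversarial loss on style.
    return lam_adv * (adv_G + adv_F) + lam_cyc * cycle_l1 + lam_style * style_adv

print(pair_cyclegan_loss(1.0, 1.0, 1.0, 1.0))  # 13.0 with these weights
```

In practice the coefficients are hyperparameters tuned so that no single term dominates training.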
30. ● The paper employs many other tricks, such as using separate networks for different parts of the face.
PairCycleGAN
31. ● Goal: Modify particular attributes of an image (e.g. has / doesn’t have glasses)
Fader Networks (NIPS 17)
32. ● A network with three components: an encoder, a decoder, and a discriminator.
● Key idea: the adversary should not be able to guess the original attribute of x from its latent representation.
Fader Networks
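The adversarial idea can be sketched as two losses: the adversary tries to predict the attribute from the latent code, while the encoder is trained to make that prediction fail (numpy sketch; all names are illustrative):

```python
import numpy as np

def fader_losses(x, x_rec, attr_true, attr_pred, lam=1.0):
    """attr_pred: the adversary's predicted probability (from the latent
    code E(x) alone) that the attribute is present."""
    eps = 1e-8
    rec = np.mean((x - x_rec) ** 2)  # reconstruction term (encoder+decoder)
    # Adversary's loss: ordinary cross-entropy on the true attribute.
    adv = -np.mean(attr_true * np.log(attr_pred + eps)
                   + (1 - attr_true) * np.log(1 - attr_pred + eps))
    # Encoder's "fooling" term: push the adversary toward the flipped
    # attribute, so the latent code carries no attribute information.
    fool = -np.mean((1 - attr_true) * np.log(attr_pred + eps)
                    + attr_true * np.log(1 - attr_pred + eps))
    enc_loss = rec + lam * fool
    return enc_loss, adv

# A confident, correct adversary: its own loss is small, but the encoder
# is heavily penalized for leaking the attribute into the latent code.
enc_loss, adv_loss = fader_losses(np.zeros(3), np.zeros(3),
                                  np.array([1.0]), np.array([0.99]))
print(enc_loss > adv_loss)  # True
```

At convergence the adversary's prediction is forced toward 0.5, and the decoder instead receives the attribute explicitly, which is what makes the attribute a controllable "fader".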
33. ● Given a semantic layout, generate a realistic image.
● Key idea: don't generate new patterns - reuse patterns from the training set and use a neural network to repaint
the new regions with them.
Semi-parametric image
synthesis
34. ● Takes up to 3 minutes to generate a single image (on a GPU)!
● Likely the best picture quality so far.
Semi-parametric image
synthesis (CVPR 18)
36. ● Can GANs actually create something new?
- One paper at CVPR 2018 argues that’s not really the case.
● Can we achieve great results without defining the structure explicitly?
- Many recent results rely on explicitly defining the task structure.
● How much compute power will be required to get realistic results in practical applications?
Open Questions