2. ● Given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution P(X, Y).
● X and Y can be anything from number sequences to images.
Generative Models
4. ● Generating new samples from the target distribution.
● Generating samples with particular properties.
● Style transfer.
● Artifact removal.
● … many more.
Generative Models: use cases
7. ● A neural network that aims to learn an efficient data representation.
● Usually composed of two networks: an encoder and a decoder.
● Typically we aim to have Decoder(Encoder(x)) be as close to x as possible.
● Encoder(x) is the “compressed” representation of x.
Autoencoder
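The encoder/decoder pair above can be sketched concretely: for a *linear* autoencoder the optimal solution coincides with PCA, so a minimal numpy toy (all names illustrative) can show the bottleneck and the Decoder(Encoder(x)) ≃ x goal:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data that lies exactly in a 2-D subspace of R^5.
basis = rng.normal(size=(2, 5))
codes = rng.normal(size=(100, 2))
X = codes @ basis

# A linear autoencoder's optimum coincides with PCA, so the optimal
# encoder/decoder can be read off the SVD instead of trained.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # bottleneck ("compressed") size
encode = lambda x: x @ Vt[:k].T   # Encoder(x): R^5 -> R^2
decode = lambda z: z @ Vt[:k]     # Decoder(z): R^2 -> R^5

X_hat = decode(encode(X))
print(np.allclose(X, X_hat))  # True: rank-2 data is reconstructed exactly
```

Real autoencoders replace these linear maps with deep networks trained by gradient descent, but the compression-then-reconstruction structure is the same.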
8. ● Since the encoding is deliberately made small, the network is forced to capture the most important details.
● Example: noise removal on MNIST.
● Unlike GANs, autoencoders are easy to train, but they can produce blurry images, especially with L2 losses.
● What about generating new samples?
Autoencoder
9. ● An autoencoder where the latent representation is split into a mean and a variance.
● Enforce the latent distribution to be as close to N(0, 1) as possible.
● VAE_loss = content_loss(generated, real) + KL(latent, N(0, 1))
Variational Autoencoder
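The KL term of the loss above has a standard closed form for a diagonal Gaussian against the standard normal N(0, 1); a numpy sketch (function names are illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a diagonal
    Gaussian, parameterized by log_var = log(sigma^2)."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var)

def vae_loss(generated, real, mu, log_var, beta=1.0):
    # content_loss here is plain L2; beta weights the KL term.
    content = np.sum((generated - real) ** 2)
    return content + beta * kl_to_standard_normal(mu, log_var)

# Sanity check: the KL term vanishes exactly when the latent already
# matches N(0, 1), i.e. mu = 0 and sigma^2 = 1.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0
```

In a trained VAE, `mu` and `log_var` are the two heads of the encoder, and sampling uses the reparameterization trick; the loss itself is exactly this sum.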
18. ● Two networks - a generator and a discriminator - that compete in a two-player zero-sum game.
● The generator aims to generate realistic samples; the discriminator aims to distinguish them from real ones.
GAN
19. ● GAN objective (D(x) represents the probability that x came from the real data):
● TL;DR: GANs make it possible to learn a good loss function in cases where one is hard to specify explicitly.
GAN
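The objective itself did not survive extraction from the slide; presumably it is the standard minimax objective from Goodfellow et al. (2014), which with D(x) as defined above reads:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator maximizes this value; the generator minimizes it by making D(G(z)) approach 1 on generated samples.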
20. ● Problem: given two domains, find a way to map between them.
● Combine GANs with an idea of cycle consistency.
● Given two domains X and Y and two functions, F mapping X -> Y and G mapping Y -> X, ensure
that G(F(x)) ≃ x and F(G(y)) ≃ y.
CycleGAN
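The cycle-consistency constraint can be illustrated with a toy pair of invertible maps standing in for the two networks (numpy; the names F and G match the slide, everything else is illustrative):

```python
import numpy as np

# Toy stand-ins for the two mappings: in a real CycleGAN these are neural
# networks trained jointly with adversarial losses; here they are simple
# maps that happen to be exact inverses.
F = lambda x: 2.0 * x + 1.0      # F: X -> Y
G = lambda y: (y - 1.0) / 2.0    # G: Y -> X

def cycle_consistency_loss(x_batch, y_batch):
    # L1 cycle losses: G(F(x)) should recover x, and F(G(y)) should recover y.
    loss_x = np.mean(np.abs(G(F(x_batch)) - x_batch))
    loss_y = np.mean(np.abs(F(G(y_batch)) - y_batch))
    return loss_x + loss_y

x = np.linspace(-1.0, 1.0, 5)
y = np.linspace(0.0, 3.0, 5)
print(cycle_consistency_loss(x, y))  # 0.0 for exact inverses
```

During training this loss is added to the two adversarial losses, pushing F and G toward being (approximate) inverses of each other without any paired supervision.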
26. ● UNSUPERVISED!!!
● Highly dependent on the network architecture - e.g., when I trained a CycleGAN on faces, results
were much worse with several sequential ResNet blocks than with a U-Net.
● Can be finicky to train (w.r.t. hyperparameters).
● There are better (and simpler) alternatives for style transfer.
CycleGAN
28. ● Solves the task of makeup transfer (but can be applied to other variants of style/attribute transfer).
● Key idea: unlike a regular CycleGAN, where we expect G(F(x)) ≃ x and F(G(y)) ≃ y, here the functions are
“asymmetric”: we train a function G(x, y) that transfers the makeup from image y onto image x, and a
function F that “cleans” the face. We expect G( F(y), G(x, y) ) ≃ y and F(G(x, y)) ≃ x.
● Two discriminators for faces with and without makeup - regular adversarial loss.
PairCycleGAN (CVPR 18)
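The asymmetric cycle constraints can be sanity-checked on a toy representation where an “image” is just a (face, makeup) pair (purely illustrative, not the paper's formulation):

```python
# Toy model: an "image" is (face_id, makeup_id), where makeup_id 0 = no makeup.
def G(x, y):
    """Transfer y's makeup onto x's face (keeps x's identity)."""
    return (x[0], y[1])

def F(z):
    """'Clean' the face, i.e. remove any makeup."""
    return (z[0], 0)

x = ("alice", 0)  # face without makeup
y = ("bob", 7)    # face with makeup style 7

transferred = G(x, y)              # alice wearing bob's makeup
assert G(F(y), transferred) == y   # re-applying the transferred makeup to
                                   # bob's cleaned face reconstructs y
assert F(transferred) == x         # cleaning the result reconstructs x
print("asymmetric cycle constraints hold")
```

The real networks of course operate on pixels and are only pushed toward these identities by L1 cycle losses; the toy just shows why the two constraints pin down the asymmetric pair (G, F).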
29. ● Network used for G:
● Losses:
1) Adversarial losses for G and F
2) “Cycle” losses using L1
3) Using the L1 cycle loss alone for the “style” part leads to blurry results, so an extra adversarial loss is added.
The final loss is a linear combination of the above (with weighting coefficients).
PairCycleGAN
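Assembling the final loss from the three groups above is just a weighted sum; a sketch (coefficient names and values are illustrative, not the paper's):

```python
def pair_cyclegan_loss(adv_G, adv_F, cycle_l1, style_adv,
                       lam_adv=1.0, lam_cyc=10.0, lam_style=1.0):
    # Linear combination of: 1) adversarial losses for G and F,
    # 2) L1 cycle losses, 3) the extra adversarial loss on style.
    return lam_adv * (adv_G + adv_F) + lam_cyc * cycle_l1 + lam_style * style_adv

print(pair_cyclegan_loss(1.0, 1.0, 1.0, 1.0))  # 13.0 with these weights
```

In practice the coefficients are hyperparameters tuned so that no single term dominates training.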
30. ● The paper employs many other tricks, such as using separate networks for different parts of the face.
PairCycleGAN
31. ● Goal: Modify particular attributes of an image (e.g. has / doesn’t have glasses)
Fader Networks (NIPS 17)
32. ● A network with three components: an encoder, a decoder, and a discriminator.
● Key idea: the adversary should not be able to guess the original attribute of x from its latent representation.
Fader Networks
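The adversarial idea can be sketched as two losses: the adversary tries to predict the attribute from the latent code, while the encoder is trained to make that prediction fail (numpy sketch; all names are illustrative):

```python
import numpy as np

def fader_losses(x, x_rec, attr_true, attr_pred, lam=1.0):
    """attr_pred: the adversary's predicted probability (from the latent
    code E(x) alone) that the attribute is present."""
    eps = 1e-8
    rec = np.mean((x - x_rec) ** 2)  # reconstruction term (encoder+decoder)
    # Adversary's loss: ordinary cross-entropy on the true attribute.
    adv = -np.mean(attr_true * np.log(attr_pred + eps)
                   + (1 - attr_true) * np.log(1 - attr_pred + eps))
    # Encoder's "fooling" term: push the adversary toward the flipped
    # attribute, so the latent code carries no attribute information.
    fool = -np.mean((1 - attr_true) * np.log(attr_pred + eps)
                    + attr_true * np.log(1 - attr_pred + eps))
    enc_loss = rec + lam * fool
    return enc_loss, adv

# A confident, correct adversary: its own loss is small, but the encoder
# is heavily penalized for leaking the attribute into the latent code.
enc_loss, adv_loss = fader_losses(np.zeros(3), np.zeros(3),
                                  np.array([1.0]), np.array([0.99]))
print(enc_loss > adv_loss)  # True
```

At convergence the adversary's prediction is forced toward 0.5, and the decoder instead receives the attribute explicitly, which is what makes the attribute a controllable "fader".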
33. ● Given a semantic layout, generate a realistic image.
● Key idea: don't generate new patterns - reuse patterns from the training set and use a neural network to repaint
the new regions with them.
Semi-parametric image
synthesis
34. ● Takes up to 3 minutes to generate a single image (on a GPU)!
● Likely the best picture quality so far.
Semi-parametric image
synthesis (CVPR 18)
36. ● Can GANs actually create something new?
- One paper at CVPR 2018 argues that’s not really the case.
● Can we achieve great results without defining the structure explicitly?
- Many recent results rely on explicitly defining the task structure.
● How much compute power will be required to get realistic results in practical applications?
Open Questions