Deep Generative Learning for All

Universitat Politècnica de Catalunya
Deep Generative
Learning for All
(a.k.a. The GenAI Hype)
Xavier Giro-i-Nieto
@DocXavi
xavigiro.upc@gmail.com
Associate Professor (on leave)
Universitat Politècnica de Catalunya
Institut de Robòtica Industrial
ELLIS Unit Barcelona
Spring 2020
[Summer School website]
Acknowledgements
Santiago Pascual
santi.pascual@upc.edu
@santty128
PhD 2019
Universitat Politècnica de Catalunya
Technical University of Catalonia
Albert Pumarola
apumarola@iri.upc.edu
@AlbertPumarola
PhD 2021
Universitat Politècnica de Catalunya
Technical University of Catalonia
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Gerard I. Gállego
PhD Student
Universitat Politècnica de Catalunya
gerard.ion.gallego@upc.edu
@geiongallego
Acknowledgements
Eduard Ramon
Applied Scientist
Amazon Barcelona
@eram1205
Wentong Liao
Applied Scientist
Amazon Barcelona
Ciprian Corneanu
Applied Scientist
Amazon Seattle
Laia Tarrés
PhD Student
Universitat Politècnica de Catalunya
laia.tarres@upc.edu
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Image generation
#StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and
Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
#DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
Image generation
#DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional
Image Generation with CLIP Latents." 2022. [blog]
Text-to-Image generation
Text-to-Video generation
#Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al.
"Make-a-video: Text-to-video generation without text-video data." arXiv 2022.
“A dog wearing a Superhero
outfit with red cape flying
through the sky”
Synthetic labels to train discriminative models
#BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio
Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
Video Super-resolution
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for
GAN-based video generation. ACM Transactions on Graphics 2020.
Human Motion Transfer
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
Speech Enhancement
Recover lost information/add enhancing details by learning the natural distribution of audio
samples.
original
enhanced
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Discriminative vs Generative Models
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
a. Pθ(Y|X): Discriminative Models
b. Pθ(X): Generative Models
c. Pθ(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(Y|X): Discriminative Models
Slide credit:
Albert Pumarola (UPC 2019)
Classification Regression
Text Prob. of being a Potential Customer
Image
Audio Speech Translation
Jim Carrey
What Language?
X=Data
Y=Labels
θ = Model parameters
Discriminative Modeling
Pθ(Y|X)
Diagram: input → Network (θ) → output class probabilities (0.01, 0.09, 0.9)
Figure credit: Javier Ruiz (UPC TelecomBCN)
Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs.
Pθ(Y | X = [pixel1, pixel2, …, pixel784])
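A minimal sketch of what Pθ(Y | X) could look like in code, assuming a single linear layer followed by a softmax. The number of classes, the random (untrained) weights and the function name `predict_proba` are illustrative assumptions, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_PIXELS = 784   # flattened input, as in the slide
NUM_CLASSES = 3    # hypothetical number of classes

# Hypothetical, untrained parameters theta = (weights, biases).
W = rng.normal(scale=0.01, size=(NUM_PIXELS, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)

def predict_proba(x):
    """Return P_theta(Y | X = x) as a vector of class probabilities."""
    logits = x @ W + b
    logits = logits - logits.max()   # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

x = rng.random(NUM_PIXELS)           # stand-in for one input image
p = predict_proba(x)
print(p, p.sum())
```

The output is a probability vector over the labels Y, one entry per class, which is what the slide's figure shows at the network output.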
Pθ(Y|X): Discriminative Models
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Slide Concept: Albert Pumarola (UPC 2019)
Pθ(X): Generative Models
Classification Regression Generative
Text Prob. of being a Potential Customer
“What about Ron magic?” offered Ron.
To Harry, Ron was loud, slow and soft
bird. Harry did not like to think about
birds.
Image
Audio Language Translation
Music Composer and Interpreter
MuseNet Sample
Jim Carrey
What Language?
Discriminative Modeling
Pθ(Y|X)
Generative Modeling
Pθ(X)
X=Data
Y=Labels
θ = Model parameters
Each real sample xi comes from an M-dimensional probability distribution P(X).
X = {x1, x2, …, xN}
Pθ(X): Generative Models
1) We want our model with parameters θ to output samples with distribution Pθ(X), matching the distribution of our training data P(X).
2) We can then sample new points from Pθ(X) that plausibly look as if they had been drawn from P(X).
P(X): Distribution of training data
Pλ,μ,σ(X): Distribution modelled with parameters θ = (λ, μ, σ)
Example: Gaussian Mixture Models (GMM)
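As a small illustration of a parametric generative model, the sketch below defines a one-dimensional, two-component GMM and samples from it. The parameter values and the reading of λ as mixture weights are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters theta = (lambda, mu, sigma) of a 1-D, 2-component GMM.
lam = np.array([0.3, 0.7])      # mixture weights, summing to 1
mu = np.array([-2.0, 3.0])      # component means
sigma = np.array([0.5, 1.0])    # component standard deviations

def gmm_pdf(x):
    """Density P_theta(x) = sum_k lam_k * N(x; mu_k, sigma_k^2)."""
    x = np.asarray(x, dtype=float)[..., None]
    comp = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return (lam * comp).sum(axis=-1)

def gmm_sample(n):
    """Draw n samples: pick a component, then sample from its Gaussian."""
    k = rng.choice(len(lam), size=n, p=lam)
    return rng.normal(mu[k], sigma[k])

samples = gmm_sample(10_000)
print(samples[:5], gmm_pdf([0.0, 3.0]))
```

Fitting θ to data (for instance with expectation-maximisation) is left out; the point here is only that a known Pθ(X) can be both evaluated and sampled.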
Pθ(X): Generative Models
What are the parameters θ we need to estimate in deep neural networks ?
θ = (weights & biases)
output
Network (θ)
?
Pθ(X): Generative Models
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(X|Y): Conditioned Generative Models
Conditional probabilities Pθ(X|Y) model the effect of the conditioning variables Y on the generative process:
X = {x1, x2, …, xN}
Y = {y1, y2, …, yN}
DOG
CAT
TRUCK
PIZZA
THRILLER
SCI-FI
HISTORY
/aa/
/e/
/o/
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. Generative Adversarial Networks (GANs)
b. Auto-regressive
c. Variational Autoencoders (VAEs)
d. Diffusion
Our learned model should be able to make up new samples from the distribution,
not just copy and paste existing samples!
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Sampling
Philip Isola, Generative Models of Images. MIT 2023.
Sampling
Slide concept: Albert Pumarola (UPC 2019)
Learn
Sample Out
Training Dataset
Generated Samples
Feature
space
Manifold Pθ(X)
“Model the data distribution so that we can sample new points out of the distribution”
Sampling
Sampling
z
Generated Samples
How could we generate diverse samples from a deterministic deep neural network ?
Generator
(θ)
Sample z from a known prior, for example, a multivariate normal distribution N(0, I).
Example: dim(z)=2
x’
z
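The idea of obtaining diverse outputs from a deterministic network can be sketched as follows: the randomness comes only from z, drawn from the prior N(0, I). The two-layer network, its sizes and its random weights are hypothetical stand-ins for a trained generator.

```python
import numpy as np

rng = np.random.default_rng(0)

Z_DIM, HIDDEN, X_DIM = 2, 16, 784   # dim(z)=2 as in the slide; other sizes are hypothetical

# Hypothetical untrained generator weights theta.
W1 = rng.normal(size=(Z_DIM, HIDDEN))
W2 = rng.normal(size=(HIDDEN, X_DIM))

def generator(z):
    """Deterministic mapping x' = G_theta(z)."""
    h = np.tanh(z @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid, values like pixel intensities

z = rng.standard_normal((5, Z_DIM))   # five samples from the prior N(0, I)
x_gen = generator(z)                  # five different generated samples x'
print(x_gen.shape)
```

Calling `generator` twice with the same z returns the same x’; different z values give different samples.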
Slide concept: Albert Pumarola (UPC 2019)
Learn
Training Dataset
Interpolated Samples
Feature
space
Manifold Pθ(X)
Traversing the learned manifold through interpolation.
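A minimal sketch of latent interpolation, assuming simple linear interpolation between two latent codes (other schemes, such as spherical interpolation, are also used in practice). Each code on the path would then be passed through a trained generator or decoder, which is not included here.

```python
import numpy as np

def interpolate(z_a, z_b, steps):
    """Return `steps` latent codes on the straight line from z_a to z_b."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

z_a = np.array([-1.0, 0.5])   # hypothetical latent code of a first sample
z_b = np.array([2.0, -1.5])   # hypothetical latent code of a second sample
path = interpolate(z_a, z_b, steps=5)
print(path)
```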
Interpolation
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Credit: Santiago Pascual [slides] [video]
Generator & Discriminator
We have two modules: Generator (G) and Discriminator (D).
● They “fight” against each other during training→ Adversarial Learning
D’s goal: Classify between real samples and those produced by G.
G’s goal: Fool D into misclassifying them.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
Discriminator
Discriminator network D → binary classifier between real (x) and generated (x’) samples.
x’ → Discriminator (θ) → Generated (1)
x → Discriminator (θ) → Real (0)
Diagram: latent random variable z → Generator → generated sample; database of real-world samples → real sample; sample → Discriminator → real/generated decision → loss.
Generator & Discriminator
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to
detect whether money is real or fake.
Police (D), comparing two 100 banknotes: “FAKE: It’s not even green”
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Police (D), comparing two 100 banknotes: “FAKE: There is no watermark”
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Police (D), comparing two 100 banknotes: “FAKE: Watermark should be rounded”
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
After enough iterations, and if the counterfeiter is good enough (for the G network, this means having enough parameters), the police should be confused: REAL? FAKE?
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Adversarial Training
Diagram: latent random variable → Generator → generated sample; real-world images → real sample; sample → Discriminator → real/generated decision → loss.
Alternate between training the discriminator and generator
Neural Network
Neural Network
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
1. Fix generator weights, draw samples from both real world and generated images
2. Train discriminator to distinguish between real world and generated images
Backprop error to
update discriminator
weights
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
Backprop error to
update discriminator
weights
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label should a generated image receive when training the discriminator? Consider a binary encoding of “1” (Real) and “0” (Fake).
Adversarial Training: Generator
1. Fix discriminator weights
2. Sample from generator by injecting noise.
3. Backprop error through discriminator to update generator weights
Backprop error to
update generator
weights
Figure: Kevin McGuinness (DCU)
Adversarial Training: Generator
Backprop error to
update generator
weights
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label should a generated image receive when training the generator? Consider a binary encoding of “1” (Real) and “0” (Fake).
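The two alternating steps can be illustrated through their losses. This is a sketch assuming the commonly used binary cross-entropy formulation with the “1” (Real) / “0” (Fake) encoding; the discriminator outputs below are made-up numbers rather than the result of a trained network, and no weight update is performed.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

# Hypothetical discriminator outputs (probability of "real") for one batch.
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples
d_fake = np.array([0.2, 0.1, 0.3])    # D(G(z)) on generated samples

# Discriminator step (generator fixed): real -> 1, generated -> 0.
loss_d = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))

# Generator step (discriminator fixed): generated samples are labelled 1,
# so the loss is low only when D is fooled.
loss_g = bce(d_fake, np.ones(3))

print(loss_d, loss_g)
```

In a full training loop, `loss_d` would be backpropagated into the discriminator weights only, and `loss_g` through the frozen discriminator into the generator weights.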
Adversarial Training: How to make it work ?
Soumith Chintala, “How to Train a GAN? Tips and tricks to make GANs work”. GitHub 2016.
NeurIPS Barcelona 2016
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Non-Conditional GANs
Slide credit: Víctor Garcia
Diagram: random seed (z) → Generator G(·) → Discriminator D(·) ← Real World; D outputs Real/Generated.
Conditional GANs (cGAN)
Slide credit: Víctor Garcia
Conditional Adversarial Networks
Diagram: Condition → Generator G(·) → Discriminator D(·) ← Real World; D outputs Real/Generated.
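One common way to inject the condition, assumed here for illustration, is to concatenate a one-hot encoding of y to the inputs of both networks. The class list, latent size and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

CLASSES = ["DOG", "CAT", "TRUCK", "PIZZA"]   # example conditions
Z_DIM = 8                                     # hypothetical latent size

def one_hot(label):
    vec = np.zeros(len(CLASSES))
    vec[CLASSES.index(label)] = 1.0
    return vec

def generator_input(z, label):
    """Input to G: random seed z concatenated with the condition y."""
    return np.concatenate([z, one_hot(label)])

def discriminator_input(x, label):
    """Input to D: a flattened sample x concatenated with the same condition y."""
    return np.concatenate([x.ravel(), one_hot(label)])

z = rng.standard_normal(Z_DIM)
g_in = generator_input(z, "CAT")
d_in = discriminator_input(np.zeros((28, 28)), "CAT")
print(g_in.shape, d_in.shape)
```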
Learn more about GANs
Ian Goodfellow.
NeurIPS Barcelona 2016.
Mihaela Rosca & Jeff Donahue.
UCL x Deepmind 2020.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Manifold Pθ(X)
Encode Decode
“Generate”
Auto-Encoder (AE)
z
Feature
space
● Learns Pθ(X) with a reconstruction loss.
● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
Auto-Encoder (AE)
Encode Decode
“Generate”
z
Feature
space
Manifold Pθ(X)
Could we generate new samples by sampling from a normal distribution and
feeding it into the encoder, or the decoder (as in GANs) ?
?
Auto-Encoder (AE)
No, because the noise (or the encoded noise) would fall outside the learned manifold.
Encode Decode
“Generate”
z
Feature
space
Manifold Pθ(X)
Could we generate new samples by sampling from a normal distribution and
feeding it into the encoder, or the decoder (as in GANs) ?
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Encoder: Predict the mean μ(X) and covariance Σ(X) of a multivariate normal distribution.
Encode
Loss term: encourages the predicted distribution N(μ(X), Σ(X)) to follow a normal distribution N(0, I).
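This regularisation term is usually implemented as a KL divergence with a closed form. The sketch below assumes a diagonal covariance parameterised by its log-variance, which is a common choice but is not stated on the slide.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Two hypothetical encoder outputs with dim(z) = 2.
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
log_var = np.array([[0.0, 0.0], [0.0, 0.0]])
kl = kl_to_standard_normal(mu, log_var)
print(kl)
```

The first row already matches N(0, I) and gives zero penalty; the second row is penalised for its shifted mean.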
Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
Maths 101: Multivariate normal distribution
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Decoder: Trained to reconstruct the input data from a z sampled from N(μ, Σ).
Encode → z → Decode
Reconstruction loss term.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
z
Encode Decode
Challenge:
We cannot backprop through the sampling of z ~ N(μ(X), Σ(X)) because sampling is not differentiable!
Reparametrization Trick
z
Solution: Reparameterization trick
Sample ε ~ N(0, I) and define z from it, multiplying by the standard deviation σ(X) and adding the mean μ(X): z = μ(X) + σ(X) ⊙ ε
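A minimal sketch of the trick, assuming a diagonal covariance given as a log-variance. NumPy has no automatic differentiation, so this only shows the forward computation; in an autodiff framework the gradients would flow through `mu` and `sigma`, while `eps` is treated as a constant input.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I); the randomness lives only in eps."""
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * eps

# Hypothetical encoder outputs: mean 3 and standard deviation 0.5 in every dimension.
mu = np.full((100_000, 2), 3.0)
log_var = np.full((100_000, 2), np.log(0.25))
z = reparameterize(mu, log_var)
print(z.mean(), z.std())
```

The empirical mean and standard deviation of z come out close to 3 and 0.5, confirming that z is distributed as N(μ, σ²) even though it is computed deterministically from ε.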
Reparametrization Trick
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Generative behaviour
z
How can we now generate new samples once the underlying generating distribution is learned?
We can sample z1, z2, z3, … from our prior N(0, I), discarding the encoder path.
Generative behaviour
Generative behaviour
N(0, I)
Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
Generative behaviour
#NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
Walking along the dimensions of the z manifold yields spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
Generative behaviour
Learn more about VAEs
Andriy Mnih (UCL - Deepmind 2020)
Max Welling - University of Amsterdam (2020)
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Forward Diffusion Process
Philip Isola, Generative Models of Images. MIT 2023.
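The forward process gradually replaces the data with Gaussian noise. The sketch below assumes the closed-form formulation and linear noise schedule popularised by denoising diffusion probabilistic models; the schedule values and function name are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)      # fraction of signal variance kept at each step

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form for a 0-indexed step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.ones((64, 64))                    # stand-in for an image
x_early, x_late = q_sample(x0, 0), q_sample(x0, T - 1)
print(x_early.mean(), x_late.mean(), x_late.std())
```

After the first step the image is almost intact; after the last step it is statistically close to pure noise N(0, I).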
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Denoising Autoencoder (DAE)
Encode Decode
“Generate”
#DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust
features with denoising autoencoders." ICML 2008.
Philip Isola, Generative Models of Images. MIT 2023.
Reverse Denoising process
Data Manifold Pθ(x0)
xT (Noise) → x0 (Image)
Network learns to denoise step by step (CNN, U-net)
Reverse Denoising process
What is the dimension of the latent variable in diffusion models ?
Same dimensionality as the diffused data.
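The step-by-step denoising loop can be sketched as below, assuming the ancestral sampling rule of denoising diffusion probabilistic models. `predict_noise` is a hypothetical placeholder for the trained network (for example a U-Net), so the output here is not a meaningful image; the sketch only shows the loop and that the latent x_T has the same shape as the data.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50
betas = np.linspace(1e-4, 0.02, T)        # assumed noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Hypothetical placeholder for the trained denoising network."""
    return np.zeros_like(x_t)

def sample(shape):
    x = rng.standard_normal(shape)        # x_T: pure noise, same shape as the data
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise   # variance beta_t is one common choice
    return x

x0_hat = sample((8, 8))
print(x0_hat.shape)
```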
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Motivation
PixelRNN
An RNN predicts the probability of each sample xi with a categorical output distribution: Softmax
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
PixelRNN
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
Why aren’t all completions identical?
(i.e., how can AR models offer generative behaviour?)
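One way to see where the diversity comes from: each new value is sampled from the predicted softmax distribution instead of taking its most likely value. In the sketch below, `next_value_logits` is a hypothetical stand-in for the trained network, with made-up scores over four possible values.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def next_value_logits(context):
    """Hypothetical stand-in for the network: logits over 4 values given the context."""
    last = context[-1] if context else 0
    return np.array([1.0, 0.5, 0.0, -0.5]) + 0.3 * np.eye(4)[last]

def generate(length, seed):
    rng = np.random.default_rng(seed)
    seq = []
    for _ in range(length):
        probs = softmax(next_value_logits(seq))
        seq.append(int(rng.choice(4, p=probs)))   # sample, do not take the argmax
    return seq

a, b = generate(20, seed=0), generate(20, seed=1)
print(a)
print(b)
```

With a fixed seed the sequence is reproducible; changing the seed changes the random draws and therefore the completion.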
PixelCNN
#PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with
pixelcnn decoders. NeurIPS 2016.
Wavenet
Wavenet used dilated convolutions to produce synthetic audio, sample by sample, with each sample conditioned on the previous samples within a receptive field of size T.
#Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Auto-regressive (at test time).
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Text completion
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
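The next-word objective is typically implemented as a cross-entropy loss over shifted token sequences. The sketch below assumes the model has already produced a matrix of logits; the vocabulary size and token ids are made up for illustration.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Mean cross-entropy of predicting tokens[t+1] from the logits at position t.

    logits: (T, V) array; row t holds scores over the vocabulary given tokens[:t+1].
    tokens: length-T sequence of integer token ids.
    """
    logits = logits[:-1]                          # the last position has no target
    targets = np.asarray(tokens[1:])
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

V = 5                                             # hypothetical vocabulary size
tokens = [0, 3, 1, 4]
uniform = np.zeros((len(tokens), V))              # a model that knows nothing
loss = next_token_loss(uniform, tokens)
print(loss, np.log(V))
```

A model with uniform predictions scores log(V); training pushes the loss below that value by assigning more probability to the word that actually follows.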
Condition:
In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
Generated completions:
The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
GPT-2/3 can also solve tasks for which they were not trained (zero-shot learning).
Text Reading Comprehension
The 2008 Summer Olympics torch relay was run from March 24
until August 8, 2008, prior to the 2008 Summer Olympics,
with the theme of “one world, one dream”. Plans for the
relay were announced on April 26, 2007, in Beijing, China.
The relay, also called by the organizers as the “Journey of
Harmony”, lasted 129 days and carried the torch 137,000 km
(85,000 mi) – the longest distance of any Olympic torch
relay since the tradition was started ahead of the 1936
Summer Olympics.
After being lit at the birthplace of the Olympic Games in
Olympia, Greece on March 24, the torch traveled to the
Panathinaiko Stadium in Athens, and then to Beijing,
arriving on March 31. From Beijing, the torch was following
a route passing through six continents. The torch has
visited cities along the Silk Road, symbolizing ancient
links between China and the rest of the world. The relay
also included an ascent with the flame to the top of Mount
Everest on the border of Nepal and Tibet, China from the
Chinese side, which was closed specially for the event.
Q: What was the theme?
A: “one world, one dream”.
Q: What was the length of the race?
A: 137,000 km
Q: Was it larger than previous ones?
A: No
Q: Where did the race begin?
A: Olympia, Greece
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Zero-shot task performances
(GPT-2 was never trained for these tasks)
#iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML
2020.
GPT-2 / GPT-3
#ChatGPT [blog]
#GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
ChatGPT / GPT-4
Discussion
Learn more about AR models
Nal Kalchbrenner, Mediterranean Machine Learning
Summer School 2022.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Source: David Foster
Recommended books
Interview of David Foster for Machine
Learning Street Talk (2023)
Recommended courses
Deep Unsupervised Learning
(UC Berkeley CS294-158-SP2020)
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
183 vues92 diapositives
Open challenges in sign language translation and production par
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and productionUniversitat Politècnica de Catalunya
187 vues83 diapositives
Generation of Synthetic Referring Expressions for Object Segmentation in Videos par
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
522 vues42 diapositives
Discovery and Learning of Navigation Goals from Pixels in Minecraft par
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftUniversitat Politècnica de Catalunya
193 vues40 diapositives

Plus de Universitat Politècnica de Catalunya(20)

Dernier

ALGAL PRODUCTS.pptx par
ALGAL PRODUCTS.pptxALGAL PRODUCTS.pptx
ALGAL PRODUCTS.pptxRASHMI M G
7 vues17 diapositives
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... par
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...SwagatBehera9
5 vues36 diapositives
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance... par
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...InsideScientific
115 vues62 diapositives
Assessment and Evaluation GROUP 3.pdf par
Assessment and Evaluation GROUP 3.pdfAssessment and Evaluation GROUP 3.pdf
Assessment and Evaluation GROUP 3.pdfkimberlyndelgado18
10 vues10 diapositives
Factors affecting fluorescence and phosphorescence.pptx par
Factors affecting fluorescence and phosphorescence.pptxFactors affecting fluorescence and phosphorescence.pptx
Factors affecting fluorescence and phosphorescence.pptxSamarthGiri1
7 vues11 diapositives
Note on the Riemann Hypothesis par
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesisvegafrank2
8 vues20 diapositives

Dernier(20)

Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... par SwagatBehera9
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
SwagatBehera95 vues
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance... par InsideScientific
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
InsideScientific115 vues
Factors affecting fluorescence and phosphorescence.pptx par SamarthGiri1
Factors affecting fluorescence and phosphorescence.pptxFactors affecting fluorescence and phosphorescence.pptx
Factors affecting fluorescence and phosphorescence.pptx
SamarthGiri17 vues
Note on the Riemann Hypothesis par vegafrank2
Note on the Riemann HypothesisNote on the Riemann Hypothesis
Note on the Riemann Hypothesis
vegafrank28 vues
Exploring the nature and synchronicity of early cluster formation in the Larg... par Sérgio Sacani
Exploring the nature and synchronicity of early cluster formation in the Larg...Exploring the nature and synchronicity of early cluster formation in the Larg...
Exploring the nature and synchronicity of early cluster formation in the Larg...
Sérgio Sacani1.4K vues
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... par ILRI
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
ILRI9 vues
selection of preformed arch wires during the alignment stage of preadjusted o... par MaherFouda1
selection of preformed arch wires during the alignment stage of preadjusted o...selection of preformed arch wires during the alignment stage of preadjusted o...
selection of preformed arch wires during the alignment stage of preadjusted o...
MaherFouda17 vues
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... par ILRI
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
ILRI6 vues
Oral_Presentation_by_Fatma (2).pdf par fatmaalmrzqi
Oral_Presentation_by_Fatma (2).pdfOral_Presentation_by_Fatma (2).pdf
Oral_Presentation_by_Fatma (2).pdf
fatmaalmrzqi8 vues
별헤는 사람들 2023년 12월호 전명원 교수 자료 par sciencepeople
별헤는 사람들 2023년 12월호 전명원 교수 자료별헤는 사람들 2023년 12월호 전명원 교수 자료
별헤는 사람들 2023년 12월호 전명원 교수 자료
sciencepeople68 vues
2. Natural Sciences and Technology Author Siyavula.pdf par ssuser821efa
2. Natural Sciences and Technology Author Siyavula.pdf2. Natural Sciences and Technology Author Siyavula.pdf
2. Natural Sciences and Technology Author Siyavula.pdf
ssuser821efa11 vues

Deep Generative Learning for All

  • 1. Deep Generative Learning for All (a.k.a. The GenAI Hype) Xavier Giro-i-Nieto @DocXavi xavigiro.upc@gmail.com Associate Professor (on leave) Universitat Politècnica de Catalunya Institut de Robòtica Industrial ELLIS Unit Barcelona Spring 2020 [Summer School website]
  • 2. Acknowledgements: Santiago Pascual (santi.pascual@upc.edu, @santty128), PhD 2019, Universitat Politècnica de Catalunya (Technical University of Catalonia); Albert Pumarola (apumarola@iri.upc.edu, @AlbertPumarola), PhD 2021, Universitat Politècnica de Catalunya (Technical University of Catalonia); Kevin McGuinness (kevin.mcguinness@dcu.ie), Research Fellow, Insight Centre for Data Analytics, Dublin City University; Gerard I. Gállego (gerard.ion.gallego@upc.edu, @geiongallego), PhD Student, Universitat Politècnica de Catalunya.
  • 3. Acknowledgements: Eduard Ramon (@eram1205), Applied Scientist, Amazon Barcelona; Wentong Liao, Applied Scientist, Amazon Barcelona; Ciprian Corneanu, Applied Scientist, Amazon Seattle; Laia Tarrés (laia.tarres@upc.edu), PhD Student, Universitat Politècnica de Catalunya.
  • 4. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 5. Image generation. #StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
  • 6. Image generation. #DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
  • 7. Text-to-Image generation. #DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022. [blog]
  • 8. Text-to-Video generation. #Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al. "Make-a-video: Text-to-video generation without text-video data." arXiv 2022. Example prompt: “A dog wearing a Superhero outfit with red cape flying through the sky”
  • 9. Synthetic labels to train discriminative models. #BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
  • 10. Video Super-resolution. #TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics 2020.
  • 11. Human Motion Transfer. #EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
  • 12. Speech Enhancement. Recover lost information and add enhancing details by learning the natural distribution of audio samples. Audio examples: original vs. enhanced.
  • 14. Discriminative vs Generative Models. Philip Isola, Generative Models of Images. MIT 2023.
  • 15. Outline 1. Motivation 2. Discriminative vs Generative Models a. Pθ(Y|X): Discriminative Models b. Pθ(X): Generative Models c. Pθ(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 16. Pθ(Y|X): Discriminative Models. Discriminative modeling Pθ(Y|X), with X = data, Y = labels, θ = model parameters. Classification and regression examples over text, image and audio inputs: probability of being a potential customer, “Jim Carrey”, “What language?”, speech translation. Slide credit: Albert Pumarola (UPC 2019)
  • 17. Pθ(Y|X): Discriminative Models. Discriminative model: tell me the probability of some responses Y given inputs X, for example Pθ(Y | X = [pixel_1, pixel_2, …, pixel_784]). Figure: input → Network (θ) → output class probabilities (0.01, 0.09, 0.9). Figure credit: Javier Ruiz (UPC TelecomBCN)
  • 18. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 19. Pθ(X): Generative Models. Discriminative modeling Pθ(Y|X) versus generative modeling Pθ(X), with X = data, Y = labels, θ = model parameters. The classification and regression examples (probability of being a potential customer, “Jim Carrey”, “What language?”, language translation) are joined by generative ones. Generated text: “What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds. Generated audio: music composer and interpreter (MuseNet sample). Slide concept: Albert Pumarola (UPC 2019)
  • 20. Pθ(X): Generative Models. Each real sample x_i in X = {x_1, x_2, …, x_N} comes from an M-dimensional probability distribution P(X).
  • 21. Pθ(X): Generative Models. 1) We want our model with parameters θ to output samples with distribution Pθ(X), matching the distribution of our training data P(X). 2) We can then sample points from Pθ(X) that plausibly look as if they were drawn from P(X). Example: a Gaussian Mixture Model (GMM), whose distribution Pλ,μ,σ(X) is fitted to the distribution of the training data P(X).
  • 22. Pθ(X): Generative Models. What are the parameters θ we need to estimate in deep neural networks? θ = (weights & biases). Figure: ? → Network (θ) → output.
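The Gaussian mixture example above can be made concrete with a small sampling sketch. The mixture below is hypothetical (two 1-D components with made-up weights λ, means μ and standard deviations σ); it only illustrates what "sampling points from Pθ(X)" means once the parameters are known, not how they are fitted.

```python
import random

# Hypothetical 1-D mixture: weights (lambda), means (mu), std devs (sigma).
WEIGHTS = [0.3, 0.7]
MEANS = [-2.0, 3.0]
STDS = [0.5, 1.0]

def sample_gmm(n, rng):
    """Draw n samples: pick a component by its weight, then sample its Gaussian."""
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
        samples.append(rng.gauss(MEANS[k], STDS[k]))
    return samples

rng = random.Random(0)
xs = sample_gmm(10_000, rng)

# Analytical mean of the mixture: 0.3 * (-2) + 0.7 * 3 = 1.5
mixture_mean = sum(w * m for w, m in zip(WEIGHTS, MEANS))
print(f"empirical mean {sum(xs) / len(xs):.2f} vs. mixture mean {mixture_mean:.2f}")
```

In a deep generative model the handful of parameters (λ, μ, σ) is replaced by the weights and biases of a network, but the goal of drawing new points from the modelled distribution is the same.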
  • 24. Pθ(X|Y): Conditioned Generative Models. Conditional probabilities P(X|Y) model the effect of conditioning variables on the generative process: X = {x_1, x_2, …, x_N}, Y = {y_1, y_2, …, y_N}. Example conditions: DOG, CAT, TRUCK, PIZZA; THRILLER, SCI-FI, HISTORY; /aa/, /e/, /o/.
  • 25. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. Generative Adversarial Networks (GANs) b. Auto-regressive c. Variational Autoencoders (VAEs) d. Diffusion
  • 26. Sampling. Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
  • 27. Sampling. Philip Isola, Generative Models of Images. MIT 2023.
  • 28. Sampling. “Model the data distribution so that we can sample new points out of the distribution.” Figure: the training dataset is used to learn the manifold Pθ(X) in feature space, from which generated samples are sampled out. Slide concept: Albert Pumarola (UPC 2019)
  • 29. Sampling. How could we generate diverse samples from a deterministic deep neural network? Figure: z → Generator (θ) → generated samples.
  • 30. Sampling. How could we generate diverse samples from a deterministic deep neural network? Sample z from a known prior, for example a multivariate normal distribution N(0, I). Example: dim(z) = 2. Figure: z → Generator (θ) → generated samples x’.
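A minimal sketch of the idea in the last two slides, assuming a toy generator: the network itself is a fixed, deterministic function, and all the diversity comes from the random z drawn from N(0, I). The weights `W`, `B` and the 2-to-3 layer shape are made up for illustration and are not taken from any trained model.

```python
import math
import random

# Hypothetical fixed "generator" weights: a 2 -> 3 affine map followed by tanh.
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
B = [0.1, -0.2, 0.0]

def generator(z):
    """Deterministic map x' = tanh(W z + b); the same z always gives the same x'."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(W, B)]

def sample_prior(rng, dim=2):
    """Draw z ~ N(0, I): independent standard normal coordinates."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)
z1, z2 = sample_prior(rng), sample_prior(rng)
x1, x2 = generator(z1), generator(z2)
print(x1)  # one generated sample
print(x2)  # a different sample, obtained only by changing z
```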
  • 31. Interpolation. Traversing the learned manifold through interpolation. Figure: the training dataset is used to learn the manifold Pθ(X) in feature space, along which interpolated samples are generated. Slide concept: Albert Pumarola (UPC 2019)
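Interpolation is usually carried out in the latent space: two codes are blended and each intermediate code would then be passed through the generator. The sketch below assumes plain linear interpolation between two hypothetical 2-D codes; other schemes (for example spherical interpolation) are also used in practice.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

z_start = [-1.0, 0.5]   # hypothetical latent codes (dim(z) = 2)
z_end = [2.0, -1.5]
steps = 5

# Evenly spaced points on the segment from z_start to z_end.
path = [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
for z in path:
    print(z)  # each z would be fed to the generator to obtain one interpolated sample
```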
  • 32. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 33. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 34. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 35. Credit: Santiago Pascual [slides] [video]
  • 36. Generator & Discriminator. We have two modules: Generator (G) and Discriminator (D). They “fight” against each other during training → Adversarial Learning. D’s goal: distinguish real samples from those produced by G. G’s goal: fool D into misclassifying. Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
  • 37. Discriminator. The discriminator network D is a binary classifier between real (x) and generated (x’) samples. Figure: x’ → Discriminator (θ) → Generated (1); x → Discriminator (θ) → Real (0).
  • 40. Adversarial Training. Analogy: is it fake money? Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. Police verdict: “FAKE: It’s not even green.” Figure: Santiago Pascual (UPC)
  • 41. Adversarial Training. Analogy: is it fake money? Police (D) verdict on the counterfeiter’s (G) money: “FAKE: There is no watermark.” Figure: Santiago Pascual (UPC)
  • 42. Adversarial Training. Analogy: is it fake money? Police (D) verdict on the counterfeiter’s (G) money: “FAKE: Watermark should be rounded.” Figure: Santiago Pascual (UPC)
  • 43. Adversarial Training. Analogy: is it fake money? After enough iterations, and if the counterfeiter is good enough (for the G network, this means having enough parameters), the police should be confused: REAL? FAKE? Figure: Santiago Pascual (UPC)
  • 44. Adversarial Training. Alternate between training the discriminator and the generator. Figure: a latent random variable is sampled and fed to the generator (a neural network), which produces generated images; the discriminator (a neural network) receives samples of real-world and generated images, and its real/generated prediction feeds the loss. Figure: Kevin McGuinness (DCU)
  • 45. Adversarial Training: Discriminator. 1. Fix the generator weights and draw samples from both real-world and generated images. 2. Train the discriminator to distinguish between real-world and generated images, backpropagating the error to update the discriminator weights. Figure: Kevin McGuinness (DCU)
  • 46. Adversarial Training: Discriminator. In the setup of the figure, which ground-truth label should we use for a generated image when training the discriminator? Consider a binary encoding of “1” (Real) and “0” (Fake). Figure: Kevin McGuinness (DCU)
  • 47. Adversarial Training: Generator. 1. Fix the discriminator weights. 2. Sample from the generator by injecting noise. 3. Backpropagate the error through the discriminator to update the generator weights. Figure: Kevin McGuinness (DCU)
  • 48. Adversarial Training: Generator. In the setup of the figure, which ground-truth label should we use for a generated image when training the generator? Consider a binary encoding of “1” (Real) and “0” (Fake). Figure: Kevin McGuinness (DCU)
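The two label questions above can be explored with a small numeric sketch. It assumes the binary encoding from the slides (1 = Real, 0 = Fake), a binary cross-entropy loss, and the commonly used "non-saturating" choice in which the generator step labels generated images as real. The discriminator outputs are made-up numbers, not results from a trained model.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one predicted probability p of the class "real"."""
    eps = 1e-12  # avoids log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

REAL, FAKE = 1, 0

# Hypothetical discriminator outputs D(x) and D(G(z)): probability of "real".
d_on_real = 0.9
d_on_generated = 0.2

# Discriminator step (generator fixed): real -> 1, generated -> 0.
d_loss = bce(d_on_real, REAL) + bce(d_on_generated, FAKE)

# Generator step (discriminator fixed): generated images are labelled 1,
# so the loss is small only when D is fooled into predicting "real".
g_loss = bce(d_on_generated, REAL)

print(f"d_loss={d_loss:.3f}  g_loss={g_loss:.3f}")
```

With these numbers the discriminator is doing well (low `d_loss`) and the generator is not (high `g_loss`); as the generator improves and `d_on_generated` rises, the two losses move in opposite directions.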
  • 49. Adversarial Training: How to make it work? Soumith Chintala, “How to train a GAN? Tips and tricks to make GANs work”. GitHub 2016. NeurIPS Barcelona 2016
  • 50. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive
  • 51. Non-Conditional GANs. Figure: a random seed (z) feeds the Generator G(·); the Discriminator D(·) receives real-world and generated samples and outputs Real/Generated. Slide credit: Víctor Garcia
  • 52. Conditional GANs (cGAN): Conditional Adversarial Networks. Figure: same setup as the non-conditional GAN (Generator G(·), Discriminator D(·), real-world samples, Real/Generated output), with an additional Condition input. Slide credit: Víctor Garcia
  • 53. Learn more about GANs: Ian Goodfellow, NeurIPS Barcelona 2016; Mihaela Rosca & Jeff Donahue, UCL x Deepmind 2020.
  • 54. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 55. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 56. Auto-Encoder (AE). Learns Pθ(X) with a reconstruction loss. Proposed as a pre-training stage for the encoder (“self-supervised learning”). Figure: Encode → z (feature space, manifold Pθ(X)) → Decode (“Generate”).
  • 57. Auto-Encoder (AE). Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or into the decoder (as in GANs)?
  • 58. Auto-Encoder (AE). Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or into the decoder (as in GANs)? No, because the noise (or encoded noise) would fall outside the learned manifold.
  • 60. Variational Auto-Encoder (VAE). Encoder: predicts the mean μ(X) and covariance Σ(X) of a multivariate normal distribution. A loss term encourages this distribution to follow a normal distribution N(0, I). Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
  • 61. Maths 101: Multivariate normal distribution. Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
  • 62. Variational Auto-Encoder (VAE). Decoder: trained to reconstruct the input data from a z sampled from N(μ, Σ), with a reconstruction loss term. Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
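The two loss terms described for the encoder and the decoder can be sketched numerically. This is a simplified illustration under stated assumptions: the covariance is taken to be diagonal and parameterised by a log-variance vector, the reconstruction term is a plain squared error, and both terms are weighted equally. All input values are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), closed form for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def reconstruction_loss(x, x_hat):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Hypothetical encoder outputs and decoder reconstruction for one input.
mu, log_var = [0.5, -0.3], [0.0, -1.0]
x, x_hat = [1.0, 0.0, 0.5], [0.8, 0.1, 0.4]

loss = reconstruction_loss(x, x_hat) + kl_to_standard_normal(mu, log_var)
print(f"loss = {loss:.4f}")
```

The KL term is zero only when the encoder predicts exactly N(0, I), which is what pulls the encoded distribution towards the prior.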
  • 64. Reparametrization Trick. Challenge: we cannot backprop through the sampling of z ~ N(μ, Σ), because “sampling” is not differentiable! Figure: Encode → z → Decode.
  • 65. Reparametrization Trick. Solution: sample ε ~ N(0, I) and define z from it, multiplying by σ and adding μ: z = μ + σ ⊙ ε.
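A minimal sketch of the reparameterization trick, assuming a diagonal Gaussian so that the standard deviation is a vector and the product is element-wise. The values of `mu` and `sigma` are hypothetical encoder outputs.

```python
import random

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); returns z and the eps that was drawn."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

mu, sigma = [0.5, -0.3], [1.0, 0.6]   # hypothetical encoder outputs
z, eps = reparameterize(mu, sigma, random.Random(0))

# With eps held fixed, z is a plain function of mu and sigma:
# dz/dmu = 1 and dz/dsigma = eps, so gradients can reach the encoder.
dz_dmu = [1.0 for _ in mu]
dz_dsigma = list(eps)
print(z, dz_dsigma)
```

The randomness is moved into `eps`, which does not depend on any learned parameter, so the path from the encoder outputs to z contains only differentiable operations.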
  • 66. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive
  • 67. Generative behaviour. How can we now generate new samples once the underlying generating distribution is learned?
  • 68. Generative behaviour. We can sample z1, z2, z3, … from our prior N(0, I), discarding the encoder path.
  • 69. Generative behaviour. Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
  • 70. Generative behaviour. #NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
  • 71. Generative behaviour. Walking along the dimensions of the z manifold gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
  • 72. Learn more about VAEs: Andriy Mnih (UCL - Deepmind 2020); Max Welling - University of Amsterdam (2020).
  • 74. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 75. Forward Diffusion Process Philip Isola, Generative Models of Images. MIT 2023.
  • 77. Denoising Autoencoder (DAE) Encode Decode “Generate” #DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features with denoising autoencoders." ICML 2008.
  • 78. Reverse Denoising process. Philip Isola, Generative Models of Images. MIT 2023.
  • 79. Reverse Denoising process. The network (a CNN U-Net) learns to denoise step by step, from noise x_T back to an image x_0 on the data manifold Pθ(x_0). What is the dimension of the latent variable in diffusion models? The same dimensionality as the diffused data.
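The forward process that the network learns to invert can be sketched as repeated noising. The slides do not give the update rule, so the code below assumes the widely used variance-preserving step x_t = sqrt(1 − β_t) · x_{t−1} + sqrt(β_t) · ε_t with a linear β schedule; the four-value "image" and the schedule endpoints are illustrative choices.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t for each step."""
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]
    return x

x0 = [1.0, -1.0, 0.5, 0.0]  # hypothetical 4-"pixel" image
# Assumed linear noise schedule over 1000 steps.
betas = [1e-4 + (0.02 - 1e-4) * t / 999 for t in range(1000)]
xT = forward_diffusion(x0, betas, random.Random(0))

# Weight of the original x0 that is still present in xT.
signal_kept = math.sqrt(math.prod(1.0 - b for b in betas))
print(len(xT) == len(x0), f"signal kept: {signal_kept:.4f}")
```

Two points from the slide show up directly: x_T has the same number of dimensions as x_0, and after enough steps almost none of the original signal remains, so x_T is close to pure noise.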
  • 80. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 81. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 83. PixelRNN. An RNN predicts the probability of each sample x_i with a categorical output distribution (softmax). #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
  • 84. PixelRNN. Why are not all completions identical? (a.k.a. how can AR models offer generative behaviour?) #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
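One way to look at the question above: an auto-regressive model outputs a categorical distribution at each step, and the next value is sampled from it rather than chosen deterministically, so different random draws give different completions. The sketch below replaces the RNN with a hypothetical hand-written scoring rule over four intensity levels; only the softmax-then-sample loop is the point.

```python
import math
import random

def softmax(logits):
    """Turn unnormalised scores into a categorical distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_sequence(length, rng):
    """Generate values one at a time, each sampled from a softmax over 4 levels."""
    seq = []
    for _ in range(length):
        prev = seq[-1] if seq else 0
        # Hypothetical stand-in for the network: favour values near the previous one.
        logits = [-abs(v - prev) for v in range(4)]
        probs = softmax(logits)
        seq.append(rng.choices(range(4), weights=probs)[0])
    return seq

print(sample_sequence(8, random.Random(0)))
print(sample_sequence(8, random.Random(1)))
```

Taking the most probable value at every step instead of sampling would make every completion of the same context identical.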
  • 85. PixelCNN. #PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with pixelcnn decoders. NeurIPS 2016.
  • 86. Wavenet. Wavenet uses dilated convolutions to produce synthetic audio sample by sample, each conditioned on a receptive field of the T previous samples: p(x_t | x_{t-T}, …, x_{t-1}). #Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
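The size of the receptive field grows quickly with dilation. As a rough illustration, the sketch below computes it for a stack of causal convolutions, assuming kernel size 2 and dilations that double from 1 to 512, and compares it with the same number of undilated layers. The helper name `receptive_field` is introduced here for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [2 ** i for i in range(10)]        # 1, 2, 4, ..., 512
print(receptive_field(2, dilations))           # 1 + (1 + 2 + ... + 512) = 1024
print(receptive_field(2, [1] * 10))            # same depth without dilation: 11
```

Ten dilated layers cover 1024 past samples, whereas ten ordinary layers cover only 11, which is why dilation makes sample-by-sample audio modelling tractable.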
  • 87. The Transformer. Auto-regressive (at test). #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. Attention is all you need. NeurIPS 2017. Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 88. The Transformer Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 89. Text completion. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Condition: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. Generated completion: The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019.
  • 90. Zero-shot learning. GPT-2/3 can also solve tasks for which they were not trained (zero-shot learning). Task: reading comprehension. Text: The 2008 Summer Olympics torch relay was run from March 24 until August 8, 2008, prior to the 2008 Summer Olympics, with the theme of “one world, one dream”. Plans for the relay were announced on April 26, 2007, in Beijing, China. The relay, also called by the organizers as the “Journey of Harmony”, lasted 129 days and carried the torch 137,000 km (85,000 mi) – the longest distance of any Olympic torch relay since the tradition was started ahead of the 1936 Summer Olympics. After being lit at the birthplace of the Olympic Games in Olympia, Greece on March 24, the torch traveled to the Panathinaiko Stadium in Athens, and then to Beijing, arriving on March 31. From Beijing, the torch was following a route passing through six continents. The torch has visited cities along the Silk Road, symbolizing ancient links between China and the rest of the world. The relay also included an ascent with the flame to the top of Mount Everest on the border of Nepal and Tibet, China from the Chinese side, which was closed specially for the event. Questions and model answers: Q: What was the theme? A: “one world, one dream”. Q: What was the length of the race? A: 137,000 km. Q: Was it larger than previous ones? A: No. Q: Where did the race begin? A: Olympia, Greece. #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019.
  • 91. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Zero-shot task performances (GPT-2 was never trained for these tasks)
  • 92. GPT-2 / GPT-3. #iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML 2020.
  • 93. ChatGPT / GPT-4. #ChatGPT [blog] #GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
  • 95. Learn more about AR models Nal Kalchbrenner, Mediterranean Machine Learning Summer School 2022.
  • 98. Recommended books Interview of David Foster for Machine Learning Street Talk (2023)
  • 99. Recommended courses Deep Unsupervised Learning (UC Berkeley CS294-158-SP2020)