1. GANs from a statistical point of view
Maxime Sangnier
International workshop Machine Learning & Artificial Intelligence
September 17, 2018
Sorbonne Université, CNRS, LPSM, LIP6, Paris, France
Joint work with Gérard Biau¹, Benoît Cadre² and Ugo Tanielian¹,³
¹ Sorbonne Université, CNRS, LPSM, Paris, France
² ENS Rennes, Univ Rennes, CNRS, IRMAR, Rennes, France
³ Criteo, Paris, France
7. Painting
Interactive GAN.¹
¹ J.-Y. Zhu et al. "Generative Visual Manipulation on the Natural Image Manifold". In: European Conference on Computer Vision. 2016.
13. Motivation
Generative models aim at generating artificial content.
• Outstanding image generation and extrapolation⁵.
• And even more I'm not aware of...
Generative models are used for:
• exploring unseen realities;
• providing many answers to a single question.
⁵ T. Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation". In: International Conference on Learning Representations. 2018.
16. Generate from data
X_1, ..., X_n i.i.d. according to an unknown density p⋆ on E ⊆ R^d.
How to sample according to p⋆?
Naive approach
1. estimate p⋆ by p̂;
2. sample according to p̂.
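A minimal sketch of this naive approach, assuming scikit-learn and a synthetic Gaussian sample standing in for the unknown p⋆ (all names and choices, such as the bandwidth, are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))        # stand-in for X_1, ..., X_n drawn from p*

# Step 1: estimate p* by a kernel density estimate p-hat.
p_hat = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)

# Step 2: sample according to p-hat.
U = p_hat.sample(5, random_state=0)
print(U)
```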
Drawbacks
• both problems are difficult in themselves;
• we cannot define a realistic parametric statistical model;
• non-parametric density estimation is inefficient in high dimension;
• this approach violates Vapnik's principle:
  When solving a problem of interest, do not solve a more general problem as an intermediate step.
17. Some generative methods
Methods compared along three criteria: density-free learning, flexibility, and simple sampling.
• Autoregressive models (WaveNet⁶)
• Nonlinear independent components analysis (Real NVP⁷)
• Variational autoencoders⁸
• Boltzmann machines⁹
• Generative stochastic networks¹⁰
• Generative adversarial networks
⁶ A.v.d. Oord et al. "WaveNet: A Generative Model for Raw Audio". In: arXiv:1609.03499 [cs] (2016).
⁷ L. Dinh, J. Sohl-Dickstein, and S. Bengio. "Density estimation using Real NVP". In: arXiv:1605.08803 [cs, stat] (2016).
⁸ D.P. Kingma and M. Welling. "Auto-Encoding Variational Bayes". In: International Conference on Learning Representations. 2013.
⁹ S.E. Fahlman, G.E. Hinton, and T.J. Sejnowski. "Massively Parallel Architectures for AI: Netl, Thistle, and Boltzmann Machines". In: Proceedings of the Third AAAI Conference on Artificial Intelligence. 1983.
¹⁰ Y. Bengio et al. "Deep Generative Stochastic Networks Trainable by Backprop". In: International Conference on Machine Learning. 2014.
21. A direct approach
Cornerstone: don't estimate p⋆.
General procedure:
• sample U_1, ..., U_n i.i.d. from a parametric model;
• compare X_1, ..., X_n with U_1, ..., U_n and update the model.
GANs¹¹ follow this principle.
¹¹ I. Goodfellow et al. "Generative Adversarial Nets". In: Advances in Neural Information Processing Systems. 2014.
23. Generating a random sample
Inverse transform sampling
• S: scalar random variable;
• F_S: cumulative distribution function of S;
• Z ∼ U([0, 1]);
• F_S^{-1}(Z) =_d S (equality in distribution).
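A minimal sketch of inverse transform sampling, assuming an Exp(1) target (so F_S(s) = 1 − e^{−s} and F_S^{-1}(z) = −ln(1 − z)); purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=100_000)   # Z ~ U([0, 1])
S = -np.log(1.0 - Z)                      # F_S^{-1}(Z), distributed as Exp(1)

print(S.mean(), S.var())                  # both should be close to 1
```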
Generators
• X_1, ..., X_n i.i.d. according to a density p⋆ on E ⊆ R^d, dominated by a known measure µ;
• G = {G_θ : R^{d′} → E}_{θ∈Θ}, Θ ⊂ R^p: parametric family of generators (d′ ≤ d);
• Z_1, ..., Z_n: random vectors of R^{d′} (typically U([0, 1]^{d′}));
• U_i = G_θ(Z_i): generated sample;
• P = {p_θ}_{θ∈Θ}: associated family of densities, with, by definition, G_θ(Z_1) ∼ p_θ dµ.
27. Generating a random sample
Remarks
• Each p_θ is a candidate to represent p⋆.
• The statistical model P = {p_θ}_{θ∈Θ} is just a mathematical tool for the analysis.
• It is not assumed that p⋆ belongs to P.
• In GANs, G_θ is a neural network with p weights, stored in θ ∈ R^p.
30. Comparing two samples
The next step
• The procedure should drive θ such that G_θ(Z_1) =_d X_1.
• Need to confront G_θ(Z_1), ..., G_θ(Z_n) with X_1, ..., X_n in order to update θ.
Supervised learning
• Both samples have the same distribution as soon as we cannot distinguish them.
• This is a classification problem:
  Class Y = 0: G_θ(Z_1), ..., G_θ(Z_n)
  Class Y = 1: X_1, ..., X_n
35. Adversarial principle
Discriminator
• D: a family of functions from E to [0, 1], the discriminators.
• Choose D ∈ D such that, for any x ∈ E,
  D(x) ≥ 1/2 ⟹ true observation,        (1)
  D(x) < 1/2 ⟹ fake (generated) point.   (2)
• Assume {(X_1, 1), ..., (X_n, 1), (G_θ(Z_1), 0), ..., (G_θ(Z_n), 0)} i.i.d. with the same distribution as (X, Y).
• Classification model: Y | X = x ∼ B(D(x)), i.e. P(Y = 1 | X = x) = D(x).
• Maximum (conditional) likelihood estimation:
  sup_{D∈D} ∏_{i=1}^n D(X_i) × ∏_{i=1}^n (1 − D(G_θ(Z_i)))   or   sup_{D∈D} L̂(θ, D),
  with
  L̂(θ, D) = (1/n) [ Σ_{i=1}^n ln D(X_i) + Σ_{i=1}^n ln(1 − D(G_θ(Z_i))) ].
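A minimal sketch of the empirical criterion L̂(θ, D), assuming NumPy and a discriminator passed as a plain function with values in (0, 1); the function names and toy data below are illustrative only:

```python
import numpy as np

def gan_criterion(D, X, G_theta_Z):
    """L-hat(theta, D) = (1/n) [ sum ln D(X_i) + sum ln(1 - D(G_theta(Z_i))) ]."""
    n = len(X)
    return (np.log(D(X)).sum() + np.log(1.0 - D(G_theta_Z)).sum()) / n

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=1000)      # observations from p*
U = rng.normal(loc=0.0, scale=1.0, size=1000)      # generated points G_theta(Z_i)
D = lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0)))     # a hand-picked discriminator
print(gan_criterion(D, X, U))
```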
40. The GAN Zoo
Curbing the discriminator
• least squares¹²:
  inf_{D∈D} Σ_{i=1}^n (D(X_i) − 1)² + Σ_{i=1}^n D(G_θ(Z_i))²,   inf_{θ∈Θ} Σ_{i=1}^n (D(G_θ(Z_i)) − 1)².
• asymmetric hinge¹³:
  inf_{D∈D} − Σ_{i=1}^n D(X_i) + Σ_{i=1}^n max(0, 1 − D(G_θ(Z_i))),   inf_{θ∈Θ} − Σ_{i=1}^n D(G_θ(Z_i)).
¹² X. Mao et al. "Least Squares Generative Adversarial Networks". In: IEEE International Conference on Computer Vision. 2017.
¹³ J. Zhao, M. Mathieu, and Y. LeCun. "Energy-based Generative Adversarial Network". In: International Conference on Learning Representations. 2017.
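A minimal sketch of the least-squares criteria above, assuming NumPy arrays and a discriminator D given as a plain function; the names are illustrative only:

```python
import numpy as np

def lsgan_discriminator_loss(D, X, G_theta_Z):
    # inf over D of  sum (D(X_i) - 1)^2 + sum D(G_theta(Z_i))^2
    return ((D(X) - 1.0) ** 2).sum() + (D(G_theta_Z) ** 2).sum()

def lsgan_generator_loss(D, G_theta_Z):
    # inf over theta of  sum (D(G_theta(Z_i)) - 1)^2
    return ((D(G_theta_Z) - 1.0) ** 2).sum()
```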
41. The GAN Zoo
Metrics as minimax games
• Maximum mean discrepancy¹⁴ and Wasserstein¹⁵:
  inf_{θ∈Θ} sup_{T∈T} ∫ T p⋆ dµ − ∫ T p_θ dµ.
• f-divergences¹⁶:
  inf_{θ∈Θ} sup_{T∈T} ∫ T p⋆ dµ − ∫ (f⋆ ∘ T) p_θ dµ,
with T a prescribed class of functions and f⋆ the convex conjugate of a lower-semicontinuous function f.
¹⁴ G.K. Dziugaite, D.M. Roy, and Z. Ghahramani. "Training generative neural networks via Maximum Mean Discrepancy optimization". In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence. 2015; Y. Li, K. Swersky, and R. Zemel. "Generative Moment Matching Networks". In: International Conference on Machine Learning. 2015.
¹⁵ M. Arjovsky, S. Chintala, and L. Bottou. "Wasserstein Generative Adversarial Networks". In: International Conference on Machine Learning. 2017.
¹⁶ S. Nowozin, B. Cseke, and R. Tomioka. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization". In: Neural Information Processing Systems. June 2016.
42. Roadmap
• Minimum divergence estimation: uniqueness of minimizers.
• Approximation properties: influence of the family of discriminators on the quality of the approximation.
• Statistical analysis: consistency and rate of convergence.
45. Kullback-Leibler and Jensen divergences
Kullback-Leibler
• For P ≪ Q probability measures on E:
  D_KL(P ‖ Q) = ∫ ln(dP/dQ) dP.
• Properties:
  D_KL(P ‖ Q) ≥ 0 and D_KL(P ‖ Q) = 0 ⟺ P = Q.
• If p = dP/dµ and q = dQ/dµ:
  D_KL(P ‖ Q) = ∫ p ln(p/q) dµ.
• D_KL is not symmetric and is defined only for P ≪ Q.
48. Kullback-Leibler and Jensen divergences
Jensen-Shannon
• For P and Q probability measures on E:
  D_JS(P, Q) = (1/2) D_KL(P ‖ (P + Q)/2) + (1/2) D_KL(Q ‖ (P + Q)/2).
• Property:
  0 ≤ D_JS(P, Q) ≤ ln 2.
• (P, Q) ↦ √D_JS(P, Q) is a distance (the JS distance).
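A minimal sketch computing D_JS between two discrete distributions with natural logarithms (so values lie in [0, ln 2]); plain NumPy, illustrative only:

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) for discrete distributions, with the convention 0 ln 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(js(p, q))   # ~0.347; the maximum ln 2 is reached only for disjoint supports
```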
50. GAN and Jensen-Shannon divergence
GANs
• Empirical criterion:
  L̂(θ, D) = (1/n) [ Σ_{i=1}^n ln D(X_i) + Σ_{i=1}^n ln(1 − D(G_θ(Z_i))) ].
• Problem:
  inf_{θ∈Θ} sup_{D∈D} L̂(θ, D).
Ideal GANs
• Population version of the criterion:
  L(θ, D) = ∫ ln(D) p⋆ dµ + ∫ ln(1 − D) p_θ dµ.
• No constraint: D = D_∞, the set of all functions from E to [0, 1].
• Problem:
  inf_{θ∈Θ} sup_{D∈D_∞} L(θ, D).
53. GAN and Jensen-Shannon divergence
From GAN to JS divergence
• Criterion:
  sup_{D∈D_∞} L(θ, D) = sup_{D∈D_∞} ∫ [ln(D) p⋆ + ln(1 − D) p_θ] dµ
                      ≤ ∫ sup_{D∈D_∞} [ln(D) p⋆ + ln(1 − D) p_θ] dµ.
• Optimal discriminator:
  D⋆_θ = p⋆ / (p⋆ + p_θ),   with the convention 0/0 = 0.
• Optimal criterion:
  sup_{D∈D_∞} L(θ, D) = L(θ, D⋆_θ) = 2 D_JS(p⋆, p_θ) − ln 4.
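For completeness, a short sketch (not spelled out on the slide) of the pointwise computation behind D⋆_θ and the JS identity:

```latex
\begin{align*}
  &\text{Fix } x \text{ and let } a = p^\star(x),\ b = p_\theta(x),\
   \varphi(t) = a \ln t + b \ln(1-t),\ t \in (0,1):\\
  &\varphi'(t) = \frac{a}{t} - \frac{b}{1-t} = 0 \iff t = \frac{a}{a+b},
   \qquad\text{hence } D^\star_\theta = \frac{p^\star}{p^\star + p_\theta}.\\
  &L(\theta, D^\star_\theta)
   = \int p^\star \ln\frac{p^\star}{p^\star + p_\theta}\,d\mu
   + \int p_\theta \ln\frac{p_\theta}{p^\star + p_\theta}\,d\mu
   = 2\, D_{\mathrm{JS}}(p^\star, p_\theta) - \ln 4.
\end{align*}
```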
56. The quest for D⋆_θ
Numerical approach
• Big n, big D: try to approximate D⋆_θ with arg max_{D∈D} L̂(θ, D).
• Close to divergence minimization: sup_{D∈D} L̂(θ, D) ≈ 2 D_JS(p⋆, p_θ) − ln 4.
Theorem
Let θ ∈ Θ and A_θ = {p⋆ = p_θ = 0}.
If µ(A_θ) = 0, then {D⋆_θ} = arg max_{D∈D_∞} L(θ, D).
If µ(A_θ) > 0, then D⋆_θ is unique only on E \ A_θ.
This completes Proposition 1 in Goodfellow et al.¹⁷
¹⁷ Goodfellow et al., "Generative Adversarial Nets".
58. Oracle parameter
• Oracle parameter regarding the Jensen-Shannon divergence:
  θ⋆ ∈ arg min_{θ∈Θ} L(θ, D⋆_θ) = arg min_{θ∈Θ} D_JS(p⋆, p_θ).
• G_θ⋆ is the ideal generator.
• If p⋆ ∈ P:
  p⋆ = p_θ⋆,   D_JS(p⋆, p_θ⋆) = 0,   D⋆_θ⋆ = 1/2.
• What if p⋆ ∉ P? Existence and uniqueness of θ⋆?
Theorem
Assume that P is a convex and compact set for the JS distance.
If p⋆ > 0 µ-almost everywhere, then there exists p̄ ∈ P such that
  {p̄} = arg min_{p∈P} D_JS(p⋆, p).
In addition, if the model P is identifiable, then there exists θ⋆ ∈ Θ such that
  {θ⋆} = arg min_{θ∈Θ} L(θ, D⋆_θ).
61. Oracle parameter
Existence and uniqueness
• Compactness of P and continuity of D_JS(p⋆, ·).
• p⋆ > 0 µ-a.e. ensures strict convexity of D_JS(p⋆, ·).
Compactness of P with respect to the JS distance
1. Θ compact and P convex.
2. For all x ∈ E, θ ∈ Θ ↦ p_θ(x) is continuous.
3. sup_{(θ,θ′)∈Θ²} |p_θ ln p_θ′| ∈ L¹(µ).
Identifiability
High-dimensional parametric settings are often misspecified ⟹ identifiability is not satisfied.
64. From JS divergence to likelihood
GAN ≠ JS divergence
• GANs don't minimize the Jensen-Shannon divergence.
• Considering sup_{D∈D_∞} L(θ, D) means knowing D⋆_θ = p⋆ / (p⋆ + p_θ), thus knowing p⋆.
Parametrized discriminators
• D = {D_α}_{α∈Λ}, Λ ⊂ R^q: parametric family of discriminators.
• Likelihood-type problem with two parametric families:
  inf_{θ∈Θ} sup_{α∈Λ} L(θ, D_α).
• Likelihood parameter:
  θ̄ ∈ arg min_{θ∈Θ} sup_{α∈Λ} L(θ, D_α).
• How close is the best candidate p_θ̄ to the ideal density p_θ⋆?
• How does it depend on the capability of D to approximate D⋆_θ⋆?
67. Approximation result
(Hε) There exist ε > 0, m ∈ (0, 1/2) and D ∈ D ∩ L²(µ) such that
  m ≤ D ≤ 1 − m and ‖D − D⋆_θ̄‖₂ ≤ ε.
Theorem
Assume that, for some M > 0, p⋆ ≤ M and p_θ̄ ≤ M.
Then, under Assumption (Hε) with ε < 1/(2M), there exists a constant c₁ > 0 (depending only upon m and M) such that
  D_JS(p⋆, p_θ̄) − min_{θ∈Θ} D_JS(p⋆, p_θ) ≤ c₁ ε².
Remarks
As soon as the class D becomes richer:
• minimizing sup_{α∈Λ} L(θ, D_α) over Θ helps minimize D_JS(p⋆, p_θ);
• since, under some assumptions, {p_θ⋆} = arg min_{p_θ : θ∈Θ} D_JS(p⋆, p_θ), p_θ̄ comes closer to p_θ⋆.
71. The estimation problem
Estimator
  θ̂ ∈ arg min_{θ∈Θ} sup_{α∈Λ} L̂(θ, α),
where
  L̂(θ, α) = (1/n) [ Σ_{i=1}^n ln D_α(X_i) + Σ_{i=1}^n ln(1 − D_α(G_θ(Z_i))) ].
(Hreg) Regularity conditions of order 1 on the models (G_θ, p_θ and D_α).
Existence
Under (Hreg), θ̂ exists (and so does θ̄).
Questions
• How far is D_JS(p⋆, p_θ̂) from min_{θ∈Θ} D_JS(p⋆, p_θ) = D_JS(p⋆, p_θ⋆)?
• Does θ̂ converge towards θ̄ as n → ∞?
• What is the asymptotic distribution of θ̂ − θ̄?
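A minimal sketch of the alternating optimization behind θ̂ (gradient ascent on L̂ in α, descent in θ), assuming PyTorch, a 1D Gaussian toy target, and illustrative architectures and step sizes that are not those of the slides:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
d_prime = 1                                                                      # latent dimension
G = nn.Sequential(nn.Linear(d_prime, 16), nn.ReLU(), nn.Linear(16, 1))          # G_theta
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # D_alpha
opt_G, opt_D = optim.Adam(G.parameters(), 1e-3), optim.Adam(D.parameters(), 1e-3)
eps = 1e-8                                                                       # avoids log(0)

for step in range(2000):
    X = 2.0 + torch.randn(128, 1)              # minibatch from p* (a Gaussian stand-in)
    Z = torch.rand(128, d_prime)               # Z_i ~ U([0, 1]^{d'})
    # Discriminator: ascend L-hat(theta, alpha) in alpha (theta frozen via detach).
    L_hat = torch.log(D(X) + eps).mean() + torch.log(1 - D(G(Z).detach()) + eps).mean()
    opt_D.zero_grad(); (-L_hat).backward(); opt_D.step()
    # Generator: descend the second term of L-hat in theta (alpha held fixed).
    L_gen = torch.log(1 - D(G(Z)) + eps).mean()
    opt_G.zero_grad(); L_gen.backward(); opt_G.step()
```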
74. Non-asymptotic bound on the JS divergence
(Hε) There exist ε > 0 and m ∈ (0, 1/2) such that, for all θ ∈ Θ, there exists D ∈ D with m ≤ D ≤ 1 − m and ‖D − D⋆_θ‖₂ ≤ ε.
Theorem
Assume that, for some M > 0, p⋆ ≤ M and p_θ ≤ M for all θ ∈ Θ.
Then, under Assumptions (Hreg) and (Hε) with ε < 1/(2M), there exist two constants c₁ > 0 (depending only upon m and M) and c₂ such that
  E[D_JS(p⋆, p_θ̂)] − min_{θ∈Θ} D_JS(p⋆, p_θ) ≤ c₁ ε² + c₂ / √n.
Remarks
• Under (Hreg), {L̂(θ, α) − L(θ, α)}_{θ∈Θ, α∈Λ} is a subgaussian process for ‖·‖ / √n.
• Dudley's inequality: E sup_{θ∈Θ, α∈Λ} |L̂(θ, α) − L(θ, α)| = O(1/√n).
• c₂ scales as p + q ⟹ loose bound in the usual over-parametrized regime (LSUN, FACES: √n ≈ 1000 while p + q ≈ 1 500 000).
76. Illustration
Setting
• p⋆(x) = e^{−x/s} / (s (1 + e^{−x/s})²), x ∈ R: logistic density.
• G_θ and D_α are two fully connected neural networks.
• Z ∼ U([0, 1]): scalar noise.
• n = 100 000 (so that 1/√n is negligible) and 30 replications.
80. Convergence of θ̂
(Hreg) Regularity conditions of order 2 on the models (G_θ, p_θ and D_α).
Existence
Under (Hreg), θ̄ and ᾱ ∈ arg max_{α∈Λ} L(θ̄, α) exist.
(H1) The pair (θ̄, ᾱ) is unique and belongs to int(Θ) × int(Λ).
Theorem
Under Assumptions (Hreg) and (H1),
  θ̂ → θ̄ a.s.   and   α̂ → ᾱ a.s.
Remarks
• Convergence of θ̂ comes from sup_{θ∈Θ, α∈Λ} |L̂(θ, α) − L(θ, α)| → 0 a.s.
• It does not need uniqueness of ᾱ.
• Convergence of α̂ comes from that of θ̂.
81. Illustration
Setting
• Three models:
  1. Laplace: p⋆(x) = (1/3) e^{−2|x|/3} vs p_θ(x) = (1/(√(2π) θ)) e^{−x²/(2θ²)}.
  2. Claw: p⋆(x) = p_claw(x) vs p_θ(x) = (1/(√(2π) θ)) e^{−x²/(2θ²)}.
  3. Exponential: p⋆(x) = e^{−x} 1_{R₊}(x) vs p_θ(x) = (1/θ) 1_{[0,θ]}(x).
• G_θ: generalized inverse of the cdf of p_θ.
• Z ∼ U([0, 1]): scalar noise.
• D_α = p_{α₁} / (p_{α₁} + p_{α₀}).
• n = 10 to 10 000 and 200 replications.
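A minimal sketch of such generators obtained as generalized inverses of the cdf of p_θ, assuming SciPy for the Gaussian quantile function; purely illustrative:

```python
import numpy as np
from scipy.stats import norm

def G_gaussian(theta, z):
    # p_theta = N(0, theta^2): inverse cdf is theta * Phi^{-1}(z)
    return theta * norm.ppf(z)

def G_uniform(theta, z):
    # p_theta = U([0, theta]): inverse cdf is theta * z
    return theta * z

rng = np.random.default_rng(0)
Z = rng.uniform(size=5)                     # Z ~ U([0, 1]), scalar noise
print(G_gaussian(1.5, Z), G_uniform(1.5, Z))
```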
84. Central limit theorem
(Hloc) Local smoothness conditions around (θ̄, ᾱ) (such that the Hessians are invertible).
Theorem
Under Assumptions (Hreg), (H1) and (Hloc),
  √n (θ̂ − θ̄) →_d N(0, Σ).
Remark
One has ‖Σ‖₂ = O(p³ q⁴), which suggests that θ̂ has a large dispersion around θ̄ in the over-parametrized regime.
88. Take-home message
A first step for understanding GANs
• From data to sampling.
• The richness of the class of discriminators D controls the gap between GANs and the JS divergence.
• The generator parameters θ are asymptotically normal with rate √n.
Future investigations
1. Impact of the latent variable Z (dimension, distribution) and of the networks (number of layers in G_θ, dimensionality of Θ) on the performance of GANs (currently it is assumed that p⋆ ≪ µ and p_θ ≪ µ, which carries information on the supporting manifold of p⋆).
2. To what extent are the (Hε)-type assumptions satisfied for neural nets as discriminators?
3. Over-parametrized regime: convergence of distributions instead of parameters.