Bayesian Posterior Inference in the Big Data Arena
Max Welling
Anoop Korattikara
Friday 4 July 2014
Outline
• Introduction
• Stochastic Variational Inference
– Variational Inference 101
– Stochastic Variational Inference
– Deep Generative Models with SVB
• MCMC with mini-batches
– MCMC 101
– MCMC using noisy gradients
– MCMC using noisy Metropolis-Hastings
– Theoretical results
Big Data (mine is bigger than yours)
Square Kilometer Array (SKA) produces 1 Exabyte per day by 2024…
(interested to do approximate inference on this data, talk to me)
Introduction
Why do we need posterior inference if the datasets are BIG?
p>>N
Big data may mean large p, small n
Gene expression data
fMRI data
Planning
Planning against uncertainty needs probabilities
Little data inside Big data
Not every data-case carries information about every model component
New user with no ratings (cold start problem)
Big Models!
1943: First NN (+/- N=10)
1988: NetTalk (+/- N=20K)
2009: Hinton’s Deep Belief Net (+/- N=10M)
2013: Google/Y! (N=+/- 10B)
Models grow faster than useful information in data
Two Ingredients for Big Data Bayes
Any big data posterior inference algorithm should:
1. easily run on a distributed architecture.
2. only use a small mini-batch of the data at every iteration.
Bayesian Posterior Inference
Figure labels: Variational Family Q; All probability distributions
Variational
• Deterministic
• Biased
• Local minima
• Easy to assess convergence
Sampling
• Stochastic (sample error)
• Unbiased
• Hard to mix between modes
• Hard to assess convergence
Variational Bayes
Hinton & van Camp (1993)
Neal & Hinton (1999)
Saul & Jordan (1996)
Saul, Jaakkola & Jordan (1996)
Attias (1999, 2000)
Wiegerinck (2000)
Ghahramani & Beal (2000, 2001)
Coordinate descent on Q
Figure labels: P, Q (Bishop, Pattern Recognition and Machine Learning)
Stochastic VB (Hoffman, Blei & Bach, 2010)
Stochastic natural gradient descent on Q
• P and Q in exponential family.
• Q factorized: [equation not recovered]
• At every iteration: subsample n<<N data-cases:
– solve [expression not recovered] analytically.
– update parameter [symbol not recovered] using stochastic natural gradient descent.
General SVB (ignoring latent variables Z)
[Equations not recovered; annotations: "subsample X", "sample", "very high variance"]
Reparameterization Trick
Kingma 2013, Bengio 2013, Kingma & W. 2014
Other solutions to the same "large variance problem":
- Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
- Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression [T. Salimans and A. Knowles, 2013]
- Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013]
- Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
- Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013]
- Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
Talk Monday June 23, 15:20
In Track F (Deep Learning II)
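To make the "large variance problem" concrete, here is a minimal sketch (not taken from the slides) that compares two Monte Carlo estimators of the same gradient, d/dμ E[z²] for z ~ N(μ, σ²), whose true value is 2μ. The toy objective f(z) = z², the Gaussian Q, and the function names are all assumptions chosen for illustration. The first estimator is the score-function form used in general SVB; the second reparameterizes z = μ + σ·ε with ε ~ N(0, 1).

```python
import numpy as np

def score_function_grads(mu, sigma, n, rng):
    """Per-sample estimates f(z) * d/dmu log q(z), with f(z) = z**2."""
    z = rng.normal(mu, sigma, size=n)
    return z**2 * (z - mu) / sigma**2

def reparam_grads(mu, sigma, n, rng):
    """Per-sample estimates d/dmu f(mu + sigma*eps) = 2 * (mu + sigma*eps)."""
    eps = rng.standard_normal(n)
    return 2.0 * (mu + sigma * eps)

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 200_000
sf = score_function_grads(mu, sigma, n, rng)
rp = reparam_grads(mu, sigma, n, rng)

print("true gradient:", 2 * mu)
print("score-function:  mean %.3f, variance %.3f" % (sf.mean(), sf.var()))
print("reparameterized: mean %.3f, variance %.3f" % (rp.mean(), rp.var()))
```

Both estimators are unbiased for this toy problem, but the per-sample variance of the reparameterized one is several times smaller, which is the effect the trick is meant to exploit.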
Auto Encoding Variational Bayes
Kingma & W., 2013, Rezende et al 2014
Both P(X|Z) and Q(Z|X) are general models (e.g. deep neural net)
Figure: Z and X, linked by Q(Z|X) and P(X|Z)P(Z)
The Helmholtz machine
Wake/Sleep algorithm
Dayan, Hinton, Neal, Zemel, 1995
The VB Landscape
SVB: Stochastic Variational Bayes
AEVB: Auto-Encoding Variational Bayes
SSVB: Structured Stoch. Variational Bayes
FSSVB: Fully Struc. Stoch. Variational Bayes (ICML 2015)
Variational Auto-Encoder
(with 2 latent variables)
Face Model
Semi-supervised Model
Kingma, Rezende, Mohamed, Wierstra, W., 2014
Figure: graphical model over Z, X, Y
P(X,Z,Y) = P(X|Z,Y)P(Y)P(Z)
Q(Y,Z|X) = Q(Z|Y,X)Q(Y|X)
Analogies: Fix Z, vary Y, sample X|Z,Y
References: Lots of action at ICML 2014!
REFERENCES SVB:
- Practical Variational Inference for Neural Networks [Alex Graves, 2011]
- Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
- Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis [T. Salimans and A. Knowles, 2013]
- Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013]
- Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
- Stochastic Structured Mean Field Variational Inference [Matthew Hoffman, 2013]
- Doubly Stochastic Variational Bayes for non-Conjugate Inference [M. K. Titsias and M. Lázaro-Gredilla, 2014]
REFERENCES STOCHASTIC BACKPROP AND DEEP GENERATIVE MODELS
- Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form [D.P. Kingma, 2013]
- Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013]
- Auto-Encoding Variational Bayes [D.P. Kingma and M. W., 2013]
- Semi-supervised Learning with Deep Generative Models [D.P. Kingma, D.J. Rezende, S. Mohamed, M. W., 2014]
- Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets [D.P. Kingma and M. W., 2014]
- Deep Generative Stochastic Networks Trainable by Backprop [Y. Bengio, E. Laufer, G. Alain, J. Yosinski, 2014]
- Stochastic Back-propagation and Approximate Inference in Deep Generative Models [D.J. Rezende, S. Mohamed and D. Wierstra, 2014]
- Deep AutoRegressive Networks [K. Gregor, A. Mnih and D. Wierstra, 2014]
- Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
Sampling 101 – Why MCMC?
Generating Independent Samples
Sample from g and suppress samples with low p(θ|X)
e.g. a) Rejection Sampling b) Importance Sampling
- Does not scale to high dimensions
Markov Chain Monte Carlo
• Make steps by perturbing previous sample
• Probability of visiting a state is equal to P(θ|X)
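The claim that independent sampling "does not scale to high dimensions" can be checked on a toy case. The sketch below is an illustration under stated assumptions, not the slides' own example: the target standing in for p(θ|X) is a standard Gaussian N(0, I_d), the proposal g is the wider Gaussian N(0, σ²I_d), and the function name is hypothetical. With the tightest envelope constant M = σ^d, the expected acceptance rate of rejection sampling is 1/M = σ^(-d), which shrinks exponentially with d.

```python
import numpy as np

def rejection_acceptance_rate(d, sigma=1.5, n=200_000, seed=0):
    """Fraction of proposals accepted for target N(0, I_d), proposal N(0, sigma^2 I_d)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=(n, d))          # draws from the proposal g
    # Acceptance probability p(x) / (M * g(x)) with M = sigma**d
    accept_prob = np.exp(-0.5 * (1.0 - 1.0 / sigma**2) * np.sum(x**2, axis=1))
    return np.mean(rng.random(n) < accept_prob)

for d in (1, 2, 5, 10):
    print(d, rejection_acceptance_rate(d), 1.5 ** -d)  # empirical vs. sigma**(-d)
```

Even with a proposal only 1.5 times wider than the target, fewer than 2% of proposals survive in 10 dimensions, which motivates moving to Markov chain methods.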
Sampling 101 – What is MCMC?
Burn-in (throw away)
Samples from S0
Autocorrelation time τ
[Figure panels: High τ, Low τ]
Sampling 101 – Metropolis-Hastings
Transition Kernel T(θt+1|θt)
Propose → Accept/Reject Test (O(N))
The accept/reject test asks:
• Is the new state more probable?
• Is it easy to come back to the current state?
For Bayesian Posterior Inference,
1) Burn-in is unnecessarily slow.
2) [symbol not recovered] is too high.
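The propose / accept-reject loop can be sketched as follows. This is a minimal illustration, assuming a symmetric random-walk proposal (so the "easy to come back" correction cancels) and a toy posterior over the mean of a unit-variance Gaussian with a flat prior; the function name, step size, and data are assumptions, not taken from the slides. Note that each evaluation of `log_post` touches all N data points, which is the O(N) cost of the test.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over a scalar parameter."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    lp = log_post(theta)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = theta + step * rng.standard_normal()  # propose by perturbing the current state
        lp_new = log_post(proposal)                      # O(N): uses the full dataset
        # Symmetric proposal, so the acceptance ratio reduces to the posterior ratio
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
        samples[t] = theta
    return samples

# Toy data: N = 1000 draws from N(2, 1); flat prior on the mean
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)
log_post = lambda th: -0.5 * np.sum((data - th) ** 2)

samples = metropolis_hastings(log_post, theta0=0.0, n_steps=5000, step=0.05)
kept = samples[1000:]                                    # discard burn-in
print(kept.mean(), data.mean())
```

After discarding the burn-in, the sample mean should sit close to the data mean, which is the posterior mean for this toy model.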
Approximate MCMC
[Figure: scatter of samples along an axis of decreasing ϵ, from "Low Variance (Fast), High Bias" to "High Variance (Slow), Low Bias"]
Minimizing Risk
Risk = Bias² + Variance
[Plot: X Axis – ϵ, Y Axis – Bias², Variance, Risk; additional label: Computational Time]
Given finite sampling time, ϵ=0 is not the optimal setting.
Designing fast MCMC samplers
Propose → Accept/Reject (O(N))
Method 1
Develop an approximate accept/reject test that uses only a fraction of the data
Method 2
Develop a proposal with acceptance probability ≈ 1 and avoid the expensive accept/reject test
Stochastic Gradient Langevin Dynamics
W. & Teh, 2011
Langevin Dynamics
[equation not recovered]
θt+1 is then accepted/rejected using a Metropolis-Hastings test
Stochastic Gradient Langevin Dynamics (SGLD)
[equation not recovered]
Avoid expensive Metropolis-Hastings test by keeping ε small
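The slide equations did not survive extraction, so the sketch below uses the commonly stated SGLD update: a half-step along the prior gradient plus the mini-batch likelihood gradient rescaled by N/n, plus Gaussian noise of variance ε, with no accept/reject test. The toy model (mean of a unit-variance Gaussian with a N(0, prior_var) prior), the fixed step size, and all names are assumptions for illustration; the talk itself suggests decreasing ϵ over time.

```python
import numpy as np

def sgld(data, n_steps, batch_size=100, eps=1e-4, prior_var=100.0, seed=0):
    """SGLD for the mean of a unit-variance Gaussian with a N(0, prior_var) prior."""
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        batch = rng.choice(data, size=batch_size, replace=False)  # subsample n << N data-cases
        grad_prior = -theta / prior_var
        grad_lik = (N / batch_size) * np.sum(batch - theta)       # rescaled mini-batch gradient
        noise = np.sqrt(eps) * rng.standard_normal()              # injected N(0, eps) noise
        theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
        samples[t] = theta                                        # no Metropolis-Hastings test
    return samples

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)
samples = sgld(data, n_steps=5000)
print(samples[500:].mean(), data.mean())
```

Each iteration costs O(n) rather than O(N). With a fixed ε the chain is slightly biased, because the mini-batch gradient noise inflates the sample spread; that bias is the price of skipping the test.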
SGLD & Optimization
[Figure labels: Optimization, Large ε]
SGLD & Optimization
[Figure labels: Optimization, Small ε]
The SGLD Knob
Decrease ϵ over time: Burn-in → Biased → Exact
[Figure: scatter of samples, from "Low Variance (Fast), High Bias" to "High Variance (Slow), Low Bias"]
Demo: SGLD

Deep generative learning_icml_part1

  • 1. Bayesian Posterior Inference in the Big Data Arena Max Welling Anoop Korattikara 1vrijdag 4 juli 14
  • 2. Outline • Introduction • Stochastic Variational Inference – Variational Inference 101 – Stochastic Variational Inference – Deep Generative Models with SVB • MCMC with mini-batches – MCMC 101 – MCMC using noisy gradients – MCMC using noisy Metropolis-Hastings – Theoretical results
  • 3. Big Data (mine is bigger than yours): the Square Kilometer Array (SKA) produces 1 exabyte per day by 2024… (if you are interested in doing approximate inference on this data, talk to me)
  • 4. Introduction: why do we need posterior inference if the datasets are BIG?
  • 5. p>>N: big data may mean large p, small n. Examples: gene expression data, fMRI data
  • 6. Planning: planning against uncertainty needs probabilities
  • 7. Little data inside Big data: not every data-case carries information about every model component. Example: a new user with no ratings (cold start problem)
  • 8. Big Models! 1943: first NN (+/- N=10); 1988: NetTalk (+/- N=20K); 2009: Hinton's Deep Belief Net (+/- N=10M); 2013: Google/Y! (N=+/- 10B). Models grow faster than useful information in data
  • 9. Two Ingredients for Big Data Bayes: any big data posterior inference algorithm should: 1. easily run on a distributed architecture. 2. only use a small mini-batch of the data at every iteration.
  • 10. Bayesian Posterior Inference: Variational vs. Sampling. Variational (variational family Q inside the set of all probability distributions): • deterministic • biased • local minima • easy to assess convergence. Sampling: • stochastic (sample error) • unbiased • hard to mix between modes • hard to assess convergence
  • 11. Variational Bayes: coordinate descent on Q (figure: Q approximating P, from Bishop, Pattern Recognition and Machine Learning). Hinton & van Camp (1993); Neal & Hinton (1999); Saul & Jordan (1996); Saul, Jaakkola & Jordan (1996); Attias (1999, 2000); Wiegerinck (2000); Ghahramani & Beal (2000, 2001)
  • 12. Stochastic VB (Hoffman, Blei & Bach, 2010): stochastic natural gradient descent on Q. • P and Q in exponential family. • Q factorized. • At every iteration: subsample n<<N data-cases, solve analytically, then update the parameter using stochastic natural gradient descent.
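The equations on slide 12 did not survive extraction. As a minimal sketch of the kind of update it describes, assume a hypothetical conjugate model (Gaussian data with unit noise variance and a Gaussian prior on the mean, no local latent variables). The stochastic natural gradient step then reduces to a convex combination of the current natural parameters and a mini-batch estimate of their optimum. The function name `svi_gaussian_mean`, the step-size schedule and all constants are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def svi_gaussian_mean(data, n_steps, batch_size, rng, prior_var=100.0, kappa=0.7):
    """Stochastic natural-gradient VB for the mean of N(theta, 1) data.

    Assumed model: prior theta ~ N(0, prior_var), likelihood x_i ~ N(theta, 1).
    q(theta) = N(m, v) is tracked through its natural parameters
    eta1 = m / v and eta2 = 1 / v.
    """
    N = len(data)
    eta1, eta2 = 0.0, 1.0 / prior_var          # start q at the prior
    for t in range(n_steps):
        rho = (t + 1.0) ** (-kappa)            # decreasing step size
        batch = data[rng.choice(N, size=batch_size, replace=False)]
        # Optimal natural parameters if the mini-batch were replicated N/n times
        eta1_hat = (N / batch_size) * np.sum(batch)
        eta2_hat = 1.0 / prior_var + N
        # Natural gradient step = convex combination of old and estimated optimum
        eta1 = (1.0 - rho) * eta1 + rho * eta1_hat
        eta2 = (1.0 - rho) * eta2 + rho * eta2_hat
    return eta1 / eta2, 1.0 / eta2             # mean and variance of q

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)
m, v = svi_gaussian_mean(data, n_steps=2000, batch_size=100, rng=rng)
exact_mean = data.sum() / (len(data) + 1.0 / 100.0)
exact_var = 1.0 / (len(data) + 1.0 / 100.0)
print(m, exact_mean, v, exact_var)
```

Each iteration touches only 100 of the 1000 data-cases, which is the second "ingredient" from slide 9.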
  • 13. General SVB (ignoring latent variables Z). Annotations on the gradient estimator: subsample X; sample; very high variance
  • 15. Reparameterization Trick (Kingma 2013, Bengio 2013, Kingma & W. 2014). Talk: Monday June 23, 15:20, in Track F (Deep Learning II). Other solutions to the same "large variance problem": - Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012] - Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression [T. Salimans and A. Knowles, 2013] - Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013] - Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013] - Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013] - Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
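The "large variance problem" and the reparameterization trick can be illustrated on a toy case. The sketch below assumes a one-dimensional Gaussian q(z) = N(mu, sigma²) and the hypothetical integrand f(z) = z², and compares two unbiased single-sample estimators of the gradient of E[f(z)] with respect to mu: the score-function estimator and the reparameterized one (z = mu + sigma·eps). The function names and constants are illustrative, not from the slides.

```python
import numpy as np

def score_function_grads(mu, sigma, n, rng):
    # d/dmu E[f(z)] = E[f(z) * d/dmu log q(z)], with f(z) = z**2
    z = rng.normal(mu, sigma, size=n)
    return z**2 * (z - mu) / sigma**2

def reparam_grads(mu, sigma, n, rng):
    # z = mu + sigma * eps, so d/dmu f(z) = f'(z) = 2 * z
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    return 2.0 * z

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 100_000
sf = score_function_grads(mu, sigma, n, rng)
rp = reparam_grads(mu, sigma, n, rng)

# Both estimate the same gradient (2 * mu), but with very different spread
print("true gradient:", 2 * mu)
print("score function: mean", sf.mean(), "variance", sf.var())
print("reparameterized: mean", rp.mean(), "variance", rp.var())
```

In this toy setting the reparameterized estimator has variance 4·sigma², while the score-function estimator's variance is more than ten times larger.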
  • 16. Auto Encoding Variational Bayes (Kingma & W., 2013; Rezende et al., 2014): both P(X|Z) and Q(Z|X) are general models (e.g. deep neural net). Diagram: Z and X linked by Q(Z|X) and P(X|Z)P(Z). Related: the Helmholtz machine and the Wake/Sleep algorithm (Dayan, Hinton, Neal, Zemel, 1995)
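The objective optimized in this setup is not captured in the transcript. Assuming the slide refers to the standard variational lower bound, it can be written with the slide's own symbols P(X|Z), P(Z) and Q(Z|X) as:

```latex
\log P(X) \;\ge\;
\mathbb{E}_{Q(Z|X)}\!\left[\log P(X|Z)\right]
\;-\; \mathrm{KL}\!\left(Q(Z|X)\,\|\,P(Z)\right)
```

The first term rewards reconstructions of X from sampled Z, and the second keeps the approximate posterior close to the prior. Both are optimized jointly over the parameters of P and Q.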
  • 17. The VB Landscape: SVB (Stochastic Variational Bayes); AEVB (Auto-Encoding Variational Bayes); SSVB (Structured Stoch. Variational Bayes); FSSVB (Fully Struc. Stoch. Variational Bayes) (ICML 2015)
  • 18. Variational Auto-Encoder (with 2 latent variables)
  • 22. Semi-supervised Model (Kingma, Rezende, Mohamed, Wierstra, W., 2014): P(X,Z,Y) = P(X|Z,Y)P(Y)P(Z); Q(Y,Z|X) = Q(Z|Y,X)Q(Y|X). Analogies: fix Z, vary Y, sample X|Z,Y
  • 23. References (lots of action at ICML 2014!). REFERENCES SVB: - Practical Variational Inference for Neural Networks [Alex Graves, 2011] - Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012] - Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis [T. Salimans and A. Knowles, 2013] - Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013] - Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013] - Stochastic Structured Mean Field Variational Inference [Matthew Hoffman, 2013] - Doubly Stochastic Variational Bayes for non-Conjugate Inference [M.K. Titsias and M. Lázaro-Gredilla, 2014]. REFERENCES STOCHASTIC BACKPROP AND DEEP GENERATIVE MODELS: - Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form [D.P. Kingma, 2013] - Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013] - Auto-Encoding Variational Bayes [D.P. Kingma and M. W., 2013] - Semi-supervised Learning with Deep Generative Models [D.P. Kingma, D.J. Rezende, S. Mohamed, M. W., 2014] - Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets [D.P. Kingma and M. W., 2014] - Deep Generative Stochastic Networks Trainable by Backprop [Y. Bengio, E. Laufer, G. Alain, J. Yosinski, 2014] - Stochastic Back-propagation and Approximate Inference in Deep Generative Models [D.J. Rezende, S. Mohamed and D. Wierstra, 2014] - Deep AutoRegressive Networks [K. Gregor, A. Mnih and D. Wierstra, 2014] - Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
  • 29. Sampling 101 – Why MCMC? Generating independent samples: sample from g and suppress samples with low p(θ|X), e.g. a) rejection sampling, b) importance sampling. This does not scale to high dimensions. Markov Chain Monte Carlo: • make steps by perturbing the previous sample • the probability of visiting a state is equal to p(θ|X)
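The claim that independent sampling "does not scale to high dimensions" can be checked on a case where the numbers are known in closed form. The sketch below assumes a hypothetical target N(0, I_d) and proposal g = N(0, sigma² I_d) with sigma = 1.5. With the tightest envelope constant M = sigma^d, the expected acceptance rate of rejection sampling is sigma^(-d), which shrinks geometrically with the dimension d. The function name `rejection_sample` is illustrative.

```python
import numpy as np

def rejection_sample(d, sigma, n_proposals, rng):
    """Rejection sampling for target N(0, I_d) with proposal N(0, sigma^2 I_d).

    Requires sigma > 1 so that the proposal covers the target.
    Returns the accepted samples and the empirical acceptance rate.
    """
    x = rng.normal(0.0, sigma, size=(n_proposals, d))
    # p(x) / (M * g(x)) with the tightest envelope constant M = sigma**d
    accept_prob = np.exp(-0.5 * np.sum(x**2, axis=1) * (1.0 - 1.0 / sigma**2))
    keep = rng.uniform(size=n_proposals) < accept_prob
    return x[keep], keep.mean()

rng = np.random.default_rng(0)
sigma = 1.5
rates = {}
for d in (1, 5, 10):
    samples, rates[d] = rejection_sample(d, sigma, 200_000, rng)
    print(f"d={d}: acceptance rate {rates[d]:.4f}, theory {sigma ** -d:.4f}")
```

Even with a proposal this close to the target, fewer than 2% of proposals survive at d = 10, which is the motivation for perturbing the previous sample instead.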
  • 38. Sampling 101 – What is MCMC? Burn-in (throw away), followed by samples from S0. Autocorrelation time τ: high τ vs. low τ
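To make the autocorrelation time τ concrete, the sketch below estimates the integrated autocorrelation time, τ = 1 + 2·Σ_k ρ_k, of a chain by summing empirical autocorrelations until they first become non-positive (a simple truncation rule chosen here for brevity). It is tried on synthetic AR(1) chains, an assumption made because their true value is known: τ = (1 + φ)/(1 − φ). A chain of length T then carries roughly T/τ effectively independent samples.

```python
import numpy as np

def autocorr_time(chain, max_lag=1000):
    """Integrated autocorrelation time, truncated at the first non-positive lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    tau = 1.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / (len(x) * var)   # lag-k autocorrelation
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau

def ar1_chain(phi, n, rng):
    """Stationary AR(1) chain with unit marginal variance."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    noise = rng.standard_normal(n) * np.sqrt(1.0 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
tau_slow = autocorr_time(ar1_chain(0.9, 100_000, rng))   # theory: 19
tau_fast = autocorr_time(ar1_chain(0.1, 100_000, rng))   # theory: about 1.22
print("high tau:", tau_slow, "low tau:", tau_fast)
```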
  • 42. Sampling 101 – Metropolis-Hastings: transition kernel T(θ_{t+1} | θ_t) = propose, then accept/reject test. Is the new state more probable?
  • 43. Sampling 101 – Metropolis-Hastings: transition kernel T(θ_{t+1} | θ_t) = propose, then accept/reject test. Is it easy to come back to the current state?
  • 46. Sampling 101 – Metropolis-Hastings: transition kernel T(θ_{t+1} | θ_t) = propose, then accept/reject test, which costs O(N). For Bayesian posterior inference: 1) burn-in is unnecessarily slow; 2) … is too high.
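The propose and accept/reject steps can be sketched as a random-walk Metropolis-Hastings sampler. The model is a hypothetical one chosen so the exact posterior is known: x_i ~ N(θ, 1) with prior θ ~ N(0, 10²). Because the Gaussian proposal is symmetric, the "is it easy to come back?" factor cancels and only the "is the new state more probable?" ratio remains. Every test calls `log_posterior`, which sums over all N data points; this is the O(N) cost noted on the slide.

```python
import numpy as np

def log_posterior(theta, data, prior_std=10.0):
    # O(N): every data point enters the likelihood
    log_lik = -0.5 * np.sum((data - theta) ** 2)
    log_prior = -0.5 * (theta / prior_std) ** 2
    return log_lik + log_prior

def metropolis_hastings(data, n_steps, step, rng, theta0=0.0):
    theta = theta0
    logp = log_posterior(theta, data)
    chain = np.empty(n_steps)
    n_accept = 0
    for t in range(n_steps):
        proposal = theta + step * rng.standard_normal()   # propose
        logp_new = log_posterior(proposal, data)
        # accept/reject test; the symmetric proposal cancels in the ratio
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
            n_accept += 1
        chain[t] = theta
    return chain, n_accept / n_steps

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)
chain, acc_rate = metropolis_hastings(data, n_steps=5000, step=0.08, rng=rng)
samples = chain[500:]                                     # discard burn-in
exact_mean = data.sum() / (len(data) + 1.0 / 100.0)
print(samples.mean(), exact_mean, samples.std(), acc_rate)
```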
  • 49. Approximate MCMC: decreasing ϵ moves the sampler from high bias, low variance (fast) to low bias, high variance (slow)
  • 54. Minimizing Risk: Risk = Bias² + Variance. Plot (as computational time increases): x-axis ϵ; y-axis Bias², Variance, Risk. Given finite sampling time, ϵ=0 is not the optimal setting.
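The decomposition on this slide is the usual mean-squared-error identity. Writing I for the true posterior expectation of interest and \hat{I} for its estimate from the sampler's output (both symbols are assumptions, since the slide's notation was lost):

```latex
\underbrace{\mathbb{E}\big[(\hat{I} - I)^2\big]}_{\text{Risk}}
\;=\;
\underbrace{\big(\mathbb{E}[\hat{I}] - I\big)^2}_{\text{Bias}^2}
\;+\;
\underbrace{\mathbb{E}\big[(\hat{I} - \mathbb{E}[\hat{I}])^2\big]}_{\text{Variance}}
```

An exact sampler (ϵ = 0) has zero bias, but in a fixed time budget it produces few samples and so has high variance. Allowing a small bias buys more samples and can lower the total risk.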
  • 57. Designing fast MCMC samplers: propose, then accept/reject, which costs O(N). Method 1: develop an approximate accept/reject test that uses only a fraction of the data. Method 2: develop a proposal with acceptance probability ≈ 1 and avoid the expensive accept/reject test.
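Method 1 can be sketched as a sequential hypothesis test. The slides do not give the details, so the following is an illustrative reconstruction under stated assumptions. The exact test accepts when the mean of the per-datum log-likelihood differences l_i exceeds a threshold mu0. The sketch draws mini-batches without replacement and stops as soon as the sample mean is confidently above or below mu0. A normal tail probability stands in for a Student-t test, and a flat prior with a symmetric proposal is assumed, so that mu0 = log(u)/N. The names `approx_mh_accept` and `make_diff`, and the fixed u, are hypothetical.

```python
import math
import numpy as np

def approx_mh_accept(log_lik_diff, N, mu0, batch, eps, rng):
    """Decide whether mean(l_i) > mu0 from as few data points as the test allows.

    log_lik_diff(idx) returns l_i = log p(x_i|theta_new) - log p(x_i|theta_old)
    for the data indices idx. Returns (accept, number of data points used).
    """
    perm = rng.permutation(N)                  # sample without replacement
    seen = np.empty(0)
    n = 0
    while True:
        n_new = min(n + batch, N)
        seen = np.concatenate([seen, log_lik_diff(perm[n:n_new])])
        n = n_new
        l_bar = seen.mean()
        if n == N:
            return bool(l_bar > mu0), n        # all data used: exact decision
        # standard error of the mean with finite-population correction
        s = seen.std(ddof=1) / math.sqrt(n) * math.sqrt(1.0 - (n - 1) / (N - 1))
        z = abs(l_bar - mu0) / max(s, 1e-300)
        delta = 0.5 * math.erfc(z / math.sqrt(2.0))   # normal tail (assumption)
        if delta < eps:
            return bool(l_bar > mu0), n        # confident enough: stop early

rng = np.random.default_rng(0)
N = 100_000
data = rng.normal(2.0, 1.0, size=N)

def make_diff(theta_old, theta_new):
    # Gaussian likelihood with unit variance (hypothetical model)
    return lambda idx: (-0.5 * (data[idx] - theta_new) ** 2
                        + 0.5 * (data[idx] - theta_old) ** 2)

u = 0.5                                        # would be drawn from U(0, 1) in a sampler
mu0 = math.log(u) / N
accept_worse, n_worse = approx_mh_accept(make_diff(2.0, 2.5), N, mu0, 500, 0.01, rng)
accept_better, n_better = approx_mh_accept(make_diff(2.5, 2.0), N, mu0, 500, 0.01, rng)
print(accept_worse, n_worse, accept_better, n_better)
```

Clear-cut proposals like these two are decided from a small fraction of the data. Proposals close to the decision boundary need more data, and the tolerance `eps` plays the role of the bias knob.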
  • 63. Stochastic Gradient Langevin Dynamics (W. & Teh, 2011). Langevin dynamics: θ_{t+1} is then accepted/rejected using a Metropolis-Hastings test. Stochastic Gradient Langevin Dynamics (SGLD): avoid the expensive Metropolis-Hastings test by keeping ε small.
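The update equations on this slide were lost in extraction. The sketch below assumes the SGLD update as usually stated: θ ← θ + (ε/2)·(∇log p(θ) + (N/n)·Σ_batch ∇log p(x_i|θ)) + η with η ~ N(0, ε), and no accept/reject test. The model (x_i ~ N(θ, 1), prior θ ~ N(0, 10²)), the step-size schedule and all constants are illustrative assumptions.

```python
import numpy as np

def sgld(data, n_steps, batch_size, step_sizes, rng, theta0=0.0, prior_std=10.0):
    """SGLD for the mean of N(theta, 1) data; no Metropolis-Hastings test."""
    N = len(data)
    theta = theta0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        eps = step_sizes[t]
        batch = data[rng.choice(N, size=batch_size, replace=False)]
        grad_prior = -theta / prior_std**2
        # mini-batch gradient of the log-likelihood, rescaled to the full data
        grad_lik = (N / batch_size) * np.sum(batch - theta)
        noise = rng.normal(0.0, np.sqrt(eps))  # injected noise with variance eps
        theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
        chain[t] = theta
    return chain

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)
n_steps = 10_000
# polynomially decreasing step size (hypothetical constants)
step_sizes = 5e-4 * (1.0 + np.arange(n_steps) / 100.0) ** -0.55
chain = sgld(data, n_steps, batch_size=100, step_sizes=step_sizes, rng=rng)
samples = chain[2000:]                         # discard burn-in
exact_mean = data.sum() / (len(data) + 1.0 / 100.0)
exact_std = (len(data) + 1.0 / 100.0) ** -0.5
print(samples.mean(), exact_mean, samples.std(), exact_std)
```

With these settings the sample spread comes out slightly wider than the exact posterior standard deviation. That excess is the bias introduced by skipping the test, and it shrinks as ε decreases.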
  • 64. SGLD & Optimization: large ε
  • 65. SGLD & Optimization: small ε
  • 66. The SGLD Knob: decrease ϵ over time, moving from burn-in through a biased regime towards exact sampling, i.e. from high bias, low variance (fast) to low bias, high variance (slow)