Bayesian Posterior Inference in the Big Data Arena
Max Welling
Anoop Korattikara
Outline
• Introduction
• Stochastic Variational Inference
– Variational Inference 101
– Stochastic Variational Inference
– Deep Generative Models with SVB
• MCMC with mini-batches
– MCMC 101
– MCMC using noisy gradients
– MCMC using noisy Metropolis-Hastings
– Theoretical results
Big Data (mine is bigger than yours)
The Square Kilometer Array (SKA) will produce 1 Exabyte per day by 2024…
(interested in doing approximate inference on this data? talk to me)
Introduction
Why do we need posterior inference if the datasets are BIG?
p >> N
Big data may mean large p, small N:
• Gene expression data
• fMRI data
Planning
Planning against uncertainty needs probabilities
Little data inside Big data
Not every data-case carries information about every model component
New user with no ratings (cold start problem)
Big Models!
1943: First NN (+/- N=10)
1988: NetTalk (+/- N=20K)
2009: Hinton's Deep Belief Net (+/- N=10M)
2013: Google/Y! (+/- N=10B)
Models grow faster than useful information in data
Two Ingredients for Big Data Bayes
Any big data posterior inference algorithm should:
1. easily run on a distributed architecture.
2. only use a small mini-batch of the data at every iteration.
Bayesian Posterior Inference
Two routes: Variational (search within a variational family Q) vs. Sampling (over all probability distributions)
Variational:
• Deterministic
• Biased
• Local minima
• Easy to assess convergence
Sampling:
• Stochastic (sample error)
• Unbiased
• Hard to mix between modes
• Hard to assess convergence
Variational Bayes
Coordinate descent on Q
(Figure from Bishop, Pattern Recognition and Machine Learning: Q approximating P)
Hinton & van Camp (1993)
Neal & Hinton (1999)
Saul & Jordan (1996)
Saul, Jaakkola & Jordan (1996)
Attias (1999, 2000)
Wiegerinck (2000)
Ghahramani & Beal (2000, 2001)
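For reference, and using standard variational Bayes notation (an assumption on our part, since the slide's formulas did not survive extraction): coordinate descent on the factors of Q maximizes the evidence lower bound, which is equivalent to minimizing the KL divergence to the posterior:

```latex
\mathcal{L}(Q) \;=\; \mathbb{E}_{Q(\theta)}\!\big[\log p(X,\theta)\big] \;-\; \mathbb{E}_{Q(\theta)}\!\big[\log Q(\theta)\big]
\;=\; \log p(X) \;-\; \mathrm{KL}\!\big(Q(\theta)\,\|\,p(\theta \mid X)\big)
```

Each coordinate update optimizes one factor of Q while holding the others fixed.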
Stochastic VB (Hoffman, Blei & Bach, 2010)
Stochastic natural gradient descent on Q
• P and Q in the exponential family.
• Q factorized.
• At every iteration: subsample n << N data-cases.
• Solve the local variational parameters analytically.
• Update the global variational parameter using stochastic natural gradient descent (see the sketch below).
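A minimal sketch of the update just described, assuming a conjugate exponential-family model. The callback `suff_stat` (expected sufficient statistics of the mini-batch under the current local posteriors) and the Robbins-Monro schedule are illustrative assumptions, not the slide's code:

```python
# Generic stochastic variational inference loop (Hoffman et al. style sketch).
import numpy as np

def svi(X, alpha, suff_stat, n_iters=1000, batch_size=100, kappa=0.75, tau=1.0,
        rng=np.random.default_rng()):
    N = X.shape[0]
    lam = np.array(alpha, dtype=float)           # global variational natural parameter
    for t in range(n_iters):
        idx = rng.choice(N, batch_size, replace=False)   # subsample n << N data-cases
        # intermediate estimate: treat the mini-batch as if it were the full data set
        lam_hat = alpha + (N / batch_size) * suff_stat(X[idx], lam).sum(axis=0)
        rho = (t + tau) ** (-kappa)              # Robbins-Monro step size
        lam = (1.0 - rho) * lam + rho * lam_hat  # stochastic natural-gradient step
    return lam
```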
General SVB
Subsample X (ignoring latent variables Z) and sample from Q to estimate the gradient of the bound: the naive estimator has very high variance.
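To make the "very high variance" point concrete, here is a minimal sketch of the naive score-function (REINFORCE-style) estimator of the gradient of the bound; `sample_q`, `log_q`, `grad_log_q` and `log_joint` are assumed callables used only for illustration:

```python
import numpy as np

def naive_bound_gradient(phi, sample_q, log_q, grad_log_q, log_joint,
                         n_samples=100, rng=np.random.default_rng()):
    grads = []
    for _ in range(n_samples):
        theta = sample_q(phi, rng)               # theta ~ q_phi(theta)
        # d/dphi E_q[log p(X,theta) - log q_phi(theta)]
        #   = E_q[(log p(X,theta) - log q_phi(theta)) * d/dphi log q_phi(theta)]
        grads.append((log_joint(theta) - log_q(theta, phi)) * grad_log_q(theta, phi))
    # unbiased, but the variance of this average is typically very large
    return np.mean(grads, axis=0)
```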
Reparameterization Trick
Kingma 2013, Bengio 2013, Kingma & W. 2014
Other solutions to the same "large variance problem":
- Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
- Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression [T. Salimans and A. Knowles, 2013]
- Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013]
- Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
- Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013]
- Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]
Talk Monday June 23, 15:20
In Track F (Deep Learning II)
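A minimal sketch of the trick for a Gaussian q(z|x); the encoder outputs `mu` and `log_sigma` are assumed names, not the slides' notation:

```python
import numpy as np

def reparameterized_sample(mu, log_sigma, rng=np.random.default_rng()):
    eps = rng.standard_normal(np.shape(mu))      # noise from a fixed base distribution
    # deterministic, differentiable transform of the noise: gradients with respect
    # to mu and log_sigma flow through z, giving a much lower-variance estimator
    # than the score-function approach sketched earlier.
    return np.asarray(mu) + np.exp(log_sigma) * eps
```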
Auto Encoding Variational Bayes
Kingma & W., 2013; Rezende et al., 2014
Both P(X|Z) and Q(Z|X) are general models (e.g. deep neural nets)
[Diagram: latent Z, observed X; recognition model Q(Z|X), generative model P(X|Z)P(Z)]
Related: the Helmholtz machine and the Wake/Sleep algorithm (Dayan, Hinton, Neal, Zemel, 1995)
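A minimal sketch of the single-datapoint AEVB objective, assuming a Gaussian q(z|x) = N(mu, diag(sigma^2)) and a standard-normal prior on z; `encoder` and `decoder` are hypothetical callables standing in for the two networks:

```python
import numpy as np

def elbo(x, encoder, decoder, rng=np.random.default_rng()):
    mu, log_sigma = encoder(x)                   # parameters of q(z|x)
    eps = rng.standard_normal(np.shape(mu))
    z = mu + np.exp(log_sigma) * eps             # reparameterized sample
    log_px_given_z = decoder(z, x)               # log p(x|z) under the generative model
    # analytic KL( N(mu, sigma^2) || N(0, I) )
    kl = 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1.0 - 2 * log_sigma)
    return log_px_given_z - kl                   # maximize w.r.t. both networks
```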
The VB Landscape
• SVB: Stochastic Variational Bayes
• AEVB: Auto-Encoding Variational Bayes
• SSVB: Structured Stochastic Variational Bayes
• FSSVB: Fully Structured Stochastic Variational Bayes (ICML 2015)
Variational Auto-Encoder (with 2 latent variables)
Face Model
Semi-supervised Model
Kingma, Rezende, Mohamed, Wierstra, W., 2014
Generative model: P(X,Z,Y) = P(X|Z,Y) P(Y) P(Z)
Recognition model: Q(Y,Z|X) = Q(Z|Y,X) Q(Y|X)
Analogies: fix Z, vary Y, sample X|Z,Y
REFERENCES SVB:
- Practical Variational Inference for Neural Networks [Alex Graves, 2011]
- Variational Bayesian Inference with Stochastic Search [D.M. Blei, M.I. Jordan and J.W. Paisley, 2012]
- Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis [T. Salimans and A. Knowles, 2013]
- Black Box Variational Inference [R. Ranganath, S. Gerrish and D.M. Blei, 2013]
- Stochastic Variational Inference [M.D. Hoffman, D. Blei, C. Wang and J. Paisley, 2013]
- Stochastic Structured Mean Field Variational Inference [Matthew Hoffman, 2013]
- Doubly Stochastic Variational Bayes for non-Conjugate Inference [M.K. Titsias and M. Lázaro-Gredilla, 2014]

REFERENCES STOCHASTIC BACKPROP AND DEEP GENERATIVE MODELS:
- Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form [D.P. Kingma, 2013]
- Estimating or propagating gradients through stochastic neurons [Y. Bengio, 2013]
- Auto-Encoding Variational Bayes [D.P. Kingma and M. W., 2013]
- Semi-supervised Learning with Deep Generative Models [D.P. Kingma, D.J. Rezende, S. Mohamed, M. W., 2014]
- Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets [D.P. Kingma and M. W., 2014]
- Deep Generative Stochastic Networks Trainable by Backprop [Y. Bengio, E. Laufer, G. Alain, J. Yosinski, 2014]
- Stochastic Back-propagation and Approximate Inference in Deep Generative Models [D.J. Rezende, S. Mohamed and D. Wierstra, 2014]
- Deep AutoRegressive Networks [K. Gregor, A. Mnih and D. Wierstra, 2014]
- Neural Variational Inference and Learning in Belief Networks [A. Mnih and K. Gregor, 2014]

References: Lots of action at ICML 2014!
Sampling 101 – Why MCMC?
Generating Independent Samples
• Sample from g and suppress samples with low p(θ|X)
• e.g. a) Rejection Sampling b) Importance Sampling (see sketch below)
• Does not scale to high dimensions
Markov Chain Monte Carlo
• Make steps by perturbing the previous sample
• Probability of visiting a state is equal to p(θ|X)
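A minimal sketch of "sample from g and suppress samples with low p(θ|X)" via self-normalized importance sampling; `q_sample`, `q_logpdf`, `log_post_unnorm` and `f` are assumed vectorized callables used only for illustration:

```python
import numpy as np

def importance_estimate(f, log_post_unnorm, q_sample, q_logpdf, n=10000,
                        rng=np.random.default_rng()):
    thetas = q_sample(n, rng)                          # independent draws from g
    logw = log_post_unnorm(thetas) - q_logpdf(thetas)  # unnormalized log-weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                       # low-posterior draws get tiny weight
    # In high dimensions the weights degenerate (a handful of samples carry almost
    # all the mass), which is why this approach does not scale and MCMC is preferred.
    return np.sum(w * f(thetas))
```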
Sampling 101 – What is MCMC?
Start somewhere and apply the transition kernel repeatedly: θ0 → θ1 → θ2 → . . .
Burn-in (throw away the initial samples), then collect samples from the stationary distribution S0
Auto-correlation time τ: high τ means slow mixing, low τ means nearly independent samples
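A hedged sketch of how one might estimate the integrated auto-correlation time τ of a 1-D chain (a standard diagnostic, not code from the slides); the 0.05 cutoff is an arbitrary illustrative choice:

```python
import numpy as np

def autocorr_time(chain, max_lag=None):
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    max_lag = max_lag or n // 10
    # normalized autocorrelation at lags 0 .. max_lag-1
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(max_lag)])
    below = np.where(acf < 0.05)[0]            # truncate once the correlation dies out
    cutoff = below[0] if below.size else max_lag
    return 1.0 + 2.0 * acf[1:cutoff].sum()     # roughly: chain steps per independent sample
```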
Sampling 101 – Metropolis-Hastings
Transition kernel T(θt+1|θt): Propose, then Accept/Reject Test
The accept/reject test asks: Is the new state more probable? Is it easy to come back to the current state?
For Bayesian posterior inference the accept/reject test costs O(N), so:
1) Burn-in is unnecessarily slow.
2) The auto-correlation time τ is too high.
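For concreteness, a minimal random-walk Metropolis-Hastings sketch; `log_post(theta)` is an assumed callable that evaluates log p(θ) plus the sum of log p(x_i|θ) over all N data points, which is exactly the O(N) per-step cost flagged above:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps=10000, step=0.1,
                        rng=np.random.default_rng()):
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.shape)   # propose
        # accept/reject test: is the new state more probable, and is it easy
        # to come back? (the symmetric proposal makes the q-ratio cancel)
        if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
            theta = proposal
        samples.append(theta.copy())
    return np.array(samples)
```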
Approximate MCMC
[Scatter plots of samples omitted]
Decreasing ϵ moves from Low Variance (Fast) / High Bias toward High Variance (Slow) / Low Bias
Minimizing Risk
Risk = Bias² + Variance
[Plot: X axis – ϵ; Y axis – Bias², Variance, Risk; curves shown for increasing computational time]
Given finite sampling time, ϵ=0 is not the optimal setting.
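A toy numeric illustration (not from the slides) of the claim above, under the assumptions that bias grows with ϵ while the number of samples affordable in a fixed time budget also grows with ϵ; all constants are made up:

```python
import numpy as np

def risk(eps, T=100.0, sigma2=1.0, bias_slope=0.2):
    bias = bias_slope * eps                 # systematic error of the approximation
    n_samples = T * (1e-3 + eps)            # samples we can afford in time T
    variance = sigma2 / n_samples           # Monte Carlo error shrinks with more samples
    return bias**2 + variance

eps_grid = np.linspace(1e-3, 2.0, 200)
best = eps_grid[np.argmin([risk(e) for e in eps_grid])]
print(f"risk-minimizing eps ~ {best:.2f}  (strictly > 0 for a finite budget)")
```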
Designing fast MCMC samplers
Propose, then Accept/Reject: O(N)
Method 1: Develop an approximate accept/reject test that uses only a fraction of the data (sketch below).
Method 2: Develop a proposal with acceptance probability ≈ 1 and avoid the expensive accept/reject test.
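A much-simplified sketch of Method 1, in the spirit of sequential approximate MH tests: grow the mini-batch until a t-test is confident about the accept/reject decision. The interfaces, the confidence threshold `eps`, and the batch growth rule are illustrative assumptions, not the tutorial's exact algorithm:

```python
import numpy as np
from scipy import stats

def approx_mh_accept(loglik, X, theta, theta_prop, log_u, log_ratio,
                     batch=100, eps=0.05, rng=np.random.default_rng()):
    # `loglik(theta, X_sub)` returns per-point log-likelihoods (assumed interface);
    # `log_u` is the log of the MH uniform draw; `log_ratio` holds the prior and
    # proposal terms of the MH ratio.
    N = len(X)
    mu0 = (log_u - log_ratio) / N                 # per-data-point decision threshold
    perm = rng.permutation(N)
    diffs = np.empty(0)
    while True:
        take = perm[len(diffs):len(diffs) + batch]
        diffs = np.concatenate([diffs,
                                loglik(theta_prop, X[take]) - loglik(theta, X[take])])
        n = len(diffs)
        # t-test: are we confident the mean difference lies above/below mu0?
        sd = diffs.std(ddof=1) + 1e-12
        t = (diffs.mean() - mu0) / (sd / np.sqrt(n))
        if 2 * stats.t.sf(abs(t), n - 1) < eps or n >= N:
            return diffs.mean() > mu0             # accept iff estimated mean exceeds mu0
```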
Stochastic Gradient Langevin Dynamics (W. & Teh, 2011)
Langevin Dynamics: update θ with a step along the gradient of the log posterior plus injected Gaussian noise; θt+1 is then accepted/rejected using a Metropolis-Hastings test.
Stochastic Gradient Langevin Dynamics (SGLD): replace the full-data gradient with a mini-batch estimate.
Avoid the expensive Metropolis-Hastings test by keeping ε small.
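A minimal SGLD sketch in the spirit of Welling & Teh (2011); the gradient callables, mini-batch size, and polynomially decaying step-size schedule are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, X, theta0, n_steps=10000, batch_size=100,
         eps0=1e-4, rng=np.random.default_rng()):
    N = X.shape[0]
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for t in range(n_steps):
        eps = eps0 / (1.0 + t) ** 0.55                 # slowly decrease eps over time
        idx = rng.choice(N, batch_size, replace=False)
        # mini-batch estimate of the full-data gradient, rescaled by N/n
        grad = grad_log_prior(theta) + (N / batch_size) * grad_log_lik(theta, X[idx])
        noise = np.sqrt(eps) * rng.standard_normal(theta.shape)
        theta = theta + 0.5 * eps * grad + noise       # Langevin step, no MH correction
        samples.append(theta.copy())
    return np.array(samples)
```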
SGLD & Optimization
Large ε: SGLD resembles stochastic gradient optimization (the gradient term dominates the injected noise).
Small ε: the injected noise dominates and SGLD explores the posterior rather than collapsing onto a mode.
The SGLD Knob
Decrease ϵ over time: Burn-in → Biased → Exact
Low Variance (Fast) / High Bias ↔ High Variance (Slow) / Low Bias
[Scatter plots of samples omitted]
Demo: SGLD