Sequential inference via low-dimensional
couplings
Youssef Marzouk
joint work with Alessio Spantini and Daniele Bigoni
Department of Aeronautics and Astronautics
Center for Computational Engineering
Statistics and Data Science Center
Massachusetts Institute of Technology
http://uqgroup.mit.edu
Support from DOE ASCR, NSF, DARPA
14 December 2017
Spantini, Bigoni, & Marzouk MIT 1 / 37
Bayesian inference in large-scale models
Observations y Parameters x
πpos(x) := π(x|y) ∝ π(y|x)πpr(x)
Bayes’ rule
Need to characterize the posterior distribution (density πpos)
This is a challenging task since:
x ∈ Rn is typically high-dimensional (e.g., a discretized function)
πpos is non-Gaussian
evaluations of πpos may be expensive
πpos can be evaluated only up to a normalizing constant
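For concreteness, a minimal Python sketch of what "up to a normalizing constant" means in practice; the prior, forward model, and noise level below are toy stand-ins, not part of the talk's software:
```python
import numpy as np

# Toy unnormalized log-posterior: standard Gaussian prior plus a
# hypothetical nonlinear forward model (all choices illustrative).

def log_prior(x):
    return -0.5 * np.sum(x**2)                      # standard Gaussian

def log_likelihood(x, y, sigma=0.1):
    pred = np.sin(x)                                # assumed forward model
    return -0.5 * np.sum((y - pred)**2) / sigma**2

def log_posterior_unnormalized(x, y):
    # Known only up to the additive constant log pi(y) (the evidence)
    return log_likelihood(x, y) + log_prior(x)
```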
Spantini, Bigoni, & Marzouk MIT 2 / 37
Sequential Bayesian inference
State estimation (e.g., filtering and smoothing) or joint state and
parameter estimation, in a Bayesian setting
Need recursive, online algorithms for characterizing the posterior
distribution
Spantini, Bigoni, & Marzouk MIT 3 / 37
Bayesian filtering and smoothing
Nonlinear/non-Gaussian state-space model:
Transition density πZk |Zk−1
Observation density (likelihood) πYk |Zk
[Figure: hidden Markov chain Z0 → Z1 → · · · → ZN with observations Y0, Y1, . . . , YN]
Interested in recursively updating the full Bayesian solution:
πZ0:k | y0:k → πZ0:k+1 | y0:k+1 (smoothing)
Or focus on approximating the filtering distribution:
πZk | y0:k → πZk+1 | y0:k+1 (marginals of the full Bayesian solution)
Spantini, Bigoni, & Marzouk MIT 4 / 37
Joint state and parameter inference
Nonlinear/non-Gaussian state-space model with static params Θ:
Transition density πZk |Zk−1,Θ
Observation density (likelihood) πYk |Zk
[Figure: hidden Markov chain Z0 → Z1 → · · · → ZN with observations Y0, . . . , YN and static parameter node Θ]
Again, interested in recursively updating the full Bayesian solution:
πZ0:k ,Θ | y0:k → πZ0:k+1,Θ | y0:k+1 (smoothing + sequential parameter inference)
Relate these goals to notions of coupling and transport. . .
Spantini, Bigoni, & Marzouk MIT 5 / 37
Deterministic couplings of probability measures
[Figure: transport map T pushing the reference η forward to the target π, with inverse S = T−1]
Core idea
Choose a reference distribution η (e.g., standard Gaussian)
Seek a transport map T : Rn → Rn such that T♯η = π
Equivalently, find S = T−1 such that S♯π = η
Might be content satisfying these conditions only approximately. . .
Spantini, Bigoni, & Marzouk MIT 6 / 37
Choice of transport map
A useful building block is the Knothe-Rosenblatt rearrangement:
T(x) = [ T1(x1) ; T2(x1, x2) ; . . . ; Tn(x1, x2, . . . , xn) ]
Exists and is unique (up to ordering) under mild conditions on η, π
Jacobian determinant easy to evaluate
Inverse map S = T−1 also lower triangular
“Exposes” marginals, enables conditional sampling. . .
Numerical approximations can employ a monotone parameterization, guaranteeing ∂xk Tk > 0 for arbitrary ak, bk:
Tk(x1, . . . , xk) = ak(x1, . . . , xk−1) + ∫0^xk exp( bk(x1, . . . , xk−1, w) ) dw
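A minimal sketch of evaluating one such monotone component numerically; the polynomial forms of ak, bk and the quadrature rule are illustrative assumptions:
```python
import numpy as np

# One monotone component T_k(x_1,...,x_k) = a_k(x_{1:k-1})
#   + int_0^{x_k} exp( b_k(x_{1:k-1}, w) ) dw.
# Since exp(b_k) > 0, the component is strictly increasing in x_k.

def a_k(x_prev, ca):
    return ca[0] + np.dot(ca[1:], x_prev)           # affine, for illustration

def b_k(x_prev, w, cb):
    return cb[0] + np.dot(cb[1:-1], x_prev) + cb[-1] * w

def T_k(x, ca, cb, n_quad=30):
    x_prev, xk = x[:-1], x[-1]
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    w = 0.5 * xk * (nodes + 1.0)                    # map [-1, 1] -> [0, x_k]
    return a_k(x_prev, ca) + 0.5 * xk * np.dot(weights, np.exp(b_k(x_prev, w, cb)))
```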
Spantini, Bigoni, & Marzouk MIT 7 / 37
Variational inference
Variational characterization of the direct map T [Moselhy & M 2012]:
min T∈𝒯 DKL( T♯η ‖ π ) = min T∈𝒯 DKL( η ‖ (T−1)♯π )
𝒯 is the set of monotone lower triangular maps
Contains the Knothe-Rosenblatt rearrangement
Expectation is with respect to the reference measure η
Compute via, e.g., Monte Carlo, QMC, quadrature
Use unnormalized evaluations of π and its gradients
No MCMC or importance sampling
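A sketch of the resulting sample-average objective; `transport` and `log_det_jac` stand in for any differentiable triangular-map parameterization (this is not the transportmaps library API):
```python
import numpy as np

# Monte Carlo approximation of
#   min_T E_eta[ -log pi(T(x)) - sum_k log d_{x_k} T_k(x) ]
# using fixed samples x_i ~ eta.

def kl_objective(theta, x_samples, log_pi_bar, transport, log_det_jac):
    Tx  = transport(theta, x_samples)      # T(x_i) for each sample
    ldj = log_det_jac(theta, x_samples)    # sum_k log d_{x_k} T_k at each x_i
    return -np.mean(log_pi_bar(Tx) + ldj)  # unnormalized pi is enough

# Minimize over theta with any gradient-based optimizer, e.g.
# scipy.optimize.minimize(kl_objective, theta0, args=(...)).
```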
Spantini, Bigoni, & Marzouk MIT 8 / 37
Simple example
min T Eη[ −log π ◦ T − Σk log ∂xk Tk ]
Parameterized map T ∈ 𝒯h ⊂ 𝒯
Optimize over coefficients of the functions (ak)k=1..n, (bk)k=1..n
Use gradient-based optimization
Approximate Eη[g] ≈ Σi wi g(xi)
The posterior is in the tail of the reference
Spantini, Bigoni, & Marzouk MIT 9 / 37
Useful features
Move samples; don't just reweight them
Independent and cheap samples: xi ∼ η ⇒ T(xi )
Clear convergence criterion, even with unnormalized target density:
DKL( T♯η ‖ π ) ≈ (1/2) Varη[ log( η / (T−1)♯π̄ ) ]
Can either accept bias or reduce it by:
Increasing the complexity of the map T
Sampling the pullback (T−1)♯π using MCMC [Parno & M 2015] or importance sampling
The KR parameterization is just one of many variational
constructions for continuous transport (cf. Stein variational gradient
descent [Liu & Wang 2016], normalizing flows [Rezende & Mohamed 2015],
Gibbs flows [Heng et al. 2015], particle flow filter [Reich 2011], implicit
sampling [Chorin et al. 2009–2015])
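A sketch of computing this variance diagnostic from reference samples; note that the pullback density (T−1)♯π̄ at x equals π̄(T(x)) |det ∇T(x)|, so only unnormalized target evaluations are needed (all function names here are placeholders):
```python
import numpy as np

# Estimate D_KL(T# eta || pi) ~= 0.5 * Var_eta[ log( eta / (T^-1)# pibar ) ]
# from samples x_i ~ eta.

def variance_diagnostic(x_samples, log_eta, log_pi_bar, transport, log_det_jac):
    log_pullback = log_pi_bar(transport(x_samples)) + log_det_jac(x_samples)
    return 0.5 * np.var(log_eta(x_samples) - log_pullback)
```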
Spantini, Bigoni, & Marzouk MIT 10 / 37
Low-dimensional structure
Key challenge: maps in high dimensions
Major bottleneck: representation of the map, e.g., cardinality of the
map basis
How to make the construction/representation of high-dimensional
transports tractable?
Main idea: exploit Markov structure of the target distribution
Leads to various low-dimensional properties of transport maps:
1 Decomposability
2 Sparsity
3 Low rank
Spantini, Bigoni, & Marzouk MIT 11 / 37
Markov random fields
Let Z1, . . . , Zn be random variables with joint density π > 0
(i, j) ∉ E iff Zi ⊥⊥ Zj | ZV∖{i,j}
G encodes conditional independence (an I-map for π)
Choice of the probabilistic model =⇒ graphical structure
Spantini, Bigoni, & Marzouk MIT 12 / 37
Decomposable transport maps
Definition: a decomposable transport is a map T = T1 ◦ · · · ◦ Tk
that factorizes as the composition of finitely many maps of low
effective dimension that are triangular (up to a permutation), e.g.,
T = T1 ◦ T2 ◦ T3, where
T1(x) = [ A1(x1, x2, x3) ; B1(x2, x3) ; C1(x3) ; x4 ; x5 ; x6 ]
T2(x) = [ x1 ; A2(x2, x3, x4, x5) ; B2(x3, x4, x5) ; C2(x4, x5) ; D2(x5) ; x6 ]
T3(x) = [ x1 ; x2 ; x3 ; A3(x4) ; B3(x4, x5) ; C3(x4, x5, x6) ]
Theorem [Spantini et al. 2017]: decomposable graphical models for π lead to decomposable direct maps T, provided that η(x) = ∏i η(xi)
Spantini, Bigoni, & Marzouk MIT 13 / 37
Graph decomposition (generic example)
[Figure: graph decomposition into subsets A, S (separator), B]
For a given decomposition (A, S, B), consider the transport map
T1(x) = [ TS∪A(xS∪A) ; xB ] = [ h(xS, xA) ; g(xS) ; xB ]
such that the submap TS∪A pushes forward ηXS∪A to πXS∪A
What can we say about the pullback of π through T1?
Spantini, Bigoni, & Marzouk MIT 13 / 37
Graph decomposition (generic example)
Figure: I-map for the pullback of π through T1
Just remove any edge incident to any node in A
T1 is essentially a 3-D map
Pulling back π through T1 makes XA independent of XS∪B!
Spantini, Bigoni, & Marzouk MIT 13 / 37
Graph decomposition (generic example)
Figure: I-map for the pullback of π through T1 ◦ T2
Recursion at step k
1 Consider a new decomposition (A, S, B)
2 Compute transport Tk
3 Pullback by Tk
Spantini, Bigoni, & Marzouk MIT 13 / 37
Graph decomposition (generic example)
Figure: I-map for the pullback of π through T
T = T1 ◦ T2 ◦ T3, where
T1(x) = [ A1(x1, x2, x3) ; B1(x2, x3) ; C1(x3) ; x4 ; x5 ; x6 ]
T2(x) = [ x1 ; A2(x2, x3, x4, x5) ; B2(x3, x4, x5) ; C2(x4, x5) ; D2(x5) ; x6 ]
T3(x) = [ x1 ; x2 ; x3 ; A3(x4) ; B3(x4, x5) ; C3(x4, x5, x6) ]
Spantini, Bigoni, & Marzouk MIT 13 / 37
Application to state-space models (chain graph)
[Figure: state-space model; X0, . . . , XN denote the corresponding reference variables]
Compute M0 : R2n → R2n s.t.
M0(x0, x1) = [ A0(x0, x1) ; B0(x1) ]
Reference: ηX0 ⊗ ηX1
Target: πZ0 πZ1|Z0 πY0|Z0 πY1|Z1
dim(M0) = 2 × dim(Z0)
T0(x) = [ A0(x0, x1) ; B0(x1) ; x2 ; x3 ; x4 ; x5 ; . . . ; xN ]
Spantini, Bigoni, & Marzouk MIT 14 / 37
Second step: compute another 2-D map
[Figure: state-space model with reference variables X0, . . . , XN]
Compute M1 : R2n → R2n s.t.
M1(x1, x2) = [ A1(x1, x2) ; B1(x2) ]
Reference: ηX1 ⊗ ηX2
Target: ηX1 πY2|Z2 πZ2|Z1( · | B0(·) )
Uses only one component of M0 (namely B0)
T1(x) = [ x0 ; A1(x1, x2) ; B1(x2) ; x3 ; x4 ; x5 ; . . . ; xN ]
Spantini, Bigoni, & Marzouk MIT 15 / 37
Proceed recursively forward in time
[Figure: state-space model with reference variables X0, . . . , XN]
Compute M2 : R2n → R2n s.t.
M2(x2, x3) = [ A2(x2, x3) ; B2(x3) ]
Reference: ηX2 ⊗ ηX3
Target: ηX2 πY3|Z3 πZ3|Z2( · | B1(·) )
Uses only one component of M1 (namely B1)
T2(x) = [ x0 ; x1 ; A2(x2, x3) ; B2(x3) ; x4 ; x5 ; . . . ; xN ]
Spantini, Bigoni, & Marzouk MIT 16 / 37
A decomposition theorem for chains
[Figure: state-space model with reference variables X0, . . . , XN]
Theorem.
1. (Bk)♯ ηXk+1 = πZk+1 | Y0:k+1 (filtering)
2. (Mk)♯ ηXk:k+1 = πZk,Zk+1 | Y0:k+1 (lag-1 smoothing)
3. (T0 ◦ · · · ◦ Tk)♯ ηX0:k+1 = πZ0:k+1 | Y0:k+1 (full Bayesian solution)
Spantini, Bigoni, & Marzouk MIT 17 / 37
A nested decomposable map
𝐓k = T0 ◦ T1 ◦ · · · ◦ Tk characterizes the joint distribution πZ0:k+1 | Y0:k+1
𝐓k+1(x) = (T0 ◦ T1 ◦ T2 ◦ · · · )(x), where
T0(x) = [ A0(x0, x1) ; B0(x1) ; x2 ; x3 ; . . . ; xN ]
T1(x) = [ x0 ; A1(x1, x2) ; B1(x2) ; x3 ; . . . ; xN ]
T2(x) = [ x0 ; x1 ; A2(x2, x3) ; B2(x3) ; x4 ; . . . ; xN ]
Trivial to go from 𝐓k to 𝐓k+1: just append a new map Tk+1
No need to recompute T0, . . . , Tk (nested transports)
𝐓k is dense and high-dimensional but decomposable
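A minimal sketch of this bookkeeping (pure composition; the individual map objects are placeholders):
```python
# Maintain T_k = T_0 o T_1 o ... o T_k as a list of low-dimensional maps,
# appending one per assimilation step and never recomputing old ones.

class NestedTransport:
    def __init__(self):
        self.maps = []                 # [T_0, T_1, ..., T_k]

    def append(self, T_new):           # go from T_k to T_{k+1}
        self.maps.append(T_new)

    def __call__(self, x):
        # (T_0 o ... o T_k)(x): the last map is applied first
        for T in reversed(self.maps):
            x = T(x)
        return x
```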
Spantini, Bigoni, & Marzouk MIT 18 / 37
A single-pass algorithm on the model
Meta-algorithm:
1 Compute the maps M0, M1, . . ., each of dimension 2 × dim(Z0)
2 Embed each Mj into an identity map to form Tj
3 Evaluate T0 ◦ · · · ◦ Tk for the full Bayesian solution
Remarks:
A single pass on the state-space model
Non-Gaussian generalization of the Rauch-Tung-Striebel smoother
Bias is only due to the numerical approximation of each map Mi
Can either accept the bias or reduce it by:
Increasing the complexity of each map Mi , or
Computing weights given by the proposal density (T0 ◦ T1 ◦ · · · ◦ Tk)♯ ηX0:k+1
The cost of evaluating smoothing weights grows linearly with time
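A skeleton of this meta-algorithm; `build_Mk`, which performs the low-dimensional variational solve against the local target, is an assumed placeholder rather than a library call:
```python
def embed(M_k, k):
    """Embed the two-block map M_k into an identity map on all blocks."""
    def T_k(x):
        x = list(x)
        x[k], x[k + 1] = M_k(x[k], x[k + 1])   # act only on blocks k, k+1
        return x
    return T_k

def single_pass(observations, build_Mk):
    maps, prev_B = [], None
    for k, y in enumerate(observations):
        A_k, B_k = build_Mk(y, prev_B)         # low-dimensional solve
        M_k = lambda a, b, A=A_k, B=B_k: (A(a, b), B(b))
        maps.append(embed(M_k, k))
        prev_B = B_k                           # only B_k is reused at step k+1
    return maps                                # compose T_0 o ... o T_k to sample
```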
Spantini, Bigoni, & Marzouk MIT 19 / 37
Joint parameter/state estimation
Generalize to sequential joint parameter/state estimation
[Figure: state-space model with static parameter node Θ]
(T0 ◦ · · · ◦ Tk)♯ ( ηΘ ⊗ ηX0:k+1 ) = πΘ,Z0:k+1 | Y0:k+1 (full Bayesian solution)
Now dim(Mj) = 2 × dim(Zj) + dim(Θ)
Remarks:
No artificial dynamic for the static parameters
No a priori fixed-lag smoothing approximation
Spantini, Bigoni, & Marzouk MIT 20 / 37
Example: stochastic volatility model
Build the decomposition recursively
T = Id
Figure: Markov structure for the pullback of π through T (chain over Z1, . . . , ZN, with hyperparameter nodes µ, φ)
Start with the identity map
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = Id
Figure: Markov structure for the pullback of π through T
Find a good first decomposition of G
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ Id
Figure: Markov structure for the pullback of π through T
Compute an (essentially) 4-D T0 and pull back π
Underlying approximation of µ, φ, Z1|Y1
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ Id
Figure: Markov structure for the pullback of π through T
Find a new decomposition
Underlying approximation of µ, φ, Z1|Y1
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ T1 ◦ Id
Figure: Markov structure for the pullback of π through T
Compute an (essentially) 4-D T1 and pull back π
Underlying approximation of µ, φ, Z1:2|Y1:2
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ T1 ◦ Id
Figure: Markov structure for the pullback of π through T
Continue the recursion until no edges are left. . .
Underlying approximation of µ, φ, Z1:2|Y1:2
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ T1 ◦ T2 ◦ Id
Figure: Markov structure for the pullback of π through T
Continue the recursion until no edges are left. . .
Underlying approximation of µ, φ, Z1:3|Y1:3
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ T1 ◦ T2 ◦ · · · ◦ TN−3 ◦ Id
Figure: Markov structure for the pullback of π through T
Continue the recursion until no edges are left. . .
Underlying approximation of µ, φ, Z1:N−1|Y1:N−1
Spantini, Bigoni, & Marzouk MIT 21 / 37
Stochastic volatility model
Build the decomposition recursively
T = T0 ◦ T1 ◦ T2 ◦ · · · ◦ TN−3 ◦ TN−2 ◦ Id
Figure: Markov structure for the pullback of π through T
Each map Tk is essentially 4-D regardless of N
Underlying approximation of µ, φ, Z1:N|Y1:N
Spantini, Bigoni, & Marzouk MIT 21 / 37
Another decomposable map
𝐓k+1(x) = (T0 ◦ T1 ◦ T2 ◦ · · · )(x), where
T0(x) = [ P0(xθ) ; A0(xθ, x0, x1) ; B0(xθ, x1) ; x2 ; x3 ; . . . ; xN ]
T1(x) = [ P1(xθ) ; x0 ; A1(xθ, x1, x2) ; B1(xθ, x2) ; x3 ; . . . ; xN ]
T2(x) = [ P2(xθ) ; x0 ; x1 ; A2(xθ, x2, x3) ; B2(xθ, x3) ; x4 ; . . . ; xN ]
(P0 ◦ · · · ◦ Pk)♯ ηΘ = πΘ | Y0:k+1 (parameter inference)
If 𝐏k = P0 ◦ · · · ◦ Pk, then 𝐏k can be computed recursively as 𝐏k = 𝐏k−1 ◦ Pk
=⇒ cost of evaluating 𝐏k does not grow with k
Spantini, Bigoni, & Marzouk MIT 22 / 37
Example: stochastic volatility model
Stochastic volatility model: Latent log-volatilities take the form of
an AR(1) process for t = 1, . . . , N:
Zt+1 = µ + φ (Zt − µ) + ηt, ηt ∼ N(0, 1), Z1 ∼ N( 0, 1/(1 − φ2) )
Observe the mean return for holding an asset at time t
Yt = εt exp( 0.5 Zt ), εt ∼ N(0, 1), t = 1, . . . , N
Markov structure for π, the joint posterior density of (µ, φ, Z1:N) given Y1:N, is:
[Figure: chain over Z1, . . . , ZN plus hyperparameter nodes µ, φ]
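A short simulation of this model, with illustrative (assumed) hyperparameter values:
```python
import numpy as np

def simulate_sv(N, mu=-1.0, phi=0.95, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.empty(N)
    Z[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2))  # Z_1 ~ N(0, 1/(1-phi^2))
    for t in range(N - 1):
        Z[t + 1] = mu + phi * (Z[t] - mu) + rng.normal() # AR(1) log-volatility
    Y = rng.normal(size=N) * np.exp(0.5 * Z)             # observed mean returns
    return Z, Y
```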
Spantini, Bigoni, & Marzouk MIT 23 / 37
Stochastic volatility example
[Figure: posterior marginals of Zt over ∼500 time steps]
Infer log-volatility of the pound/dollar exchange rate, starting on 1
October 1981
Filtering (blue) versus smoothing (red) marginals
Spantini, Bigoni, & Marzouk MIT 24 / 37
Stochastic volatility example
[Figure: quantiles of the smoothing marginals Zt | Y0:N over time]
Quantiles of smoothing marginals: map (red) compared to MCMC
(black)
Spantini, Bigoni, & Marzouk MIT 25 / 37
Stochastic volatility example
[Figure: posterior marginals of µ | Y0:t for increasing t]
Sequential parameter estimation: marginals of hyperparameter µ,
conditioning on successively more observations y0:k
Map (solid lines) versus MCMC (dashed)
Spantini, Bigoni, & Marzouk MIT 26 / 37
Stochastic volatility example
[Figure: posterior marginals of φ | Y0:t for increasing t]
Sequential parameter estimation: marginals of hyperparameter φ,
conditioning on successively more observations y0:k
Map (solid lines) versus MCMC (dashed)
Spantini, Bigoni, & Marzouk MIT 27 / 37
Stochastic volatility example
[Figure: posterior predictive distribution Yt | Y0:N over time]
Posterior predictive distribution (shaded red) versus the data (dots)
Spantini, Bigoni, & Marzouk MIT 28 / 37
Stochastic volatility example
[Figure: smoothing marginals Zt | Y0:N for time steps 4500–9000]
Long-time smoothing: map (red) compared to MCMC (black)
Observe fall 2008 financial crisis, Brexit, . . .
Spantini, Bigoni, & Marzouk MIT 29 / 37
Stochastic volatility example
Variance diagnostic Varη[ log( η / (T−1)♯π̄ ) ], for a 947-dimensional target π (smoothing and parameter estimation over 945 days):
Laplace map = 5.68; linear maps = 1.49; degree ≤ 7 maps = 0.11
Important open question: how does error in the approximation of
the parameter posterior evolve over time?
Spantini, Bigoni, & Marzouk MIT 30 / 37
Beyond the Markov properties of π
Alternative source of low dimensionality: low-rank structure
Example: fix target π to be the posterior density of a Bayesian
inference problem,
π(z) := πpos(z) ∝ πY|Z(y | z) πZ(z)
Let Tpr push forward the reference η to the prior πZ (prior map)
Define π̃pos, the pullback of πpos through Tpr: π̃pos(z) ∝ πY|Z( y | Tpr(z) ) η(z)
Theorem [Graph decoupling]
If η = ∏i ηXi and rank Eη[ ∇log R ⊗ ∇log R ] = k, with R = π̃pos/η = πY|Z ◦ Tpr,
then there exists a rotation Q such that:
(Q−1)♯ π̃pos(z) = g( z1, . . . , zk ) ∏i>k ηXi(zi)
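A sketch of checking this rank condition by Monte Carlo, assuming the user supplies ∇log R (function names and tolerance are illustrative):
```python
import numpy as np

# Estimate H = E_eta[ grad log R (grad log R)^T ] from samples and read off
# the numerical rank k and the rotation Q from its eigendecomposition.

def graph_decoupling_diagnostic(grad_log_R, x_samples, tol=1e-8):
    G = np.stack([grad_log_R(x) for x in x_samples])   # (M, n) gradient matrix
    H = G.T @ G / len(x_samples)                       # Monte Carlo estimate
    eigvals, Q = np.linalg.eigh(H)                     # ascending eigenvalues
    eigvals, Q = eigvals[::-1], Q[:, ::-1]             # sort descending
    k = int(np.sum(eigvals > tol * eigvals[0]))        # numerical rank
    return k, Q
```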
Spantini, Bigoni, & Marzouk MIT 31 / 37
Changing the Markov structure. . .
The pullback has a different Markov structure:
(Q−1)♯ π̃pos(z) = g( z1, . . . , zk ) ∏i>k ηXi(zi)
[Figure: I-map G of πpos versus the decoupled I-map of the pullback]
Corollary: there exists a transport T♯η = (Q−1)♯ π̃pos of the form T(x) = [ g(x1:k) ; xk+1 ; . . . ; xn ], where g : Rk → Rk
The composition Tpr ◦ Q ◦ T pushes forward η to πpos
Why low-rank structure? For example, few data-informed directions.
Spantini, Bigoni, & Marzouk MIT 32 / 37
Log-Gaussian Cox process
4096-D GMRF prior, Z ∼ N(µ, Γ), with Γ−1 specified through −∆ + κ2 Id
30 sparse observations at locations i ∈ I, Yi |Zi ∼ Pois( exp Zi )
Posterior density Z|Y ∼ πpos is:
πpos(z) ∝ ∏i∈I exp[ −exp(zi) + zi · yi ] · exp[ −(1/2) (z − µ)ᵀ Γ−1 (z − µ) ]
What is an independence map G for πpos? A 64 × 64 grid.
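A schematic Python sketch of this unnormalized log-posterior; the 5-point Laplacian assembly (ignoring row-boundary corrections) and the value of κ are assumptions of the sketch:
```python
import numpy as np
import scipy.sparse as sp

def gmrf_precision(m, kappa=1.0):
    # Negative 5-point grid Laplacian plus kappa^2 I on an m x m grid
    # (schematic: row-boundary entries are not corrected).
    L = sp.diags([-1.0, -1.0, 4.0, -1.0, -1.0], [-m, -1, 0, 1, m],
                 shape=(m * m, m * m))
    return L + kappa**2 * sp.eye(m * m)

def log_posterior(z, y, obs_idx, mu, Q):
    lik = np.sum(-np.exp(z[obs_idx]) + z[obs_idx] * y)    # Poisson terms
    prior = -0.5 * (z - mu) @ (Q @ (z - mu))              # GMRF quadratic form
    return lik + prior
```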
Spantini, Bigoni, & Marzouk MIT 33 / 37
Log-Gaussian Cox process
Fix πref ∼ N(0, I) and let Tpr push forward πref to πpr (prior map)
Consider the pullback π̃pos of πpos through Tpr and find that
rank Eπref[ ∇log R ⊗ ∇log R ] = 30 ≪ n, R = π̃pos/πref
Deflate the problem and compute a transport map in 30 dimensions
Change from prior to posterior concentrated in a low-dimensional
subspace (likelihood-informed subspace Cui, Law, M 2014; active
subspace Constantine 2015)
[Figure: truth, posterior sample, and posterior mean fields]
Spantini, Bigoni, & Marzouk MIT 34 / 37
Log-Gaussian Cox process
(left) E[Z|y], (right) Var[Z|y]. (top) transport; (bottom) MCMC
Excellent match with reference MCMC solution, on a problem of
n = 4096 dimensions
Spantini, Bigoni, & Marzouk MIT 35 / 37
Conclusions
Bayesian inference through the construction of deterministic couplings
Computation of transport maps in high dimensions, leveraging the
structure of the posterior:
1 Decomposability of direct transports
Variational approaches for Bayesian filtering, smoothing, and
sequential parameter inference
2 Low-rank transports
Ongoing work:
Error analysis of variational filtering schemes
Transport-based generalizations of the ensemble Kalman filter
Structure learning for continuous non-Gaussian Markov random fields
[Morrison, Baptista, & M NIPS 2017]
Mapping sparse quadrature or QMC schemes
Approximate Markov properties. . .
Spantini, Bigoni, & Marzouk MIT 36 / 37
References
Main reference for this talk: A. Spantini, D. Bigoni, Y. Marzouk. “Inference via
low-dimensional couplings.” arXiv:1703.06131
R. Morrison, R. Baptista, Y. Marzouk. “Beyond normality: learning sparse
probabilistic graphical models in the non-Gaussian setting.” NIPS 2017.
Y. Marzouk, T. Moselhy, M. Parno, A. Spantini, “An introduction to sampling via
measure transport.” Handbook of Uncertainty Quantification, R. Ghanem, D.
Higdon, H. Owhadi, eds. Springer (2016). arXiv:1602.05023.
(broad introduction to transport maps)
M. Parno, T. Moselhy, Y. Marzouk, “A multiscale strategy for Bayesian inference
using transport maps.” SIAM JUQ, 4: 1160–1190 (2016).
M. Parno, Y. Marzouk, “Transport map accelerated Markov chain Monte Carlo.”
arXiv:1412.5492. SIAM JUQ (to appear)
T. Moselhy, Y. Marzouk, “Bayesian inference with optimal maps.” J. Comp. Phys.,
231: 7815–7850 (2012).
Python code at http://transportmaps.mit.edu
Spantini, Bigoni, & Marzouk MIT 37 / 37

Contenu connexe

Similaire à QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Sequential Inference via Low-dimensional Couplings - Youssef Marzouk, Dec 14, 2017

The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionPedro222284
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Frank Nielsen
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Discussion of Matti Vihola's talk
Discussion of Matti Vihola's talkDiscussion of Matti Vihola's talk
Discussion of Matti Vihola's talkChristian Robert
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Taiji Suzuki
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationChristian Robert
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
 
Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...IJRTEMJOURNAL
 
Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...journal ijrtem
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
A Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsA Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsYasmine Anino
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Tomoya Murata
 
Fixed point theorem in fuzzy metric space with e.a property
Fixed point theorem in fuzzy metric space with e.a propertyFixed point theorem in fuzzy metric space with e.a property
Fixed point theorem in fuzzy metric space with e.a propertyAlexander Decker
 

Similaire à QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Sequential Inference via Low-dimensional Couplings - Youssef Marzouk, Dec 14, 2017 (20)

The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Dag in mmhc
Dag in mmhcDag in mmhc
Dag in mmhc
 
Discussion of Matti Vihola's talk
Discussion of Matti Vihola's talkDiscussion of Matti Vihola's talk
Discussion of Matti Vihola's talk
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
Smart Multitask Bregman Clustering
Smart Multitask Bregman ClusteringSmart Multitask Bregman Clustering
Smart Multitask Bregman Clustering
 
MVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priorsMVPA with SpaceNet: sparse structured priors
MVPA with SpaceNet: sparse structured priors
 
Ica group 3[1]
Ica group 3[1]Ica group 3[1]
Ica group 3[1]
 
Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...
 
Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...Existence results for fractional q-differential equations with integral and m...
Existence results for fractional q-differential equations with integral and m...
 
MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
A Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium ProblemsA Family Of Extragradient Methods For Solving Equilibrium Problems
A Family Of Extragradient Methods For Solving Equilibrium Problems
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
 
Fixed point theorem in fuzzy metric space with e.a property
Fixed point theorem in fuzzy metric space with e.a propertyFixed point theorem in fuzzy metric space with e.a property
Fixed point theorem in fuzzy metric space with e.a property
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Coverage of Credible I...
 

Plus de The Statistical and Applied Mathematical Sciences Institute

Plus de The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Dernier

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 

Dernier (20)

BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Sequential Inference via Low-dimensional Couplings - Youssef Marzouk, Dec 14, 2017

  • 1. Sequential inference via low-dimensional couplings Youssef Marzouk joint work with Alessio Spantini and Daniele Bigoni Department of Aeronautics and Astronautics Center for Computational Engineering Statistics and Data Science Center Massachusetts Institute of Technology http://uqgroup.mit.edu Support from DOE ASCR, NSF, DARPA 14 December 2017 Spantini, Bigoni, & Marzouk MIT 1 / 37
  • 2. Bayesian inference in large-scale models Observations y Parameters x πpos(x) := π(x|y) ∝ π(y|x)πpr(x) Bayes’ rule Need to characterize the posterior distribution (density πpos) This is a challenging task since: x ∈ Rn is typically high-dimensional (e.g., a discretized function) πpos is non-Gaussian evaluations of πpos may be expensive πpos can be evaluated up to a normalizing constant Spantini, Bigoni, & Marzouk MIT 2 / 37
  • 3. Sequential Bayesian inference State estimation (e.g., filtering and smoothing) or joint state and parameter estimation, in a Bayesian setting Need recursive, online algorithms for characterizing the posterior distribution Spantini, Bigoni, & Marzouk MIT 3 / 37
  • 4. Bayesian filtering and smoothing Nonlinear/non-Gaussian state-space model: Transition density πZk |Zk−1 Observation density (likelihood) πYk |Zk Z0 Z1 Z2 Z3 ZN Y0 Y1 Y2 Y3 YN Interested in recursively updating the full Bayesian solution: πZ0:k | y0:k → πZ0:k+1 | y0:k+1 (smoothing) Or focus on approximating the filtering distribution: πZk | y0:k → πZk+1 | y0:k+1 (marginals of the full Bayesian solution) Spantini, Bigoni, & Marzouk MIT 4 / 37
  • 5. Joint state and parameter inference Nonlinear/non-Gaussian state-space model with static params Θ: Transition density πZk |Zk−1,Θ Observation density (likelihood) πYk |Zk Z0 Z1 Z2 Z3 ZN Y0 Y1 Y2 Y3 YN Θ Again, interested in recursively updating the full Bayesian solution: πZ0:k ,Θ | y0:k → πZ0:k+1,Θ | y0:k+1 (smoothing + sequential parameter inference) Spantini, Bigoni, & Marzouk MIT 5 / 37
  • 6. Joint state and parameter inference Nonlinear/non-Gaussian state-space model with static params Θ: Transition density πZk |Zk−1,Θ Observation density (likelihood) πYk |Zk Z0 Z1 Z2 Z3 ZN Y0 Y1 Y2 Y3 YN Θ Again, interested in recursively updating the full Bayesian solution: πZ0:k ,Θ | y0:k → πZ0:k+1,Θ | y0:k+1 (smoothing + sequential parameter inference) Relate these goals to notions of coupling and transport. . . Spantini, Bigoni, & Marzouk MIT 5 / 37
  • 7. Deterministic couplings of probability measures ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) ˜ ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) T η π Core idea Choose a reference distribution η (e.g., standard Gaussian) Seek a transport map T : Rn → Rn such that T η = π Spantini, Bigoni, & Marzouk MIT 6 / 37
  • 8. Deterministic couplings of probability measures ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) ˜ ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) S =T−1 η π Core idea Choose a reference distribution η (e.g., standard Gaussian) Seek a transport map T : Rn → Rn such that T η = π Equivalently, find S = T−1 such that S π = η Spantini, Bigoni, & Marzouk MIT 6 / 37
  • 9. Deterministic couplings of probability measures ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) ˜ ⇡(✓) p(r) ˜⇡(r) T(✓) ˜T(✓) S =T−1 η π Core idea Choose a reference distribution η (e.g., standard Gaussian) Seek a transport map T : Rn → Rn such that T η = π Equivalently, find S = T−1 such that S π = η Might be content satisfying these conditions only approximately. . . Spantini, Bigoni, & Marzouk MIT 6 / 37
  • 10. Choice of transport map A useful building block is the Knothe-Rosenblatt rearrangement: T(x) =      T1(x1) T2(x1, x2) ... Tn(x1, x2, . . . , xn)      Exists and is unique (up to ordering) under mild conditions on η, π Jacobian determinant easy to evaluate Inverse map S = T−1 also lower triangular “Exposes” marginals, enables conditional sampling. . . Spantini, Bigoni, & Marzouk MIT 7 / 37
  • 11. Choice of transport map A useful building block is the Knothe-Rosenblatt rearrangement: T(x) =      T1(x1) T2(x1, x2) ... Tn(x1, x2, . . . , xn)      Exists and is unique (up to ordering) under mild conditions on η, π Jacobian determinant easy to evaluate Inverse map S = T−1 also lower triangular “Exposes” marginals, enables conditional sampling. . . Numerical approximations can employ a monotone parameterization, guaranteeing ∂xk Tk > 0 for arbitrary ak , bk Tk (x1, . . . , xk ) = ak (x1, . . . , xk−1)+ xk 0 exp(bk (x1, . . . , xk−1, w)) dw Spantini, Bigoni, & Marzouk MIT 7 / 37
  • 12. Variational inference Variational characterization of the direct map T [Moselhy & M 2012]: min T∈T DKL( T η || π ) = min T∈T DKL( η || T−1 π ) T is the set of monotone lower triangular maps Contains the Knothe-Rosenblatt rearrangement Expectation is with respect to the reference measure η Compute via, e.g., Monte Carlo, QMC, quadrature Use unnormalized evaluations of π and its gradients No MCMC or importance sampling Spantini, Bigoni, & Marzouk MIT 8 / 37
  • 13. Simple example min T Eη[ − log π ◦ T − k log ∂xk Tk ] Parameterized map T ∈ T h ⊂ T Optimize over coefficients of functions (ak )n k=1, (bk )n k=1 Use gradient-based optimization Approximate Eη[g] ≈ i wi g(xi ) The posterior is in the tail of the reference 3 0 3 3 0 3 6 9 12 15 18 Spantini, Bigoni, & Marzouk MIT 9 / 37
  • 14. Simple example min T Eη[ − log π ◦ T − k log ∂xk Tk ] Parameterized map T ∈ T h ⊂ T Optimize over coefficients of functions (ak )n k=1, (bk )n k=1 Use gradient-based optimization Approximate Eη[g] ≈ i wi g(xi ) The posterior is in the tail of the reference 3 0 3 3 0 3 6 9 12 15 18 Spantini, Bigoni, & Marzouk MIT 9 / 37
  • 15. Simple example min T Eη[ − log π ◦ T − k log ∂xk Tk ] Parameterized map T ∈ T h ⊂ T Optimize over coefficients of functions (ak )n k=1, (bk )n k=1 Use gradient-based optimization Approximate Eη[g] ≈ i wi g(xi ) The posterior is in the tail of the reference 3 0 3 3 0 3 6 9 12 15 18 Spantini, Bigoni, & Marzouk MIT 9 / 37
  • 16. Simple example min T Eη[ − log π ◦ T − k log ∂xk Tk ] Parameterized map T ∈ T h ⊂ T Optimize over coefficients of functions (ak )n k=1, (bk )n k=1 Use gradient-based optimization Approximate Eη[g] ≈ i wi g(xi ) The posterior is in the tail of the reference 3 0 3 3 0 3 6 9 12 15 18 Spantini, Bigoni, & Marzouk MIT 9 / 37
  • 17. Simple example min T Eη[ − log π ◦ T − k log ∂xk Tk ] Parameterized map T ∈ T h ⊂ T Optimize over coefficients of functions (ak )n k=1, (bk )n k=1 Use gradient-based optimization Approximate Eη[g] ≈ i wi g(xi ) The posterior is in the tail of the reference 3 0 3 3 0 3 6 9 12 15 18 Spantini, Bigoni, & Marzouk MIT 9 / 37
  • 18. Useful features Move samples; don’t just reweigh them Independent and cheap samples: xi ∼ η ⇒ T(xi ) Clear convergence criterion, even with unnormalized target density: DKL( T η || π ) ≈ 1 2 Varη log η T−1 ¯π Spantini, Bigoni, & Marzouk MIT 10 / 37
  • 19. Useful features Move samples; don’t just reweigh them Independent and cheap samples: xi ∼ η ⇒ T(xi ) Clear convergence criterion, even with unnormalized target density: DKL( T η || π ) ≈ 1 2 Varη log η T−1 ¯π Can either accept bias or reduce it by: Increasing the complexity of the map T Sampling the pullback T−1 π using MCMC [Parno & M 2015] or importance sampling Spantini, Bigoni, & Marzouk MIT 10 / 37
  • 20. Useful features Move samples; don’t just reweigh them Independent and cheap samples: xi ∼ η ⇒ T(xi ) Clear convergence criterion, even with unnormalized target density: DKL( T η || π ) ≈ 1 2 Varη log η T−1 ¯π Can either accept bias or reduce it by: Increasing the complexity of the map T Sampling the pullback T−1 π using MCMC [Parno & M 2015] or importance sampling The KR parameterization is just one of many variational constructions for continuous transport (cf. Stein variational gradient descent [Liu & Wang 2016], normalizing flows [Rezende & Mohamed 2015], Gibbs flows [Heng et al. 2015], particle flow filter [Reich 2011], implicit sampling [Chorin et al. 2009–2015]) Spantini, Bigoni, & Marzouk MIT 10 / 37
  • 21. Low-dimensional structure Key challenge: maps in high dimensions Major bottleneck: representation of the map, e.g., cardinality of the map basis How to make the construction/representation of high-dimensional transports tractable? Spantini, Bigoni, & Marzouk MIT 11 / 37
  • 22. Low-dimensional structure Key challenge: maps in high dimensions Major bottleneck: representation of the map, e.g., cardinality of the map basis How to make the construction/representation of high-dimensional transports tractable? Main idea: exploit Markov structure of the target distribution Leads to various low-dimensional properties of transport maps: 1 Decomposability 2 Sparsity 3 Low rank Spantini, Bigoni, & Marzouk MIT 11 / 37
  • 23. Markov random fields Let Z1, . . . , Zn be random variables with joint density π > 0 A BS(i, j) /∈ E iff Zi ⊥⊥ Zj | ZV{i,j} G encodes conditional independence (an I-map for π) Choice of the probabilistic model =⇒ graphical structure Spantini, Bigoni, & Marzouk MIT 12 / 37
  • 24. Decomposable transport maps Definition: a decomposable transport is a map T = T1 ◦ · · · ◦ Tk that factorizes as the composition of finitely many maps of low effective dimension that are triangular (up to a permutation), e.g., T(x) =         A1(x1, x2, x3) B1(x2, x3) C1(x3) x4 x5 x6         T1 ◦         x1 A2(x2, x3, x4, x5) B2(x3, x4, x5) C2(x4, x5) D2(x5) x6         T2 ◦         x1 x2 x3 A3(x4) B3(x4, x5) C3(x4, x5, x6)         T3 Theorem: [Spantini et al. (2017)] Decomposable graphical models for π lead to decomposable direct maps T, provided that η(x) = i η(xi ) Spantini, Bigoni, & Marzouk MIT 13 / 37
• 25. Graph decomposition (generic example). [Figure: graph partitioned into node sets A, S, B] For a given decomposition (A, S, B), consider the transport map
T_1(x) = ( T_{S∪A}(x_{S∪A}), x_B ) = ( h(x_S, x_A), g(x_S), x_B ),
such that the submap T_{S∪A} pushes η_{X_{S∪A}} forward to π_{X_{S∪A}}. What can we say about the pullback density T_1^♯ π?
• 26. Graph decomposition (generic example). [Figure: I-map for the pullback of π through T_1] Just remove any edge incident to a node in A. T_1 is essentially a 3-D map. Pulling back π through T_1 makes X_A independent of X_{S∪B}!
• 27. Graph decomposition (generic example). [Figure: I-map for the pullback of π through T_1] Recursion at step k: (1) consider a new decomposition (A, S, B); (2) compute the transport T_k; (3) pull back by T_k.
• 28. Graph decomposition (generic example). [Figure: I-map for the pullback of π through T_1 ∘ T_2] Recursion at step k: (1) consider a new decomposition (A, S, B); (2) compute the transport T_k; (3) pull back by T_k.
• 29. Graph decomposition (generic example). [Figure: (left) original graph; (right) I-map for the pullback of π through T] T = T_1 ∘ T_2 ∘ T_3, with the triangular factors T_1, T_2, T_3 as on slide 24.
• 30. Application to state-space models (chain graph). [Figure: hidden Markov chain Z_0, …, Z_N with observations Y_0, …, Y_N and reference variables X_0, …, X_N] Compute M_0 : R^{2n} → R^{2n} s.t. M_0(x_0, x_1) = ( A_0(x_0, x_1), B_0(x_1) ). Reference: η_{X_0} ⊗ η_{X_1}. Target: π_{Z_0} π_{Z_1|Z_0} π_{Y_0|Z_0} π_{Y_1|Z_1}. Note dim(M_0) = 2 × dim(Z_0). Embedded map:
T_0(x) = ( A_0(x_0, x_1), B_0(x_1), x_2, x_3, x_4, x_5, …, x_N ).
• 31. Second step: compute another 2-D map. Compute M_1 : R^{2n} → R^{2n} s.t. M_1(x_1, x_2) = ( A_1(x_1, x_2), B_1(x_2) ). Reference: η_{X_1} ⊗ η_{X_2}. Target: η_{X_1}(x_1) π_{Y_2|Z_2}(y_2 | x_2) π_{Z_2|Z_1}( x_2 | B_0(x_1) ); this uses only one component, B_0, of M_0. Embedded map:
T_1(x) = ( x_0, A_1(x_1, x_2), B_1(x_2), x_3, x_4, x_5, …, x_N ).
• 32. Proceed recursively forward in time. Compute M_2 : R^{2n} → R^{2n} s.t. M_2(x_2, x_3) = ( A_2(x_2, x_3), B_2(x_3) ). Reference: η_{X_2} ⊗ η_{X_3}. Target: η_{X_2}(x_2) π_{Y_3|Z_3}(y_3 | x_3) π_{Z_3|Z_2}( x_3 | B_1(x_2) ); this uses only one component, B_1, of M_1. Embedded map:
T_2(x) = ( x_0, x_1, A_2(x_2, x_3), B_2(x_3), x_4, x_5, …, x_N ).
• 33. A decomposition theorem for chains. Theorem:
1. (B_k)♯ η_{X_{k+1}} = π_{Z_{k+1} | Y_{0:k+1}} (filtering)
2. (M_k)♯ η_{X_{k:k+1}} = π_{Z_k, Z_{k+1} | Y_{0:k+1}} (lag-1 smoothing)
3. (T_0 ∘ · · · ∘ T_k)♯ η_{X_{0:k+1}} = π_{Z_{0:k+1} | Y_{0:k+1}} (full Bayesian solution)
• 34. A nested decomposable map. The composition 𝒯_k = T_0 ∘ T_1 ∘ · · · ∘ T_k characterizes the joint distribution π_{Z_{0:k+1} | Y_{0:k+1}}:
𝒯_{k+1}(x) = ( A_0(x_0, x_1), B_0(x_1), x_2, …, x_N ) ∘ ( x_0, A_1(x_1, x_2), B_1(x_2), x_3, …, x_N ) ∘ ( x_0, x_1, A_2(x_2, x_3), B_2(x_3), x_4, …, x_N ) ∘ · · ·
Trivial to go from 𝒯_k to 𝒯_{k+1}: just append a new map T_{k+1}; no need to recompute T_0, . . . , T_k (nested transports). 𝒯_k is dense and high-dimensional, but decomposable.
• 36. A single-pass algorithm on the model. Meta-algorithm:
1. Compute the maps M_0, M_1, . . ., each of dimension 2 × dim(Z_0).
2. Embed each M_j into an identity map to form T_j.
3. Evaluate T_0 ∘ · · · ∘ T_k for the full Bayesian solution.
Remarks: a single pass over the state-space model; a non-Gaussian generalization of the Rauch-Tung-Striebel smoother; bias is due only to the numerical approximation of each map M_i. Can either accept the bias or reduce it by increasing the complexity of each map M_i, or by computing importance weights under the proposal density (T_0 ∘ T_1 ∘ · · · ∘ T_k)♯ η_{X_{0:k+1}}; the cost of evaluating smoothing weights grows linearly with time. (A toy end-to-end sketch of the meta-algorithm follows.)
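Here is a toy end-to-end sketch of the meta-algorithm on a scalar linear-Gaussian state-space model, with linear monotone 2-D maps fitted by the sample-average KL objective. The model, map parameterization, optimizer, and data are all illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

# Toy scalar model: Z_{k+1} = 0.9 Z_k + N(0, 0.5^2), Y_k = Z_k + N(0, 1), Z_0 ~ N(0, 1)
def log_trans(z1, z0): return -0.5 * ((z1 - 0.9 * z0) / 0.5)**2
def log_lik(yk, z):    return -0.5 * (yk - z)**2

rng = np.random.default_rng(1)
xq = rng.standard_normal((4000, 2))       # fixed 2-D reference samples

def M(theta, x):
    # M(x_k, x_{k+1}) = (A(x_k, x_{k+1}), B(x_{k+1})), linear and monotone
    a0, la1, a2, b0, lb1 = theta
    A = a0 + np.exp(la1) * x[:, 0] + a2 * x[:, 1]
    B = b0 + np.exp(lb1) * x[:, 1]
    return A, B

def fit(log_target2d):
    # Minimize the sample-average KL objective over the 2-D map coefficients
    def obj(theta):
        A, B = M(theta, xq)
        return np.mean(-log_target2d(A, B)) - theta[1] - theta[4]
    return minimize(obj, np.zeros(5), method="Nelder-Mead",
                    options={"maxiter": 5000}).x

y = [0.5, 1.2, -0.3, 0.8]                 # synthetic observations y_0, ..., y_3

# Step 0: target ~ pi(z0) pi(y0|z0) pi(z1|z0) pi(y1|z1)
maps = [fit(lambda A, B: -0.5 * A**2 + log_lik(y[0], A)
                         + log_trans(B, A) + log_lik(y[1], B))]
# Steps k >= 1: target ~ eta(x_k) pi(y_{k+1}|x_{k+1}) pi(x_{k+1}|B_{k-1}(x_k))
for k in range(1, len(y) - 1):
    Bprev = lambda u, t=maps[-1]: t[3] + np.exp(t[4]) * u
    maps.append(fit(lambda A, B: -0.5 * A**2 + log_lik(y[k + 1], B)
                                 + log_trans(B, Bprev(A))))

_, Bk = M(maps[-1], xq)                   # B_k pushes eta to the filtering marginal
print("filtering mean/std at final time:", Bk.mean(), Bk.std())

For this linear-Gaussian toy, linear maps are exact up to Monte Carlo and optimization error, so the filtering moments printed above can be checked against a Kalman filter.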
• 37. Joint parameter/state estimation. Generalize to sequential joint parameter/state estimation:
(T_0 ∘ · · · ∘ T_k)♯ ( η_Θ ⊗ η_{X_{0:k+1}} ) = π_{Θ, Z_{0:k+1} | Y_{0:k+1}} (full Bayesian solution),
where now dim(M_j) = 2 × dim(Z_j) + dim(Θ). Remarks: no artificial dynamics for the static parameters; no a priori fixed-lag smoothing approximation.
• 38. Example: stochastic volatility model. Build the decomposition recursively; start with the identity map, T = Id. [Figure: Markov structure for the pullback of π through T, a chain Z_1, …, Z_N with hyperparameters µ and φ]
• 39. Stochastic volatility model. T = Id. [Figure: Markov structure for the pullback of π through T] Find a good first decomposition (A, S, B) of G.
• 40. Stochastic volatility model. T = T_0 ∘ Id. Compute an (essentially) 4-D map T_0 and pull back π. [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_1 | Y_1.
• 41. Stochastic volatility model. T = T_0 ∘ Id. Find a new decomposition. [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_1 | Y_1.
• 42. Stochastic volatility model. T = T_0 ∘ T_1 ∘ Id. Compute an (essentially) 4-D map T_1 and pull back π. [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_{1:2} | Y_{1:2}.
• 43. Stochastic volatility model. T = T_0 ∘ T_1 ∘ Id. Continue the recursion until no edges are left. . . [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_{1:2} | Y_{1:2}.
• 44. Stochastic volatility model. T = T_0 ∘ T_1 ∘ T_2 ∘ Id. Continue the recursion until no edges are left. . . [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_{1:3} | Y_{1:3}.
• 45. Stochastic volatility model. T = T_0 ∘ T_1 ∘ T_2 ∘ · · · ∘ T_{N−3} ∘ Id. Continue the recursion until no edges are left. . . [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_{1:N−1} | Y_{1:N−1}.
• 46. Stochastic volatility model. T = T_0 ∘ T_1 ∘ T_2 ∘ · · · ∘ T_{N−3} ∘ T_{N−2} ∘ Id. Each map T_k is essentially 4-D, regardless of N. [Figure: Markov structure for the pullback of π through T] Underlying approximation of µ, φ, Z_{1:N} | Y_{1:N}.
• 47. Another decomposable map. With static parameters, the factors become
T_0(x) = ( P_0(x_θ), A_0(x_θ, x_0, x_1), B_0(x_θ, x_1), x_2, x_3, …, x_N ),
T_1(x) = ( P_1(x_θ), x_0, A_1(x_θ, x_1, x_2), B_1(x_θ, x_2), x_3, …, x_N ),
T_2(x) = ( P_2(x_θ), x_0, x_1, A_2(x_θ, x_2, x_3), B_2(x_θ, x_3), x_4, …, x_N ), . . .
composed as 𝒯_{k+1} = T_0 ∘ T_1 ∘ T_2 ∘ · · ·. Then
(P_0 ∘ · · · ∘ P_k)♯ η_Θ = π_{Θ | Y_{0:k+1}} (parameter inference).
If 𝒫_k = P_0 ∘ · · · ∘ P_k, then 𝒫_k can be computed recursively as 𝒫_k = 𝒫_{k−1} ∘ P_k, so the cost of evaluating 𝒫_k does not grow with k (a tiny illustration follows).
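For intuition about the constant-cost recursion 𝒫_k = 𝒫_{k−1} ∘ P_k, here is a tiny sketch for the special case of monotone affine maps on a scalar parameter, where the composition stays in the same family; in general one would re-approximate the composition in a fixed map parameterization to keep the evaluation cost bounded:

# Each map is represented as (c, d): u -> c + d * u, with d > 0
def compose_affine(outer, inner):
    # (outer o inner)(u) = c_out + d_out * (c_in + d_in * u)
    c_out, d_out = outer
    c_in, d_in = inner
    return (c_out + d_out * c_in, d_out * d_in)

Pcal = (0.0, 1.0)                        # running composition, initially the identity
for P_k in [(0.3, 0.9), (-0.1, 1.1), (0.05, 0.95)]:
    Pcal = compose_affine(Pcal, P_k)     # Pcal_k = Pcal_{k-1} o P_k
# Evaluating the composition is one affine map, no matter how many steps were taken
print(Pcal[0] + Pcal[1] * 0.7)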
• 48. Example: stochastic volatility model. The latent log-volatilities form an AR(1) process: for t = 1, . . . , N,
Z_{t+1} = µ + φ (Z_t − µ) + η_t, η_t ∼ N(0, 1), Z_1 ∼ N( 0, 1/(1 − φ²) ).
We observe the mean return for holding an asset at time t:
Y_t = ε_t exp( 0.5 Z_t ), ε_t ∼ N(0, 1), t = 1, . . . , N.
The Markov structure for π ∼ (µ, φ, Z_{1:N} | Y_{1:N}) is: [Figure: chain Z_1, …, Z_N with hyperparameters µ and φ, and a decomposition into node sets A, S, B]
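A short simulation of this generative model (hyperparameter values chosen only for illustration):

import numpy as np

rng = np.random.default_rng(42)
N, mu, phi = 500, -1.0, 0.95                           # illustrative values
Z = np.empty(N)
Z[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2))    # Z_1 ~ N(0, 1/(1 - phi^2))
for t in range(N - 1):
    Z[t + 1] = mu + phi * (Z[t] - mu) + rng.normal()   # AR(1) log-volatility
Y = rng.normal(size=N) * np.exp(0.5 * Z)               # observed returns Y_t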
• 49. Stochastic volatility example. [Figure: Z_t versus time over 500 days] Infer the log-volatility of the pound/dollar exchange rate, starting on 1 October 1981. Filtering (blue) versus smoothing (red) marginals.
• 50. Stochastic volatility example. [Figure: Z_t | Y_{0:N} versus time] Quantiles of the smoothing marginals: map (red) compared to MCMC (black).
• 51. Stochastic volatility example. [Figure: posterior marginals of µ | Y_{0:t} for increasing t] Sequential parameter estimation: marginals of the hyperparameter µ, conditioning on successively more observations y_{0:k}. Map (solid lines) versus MCMC (dashed).
• 52. Stochastic volatility example. [Figure: posterior marginals of φ | Y_{0:t} for increasing t] Sequential parameter estimation: marginals of the hyperparameter φ, conditioning on successively more observations y_{0:k}. Map (solid lines) versus MCMC (dashed).
• 53. Stochastic volatility example. [Figure: Y_t | Y_{0:N} versus time, with data points] Posterior predictive distribution (shaded red) versus the data (dots).
• 54. Stochastic volatility example. [Figure: Z_t | Y_{0:N} versus time, days 4500–9000] Long-time smoothing: map (red) compared to MCMC (black). Observe the fall 2008 financial crisis, Brexit, . . .
• 55. Stochastic volatility example. Variance diagnostic: values of Var_η[ log( η / (T^{-1})♯π̄ ) ] for a 947-dimensional target π (smoothing and parameter estimation over 945 days): Laplace map = 5.68; linear maps = 1.49; maps of degree ≤ 7 = 0.11. Important open question: how does the error in the approximation of the parameter posterior evolve over time?
• 56. Beyond the Markov properties of π. Alternative source of low dimensionality: low-rank structure. Example: fix the target π to be the posterior density of a Bayesian inference problem, π(z) := π_pos(z) ∝ π_{Y|Z}(y | z) π_Z(z). Let T_pr push forward the reference η to the prior π_Z (prior map), and define the pullback
π̃_pos(z) := (T_pr^{-1})♯ π_pos(z) ∝ π_{Y|Z}( y | T_pr(z) ) η(z).
Theorem [graph decoupling]: if η = Π_i η_{X_i} and rank E_η[ ∇log R ⊗ ∇log R ] = k, with R = π̃_pos/η = π_{Y|Z} ∘ T_pr, then there exists a rotation Q such that
(Q^{-1})♯ π̃_pos(z) = g( z_1, ..., z_k ) Π_{i>k} η_{X_i}(z_i).
• 57. Changing the Markov structure. . . The pullback has a different Markov structure:
(Q^{-1})♯ π̃_pos(z) = g( z_1, ..., z_k ) Π_{i>k} η_{X_i}(z_i).
[Figure: I-map G before versus after the pullback] Corollary: there exists a transport T♯η = (Q^{-1})♯ π̃_pos of the form T(x) = ( g(x_{1:k}), x_{k+1}, . . . , x_n ), where g : R^k → R^k. The composition T_pr ∘ Q ∘ T pushes forward η to π_pos. Why low-rank structure? For example, few data-informed directions.
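A minimal sketch of how such a rotation could be identified in practice, assuming ∇log R can be evaluated at reference samples; the linear-Gaussian likelihood here is purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3                                   # parameter and data dimensions
G = rng.standard_normal((m, n)) / np.sqrt(n)   # forward operator (illustrative)
y = rng.standard_normal(m)                     # synthetic data

def grad_log_R(z):
    # Linear-Gaussian stand-in for the likelihood in reference coordinates:
    # log R(z) = -0.5 * ||y - G z||^2  =>  gradient = G^T (y - G z)
    return G.T @ (y - G @ z)

# Monte Carlo estimate of H = E_eta[ grad log R (grad log R)^T ]
Zs = rng.standard_normal((2000, n))
grads = np.array([grad_log_R(z) for z in Zs])
H = grads.T @ grads / len(Zs)

evals, evecs = np.linalg.eigh(H)
order = np.argsort(evals)[::-1]
k = int(np.sum(evals > 1e-8 * evals.max()))    # numerical rank (here k = m = 3)
Q = evecs[:, order]                            # rotation; first k columns are informed
print("numerical rank:", k)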
• 58. Log-Gaussian Cox process. [Figure: latent intensity field and observation locations on the spatial domain] 4096-D GMRF prior, Z ∼ N(µ, Γ), with Γ^{-1} specified through a discretized elliptic operator, −Δ + κ² Id. 30 sparse observations at locations i ∈ I, with Y_i | Z_i ∼ Pois( exp Z_i ). The posterior density Z | Y ∼ π_pos is
π_pos(z) ∝ Π_{i∈I} exp[ −exp(z_i) + z_i y_i ] · exp[ −(1/2) (z − µ)ᵀ Γ^{-1} (z − µ) ].
What is an independence map G for π_pos?
• 59. Log-Gaussian Cox process. [Figure: the I-map G, a lattice graph] What is an independence map G for π_pos? A 64 × 64 grid.
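A sketch of this model setup, building the lattice GMRF precision and the unnormalized log-posterior; the value of κ², the prior mean, and the synthetic data are illustrative assumptions:

import numpy as np
import scipy.sparse as sp

ngrid = 64
n = ngrid * ngrid                              # 4096-dimensional latent field
kappa2 = 0.1                                   # illustrative value of kappa^2
# Precision = (discrete negative Laplacian) + kappa^2 * I, 5-point stencil
L1d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(ngrid, ngrid))
I1d = sp.identity(ngrid)
Prec = sp.kron(I1d, L1d) + sp.kron(L1d, I1d) + kappa2 * sp.identity(n)

mu = np.full(n, 1.0)                           # prior mean (illustrative)
rng = np.random.default_rng(7)
obs_idx = rng.choice(n, size=30, replace=False)    # 30 sparse observation sites
y = rng.poisson(np.exp(mu[obs_idx]))               # synthetic Poisson counts

def log_posterior(z):
    # Unnormalized log pi_pos(z): Poisson log-likelihood plus GMRF prior term
    loglik = np.sum(-np.exp(z[obs_idx]) + z[obs_idx] * y)
    dz = z - mu
    return loglik - 0.5 * dz @ (Prec @ dz)

print(log_posterior(mu))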
• 60. Log-Gaussian Cox process. Fix π_ref ∼ N(0, I) and let T_pr push forward π_ref to π_pr (prior map). Consider the pullback π̃_pos = T_pr^♯ π_pos and find that
rank E_{π_ref}[ ∇log R ⊗ ∇log R ] = 30 ≪ n, with R = π̃_pos/π_ref.
Deflate the problem and compute a transport map in just 30 dimensions. The change from prior to posterior is concentrated on a low-dimensional subspace (likelihood-informed subspace [Cui, Law, M 2014]; active subspace [Constantine 2015]). [Figure: truth, posterior sample, and posterior mean fields]
• 61. Log-Gaussian Cox process. [Figure: (left) E[Z|y], (right) Var[Z|y]; (top) transport, (bottom) MCMC] Excellent match with the reference MCMC solution, on a problem of n = 4096 dimensions.
• 62. Conclusions. Bayesian inference through the construction of deterministic couplings. Computation of transport maps in high dimensions, leveraging the structure of the posterior:
1. Decomposability of direct transports: variational approaches for Bayesian filtering, smoothing, and sequential parameter inference.
2. Low-rank transports.
Ongoing work: error analysis of variational filtering schemes; transport-based generalizations of the ensemble Kalman filter; structure learning for continuous non-Gaussian Markov random fields [Morrison, Baptista, & M NIPS 2017]; mapping sparse quadrature or QMC schemes; approximate Markov properties. . .
• 63. References. Main reference for this talk:
A. Spantini, D. Bigoni, Y. Marzouk, "Inference via low-dimensional couplings," arXiv:1703.06131.
R. Morrison, R. Baptista, Y. Marzouk, "Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting," NIPS 2017.
Y. Marzouk, T. Moselhy, M. Parno, A. Spantini, "An introduction to sampling via measure transport," in Handbook of Uncertainty Quantification, R. Ghanem, D. Higdon, H. Owhadi, eds., Springer (2016); arXiv:1602.05023. (Broad introduction to transport maps.)
M. Parno, T. Moselhy, Y. Marzouk, "A multiscale strategy for Bayesian inference using transport maps," SIAM JUQ, 4: 1160–1190 (2016).
M. Parno, Y. Marzouk, "Transport map accelerated Markov chain Monte Carlo," arXiv:1412.5492; SIAM JUQ (to appear).
T. Moselhy, Y. Marzouk, "Bayesian inference with optimal maps," J. Comp. Phys., 231: 7815–7850 (2012).
Python code at http://transportmaps.mit.edu