QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Sequential Inference via Low-dimensional Couplings - Youssef Marzouk, Dec 14, 2017
1. Sequential inference via low-dimensional couplings
Youssef Marzouk
joint work with Alessio Spantini and Daniele Bigoni
Department of Aeronautics and Astronautics
Center for Computational Engineering
Statistics and Data Science Center
Massachusetts Institute of Technology
http://uqgroup.mit.edu
Support from DOE ASCR, NSF, DARPA
14 December 2017
2. Bayesian inference in large-scale models
Bayes' rule, with observations y and parameters x:
πpos(x) := π(x | y) ∝ π(y | x) πpr(x)
Need to characterize the posterior distribution (density πpos)
This is a challenging task since:
x ∈ Rn is typically high-dimensional (e.g., a discretized function)
πpos is non-Gaussian
evaluations of πpos may be expensive
πpos can be evaluated only up to a normalizing constant
3. Sequential Bayesian inference
State estimation (e.g., filtering and smoothing) or joint state and parameter estimation, in a Bayesian setting
Need recursive, online algorithms for characterizing the posterior distribution
4. Bayesian filtering and smoothing
Nonlinear/non-Gaussian state-space model:
Transition density πZk |Zk−1
Observation density (likelihood) πYk |Zk
[Figure: state-space graph with hidden states Z0, Z1, ..., ZN and observations Y0, Y1, ..., YN]
Interested in recursively updating the full Bayesian solution:
πZ0:k | y0:k → πZ0:k+1 | y0:k+1 (smoothing)
Or focus on approximating the filtering distribution:
πZk | y0:k → πZk+1 | y0:k+1 (marginals of the full Bayesian solution)
5. Joint state and parameter inference
Nonlinear/non-Gaussian state-space model with static parameters Θ:
Transition density πZk | Zk−1, Θ
Observation density (likelihood) πYk | Zk
[Figure: state-space graph with states Z0, ..., ZN, observations Y0, ..., YN, and a static parameter node Θ]
Again, interested in recursively updating the full Bayesian solution:
πZ0:k, Θ | y0:k → πZ0:k+1, Θ | y0:k+1 (smoothing + sequential parameter inference)
Relate these goals to notions of coupling and transport...
7. Deterministic couplings of probability measures
[Figure: reference density η mapped to target density π by a transport map T, and back by S = T−1]
Core idea
Choose a reference distribution η (e.g., standard Gaussian)
Seek a transport map T : Rn → Rn such that T♯η = π
Equivalently, find S = T−1 such that S♯π = η
Might be content satisfying these conditions only approximately...
10. Choice of transport map
A useful building block is the Knothe–Rosenblatt rearrangement:
T(x) = [ T1(x1), T2(x1, x2), ..., Tn(x1, x2, ..., xn) ]
Exists and is unique (up to ordering) under mild conditions on η, π
Jacobian determinant easy to evaluate
Inverse map S = T−1 is also lower triangular
“Exposes” marginals, enables conditional sampling...
Numerical approximations can employ a monotone parameterization, which guarantees ∂xk Tk > 0 for arbitrary ak, bk:
Tk(x1, ..., xk) = ak(x1, ..., xk−1) + ∫_0^{xk} exp( bk(x1, ..., xk−1, w) ) dw
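A minimal numerical sketch of one such monotone component (the basis functions ak, bk and the Gauss–Legendre quadrature are illustrative assumptions, not the authors' implementation at http://transportmaps.mit.edu):

```python
import numpy as np

def monotone_component(x, a_k, b_k, n_quad=32):
    """Evaluate T_k(x_1,...,x_k) = a_k(x_{1:k-1}) + int_0^{x_k} exp(b_k(x_{1:k-1}, w)) dw.

    x   : array of shape (k,), the inputs (x_1, ..., x_k)
    a_k : callable, x_{1:k-1} -> scalar
    b_k : callable, (x_{1:k-1}, w) -> scalar
    The integrand exp(b_k) > 0, so dT_k/dx_k > 0 for any choice of a_k, b_k.
    """
    x_prev, x_k = x[:-1], x[-1]
    # Gauss-Legendre nodes/weights, mapped from [-1, 1] to [0, x_k]
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    w = 0.5 * x_k * (nodes + 1.0)
    integral = 0.5 * x_k * np.sum(weights * np.exp([b_k(x_prev, wi) for wi in w]))
    return a_k(x_prev) + integral

# Example with simple (hypothetical) linear parameterizations:
a = lambda xp: 0.1 * np.sum(xp)
b = lambda xp, w: 0.05 * np.sum(xp) + 0.2 * w
print(monotone_component(np.array([0.3, -1.2, 0.7]), a, b))
```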
12. Variational inference
Variational characterization of the direct map T [Moselhy & M 2012]:
min_{T∈𝒯} DKL( T♯η || π ) = min_{T∈𝒯} DKL( η || (T−1)♯π )
𝒯 is the set of monotone lower triangular maps
Contains the Knothe–Rosenblatt rearrangement
Expectation is with respect to the reference measure η
Compute via, e.g., Monte Carlo, QMC, quadrature
Use unnormalized evaluations of π and its gradients
No MCMC or importance sampling
13. Simple example
min_T Eη[ −log π ∘ T − Σk log ∂xk Tk ]
Parameterized map T ∈ 𝒯h ⊂ 𝒯
Optimize over the coefficients of the functions a1, ..., an and b1, ..., bn
Use gradient-based optimization
Approximate Eη[g] ≈ Σi wi g(xi)
The posterior is in the tail of the reference
[Figure: 2-D illustration of reference samples transported onto a target lying in the tail of the reference]
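As a sketch, the sample-average approximation of this objective might look as follows (the callables for the map, its diagonal log-derivatives, and the unnormalized log-target are assumed interfaces):

```python
import numpy as np

def kl_objective(T_eval, log_dTk, log_pi_bar, x_ref, w=None):
    """Estimate E_eta[ -log pi(T(x)) - sum_k log d_{x_k} T_k(x) ].

    T_eval     : callable, reference samples (m, n) -> target samples (m, n)
    log_dTk    : callable, log partial_{x_k} T_k at each sample, shape (m, n)
    log_pi_bar : callable, unnormalized target log-density, shape (m,)
    x_ref      : (m, n) samples x_i ~ eta (or QMC / quadrature nodes)
    w          : optional quadrature weights (default: equal MC weights 1/m)
    """
    m = x_ref.shape[0]
    w = np.full(m, 1.0 / m) if w is None else w
    vals = -log_pi_bar(T_eval(x_ref)) - log_dTk(x_ref).sum(axis=1)
    return np.dot(w, vals)   # minimize over the map coefficients
```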
18. Useful features
Move samples; don't just reweigh them
Independent and cheap samples: xi ∼ η ⇒ T(xi)
Clear convergence criterion, even with an unnormalized target density:
DKL( T♯η || π ) ≈ (1/2) Varη[ log( η / (T−1)♯ π̄ ) ]
Can either accept the bias or reduce it by:
Increasing the complexity of the map T
Sampling the pullback (T−1)♯π using MCMC [Parno & M 2015] or importance sampling
The KR parameterization is just one of many variational constructions for continuous transport (cf. Stein variational gradient descent [Liu & Wang 2016], normalizing flows [Rezende & Mohamed 2015], Gibbs flows [Heng et al. 2015], the particle flow filter [Reich 2011], implicit sampling [Chorin et al. 2009–2015])
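A sketch of this convergence diagnostic under reference samples (assumed interface; log η is the reference log-density, and the pullback log-density combines the unnormalized target with the map's Jacobian):

```python
import numpy as np

def variance_diagnostic(log_eta, log_pullback_pi_bar, x_ref):
    """Estimate (1/2) Var_eta[ log eta(x) - log ((T^{-1})_# pi_bar)(x) ].

    log_pullback_pi_bar(x) = log pi_bar(T(x)) + sum_k log d_{x_k} T_k(x),
    the log-density of the (unnormalized) pullback of pi_bar through T.
    Small values indicate T_# eta ~ pi; the normalizing constant drops
    out of the variance, so unnormalized evaluations suffice.
    """
    diff = log_eta(x_ref) - log_pullback_pi_bar(x_ref)
    return 0.5 * np.var(diff)
```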
21. Low-dimensional structure
Key challenge: maps in high dimensions
Major bottleneck: representation of the map, e.g., cardinality of the map basis
How to make the construction/representation of high-dimensional transports tractable?
Main idea: exploit the Markov structure of the target distribution
Leads to various low-dimensional properties of transport maps:
1 Decomposability
2 Sparsity
3 Low rank
23. Markov random fields
Let Z1, ..., Zn be random variables with joint density π > 0
Undirected graph G = (V, E): (i, j) ∉ E iff Zi ⊥⊥ Zj | ZV∖{i,j}
G encodes conditional independence (an I-map for π)
Choice of the probabilistic model ⟹ graphical structure
24. Decomposable transport maps
Definition: a decomposable transport is a map T = T1 ∘ ··· ∘ Tk that factorizes as the composition of finitely many maps of low effective dimension that are triangular (up to a permutation), e.g.,
T(x) = [ A1(x1, x2, x3), B1(x2, x3), C1(x3), x4, x5, x6 ]   (T1)
     ∘ [ x1, A2(x2, x3, x4, x5), B2(x3, x4, x5), C2(x4, x5), D2(x5), x6 ]   (T2)
     ∘ [ x1, x2, x3, A3(x4), B3(x4, x5), C3(x4, x5, x6) ]   (T3)
Theorem [Spantini et al. (2017)]: decomposable graphical models for π lead to decomposable direct maps T, provided that η(x) = ∏i η(xi)
25. Graph decomposition (generic example)
[Figure: graph partitioned into node sets A, S, B, with S separating A from B]
For a given decomposition (A, S, B), consider the transport map
T1(x) = [ TS∪A(xS∪A), xB ] = [ h(xS, xA), g(xS), xB ]
such that the submap TS∪A satisfies (TS∪A)♯ ηXS∪A = πXS∪A
What can we say about the pullback of π through T1?
26. Graph decomposition (generic example)
[Figure: I-map for the pullback of π through T1]
Just remove any edge incident to a node in A
T1 is essentially a 3-D map
Pulling back π through T1 makes XA independent of XS∪B!
27. Graph decomposition (generic example)
[Figures: I-maps for the pullback of π through T1 and through T1 ∘ T2]
Recursion at step k:
1 Consider a new decomposition (A, S, B)
2 Compute the transport Tk
3 Pull back by Tk
36. A single-pass algorithm on the model
Meta-algorithm:
1 Compute the maps M0, M1, ..., each of dimension 2 × dim(Z0)
2 Embed each Mj into an identity map to form Tj
3 Evaluate T0 ∘ ··· ∘ Tk for the full Bayesian solution
Remarks:
A single pass over the state-space model
A non-Gaussian generalization of the Rauch–Tung–Striebel smoother
Bias is due only to the numerical approximation of each map Mi
Can either accept the bias or reduce it by:
Increasing the complexity of each map Mi, or
Computing weights given by the proposal density (T0 ∘ T1 ∘ ··· ∘ Tk)♯ ηX0:k+1
The cost of evaluating smoothing weights grows linearly with time
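A minimal sketch of steps 2–3, assuming the low-dimensional maps Mj have already been computed (function names are assumptions; each Tj acts as Mj on one block of coordinates and as the identity elsewhere):

```python
import numpy as np

def embed_identity(M, start, dim_M, n):
    """Embed a low-dimensional map M acting on coordinates [start, start+dim_M)
    into an identity map on R^n, forming T_j (step 2 of the meta-algorithm)."""
    def T(x):
        y = x.copy()
        y[start:start + dim_M] = M(x[start:start + dim_M])
        return y
    return T

def compose(maps):
    """Evaluate T_0 o T_1 o ... o T_k at a point (step 3)."""
    def T(x):
        for Tj in reversed(maps):   # apply T_k first, then T_{k-1}, ..., T_0
            x = Tj(x)
        return x
    return T
```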
37. Joint parameter/state estimation
Generalize to sequential joint parameter/state estimation
[Figure: state-space graph with states Z0, ..., ZN, observations Y0, ..., YN, and a static parameter node Θ]
(T0 ∘ ··· ∘ Tk)♯ ( ηΘ ηX0:k+1 ) = πΘ, Z0:k+1 | Y0:k+1 (full Bayesian solution)
Now dim(Mj) = 2 × dim(Zj) + dim(Θ)
Remarks:
No artificial dynamics for the static parameters
No a priori fixed-lag smoothing approximation
38. Example: stochastic volatility model
Build the decomposition recursively, starting from the identity map T = Id:
[Figure: Markov structure for the pullback of π at each step — a chain of states 1, 2, 3, 4, ..., N with hyperparameter nodes µ and φ, with the current decomposition (A, S, B) highlighted]
1 Find a good first decomposition (A, S, B) of G
2 Compute an (essentially) 4-D map T0 and pull back π; underlying approximation of µ, φ, Z1 | Y1
3 Find a new decomposition, compute an (essentially) 4-D map T1, and pull back again; underlying approximation of µ, φ, Z1:2 | Y1:2
4 Continue the recursion until no edges are left: T = T0 ∘ T1 ∘ ··· ∘ TN−2; underlying approximation of µ, φ, Z1:N | Y1:N
Each map Tk is essentially 4-D, regardless of N
48. Example: stochastic volatility model
Latent log-volatilities form an AR(1) process for t = 1, ..., N:
Zt+1 = µ + φ (Zt − µ) + ηt, ηt ∼ N(0, 1), Z1 ∼ N(0, 1/(1 − φ²))
Observe the mean return for holding an asset at time t:
Yt = εt exp(0.5 Zt), εt ∼ N(0, 1), t = 1, ..., N
The Markov structure for π, the density of µ, φ, Z1:N | Y1:N, is:
[Figure: chain of states 1, 2, 3, 4, ..., N with hyperparameter nodes µ and φ connected to the states]
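A short simulation sketch of this state-space model (the hyperparameter values µ and φ below are hypothetical; priors on µ and φ are not specified on the slide and are omitted):

```python
import numpy as np

def simulate_sv(N, mu, phi, rng=None):
    """Simulate the stochastic volatility model on the slide:
    Z_{t+1} = mu + phi (Z_t - mu) + eta_t,  eta_t ~ N(0, 1),
    Z_1 ~ N(0, 1/(1 - phi^2)),  Y_t = eps_t exp(0.5 Z_t)."""
    rng = np.random.default_rng() if rng is None else rng
    Z = np.empty(N)
    Z[0] = rng.normal(0.0, np.sqrt(1.0 / (1.0 - phi**2)))
    for t in range(N - 1):
        Z[t + 1] = mu + phi * (Z[t] - mu) + rng.normal()
    Y = rng.normal(size=N) * np.exp(0.5 * Z)
    return Z, Y

Z, Y = simulate_sv(N=945, mu=-1.0, phi=0.95)  # hypothetical parameter values
```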
49. Stochastic volatility example
[Figure: Zt versus time (0–500)]
Infer the log-volatility of the pound/dollar exchange rate, starting on 1 October 1981
Filtering (blue) versus smoothing (red) marginals
50. Stochastic volatility example
[Figure: Zt | Y0:N versus time (0–800)]
Quantiles of smoothing marginals: map (red) compared to MCMC (black)
51. Stochastic volatility example
[Figure: marginals of µ | Y0:t versus time (0–800)]
Sequential parameter estimation: marginals of hyperparameter µ, conditioning on successively more observations y0:k
Map (solid lines) versus MCMC (dashed)
52. Stochastic volatility example
[Figure: marginals of φ | Y0:t versus time (0–800)]
Sequential parameter estimation: marginals of hyperparameter φ, conditioning on successively more observations y0:k
Map (solid lines) versus MCMC (dashed)
53. Stochastic volatility example
[Figure: Yt | Y0:N versus time (0–800)]
Posterior predictive distribution (shaded red) versus the data (dots)
54. Stochastic volatility example
[Figure: Zt | Y0:N versus time (4500–9000)]
Long-time smoothing: map (red) compared to MCMC (black)
Observe the fall 2008 financial crisis, Brexit, ...
55. Stochastic volatility example
Variance diagnostic Varη[ log( η / (T−1)♯ π̄ ) ] values, for a 947-dimensional target π (smoothing and parameter estimation for 945 days):
Laplace map: 5.68; linear maps: 1.49; degree ≤ 7 maps: 0.11
Important open question: how does error in the approximation of the parameter posterior evolve over time?
56. Beyond the Markov properties of π
Alternative source of low dimensionality: low-rank structure
Example: fix the target π to be the posterior density of a Bayesian inference problem,
π(z) := πpos(z) ∝ πY|Z(y | z) πZ(z)
Let Tpr push forward the reference η to the prior πZ (prior map)
Define π̃pos(z) := (Tpr−1)♯ πpos(z) ∝ πY|Z(y | Tpr(z)) η(z)
Theorem [Graph decoupling]
If η = ∏i ηXi and
rank Eη[ ∇log R ⊗ ∇log R ] = k, with R = π̃pos / η = πY|Z ∘ Tpr,
then there exists a rotation Q such that:
(Q−1)♯ π̃pos(z) = g( z1, ..., zk ) ∏i>k ηXi(zi)
57. Changing the Markov structure...
The pullback has a different Markov structure:
(Q−1)♯ π̃pos(z) = g( z1, ..., zk ) ∏i>k ηXi(zi)
[Figure: I-map G for π̃pos versus the I-map of its pullback, with the coordinates beyond the first k decoupled]
Corollary: there exists a transport T with T♯η = (Q−1)♯ π̃pos of the form T(x) = [ g(x1:k), xk+1, ..., xn ], where g : Rk → Rk
The composition Tpr ∘ Q ∘ T pushes forward η to πpos
Why low-rank structure? For example, few data-informed directions.
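One plausible numerical route to such a rotation, under assumptions (a Monte Carlo estimate of the matrix in the theorem, followed by an eigendecomposition; the interface and tolerance are illustrative, not the authors' implementation):

```python
import numpy as np

def decoupling_rotation(grad_log_R, x_ref, tol=1e-8):
    """Estimate H = E_eta[ grad log R(x) grad log R(x)^T ] by Monte Carlo and
    return a rotation Q whose leading columns span the data-informed subspace.

    grad_log_R : callable, (m, n) samples -> (m, n) gradients of log R
    x_ref      : (m, n) samples x_i ~ eta
    """
    G = grad_log_R(x_ref)                 # (m, n) gradient samples
    H = G.T @ G / x_ref.shape[0]          # Monte Carlo estimate of H
    eigvals, Q = np.linalg.eigh(H)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # reorder descending
    eigvals, Q = eigvals[order], Q[:, order]
    k = int(np.sum(eigvals > tol * eigvals[0]))   # numerical rank
    return Q, k                           # informed directions: Q[:, :k]
```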
58. Log-Gaussian Cox process
[Figure: latent intensity field over the spatial domain, with sparse observation locations]
4096-D GMRF prior, Z ∼ N(µ, Γ), with Γ−1 specified through −∆ + κ² Id
30 sparse observations at locations i ∈ I, Yi | Zi ∼ Pois( exp Zi )
The posterior density Z | Y ∼ πpos is:
πpos(z) ∝ ∏i∈I exp[ −exp(zi) + zi yi ] · exp[ −(1/2) (z − µ)⊤ Γ−1 (z − µ) ]
What is an independence map G for πpos? A 64 × 64 grid.
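A direct sketch of evaluating this unnormalized log-posterior (the variable names and the assembled precision matrix Gamma_inv are assumptions):

```python
import numpy as np

def log_posterior(z, y, obs_idx, mu, Gamma_inv):
    """Unnormalized log-density of the log-Gaussian Cox posterior:
    sum over observed cells of [-exp(z_i) + z_i y_i], plus the GMRF prior
    term -(1/2) (z - mu)^T Gamma_inv (z - mu)."""
    zi = z[obs_idx]                       # latent field at the 30 observed cells
    log_lik = np.sum(-np.exp(zi) + zi * y)
    r = z - mu
    log_prior = -0.5 * r @ (Gamma_inv @ r)
    return log_lik + log_prior
```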
60. Log-Gaussian Cox process
Fix πref ∼ N(0, I) and let Tpr push forward πref to πpr (prior map)
Consider the pullback π̃pos = (Tpr−1)♯ πpos and find that
rank Eπref[ ∇log R ⊗ ∇log R ] = 30 ≪ n, with R = π̃pos / πref
Deflate the problem and compute a transport map in just 30 dimensions
The change from prior to posterior is concentrated in a low-dimensional subspace (likelihood-informed subspace [Cui, Law, M 2014]; active subspace [Constantine 2015])
[Figure: truth, posterior sample, and posterior mean of the latent field]
62. Conclusions
Bayesian inference through the construction of deterministic couplings
Computation of transport maps in high dimensions, leveraging the structure of the posterior:
1 Decomposability of direct transports: variational approaches for Bayesian filtering, smoothing, and sequential parameter inference
2 Low-rank transports
Ongoing work:
Error analysis of variational filtering schemes
Transport-based generalizations of the ensemble Kalman filter
Structure learning for continuous non-Gaussian Markov random fields [Morrison, Baptista, & M NIPS 2017]
Mapping sparse quadrature or QMC schemes
Approximate Markov properties...
63. References
Main reference for this talk: A. Spantini, D. Bigoni, Y. Marzouk. “Inference via low-dimensional couplings.” arXiv:1703.06131.
R. Morrison, R. Baptista, Y. Marzouk. “Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting.” NIPS 2017.
Y. Marzouk, T. Moselhy, M. Parno, A. Spantini. “An introduction to sampling via measure transport.” Handbook of Uncertainty Quantification, R. Ghanem, D. Higdon, H. Owhadi, eds. Springer (2016). arXiv:1602.05023. (Broad introduction to transport maps.)
M. Parno, T. Moselhy, Y. Marzouk. “A multiscale strategy for Bayesian inference using transport maps.” SIAM/ASA JUQ, 4: 1160–1190 (2016).
M. Parno, Y. Marzouk. “Transport map accelerated Markov chain Monte Carlo.” arXiv:1412.5492. SIAM/ASA JUQ (to appear).
T. Moselhy, Y. Marzouk. “Bayesian inference with optimal maps.” J. Comp. Phys., 231: 7815–7850 (2012).
Python code at http://transportmaps.mit.edu