Hierarchical matrix techniques for maximum likelihood covariance estimation

1. We apply hierarchical (H-) matrix techniques (HLIB, HLIBpro) to approximate huge covariance matrices. We are able to work with 250K-350K non-regular grid nodes.
2. We maximize a non-linear, non-convex Gaussian log-likelihood function to identify the hyper-parameters of the covariance.
1. Hierarchical matrix techniques for maximum likelihood covariance estimation
Alexander Litvinenko, Extreme Computing Research Center and Uncertainty Quantification Center, KAUST
(joint work with M. Genton, Y. Sun and D. Keyes)
http://sri-uq.kaust.edu.sa/
2. The structure of the talk
1. Motivation
2. Hierarchical matrices [Hackbusch 1999]
3. Matérn covariance function
4. Uncertain parameters of the covariance function:
4.1 Uncertain covariance length ℓ
4.2 Uncertain smoothness parameter ν
5. Identification of these parameters by maximizing the log-likelihood
3. Motivation, problem 1
Task: predict temperature, velocity and salinity; estimate the parameters of the covariance.
Grid: 50M locations on 50 levels, 4·(X·Y·Z) + X·Y = 4·500·500·50 + 500·500 ≈ 50M.
High-resolution time-dependent data for the Red Sea: zonal velocity and temperature.
4. Motivation, problem 2
Task: predict moisture, compute the covariance, estimate its parameters.
Grid: 1830 × 1329 = 2,432,070 locations, with 2,153,888 observations and 278,182 missing values.
[Figure: Soil moisture over longitude −120 to −70 and latitude 25 to 50, values 0.15 to 0.50.]
High-resolution daily soil moisture data at the top layer of the Mississippi basin, U.S.A., 01.01.2014 (Chaney et al., in review).
Important for agriculture and defense; moisture is very heterogeneous.
5. Motivation, estimation of uncertain parameters
[Figure: Box-plots of the estimated covariance length (values 0.02 to 0.06) for the truth ℓ = 0.0334 (domain [0, 1]²) vs. different H-matrix ranks k = {3, 7, 9}.]
Which H-matrix rank is sufficient to identify the parameters of a particular type of covariance matrix?
6. Motivation for H-matrices
A general dense matrix requires O(n²) storage and up to O(n³) arithmetic; this is very expensive.
If the covariance matrix is structured (diagonal, Toeplitz, circulant), we can apply e.g. the FFT with O(n log n) complexity. But what if it is not?
7. H-matrix storage and complexity (p processors, shared memory)

Operation    | Sequential complexity | Parallel complexity (R. Kriemann 2005)
building(M)  | N = O(n log n)        | N/p + O(|V(T)\L(T)|)
storage(M)   | N = O(k n log n)      | N
Mx           | N = O(k n log n)      | N/p + n/√p
αM ⊕ βM      | N = O(k² n log n)     | N/p
αM ⊙ M ⊕ βM  | N = O(k² n log² n)    | N/p + O(Csp(T)|V(T)|)
M⁻¹          | N = O(k² n log² n)    | N/p + O(n n²_min)
H-LU         | N = O(k² n log² n)    | N/p + O(k² n log² n / n^(1/d))
10. Low-rank (rank-k) matrices
How do we compute the green blocks?
M ∈ R^(n×m), U ≈ Ũ ∈ R^(n×k), V ≈ Ṽ ∈ R^(m×k), k ≪ min(n, m). The storage of M̃ = Ũ Σ̃ Ṽ^T is k(n + m) instead of n · m for M represented in the full matrix format.
[Figure: Reduced SVD M̃ = Ũ Σ̃ Ṽ^T; only the k biggest singular values are kept.]
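Below is a minimal NumPy sketch of how one such green (low-rank) block can be computed; the reduced SVD here illustrates the idea rather than the HLIB routine, and the kernel, block sizes and rank are assumptions made for the example.

```python
import numpy as np

def rank_k_approx(M, k):
    """Best rank-k approximation of M via a reduced SVD (k largest singular values)."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :]

rng = np.random.default_rng(0)
n, m, k = 1000, 800, 10
x, y = rng.random(n), rng.random(m)
M = np.exp(-np.abs(x[:, None] - y[None, :]))   # smooth kernel block -> low rank

Uk, Sk, Vtk = rank_k_approx(M, k)
M_k = (Uk * Sk) @ Vtk                          # M~ = U~ S~ V~^T
print("rel. spectral error:", np.linalg.norm(M - M_k, 2) / np.linalg.norm(M, 2))
print("storage: full n*m =", n * m, " vs  k(n+m) =", k * (n + m))
```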
11. H-matrices (Hackbusch '99), main steps
1. Build the cluster tree T_I and the block cluster tree T_{I×I}.
[Figure: recursive splitting of the index set I into clusters I1, I2 and further into I11, I12, I21, I22, together with the induced block partition.]
12. 4*
H - Matrices
Let h = 2
i=1 h2
i / 2
i , where hi := xi − yi , i are cov. lengths and
d = 1.
exponential cov(h) = σ2 · exp(−h),
The cov. matrix C ∈ Rn×n, n = 652.
1 2
C−CH 2
C 2
0.01 0.02 3e − 2
0.1 0.2 8e − 3
1 2 2.8e − 6
10 20 3.7e − 9
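As a hedged illustration of this experiment (the 65 × 65 grid matches n = 65², but the off-diagonal block choice and the plain NumPy arithmetic are my stand-ins for the H-matrix machinery), one can assemble C from the scaled distance h and watch the singular values of a far off-diagonal block decay:

```python
import numpy as np

def exp_cov(X, Y, ell, sigma2=1.0):
    """cov(h) = sigma^2 exp(-h),  h = sqrt(sum_i (x_i - y_i)^2 / ell_i^2)."""
    h2 = sum(((X[:, i, None] - Y[None, :, i]) / ell[i]) ** 2
             for i in range(X.shape[1]))
    return sigma2 * np.exp(-np.sqrt(h2))

g = np.linspace(0.0, 1.0, 65)
X = np.array([(a, b) for a in g for b in g])   # n = 65^2 grid points
C = exp_cov(X, X, ell=(0.1, 0.2))

B = C[:500, -500:]                             # a far off-diagonal block
s = np.linalg.svd(B, compute_uv=False)
print("sigma_8 / sigma_1 =", s[7] / s[0])      # fast decay -> small H-rank suffices
```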
13. 4*
Mat´ern covariance functions
Mat´ern covariance functions
Cθ =
2σ2
Γ(ν)
r
2
ν
Kν
r
, θ = (σ2
, ν, ).
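A short SciPy sketch of this parameterisation, assuming scipy.special.kv for the modified Bessel function K_ν; the r = 0 limit equals σ², and all parameter values below are illustrative.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(r, sigma2=1.0, nu=1.0, ell=0.5):
    """C_theta(r) = (2 sigma^2 / Gamma(nu)) (r / (2 ell))^nu K_nu(r / ell)."""
    r = np.asarray(r, dtype=float)
    c = np.full(r.shape, sigma2)        # limit at r = 0 is sigma^2
    nz = r > 0
    c[nz] = ((2 * sigma2 / gamma(nu))
             * (r[nz] / (2 * ell)) ** nu * kv(nu, r[nz] / ell))
    return c

r = np.linspace(0.0, 2.0, 201)
print(matern(r, sigma2=0.25, nu=1.0, ell=0.3)[:5])  # sigma = 0.5 as in the plots
```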
15. Matérn function for different parameters
[Figure: Matérn covariance over distances in [−2, 2]. Left: ν = 1, σ = 0.5, ℓ = {0.5, 0.3, 0.2, 0.1}. Right: ν = {0.15, 0.3, 0.5, 1, 2, 30}.]
Computed in sglib, E. Zander.
16. Realisations of a Matérn random field for different parameters
To generate a realization κ(x, θ*) of a random field κ(x, θ), one needs to: 1) factorize C = LL^T; 2) generate a realization ξ(θ*) of the random vector ξ(θ); and 3) compute the matrix-vector product L · ξ(θ*), as sketched below.
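A dense NumPy stand-in for the three steps (the talk performs step 1 with an H-Cholesky; the exponential covariance and the tiny diagonal jitter for numerical positive-definiteness are my assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
C = 0.25 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)  # exp. covariance

L = np.linalg.cholesky(C + 1e-10 * np.eye(len(x)))  # 1) C = L L^T
xi = rng.standard_normal(len(x))                    # 2) realization xi(theta*)
kappa = L @ xi                                      # 3) kappa(x, theta*) = L xi
```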
17. Numerical experiments with H-matrices
H-matrix approximations of covariance matrices: computing time and storage.
18. Memory and computational times

n        | rank k | size(C), MB | size(C̃), MB | t(C), sec | t(C̃), sec | ε      | ε₂
4.0·10³  | 10     | 48          | 3            | 0.8       | 0.08       | 7·10⁻³ | 2.0·10⁻⁴
1.05·10⁴ | 18     | 439         | 19           | 7.0       | 0.4        | 7·10⁻⁴ | 1.0·10⁻⁴
2.1·10⁴  | 25     | 2054        | 64           | 45.0      | 1.4        | 1·10⁻⁵ | 4.4·10⁻⁶

Table: Accuracy of the H-matrix approximation C̃ (weak admissibility) of the exponential covariance function; ℓ1 = ℓ3 = 0.1, ℓ2 = 0.5, L-shaped 3D domain [Khoromskij et al. '09].
20. Identifying uncertain parameters
Given: a vector of measurements z = (z1, ..., zn)^T with covariance matrix C(θ*) = C(σ², ν, ℓ), where
C_θ(r) = (2σ² / Γ(ν)) (r / (2ℓ))^ν K_ν(r / ℓ), θ = (σ², ν, ℓ).
To identify: the uncertain parameters (σ², ν, ℓ).
Plan: maximize the log-likelihood function
L(θ) = −(1/2) [ N log(2π) + log det{C(θ)} + z^T C(θ)⁻¹ z ].
At each iteration i we obtain a new matrix C(θ_i); a sketch of one evaluation follows below.
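A dense sketch of one evaluation of L(θ), standing in for the H-matrix version detailed on a later slide: a single Cholesky factorization yields both log det C(θ) and z^T C(θ)⁻¹ z.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(C, z):
    """L = -1/2 [ N log(2 pi) + log det C + z^T C^{-1} z ]."""
    N = len(z)
    cf = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))  # log det C = 2 sum_i log L_ii
    quad = z @ cho_solve(cf, z)                    # z^T C^{-1} z
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)
```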
21. Other works
1. S. Ambikasaran et al., Fast direct methods for Gaussian processes and the analysis of NASA Kepler mission data, arXiv:1403.6015, 2014.
2. S. Ambikasaran, J. Y. Li, P. K. Kitanidis, and E. Darve, Large-scale stochastic linear inversion using hierarchical matrices, Computational Geosciences, 2013.
3. J. Ballani and D. Kressner, Sparse inverse covariance estimation with hierarchical matrices, 2015.
4. M. Bebendorf, Why approximate LU decompositions of finite element discretizations of elliptic operators can be computed with almost linear complexity, 2007.
5. S. Boerm and J. Garcke, Approximating Gaussian processes with H²-matrices, 2007.
6. J. E. Castrillon, M. G. Genton, and R. Yokota, Multi-level restricted maximum likelihood covariance estimation and kriging for large non-gridded spatial datasets, 2015.
7. J. Doelz, H. Harbrecht, and C. Schwab, Covariance regularity and H-matrix approximation for rough random fields, ETH Zuerich, 2014.
8. H. Harbrecht et al., Efficient approximation of random fields for numerical applications, Numerical Linear Algebra with Applications, 2015.
9. C.-J. Hsieh et al., BIG & QUIC: Sparse inverse covariance estimation for a million variables, 2013.
10. J. Quinonero-Candela et al., A unifying view of sparse approximate Gaussian process regression, 2005.
11. A. Saibaba, S. Ambikasaran, J. Yue Li, P. Kitanidis, and E. Darve, Application of hierarchical matrices to linear inverse problems in geostatistics, Oil & Gas Science and Technology, 2012.
22. Convergence of the optimization method
23. Details of the identification
To maximize the log-likelihood function we use Brent's method [Brent '73], which combines the bisection method, the secant method and inverse quadratic interpolation. Per evaluation (a toy version follows below):
1. Approximate C(θ) ≈ C^H(θ, k).
2. Compute the H-Cholesky factorization C^H(θ, k) = L̃L̃^T.
3. z^T C⁻¹ z = z^T (L̃L̃^T)⁻¹ z = v^T · v, where v solves L̃(θ, k) v(θ) = z(θ*).
4. Let λ_i be the diagonal elements of the H-Cholesky factor L̃; then
log det{C} = log det{L̃L̃^T} = log Π_{i=1}^n λ_i² = 2 Σ_{i=1}^n log λ_i,
so that
L̃(θ, k) = −(N/2) log(2π) − Σ_{i=1}^N log{L̃_ii(θ, k)} − (1/2) v(θ)^T · v(θ).   (3)
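A self-contained toy version of this loop, with everything dense: the exponential covariance, the true ℓ = 0.3 and the search bracket are illustrative, and SciPy's bounded scalar minimizer (a Brent-type method) replaces the H-matrix pipeline.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize_scalar

def neg_log_likelihood(ell, x, z):
    C = np.exp(-np.abs(x[:, None] - x[None, :]) / ell) + 1e-10 * np.eye(len(x))
    cf = cho_factor(C, lower=True)                 # steps 1-2 (dense stand-in)
    v = cho_solve(cf, z)                           # step 3: v = C^{-1} z
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))  # step 4
    return 0.5 * (len(x) * np.log(2 * np.pi) + logdet + z @ v)

rng = np.random.default_rng(2)
x = np.sort(rng.random(300))
C_true = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
z = np.linalg.cholesky(C_true + 1e-10 * np.eye(len(x))) @ rng.standard_normal(len(x))

res = minimize_scalar(lambda ell: neg_log_likelihood(ell, x, z),
                      bounds=(0.01, 2.0), method='bounded')
print("true ell = 0.3, estimated ell =", res.x)
```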
24. Shape of the log-likelihood
[Figure: Log-likelihood(θ) over θ ∈ [0, 40], truth θ* = 12, showing the curves log(det(C)), z^T C⁻¹ z and the log-likelihood.]
Figure: The minimum of the negative log-likelihood (black) is at θ = (·, ·, ℓ), ℓ ≈ 12 (σ² and ν are fixed).
25. What will change?
We approximate C by C^H:
1. How do the eigenvalues of C and C^H differ?
2. How does det(C) differ from det(C^H)?
3. How does L differ from L^H?
4. How does C⁻¹ differ from (C^H)⁻¹?
5. How does L(θ, k) differ from L(θ)?
6. What is the optimal H-matrix rank?
7. How does θ^H differ from θ?
For theory and estimates of rank and accuracy, see the works of Bebendorf, Grasedyck, Le Borne, Hackbusch, ...
26. 4*
Remark
For a small H-matrix rank k the H-matrix Cholesky of CH can be
not so stable (talk of Ralf Zimmermann) when eigenvalues of C
come very close to zero. A remedy is to increase the rank k.
In our example for n = 652 we increased k from 7 to 9.
To avoid this instability, we can modify CH
m = CH + δ2I. Assume
λi are eigenvalues of CH. Then eigenvalues of CH
m will be λi + δ2.
log det(CH
m ) = log
n
i=1
(λi + δ2
) =
n
i=1
log(λi + δ2
). (4)
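A two-line dense check of this remedy (the value of δ is illustrative):

```python
import numpy as np

def logdet_nugget(C, delta=1e-4):
    """log det(C + delta^2 I) via Cholesky; eigenvalues shift to lambda_i + delta^2."""
    L = np.linalg.cholesky(C + delta**2 * np.eye(C.shape[0]))
    return 2.0 * np.sum(np.log(np.diag(L)))
```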
27. 4*
Error analysis
Theorem (Existence of H-matrix inverse in [Bebendorf’11,
Ballani, Kressner’14)
Under certain conditions an H-matrix inverse exist
C−1
H − C−1
≤ ε C−1
, (5)
theoretical estimations for rank kinv of C−1
H are given.
Theorem (Error in log det)
Let E := C − CH, (CH)−1E := (CH)−1C − I and for the spectral
radius
ρ((CH
)−1
E) = ρ((CH
)−1
C − I) ≤ ε < ε. (6)
Then |log det(C) − log det(CH)| ≤ −plog(1 − ε).
Proof: See [Ballani, Kressner 14], [Ipsen’05].
28. How sensitive is the estimation to the H-matrix rank?
It is hardly sensitive at all: the H-matrix approximation changes the function L(θ, k) and the estimate of θ only marginally.

θ        | 0.05 | 1.05  | 2.04  | 3.04 | 4.03 | 5.03 | 6.02 | 7.02 | 8.01
L(exact) | 1628 | -2354 | -1450 | 27   | 1744 | 3594 | 5529 | 7522 | 9559
L(7)     | 1625 | -2354 | -1450 | 27   | 1745 | 3595 | 5530 | 7524 | 9560
L(20)    | 1625 | -2354 | -1450 | 27   | 1745 | 3595 | 5530 | 7524 | 9561

Comparison of three likelihood functions: exact, and computed with H-matrix ranks 7 and 20. Exponential covariance function with covariance length ℓ = 0.9, domain G = [0, 1]².
29. How sensitive is the estimation to the H-matrix rank?
[Figure: Three negative log-likelihood functions over θ: exact, and computed with H-matrix ranks 7 and 17. Even with rank 7 one can achieve very accurate results.]
30. Decrease of error bars with the number of measurements
Error bars (mean +/- st. dev.) computed for different n: they shrink as the number of measurements (the dimension) grows, n = {17², 33², 65²}. The mean and median are obtained by averaging 200 simulations.
33. Difference between two distributions, computed with C and C^H
The Kullback-Leibler divergence (KLD) D_KL(P‖Q) is a measure of the information lost when the distribution Q is used to approximate P:
D_KL(P‖Q) = Σ_i P(i) ln( P(i) / Q(i) ),   D_KL(P‖Q) = ∫_{−∞}^{∞} p(x) ln( p(x) / q(x) ) dx,
where p, q are the densities of P and Q. For multivariate normal distributions N_0(µ_0, Σ_0) and N_1(µ_1, Σ_1):
2 D_KL(N_0‖N_1) = tr(Σ_1⁻¹ Σ_0) + (µ_1 − µ_0)^T Σ_1⁻¹ (µ_1 − µ_0) − k − ln( det Σ_0 / det Σ_1 ).
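A direct NumPy transcription of the Gaussian formula above, e.g. for comparing N(µ, C) with N(µ, C^H); slogdet for the log-determinants is a numerical-safety choice of mine, not part of the talk.

```python
import numpy as np

def kld_gauss(mu0, S0, mu1, S1):
    """D_KL( N(mu0, S0) || N(mu1, S1) ) for k-variate normal distributions."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d
                  - k - (logdet0 - logdet1))
```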
35. [Figure: Dependence of the H-matrix approximation error on the parameter ν.]
Relative error ‖C − C^H‖₂ / ‖C^H‖₂ versus the smoothness parameter ν; H-matrix rank k = 8, n = 16641, Matérn covariance matrix.
36. [Figure: Dependence of the H-matrix approximation error on the covariance length ℓ.]
Relative error ‖C − C^H‖₂ / ‖C^H‖₂ versus the covariance length ℓ; H-matrix rank k = 8, n = 16641, Matérn covariance matrix.
37. Conclusion
Covariance matrices can be approximated in the H-matrix format, and the influence of the H-matrix approximation error on the estimated parameters is small.
With H-matrices we
- extend the class of covariance functions we can work with,
- allow non-regular discretizations of the covariance function on large spatial grids.
With the maximization algorithm we are able to identify both parameters: the covariance length ℓ and the smoothness ν.
38. Future plans
ECRC (center of D. Keyes): parallel H-Cholesky on different architectures → very large covariance matrices on complicated grids.
Apply H-matrices to (a dense sketch follows this list):
1. the kriging estimate ŝ := C_sy C_yy⁻¹ y;
2. the estimation of the variance σ̂², the diagonal of the conditional covariance matrix C_ss|y = diag( C_ss − C_sy C_yy⁻¹ C_ys );
3. geostatistical optimal design, φ_A := n⁻¹ trace( C_ss|y ), φ_C := c^T ( C_ss − C_sy C_yy⁻¹ C_ys ) c.
Identify all three parameters (σ², ν, ℓ) simultaneously.
Compare with the Bayesian update (H. Matthies, H. Najm, K. Law, A. Stuart et al.).
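As announced above, a dense sketch of items 1-3; cho_solve stands in for the planned H-matrix solves with C_yy, and the covariance blocks are assumed given (C_ys = C_sy^T by symmetry).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kriging(C_ss, C_sy, C_yy, y):
    cf = cho_factor(C_yy, lower=True)
    s_hat = C_sy @ cho_solve(cf, y)               # kriging estimate C_sy C_yy^{-1} y
    C_cond = C_ss - C_sy @ cho_solve(cf, C_sy.T)  # conditional covariance C_ss|y
    sigma_hat = np.sqrt(np.diag(C_cond))          # pointwise st. dev. of the estimate
    phi_A = np.trace(C_cond) / C_ss.shape[0]      # geostatistical A-criterion
    return s_hat, sigma_hat, phi_A
```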
39. Literature
1. S. Dolgov, B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the Tensor Train format, arXiv:1503.03210, 2015.
2. M. Espig, W. Hackbusch, A. Litvinenko, H. G. Matthies, E. Zander, Efficient analysis of high dimensional data in tensor formats, Sparse Grids and Applications, 31-56, 2013.
3. B. N. Khoromskij, A. Litvinenko, H. G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loeve expansion, Computing 84 (1-2), 49-67, 2009.
4. M. Espig, W. Hackbusch, A. Litvinenko, H. G. Matthies, P. Waehnert, Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, Computers & Mathematics with Applications 67 (4), 818-829, 2014.
5. A. Litvinenko, H. G. Matthies, Numerical methods for uncertainty quantification and Bayesian update in aerodynamics, in: Management and Minimisation of Uncertainties and Errors in Numerical Aerodynamics, pp. 265-282, 2013.
40. Acknowledgement
1. Lars Grasedyck (RWTH Aachen) and Steffen Boerm (Uni Kiel) for HLIB (www.hlib.org).
2. KAUST Research Computing group, KAUST Supercomputing Lab (KSL).
3. The stochastic Galerkin library sglib by E. Zander. Type in your terminal:
git clone git://github.com/ezander/sglib.git
To initialize all variables, run startup.m. You will find: generalised PCE, sparse grids, (Q)MC, stochastic Galerkin, linear solvers, KLE, covariance matrices, statistics, quadratures (multivariate Chebyshev, Laguerre, Lagrange, Hermite), etc. There are many examples, many tests, and rich demos.