Linear Non-Gaussian Structural Equation Models
1. IMPS 2008, Durham, NH
Linear Non-Gaussian Structural
Equation Models
Shohei Shimizu, Patrik Hoyer and Aapo Hyvarinen
Osaka University, Japan
University of Helsinki, Finland
2. 2
Abstract
• Linear Structural Equation Modeling (linear SEM)
– Analyzes causal relations
• Covariance-based SEM
– Uses covariance structure alone for model identification
– A number of indistinguishable models
• Linear non-Gaussian SEM
– Uses non-Gaussian structures for model identification
– Makes many models distinguishable
3. 3
SEM and causal analysis
• SEM is often used for causal analysis
based on non-experimental data
• Assumption: the data generating process
is represented by a SEM model
• If the assumption is reasonable, SEM
provides causal information
4. 4
Limitations of covariance-based SEM
• Covariance-based SEM cannot distinguish
between many models
• Example
[Figure: two models with opposite path directions: e1 → x1 ← x2 and x1 → x2 ← e2]
5. 5
Linear non-Gaussian SEM
• Many observed data are considerably non-
Gaussian (Micceri, 1989; Hyvarinen et al. 2001)
• Non-Gaussian structures of data are useful
(Bentler 1983; Mooijaart 1985)
• Non-Gaussianity distinguishes between the two
models (Shimizu et al. 2006):
[Figure: two models with opposite path directions: e1 → x1 ← x2 and x1 → x2 ← e2]
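A minimal sketch of why non-Gaussianity helps, under assumptions that are not on the slide: the external influences are uniform, the path coefficient is 0.6, and dependence is scored with the hypothetical statistic |mean(regressor^3 · residual)|. This is an illustration only, not the LiNGAM estimator. Both regression directions give uncorrelated residuals, but only the true direction gives a residual that also looks independent of the regressor.

```python
import random

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def mean_product(u, v):
    return sum(a * b for a, b in zip(u, v)) / len(u)

def ols_residual(y, x):
    """Residual of the least-squares regression of y on x (both centered)."""
    slope = mean_product(x, y) / mean_product(x, x)
    return [b - slope * a for a, b in zip(x, y)]

def dependence(regressor, residual):
    """|mean(regressor^3 * residual)|; close to 0 if the two are independent."""
    return abs(mean_product([a ** 3 for a in regressor], residual))

rng = random.Random(0)
n = 20000
w = 3 ** 0.5  # uniform on [-sqrt(3), sqrt(3)]: zero mean, unit variance
x2 = [rng.uniform(-w, w) for _ in range(n)]
e1 = [rng.uniform(-w, w) for _ in range(n)]
x1 = centered([0.6 * b + e for b, e in zip(x2, e1)])  # assumed true model: x1 = 0.6 x2 + e1
x2 = centered(x2)

score_true = dependence(x2, ols_residual(x1, x2))   # x1 regressed on x2 (true direction)
score_wrong = dependence(x1, ols_residual(x2, x1))  # x2 regressed on x1 (reversed)
print(score_true, score_wrong)
```

With Gaussian external influences both scores would be close to zero, which is the covariance-based indistinguishability described on the previous slide.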
6. 6
Independent component
analysis (ICA)
• Observed random vector x is modeled as $x = As$
– The components $s_i$ of $s$ are independent and non-Gaussian
• Zero means and unit variances
– A is a constant matrix
• Typically square, # variables = # independent components
• Identifiable up to permutation of the columns
(Mooijaart 1985; Comon, 1994)
7. 7
ICA estimation
• An alternative expression of ICA ($x = As$):
$s = Wx$, where $W = A^{-1}$ is called a recovering matrix
• Find $\tilde{W}$ that maximizes independence of the components of $\hat{s} = \tilde{W}x$
– Many proposals (Hyvarinen et al. 2001)
• $\tilde{W}$ is estimated up to permutation of the rows: $\tilde{W} = PW$
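The row-permutation indeterminacy can be checked numerically. The sketch below uses a hypothetical 2×2 mixing matrix A and one hand-picked draw of s (neither is from the talk), and shows that a row-permuted recovering matrix returns the same components in a different order.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(p, q):
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical mixing matrix
s = [0.5, -1.5]                # one draw of the independent components
x = matvec(A, s)               # observed vector, x = As

W = inverse_2x2(A)             # recovering matrix, W = A^{-1}
P = [[0.0, 1.0], [1.0, 0.0]]   # a row permutation
W_tilde = matmul(P, W)         # what an ICA routine may return instead of W

s_recovered = matvec(W, x)        # components in the original order
s_permuted = matvec(W_tilde, x)   # same components, swapped
print(s_recovered, s_permuted)
```

Both matrices give equally independent components, so an independence criterion alone cannot prefer W over PW.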
10. Discovery of linear non-Gaussian
acyclic models
Shimizu, Hoyer, Hyvarinen and Kerminen (2006)
11. 11
Linear non-Gaussian acyclic
model (LiNGAM)
• Directed acyclic graphs (DAG)
– $x_i$ can be arranged in an order $k(i)$
• Assumptions:
– Linearity: $x_i = \sum_{k(j)<k(i)} b_{ij} x_j + e_i$, or $x = Bx + e$
– External influences $e_i$ are independent
– and are non-Gaussian
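As a sketch of the data-generating process, the code below draws one observation from x = Bx + e by visiting the variables in their causal order. The matrix B, the order and the uniform external influences are hypothetical values chosen for illustration.

```python
import random

def sample_lingam(B, order, draw_noise):
    """Draw one x satisfying x = Bx + e, visiting variables in causal order."""
    n = len(B)
    e = [draw_noise() for _ in range(n)]
    x = [0.0] * n
    for i in order:
        # Parents of variable i come earlier in `order`, so their values are
        # already set; later variables have a zero coefficient in row i.
        x[i] = sum(B[i][j] * x[j] for j in range(n)) + e[i]
    return x, e

B = [[0.0, 0.6, 0.0],    # x1 = 0.6 x2 + e1
     [0.0, 0.0, 0.0],    # x2 = e2
     [-0.4, 0.3, 0.0]]   # x3 = -0.4 x1 + 0.3 x2 + e3
order = [1, 0, 2]        # 0-based causal order: x2, then x1, then x3
rng = random.Random(1)
x, e = sample_lingam(B, order, lambda: rng.uniform(-1.0, 1.0))  # non-Gaussian e

# Check the structural equation x = Bx + e row by row.
residual = [x[i] - sum(B[i][j] * x[j] for j in range(3)) - e[i] for i in range(3)]
print(x, residual)
```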
12. 12
Goal
• We know
– Data X is generated by $x = Bx + e$
• We do NOT know
– Path coefficients: $b_{ij}$
– Orders $k(i)$
– External influences: $e_i$
• What we observe is data X only
• Goal
– Estimate B and k(i) using data X only!
16. 16
Key idea
• First, relate LiNGAM with ICA as follows:
$x = Bx + e$
$\Leftrightarrow x = (I - B)^{-1} e = Ae$ (ICA!)
equivalently $e = (I - B)x = Wx$
• Due to the permutation indeterminacy, ICA
gives: $\tilde{W} = PW$
• Can find the correct P
– The correct permutation is the only one that leaves no
zeros in the diagonal of W
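The uniqueness claim can be checked by brute force on a small case. The sketch assumes a hypothetical acyclic B with three variables, forms W = I − B, and tries every row permutation; only the identity leaves a zero-free diagonal.

```python
from itertools import permutations

B = [[0.0, 0.0, 0.0],    # hypothetical acyclic model: x1 -> x2 -> x3
     [0.8, 0.0, 0.0],
     [0.0, -0.5, 0.0]]
n = len(B)
W = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(n)] for i in range(n)]

def zero_free_diagonal(m):
    return all(m[i][i] != 0.0 for i in range(len(m)))

# perm[i] is the index of the row of W placed in position i.
valid = [perm for perm in permutations(range(n))
         if zero_free_diagonal([W[r] for r in perm])]
print(valid)
```

The exact-zero test only works for a noise-free W; an estimated matrix needs a soft criterion instead.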
17. 17
Illustrative example
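• Consider the model:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 0.6 \\ 0 & 0 \end{pmatrix}}_{B} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$
[Figure: path diagram e1 → x1 ← x2 with path coefficient 0.6]
• Goal
– Estimate the path direction between x1 and
x2 observing only x1 and x2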
19. 19
Perform ICA
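• Relation of the LiNGAM model with ICA:
$\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & -0.6 \\ 0 & 1 \end{pmatrix}}_{W} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
• Due to the permutation indeterminacy, ICA
might give:
$\tilde{W} = PW = \begin{pmatrix} 0 & 1 \\ 1 & -0.6 \end{pmatrix}$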
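22. 22
Find the correct P
• Find a permutation of the rows of $\tilde{W}$ so that it
has no zeros in the diagonal
• In the example…
$\begin{pmatrix} e_2 \\ e_1 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & -0.6 \end{pmatrix}}_{\tilde{W}} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
Permute the rows (the 0 in the diagonal of $\tilde{W}$ disappears):
$\begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & -0.6 \\ 0 & 1 \end{pmatrix}}_{W} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$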
23. 23
Find the correct P
• In practice,
$\hat{P} = \arg\min_{P} \sum_i \frac{1}{|(P^{T}\tilde{W})_{ii}|}$
• Heavily penalizes small absolute values in
the diagonal
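A brute-force sketch of this criterion, assuming a hypothetical noisy ICA estimate of the matrix from the two-variable example. The function name and the numbers are illustrative; trying all permutations is only feasible for a handful of variables.

```python
from itertools import permutations

def best_row_permutation(W_tilde):
    """Row order minimising sum_i 1/|M_ii| for the row-permuted matrix M."""
    n = len(W_tilde)

    def cost(perm):
        total = 0.0
        for i, r in enumerate(perm):   # row r of W_tilde goes to position i
            entry = abs(W_tilde[r][i])
            if entry == 0.0:
                return float("inf")    # an exact zero on the diagonal is ruled out
            total += 1.0 / entry
        return total

    return min(permutations(range(n)), key=cost)

W_tilde = [[0.03, 0.97],     # hypothetical estimate: rows are swapped and
           [1.02, -0.61]]    # the true zero shows up as 0.03
perm = best_row_permutation(W_tilde)
W_hat = [W_tilde[r] for r in perm]
print(perm, W_hat)
```

The reciprocal makes the near-zero entry 0.03 very costly when it lands on the diagonal, so the swapped order wins.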
25. 25
Prune B (1)
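• In practice, due to estimation errors, we
would get:
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 0.65 \\ 0.05 & 0 \end{pmatrix}}_{B} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$
• Need to find which path coefficients are
actually zeros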
28. 28
Find a permutation that gives a
lower triangular matrix
• The LiNGAM model is acyclic
– The matrix B can be permuted to be lower triangular
for some permutation of variables (Bollen, 1989)
• First, find a simultaneous permutation of rows
and columns of B that gives a lower-triangular B
• In practice, find a permutation matrix Q that
minimizes the sum of the squared elements in the upper
triangular part of $QBQ^{T}$:
$\hat{Q} = \arg\min_{Q} \sum_{i \le j} (QBQ^{T})_{ij}^{2}$
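A small sketch of the search for Q, written as a search over variable orders. It assumes the criterion is the sum of squared entries on or above the diagonal (the squaring and the inclusion of the diagonal are assumptions here), and uses the estimated B from the two-variable example. Function names are illustrative.

```python
from itertools import permutations

def upper_mass(B, order):
    """Sum of squared entries on or above the diagonal after reordering.

    Entry (i, j) of the reordered matrix is B[order[i]][order[j]].
    """
    n = len(order)
    return sum(B[order[i]][order[j]] ** 2
               for i in range(n) for j in range(i, n))

def best_order(B):
    return min(permutations(range(len(B))), key=lambda o: upper_mass(B, o))

B_hat = [[0.0, 0.65],
         [0.05, 0.0]]
order = best_order(B_hat)   # 0-based: index 1 is x2, index 0 is x1
print(order, upper_mass(B_hat, order))
```

Exhaustive search grows factorially with the number of variables, so it is only meant for toy cases like this one.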
32. 32
Get a lower-triangular B
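• Applying such a simultaneous permutation of the
rows and columns, we get a permuted B that is as lower-triangular
as possible
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 0.65 \\ 0.05 & 0 \end{pmatrix}}_{B} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$
$\begin{pmatrix} x_2 \\ x_1 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 0.05 \\ 0.65 & 0 \end{pmatrix}}_{QBQ^{T}} \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} + \begin{pmatrix} e_2 \\ e_1 \end{pmatrix}$
• Set the upper-triangular elements to be zeros
(here 0.05 is replaced by 0)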
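Zeroing the upper triangle can be done without building Q explicitly. The sketch below assumes the variable order (x2, x1) has already been found, and zeroes every coefficient b_ij whose source variable j does not come strictly before i. The helper name is hypothetical.

```python
def prune_upper(B, order):
    """Zero every entry that is on or above the diagonal in the given order.

    The result is returned in the original variable indexing.
    """
    n = len(order)
    position = {var: k for k, var in enumerate(order)}
    return [[B[i][j] if position[i] > position[j] else 0.0 for j in range(n)]
            for i in range(n)]

B_hat = [[0.0, 0.65],
         [0.05, 0.0]]
order = (1, 0)                       # 0-based: x2 first, then x1
B_pruned = prune_upper(B_hat, order)
print(B_pruned)
```

Only the path from x2 to x1 survives, which matches the direction in the illustrative example.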
33. 33
Pruning B (2)
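• Once we get a lower-triangular B, the model is
identifiable using covariance-based SEM
$\begin{pmatrix} x_2 \\ x_1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0.65 & 0 \end{pmatrix} \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} + \begin{pmatrix} e_2 \\ e_1 \end{pmatrix}$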
• Many existing methods can be used for pruning
the remaining path coefficients
– Wald test, Bootstrapping, Model fit
– Lasso-type estimators (Tibshirani 1996; Zou, 2006) etc.
34. 34
To summarize the procedure…
1. Estimate B
– ICA + finding the correct row permutation
2. Prune estimated B
1. Find a row-and-column permutation that makes
estimated B lower triangular
2. Prune remaining paths using a covariance-based
method
[Figure: three path diagrams over x1 to x4 illustrating the steps "1. Estimate B" and "2. Prune estimated B"]
35. 35
Summary of the regular LiNGAM
• A linear acyclic model is identifiable based on
non-Gaussianity
• ICA-based estimation works well
– Confidence intervals (Konya et al., in progress)
• Better pruning methods might be developed
– Imposing sparseness in the ICA stage (Zhang & Chan,
2006; Hayashi et al., in progress), like Lasso (Tibshirani 1996)
37. 37
Latent factors (Shimizu et al., 2007)
• A non-Gaussian multiple indicator model:
$f = Bf + d$
$x = Gf + e$
• Suppose that G is identified, then B is identified
– Could identify G in a data driven way using a tetrad-constraint-based
method (Silva et al., 2006)
38. 38
Latent classes
(Shimizu & Hyvarinen, 2008)
• LiNGAM model for each class q:
$x = B_q x + (I - B_q)\mu_q + e_q \;\Leftrightarrow\; x = \mu_q + A_q e_q$ (ICA!)
• ICA mixtures (Lee et al., 2000; Mollah et al., 2006)
[Figure: path diagrams over x1 and x2 for Class 1 (labels 0, 0, 0.9) and Class 2 (labels 6, 5, 0.2)]
39. 39
Unobserved confounders
(Hoyer et al., in press)
• Can identify and distinguish between more
models
[Figure: six candidate models over x1 and x2, numbered 1 to 6; three of them include an unobserved confounder u1]
40. 40
Time structures (Hyvarinen et al., 2008)
• Combining LiNGAM and autoregressive model:
$x(t) = \sum_{\tau=0}^{k} B_\tau x(t-\tau) + e(t)$
– In econometrics: Structural vector autoregression
(Swanson & Granger, 1997)
• Changes ordinary AR coefficients based on
instantaneous effects:
$B_\tau = (I - B_0) M_\tau$ for $\tau > 0$ ($M_\tau$: AR matrix)
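The relation between the structural coefficients and the ordinary AR matrices follows from solving the model for x(t). The short derivation below writes B_0 for the instantaneous-effect matrix, B_τ for the lagged-effect matrices and M_τ for the AR matrices, and assumes I − B_0 is invertible (which holds when the instantaneous effects are acyclic).

```latex
\begin{align*}
x(t) &= B_0\,x(t) + \sum_{\tau=1}^{k} B_\tau\,x(t-\tau) + e(t) \\
(I - B_0)\,x(t) &= \sum_{\tau=1}^{k} B_\tau\,x(t-\tau) + e(t) \\
x(t) &= \sum_{\tau=1}^{k} \underbrace{(I - B_0)^{-1} B_\tau}_{M_\tau}\,x(t-\tau)
        + (I - B_0)^{-1} e(t)
\end{align*}
% Hence M_\tau = (I - B_0)^{-1} B_\tau, i.e. B_\tau = (I - B_0) M_\tau for \tau > 0.
```

The innovation of the ordinary AR model is (I − B_0)^{-1} e(t), which has the same form as the ICA model x = Ae used for the basic LiNGAM.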
41. 41
Some variables are Gaussian
(Hoyer et al., 2008)
• Consider the model:
[Figure: path diagram x1 → x2 ← e2 with path coefficient 0.6]
• Can identify the path direction
– if either x1 or e2 is non-Gaussian
• In general, there exist several equivalent models
that entail the same distribution if some variables are
Gaussian
42. 42
Some other extensions
• Cyclic models (Lacerda et al., 2008)
– Fewer equivalent models than covariance-based
approach
• Nonlinearity (Zhang & Chan, 2007; Sun et al., 2007)
• Model fit statistics are under development
– Non-Gaussian structures
43. 43
Conclusion
• Use of non-Gaussianity in SEM is useful
for model identification
• Many observed data are considerably
non-Gaussian
• The non-Gaussian approach can be a
good option
44. 44
• Most of our papers and Matlab/Octave
code are available on our webpages
• Google will find us!