Ancestral Causal Inference
Sara Magliacane¹,², Tom Claassen¹,³, Joris M. Mooij¹
¹University of Amsterdam; ²VU Amsterdam; ³Radboud University Nijmegen
Main Contributions
• Ancestral Causal Discovery (ACI), a causal discovery method as accurate as the
state-of-the-art but much more scalable
• A method for scoring causal relations that approximates their marginal probability
Causal discovery methods
• Score-based: evaluate models using a penalized likelihood score
• Constraint-based causal discovery: use statistical independences to express
constraints over possible causal models
Advantages of constraint-based w.r.t. score-based methods:
• can handle latent confounders naturally
• easy integration of background knowledge
Disadvantages of constraint-based methods:
• vulnerability to errors in statistical independence tests
• no estimation of confidence in the causal predictions
Causal inference as an optimization problem
To address the vulnerability to errors in statistical tests, Hyttinen et al. [2014] propose
HEJ, which formulates causal discovery as an optimization problem:
• Weighted list of statistical independence results: I = {(i_j, w_j)}
– E.g. I = {(Y ⊥⊥ Z | X, 0.2), (Y ⊥⊥ X, 0.1)}
• For any possible causal structure C, define a loss function summing the weights of the inputs not satisfied in C:

  loss(C, I) := Σ_{(i_j, w_j) ∈ I : i_j is not satisfied in C} w_j

• “i_j is not satisfied in C” is defined by causal reasoning rules
• Causal inference = find a causal structure minimizing the loss function:

  C* = arg min_{C ∈ 𝒞} loss(C, I)
Problem: scalability. For example, HEJ is already very slow for 8 random variables.
Ancestral Causal Inference (ACI)
Instead of direct causal relations, use a more coarse-grained representation: an
ancestral structure, i.e. the transitive closure of the causal DAG restricted to the
observed variables, where X ⤳ Y denotes “X causes Y” (X is an ancestor of Y):
(reflexivity): X ⤳ X
(transitivity): X ⤳ Y ∧ Y ⤳ Z ⟹ X ⤳ Z
(antisymmetry): X ⤳ Y ∧ Y ⤳ X ⟹ X = Y
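The ancestral structure of a given DAG can be sketched as a reflexive transitive closure computation (toy variable names; not the paper's implementation):

```python
# Sketch: the ancestral structure of a DAG is the reflexive transitive
# closure of its directed edges over the observed variables.

def ancestral_structure(nodes, edges):
    anc = {(v, v) for v in nodes}  # reflexivity
    anc |= set(edges)
    changed = True
    while changed:                 # transitivity, iterated to a fixpoint
        changed = False
        for (a, b) in list(anc):
            for (c, d) in list(anc):
                if b == c and (a, d) not in anc:
                    anc.add((a, d))
                    changed = True
    return anc

# Toy DAG X -> Y -> Z
anc = ancestral_structure({"X", "Y", "Z"}, {("X", "Y"), ("Y", "Z")})
print(("X", "Z") in anc)  # True: X is an ancestor of Z
```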
We reformulate causal discovery as an optimization problem over ancestral
structures, which drastically reduces the search space (e.g. for 7 variables:
2.3 × 10^15 → 6 × 10^6 possible structures). This requires new ancestral reasoning rules:
For X, Y, Z, W disjoint (sets of) variables, writing ⤳̸ and ⊥̸⊥ for the negations of ⤳ and ⊥⊥, and X ⊥⊥ Y | W ∪ [Z] for minimal conditional independence, i.e. (X ⊥⊥ Y | W ∪ Z) ∧ (X ⊥̸⊥ Y | W) (analogously for minimal conditional dependence):
1. (X ⊥⊥ Y | W) ∧ (X ⤳̸ W) ⟹ X ⤳̸ Y
2. X ⊥⊥ Y | W ∪ [Z] ⟹ (X ⊥̸⊥ Z | W) ∧ (Z ⤳ {X, Y} ∪ W)
3. X ⊥̸⊥ Y | W ∪ [Z] ⟹ (X ⊥̸⊥ Z | W) ∧ (Z ⤳̸ {X, Y} ∪ W)
4. (X ⊥⊥ Y | W ∪ [Z]) ∧ (X ⊥⊥ Z | W ∪ U) ⟹ X ⊥⊥ Y | W ∪ U
5. (Z ⊥̸⊥ X | W) ∧ (Z ⊥̸⊥ Y | W) ∧ (X ⊥⊥ Y | W) ⟹ X ⊥̸⊥ Y | W ∪ Z
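Rule 1, for instance, can be sketched as a simple forward-chaining check (the tuple encoding of statements is hypothetical):

```python
# Sketch of rule 1: (X indep Y | W) and (X does not cause any variable in W)
# implies (X does not cause Y). Encoding of facts is illustrative only.

def apply_rule1(independences, non_ancestral):
    """independences: set of (X, Y, W) triples, W a frozenset of variables;
    non_ancestral: set of (X, Z) pairs meaning X does not cause Z."""
    derived = set(non_ancestral)
    for (x, y, w) in independences:
        # X must be a non-ancestor of every variable in the conditioning set W
        if all((x, v) in derived for v in w):
            derived.add((x, y))  # conclude: X does not cause Y
    return derived

facts = apply_rule1({("X", "Y", frozenset({"W"}))}, {("X", "W")})
print(("X", "Y") in facts)  # True: rule 1 derives that X does not cause Y
```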
Possible weighting schemes for inputs
ACI supports two types of weighted input statements: statistical independence results
and ancestral relations. We propose two simple weighting schemes:
• a frequentist approach, in which for any appropriate frequentist statistical test with
independence as null hypothesis, we define the weight:
w = | log p − log α|, where p = p-value of the test, α = significance level (e.g., 5%);
• a Bayesian approach, in which the weight of each input i using data set D is:
w = log [ p(i|D) / p(¬i|D) ] = log [ (p(D|i) / p(D|¬i)) · (p(i) / p(¬i)) ],
where the prior probability p(i) can be used as a tuning parameter.
For a weighted ancestral relation X ⤳ Y we test the independence of Y and I_X, an
indicator variable (0 for observational samples, 1 for samples from the distribution in
which X is intervened upon).
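Both weighting schemes are cheap to compute. A sketch (function names are illustrative; the Bayesian version assumes the two marginal log-likelihoods have already been computed elsewhere):

```python
import math

def frequentist_weight(p_value, alpha=0.05):
    """w = |log p - log alpha| for a test with independence as null hypothesis."""
    return abs(math.log(p_value) - math.log(alpha))

def bayesian_weight(log_lik_indep, log_lik_dep, prior_indep=0.5):
    """w = log [p(i|D)/p(not i|D)]: log-likelihood ratio plus log prior odds.
    The prior probability of independence acts as a tuning parameter."""
    return (log_lik_indep - log_lik_dep) + math.log(prior_indep / (1 - prior_indep))

print(round(frequentist_weight(0.001), 3))  # 3.912: strong evidence, large weight
print(bayesian_weight(-10.0, -12.0))        # 2.0 with a flat prior
```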
A method for scoring causal predictions
• Score the confidence in a predicted statement s (e.g. X ⤳ Y) as:

  C(s) = min_{C ∈ 𝒞} loss(C, I + (¬s, ∞)) − min_{C ∈ 𝒞} loss(C, I + (s, ∞))
• ≈ MAP approximation of the log-odds ratio of s
• Asymptotically consistent, provided the input weights are consistent
• Can be used with any method that solves an optimization problem
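A brute-force sketch of this score over a hypothetical finite candidate set, where the hard constraints (s, ∞) and (¬s, ∞) are implemented by filtering the candidates:

```python
# Sketch of C(s) = min_C loss(C, I + (not s, inf)) - min_C loss(C, I + (s, inf)).
# Candidates are hypothetical, encoded as the set of statements each satisfies.

def loss(satisfied, weighted_inputs):
    return sum(w for stmt, w in weighted_inputs if stmt not in satisfied)

def confidence(statement, candidates, weighted_inputs):
    # (not s, inf): restrict to structures where the statement is false
    best_without = min(loss(c, weighted_inputs) for c in candidates
                       if statement not in c)
    # (s, inf): restrict to structures where the statement is true
    best_with = min(loss(c, weighted_inputs) for c in candidates
                    if statement in c)
    return best_without - best_with

inputs = [("X causes Y", 2.0), ("Y causes Z", 0.5)]
candidates = [{"X causes Y", "Y causes Z"}, {"X causes Y"}, set()]
print(confidence("X causes Y", candidates, inputs))  # 2.5 - 0.0 = 2.5
```

A large positive score means forbidding s is much more costly than asserting it, approximating the log-odds in favour of s.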
Simulated data
• Randomly generate 2000 linear acyclic models with n observed variables, latent
variables, and Gaussian noise
• Per model: sample 500 data points and perform independence tests up to order c
Evaluation on simulated data
We compare ACI, HEJ [Hyttinen et al., 2014] equipped with our scoring method, and
bootstrapped versions of FCI and CFCI.
[Figure: precision–recall curves for Bootstrapped (100) CFCI, Bootstrapped (100) FCI, HEJ (c = 1), ACI (c = 1), Standard CFCI, and Standard FCI.]
Precision–recall curves for ancestral (left) and nonancestral (right) relations; the middle column is a zoom of the ancestral PR curve.
• ACI is as accurate as HEJ for c = 1, and both outperform bootstrapped FCI and CFCI
[Figure: execution time in seconds (log scale) vs. number of variables (6–9) for HEJ and ACI.]
• ACI is orders of magnitude faster than HEJ
• The difference grows exponentially as the
number of variables n increases (log-scale)
• HEJ is not feasible for 8 variables
• ACI can scale up to 12 variables
Application on real data
We apply ACI to reconstruct a signalling network from flow cytometry data.
[Figure: signalling networks over Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38 and JNK, reconstructed by: Bootstrapped CFCI (independences c = 1); ACI (ancestral relations); ACI (ancestral relations + independences c = 1).]
• ACI can take advantage of weighted ancestral relations from experimental data
• CFCI cannot, so it predicts far fewer relations
• ACI is consistent with other methods, e.g. Mooij and Heskes [2013]
[Figure: comparison network over Raf, Mek, Erk, Akt, JNK, PIP3, PLCg, PIP2, PKC, PKA and p38.]
References
Antti Hyttinen, Frederick Eberhardt, and Matti Järvisalo. Constraint-based causal discovery: Conflict resolution with Answer Set Programming. In UAI, 2014.
Joris M. Mooij and Tom Heskes. Cyclic causal discovery from continuous equilibrium data. In UAI, 2013.
ACI source code: http://github.com/caus-am/aci