A model for interpretable high dimensional interactions

Sahir Rai Bhatnagar
Joint work with Yi Yang, Mathieu Blanchette and Celia Greenwood
Poster Number 67

one predictor variable at a time
[Figure: each predictor variable is tested against the phenotype separately (Test 1 to Test 5)]

a network based view
[Figure: network of predictor variables and the phenotype, with a single test (Test 1)]

system level changes due to environment
[Figure: predictor variables, phenotype and environment; networks under environments A and B, with a single test (Test 1)]

Motivating Dataset: Newborn epigenetic adaptations to gestational diabetes exposure (Luigi Bouchard, Sherbrooke)
• Environment: gestational diabetes
• Large data: child's epigenome (p ≈ 450k)
• Phenotype: obesity measures

Differential Correlation between environments
[Figure: correlation among predictors in (a) gestational diabetes affected pregnancies and (b) controls]
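The comparison between the two panels can be sketched numerically. The block below is a minimal illustration, assuming that "differential correlation" means the entry-wise absolute difference between the Pearson correlation matrices of the two environment groups; the slide does not state the measure, and the function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corr_matrix(rows):
    """Correlation matrix of the columns of `rows` (one row per subject)."""
    cols = list(zip(*rows))
    p = len(cols)
    return [[pearson(cols[i], cols[j]) for j in range(p)] for i in range(p)]

def differential_correlation(X, E):
    """Assumed measure: |corr(X | E = 1) - corr(X | E = 0)|, entry by entry."""
    exposed = [row for row, e in zip(X, E) if e == 1]
    control = [row for row, e in zip(X, E) if e == 0]
    c1, c0 = corr_matrix(exposed), corr_matrix(control)
    p = len(c1)
    return [[abs(c1[i][j] - c0[i][j]) for j in range(p)] for i in range(p)]

# Toy data (hypothetical): two predictors that are positively correlated
# among exposed subjects and negatively correlated among controls.
X = [[1, 1], [2, 2], [3, 3.5], [1, 3], [2, 2], [3, 0.5]]
E = [1, 1, 1, 0, 0, 0]
D = differential_correlation(X, E)
```

A large off-diagonal entry of `D` flags a pair of predictors whose relationship changes with the environment, which is the pattern the two panels contrast.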

formal statement of initial problem
• $n$: number of subjects
• $p$: number of predictor variables
• $X_{n \times p}$: high dimensional data set ($p \gg n$)
• $Y_{n \times 1}$: phenotype
• $E_{n \times 1}$: environmental factor that has widespread effect on $X$ and can modify the relation between $X$ and $Y$
Objective
• Which elements of $X$ that are associated with $Y$ depend on $E$?

ECLUST - our proposed method: 3 phases
Original Data, split by environment (E = 0, E = 1)
1) Gene Similarity, computed within E = 0 and within E = 1
2) Cluster Representation: each cluster summarized as an n × 1 vector
3) Penalized Regression: $Y_{n \times 1} \sim$ [cluster representatives] + [cluster representatives] × E
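Phases 2 and 3 can be sketched as follows. This is a minimal illustration, not the eclust package's implementation: the cluster assignment is hypothetical, averaging is an assumed choice of n × 1 representative (the slide does not name one), and the function names `cluster_representatives` and `design_matrix` are made up for the example.

```python
def cluster_representatives(X, clusters):
    """Average the columns of X within each cluster.

    X is a list of n subject rows; clusters is a list of column-index lists.
    Returns q columns, each of length n (one n x 1 vector per cluster).
    """
    reps = []
    for members in clusters:
        reps.append([sum(row[j] for j in members) / len(members) for row in X])
    return reps

def design_matrix(reps, E):
    """Rows of [rep_1..rep_q, E, rep_1*E..rep_q*E] for the phase-3 regression."""
    rows = []
    for i, e in enumerate(E):
        main = [rep[i] for rep in reps]
        rows.append(main + [e] + [m * e for m in main])
    return rows

# Toy data (hypothetical): 4 subjects, 3 predictors, binary environment.
X = [[1.0, 3.0, 10.0],
     [2.0, 4.0, 20.0],
     [3.0, 5.0, 30.0],
     [4.0, 6.0, 40.0]]
E = [0, 0, 1, 1]
clusters = [[0, 1], [2]]  # assumed output of the clustering phase

reps = cluster_representatives(X, clusters)
D = design_matrix(reps, E)
```

Each row of `D` holds the main effects of the cluster representatives, the environment, and their products with the environment, which is the input a penalized regression of Y would take.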

the objective of statistical methods is the reduction of data. A quantity of data . . . is to be replaced by relatively few quantities which shall adequately represent . . . the relevant information contained in the original data.
- Sir R. A. Fisher, 1922

Model
$$g(\mu) = \underbrace{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta_E E}_{\text{main effects}} + \underbrace{\alpha_{1E}(X_1 E) + \cdots + \alpha_{pE}(X_p E)}_{\text{interactions}}$$
Reparametrization [1]: $\alpha_{jE} = \gamma_{jE}\,\beta_j\,\beta_E$.
Strong heredity principle [2]:
$$\hat{\alpha}_{jE} \neq 0 \;\Rightarrow\; \hat{\beta}_j \neq 0 \text{ and } \hat{\beta}_E \neq 0$$
[1] Choi et al. 2010, JASA
[2] Chipman 1996, Canadian Journal of Statistics
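A small numerical check shows why the reparametrization enforces strong heredity: because each interaction coefficient is the product of gamma_jE, beta_j and beta_E, it is zero whenever either main effect is zero. The sketch below uses hypothetical coefficient values and function names, and takes g as the identity link for simplicity.

```python
def interaction_coefs(beta, beta_E, gamma):
    """alpha_jE = gamma_jE * beta_j * beta_E for every predictor j."""
    return [g * b * beta_E for g, b in zip(gamma, beta)]

def linear_predictor(x, e, beta0, beta, beta_E, gamma):
    """Main effects plus interactions for one subject."""
    alpha = interaction_coefs(beta, beta_E, gamma)
    main = beta0 + sum(b * xj for b, xj in zip(beta, x)) + beta_E * e
    inter = sum(a * xj * e for a, xj in zip(alpha, x))
    return main + inter

def satisfies_strong_heredity(beta, beta_E, alpha):
    """True if every nonzero interaction has both main effects nonzero."""
    return all(a == 0 or (b != 0 and beta_E != 0) for a, b in zip(alpha, beta))

# Hypothetical values: the second predictor has no main effect, so its
# interaction vanishes even though gamma for it is nonzero.
beta = [0.5, 0.0, -1.0]
beta_E = 2.0
gamma = [1.0, 3.0, 0.0]

alpha = interaction_coefs(beta, beta_E, gamma)
eta = linear_predictor([1.0, 1.0, 1.0], 1, 0.1, beta, beta_E, gamma)
ok = satisfies_strong_heredity(beta, beta_E, alpha)
```

Any coefficient vector produced through `interaction_coefs` passes the heredity check by construction, whereas freely chosen interaction coefficients need not.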

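Strong Heredity Model with Penalization
$$\underset{\beta_0,\,\beta,\,\gamma}{\arg\min}\;\; \frac{1}{2}\,\lVert Y - g(\mu) \rVert^2 + \lambda_\beta \left( w_1 |\beta_1| + \cdots + w_q |\beta_q| + w_E |\beta_E| \right) + \lambda_\gamma \left( w_{1E} |\gamma_{1E}| + \cdots + w_{qE} |\gamma_{qE}| \right)$$
$$w_j = \left| \frac{1}{\hat{\beta}_j} \right|, \qquad w_{jE} = \left| \frac{\hat{\beta}_j \hat{\beta}_E}{\hat{\alpha}_{jE}} \right|$$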
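To make the objective concrete, the sketch below evaluates it for given coefficients. It is an illustration under stated assumptions, not the eclust implementation: g is taken as the identity link, the penalties are weighted L1 norms, the weight for the environment term is assumed to be |1 / beta_E_hat| by analogy with w_j, and all function names and numbers are hypothetical.

```python
def adaptive_weights(beta_hat, beta_E_hat, alpha_hat):
    """Weights from initial estimates: w_j, w_E (assumed form), and w_jE."""
    w = [abs(1.0 / b) for b in beta_hat]
    w_E = abs(1.0 / beta_E_hat)
    w_int = [abs(b * beta_E_hat / a) for b, a in zip(beta_hat, alpha_hat)]
    return w, w_E, w_int

def objective(Y, X, E, beta0, beta, beta_E, gamma, w, w_E, w_int,
              lam_beta, lam_gamma):
    """Half squared error plus the two weighted L1 penalties."""
    alpha = [g * b * beta_E for g, b in zip(gamma, beta)]  # reparametrization
    loss = 0.0
    for y, x, e in zip(Y, X, E):
        fit = beta0 + sum(b * xj for b, xj in zip(beta, x)) + beta_E * e
        fit += sum(a * xj * e for a, xj in zip(alpha, x))
        loss += (y - fit) ** 2
    pen_beta = sum(wj * abs(b) for wj, b in zip(w, beta)) + w_E * abs(beta_E)
    pen_gamma = sum(wj * abs(g) for wj, g in zip(w_int, gamma))
    return 0.5 * loss + lam_beta * pen_beta + lam_gamma * pen_gamma

# Hypothetical example in which the fit is exact, so only penalties remain.
X = [[1.0, 0.0], [0.0, 1.0]]
E = [1, 0]
Y = [2.0, 1.0]
value = objective(Y, X, E, beta0=0.0, beta=[1.0, 1.0], beta_E=0.5,
                  gamma=[1.0, 0.0], w=[1.0, 1.0], w_E=1.0, w_int=[1.0, 1.0],
                  lam_beta=0.1, lam_gamma=0.2)
weights = adaptive_weights([2.0, 0.5], 4.0, [1.0, 2.0])
```

The two tuning parameters `lam_beta` and `lam_gamma` control main effects and interactions separately, so interactions can be shrunk harder than main effects or the reverse.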

Simulation Study: Jaccard Index and test set MSE
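For reference, the two metrics named in the slide title can be computed as below. This is a generic sketch: the slide does not say which sets the Jaccard index compares in the simulation study, so the variable names and example sets are hypothetical.

```python
def jaccard_index(a, b):
    """|A intersect B| / |A union B| for two collections of variable names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical selected sets (for instance from two fits) and test-set values.
set_a = {"X1", "X2", "X3:E"}
set_b = {"X2", "X3:E", "X4"}
jac = jaccard_index(set_a, set_b)
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A Jaccard index near 1 means the two selected sets largely agree, and a lower test set MSE means better prediction on held-out subjects.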

Open source software
• Software implementation in R: http://sahirbhatnagar.com/eclust/
• Allows user-specified interaction terms
• Automatically determines the optimal tuning parameters through
cross validation
• Can also be applied to genetic data

Conclusions and Contributions
• Large system-wide changes are observed in many environments
• This assumption can possibly be exploited to aid analysis of large data
• We develop and implement a multivariate penalization procedure for predicting a continuous or binary disease outcome while detecting interactions between high dimensional data (p >> n) and an environmental factor.
• Dimension reduction is achieved through leveraging the environmental-class-conditional correlations
• Also, we develop and implement a strong heredity framework within the penalized model
• R software: http://sahirbhatnagar.com/eclust/

Limitations
• There must be a high-dimensional signature of the exposure
• Clustering is unsupervised
• Two tuning parameters
• Need more samples . . . Got data? (Poster 67)

acknowledgements
• Dr. Celia Greenwood
• Dr. Blanchette and Dr. Yang
• Dr. Luigi Bouchard, André Anne Houde
• Dr. Steele, Dr. Kramer, Dr. Abrahamowicz
• Maxime Turgeon, Kevin McGregor, Lauren Mokry, Dr. Forest
• Greg Voisin, Dr. Forgetta, Dr. Klein
• Mothers and children from the study
