1. SIMoNe
An R package for inferring Gaussian networks with latent clustering
Julien Chiquet (and Camille, Christophe, Gilles, Catherine, Yves)
Laboratoire Statistique et Génome, La Génopole, Université d'Évry
SSB seminar, 13 April 2010
2. Problem
Inference: which interactions?
n ≈ tens to hundreds of slides,
g ≈ thousands of genes,
O(g²) parameters (edges)!
The main statistical issue is the high-dimensional setting.
3. Handling the scarcity of data (1)
By reducing the number of parameters
Assumption: connections only appear between informative genes.
Select p key genes P (by differential analysis), with p "reasonable" compared to n; typically n ∈ [p/5, 5p].
The learning dataset for inference: n size-p expression vectors (X_1, ..., X_n) with X_i ∈ R^p.
4. Handling the scarcity of data (2)
By collecting as many observations as possible: multitask learning
[Diagram: the same organism observed under three conditions, drug 1, drug 2, drug 3.]
How should we merge the data?
- By inferring each network independently: T = 3 separate samples (X_1^(t), ..., X_{n_t}^(t)), X_i^(t) ∈ R^{p_t}, each feeding its own inference.
- By pooling all the available data: a single sample (X_1, ..., X_n), X_i ∈ R^p, with n = n_1 + n_2 + n_3, feeding one inference.
- By breaking the separability: keep the three samples separate, but run a single, coupled inference.
8. Handling the scarcity of data (3)
By introducing some priors
Priors should be biologically grounded:
1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering),
3. steady-state or time-course data (directedness relies on the modelling).
[Diagram, built up over slides 8-12: a toy ten-gene network whose nodes are progressively grouped into clusters A, B and C.]
13. Outline
Statistical models
  Steady-state data
  Time-course data
  Multitask learning
Algorithms and methods
  Overall view
  Network inference
  Model selection
  Latent structure
Numerical experiments
  Performance on simulated data
  R package demo: the breast cancer data set
16. The graphical models: general settings
Assumption
A microarray can be represented as a multivariate Gaussian vector X = (X(1), ..., X(p)) ∈ R^p.
Collecting gene expression
1. Steady-state data leads to an i.i.d. sample.
2. Time-course data gives a time series.
Graphical interpretation
An edge between i and j exists if and only if there is conditional dependency between X(i) and X(j), or, equivalently, a non-null partial correlation between X(i) and X(j).
For time-course data, the edge j → i corresponds to conditional dependency (non-null partial correlation) between X_t(i) and X_{t−1}(j).
19. The general statistical approach
Let Θ be the parameters to infer (the edges).
A penalized likelihood approach
    Θ̂_λ = arg max_Θ L(Θ; data) − λ pen_ℓ1(Θ, Z),
where
- L is the model log-likelihood,
- Z is a latent clustering of the network,
- pen_ℓ1 is a penalty function tuned by λ > 0.
It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ1-norm),
3. model-driven inference (penalty adapted according to Z).
21. Outline: Statistical models / Steady-state data
22. The Gaussian model for an i.i.d. sample
Let
- X ∼ N(0_p, Σ), with X_1, ..., X_n i.i.d. copies of X,
- X be the n × p matrix whose kth row is X_k,
- Θ = (θ_ij)_{i,j∈P} = Σ^{−1} be the concentration matrix.
Graphical interpretation
Since cor_{ij|P∖{i,j}} = −θ_ij / √(θ_ii θ_jj) for i ≠ j,
    θ_ij = 0  ⇔  X(i) ⊥ X(j) | X(P∖{i,j})  ⇔  edge (i,j) ∉ network.
Θ describes the undirected graph of conditional dependencies.
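As a quick illustration, here is a minimal R sketch with a hand-picked 3 × 3 Θ: the zero entry θ_13 encodes a conditional independence that is invisible in the covariance matrix.

# Reading conditional independences off a concentration matrix:
# Theta[1,3] = 0, so variables 1 and 3 are conditionally independent
# given variable 2, although their covariance Sigma[1,3] is non-zero.
Theta <- matrix(c( 2, -1,  0,
                  -1,  2, -1,
                   0, -1,  2), nrow = 3, byrow = TRUE)
Sigma <- solve(Theta)            # marginally, all pairs correlate
partial_cor <- -Theta / sqrt(outer(diag(Theta), diag(Theta)))
diag(partial_cor) <- 1
round(Sigma, 3)                  # Sigma[1,3] != 0
round(partial_cor, 3)            # but partial correlation (1,3) is exactly 0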
23. Neighborhood selection (1)
Let
- X_i be the ith column of X,
- X_∖i be X deprived of X_i.
Then X_i = X_∖i β + ε, where β_j = −θ_ij / θ_ii.
Meinshausen and Bühlmann, 2006
Since sign(cor_{ij|P∖{i,j}}) = sign(β_j), select the neighbors of i with
    arg min_β (1/n) ‖X_i − X_∖i β‖²_2 + λ ‖β‖_1.
The sign pattern of Θ̂_λ is inferred after a symmetrization step.
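A minimal sketch of this per-node regression scheme, assuming the glmnet package is available; the "AND" symmetrization rule used at the end is one of the two rules proposed by Meinshausen and Bühlmann.

library(glmnet)

neighborhood_selection <- function(X, lambda) {
  p <- ncol(X)
  A <- matrix(0, p, p)                  # estimated sign pattern
  for (i in seq_len(p)) {
    fit <- glmnet(X[, -i], X[, i], lambda = lambda, intercept = FALSE)
    A[i, -i] <- sign(as.numeric(coef(fit)[-1]))   # drop the intercept row
  }
  A * (t(A) != 0)                       # AND rule: keep edges selected from both sides
}

# e.g. A_hat <- neighborhood_selection(scale(X), lambda = 0.1)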
24. Neighborhood selection (2)
The pseudo log-likelihood of the i.i.d. Gaussian sample is
    L̃_iid(Θ; S) = Σ_{i=1..p} Σ_{k=1..n} log P(X_k(i) | X_k(P∖i); Θ_i)
                 = (n/2) log det(D) − (n/2) Trace(D^{−1/2} Θ S Θ D^{−1/2}) − (n/2) log(2π),
where D = diag(Θ).
Proposition
    Θ̂^pseudo_λ = arg max_{Θ : θ_ij = θ_ji} L̃_iid(Θ; S) − λ ‖Θ‖_1
has the same null entries as inferred by neighborhood selection.
25. The Gaussian likelihood for an i.i.d. sample
Let S = n^{−1} XᵀX be the empirical variance-covariance matrix: S is a sufficient statistic of Θ.
The log-likelihood
    L_iid(Θ; S) = (n/2) log det(Θ) − (n/2) Trace(SΘ) + (n/2) log(2π).
The MLE Θ̂ = S^{−1} of Θ is not defined for n < p, and is never sparse.
The need for regularization is huge.
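A two-line R check of the n < p degeneracy: the empirical covariance is rank-deficient, so its inverse does not exist.

set.seed(1)
n <- 20; p <- 50
X <- matrix(rnorm(n * p), n, p)
S <- crossprod(X) / n      # t(X) %*% X / n
qr(S)$rank                 # at most n = 20 < p = 50
# solve(S) would fail here: S is computationally singular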
26. Penalized log-likelihood
Banerjee et al., JMLR 2008
    Θ̂_λ = arg max_Θ L_iid(Θ; S) − λ ‖Θ‖_1,
efficiently solved by the graphical LASSO of Friedman et al., 2008.
Ambroise, Chiquet, Matias, EJS 2009
Use adaptive penalty parameters for different coefficients:
    L_iid(Θ; S) − λ ‖P_Z ∗ Θ‖_1,
where P_Z is a matrix of weights depending on the underlying clustering Z and ∗ is the entrywise product.
Also works with the pseudo log-likelihood L̃_iid (computationally efficient).
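Both estimators can be sketched with the glasso package (assumed available from CRAN), whose rho argument accepts either a scalar penalty or a full matrix of entrywise weights; the two-block clustering z below is a hypothetical stand-in for Z.

library(glasso)

set.seed(2)
n <- 60; p <- 12
X <- matrix(rnorm(n * p), n, p)
S <- crossprod(scale(X)) / n

# Uniform l1 penalty (Banerjee et al.)
fit_flat <- glasso(S, rho = 0.2)

# Cluster-adapted penalty: within-cluster edges penalized less than between
z  <- rep(1:2, length.out = p)
PZ <- 0.5 + 0.5 * outer(z, z, "!=")    # weight 0.5 within, 1 between
fit_adapt <- glasso(S, rho = 0.2 * PZ)

sum(fit_flat$wi != 0); sum(fit_adapt$wi != 0)   # compare sparsity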
28. Outline: Statistical models / Time-course data
29. The Gaussian model for time-course data (1)
Let X_1, ..., X_n be a first-order vector autoregressive process:
    X_t = Θ X_{t−1} + b + ε_t,  t ∈ [1, n],
where we are looking for Θ = (θ_ij)_{i,j∈P} and
- X_0 ∼ N(0_p, Σ_0),
- ε_t is a Gaussian white noise with covariance σ² I_p,
- cov(X_t, ε_s) = 0 for s > t, so that (X_t) is Markovian.
Graphical interpretation
Since
    θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P∖j)) / var(X_{t−1}(j) | X_{t−1}(P∖j)),
    θ_ij = 0  ⇔  X_t(i) ⊥ X_{t−1}(j) | X_{t−1}(P∖j)  ⇔  edge (j → i) ∉ network.
30. The Gaussian model for time-course data (2)
Let
- X be the n × p matrix whose kth row is X_k,
- S = n^{−1} X_nᵀ X_n be the within-time covariance matrix,
- V = n^{−1} X_nᵀ X_0 be the across-time covariance matrix.
The log-likelihood
    L_time(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θᵀ S Θ) + c.
The MLE Θ̂ = S^{−1} V of Θ is still not defined for n < p.
31. Penalized log-likelihood
Charbonnier, Chiquet, Ambroise, SAGMB 2010
    Θ̂_λ = arg max_Θ L_time(Θ; S, V) − λ ‖P_Z ∗ Θ‖_1,
where P_Z is a (non-symmetric) matrix of weights depending on the underlying clustering Z.
Major difference with the i.i.d. case: the graph is directed,
    θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P∖j)) / var(X_{t−1}(j) | X_{t−1}(P∖j))
         ≠ cov(X_t(j), X_{t−1}(i) | X_{t−1}(P∖i)) / var(X_{t−1}(i) | X_{t−1}(P∖i)) = θ_ji.
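A minimal sketch of sparse VAR(1) inference with glmnet: regress each gene at time t on all genes at time t−1 under an ℓ1 penalty, so that row i of the estimate collects the directed effects j → i. The data here are a random stand-in.

library(glmnet)

set.seed(3)
n <- 50; p <- 10
X <- matrix(rnorm((n + 1) * p), n + 1, p)
past    <- X[1:n, ]            # X_0, ..., X_{n-1}
present <- X[2:(n + 1), ]      # X_1, ..., X_n

Theta_hat <- t(sapply(seq_len(p), function(i) {
  fit <- glmnet(past, present[, i], lambda = 0.1, intercept = FALSE)
  as.numeric(coef(fit)[-1])    # drop the intercept row
}))
which(Theta_hat != 0, arr.ind = TRUE)   # non-zero (i, j): directed edge j -> i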
32. Outline: Statistical models / Multitask learning
33. Coupling related problems
Consider
- T samples concerning the expressions of the same p genes,
- X_1^(t), ..., X_{n_t}^(t) the tth sample, drawn from N(0_p, Σ^(t)), with empirical covariance matrix S^(t).
Remarks
- In the sequel, Z is elided for clarity (no loss of generality).
- Multitask learning is easily adapted to time-course data, yet only the steady-state version is presented here.
Ignoring the relationships between the tasks leads to
    arg max_{Θ^(t), t=1,...,T} Σ_{t=1..T} L(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t), Z).
Breaking the separability: either by modifying the objective function, or the constraints.
36. Coupling problems through the objective function
The intertwined LASSO
    max_{Θ^(t), t=1,...,T} Σ_{t=1..T} L̃(Θ^(t); S̃^(t)) − λ ‖Θ^(t)‖_1,
where
- S̄ = (1/n) Σ_{t=1..T} n_t S^(t) is an "across-task" covariance matrix,
- S̃^(t) = α S^(t) + (1 − α) S̄ is a mixture of within-task and across-task covariance matrices.
Setting α = 0 is equivalent to pooling all the data and inferring one common network; setting α = 1 is equivalent to treating T independent problems.
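The intertwined covariance matrices are easy to compute; a minimal sketch, where each task's covariance is shrunk toward the pooled one before being passed to any of the ℓ1 solvers above:

intertwined_cov <- function(S_list, n_vec, alpha = 0.5) {
  n    <- sum(n_vec)
  Sbar <- Reduce(`+`, Map(`*`, S_list, n_vec)) / n   # across-task covariance
  lapply(S_list, function(S) alpha * S + (1 - alpha) * Sbar)
}
# alpha = 0 reproduces full pooling; alpha = 1 gives back the per-task S^(t).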
37. Coupling problems by grouping variables (1)
Groups definition
- Groups are the T-tuples composed of the (i,j) entries of each Θ^(t), t = 1, ..., T.
- Most relationships between the genes are kept or removed across all tasks simultaneously.
The graphical group-LASSO
    max_{Θ^(t), t=1,...,T} Σ_{t=1..T} L̃(Θ^(t); S^(t)) − λ Σ_{i,j∈P, i≠j} ( Σ_{t=1..T} (θ_ij^(t))² )^{1/2}.
41. Outline: Algorithms and methods / Overall view
42. The overall strategy
Our basic criterion is of the form
    L(Θ; data) − λ ‖P_Z ∗ Θ‖_1.
What we are looking for
- the edges, through Θ,
- the correct level of sparsity λ,
- the underlying clustering Z, with connectivity matrix π_Z.
What SIMoNe does (see the sketch below)
1. Infer a family of networks G = {Θ̂_λ : λ ∈ [λ_max, 0]}.
2. Select the network G that maximizes an information criterion.
3. Learn Z on the selected network G.
4. Infer a family of networks with P_Z ∝ 1 − π_Z.
5. Select the network G_Z that maximizes an information criterion.
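A compact sketch of the five steps under explicit stand-ins: glasso for the penalized inference, BIC with df equal to the number of non-zero entries for selection, and igraph's fast-greedy communities in place of the MixNet clustering (the real package uses a variational EM, see below).

library(glasso); library(igraph)

set.seed(4)
n <- 80; p <- 15
X <- matrix(rnorm(n * p), n, p)
S <- crossprod(scale(X)) / n
lambdas <- seq(0.8, 0.05, by = -0.05)

bic_of <- function(Theta, S, n) {
  df <- sum(Theta[upper.tri(Theta)] != 0)
  (n / 2) * (log(det(Theta)) - sum(S * Theta)) - df * log(n) / 2
}

# Steps 1-2: a path of networks, keep the BIC-best one
fits <- lapply(lambdas, function(l) glasso(S, rho = l)$wi)
best <- fits[[which.max(sapply(fits, bic_of, S = S, n = n))]]

# Step 3: cluster the selected graph (stand-in for MixNet/Mixer)
g <- graph_from_adjacency_matrix((best != 0) * 1, mode = "undirected", diag = FALSE)
z <- as.numeric(membership(cluster_fast_greedy(g)))

# Steps 4-5: rerun the path with cluster-adapted weights, reselect by BIC
PZ    <- 0.5 + 0.5 * outer(z, z, "!=")   # lighter penalty within clusters
fits2 <- lapply(lambdas, function(l) glasso(S, rho = l * PZ)$wi)
bestZ <- fits2[[which.max(sapply(fits2, bic_of, S = S, n = n))]]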
44. SIMoNe
Suppose you want to recover a clustered network, starting from microarray data.
[Flowchart, built up over slides 44-48: the target network and its adjacency matrix; the data enter SIMoNe, which first infers, without prior, the adjacency matrix of a graph G; Mixer estimates the connectivity matrix π_Z on G; a decreasing transformation of π_Z yields the penalty matrix P_Z; a second SIMoNe run with P_Z produces the adjacency matrix of G_Z.]
49. Outline: Algorithms and methods / Network inference
50. Monotask framework: problem decomposition
Consider the following reordering of Θ:
    Θ = | Θ_∖i,∖i   Θ_∖i,i |        Θ_i = | Θ_∖i,i |
        | Θ_i,∖i    θ_ii   |,             | θ_ii   |.
Block coordinate descent algorithm:
    arg max_Θ L(Θ; data) − λ pen_ℓ1(Θ)
relies on p penalized, convex optimization problems
    arg min_β f(β) + λ pen_ℓ1(β),    (1)
where f is convex and
- β = Θ_∖i,i ∈ R^{p−1}, with f(·; S), for steady-state data,
- β = Θ_i ∈ R^p, with f(·; S, V), for time-course data.
52. Monotask framework: algorithms
1. Steady-state: Covsel / GLasso (L_iid(Θ) − λ‖Θ‖_1)
   - starts from S + λI_p, positive definite,
   - iterates on the columns of Θ^{−1} until stabilization,
   - both estimation and selection of Θ.
2. Steady-state: neighborhood selection (L̃_iid(Θ) − λ‖Θ‖_1)
   - selects the sign pattern of Θ_∖i,i with the LASSO,
   - only one pass per column required,
   - post-symmetrization needed.
3. Time-course: VAR(1) inference (L_time(Θ) − λ‖Θ‖_1)
   - selects and estimates Θ_i with the LASSO,
   - only one pass per column required,
   - both estimation and selection.
55. Multitask framework: problem decomposition (1)
Remark: the multitask algorithms are presented in the steady-state framework; they are easily adapted to time-course data.
Consider the (pT) × (pT) block-diagonal matrix C composed of the empirical covariance matrices of the tasks,
    C = diag(S^(1), ..., S^(T)),
and define
    C_∖i,∖i = diag(S^(1)_∖i,∖i, ..., S^(T)_∖i,∖i),   C_∖i,i = (S^(1)_∖i,i ; ... ; S^(T)_∖i,i).
The (p−1)T × (p−1)T matrix C_∖i,∖i is the matrix C where each line and each column pertaining to variable i have been removed.
57. Multitask framework: problem decomposition (2)
Estimating the ith columns of the T tasks bound together,
    arg max_{Θ^(t), t=1,...,T} Σ_{t=1..T} L̃(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t)),
is decomposed into p convex optimization problems
    arg min_{β ∈ R^{T(p−1)}} f(β; C) + λ pen_ℓ1(β),
where we set β^(t) = Θ^(t)_∖i,i and β = (β^(1); ...; β^(T)) ∈ R^{T(p−1)}.
58. Solving the sub-problem
Subdifferential approach
    min_{β ∈ R^{T(p−1)}} L(β) = f(β) + λ pen_ℓ1(β);
β is a minimizer iff 0 ∈ ∂_β L(β), with
    ∂_β L(β) = ∇_β f(β) + λ ∂_β pen_ℓ1(β).
For the graphical intertwined LASSO,
    pen_ℓ1(β) = Σ_{t=1..T} ‖β^(t)‖_1,
where the grouping effect is managed by the function f.
For the graphical group-LASSO,
    pen_ℓ1(β) = Σ_{i=1..p−1} ‖β_i^[1:T]‖_2.
For the graphical coop-LASSO,
    pen_ℓ1(β) = Σ_{i=1..p−1} ( ‖(β_i^[1:T])_+‖_2 + ‖(−β_i^[1:T])_+‖_2 ),
where β_i^[1:T] = (β_i^(1), ..., β_i^(T)) ∈ R^T is the vector of the ith component across tasks.
62. General active set algorithm: green belt
// 0. INITIALIZATION
β ← 0, A ← ∅
while 0 ∉ ∂_β L(β) do
    // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β_A
    Find a solution h to the smooth problem
        ∇_h f(β_A + h) + λ ∂_h pen_ℓ1(β_A + h) = 0, where ∂_h pen_ℓ1 = ∇_h pen_ℓ1 on the active set.
    β_A ← β_A + h
    // 2. IDENTIFY NEWLY ZEROED VARIABLES
    while ∃ i ∈ A such that β_i = 0 and min_{ν ∈ ∂_{β_i} pen_ℓ1} |∂f(β)/∂β_i + λν| = 0 do
        A ← A ∖ {i}
    end
    // 3. IDENTIFY NEW NON-ZERO VARIABLES
    // Select the candidate i ∈ A^c such that an infinitesimal change of β_i
    // provides the highest reduction of L, i.e. most violates the optimality conditions
    i ← arg max_{j ∈ A^c} v_j, where v_j = min_{ν ∈ ∂_{β_j} pen_ℓ1} |∂f(β)/∂β_j + λν|
    if v_i ≠ 0 then
        A ← A ∪ {i}
    else
        Stop and return β, which is optimal.
    end
end
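To make the scheme concrete, here is a toy R instance for the plain ℓ1 penalty on a quadratic objective f(β) = ½ βᵀQβ − lᵀβ: variables enter the active set when their optimality violation is largest and leave when soft-thresholded back to zero. Q, l and λ are illustrative inputs, not part of SIMoNe's API.

lasso_active_set <- function(Q, l, lambda, tol = 1e-8, maxit = 1000) {
  p <- length(l)
  b <- numeric(p)                           # beta, all coordinates inactive at start
  for (it in seq_len(maxit)) {
    grad <- as.numeric(Q %*% b - l)         # gradient of the smooth part f
    # optimality violation v_j: distance of 0 to the subdifferential of L
    viol <- ifelse(b == 0, pmax(abs(grad) - lambda, 0),
                           abs(grad + lambda * sign(b)))
    if (max(viol) < tol) break              # KKT conditions hold: b is optimal
    j <- which.max(viol)                    # most violating coordinate joins A
    r <- l[j] - sum(Q[j, -j] * b[-j])       # partial residual
    b[j] <- sign(r) * max(abs(r) - lambda, 0) / Q[j, j]   # soft-threshold update
  }
  b
}

set.seed(5)
Q <- crossprod(matrix(rnorm(100), 20, 5))   # a random positive definite Q
l <- rnorm(5)
lasso_active_set(Q, l, lambda = 1)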
65. Outline: Algorithms and methods / Model selection
66. Tuning the penalty parameter
What does the literature say?
Theory-based penalty choices
1. Optimal order of penalty in the p ≫ n framework: √(n log p)
   (Bunea et al. 2007, Bickel et al. 2009).
2. Control of the probability of connecting two distinct connectivity sets
   (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009);
   practically much too conservative.
Cross-validation
- Optimal in terms of prediction, not in terms of selection.
- Problematic with small samples: the sparsity constraint changes with the sample size.
67. Tuning the penalty parameter
BIC / AIC
Theorem (Zou et al. 2008)
    df(β̂^lasso_λ) = ‖β̂^lasso_λ‖_0
Straightforward extensions to the graphical framework:
    BIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ) (log n)/2,
    AIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ).
These criteria rely on asymptotic approximations, but remain relevant for small data sets, and are easily adapted to L_iid, L̃_iid, L_time and the multitask framework.
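A minimal sketch of the two criteria along a path of graphical-lasso estimates, with df counted as the number of non-zero upper-triangular entries:

ic_path <- function(Theta_list, S, n) {
  sapply(Theta_list, function(Th) {
    loglik <- (n / 2) * (log(det(Th)) - sum(S * Th))  # up to a constant
    df <- sum(Th[upper.tri(Th)] != 0)
    c(BIC = loglik - df * log(n) / 2, AIC = loglik - df)
  })
}
# e.g. ic <- ic_path(fits, S, n); lambdas[which.max(ic["BIC", ])]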
68. Outline: Algorithms and methods / Latent structure
69. MixNet
Erdős–Rényi mixture for networks
The data is now the network itself.
Consider A = (a_ij)_{i,j∈P}, the adjacency matrix associated with Θ: a_ij = 1_{θ_ij ≠ 0}.
Latent structure modelling (Daudin et al., 2008)
Spread the nodes over a set Q = {1, ..., q, ..., Q} of classes, with
- α a Q-size vector giving α_q = P(i ∈ q),
- Z_iq = 1_{i ∈ q} independent hidden variables, Z_i ∼ M(1, α),
- π a Q × Q matrix giving π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ).
Connection probabilities depend on the class memberships of the nodes:
    a_ij | {Z_iq Z_jℓ = 1} ∼ B(π_qℓ).
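Simulating from this model is straightforward; a minimal sketch for an undirected network (the α and π values echo the star-pattern of Example 1 below):

sample_mixnet <- function(p, alpha, Pi) {
  z <- sample(seq_along(alpha), p, replace = TRUE, prob = alpha)  # hidden classes
  A <- matrix(0, p, p)
  for (i in 1:(p - 1)) for (j in (i + 1):p)
    A[i, j] <- A[j, i] <- rbinom(1, 1, Pi[z[i], z[j]])
  list(A = A, z = z)
}

set.seed(6)
net <- sample_mixnet(p = 30, alpha = c(0.1, 0.9),
                     Pi = matrix(c(0.1, 0.3, 0.3, 0), 2, 2))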
70. Estimation strategy
Likelihoods
- of the observed data: P(A | α, π) = Σ_Z P(A, Z | α, π),
- of the complete data: P(A, Z | α, π).
The EM criterion
    E[ log P(A, Z | α, π) | A ]
requires P(Z | A, α, π), which is not tractable!
71. Variational inference
Principle
Approximate P(Z | A, α, π) by R_τ(Z), chosen to minimize
    KL(R_τ(Z); P(Z | A, α, π)),
where R_τ is such that log R_τ(Z) = Σ_iq Z_iq log τ_iq, and τ are the variational parameters to optimize.
Variational Bayes (Latouche et al.)
- puts appropriate priors on α and π,
- gives good performance, especially for the choice of Q, and is thus relevant in the SIMoNe context.
72. Outline: Numerical experiments / Performance on simulated data
73. Network generation
Fix
- the number p = card(P) of nodes,
- whether the graph is directed or not.
Affiliation matrix A = (a_ij)_{i,j∈P}
1. Usual MixNet framework:
   - the Q × Q matrix Π, with π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ),
   - the Q-size vector α, with α_q = P(i ∈ q).
2. Constrained MixNet version:
   - the Q × Q matrix Π, with π_qℓ = card{(i,j) ∈ P × P : i ∈ q, j ∈ ℓ},
   - the Q-size vector α, with α_q = card({i ∈ P : i ∈ q})/p.
74. Gaussian data generation
The Θ matrix
1. Undirected case: Θ is the concentration matrix
   - compute the normalized Laplacian of A,
   - generate a symmetric pattern of random signs.
2. Directed case: Θ collects the VAR(1) parameters
   - generate random correlations where a_ij ≠ 0,
   - normalize by the eigenvalue with greatest modulus,
   - generate a pattern of random signs.
The Gaussian sample X
1. Undirected case:
   - compute Σ by pseudo-inversion of Θ,
   - generate the multivariate Gaussian sample via a Cholesky decomposition of Σ.
2. Directed case:
   - Θ is used to generate a stable VAR(1) process.
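A minimal sketch of the undirected recipe, with one simplification: the diagonal of Θ is inflated to enforce positive definiteness (so that plain inversion and MASS::mvrnorm suffice) instead of the pseudo-inversion route of the slide.

library(MASS)

generate_ggm_sample <- function(A, n, eps = 0.1) {
  p <- nrow(A)
  d <- pmax(rowSums(A), 1)
  L <- diag(p) - diag(1 / sqrt(d)) %*% A %*% diag(1 / sqrt(d))  # normalized Laplacian
  S <- matrix(sample(c(-1, 1), p^2, replace = TRUE), p, p)
  S[lower.tri(S)] <- t(S)[lower.tri(S)]                         # symmetric sign pattern
  Theta <- L * S
  diag(Theta) <- 0
  diag(Theta) <- rowSums(abs(Theta)) + eps   # diagonal dominance => positive definite
  mvrnorm(n, mu = rep(0, p), Sigma = solve(Theta))
}

X <- generate_ggm_sample(net$A, n = 100)     # 'net' from the MixNet sampler above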
76. Example 1: time-course data with star-pattern
Simulation settings
1. 50 networks with p = 100 nodes, time series of length n = 100,
2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
3. P(hub → leaf) = 0.3, P(hub → hub) = 0.1, 0 otherwise.
[Boxplots over the 50 runs, with (wcl) and without (wocl) structure inference, under BIC and AIC, of precision = TP/(TP+FP), recall = TP/P (power), and fallout = FP/N (type I error).]
80. Example 2: steady-state, multitask framework
Simulating the tasks
1. generate an "ancestor" network with p = 20 nodes and K = 20 edges,
2. generate T = 4 children by adding and deleting δ edges,
3. generate T = 4 Gaussian samples.
[Figure, shown for increasing δ: the ancestor network and its four perturbed children.]
94. Outline: Numerical experiments / R package demo: the breast cancer data set
95. Breast cancer
Prediction of the outcome of preoperative chemotherapy
Two types of patients
Patient response can be classified as
1. either a pathologic complete response (PCR),
2. or residual disease (not PCR).
Gene expression data
133 patients (99 not PCR, 34 PCR)
26 identified genes (differential analysis)
96. Pooling the data
[Animated demo: cancer data, pooling approach (demo/cancer_pooled.swf).]
97. Multitask approach: PCR / not PCR
[Animated demo: cancer data, graphical cooperative Lasso (demo/cancer_mtasks.swf).]
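A sketch of how the two demos are launched with the simone package; the cancer dataset shipped with the package and the exact argument names are quoted from memory and should be checked against ?simone before use.

library(simone)
data(cancer)                      # assumed layout: expression matrix + PCR status

# Pooled approach: one network from all 133 patients
res_pooled <- simone(cancer$expr, type = "steady-state", clustering = TRUE)
plot(res_pooled)

# Multitask approach: one network per response class, inferred jointly
res_multi <- simone(cancer$expr, type = "steady-state",
                    tasks = cancer$status)    # PCR / not PCR labels
plot(res_multi)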
98. Conclusions
To sum up
- SIMoNe embeds most state-of-the-art statistical methods for GGM inference based upon ℓ1-penalization,
- both steady-state and time-course data can be dealt with,
- (hopefully) a biologist-friendly R package.
Perspectives
Adding transversal tools such as
- network comparison,
- bootstrap to limit the number of false positives,
- more criteria to choose the penalty parameter,
- an interface to Gene Ontology.
99. Publications
Ambroise, Chiquet, Matias, 2009. Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3, 205-238.
Chiquet, Smith, Grasseau, Matias, Ambroise, 2009. SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3), 417-418.
Charbonnier, Chiquet, Ambroise, 2010. Weighted-Lasso for structured network inference from time course data. SAGMB, 9.
Chiquet, Grandvalet, Ambroise. Inferring multiple Gaussian graphical models. arXiv preprint.
Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: an R package for inferring Gaussian networks with latent structure. For the Journal of Statistical Software.
Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological analysis of breast cancer by multitask learning.