SIMoNe
An R package for inferring Gaussian networks with latent clustering

Julien Chiquet (and Camille, Christophe, Gilles, Catherine, Yves)
Laboratoire Statistique et Génome, la Génopole - Université d'Évry

SSB – 13 April 2010
Problem

  Data: n ≈ tens to hundreds of microarray slides for g ≈ thousands of
  genes, hence O(g²) parameters (edges) to infer. Which interactions?

  The main statistical issue is the high-dimensional setting.
Handling the scarcity of data (1)
 By reducing the number of parameters

  Assumption
  Connections will only appear between informative genes.

  Differential analysis: select p key genes so that p is "reasonable"
  compared to n (typically, n ∈ [p/5; 5p]).

  Inference: the learning data set is made of n size-p vectors of
  expression, (X_1, ..., X_n) with X_i ∈ R^p.
Handling the scarcity of data (2)
 By collecting as many observations as possible

  Multitask learning
  How should we merge the data?
  [Diagram: one organism observed under three conditions, drug 1,
  drug 2 and drug 3]
  ... by inferring each network independently: three samples
  (X_1^(t), ..., X_{n_t}^(t)) with X_i^(t) ∈ R^{p_t}, t = 1, 2, 3,
  and one separate inference per condition.
  ... by pooling all the available data:
  (X_1, ..., X_n), X_i ∈ R^p, with n = n_1 + n_2 + n_3, and one single
  inference on the pooled sample.
  ... or by breaking the separability: keep the three samples
  (X_1^(t), ..., X_{n_t}^(t)) with X_i^(t) ∈ R^{p_t}, t = 1, 2, 3, and
  perform one joint inference.
Handling the scarcity of data (3)
 By introducing some prior

  Priors should be biologically grounded:
    1. few genes effectively interact (sparsity),
    2. networks are organized (latent clustering),
    3. steady-state or time-course data
       (directedness relies on the modelling).

  [Toy network on ten nodes G0-G9, organized around a few hubs]
  [The same toy network, relabeled by latent classes: A1-A3, B1-B5, C1-C2]
Outline

  Statistical models
     Steady-state data
     Time-course data
     Multitask learning

  Algorithms and methods
     Overall view
     Network inference
     Model selection
     Latent structure

  Numerical experiments
    Performance on simulated data
    R package demo: the breast cancer data set


The graphical models: general settings

  Assumption
  A microarray can be represented as a multivariate Gaussian
  vector X = (X(1), . . . , X(p)) ∈ Rp .

  Collecting gene expression
    1. Steady-state data leads to an i.i.d. sample.
    2. Time-course data gives a time series.


  Graphical interpretation
  An edge between i and j exists if and only if there is a conditional
  dependency between X(i) and X(j), that is, a non-null partial
  correlation between X(i) and X(j).
  For time-course data, an edge j → i exists if and only if there is a
  conditional dependency between X_t(i) and X_{t−1}(j), that is, a
  non-null partial correlation between X_t(i) and X_{t−1}(j).
The general statistical approach

  Let Θ be the parameters to infer (the edges).

  A penalized likelihood approach

      Θ̂_λ = arg max_Θ  L(Θ; data) − λ pen_ℓ1(Θ, Z),

  where
      L is the model log-likelihood,
      Z is a latent clustering of the network,
      pen_ℓ1 is a penalty function tuned by λ > 0.
  It performs
      1. regularization (needed when n ≪ p),
      2. selection (sparsity induced by the ℓ1-norm),
      3. model-driven inference (penalty adapted according to Z).
The Gaussian model for an i.i.d. sample

  Let
      X ∼ N(0_p, Σ), with X_1, ..., X_n i.i.d. copies of X,
      X be the n × p matrix whose kth row is X_k,
      Θ = (θ_ij)_{i,j∈P} = Σ⁻¹ be the concentration matrix.

  Graphical interpretation
  Since cor_{ij|P\{i,j}} = −θ_ij / √(θ_ii θ_jj) for i ≠ j,

      X(i) ⊥ X(j) | X(P\{i,j})  ⇔  θ_ij = 0  ⇔  edge (i, j) ∉ network.

  ⇒ Θ describes the undirected graph of conditional dependencies.
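To make this interpretation concrete, here is a small R sketch (not from
the talk) that recovers the partial correlations from the concentration
matrix of a toy Gaussian sample:

  set.seed(1)
  p <- 5; n <- 1000
  X <- matrix(rnorm(n * p), n, p)   # toy i.i.d. Gaussian sample, n > p
  S <- crossprod(X) / n             # empirical covariance S = X'X / n
  Theta <- solve(S)                 # concentration matrix (well-defined here)
  ## partial correlation: cor_{ij|rest} = -theta_ij / sqrt(theta_ii theta_jj)
  pcor <- -Theta / sqrt(outer(diag(Theta), diag(Theta)))
  diag(pcor) <- 1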
Neighborhood selection (1)

  Let
      X_i be the ith column of X,
      X_\i be X deprived of X_i.

      X_i = X_\i β + ε,  where β_j = −θ_ij / θ_ii.

  Meinshausen and Bühlmann, 2006
  Since sign(cor_{ij|P\{i,j}}) = sign(β_j), select the neighbors of i with

      arg min_β  (1/n) ‖X_i − X_\i β‖²₂ + λ ‖β‖₁.

  ⇒ The sign pattern of Θ_λ is inferred after a symmetrization step.
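A minimal sketch of this scheme with the glmnet package, one lasso
regression per gene; the "OR" rule used for symmetrization below is one
common choice, not necessarily the one implemented in simone:

  library(glmnet)
  neighborhood.select <- function(X, lambda) {
    p <- ncol(X)
    A <- matrix(0, p, p)
    for (i in 1:p) {
      fit <- glmnet(X[, -i], X[, i], lambda = lambda)
      beta <- as.vector(coef(fit))[-1]      # drop the intercept
      A[i, -i] <- as.numeric(beta != 0)     # estimated neighbors of gene i
    }
    (A + t(A)) > 0                          # symmetrize with the OR rule
  }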
Neighborhood selection (2)

  The pseudo log-likelihood of the i.i.d. Gaussian sample is

      L̃_iid(Θ; S) = Σ_{i=1}^p Σ_{k=1}^n log P(X_k(i) | X_k(P\i); Θ_i)
                   = (n/2) log det(D) − (n/2) Trace(D^{−1/2} Θ S Θ D^{−1/2})
                     − (np/2) log(2π),

  where D = diag(Θ).

  Proposition

      Θ̂_λ^pseudo = arg max_{Θ: θ_ij = θ_ji}  L̃_iid(Θ; S) − λ ‖Θ‖₁

  has the same null entries as inferred by neighborhood selection.
The Gaussian likelihood for an i.i.d. sample

  Let S = n⁻¹ X'X be the empirical variance-covariance matrix: S is a
  sufficient statistic of Θ.

  The log-likelihood

      L_iid(Θ; S) = (n/2) log det(Θ) − (n/2) Trace(SΘ) − (np/2) log(2π).

      The MLE Θ̂ = S⁻¹ is not defined for n < p, and it is never sparse.
      The need for regularization is huge.
Penalized log-likelihood

  Banerjee et al., JMLR 2008

      Θ̂_λ = arg max_Θ  L_iid(Θ; S) − λ ‖Θ‖₁,

  efficiently solved by the graphical LASSO of Friedman et al., 2008.

  Ambroise, Chiquet, Matias, EJS 2009
  Use adaptive penalty parameters for the different coefficients:

      L̃_iid(Θ; S) − λ ‖P_Z ⋆ Θ‖₁,

  where P_Z is a matrix of weights depending on the underlying
  clustering Z and ⋆ is the entrywise product.
  ⇒ Works with the pseudo log-likelihood (computationally efficient).
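A hedged R sketch with the glasso package, whose rho argument accepts a
matrix of entrywise penalties; this mimics the P_Z-weighted penalty for
illustration and is not the simone implementation:

  library(glasso)
  S <- var(X)                          # empirical covariance
  fit.flat <- glasso(S, rho = 0.1)     # uniform penalty lambda = 0.1
  PZ <- matrix(1, ncol(S), ncol(S))
  PZ[1:3, 1:3] <- 0.5                  # hypothetical cluster on genes 1-3,
                                       # penalized less inside the cluster
  fit.adapt <- glasso(S, rho = 0.1 * PZ)
  Theta.hat <- fit.adapt$wi            # estimated concentration matrix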
The Gaussian model for time-course data (1)

  Let X_1, ..., X_n be a first-order vector autoregressive process,

      X_t = Θ X_{t−1} + b + ε_t,  t ∈ [1, n],

  where we are looking for Θ = (θ_ij)_{i,j∈P} and
      X_0 ∼ N(0_p, Σ_0),
      ε_t is a Gaussian white noise with covariance σ² I_p,
      cov(X_t, ε_s) = 0 for s > t, so that X_t is Markovian.

  Graphical interpretation
  Since

      θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P\j))
             / var(X_{t−1}(j) | X_{t−1}(P\j)),

      X_t(i) ⊥ X_{t−1}(j) | X_{t−1}(P\j)  ⇔  θ_ij = 0
                                          ⇔  edge (j → i) ∉ network.
The Gaussian model for time-course data (2)

  Let
      X be the n × p matrix whose kth row is X_k,
      S = n⁻¹ X_n' X_n be the within-time covariance matrix,
      V = n⁻¹ X_n' X_0 be the across-time covariance matrix.

  The log-likelihood

      L_time(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θ' S Θ) + c.

  ⇒ The MLE Θ̂ = S⁻¹V of Θ is still not defined for n < p.
Penalized log-likelihood

  Charbonnier, Chiquet, Ambroise, SAGMB 2010

      Θ̂_λ = arg max_Θ  L_time(Θ; S, V) − λ ‖P_Z ⋆ Θ‖₁,

  where P_Z is a (non-symmetric) matrix of weights depending on the
  underlying clustering Z.

  Major difference with the i.i.d. case
  The graph is directed:

      θ_ij = cov(X_t(i), X_{t−1}(j) | X_{t−1}(P\j))
             / var(X_{t−1}(j) | X_{t−1}(P\j))
           ≠ cov(X_t(j), X_{t−1}(i) | X_{t−1}(P\i))
             / var(X_{t−1}(i) | X_{t−1}(P\i)) = θ_ji in general.
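A minimal R sketch of this VAR(1) inference as p independent lasso
regressions of X_t on X_{t−1}; glmnet is used as a stand-in solver, not
the simone code path:

  library(glmnet)
  var1.lasso <- function(X, lambda) {
    n <- nrow(X); p <- ncol(X)
    Xpast <- X[-n, ]                  # X_{t-1}, t = 1..n-1
    Xnow  <- X[-1, ]                  # X_t,     t = 2..n
    Theta <- matrix(0, p, p)
    for (i in 1:p) {
      fit <- glmnet(Xpast, Xnow[, i], lambda = lambda)
      Theta[i, ] <- as.vector(coef(fit))[-1]   # row i: edges j -> i
    }
    Theta
  }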
Coupling related problems

  Consider
      T samples concerning the expression of the same p genes,
      X_1^(t), ..., X_{n_t}^(t) the tth sample, drawn from N(0_p, Σ^(t)),
      with empirical covariance matrix S^(t).

  Multiple samples setup
  Ignoring the relationships between the tasks leads to

      arg max_{Θ^(t), t=1,...,T}  Σ_{t=1}^T L(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t), Z).

  Breaking the separability
      Either by modifying the objective function,
      or by modifying the constraints.
  Remarks
      In the sequel, Z is omitted for clarity (no loss of generality).
      Multitask learning is easily adapted to time-course data, yet only
      the steady-state version is presented here.
Coupling problems through the objective function

  The intertwined LASSO

      max_{Θ^(t), t=1,...,T}  Σ_{t=1}^T L̃(Θ^(t); S̃^(t)) − λ ‖Θ^(t)‖₁,

  where
      S̄ = (1/n) Σ_{t=1}^T n_t S^(t) is an "across-task" covariance matrix,
      S̃^(t) = α S^(t) + (1 − α) S̄ is a mixture of the inner-task and
      across-task covariance matrices.

      Setting α = 0 is equivalent to pooling all the data and inferring
      one common network;
      setting α = 1 is equivalent to treating T independent problems.
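A small R sketch of the intertwined covariance construction; the
inference step itself can reuse any of the single-task solvers above
on each S̃^(t):

  intertwine <- function(S.list, n.vec, alpha = 0.5) {
    n <- sum(n.vec)
    ## across-task covariance: weighted average of the task covariances
    Sbar <- Reduce(`+`, Map(`*`, S.list, n.vec)) / n
    ## mix each task covariance with the across-task one
    lapply(S.list, function(S) alpha * S + (1 - alpha) * Sbar)
  }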
Coupling problems by grouping variables (1)

  Group definition
      Groups are the T-tuples composed of the (i, j) entries of each
      Θ^(t), t = 1, ..., T.
      Most relationships between the genes are kept or removed across
      all tasks simultaneously.

  The graphical group-LASSO

      max_{Θ^(t), t=1,...,T}  Σ_{t=1}^T L̃(Θ^(t); S^(t))
          − λ Σ_{i,j∈P, i≠j} ( Σ_{t=1}^T (θ_ij^(t))² )^{1/2}.
  [Figure: unit ball of the group-LASSO penalty for T = 2 tasks and
  p = 2 coefficients, { β : Σ_{i=1}² (Σ_{t=1}² (β_i^(t))²)^{1/2} ≤ 1 },
  drawn as 2-D slices in (β_1^(1), β_2^(1)) for β_1^(2), β_2^(2) ∈ {0, 0.3}]
Coupling problems by grouping variables (2)

  Graphical group-LASSO modification
      Inside a group, values are most likely sign-consistent.

  The graphical cooperative-LASSO

      max_{Θ^(t), t=1,...,T}  Σ_{t=1}^T L̃(S^(t); Θ^(t))
          − λ Σ_{i,j∈P, i≠j} [ ( Σ_{t=1}^T [θ_ij^(t)]₊² )^{1/2}
                             + ( Σ_{t=1}^T [θ_ij^(t)]₋² )^{1/2} ],

  where [u]₊ = max(0, u) and [u]₋ = min(0, u).
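For concreteness, a tiny R helper evaluating the group- and coop-LASSO
penalties on a T × m matrix B of coefficients (rows are tasks, columns
are the groups):

  pen.group <- function(B) sum(sqrt(colSums(B^2)))
  pen.coop  <- function(B) sum(sqrt(colSums(pmax(B, 0)^2)) +
                               sqrt(colSums(pmin(B, 0)^2)))

Both coincide when every column of B is sign-consistent; the coop version
charges extra whenever a group mixes signs across tasks.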
  [Figure: unit ball of the coop-LASSO penalty for the same setting,
  { β : Σ_{i=1}² (Σ_{t=1}² [β_i^(t)]₊²)^{1/2}
        + Σ_{i=1}² (Σ_{t=1}² [−β_i^(t)]₊²)^{1/2} ≤ 1 },
  drawn as the same 2-D slices]
The overall strategy

  Our basic criterion is of the form

      L(Θ; data) − λ ‖P_Z ⋆ Θ‖₁.

  What we are looking for
      the edges, through Θ,
      the correct level of sparsity λ,
      the underlying clustering Z, with connectivity matrix π_Z.

  What SIMoNe does
    1. Infer a family of networks G = {Θ̂_λ : λ ∈ [λ_max, 0]}.
    2. Select the Ĝ that maximizes an information criterion.
    3. Learn Ẑ on the selected network Ĝ.
    4. Infer a family of networks with P_Z ∝ 1 − π_Ẑ.
    5. Select the Ĝ_Z that maximizes an information criterion.
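A hedged sketch of this pipeline as exposed by the package; the names
simone, getNetwork and the clustering argument are quoted from memory of
the CRAN simone package and should be checked against its manual:

  library(simone)
  res <- simone(X, type = "steady-state",  # steps 1-5 above,
                clustering = TRUE)         # with latent clustering on
  net <- getNetwork(res)                   # network picked by the criterion
  plot(net)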
SIMoNe

  [Pipeline diagram: suppose you want to recover a clustered target
  network, known through its target adjacency matrix. Starting from
  microarray data, SIMoNe without prior yields the adjacency matrix
  corresponding to Ĝ; Mixer then estimates the connectivity matrix π_Z,
  and a decreasing transformation of π_Z gives the penalty matrix P_Z,
  which is fed back with the data to produce the adjacency matrix
  corresponding to Ĝ_Z.]
Monotask framework: problem decomposition

  Consider the following reordering of Θ:

      Θ = [ Θ_\i\i   Θ_\i i ]          Θ_i = [ Θ_\i i ]
          [ Θ_i \i    θ_ii  ],               [  θ_ii  ].

  Block coordinate descent algorithm
  Solving

      arg max_Θ  L(Θ; data) − λ pen_ℓ1(Θ)

  relies on p penalized, convex optimization problems,

      arg min_{β∈R^{p−1}}  f(β; S) + λ pen_ℓ1(β),          (1)

  where f is convex and β = Θ_\i i for steady-state data.
  For time-course data, the same decomposition holds with

      arg min_{β∈R^p}  f(β; S, V) + λ pen_ℓ1(β)

  and β = Θ_i.
Monotask framework: algorithms

    1. Steady-state: Covsel/GLasso (L_iid(Θ) − λ ‖Θ‖₁)
           starts from S + λ I_p, positive definite,
           iterates on the columns of Θ⁻¹ until stabilization,
           both estimation and selection of Θ.

    2. Steady-state: neighborhood selection (L̃_iid(Θ) − λ ‖Θ‖₁)
           selects the sign patterns of Θ_\i i with the LASSO,
           only one pass per column required,
           post-symmetrization needed.

    3. Time-course: VAR(1) inference (L_time(Θ) − λ ‖Θ‖₁)
           selects and estimates Θ_i with the LASSO,
           only one pass per column required,
           both estimation and selection.
Multitask framework: problem decomposition (1)

  Remark
  The multitask algorithms are presented in the steady-state framework
  (easily adapted to time-course data).

  Consider the (pT) × (pT) block-diagonal matrix C composed of the
  empirical covariance matrices of the tasks,

      C = diag(S^(1), ..., S^(T)),

  and define

      C_\i\i = diag(S^(1)_\i\i, ..., S^(T)_\i\i),
      C_\i i = (S^(1)_\i i ; ... ; S^(T)_\i i) stacked columnwise.

  The (p−1)T × (p−1)T matrix C_\i\i is the matrix C where we removed
  each row and each column pertaining to variable i.
Multitask framework: problem decomposition (2)

  Estimating the ith columns of the T tasks bound together,

      arg max_{Θ^(t), t=1,...,T}  Σ_{t=1}^T L̃(Θ^(t); S^(t)) − λ pen_ℓ1(Θ^(t)),

  is decomposed into p convex optimization problems,

      arg min_{β∈R^{T(p−1)}}  f(β; C) + λ pen_ℓ1(β),

  where we set β^(t) = Θ^(t)_\i i and

      β = (β^(1) ; ... ; β^(T)) ∈ R^{T(p−1)}.
Solving the sub-problem

  Subdifferential approach
  Consider

      min_{β∈R^{T(p−1)}}  L(β) = f(β) + λ pen_ℓ1(β);

  β is a minimizer iff 0 ∈ ∂_β L(β), with

      ∂_β L(β) = ∇_β f(β) + λ ∂_β pen_ℓ1(β).
  For the graphical intertwined LASSO,

      pen_ℓ1(β) = Σ_{t=1}^T ‖β^(t)‖₁,

  where the grouping effect is managed by the function f.
  For the graphical group-LASSO,

      pen_ℓ1(β) = Σ_{i=1}^{p−1} ‖β_i^[1:T]‖₂,

  where β_i^[1:T] = (β_i^(1), ..., β_i^(T)) ∈ R^T is the vector of the
  ith component across tasks.
  For the graphical coop-LASSO,

      pen_ℓ1(β) = Σ_{i=1}^{p−1} ( ‖[β_i^[1:T]]₊‖₂ + ‖[−β_i^[1:T]]₊‖₂ ).
General active set algorithm

  // 0. INITIALIZATION
  β ← 0, A ← ∅
  while 0 ∉ ∂β L(β) do
      // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β_A
      Find a solution h to the smooth problem
          ∇_h f(β_A + h) + λ ∂_h pen_ℓ1(β_A + h) = 0,
          where ∂_h pen_ℓ1 = ∇_h pen_ℓ1 on the active set.
      β_A ← β_A + h
      // 2. IDENTIFY NEWLY ZEROED VARIABLES
      while ∃ i ∈ A : β_i = 0 and min_{ν∈∂_{β_i} pen_ℓ1} |∂f(β)/∂β_i + λν| = 0 do
          A ← A \ {i}
      end
      // 3. IDENTIFY NEW NON-ZERO VARIABLES
      // Select the candidate i ∈ A^c such that an infinitesimal change
      // of β_i provides the highest reduction of L
      i ← arg max_{j∈A^c} v_j, where v_j = min_{ν∈∂_{β_j} pen_ℓ1} |∂f(β)/∂β_j + λν|
      if v_i ≠ 0 then
          A ← A ∪ {i}
      else
          Stop and return β, which is optimal
      end
  end
Tuning the penalty parameter
 What does the literature say?

  Theory-based penalty choices
    1. Optimal order of penalty in the p ≫ n framework: √(n log p)
                              Bunea et al. 2007, Bickel et al. 2009
    2. Control of the probability of connecting two distinct
       connectivity sets
             Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009
       ⇒ practically much too conservative

  Cross-validation
      Optimal in terms of prediction, not in terms of selection.
      Problematic with small samples: it changes the sparsity
      constraint due to the sample size.
Tuning the penalty parameter
 BIC / AIC

  Theorem (Zou et al. 2008)

      df(β̂_λ^lasso) = ‖β̂_λ^lasso‖₀

  Straightforward extensions to the graphical framework:

      BIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ) (log n)/2,
      AIC(λ) = L(Θ̂_λ; X) − df(Θ̂_λ).

      They rely on asymptotic approximations, but remain relevant for
      small data sets.
      Easily adapted to L_iid, L̃_iid, L_time and the multitask framework.
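A toy R sketch of the BIC rule over a path of graphical-lasso fits, with
glasso again as a stand-in solver; df counts the nonzero upper-triangular
entries of the estimate, one per undirected edge:

  library(glasso)
  bic.path <- function(S, n, lambdas) {
    sapply(lambdas, function(lam) {
      fit <- glasso(S, rho = lam)
      Theta <- fit$wi
      loglik <- (n / 2) * (as.numeric(determinant(Theta)$modulus)
                           - sum(S * Theta))   # log det - trace(S Theta)
      df <- sum(Theta[upper.tri(Theta)] != 0)
      loglik - df * log(n) / 2
    })
  }
  ## lam.star <- lambdas[which.max(bic.path(S, n, lambdas))]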
MixNet
 Erdős-Rényi Mixture for Networks

  The data is now the network itself
  Consider A = (a_ij)_{i,j∈P}, the adjacency matrix associated with Θ:

      a_ij = 1{θ_ij ≠ 0}.

  Latent structure modelling (Daudin et al., 2008)
  Spread the nodes over a set Q = {1, ..., q, ..., Q} of classes, with
      α a Q-size vector giving α_q = P(i ∈ q),
      z_iq = 1{i∈q} independent hidden variables, Z_i ∼ M(1, α),
      π a Q × Q matrix giving π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ).

  Connection probabilities depend on the class memberships of the nodes:

      a_ij | {Z_iq Z_jℓ = 1} ∼ B(π_qℓ).
Estimation strategy

  Likelihoods
      of the observed data: P(A | α, π) = Σ_Z P(A, Z | α, π),
      of the complete data: P(A, Z | α, π).

  The EM criterion

      E[ log P(A, Z | α, π) | A ]

  requires P(Z | A, α, π), which is not tractable!
Variational inference

  Principle
  Approximate P(Z | A, α, π) by R_τ(Z), chosen to minimize

      KL(R_τ(Z); P(Z | A, α, π)),

  where R_τ is such that log R_τ(Z) = Σ_{iq} Z_iq log τ_iq, and τ holds
  the variational parameters to optimize.

  Variational Bayes (Latouche et al.)
      Puts appropriate priors on α and π;
      gives good performance, especially for the choice of Q,
      and is thus relevant in the SIMoNe context.
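A hedged sketch of this clustering step with the mixer package; the
function name mixer and its qmin/qmax arguments are quoted from memory
and should be checked against the package manual:

  library(mixer)
  A <- 1 * (Theta.hat != 0); diag(A) <- 0   # adjacency from the inferred Theta
  mx <- mixer(A, qmin = 2, qmax = 6)        # fit MixNet for Q = 2, ..., 6
  ## the fitted connectivity matrix pi_Z then drives the penalty P_Z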
Network generation

  Fix
      the number p = card(P) of nodes,
      whether the graph is directed or not.

  Affiliation matrix A = (a_ij)_{i,j∈P}
    1. usual MixNet framework:
           the Q × Q matrix Π, with π_qℓ = P(a_ij = 1 | i ∈ q, j ∈ ℓ),
           the Q-size vector α, with α_q = P(i ∈ q).
    2. constrained MixNet version:
           the Q × Q matrix Π, with π_qℓ = card{(i, j) ∈ P × P : i ∈ q, j ∈ ℓ},
           the Q-size vector α, with α_q = card({i ∈ P : i ∈ q})/p.
Gaussian data generation
  The Θ matrix
    1. for the undirected case, Θ is the concentration matrix:
                 compute the normalized Laplacian of A,
                 generate a symmetric pattern of random signs.
    2. for the directed case, Θ holds the VAR(1) parameters:
                 generate random correlations for aij ≠ 0,
                 normalize by the eigenvalue with greatest modulus,
                 generate a pattern of random signs.

  The Gaussian sample X
    1. for the undirected case (sketched below),
                 compute Σ by pseudo-inversion of Θ,
                 generate the multivariate Gaussian sample via the Cholesky
                 decomposition of Σ.
    2. for the directed case,
                 Θ is used to generate a stable VAR(1) process.
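  A sketch of the undirected scheme; the diagonal-dominance step is our own
  safeguard so that the Cholesky factorization succeeds, and the package may
  proceed differently.

  library(MASS)                                 # ginv: pseudo-inverse
  rggm <- function(n, A) {
    p <- nrow(A)
    d <- pmax(rowSums(A), 1)
    L <- diag(p) - A / sqrt(outer(d, d))        # normalized Laplacian of A
    S <- matrix(sample(c(-1, 1), p * p, replace = TRUE), p)
    S[lower.tri(S)] <- t(S)[lower.tri(S)]       # symmetric sign pattern
    Theta <- L * S
    ## safeguard: enforce diagonal dominance so Theta and Sigma are
    ## positive definite (the actual generator may handle this differently)
    diag(Theta) <- rowSums(abs(Theta)) - abs(diag(Theta)) + 0.1
    Sigma <- ginv(Theta)                        # pseudo-inversion of Theta
    matrix(rnorm(n * p), n) %*% chol(Sigma)     # rows are N(0, Sigma) draws
  }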
SIMoNe: inferring Gaussian networks with latent clustering                 45
Example 1: time-course data with star-pattern

      Simulation settings
    1. 50 networks with p = 100 nodes, time series of length n = 100,
        2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
        3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise.
      Boxplot of Precision values, without and with structure inference
       Figure: boxplots of precision = TP/(TP+FP) for wocl.BIC, wcl.BIC,
       wocl.AIC, wcl.AIC (without/with latent clustering, under BIC/AIC)



SIMoNe: inferring Gaussian networks with latent clustering                                         46
Example 1: time-course data with star-pattern

      Simulation settings
        1. 50 networks with p = 100 nodes, time series of length n = 100,
       2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
       3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise.
      Boxplot of Recall values, without and with structure inference
       Figure: boxplots of recall = TP/P (power) for wocl.BIC, wcl.BIC,
       wocl.AIC, wcl.AIC (without/with latent clustering, under BIC/AIC)


SIMoNe: inferring Gaussian networks with latent clustering                                             46
Example 1: time-course data with star-pattern

       Simulation settings
         1. 50 networks with p = 100 nodes, time series of length n = 100,
        2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
        3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise.
       Boxplot of Fallout values, without and with structure inference
        Figure: boxplots of fallout = FP/N (type I error) for wocl.BIC,
        wcl.BIC, wocl.AIC, wcl.AIC (without/with latent clustering)


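        The three scores reported in these boxplots can be computed from the
        true and the inferred adjacency matrices; a minimal sketch with
        illustrative names:

        ## Precision, recall and fallout from a true and an inferred
        ## adjacency matrix, counting off-diagonal entries only.
        edge.scores <- function(A.true, A.hat) {
          off <- !diag(nrow(A.true))               # off-diagonal mask
          TP  <- sum(A.hat[off] == 1 & A.true[off] == 1)
          FP  <- sum(A.hat[off] == 1 & A.true[off] == 0)
          c(precision = TP / max(TP + FP, 1),
            recall    = TP / sum(A.true[off] == 1), # TP/P (power)
            fallout   = FP / sum(A.true[off] == 0)) # FP/N (type I error)
        }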
SIMoNe: inferring Gaussian networks with latent clustering                                                          46
Example 2: steady-state, multitask framework

  Simulating the tasks
    1. generate an “ancestor” with p = 20 nodes and K = 20 edges,
    2. generate T = 4 children by adding and deleting δ edges (sketched below),
    3. generate T = 4 Gaussian samples.




                   Figure: ancestor and children with δ perturbations


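  The perturbation step of point 2 can be sketched as follows; 'ancestor' is
  assumed to be an adjacency matrix generated as in the previous sections.

  ## Sketch: delete then add delta undirected edges of an ancestor network.
  perturb <- function(A, delta) {
    up  <- which(upper.tri(A))
    on  <- up[A[up] == 1]; off <- up[A[up] == 0]
    A[on[sample.int(length(on),   min(delta, length(on)))]]  <- 0 # deletions
    A[off[sample.int(length(off), min(delta, length(off)))]] <- 1 # additions
    A[lower.tri(A)] <- t(A)[lower.tri(A)]      # keep the matrix symmetric
    A
  }

  ## T = 4 children obtained from a common ancestor (assumed available)
  children <- replicate(4, perturb(ancestor, delta = 3), simplify = FALSE)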
SIMoNe: inferring Gaussian networks with latent clustering              47
Multitask: simulation results




           Precision/recall curve: precision = TP/(TP+FP) plotted against
           recall = TP/P (power).
           ROC curve: recall = TP/P (power) plotted against fallout = FP/N
           (type I error).




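           Given the edge.scores function sketched earlier, both curves are
           obtained by scoring every network along the penalty path; 'path'
           (a list of estimates Θ̂λ) and 'A.true' are assumed available.

           ## Sketch: score the whole path, then draw the PR curve.
           scores <- t(sapply(path, function(Theta.hat)
             edge.scores(A.true, 1 * (abs(Theta.hat) > 0))))
           plot(scores[, "recall"], scores[, "precision"], type = "l",
                xlab = "recall", ylab = "precision")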
SIMoNe: inferring Gaussian networks with latent clustering                                   48
Multitask: simulation results

       Figure: precision/recall curves (left panel) and ROC curves (right
       panel) along the penalty path, from λmax down to 0, comparing
       CoopLasso, GroupLasso, Intertwined, Independent and Pooled, for every
       combination of sample size nt ∈ {25, 50, 100} and perturbation
       δ ∈ {1, 3, 5}.

SIMoNe: inferring Gaussian networks with latent clustering                  48
Outline

  Statistical models
     Steady-state data
     Time-course data
     Multitask learning

  Algorithms and methods
     Overall view
     Network inference
     Model selection
     Latent structure

  Numerical experiments
    Performance on simulated data
    R package demo: the breast cancer data set


SIMoNe: inferring Gaussian networks with latent clustering   49
Breast cancer
 Prediction of the outcome of preoperative chemotherapy




  Two types of patients
  Patient response can be classified as
    1. either a pathologic complete response (PCR),
    2. or residual disease (not PCR).



  Gene expression data
         133 patients (99 not PCR, 34 PCR)
         26 identified genes (differential analysis)




SIMoNe: inferring Gaussian networks with latent clustering   50
Pooling the data




                              cancer data: pooling approach




   demo/cancer_pooled.swf
SIMoNe: inferring Gaussian networks with latent clustering    51
Multitask approach: PCR / not PCR




                     cancer data: graphical cooperative Lasso




   demo/cancer_mtasks.swf
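
   Indicatively, the two demos correspond to calls of the following pattern;
   the argument names are our assumption and may differ from the released
   version of the package, so check ?simone.

   library(simone)
   ## pooled approach: a single network from all 133 expression profiles
   res.pooled <- simone(expr)                 # expr: 133 x 26 matrix
   ## multitask approach: one network per group, inferred jointly
   res.tasks  <- simone(expr, tasks = status) # status: factor PCR / not PCR
   plot(res.pooled)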
SIMoNe: inferring Gaussian networks with latent clustering      52
Conclusions

  To sum up
         SIMoNe embeds most state-of-the-art statistical methods for
         GGM inference based upon ℓ1 -penalization,
         both steady-state and time-course data can be dealt with,
         (hopefully) a biologist-friendly R package.


  Perspectives
  Adding transversal tools such as
         network comparison,
         bootstrap to limit the number of false positives,
         more criteria to choose the penalty parameter,
         an interface to Gene Ontology.

SIMoNe: inferring Gaussian networks with latent clustering             53
Publications

        Ambroise, Chiquet, Matias (2009). Inferring sparse Gaussian graphical
        models with latent structure. Electronic Journal of Statistics, 3,
        205-238.
        Chiquet, Smith, Grasseau, Matias, Ambroise (2009). SIMoNe: Statistical
        Inference for MOdular NEtworks. Bioinformatics, 25(3), 417-418.
        Charbonnier, Chiquet, Ambroise (2010). Weighted-Lasso for Structured
        Network Inference from Time Course Data. SAGMB, 9.
        Chiquet, Grandvalet, Ambroise (arXiv preprint). Inferring multiple
        Gaussian graphical models.
        Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: An R
        package for inferring Gaussian networks with latent structure. Journal
        of Statistical Software.
        Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological
        analysis of breast cancer by multitask learning.
SIMoNe: inferring Gaussian networks with latent clustering                   54

Contenu connexe

Plus de Laboratoire Statistique et génome

Plus de Laboratoire Statistique et génome (6)

Structured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical modelStructured Regularization for conditional Gaussian graphical model
Structured Regularization for conditional Gaussian graphical model
 
Sparsity by worst-case quadratic penalties
Sparsity by worst-case quadratic penaltiesSparsity by worst-case quadratic penalties
Sparsity by worst-case quadratic penalties
 
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-LassoSparsity with sign-coherent groups of variables via the cooperative-Lasso
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
 
Weighted Lasso for Network inference
Weighted Lasso for Network inferenceWeighted Lasso for Network inference
Weighted Lasso for Network inference
 
Gaussian Graphical Models with latent structure
Gaussian Graphical Models with latent structureGaussian Graphical Models with latent structure
Gaussian Graphical Models with latent structure
 
Multitask learning for GGM
Multitask learning for GGMMultitask learning for GGM
Multitask learning for GGM
 

Dernier

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Dernier (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

SIMoNe: Statistical Iference for MOdular NEtworks

  • 1. SIMoNe An R package for inferring Gausssian networks with latent clustering Julien Chiquet (and Camille, Christophe, Gilles, Catherine, Yves) ´ Laboratoire Statistique et Genome, ´ ´ ´ La genopole - Universite d’Evry SSB – 13 avril 2010 SIMoNe: inferring Gaussian networks with latent clustering 1
  • 2. Problem Inference n ≈ 10s/100s of slides g ≈ 1000s of genes Which interactions? O(g 2 ) parameters (edges) ! The main statistical issue is the high dimensional setting SIMoNe: inferring Gaussian networks with latent clustering 2
  • 3. Handling the scarcity of data (1) By reducing the number of parameters Assumption Connections will only appear between informative genes select p key genes P differential analysis p “reasonable” compared to n typically, n ∈ [p/5; 5p] the learning dataset inference n size–p vectors of expression (X1 , . . . , Xn ) with Xi ∈ Rp SIMoNe: inferring Gaussian networks with latent clustering 3
  • 4. Handling the scarcity of data (2) By collecting as many observations as possible Multitask learning Go to learning How should we merge the data? organism drug 2 drug 1 drug 3 SIMoNe: inferring Gaussian networks with latent clustering 4
  • 5. Handling the scarcity of data (2) By collecting as many observations as possible Multitask learning Go to learning by inferring each network independently organism drug 2 drug 1 drug 3 (1) (1) (1) (2) (2) (2) (3) (3) (3) (X1 , . . . , Xn1 ), Xi ∈ Rp1 (X1 , . . . , Xn2 ), Xi ∈ Rp2 (X1 , . . . , Xn3 ), Xi ∈ Rp3 inference inference inference SIMoNe: inferring Gaussian networks with latent clustering 4
  • 6. Handling the scarcity of data (2) By collecting as many observations as possible Multitask learning Go to learning by pooling all the available data organism drug 2 drug 1 drug 3 (X1 , . . . , Xn ), Xi ∈ Rp , with n = n1 + n2 + n3 . inference SIMoNe: inferring Gaussian networks with latent clustering 4
  • 7. Handling the scarcity of data (2) By collecting as many observations as possible Multitask learning Go to learning by breaking the separability organism drug 2 drug 1 drug 3 (1) (1) (1) (2) (2) (2) (3) (3) (3) (X1 , . . . , Xn1 ), Xi ∈ Rp1 (X1 , . . . , Xn2 ), Xi ∈ Rp2 (X1 , . . . , Xn3 ), Xi ∈ Rp3 inference SIMoNe: inferring Gaussian networks with latent clustering 4
  • 8. Handling the scarcity of data (3) By introducing some prior Priors should be biologically grounded 1. few genes effectively interact (sparsity), 2. networks are organized (latent clustering), 3. steady-state or time-course data (directedness relies on the modelling). G5 G4 G6 G2 G3 G7 G0 G1 G9 G8 SIMoNe: inferring Gaussian networks with latent clustering 5
  • 9. Handling the scarcity of data (3) By introducing some prior Priors should be biologically grounded 1. few genes effectively interact (sparsity), 2. networks are organized (latent clustering), 3. steady-state or time-course data (directedness relies on the modelling). G5 G4 G6 G2 G3 G7 G0 G1 G9 G8 SIMoNe: inferring Gaussian networks with latent clustering 5
  • 10. Handling the scarcity of data (3) By introducing some prior Priors should be biologically grounded 1. few genes effectively interact (sparsity), 2. networks are organized (latent clustering), 3. steady-state or time-course data (directedness relies on the modelling). B3 B2 B4 A3 B1 B5 A1 A2 C2 C1 SIMoNe: inferring Gaussian networks with latent clustering 5
  • 11. Handling the scarcity of data (3) By introducing some prior Priors should be biologically grounded 1. few genes effectively interact (sparsity), 2. networks are organized (latent clustering), 3. steady-state or time-course data (directedness relies on the modelling). B3 B2 B4 A3 B1 B5 A1 A2 C2 C1 SIMoNe: inferring Gaussian networks with latent clustering 5
  • 12. Handling the scarcity of data (3) By introducing some prior Priors should be biologically grounded 1. few genes effectively interact (sparsity), 2. networks are organized (latent clustering), 3. steady-state or time-course data (directedness relies on the modelling). B3 B2 B4 A3 B1 B5 A1 A2 C2 C1 SIMoNe: inferring Gaussian networks with latent clustering 5
  • 13. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 6
  • 14. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 6
  • 15. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 6
  • 16. The graphical models: general settings Assumption A microarray can be represented as a multivariate Gaussian vector X = (X(1), . . . , X(p)) ∈ Rp . Collecting gene expression 1. Steady-state data leads to an i.i.d. sample. 2. Time-course data gives a time series. Graphical interpretation i conditional dependency between X(i) and X(j) if and only if or j non null partial correlation between X(i) and X(j) SIMoNe: inferring Gaussian networks with latent clustering 7
  • 17. The graphical models: general settings Assumption A microarray can be represented as a multivariate Gaussian vector X = (X(1), . . . , X(p)) ∈ Rp . Collecting gene expression 1. Steady-state data leads to an i.i.d. sample. 2. Time-course data gives a time series. Graphical interpretation i conditional dependency between X(i) and X(j) ? if and only if or j non null partial correlation between X(i) and X(j) SIMoNe: inferring Gaussian networks with latent clustering 7
  • 18. The graphical models: general settings Assumption A microarray can be represented as a multivariate Gaussian vector X = (X(1), . . . , X(p)) ∈ Rp . Collecting gene expression 1. Steady-state data leads to an i.i.d. sample. 2. Time-course data gives a time series. Graphical interpretation i conditional dependency between Xt (i) and Xt−1 (j) ? if and only if or j non null partial correlation between Xt (i) and Xt−1 (j) SIMoNe: inferring Gaussian networks with latent clustering 7
  • 19. The general statistical approach Let Θ be the parameters to infer (the edges). A penalized likelihood approach ˆ Θλ = arg max L(Θ; data) − λ pen 1 (Θ, Z), Θ L is the model log-likelihood, Z is a latent clustering of the network, pen 1 is a penalty function tuned by λ > 0. It performs 1. regularization (needed when n p), 2. selection (sparsity induced by the 1 -norm), 3. model-driven inference (penalty adapted according to Z). SIMoNe: inferring Gaussian networks with latent clustering 8
  • 20. The general statistical approach Let Θ be the parameters to infer (the edges). A penalized likelihood approach ˆ Θλ = arg max L(Θ; data) − λ pen 1 (Θ, Z), Θ L is the model log-likelihood, Z is a latent clustering of the network, pen 1 is a penalty function tuned by λ > 0. It performs 1. regularization (needed when n p), 2. selection (sparsity induced by the 1 -norm), 3. model-driven inference (penalty adapted according to Z). SIMoNe: inferring Gaussian networks with latent clustering 8
  • 21. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 9
  • 22. The Gaussian model for an i.i.d. sample Let X ∼ N (0p , Σ) with X1 , . . . , Xn i.i.d. copies of X, X be the n × p matrix whose kth row is Xk , Θ = (θij )i,j∈P Σ−1 be the concentration matrix. Graphical interpretation Since corij|P{i,j} = −θij / θii θjj for i = j,   θij = 0 X(i) ⊥ X(j)|X(P{i, j}) ⇔ ⊥ or edge (i, j) ∈ network. /  Θ describes the undirected graph of conditional dependencies. SIMoNe: inferring Gaussian networks with latent clustering 10
  • 23. Neighborhood selection (1) Let Xi be the ith column of X, Xi be X deprived of Xi . θij Xi = Xi β + ε, where βj = − . θii ¨ Meinshausen and Bulhman, 2006 Since sign(corij|P{i,j} ) = sign(βj ), select the neighbors of i with 1 2 arg min Xi − Xi β 2 +λ β . β n 1 The sign pattern of Θλ is inferred after a symmetrization step. SIMoNe: inferring Gaussian networks with latent clustering 11
  • 24. Neighborhood selection (2) The pseudo log-likelihood of the i.i.d Gaussian sample is p n ˜ Liid (Θ; S) = log P(Xk (i)|Xk (Pi); Θi ) , i=1 k=1 n n n = log det(D) − Trace D−1/2 ΘSΘD−1/2 − log(2π), 2 2 2 where D = diag(Θ). Proposition Θpseudo = arg max Liid (Θ; S) − λ Θ ˆ λ ˜ 1 Θ:θij =θii has the same null entries as inferred by neighborhood selection. SIMoNe: inferring Gaussian networks with latent clustering 12
  • 25. The Gaussian likelihood for an i.i.d. sample Let S = n−1 X X be the empirical variance-covariance matrix: S is a sufficient statistic of Θ. The log-likelihood n n n Liid (Θ; S) = log det(Θ) − Trace(SΘ) + log(2π). 2 2 2 The MLE = S−1 of Θ is not defined for n < p and never sparse. The need for regularization is huge. SIMoNe: inferring Gaussian networks with latent clustering 13
  • 26. Penalized log-likelihood Banerjee et al., JMLR 2008 ˆ Θλ = arg max Liid (Θ; S) − λ Θ , 1 Θ efficiently solved by the graphical L ASSO of Friedman et al, 2008. Ambroise, Chiquet, Matias, EJS 2009 Use adaptive penalty parameters for different coefficients Liid (Θ; S) − λ PZ Θ 1 , where PZ is a matrix of weights depending on the underlying clustering Z. Works with the pseudo log-likelihood (computationally efficient). SIMoNe: inferring Gaussian networks with latent clustering 14
  • 27. Penalized log-likelihood Banerjee et al., JMLR 2008 ˆ Θλ = arg max Liid (Θ; S) − λ Θ , 1 Θ efficiently solved by the graphical L ASSO of Friedman et al, 2008. Ambroise, Chiquet, Matias, EJS 2009 Use adaptive penalty parameters for different coefficients ˜ Liid (Θ; S) − λ PZ Θ , 1 where PZ is a matrix of weights depending on the underlying clustering Z. Works with the pseudo log-likelihood (computationally efficient). SIMoNe: inferring Gaussian networks with latent clustering 14
  • 28. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 15
  • 29. The Gaussian model for time-course data (1) Let X1 , . . . , Xn be a first order vector autoregressive process Xt = ΘXt−1 + b + εt , t ∈ [1, n] where we are looking for Θ = (θij )i,j∈P and X0 ∼ N (0p , Σ0 ), εt is a Gaussian white noise with covariance σ 2 Ip , cov(Xt , εs ) = 0 for s > t, so that Xt is markovian. Graphical interpretation since cov (Xt (i), Xt−1 (j)|Xt−1 (Pj)) θij = , var (Xt−1 (j)|Xt−1 (Pj))   θij = 0 Xt (i) ⊥ Xt−1 (j)|Xt−1 (Pj) ⇔ ⊥ or edge (j i) ∈ network /  SIMoNe: inferring Gaussian networks with latent clustering 16
  • 30. The Gaussian model for time-course data (2) Let X be the n × p matrix whose kth row is Xk , S = n−1 Xn Xn be the within time covariance matrix, V = n−1 Xn X0 be the across time covariance matrix. The log-likelihood n Ltime (Θ; S, V) = n Trace (VΘ) − Trace (Θ SΘ) + c. 2 The MLE = S−1 V of Θ is still not defined for n < p. SIMoNe: inferring Gaussian networks with latent clustering 17
  • 31. Penalized log-likelihood Charbonnier, Chiquet, Ambroise, SAGMB 2010 ˆ Θλ = arg max Ltime (Θ; S, V) − λ PZ Θ 1 Θ where PZ is a (non-symmetric) matrix of weights depending on the underlying clustering Z. Major difference with the i.i.d. case The graph is directed: cov (Xt (i), Xt−1 (j)|Xt−1 (Pj)) θij = var (Xt−1 (j)|Xt−1 (Pj)) cov (Xt (j), Xt−1 (i)|Xt−1 (Pi)) = . var (Xt−1 (i)|Xt−1 (Pi)) SIMoNe: inferring Gaussian networks with latent clustering 18
  • 32. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 19
  • 33. Coupling related problems Consider T samples concerning the expressions of the same p genes, (t) (t) X1 , . . . , Xnt is the tth sample drawn from N (0p , Σ(t) ), with covariance matrix S(t) . Multiple samples setup Go to scheme Ignoring the relationships between the tasks leads to T arg max L(Θ(t) ; S(t) ) − λ pen 1 (Θ(t) , Z). Θ(t) ,t=1...,T t=1 Breaking the separability Either by modifying the objective function or the constraints. SIMoNe: inferring Gaussian networks with latent clustering 20
  • 34. Coupling related problems Consider T samples concerning the expressions of the same p genes, (t) (t) X1 , . . . , Xnt is the tth sample drawn from N (0p , Σ(t) ), with covariance matrix S(t) . Multiple samples setup Go to scheme Ignoring the relationships between the tasks leads to T arg max L(Θ(t) ; S(t) ) − λ pen 1 (Θ(t) , Z). Θ(t) ,t=1...,T t=1 Breaking the separability Either by modifying the objective function or the constraints. SIMoNe: inferring Gaussian networks with latent clustering 20
  • 35. Coupling related problems Consider T samples concerning the expressions of the same p genes, (t) (t) X1 , . . . , Xnt is the tth sample drawn from N (0p , Σ(t) ), with covariance matrix S(t) . Multiple samples setup Go to scheme Remarks Ignoring the relationships between the tasks leads to In the sequel, the Z is eluded for clarity (no loss of generality). T Multitask learning is easily adapted pen (Θ(t) , Z). data yet arg max L(Θ(t) ; S(t) ) − λ to time-course 1 only steady statet=1 Θ (t) ,t=1...,T version is presented here. Breaking the separability Either by modifying the objective function or the constraints. SIMoNe: inferring Gaussian networks with latent clustering 20
  • 36. Coupling problems through the objective function The Intertwined L ASSO T max ˜ ˜ L(Θ(t) ; S(t) ) − λ Θ(t) 1 Θ(t) ,t...,T t=1 ¯ S = n T nt S(t) is an “across-task” covariance matrix. 1 t=1 ˜ ¯ S(t) = αS(t) + (1 − α)S is a mixture between inner/over-tasks covariance matrices. setting α = 0 is equivalent to pooling all the data and infer one common network, setting α = 1 is equivalent to treating T independent problems. SIMoNe: inferring Gaussian networks with latent clustering 21
  • 37. Coupling problems by grouping variables (1) Groups definition Groups are the T -tuple composed by the (i, j) entries of each Θ(t) , t = 1, . . . , T . Most relationships between the genes are kept or removed across all tasks simultaneously. The graphical group-L ASSO T T 1/2 ˜ (t) (t) (t) 2 max L Θ ;S −λ θij . Θ(t) ,t...,T t=1 i,j∈P t=1 i=j SIMoNe: inferring Gaussian networks with latent clustering 22
  • 38. (2) (2) β2 =0 β2 = 0.3 1 1 Group-L ASSO penalty =0 (1) (1) β1 β1 −1 1 −1 1 Assume (2) β1 −1 (1) −1 (1) 2 tasks (T = 2) β2 β2 2 coefficients (p = 2) 1 1 Let represent the unit ball = 0.3 (1) (1) β1 β1 −1 1 −1 1 2 2 1/2 (2) (t) 2 β1 −1 −1 βi ≤1 (1) (1) β2 β2 i=1 t=1 SIMoNe: inferring Gaussian networks with latent clustering 23
  • 39. Coupling problems by grouping variables (2) Graphical group-L ASSO modification Inside a group, value are most likeliky sign consistent. The graphical cooperative-L ASSO T max ˜ L S(t) ; Θ(t) Θ(t) ,t...,T t=1  1/2 1/2  T T (t) 2 (t) 2   −λ θij + θij ,  + −  i,j∈P t=1 t=1 i=j where [u]+ = max(0, u) and [u]− = min(0, u). SIMoNe: inferring Gaussian networks with latent clustering 24
  • 40. (2) (2) β2 =0 β2 = 0.3 Coop-L ASSO penalty Assume 1 1 2 tasks (T = 2) =0 (1) (1) β1 −1 1 β1 −1 1 2 coefficients (p = 2) (2) β1 −1 −1 β2 (1) β2 (1) Let represent the unit ball 2 2 1/2 1 1 2 (t) βi + = 0.3 (1) (1) i=1 t=1 β1 β1 −1 1 −1 1 2 2 1/2 (2) (t) β1 −1 −1 + −βi ≤1 (1) (1) + β2 β2 i=1 t=1 SIMoNe: inferring Gaussian networks with latent clustering 25
  • 41. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 26
  • 42. The overall strategy Our basic criteria is of the form L(Θ; data) − λ PZ Θ 1 . What we are looking for the edges, through Θ, the correct level of sparsity λ, the underlying clustering Z with connectivity matrix πZ . What SIMoNe does 1. Infer a family of networks G = {Θλ : λ ∈ [λmax , 0]} 2. Select G that maximizes an information criteria 3. Learn Z on the selected network G 4. Infer a family of networks with PZ ∝ 1 − πZ 5. Select GZ that maximizes an information criteria SIMoNe: inferring Gaussian networks with latent clustering 27
  • 43. The overall strategy Our basic criteria is of the form L(Θ; data) − λ PZ Θ 1 . What we are looking for the edges, through Θ, the correct level of sparsity λ, the underlying clustering Z with connectivity matrix πZ . What SIMoNe does 1. Infer a family of networks G = {Θλ : λ ∈ [λmax , 0]} 2. Select G that maximizes an information criteria 3. Learn Z on the selected network G 4. Infer a family of networks with PZ ∝ 1 − πZ 5. Select GZ that maximizes an information criteria SIMoNe: inferring Gaussian networks with latent clustering 27
  • 44. SIMoNe Suppose you want toSIMoNE recover a clustered network: Graph Target Adjacency Matrix Target Network SIMoNe: inferring Gaussian networks with latent clustering 28
  • 45. SIMoNe Start with microarray data SIMoNE Data SIMoNe: inferring Gaussian networks with latent clustering 28
  • 46. SIMoNe SIMoNE SIMoNE without prior Adjacency Matrix Data corresponding to G SIMoNe: inferring Gaussian networks with latent clustering 28
  • 47. SIMoNe SIMoNE SIMoNE without prior Adjacency Matrix Penalty matrix PZ Data corresponding to G Decreasing transformation Mixer πZ Connectivity matrix SIMoNe: inferring Gaussian networks with latent clustering 28
  • 48. SIMoNe SIMoNE SIMoNE without prior + Adjacency Matrix Adjacency Matrix Penalty matrix PZ Data corresponding to G corresponding to GZ Decreasing transformation Mixer πZ Connectivity matrix SIMoNe: inferring Gaussian networks with latent clustering 28
  • 49. Outline Statistical models Steady-state data Time-course data Multitask learning Algorithms and methods Overall view Network inference Model selection Latent structure Numerical experiments Performance on simulated data R package demo: the breast cancer data set SIMoNe: inferring Gaussian networks with latent clustering 29
• 50–51. Monotask framework: problem decomposition
Consider the following reordering of Θ:
$$\Theta = \begin{pmatrix} \Theta_{\setminus i \setminus i} & \Theta_{\setminus i\, i} \\ \Theta_{i \setminus i} & \theta_{ii} \end{pmatrix}, \qquad \Theta_i = \begin{pmatrix} \Theta_{\setminus i\, i} \\ \theta_{ii} \end{pmatrix}.$$
The block coordinate descent algorithm for
$$\arg\max_{\Theta}\; L(\Theta; \text{data}) - \lambda\, \mathrm{pen}_{\ell_1}(\Theta)$$
relies on p penalized, convex optimization problems
$$\arg\min_{\beta}\; f(\beta; \cdot) + \lambda\, \mathrm{pen}_{\ell_1}(\beta), \qquad (1)$$
where f is convex; for steady-state data, $\beta = \Theta_{\setminus i\, i} \in \mathbb{R}^{p-1}$ and f depends on S, while for time-course data, $\beta = \Theta_i \in \mathbb{R}^{p}$ and f depends on (S, V).
• 52–54. Monotask framework: algorithms
1. steady-state: Covsel/GLasso ($L_{\text{iid}}(\Theta) - \lambda \|\Theta\|_1$): starts from $S + \lambda I_p$, positive definite; iterates on the columns of $\Theta^{-1}$ until stabilization; both estimation and selection of Θ.
2. steady-state: neighborhood selection ($\tilde L_{\text{iid}}(\Theta) - \lambda \|\Theta\|_1$): selects the sign pattern of $\Theta_{\setminus i\, i}$ with the LASSO; only one pass per column required; post-symmetrization needed.
3. time-course: VAR(1) inference ($L_{\text{time}}(\Theta) - \lambda \|\Theta\|_1$): selects and estimates $\Theta_i$ with the LASSO; only one pass per column required; both estimation and selection.
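As an illustration of approach 2, here is a minimal sketch of neighborhood selection, assuming the glmnet package for the per-column LASSO fits; the OR symmetrization rule used below is one of the two standard post-symmetrization choices.

library(glmnet)

neighborhood.selection <- function(X, lambda) {
  p <- ncol(X)
  A <- matrix(0, p, p)
  for (i in 1:p) {                                # one LASSO pass per column
    fit <- glmnet(X[, -i], X[, i], lambda = lambda)
    A[i, -i] <- as.numeric(coef(fit)[-1] != 0)    # drop the intercept, keep the support
  }
  1 * ((A + t(A)) > 0)                            # post-symmetrization (OR rule)
}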
• 55–56. Multitask framework: problem decomposition (1)
Remark: we present the multitask algorithms in the steady-state framework (easily adapted to time-course data).
Consider the $(pT) \times (pT)$ block-diagonal matrix C composed of the empirical covariance matrices of each task,
$$C = \begin{pmatrix} S^{(1)} & & 0 \\ & \ddots & \\ 0 & & S^{(T)} \end{pmatrix},$$
and define
$$C_{\setminus i \setminus i} = \begin{pmatrix} S^{(1)}_{\setminus i \setminus i} & & 0 \\ & \ddots & \\ 0 & & S^{(T)}_{\setminus i \setminus i} \end{pmatrix}, \qquad C_{\setminus i\, i} = \begin{pmatrix} S^{(1)}_{\setminus i\, i} \\ \vdots \\ S^{(T)}_{\setminus i\, i} \end{pmatrix}.$$
The $(p-1)T \times (p-1)T$ matrix $C_{\setminus i \setminus i}$ is the matrix C with every row and column pertaining to variable i removed.
• 57. Multitask framework: problem decomposition (2)
Estimating the i-th columns of the T tasks bound together,
$$\arg\max_{\Theta^{(t)},\, t=1,\dots,T}\; \sum_{t=1}^{T} \tilde L\big(\Theta^{(t)}; S^{(t)}\big) - \lambda\, \mathrm{pen}_{\ell_1}\big(\Theta^{(t)}\big),$$
is decomposed into p convex optimization problems
$$\arg\min_{\beta \in \mathbb{R}^{T(p-1)}}\; f(\beta; C) + \lambda\, \mathrm{pen}_{\ell_1}(\beta),$$
where we set $\beta^{(t)} = \Theta^{(t)}_{\setminus i\, i}$ and
$$\beta = \begin{pmatrix} \beta^{(1)} \\ \vdots \\ \beta^{(T)} \end{pmatrix} \in \mathbb{R}^{T(p-1)}.$$
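Building C is a one-liner with Matrix::bdiag; here is a base-R sketch of the same construction (the function name is ours):

block.diag.cov <- function(S.list) {
  p <- ncol(S.list[[1]]); n.tasks <- length(S.list)
  C <- matrix(0, p * n.tasks, p * n.tasks)
  for (t in seq_len(n.tasks)) {
    idx <- ((t - 1) * p + 1):(t * p)
    C[idx, idx] <- S.list[[t]]                    # one diagonal block per task
  }
  C
}

## usage: empirical covariance of each task, then the block-diagonal C
## C <- block.diag.cov(lapply(X.list, function(X) crossprod(scale(X, scale = FALSE)) / nrow(X)))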
• 58–61. Solving the sub-problem
Subdifferential approach:
$$\min_{\beta \in \mathbb{R}^{T(p-1)}}\; L(\beta) = f(\beta) + \lambda\, \mathrm{pen}_{\ell_1}(\beta);$$
β is a minimizer iff $0 \in \partial_\beta L(\beta)$, with $\partial_\beta L(\beta) = \nabla_\beta f(\beta) + \lambda\, \partial_\beta\, \mathrm{pen}_{\ell_1}(\beta)$.
For the graphical intertwined LASSO,
$$\mathrm{pen}_{\ell_1}(\beta) = \sum_{t=1}^{T} \big\|\beta^{(t)}\big\|_1,$$
where the grouping effect is managed by the function f.
For the graphical group-LASSO,
$$\mathrm{pen}_{\ell_1}(\beta) = \sum_{i=1}^{p-1} \big\|\beta_i^{[1:T]}\big\|_2.$$
For the graphical coop-LASSO,
$$\mathrm{pen}_{\ell_1}(\beta) = \sum_{i=1}^{p-1} \Big( \big\|\big[\beta_i^{[1:T]}\big]_+\big\|_2 + \big\|\big[-\beta_i^{[1:T]}\big]_+\big\|_2 \Big),$$
where $\beta_i^{[1:T]} = \big(\beta_i^{(1)}, \dots, \beta_i^{(T)}\big) \in \mathbb{R}^T$ is the vector of the i-th component across tasks.
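The three penalties translate directly into R. In this sketch, beta is a T × (p − 1) matrix with one row per task and one column per coefficient:

pen.intertwined <- function(beta) sum(abs(beta))          # sum_t || beta^(t) ||_1
pen.group       <- function(beta) sum(sqrt(colSums(beta^2)))  # sum_i || beta_i^[1:T] ||_2
pen.coop        <- function(beta)                         # positive and negative parts grouped separately
  sum(sqrt(colSums(pmax(beta, 0)^2))) + sum(sqrt(colSums(pmax(-beta, 0)^2)))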
• 62–64. General active set algorithm (presented in three refinements, "yellow", "orange" and "green belt"; the complete version reads as follows)
// 0. INITIALIZATION
β ← 0, A ← ∅
while 0 ∉ ∂_β L(β) do
  // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO β_A
  find a solution h to the smooth problem
    ∇_h f(β_A + h) + λ ∂_h pen_ℓ1(β_A + h) = 0, where ∂_h pen_ℓ1 = ∇_h pen_ℓ1;
  β_A ← β_A + h
  // 2. IDENTIFY NEWLY ZEROED VARIABLES
  while ∃ i ∈ A such that β_i = 0 and min_{ν ∈ ∂_{β_i} pen_ℓ1} | ∂f(β)/∂β_i + λν | = 0 do
    A ← A \ {i}
  end
  // 3. IDENTIFY NEW NON-ZERO VARIABLES
  // select the candidate i ∈ A^c for which an infinitesimal change of β_i provides the highest reduction of L
  i ← argmax_{j ∈ A^c} v_j, where v_j = min_{ν ∈ ∂_{β_j} pen_ℓ1} | ∂f(β)/∂β_j + λν |
  if v_i ≠ 0 then A ← A ∪ {i}
  else stop and return β, which is optimal
end
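To make the scheme concrete, here is a toy active-set solver for the plain LASSO subproblem $\min_\beta \|y - X\beta\|^2/(2n) + \lambda \|\beta\|_1$: a sketch of the mechanism above, not the package's optimized C implementation.

lasso.active.set <- function(X, y, lambda, tol = 1e-8, max.iter = 1000) {
  n <- nrow(X); p <- ncol(X)
  xtx <- crossprod(X) / n; xty <- drop(crossprod(X, y)) / n
  beta <- numeric(p)
  soft <- function(z, g) sign(z) * max(abs(z) - g, 0)
  for (it in 1:max.iter) {
    beta.old <- beta
    grad <- drop(xtx %*% beta) - xty                 # gradient of the smooth part f
    ## step 3: variables violating the optimality conditions enter the active set
    active <- which(beta != 0 | abs(grad) > lambda + tol)
    for (j in active) {                              # step 1: optimize over the active set
      r <- xty[j] - sum(xtx[j, -j] * beta[-j])
      beta[j] <- soft(r, lambda) / xtx[j, j]         # step 2: exact zeros leave implicitly
    }
    if (max(abs(beta - beta.old)) < tol) break       # 0 in the subdifferential, up to tolerance
  }
  beta
}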
• 65. Outline
Statistical models: steady-state data; time-course data; multitask learning.
Algorithms and methods: overall view; network inference; model selection; latent structure.
Numerical experiments: performance on simulated data; R package demo: the breast cancer data set.
• 66. Tuning the penalty parameter: what does the literature say?
Theory-based penalty choices:
1. optimal order of the penalty in the p ≫ n framework: $\sqrt{n \log p}$ (Bunea et al. 2007, Bickel et al. 2009);
2. control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009); in practice, much too conservative.
Cross-validation: optimal in terms of prediction, not in terms of selection, and problematic with small samples, since it changes the sparsity constraint due to the sample size.
• 67. Tuning the penalty parameter: BIC / AIC
Theorem (Zou et al. 2008): $\mathrm{df}\big(\hat\beta^{\text{lasso}}_\lambda\big) = \big\|\hat\beta^{\text{lasso}}_\lambda\big\|_0$.
Straightforward extensions to the graphical framework:
$$\mathrm{BIC}(\lambda) = L(\hat\Theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\Theta_\lambda)\, \frac{\log n}{2}, \qquad \mathrm{AIC}(\lambda) = L(\hat\Theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\Theta_\lambda).$$
These rely on asymptotic approximations, but remain relevant for small data sets, and are easily adapted to $L_{\text{iid}}$, $\tilde L_{\text{iid}}$, $L_{\text{time}}$ and the multitask framework.
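A sketch of this selection rule along a penalty path, assuming the glasso package for the graphical-LASSO fits; the degrees of freedom count the non-zero upper-diagonal entries of $\hat\Theta_\lambda$:

library(glasso)

select.lambda <- function(X, lambdas) {
  n <- nrow(X); S <- cov(X)
  crit <- sapply(lambdas, function(lambda) {
    Theta  <- glasso(S, rho = lambda)$wi              # estimated concentration matrix
    loglik <- (n / 2) * (as.numeric(determinant(Theta)$modulus) - sum(S * Theta))
    df     <- sum(Theta[upper.tri(Theta)] != 0)
    c(BIC = loglik - df * log(n) / 2, AIC = loglik - df)
  })
  c(BIC = lambdas[which.max(crit["BIC", ])],
    AIC = lambdas[which.max(crit["AIC", ])])
}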
• 68. Outline
Statistical models: steady-state data; time-course data; multitask learning.
Algorithms and methods: overall view; network inference; model selection; latent structure.
Numerical experiments: performance on simulated data; R package demo: the breast cancer data set.
• 69. MixNet: an Erdős–Rényi mixture for networks
The data is now the network itself. Consider $A = (a_{ij})_{i,j\in\mathcal P}$, the adjacency matrix associated with Θ: $a_{ij} = \mathbf{1}_{\{\theta_{ij} \ne 0\}}$.
Latent structure modeling (Daudin et al., 2008): spread the nodes over a set $\mathcal Q = \{1, \dots, q, \dots, Q\}$ of classes, with
α a Q-size vector giving $\alpha_q = \mathbb{P}(i \in q)$; the $z_{iq} = \mathbf{1}_{\{i \in q\}}$ are independent hidden variables, $Z_i \sim \mathcal{M}(1, \alpha)$;
π a Q × Q matrix giving $\pi_{q\ell} = \mathbb{P}(a_{ij} = 1 \mid i \in q, j \in \ell)$.
Connection probabilities depend on the class memberships of the nodes: $a_{ij} \mid \{Z_{iq} Z_{j\ell} = 1\} \sim \mathcal{B}(\pi_{q\ell})$.
• 70. Estimation strategy
Likelihoods: for the observed data, $\mathbb{P}(A \mid \alpha, \pi) = \sum_Z \mathbb{P}(A, Z \mid \alpha, \pi)$; for the complete data, $\mathbb{P}(A, Z \mid \alpha, \pi)$.
The EM criterion, $\mathbb{E}\big[\log \mathbb{P}(A, Z \mid \alpha, \pi) \mid A\big]$, requires $\mathbb{P}(Z \mid A, \alpha, \pi)$, which is not tractable!
• 71. Variational inference
Principle: approximate $\mathbb{P}(Z \mid A, \alpha, \pi)$ by $R_\tau(Z)$, chosen to minimize $\mathrm{KL}\big(R_\tau(Z); \mathbb{P}(Z \mid A, \alpha, \pi)\big)$, where $R_\tau$ is such that $\log R_\tau(Z) = \sum_{iq} Z_{iq} \log \tau_{iq}$ and the τ are the variational parameters to optimize.
Variational Bayes (Latouche et al.): put appropriate priors on α and π; performs well, especially for the choice of Q, and is thus relevant in the SIMoNe context.
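As a sketch of the fixed-point updates behind this approximation, here are the standard mean-field updates of τ for the affiliation model, written directly from the factorization above (plain R, our function name):

update.tau <- function(A, alpha, pi, tau, n.iter = 50) {
  p <- nrow(A); Q <- length(alpha)
  pi <- pmin(pmax(pi, 1e-6), 1 - 1e-6)            # guard against log(0)
  for (it in 1:n.iter) {
    log.tau <- matrix(log(alpha), p, Q, byrow = TRUE)
    for (q in 1:Q) for (l in 1:Q) {
      ## edge / non-edge contributions of class-l neighbours to class q
      B <- A * log(pi[q, l]) + (1 - A) * log(1 - pi[q, l])
      diag(B) <- 0
      log.tau[, q] <- log.tau[, q] + drop(B %*% tau[, l])
    }
    tau <- exp(log.tau - apply(log.tau, 1, max))  # normalize each row, numerically stably
    tau <- tau / rowSums(tau)
  }
  tau
}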
• 72. Outline
Statistical models: steady-state data; time-course data; multitask learning.
Algorithms and methods: overall view; network inference; model selection; latent structure.
Numerical experiments: performance on simulated data; R package demo: the breast cancer data set.
• 73. Network generation
Fix the number $p = \mathrm{card}(\mathcal P)$ of nodes, and whether the graph is directed or not.
Affiliation matrix $A = (a_{ij})_{i,j\in\mathcal P}$:
1. usual MixNet framework: the Q × Q matrix Π, with $\pi_{q\ell} = \mathbb{P}(a_{ij} = 1 \mid i \in q, j \in \ell)$, and the Q-size vector α with $\alpha_q = \mathbb{P}(i \in q)$;
2. constrained MixNet version: the Q × Q matrix Π, with $\pi_{q\ell} = \mathrm{card}\{(i, j) \in \mathcal P \times \mathcal P : i \in q, j \in \ell\}$, and the Q-size vector α with $\alpha_q = \mathrm{card}(\{i \in \mathcal P : i \in q\})/p$.
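A base-R sketch of sampling in the usual MixNet framework (the function name is ours): classes are drawn from α and edges are Bernoulli draws with class-dependent probabilities.

rmixnet <- function(p, alpha, pi, directed = FALSE) {
  z <- sample(length(alpha), p, replace = TRUE, prob = alpha)  # latent classes
  P.edge <- pi[z, z]                                           # p x p edge probabilities
  A <- matrix(rbinom(p * p, 1, P.edge), p, p)
  if (!directed) A[lower.tri(A)] <- t(A)[lower.tri(A)]         # symmetrize
  diag(A) <- 0
  list(A = A, classes = z)
}

## example: two classes, hubs (10%) and leaves (90%), as in Example 1 below
net <- rmixnet(100, alpha = c(0.1, 0.9), pi = matrix(c(0.1, 0.3, 0.3, 0), 2, 2))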
• 74–75. Gaussian data generation
The Θ matrix:
1. undirected case: Θ is the concentration matrix; compute the normalized Laplacian of A, then generate a symmetric pattern of random signs;
2. directed case: Θ holds the VAR(1) parameters; generate random correlations for $a_{ij} \ne 0$, normalize by the eigenvalue with greatest modulus, then generate a pattern of random signs.
The Gaussian sample X:
1. undirected case: compute Σ by pseudo-inversion of Θ, then draw the multivariate Gaussian sample via the Cholesky decomposition of Σ;
2. directed case: Θ permits generating a stable VAR(1) process.
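A sketch of the undirected recipe, with an explicit diagonal shift to guarantee positive definiteness (the shift eps and the function name are our illustrative choices):

rggm <- function(A, n, eps = 0.1) {
  p <- nrow(A)
  signs <- matrix(sample(c(-1, 1), p * p, replace = TRUE), p, p)
  signs[lower.tri(signs)] <- t(signs)[lower.tri(signs)]   # symmetric sign pattern
  L <- diag(rowSums(A)) - A * signs                        # signed Laplacian of A
  Theta <- L + eps * diag(p)                               # shift: positive definite by Gershgorin
  Sigma <- solve(Theta)
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)          # X ~ N(0, Sigma), via Cholesky
  list(X = X, Theta = Theta)
}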
• 76. Example 1: time-course data with star pattern
Simulation settings:
1. 50 networks with p = 100 nodes, time series of length n = 100;
2. two classes, hubs and leaves, with proportions α = (0.1, 0.9);
3. P(hub → leaf) = 0.3, P(hub → hub) = 0.1, 0 otherwise.
• 77–79. Example 1: results (same settings)
[Figures: boxplots of precision = TP/(TP+FP), recall = TP/P (power) and fallout = FP/N (type I error) over the 50 networks, without (wocl) and with (wcl) structure inference, under both BIC and AIC.]
• 80–83. Example 2: steady-state, multitask framework
Simulating the tasks:
1. generate an "ancestor" with p = 20 nodes and K = 20 edges;
2. generate T = 4 children by adding and deleting δ edges;
3. generate T = 4 Gaussian samples.
[Figure: the ancestor and its children with δ perturbations.]
• 84–93. Multitask: simulation results
[Figures: precision/recall curves (precision = TP/(TP+FP) against recall = TP/P) and ROC curves (recall against fallout = FP/N) along the penalty path λmax → 0, comparing the CoopLasso, GroupLasso, Intertwined, Independent and Pooled estimators, for nt ∈ {25, 50, 100} and δ ∈ {1, 3, 5}.]
• 94. Outline
Statistical models: steady-state data; time-course data; multitask learning.
Algorithms and methods: overall view; network inference; model selection; latent structure.
Numerical experiments: performance on simulated data; R package demo: the breast cancer data set.
• 95. Breast cancer: prediction of the outcome of preoperative chemotherapy
Two types of patients; the response can be classified as
1. either a pathologic complete response (PCR),
2. or residual disease (not PCR).
Gene expression data: 133 patients (99 not PCR, 34 PCR), 26 genes identified by differential analysis.
• 96. Pooling the data
Cancer data, pooling approach [demo: demo/cancer_pooled.swf]
• 97. Multitask approach: PCR / not PCR
Cancer data, graphical cooperative LASSO [demo: demo/cancer_mtasks.swf]
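The two demos boil down to a few lines of R; here is a sketch, assuming the simone package ships the breast cancer data set as data(cancer) (object and argument names follow the package documentation and may differ across versions):

library(simone)
data(cancer)

## pooled approach: one network from all 133 patients
res.pooled <- simone(cancer$expr, type = "steady-state")
plot(res.pooled)

## multitask approach: one network per group (PCR / not PCR), inferred
## jointly; the coupling penalty (e.g. the cooperative LASSO of this talk)
## is chosen through the control options of simone()
res.mt <- simone(cancer$expr, type = "steady-state", tasks = cancer$status)
plot(res.mt)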
• 98. Conclusions
To sum up: SIMoNe embeds most state-of-the-art statistical methods for GGM inference based upon ℓ1-penalization; both steady-state and time-course data can be dealt with; and it comes as a (hopefully) biologist-friendly R package.
Perspectives: adding transversal tools such as network comparison; bootstrap to limit the number of false positives; more criteria to choose the penalty parameter; an interface to Gene Ontology.
• 99–100. Publications
Ambroise, Chiquet, Matias (2009). Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3, 205–238.
Chiquet, Smith, Grasseau, Matias, Ambroise (2009). SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3), 417–418.
Charbonnier, Chiquet, Ambroise (2010). Weighted-Lasso for structured network inference from time course data. SAGMB, 9.
Chiquet, Grandvalet, Ambroise (arXiv preprint). Inferring multiple Gaussian graphical models.
Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: an R package for inferring Gaussian networks with latent structure. Journal of Statistical Software.
Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological analysis of breast cancer by multitask learning.